[jira] [Updated] (MAHOUT-1419) Random decision forest is excessively slow on numeric features

2014-02-19 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated MAHOUT-1419: -- Status: Patch Available (was: Open) The significant change is computing 'split points' rather than

Re: Mahout on Spark?

2014-02-19 Thread Gokhan Capan
I imagine Mahout offering users an option to select among different execution engines (just as we currently do with the M/R and sequential options), starting with Spark. I am not sure what changes are needed in the codebase, though. Maybe following MLI (or alike) and implementing some

[jira] [Comment Edited] (MAHOUT-1419) Random decision forest is excessively slow on numeric features

2014-02-19 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13905075#comment-13905075 ] Sean Owen edited comment on MAHOUT-1419 at 2/19/14 8:57 AM:

[jira] [Updated] (MAHOUT-1419) Random decision forest is excessively slow on numeric features

2014-02-19 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated MAHOUT-1419: -- Attachment: MAHOUT-1419.patch Random decision forest is excessively slow on numeric features

Re: Mahout on Spark?

2014-02-19 Thread Sean Owen
To set expectations appropriately, I think it's important to point out this is completely infeasible short of a total rewrite, and I can't imagine that will happen. It may not be obvious if you haven't looked at the code how completely dependent on M/R it is. You can swap out M/R and Spark if you

[jira] [Updated] (MAHOUT-1419) Random decision forest is excessively slow on numeric features

2014-02-19 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated MAHOUT-1419: -- Attachment: MAHOUT-1419.patch Random decision forest is excessively slow on numeric features

[jira] [Updated] (MAHOUT-1419) Random decision forest is excessively slow on numeric features

2014-02-19 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated MAHOUT-1419: -- Attachment: (was: MAHOUT-1419.patch) Random decision forest is excessively slow on numeric

[jira] [Commented] (MAHOUT-1419) Random decision forest is excessively slow on numeric features

2014-02-19 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13905395#comment-13905395 ] Sean Owen commented on MAHOUT-1419: --- Yes you could compute summary statistics once and

[jira] [Comment Edited] (MAHOUT-1419) Random decision forest is excessively slow on numeric features

2014-02-19 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13905075#comment-13905075 ] Sean Owen edited comment on MAHOUT-1419 at 2/19/14 1:14 PM:

[jira] [Updated] (MAHOUT-1329) Mahout for hadoop 2

2014-02-19 Thread Sergey Svinarchuk (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Svinarchuk updated MAHOUT-1329: -- Attachment: (was: 1329.patch) Mahout for hadoop 2 ---

[jira] [Updated] (MAHOUT-1329) Mahout for hadoop 2

2014-02-19 Thread Sergey Svinarchuk (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Svinarchuk updated MAHOUT-1329: -- Attachment: (was: 1329-2.patch) Mahout for hadoop 2 ---

[jira] [Updated] (MAHOUT-1329) Mahout for hadoop 2

2014-02-19 Thread Sergey Svinarchuk (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Svinarchuk updated MAHOUT-1329: -- Attachment: (was: 1329.diff) Mahout for hadoop 2 ---

[jira] [Updated] (MAHOUT-1329) Mahout for hadoop 2

2014-02-19 Thread Sergey Svinarchuk (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Svinarchuk updated MAHOUT-1329: -- Attachment: 1329.diff Mahout for hadoop 2 ---

[jira] [Updated] (MAHOUT-1329) Mahout for hadoop 2

2014-02-19 Thread Sergey Svinarchuk (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Svinarchuk updated MAHOUT-1329: -- Attachment: 1329.patch Mahout for hadoop 2 ---

[jira] [Commented] (MAHOUT-1329) Mahout for hadoop 2

2014-02-19 Thread Sergey Svinarchuk (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13905464#comment-13905464 ] Sergey Svinarchuk commented on MAHOUT-1329: --- Updated patch to trunk. Build

Hadoop 2 support

2014-02-19 Thread Sergey Svinarchuk
Today I updated the patch in MAHOUT-1329 to trunk. This is the ticket that adds Hadoop 2 support to Mahout. I built Mahout with the patch, and all unit tests passed for both Hadoop 1 and Hadoop 2. I also tested examples/bin on both Hadoop versions. Could one of the committers review the patch and test it? Thanks, Sergey

Re: Hadoop 2 support

2014-02-19 Thread Sean Owen
Sergey, I think it already worked with 2.0, no? (Although it doesn't actually use the 2.x APIs.) Is this for 2.2, and/or what are the high-level changes? I'd imagine mostly packaging stuff. On Wed, Feb 19, 2014 at 2:14 PM, Sergey Svinarchuk ssvinarc...@hortonworks.com wrote: Today I updated patch

Re: Hadoop 2 support

2014-02-19 Thread Sean Owen
Hmm I thought there was already a profile for this, but on second look, I only see a settable hadoop.version. It has both hadoop-core and hadoop-common dependencies which isn't right. I bet this patch clarifies the difference properly, and that's got to be good. I think I am thinking of how the

[jira] [Resolved] (MAHOUT-1418) Removal of write access to anything but CMS for username isabel

2014-02-19 Thread Isabel Drost-Fromm (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Isabel Drost-Fromm resolved MAHOUT-1418. Resolution: Fixed I distinctly remember there being a comment by [~smarthi]

[jira] [Commented] (MAHOUT-1418) Removal of write access to anything but CMS for username isabel

2014-02-19 Thread Manuel Blechschmidt (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13905554#comment-13905554 ] Manuel Blechschmidt commented on MAHOUT-1418: - Hi [~isabel], the comment was

[jira] [Commented] (MAHOUT-1418) Removal of write access to anything but CMS for username isabel

2014-02-19 Thread Suneel Marthi (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13905560#comment-13905560 ] Suneel Marthi commented on MAHOUT-1418: --- Yeah I deleted the comment, fine shoot me

Re: Hadoop 2 support

2014-02-19 Thread Suneel Marthi
Thanks for the patch Sergey. I tested this with Hadoop 1 and 2 and can confirm that all unit tests pass and the examples work. On Wednesday, February 19, 2014 9:39 AM, Sean Owen sro...@gmail.com wrote: Hmm I thought there was already a profile for this, but on second look, I only see a

Re: Hadoop 2 support

2014-02-19 Thread Sergey Svinarchuk
Thanks! Will this patch be included in Mahout 1.0? On Wed, Feb 19, 2014 at 5:39 PM, Suneel Marthi suneel_mar...@yahoo.com wrote: Thanks for the patch Sergey. I tested this with Hadoop 1 and 2 and can confirm that all unit tests pass and the examples work. On Wednesday, February 19, 2014

Re: Hadoop 2 support

2014-02-19 Thread Suneel Marthi
Yes On Wednesday, February 19, 2014 10:43 AM, Sergey Svinarchuk ssvinarc...@hortonworks.com wrote: Thanks! This patch will be added in mahout 1.0? On Wed, Feb 19, 2014 at 5:39 PM, Suneel Marthi suneel_mar...@yahoo.com wrote: Thanks for the patch Sergey. I tested this with Hadoop 1 and

[jira] [Commented] (MAHOUT-1418) Removal of write access to anything but CMS for username isabel

2014-02-19 Thread Isabel Drost-Fromm (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13905643#comment-13905643 ] Isabel Drost-Fromm commented on MAHOUT-1418: [~smarthi] No worries - I was

[jira] [Assigned] (MAHOUT-1329) Mahout for hadoop 2

2014-02-19 Thread Suneel Marthi (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suneel Marthi reassigned MAHOUT-1329: - Assignee: Suneel Marthi Mahout for hadoop 2 ---

[jira] [Created] (MAHOUT-1420) Add solr-recommender to examples

2014-02-19 Thread Andrew Musselman (JIRA)
Andrew Musselman created MAHOUT-1420: Summary: Add solr-recommender to examples Key: MAHOUT-1420 URL: https://issues.apache.org/jira/browse/MAHOUT-1420 Project: Mahout Issue Type: New

[jira] [Updated] (MAHOUT-1329) Mahout for hadoop 2

2014-02-19 Thread Suneel Marthi (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suneel Marthi updated MAHOUT-1329: -- Resolution: Fixed Status: Resolved (was: Patch Available) Patch committed to trunk

[jira] [Commented] (MAHOUT-1329) Mahout for hadoop 2

2014-02-19 Thread Gokhan Capan (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13906062#comment-13906062 ] Gokhan Capan commented on MAHOUT-1329: -- Is it OK to add hadoop dependencies to the

Build failed in Jenkins: Mahout-Quality #2482

2014-02-19 Thread Apache Jenkins Server
See https://builds.apache.org/job/Mahout-Quality/2482/changes Changes: [smarthi] MAHOUT-1329: Mahout for Hadoop 2.x -- [...truncated 1633 lines...] A math/src/test/java/org/apache/mahout/math/jet A

[jira] [Commented] (MAHOUT-1329) Mahout for hadoop 2

2014-02-19 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13906068#comment-13906068 ] Hudson commented on MAHOUT-1329: FAILURE: Integrated in Mahout-Quality #2482 (See

Build failed in Jenkins: Mahout-Examples-Cluster-Reuters #542

2014-02-19 Thread Apache Jenkins Server
See https://builds.apache.org/job/Mahout-Examples-Cluster-Reuters/542/changes Changes: [smarthi] MAHOUT-1329: Mahout for Hadoop 2.x -- Started by an SCM change Building remotely on ubuntu1 in workspace

Build failed in Jenkins: Mahout-Examples-Classify-20News #431

2014-02-19 Thread Apache Jenkins Server
See https://builds.apache.org/job/Mahout-Examples-Classify-20News/431/changes Changes: [smarthi] MAHOUT-1329: Mahout for Hadoop 2.x -- [...truncated 1627 lines...] A

Re: Mahout on Spark?

2014-02-19 Thread peng
It was suggested that I switch to MLlib for its performance, but I doubt it is production ready; even if it is, I would still favour Hadoop's sturdiness and self-healing. But maybe Mahout can include contribs that M/R is not a good fit for, like downpour SGD or graph-based algorithms? On Wed 19 Feb

Build failed in Jenkins: Mahout-Quality #2483

2014-02-19 Thread Apache Jenkins Server
See https://builds.apache.org/job/Mahout-Quality/2483/ -- [...truncated 1633 lines...] A math/src/test/java/org/apache/mahout/math/random/ChineseRestaurantTest.java A math/src/test/java/org/apache/mahout/math/random/NormalTest.java A

Build failed in Jenkins: mahout-nightly #1502

2014-02-19 Thread Apache Jenkins Server
See https://builds.apache.org/job/mahout-nightly/1502/changes Changes: [smarthi] MAHOUT-1329: Mahout for Hadoop 2.x -- Started by timer Building remotely on ubuntu6 in workspace https://builds.apache.org/job/mahout-nightly/ws/ Updating

[jira] [Reopened] (MAHOUT-1329) Mahout for hadoop 2

2014-02-19 Thread Suneel Marthi (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suneel Marthi reopened MAHOUT-1329: --- Mahout for hadoop 2 --- Key: MAHOUT-1329

[jira] [Commented] (MAHOUT-1329) Mahout for hadoop 2

2014-02-19 Thread Suneel Marthi (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13906333#comment-13906333 ] Suneel Marthi commented on MAHOUT-1329: --- Gokhan, I remember now the conversation we

Re: Mahout on Spark?

2014-02-19 Thread Ted Dunning
On Wed, Feb 19, 2014 at 1:55 PM, peng pc...@uowmail.edu.au wrote: But maybe mahout can include contribs that M/R is not fit for, like downpour SGD or graph-based algorithms? Yes. Absolutely.

Re: Mahout on Spark?

2014-02-19 Thread Suneel Marthi
On Wednesday, February 19, 2014 7:22 PM, Ted Dunning ted.dunn...@gmail.com wrote: On Wed, Feb 19, 2014 at 1:55 PM, peng pc...@uowmail.edu.au wrote: But maybe mahout can include contribs that M/R is not fit for, like downpour SGD or graph-based algorithms? Yes. Absolutely. Downpour

Jenkins build is back to normal : Mahout-Examples-Cluster-Reuters #543

2014-02-19 Thread Apache Jenkins Server
See https://builds.apache.org/job/Mahout-Examples-Cluster-Reuters/543/changes

Jenkins build is back to normal : Mahout-Examples-Classify-20News #432

2014-02-19 Thread Apache Jenkins Server
See https://builds.apache.org/job/Mahout-Examples-Classify-20News/432/changes

Jenkins build is back to normal : Mahout-Quality #2484

2014-02-19 Thread Apache Jenkins Server
See https://builds.apache.org/job/Mahout-Quality/2484/changes

[jira] [Commented] (MAHOUT-1329) Mahout for hadoop 2

2014-02-19 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13906449#comment-13906449 ] Hudson commented on MAHOUT-1329: SUCCESS: Integrated in Mahout-Quality #2484 (See

[jira] [Resolved] (MAHOUT-1408) Distributed cache file matching bug while running SSVD in broadcast mode

2014-02-19 Thread Dmitriy Lyubimov (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitriy Lyubimov resolved MAHOUT-1408. -- Resolution: Won't Fix Don't see a reason to do anything. Distributed cache file

[jira] [Updated] (MAHOUT-1346) Spark Bindings (DRM)

2014-02-19 Thread Dmitriy Lyubimov (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitriy Lyubimov updated MAHOUT-1346: - Fix Version/s: (was: Backlog) 1.0 Spark Bindings (DRM)

Re: Mahout on Spark?

2014-02-19 Thread Nick Pentreath
MLlib may be less production tested than Mahout, that is true, but I would say Spark is heavily production tested and getting close to a true 1.0 release. Why do you favour Hadoop for sturdiness? Spark uses HDFS as an input source (or any Hadoop InputFormat), so it benefits from the same fault

[jira] [Commented] (MAHOUT-1346) Spark Bindings (DRM)

2014-02-19 Thread Dmitriy Lyubimov (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13906699#comment-13906699 ] Dmitriy Lyubimov commented on MAHOUT-1346: -- This is now tracked here

[jira] [Commented] (MAHOUT-1346) Spark Bindings (DRM)

2014-02-19 Thread Dmitriy Lyubimov (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13906710#comment-13906710 ] Dmitriy Lyubimov commented on MAHOUT-1346: -- a few obvious optimizer rules A.t

[jira] [Updated] (MAHOUT-1365) Weighted ALS-WR iterator for Spark

2014-02-19 Thread Dmitriy Lyubimov (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitriy Lyubimov updated MAHOUT-1365: - Fix Version/s: (was: Backlog) 1.0 Weighted ALS-WR iterator for

[jira] [Commented] (MAHOUT-1365) Weighted ALS-WR iterator for Spark

2014-02-19 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13906724#comment-13906724 ] Sean Owen commented on MAHOUT-1365: --- Dmitriy isn't this exactly what is already

[jira] [Comment Edited] (MAHOUT-1365) Weighted ALS-WR iterator for Spark

2014-02-19 Thread Dmitriy Lyubimov (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13906725#comment-13906725 ] Dmitriy Lyubimov edited comment on MAHOUT-1365 at 2/20/14 7:54 AM:

[jira] [Commented] (MAHOUT-1365) Weighted ALS-WR iterator for Spark

2014-02-19 Thread Dmitriy Lyubimov (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13906725#comment-13906725 ] Dmitriy Lyubimov commented on MAHOUT-1365: -- quite possibly could be. The only

[jira] [Commented] (MAHOUT-1365) Weighted ALS-WR iterator for Spark

2014-02-19 Thread Dmitriy Lyubimov (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13906729#comment-13906729 ] Dmitriy Lyubimov commented on MAHOUT-1365: -- Oh. and the implicit paper doesn't