[jira] [Updated] (MAHOUT-1329) Mahout for hadoop 2

2014-02-27 Thread Sergey Svinarchuk (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Svinarchuk updated MAHOUT-1329: Attachment: 1329-3-additional.diff

[jira] [Updated] (MAHOUT-1329) Mahout for hadoop 2

2014-02-27 Thread Sergey Svinarchuk (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Svinarchuk updated MAHOUT-1329: Attachment: (was: 1329-3-additional.diff)

[jira] [Updated] (MAHOUT-1329) Mahout for hadoop 2

2014-02-27 Thread Sergey Svinarchuk (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Svinarchuk updated MAHOUT-1329: Attachment: 1329-3-additional.patch

[jira] [Commented] (MAHOUT-1329) Mahout for hadoop 2

2014-02-27 Thread Sergey Svinarchuk (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13914484#comment-13914484 ] Sergey Svinarchuk commented on MAHOUT-1329: I think that it will be better to add

[jira] [Comment Edited] (MAHOUT-1329) Mahout for hadoop 2

2014-02-27 Thread Sergey Svinarchuk (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13914484#comment-13914484 ] Sergey Svinarchuk edited comment on MAHOUT-1329 at 2/27/14 1:21 PM:

[jira] [Commented] (MAHOUT-1329) Mahout for hadoop 2

2014-02-27 Thread Gokhan Capan (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13914494#comment-13914494 ] Gokhan Capan commented on MAHOUT-1329: Sure I can. Although my vote would be

[jira] [Commented] (MAHOUT-1329) Mahout for hadoop 2

2014-02-27 Thread Sergey Svinarchuk (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13914508#comment-13914508 ] Sergey Svinarchuk commented on MAHOUT-1329: You can use {noformat} mvn clean
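The command above is cut off in the digest. As a sketch only, a Hadoop 2 build of Mahout at the time of this ticket would look roughly like the line below; the hadoop2.version property name is an assumption inferred from this ticket's patches, not a confirmed build flag.

{noformat}
# Hypothetical: build and install Mahout against a Hadoop 2 release
mvn clean install -DskipTests -Dhadoop2.version=2.2.0
{noformat}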

[jira] [Comment Edited] (MAHOUT-1426) GSOC 2013 Neural network algorithms

2014-02-27 Thread Maciej Mazur (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13913488#comment-13913488 ] Maciej Mazur edited comment on MAHOUT-1426 at 2/27/14 2:41 PM:

Re: [jira] [Comment Edited] (MAHOUT-1426) GSOC 2013 Neural network algorithms

2014-02-27 Thread peng
That should be easy. But that defeats the purpose of using Mahout, as there are already enough implementations of single-node backpropagation (in which case a GPU is much faster). Yexi: Regarding Downpour SGD and Sandblaster, may I suggest that the implementation would be better without a parameter server?

Re: [jira] [Comment Edited] (MAHOUT-1426) GSOC 2013 Neural network algorithms

2014-02-27 Thread Yexi Jiang
Peng, Can you provide more details about your thoughts? Regards, 2014-02-27 16:00 GMT-05:00 peng pc...@uowmail.edu.au: That should be easy. But that defeats the purpose of using Mahout, as there are already enough implementations of single-node backpropagation (in which case a GPU is much

Re: [jira] [Comment Edited] (MAHOUT-1426) GSOC 2013 Neural network algorithms

2014-02-27 Thread peng
With pleasure! The original Downpour paper proposes a parameter server from which subnodes download shards of the old model and upload gradients. So if the parameter server is down, the process has to be delayed; it also requires that all model parameters be stored and atomically updated on (and
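To make the interaction peng describes concrete, here is a self-contained Java toy of the Downpour-style protocol: workers download a shard of the current model, compute a local gradient, and push it back, with each shard updated atomically on the server. All class and method names are invented for illustration; this is not Mahout or DistBelief code.

    import java.util.concurrent.ConcurrentHashMap;

    /** Toy parameter server: holds model shards; applies pushed gradients atomically per shard. */
    class ParameterServer {
        private final ConcurrentHashMap<Integer, double[]> shards =
            new ConcurrentHashMap<Integer, double[]>();

        ParameterServer(int numShards, int shardSize) {
            for (int i = 0; i < numShards; i++) {
                shards.put(i, new double[shardSize]);
            }
        }

        /** A worker fetches a (possibly stale) copy of one shard. */
        double[] download(int shardId) {
            return shards.get(shardId).clone();
        }

        /** A worker pushes a gradient; the shard is updated atomically. */
        void push(int shardId, double[] gradient, double learningRate) {
            double[] shard = shards.get(shardId);
            synchronized (shard) {
                for (int j = 0; j < shard.length; j++) {
                    shard[j] -= learningRate * gradient[j];
                }
            }
        }
    }

If this single server goes down, every worker stalls on download() and push(), which is exactly the coupling peng is arguing against.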

Re: [jira] [Comment Edited] (MAHOUT-1426) GSOC 2013 Neural network algorithms

2014-02-27 Thread peng
Hi Yexi, I was reading your code and found the MLP class is abstract-ish (both train functions throw exceptions). Is there a thread or ticket for a shippable implementation? Yours Peng On Thu 27 Feb 2014 06:56:51 PM EST, peng wrote: With pleasure! The original Downpour paper proposes a

Re: [jira] [Comment Edited] (MAHOUT-1426) GSOC 2013 Neural network algorithms

2014-02-27 Thread Ted Dunning
Generally for training models like this, there is an assumption that fault tolerance is not particularly necessary, because the low risk of failure trades against algorithmic speed. For a reasonably small chance of failure, simply re-running the training is just fine. If there is high risk of
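A back-of-the-envelope version of Ted's trade-off, under the assumption that a failed job of length T is simply re-run from scratch and each run fails independently with probability p: the expected number of runs is 1/(1 - p), so the expected total time is T/(1 - p). For p = 0.05 that is only about 5% overhead, which is cheap if checkpointing or a fault-tolerant parameter server would slow every run down by more than that.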

Mahout 1.0 goals

2014-02-27 Thread Ted Dunning
I would like to start a conversation about where we want Mahout to be for 1.0. Let's suspend for the moment the question of how to achieve the goals. Instead, let's converge on what we really would like to have happen and after that, let's talk about means that will get us there. Here are some

Re: [jira] [Comment Edited] (MAHOUT-1426) GSOC 2013 Neural network algorithms

2014-02-27 Thread Yexi Jiang
Hi, Peng, Do you mean the MultilayerPerceptron? There are three 'train' methods, and only one (the one without the trackingKey and groupKey parameters) is implemented. In the current implementation, they are not used. Regards, Yexi 2014-02-27 19:31 GMT-05:00 Ted Dunning ted.dunn...@gmail.com:
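For readers who have not opened the class, here is a self-contained Java toy (no Mahout imports) of the kind of single-process online update the one implemented train method performs: a forward pass through one hidden layer, backpropagation of the error, and an in-place SGD weight update. The layer sizes, names, and squared-error loss are illustrative assumptions, not the actual MultilayerPerceptron internals.

    import java.util.Random;

    /** Toy one-hidden-layer MLP trained online with SGD; sigmoid activations, squared-error loss. */
    class TinyMlp {
        final double[][] w1;   // input -> hidden weights
        final double[] w2;     // hidden -> output weights
        final double lr = 0.1; // learning rate

        TinyMlp(int in, int hidden) {
            Random r = new Random(42);
            w1 = new double[hidden][in];
            w2 = new double[hidden];
            for (double[] row : w1) {
                for (int j = 0; j < in; j++) row[j] = 0.1 * r.nextGaussian();
            }
            for (int i = 0; i < hidden; i++) w2[i] = 0.1 * r.nextGaussian();
        }

        static double sigmoid(double x) { return 1.0 / (1.0 + Math.exp(-x)); }

        /** One online step on a single example; updates the weights in place. */
        void trainOnline(double[] x, double target) {
            int numHidden = w2.length;
            double[] h = new double[numHidden];
            double out = 0.0;
            for (int i = 0; i < numHidden; i++) {               // forward pass
                double s = 0.0;
                for (int j = 0; j < x.length; j++) s += w1[i][j] * x[j];
                h[i] = sigmoid(s);
                out += w2[i] * h[i];
            }
            out = sigmoid(out);
            double delta = (out - target) * out * (1 - out);    // d(loss)/d(pre-activation)
            for (int i = 0; i < numHidden; i++) {               // backprop and SGD update
                double deltaH = delta * w2[i] * h[i] * (1 - h[i]);
                w2[i] -= lr * delta * h[i];
                for (int j = 0; j < x.length; j++) w1[i][j] -= lr * deltaH * x[j];
            }
        }
    }

A call like new TinyMlp(2, 4).trainOnline(new double[] {1, 0}, 1.0) performs exactly one such update; the distributed designs discussed above differ only in where the weights live.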

Re: Mahout 1.0 goals

2014-02-27 Thread Sean Owen
This sounds good, but it sounds like a whole different project or projects. For example, where does R come in, what are the non-MR implementations, and what is the no-Hadoop implementation? On Feb 28, 2014 12:38 AM, Ted Dunning ted.dunn...@gmail.com wrote: I would like to start a conversation about

Re: Mahout 1.0 goals

2014-02-27 Thread Ted Dunning
Well, Mahout has had (kinda sorta awful) classifiers and clustering from day one. It isn't as if the only goal is recommendations. The non-MR, non-Hadoop comments are really more user-centric requirements than implementations. It is important that users be able to start without a cluster and

Re: Mahout 1.0 goals

2014-02-27 Thread Sean Owen
Yes. I wasn't questioning the part about algorithms. I think several of these other points are each probably, on their own, several times the amount of work that has been put into this project over the past year, so I'm wondering whether this is close to realistic as a to-do list for 1.0 of this project.

Re: Mahout 1.0 goals

2014-02-27 Thread Ted Dunning
On Thu, Feb 27, 2014 at 5:25 PM, Sean Owen sro...@gmail.com wrote: And whether the goal here should look more like polish up and maintain. That sounds like defeatism to me. I think that new things are quite possible here.

Re: Mahout 1.0 goals

2014-02-27 Thread Ted Dunning
On Thu, Feb 27, 2014 at 5:25 PM, Sean Owen sro...@gmail.com wrote: I think several of these other points are each probably, on their own, several times the amount of work that has been put into this project over the past year, so I'm wondering whether this is close to realistic as a to-do list for

Re: Mahout 1.0 goals

2014-02-27 Thread Dmitriy Lyubimov
If we approach this from a purely marketing standpoint, I would look at it from two angles: why Mahout is used, and why it is not used. Mahout is not used because it is a collection of methods that are fairly non-uniform in their API, especially the embedded API, and generally has zero encouragement to

Re: Mahout 1.0 goals

2014-02-27 Thread Dmitriy Lyubimov
(5) Another thing I would suggest is to look at feature-prep standardization -- outlier detection, scaling, hash-tricking, etc. Again, with the ability to customize, or it would be useless. On Thu, Feb 27, 2014 at 6:08 PM, Dmitriy Lyubimov dlie...@gmail.com wrote: If we approach this from
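As a concrete example of the hash-tricking Dmitriy lists, a minimal self-contained Java sketch that maps arbitrary string features into a fixed-width vector with no dictionary. This is illustrative only, not the Mahout API; Mahout already has real encoders for this under org.apache.mahout.vectorizer.encoders.

    /** Minimal hashed ("hash trick") vectorizer: no dictionary, fixed output width. */
    class HashedVectorizer {
        private final int dim;

        HashedVectorizer(int dim) { this.dim = dim; }

        double[] encode(String[] features) {
            double[] v = new double[dim];
            for (String f : features) {
                int h = f.hashCode();
                int index = ((h % dim) + dim) % dim;   // bucket by hash, non-negative index
                int sign = (h >= 0) ? 1 : -1;          // signed hashing reduces collision bias
                v[index] += sign;
            }
            return v;
        }
    }

new HashedVectorizer(1024).encode(new String[] {"user:123", "item:42"}) yields a 1024-wide vector regardless of vocabulary size, which is what makes the trick attractive for standardized feature prep.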

Re: Mahout 1.0 goals

2014-02-27 Thread Suneel Marthi
With the announcement of http://deeplearning4j.org yesterday, which provides various neural network implementations on Hadoop 2/JBlas that had been talked about in one of the other discussion threads on this mailing list: do we want to duplicate a similar effort in Mahout? In addition to what

Re: Mahout 1.0 goals

2014-02-27 Thread Andrew Musselman
Thanks for starting the conversation, Ted. I'm relatively new to the project, though I've been using Mahout for a couple of years in production, and I'm happy to see things move forward in whatever way makes sense. I think Mahout needs to ship a production-ready version if it's going to be called

Re: Mahout 1.0 goals

2014-02-27 Thread Andrew Musselman
I agree with b) and c); haven't used seq2sparse enough to grok a). On Thu, Feb 27, 2014 at 6:30 PM, Suneel Marthi suneel_mar...@yahoo.com wrote: With the announcement of http://deeplearning4j.org yesterday, which provides various neural network implementations on Hadoop 2/JBlas that had been

Re: Mahout 1.0 goals

2014-02-27 Thread Ted Dunning
Yes. This is a big and important addition. On Thu, Feb 27, 2014 at 6:19 PM, Dmitriy Lyubimov dlie...@gmail.com wrote: (5) Another thing I would suggest is to look at feature-prep standardization -- outlier detection, scaling, hash-tricking, etc. Again, with the ability to customize, or it

Re: [jira] [Comment Edited] (MAHOUT-1426) GSOC 2013 Neural network algorithms

2014-02-27 Thread peng
Oh, thanks a lot, I missed that one :) +1 on implementing the easiest one first. I hadn't thought about the difficulty issue; I need to read more about the YARN extension. Yours Peng On Thu 27 Feb 2014 08:06:27 PM EST, Yexi Jiang wrote: Hi, Peng, Do you mean the MultilayerPerceptron? There are three

Re: Mahout 1.0 goals

2014-02-27 Thread Dmitriy Lyubimov
On Thu, Feb 27, 2014 at 7:01 PM, Andrew Musselman andrew.mussel...@gmail.com wrote: And I'm not sure if this is what Dmitriy meant in his comment (3), but I'd love to be able to do Mathematica-style work in an interactive shell and/or symbolic system where I could do A*B' and it just
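For comparison, what A*B' looks like against today's mahout-math classes -- a minimal sketch of the verbosity such a shell would hide, not a proposal for the shell itself:

    import org.apache.mahout.math.DenseMatrix;
    import org.apache.mahout.math.Matrix;

    public class MatrixDemo {
        public static void main(String[] args) {
            Matrix a = new DenseMatrix(new double[][] {{1, 2}, {3, 4}});
            Matrix b = new DenseMatrix(new double[][] {{5, 6}, {7, 8}});
            Matrix c = a.times(b.transpose());   // the A*B' the shell would express directly
            System.out.println(c);
        }
    }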