[jira] [Commented] (MAHOUT-1356) Ensure unit tests fail fast when writing outside mvn target directory

2014-03-04 Thread Dawid Weiss (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13920623#comment-13920623 ] Dawid Weiss commented on MAHOUT-1356: - You may want to check out how it's configured

Re: Mahout 1.0 goals

2014-03-04 Thread Ted Dunning
On Tue, Mar 4, 2014 at 2:24 PM, Sebastian Schelter wrote: > - AFAIK its also a problem to ship it license-wise as the required > libraries would not be Apache licensed > > See this discussion from the Spark community for details: > > https://github.com/apache/incubator-spark/pull/575 > This is a

[jira] [Commented] (MAHOUT-1346) Spark Bindings (DRM)

2014-03-04 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13920319#comment-13920319 ] Sebastian Schelter commented on MAHOUT-1346: Dmitriy, I read through your wri

[jira] [Updated] (MAHOUT-1346) Spark Bindings (DRM)

2014-03-04 Thread Dmitriy Lyubimov (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitriy Lyubimov updated MAHOUT-1346: - Attachment: ScalaSparkBindings.pdf @Sebastian (and et al) could you please review if not

[jira] [Updated] (MAHOUT-1346) Spark Bindings (DRM)

2014-03-04 Thread Dmitriy Lyubimov (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitriy Lyubimov updated MAHOUT-1346: - Attachment: (was: ScalaSparkBindings.pdf) > Spark Bindings (DRM) > -

Re: Mahout 1.0 goals

2014-03-04 Thread Andrew Musselman
Whatever works On Tue, Mar 4, 2014 at 3:45 PM, Suneel Marthi wrote: > I believe there was an announcement that went out last month about Apache > SF embracing github. - > http://jaxenter.com/apache-ups-github-integration-potential-49460.html > > Guess this is more of an INFRA task than anything

Re: Mahout 1.0 goals

2014-03-04 Thread Suneel Marthi
I believe there was an announcement that went out last month about Apache SF embracing github. - http://jaxenter.com/apache-ups-github-integration-potential-49460.html Guess this is more of an INFRA task than anything we need to do (like the recent setting up of svnpubsub for future releases).

Build failed in Jenkins: mahout-nightly #1515

2014-03-04 Thread Apache Jenkins Server
See -- [...truncated 1990 lines...] [INFO] Copying jackson-mapper-asl-1.9.12.jar to [INFO]

Build failed in Jenkins: mahout-nightly » Mahout Release Package #1515

2014-03-04 Thread Apache Jenkins Server
See -- [INFO] [INFO]

Re: Mahout 1.0 goals

2014-03-04 Thread Andrew Musselman
One of my big wishlist items is to move Mahout to Github for workflow and community features. I remember there being discussion a while back but is there any way to move our Subversion repo to an Apache Git repo? On Thu, Feb 27, 2014 at 4:37 PM, Ted Dunning wrote: > I would like to start a con

Re: Mahout 1.0 goals

2014-03-04 Thread Sebastian Schelter
JBlas gave a roughly 5x-7x speedup for solving the dense linear systems in ALS when I integrated it into a prototype of Mahout's ALS for a research paper. Unfortunately, there are some caveats: - it requires certain Fortran libs to be installed on the machines of the cluster - its

Re: Mahout 1.0 goals

2014-03-04 Thread Suneel Marthi
There's JBlas, which is used by Spark, Deeplearning.org and other ML projects. IIRC, there was some prototyping done in the past using JBlas for Mahout; maybe Sebastian or Sean can better speak to that? It definitely has better performance than Mahout-Math. Managing the native Fortran dependencies

Re: Mahout 1.0 goals

2014-03-04 Thread Giorgio Zoppi
I would like to find some way of speeding up the matrix library, i.e. JNI+C++. 2014-03-04 22:53 GMT+01:00 Frank Scholten: > Yes, I like to work on standardizing the code around input formats. > > > On Mon, Mar 3, 2014 at 7:37 PM, Suneel Marthi >wrote: > > > To get things moving for 1.0: > > > > > > a)

Re: Mahout 1.0 goals

2014-03-04 Thread Frank Scholten
Yes, I'd like to work on standardizing the code around input formats. On Mon, Mar 3, 2014 at 7:37 PM, Suneel Marthi wrote: > To get things moving for 1.0: > > > a) Address the 4 issues that Sean had raised - we have already started > looking at Backlog and closing them, started looking at converti

[jira] [Commented] (MAHOUT-1356) Ensure unit tests fail fast when writing outside mvn target directory

2014-03-04 Thread Frank Scholten (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13920072#comment-13920072 ] Frank Scholten commented on MAHOUT-1356: I don't know which settings to add to th

[jira] [Commented] (MAHOUT-1346) Spark Bindings (DRM)

2014-03-04 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13919887#comment-13919887 ] Sebastian Schelter commented on MAHOUT-1346: Really looking forward to this.

Re: Mahout 1.0 goals

2014-03-04 Thread Sebastian Schelter
That is a nice start, could you evolve it to a short article that describes how to run this code for a real example? Best, Sebastian On 03/04/2014 05:05 PM, Yexi Jiang wrote: Sebastian, In one of my recent projects, I used the Naive Bayes for classification, so I gave a write-up on this algor

Re: Mahout 1.0 goals

2014-03-04 Thread Dmitriy Lyubimov
Yes. I am pretty close to making fairly big commits in the linalg department there (the distributed DSL expression optimizer and a scripted-out SSVD). We may also want to think about a Scala script engine to run third-party mahout-math scripts or interactive sessions. -d On Mon, Mar 3, 2014 at 10:02 AM

Re: Mahout 1.0 goals

2014-03-04 Thread Dmitriy Lyubimov
I will probably also put out a distributed QR (just for completeness), as it is currently solved for MR SSVD. But we know that the actual SSVD can avoid this (and it will in the new version), just like the in-core version does. There are still gaps in the optimizer (i.e. the optimizer has some holes for some al

[jira] [Updated] (MAHOUT-1346) Spark Bindings (DRM)

2014-03-04 Thread Dmitriy Lyubimov (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitriy Lyubimov updated MAHOUT-1346: - Attachment: ScalaSparkBindings.pdf update > Spark Bindings (DRM) >

[jira] [Updated] (MAHOUT-1346) Spark Bindings (DRM)

2014-03-04 Thread Dmitriy Lyubimov (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitriy Lyubimov updated MAHOUT-1346: - Attachment: (was: ScalaSparkBindings.pdf) > Spark Bindings (DRM) > -

Re: Mahout 1.0 goals

2014-03-04 Thread Yexi Jiang
Sebastian, in one of my recent projects I used Naive Bayes for classification, so I did a write-up on this algorithm. You can find the document at https://docs.google.com/document/d/1h7N0GmIKe-KG64uulPMPzkp00nowM2-HDQ48c4PIhbc/edit?usp=sharing . Feedback is welcome. 2014-03-04 3:57 GMT

[jira] [Commented] (MAHOUT-1252) Add support for Finite State Transducers (FST) as a DictionaryType.

2014-03-04 Thread Suneel Marthi (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13919361#comment-13919361 ] Suneel Marthi commented on MAHOUT-1252: --- Hi Drew, I have some start code on this t

[jira] [Commented] (MAHOUT-1252) Add support for Finite State Transducers (FST) as a DictionaryType.

2014-03-04 Thread Drew Farris (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13919336#comment-13919336 ] Drew Farris commented on MAHOUT-1252: - Hi Suneel, I have the start of some code that

[jira] [Commented] (MAHOUT-1431) Comparison of Mahout 0.8 vs mahout 0.9 in EMR

2014-03-04 Thread yannis ats (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13919314#comment-13919314 ] yannis ats commented on MAHOUT-1431: I am pretty sure that the reducer takes more tim

[jira] [Commented] (MAHOUT-1431) Comparison of Mahout 0.8 vs mahout 0.9 in EMR

2014-03-04 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13919308#comment-13919308 ] Sebastian Schelter commented on MAHOUT-1431: Could you see where the addition

[jira] [Commented] (MAHOUT-1431) Comparison of Mahout 0.8 vs mahout 0.9 in EMR

2014-03-04 Thread Suneel Marthi (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13919293#comment-13919293 ] Suneel Marthi commented on MAHOUT-1431: --- Comparing 0.7 to 0.8 is comparing apples-o

[jira] [Commented] (MAHOUT-1431) Comparison of Mahout 0.8 vs mahout 0.9 in EMR

2014-03-04 Thread yannis ats (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13919273#comment-13919273 ] yannis ats commented on MAHOUT-1431: The notion of time for every iteration is part o

[jira] [Commented] (MAHOUT-1431) Comparison of Mahout 0.8 vs mahout 0.9 in EMR

2014-03-04 Thread Suneel Marthi (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13919271#comment-13919271 ] Suneel Marthi commented on MAHOUT-1431: --- The other change in 0.9 that comes to mind

[jira] [Commented] (MAHOUT-1431) Comparison of Mahout 0.8 vs mahout 0.9 in EMR

2014-03-04 Thread Suneel Marthi (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13919267#comment-13919267 ] Suneel Marthi commented on MAHOUT-1431: --- Could you provide code snapshots of where you

[jira] [Commented] (MAHOUT-1431) Comparison of Mahout 0.8 vs mahout 0.9 in EMR

2014-03-04 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13919260#comment-13919260 ] Sebastian Schelter commented on MAHOUT-1431: That is really strange, I don't

[jira] [Updated] (MAHOUT-1431) Comparison of Mahout 0.8 vs mahout 0.9 in EMR

2014-03-04 Thread yannis ats (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yannis ats updated MAHOUT-1431: --- Description: Hi all, I tested Mahout 0.8 and 0.9 in EMR with a large dataset as input and I

Re: kMeans Implementation

2014-03-04 Thread Gmail
I used the KMeansDriver class in the clustering.kmeans package. Yes, I know that the use of MapReduce is mandatory, but I think an easier and more MapReduce-oriented implementation exists. Anyway, I thought it was a choice driven by performance. Thank you. On 03/04/2014 11:48 AM, Sea

[jira] [Created] (MAHOUT-1431) Comparison of Mahout 0.8 vs mahout 0.9 in EMR

2014-03-04 Thread yannis ats (JIRA)
yannis ats created MAHOUT-1431: -- Summary: Comparison of Mahout 0.8 vs mahout 0.9 in EMR Key: MAHOUT-1431 URL: https://issues.apache.org/jira/browse/MAHOUT-1431 Project: Mahout Issue Type: Questi

Re: kMeans Implementation

2014-03-04 Thread Suneel Marthi
He's talking about simple kmeans, which is a mapper-only job. Sean's already addressed his question. > On Mar 4, 2014, at 5:49 AM, Sebastian Schelter wrote: > > We have several implementations of k-Means, which one do you refer to? > > --sebastian > >> On 03/04/2014 11:43 A

Re: kMeans Implementation

2014-03-04 Thread Sam Bessalah
I don't see why that is a problem. On Tue, Mar 4, 2014 at 11:43 AM, Gmail wrote: > Hello, > I was studying Mahout libraries and I found something of strange in your > kMeans implementation. > > I was looking inside it and I have noticed that kMeans only uses map > functions, omitting the reduce

Re: kMeans Implementation

2014-03-04 Thread Sean Owen
Although I don't know exactly what you're referring to, in general, nothing about Map/Reduce means you always use a reducer. There are plenty of tasks that are much more appropriate as a map-only or reduce-only job. So this assertion doesn't fly to start with. But if you see two jobs that might be
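To illustrate Sean's point that a MapReduce job does not have to include a reducer, here is a minimal, hypothetical sketch in plain Java (no Hadoop dependencies) of the point-to-cluster assignment step of k-means written as a map-only pass. The names MapOnlyAssignment, map and mapOnlyJob are invented for illustration; this is not Mahout's KMeansDriver code. It assumes the centroids are already known, so each point can be labeled on its own without any grouping or aggregation.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not Mahout code): k-means point assignment as a
 * map-only pass. Each input record is processed independently, so there
 * is no shuffle and no reduce phase.
 */
final class MapOnlyAssignment {

  /** Squared Euclidean distance between two equal-length vectors. */
  static double squaredDistance(double[] a, double[] b) {
    double sum = 0.0;
    for (int i = 0; i < a.length; i++) {
      double d = a[i] - b[i];
      sum += d * d;
    }
    return sum;
  }

  /** The "map" function: one point in, the index of its nearest centroid out. */
  static int map(double[] point, double[][] centroids) {
    int best = -1;
    double bestDist = Double.POSITIVE_INFINITY;
    for (int c = 0; c < centroids.length; c++) {
      double d = squaredDistance(point, centroids[c]);
      if (d < bestDist) {
        bestDist = d;
        best = c;
      }
    }
    return best;
  }

  /** Applies map() to every point; outputs are emitted as-is, never combined. */
  static List<Integer> mapOnlyJob(List<double[]> points, double[][] centroids) {
    List<Integer> out = new ArrayList<>();
    for (double[] p : points) {
      out.add(map(p, centroids));
    }
    return out;
  }
}
```

Because each output depends only on its own input record and the shared centroids, there is nothing to group. Recomputing the centroids, by contrast, aggregates many points per cluster, which is the kind of step that would typically call for a reduce (or combine) phase.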

Re: kMeans Implementation

2014-03-04 Thread Sebastian Schelter
We have several implementations of k-Means, which one do you refer to? --sebastian On 03/04/2014 11:43 AM, Gmail wrote: Hello, I was studying Mahout libraries and I found something of strange in your kMeans implementation. I was looking inside it and I have noticed that kMeans only uses map fu

kMeans Implementation

2014-03-04 Thread Gmail
Hello, I was studying the Mahout libraries and I found something strange in your kMeans implementation. Looking inside it, I noticed that kMeans only uses map functions, omitting the reducers. Why did you make this choice? It is not using the MapReduce programming model even if it is

Re: Mahout 1.0 goals

2014-03-04 Thread Sebastian Schelter
Yexi, could you do a small write-up analogous to what I proposed for Giorgio? Make sure to pick a different algorithm, though. --sebastian On 03.03.2014 at 16:54, "Yexi Jiang" wrote: > I'm also happy to help. > > > 2014-03-03 10:29 GMT-05:00 Giorgio Zoppi : > > > I would like to help in the api

Re: Mahout 1.0 goals

2014-03-04 Thread Sebastian Schelter
Hi Giorgio, a good first step would be to explore the current API. Could you create a write-up for our wiki on how to use a clustering/classification algorithm of your choice? A small example should be sufficient. This could be used as a basis for discussing API changes. On 03.03.2014 at 16:29, "G