Re: spark-itemsimilarity runs orders of times slower from Mahout 0.11 onwards

2016-04-28 Thread Nikaash Puri
Hi, OK, so interestingly enough, when I repartition my input data across indicators on the user IDs, I get a significant speedup. This is probably because the shuffle shrinks: rows with the same user IDs in the different indicator RDDs are more likely to be located on the same nodes. What’s even more interesting is the behaviour
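A minimal plain-Python sketch (not Spark code) of why partitioning every indicator dataset by user ID can shrink the shuffle. If each dataset uses the same deterministic hash and the same partition count, all of a user's rows land on the same partition index, so a per-user join has nothing to move. In Spark this corresponds roughly to giving the pair RDDs the same partitioner. The datasets, the use of crc32 as the hash, and the partition count are illustrative assumptions.

```python
import zlib
from collections import defaultdict

def partition_for(user_id, num_partitions):
    # Stable hash: every dataset sends a given user ID to the same partition index
    return zlib.crc32(user_id.encode("utf-8")) % num_partitions

def partition_by_user(rows, num_partitions):
    """Group (user_id, item_id) rows by the hash of their user ID."""
    parts = defaultdict(list)
    for user_id, item_id in rows:
        parts[partition_for(user_id, num_partitions)].append((user_id, item_id))
    return parts

def partition_round_robin(rows, num_partitions):
    """Spread rows by position only, ignoring the user ID (no co-location)."""
    parts = defaultdict(list)
    for i, row in enumerate(rows):
        parts[i % num_partitions].append(row)
    return parts

def rows_to_shuffle(left_parts, right_parts, num_partitions):
    """Count right-hand rows whose user does not appear in the same-numbered
    left partition, i.e. rows a per-user join would have to move."""
    moves = 0
    for p in range(num_partitions):
        left_users = {u for u, _ in left_parts.get(p, [])}
        moves += sum(1 for u, _ in right_parts.get(p, []) if u not in left_users)
    return moves

# Hypothetical indicator data: (user_id, item_id) pairs
purchases = [("u1", "i1"), ("u2", "i2"), ("u3", "i1"), ("u4", "i3")]
views = [("u1", "i2"), ("u2", "i3"), ("u3", "i3"), ("u4", "i1")]
n = 3

co_located = rows_to_shuffle(partition_by_user(purchases, n), partition_by_user(views, n), n)
scattered = rows_to_shuffle(partition_by_user(purchases, n), partition_round_robin(views, n), n)
print(co_located, scattered)  # co_located is 0; scattered depends on the hash values
```

Because both datasets use the same `partition_for`, the co-located count is zero whenever every user appears in both; the round-robin layout generally forces some rows to move.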

Re: About reuters-fkmeans-centroids

2016-04-28 Thread Khurrum Nasim
@Prakash: albeit I’m a Mahout noob, if you can represent your problem as a network with 2D input, then yes, Mahout can be used (so I’ve heard). IMO, every machine-based computation problem can be represented as a graph, although this may not always be optimal. Taking this notion of fuzzy

Deprecated MR algorithms

2016-04-28 Thread Andrew Palumbo
To avoid any confusion, unless there are any objections, I'll mark all of the MapReduce algorithms on the algorithms page as deprecated.

Re: Mahout contributions

2016-04-28 Thread Saikat Kanjilal
Andrew/Khurrum, to be clear, this project involves building some algorithms that, according to the wiki, are not yet implemented in Spark (namely the clustering algorithms) and then integrating them into Elasticsearch and Kibana through a REST API. Mahout will remain as is; I will look at

Re: Mahout contributions

2016-04-28 Thread Dmitriy Lyubimov
There might be a concept of a "contrib" subproject with a totally separate code tree; some ASF projects do that. That way it is easy to keep it around if it turns out to be useful, and easy to strip off if it becomes unsupported (sorry for the pragmatic cynicism). On Thu, Apr 28, 2016 at 2:48 PM, Khurrum

Re: About reuters-fkmeans-centroids

2016-04-28 Thread Dmitriy Lyubimov
Prakash, (1) to be clear, the ASF trademark and branding policy is not to endorse the views of third-party publications, and to ask third-party writers to disclose that their views are not endorsed by the ASF project. To that end, an ASF project can't really tell you that some publication is

Re: About reuters-fkmeans-centroids

2016-04-28 Thread Prakash Poudyal
Dear Dmitriy, I really appreciate you taking the time to write such a long reply to clarify my confusion. Much appreciated. Thank you so much :) Regards Prakash Poudyal On Thu, Apr 28, 2016 at 10:13 PM, Dmitriy Lyubimov wrote: > Prakash, > > (1) to be clear, the ASF trademark and branding policy

Re: Mahout contributions

2016-04-28 Thread Andrew Palumbo
I don't think that this sort of integration work would be a good fit directly for the Mahout project. Mahout is more about math, algorithms, and an environment for developing algorithms. We stay away from direct platform integration. In the past we did have some Elasticsearch/Mahout

Re: Mahout contributions

2016-04-28 Thread Khurrum Nasim
I agree with Andrew. Mahout should remain self-contained. Prakash, you may want to create your own project on GitHub using the Mahout library. > On Apr 28, 2016, at 5:43 PM, Andrew Palumbo wrote: > > I don't think that this sort of of integration work would be a

RE: Mahout contributions

2016-04-28 Thread Saikat Kanjilal
This is great information, thank you. Based on this recommendation I won't create a JIRA yet; I'll start work on my project, and when the code approaches the percentages you are describing I will create the appropriate JIRAs and put together a proposal to send to the list. Sound OK? Based on your

Re: About reuters-fkmeans-centroids

2016-04-28 Thread Prakash Poudyal
Dear Suneel, Thank you so much for your reply; I had been waiting for a long time. Actually, I need to use fuzzy clustering to cluster sentences in my research. I found the fuzzy k-means clustering algorithm in Apache Mahout, so I am trying to use it for my purpose. Regarding your reply about the "first thing"
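Clustering sentences, fuzzy or not, first requires turning each sentence into a numeric vector. Below is a minimal plain-Python TF-IDF sketch of that step, assuming a naive lowercase tokenizer (no stemming, so "cat" and "cats" stay distinct) and the smoothed IDF formula log((1 + N) / (1 + df)) + 1. The function names are hypothetical and this is not Mahout's vectorizer.

```python
import math
import re
from collections import Counter

def tokenize(sentence):
    # Naive tokenizer (an assumption): lowercase runs of ASCII letters only
    return re.findall(r"[a-z]+", sentence.lower())

def tfidf_vectors(sentences):
    """Turn sentences into L2-normalised TF-IDF vectors over a shared vocabulary."""
    docs = [Counter(tokenize(s)) for s in sentences]
    vocab = sorted({term for doc in docs for term in doc})
    n_docs = len(docs)
    # Document frequency: how many sentences contain each term
    df = {t: sum(1 for doc in docs if t in doc) for t in vocab}
    # Smoothed IDF keeps every weight positive, even for terms in every sentence
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in vocab}
    vectors = []
    for doc in docs:
        vec = [doc[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0  # guard empty sentences
        vectors.append([x / norm for x in vec])
    return vocab, vectors

sentences = ["The cat sat on the mat.", "A dog sat on the log.", "Cats and dogs play."]
vocab, vectors = tfidf_vectors(sentences)
```

Normalising each vector to unit length makes Euclidean distance between sentences behave like cosine distance, which is a common choice for short texts.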

Re: About reuters-fkmeans-centroids

2016-04-28 Thread Ted Dunning
On Thu, Apr 28, 2016 at 10:54 AM, Prakash Poudyal wrote: > Actually, I need to use fuzzy clustering to cluster the sentence in my > research. I found fuzzy k clustering algorithm in Apache Mahout, thus, I > am trying to use it for my purpose. > That's great. But that

Re: Mahout contributions

2016-04-28 Thread Khurrum Nasim
@Saikat: why use EL instead of Lucene directly? > On Apr 28, 2016, at 12:08 PM, Saikat Kanjilal wrote: > > This is great information thank you, based on this recommendation I won't > create a JIRA but start work on my project and when the code approaches the >

Re: About reuters-fkmeans-centroids

2016-04-28 Thread Suneel Marthi
First thing: most of this code is legacy MapReduce and is not supported anymore. Hence you are not seeing answers. Back to your question: -c specifies the folder for the initial centroids, which are randomly generated. IIRC, the centroids are generated when you execute the Clustering Driver. On Wed, Apr
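For readers weighing alternatives to the unsupported MapReduce code, here is a minimal plain-Python sketch of fuzzy k-means (also known as fuzzy c-means). It is not Mahout's implementation; it only mirrors the idea described above of starting from randomly chosen initial centroids. Each point's membership in cluster j is 1 / sum over k of (d_j / d_k)^(2 / (m - 1)), and each centroid is the membership^m weighted mean of the points. The fuzziness exponent `m`, the fixed iteration count, and the seed are illustrative assumptions.

```python
import math
import random

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def memberships(point, centroids, m):
    """Fuzzy membership of one point in each cluster (values sum to 1)."""
    dists = [euclidean(point, c) for c in centroids]
    # A point sitting exactly on a centroid belongs fully to that cluster
    for j, d in enumerate(dists):
        if d == 0.0:
            return [1.0 if k == j else 0.0 for k in range(len(centroids))]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((dj / dk) ** exponent for dk in dists) for dj in dists]

def fuzzy_kmeans(points, k, m=2.0, iterations=20, seed=42):
    """Fuzzy k-means with k initial centroids sampled at random from the points.
    Returns (centroids, membership matrix)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        u = [memberships(p, centroids, m) for p in points]
        new_centroids = []
        for j in range(k):
            # Each point pulls centroid j with weight membership^m
            weights = [row[j] ** m for row in u]
            total = sum(weights)
            new_centroids.append([
                sum(w * p[d] for w, p in zip(weights, points)) / total
                for d in range(len(points[0]))
            ])
        centroids = new_centroids
    return centroids, [memberships(p, centroids, m) for p in points]

points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, u = fuzzy_kmeans(points, k=2)
```

Unlike hard k-means, every point keeps a degree of membership in every cluster, which is what makes the fuzzy variant attractive for sentences that mix topics.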

Re: Mahout contributions

2016-04-28 Thread Saikat Kanjilal
Because EL (Elasticsearch) gives you visualization and non-Lucene query constructs as well, and it already has a REST API that I plan on tying into Mahout. I plan on wrapping some of the clustering algorithms that I implement with Mahout and Spark as a service, which can then make calls into
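One way such a service could hand clustering results to Elasticsearch is its `_bulk` endpoint, which takes newline-delimited JSON: an action line followed by a document line, with the whole body ending in a newline. The sketch below only builds that body and measures its size in bytes (relevant to the payload-size question in this thread); it sends nothing. The index name, document IDs, and field names are hypothetical.

```python
import json

def bulk_body(index_name, assignments):
    """Build an Elasticsearch _bulk request body (newline-delimited JSON).
    `assignments` maps a document ID to a dict of fields, e.g. cluster memberships."""
    lines = []
    for doc_id, fields in assignments.items():
        lines.append(json.dumps({"index": {"_index": index_name, "_id": doc_id}}))
        lines.append(json.dumps(fields))
    # The bulk API requires the body to end with a newline
    return "\n".join(lines) + "\n"

# Hypothetical clustering output for two social-media posts
assignments = {
    "tweet-1": {"text": "new phone day", "cluster": 0, "membership": [0.9, 0.1]},
    "tweet-2": {"text": "match tonight", "cluster": 1, "membership": [0.2, 0.8]},
}
body = bulk_body("social-clusters", assignments)
print(len(body.encode("utf-8")), "bytes")  # rough payload size
```

Batching many documents per bulk request keeps the number of HTTP round trips low; the byte count is a quick way to keep each batch within a chosen size limit.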

Re: Mahout contributions

2016-04-28 Thread Khurrum Nasim
What JSON payload size are we talking about here? > On Apr 28, 2016, at 1:32 PM, Saikat Kanjilal wrote: > > Because EL gives you the visualization and non Lucene type query constructs > as well and also that it already has a rest API that I plan on tying into >

Re: Mahout contributions

2016-04-28 Thread Saikat Kanjilal
I want to start with social data, for example data returned from the FB Graph API as well as user Twitter data. I will send some samples later if you're interested. > On Apr 28, 2016, at 10:41 AM, Khurrum Nasim wrote: > > > What type of

Re: Deprecated MR algorithms

2016-04-28 Thread Andrew Palumbo
The grid should be upgraded; it may take some time to publish. From: Andrew Palumbo Sent: Thursday, April 28, 2016 5:24 PM To: mahout Subject: Deprecated MR algorithms To avoid any confusion, unless there are any objections, I'll mark

Re: Mahout contributions

2016-04-28 Thread Andrew Palumbo
One last thing, Saikat, in answer to your question below. To clarify, for proposed smaller-scale Mahout contributions (not on the roadmap or in currently open JIRAs), a good workflow would be as follows: 1. Investigate your idea independently. 2. Float the proposal to dev@. 3. Allow some time

smile ML library

2016-04-28 Thread Andrew Palumbo
Has anyone had any luck getting this project to build? Or integrating its artifacts with anything else? https://github.com/haifengl/smile

Re: About reuters-fkmeans-centroids

2016-04-28 Thread Suneel Marthi
Yes, the entire MapReduce code (which includes the fuzzy clustering that you are looking at) is not supported anymore as of Mahout 0.10.0 (I suggest reading the release notes on mahout.apache.org). On Thu, Apr 28, 2016 at 2:05 PM, Prakash Poudyal wrote: > Hi! Ted, > > You

[jira] [Commented] (MAHOUT-1837) Sparse/Dense Matrix analysis for Matrix Multiplication

2016-04-28 Thread Suneel Marthi (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15262703#comment-15262703 ] Suneel Marthi commented on MAHOUT-1837: --- Wouldn't it be easier to separate the 2 out into
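The issue title is about analysing whether operands are sparse or dense before multiplying. A minimal plain-Python sketch of that idea, assuming a simple density cut-off: measure the fraction of non-zeros in the left operand and pick a sparse kernel (visit only non-zero entries) or a dense triple loop. The threshold value and the function names are hypothetical and are not Mahout's API.

```python
def density(matrix):
    """Fraction of non-zero entries in a row-major list-of-lists matrix."""
    rows = len(matrix)
    cols = len(matrix[0]) if rows else 0
    if rows * cols == 0:
        return 0.0
    nnz = sum(1 for row in matrix for x in row if x != 0)
    return nnz / (rows * cols)

def to_sparse(matrix):
    """Row-wise sparse form: one {column: value} dict per row."""
    return [{j: x for j, x in enumerate(row) if x != 0} for row in matrix]

def multiply(a, b, dense_threshold=0.25):
    """Multiply a (n x k) by b (k x m), using a sparse kernel for `a` when it is
    mostly zeros. `dense_threshold` is a made-up cut-off, not a tuned value."""
    k, m = len(b), len(b[0])
    if density(a) < dense_threshold:
        # Sparse path: only visit the non-zero entries of each row of a
        result = []
        for row in to_sparse(a):
            out = [0] * m
            for j, aval in row.items():
                for c in range(m):
                    out[c] += aval * b[j][c]
            result.append(out)
        return result
    # Dense path: plain triple loop over every entry
    return [[sum(row[j] * b[j][c] for j in range(k)) for c in range(m)] for row in a]

a = [[1, 0, 0, 0], [0, 0, 0, 2]]
b = [[1, 2], [3, 4], [5, 6], [7, 8]]
```

Both paths must return the same product; the density check only decides which one is cheaper to run.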

Re: About reuters-fkmeans-centroids

2016-04-28 Thread Prakash Poudyal
Hi Ted, do you mean Mahout no longer supports "fuzzy K clustering for the sentences"? Can you clarify in more detail? :( Prakash On Thu, Apr 28, 2016 at 6:58 PM, Ted Dunning wrote: > On Thu, Apr 28, 2016 at 10:54 AM, Prakash Poudyal < > prakashpoud...@gmail.com> >

Re: About reuters-fkmeans-centroids

2016-04-28 Thread Dmitriy Lyubimov
Prakash, if you are using any Mahout MapReduce algorithm for research, please make sure to include this disclosure: all Mahout MapReduce algorithms have been officially unsupported and deprecated since February 2014 (IIRC). I can dig up the specific issue regarding this. There has also been an

Re: About reuters-fkmeans-centroids

2016-04-28 Thread Suneel Marthi
On Thu, Apr 28, 2016 at 1:54 PM, Prakash Poudyal wrote: > Dear Suneel, > > Thank you so much for your reply, I was waiting for long time. > > Actually, I need to use fuzzy clustering to cluster the sentence in my > research. I found fuzzy k clustering algorithm in

Re: About reuters-fkmeans-centroids

2016-04-28 Thread Suneel Marthi
That's correct: deprecated as of Feb 2014, and they will be completely purged in one of the upcoming releases (0.13.0). On Thu, Apr 28, 2016 at 2:10 PM, Dmitriy Lyubimov wrote: > Prakash, > > if you are using any Mahout Mapreduce algorithm for research, please make > sure to make

RE: Mahout contributions

2016-04-28 Thread Saikat Kanjilal
Thanks, this helps. I hope to send a proposal to dev@ outlining some use cases in the next few weeks. > From: ap@outlook.com > To: dev@mahout.apache.org > Subject: Re: Mahout contributions > Date: Fri, 29 Apr 2016 00:03:41 + > > One last thing, Saikat, in answer to your question below.

RE: smile ML library

2016-04-28 Thread Andrew Palumbo
Not sure if they have fkmeans; it wouldn't surprise me if they do. They do have quite a few algorithms, though. BTW, have you checked Weka for fkmeans? I think they may have an implementation built in. The problem with Smile is that they have slf4j classes but no artifacts in their build files.

Re: smile ML library

2016-04-28 Thread Prakash Poudyal
Hi! SMILE has every machine learning algorithm except a fuzzy clustering algorithm :( Am I correct? Regards Prakash On Fri, Apr 29, 2016 at 12:40 AM, Andrew Palumbo wrote: > Has anyone had any luck getting this project to build? Or integrating > it's artifacts with

[jira] [Commented] (MAHOUT-1837) Sparse/Dense Matrix analysis for Matrix Multiplication

2016-04-28 Thread Andrew Palumbo (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15262827#comment-15262827 ] Andrew Palumbo commented on MAHOUT-1837: The first fix is trivial (that's the actual fix in the

[jira] [Created] (MAHOUT-1838) Provide plotting capabilities for Mahout matrices and DRMs

2016-04-28 Thread Andrew Palumbo (JIRA)
Andrew Palumbo created MAHOUT-1838: -- Summary: Provide plotting capabilities for Mahout matrices and DRMs Key: MAHOUT-1838 URL: https://issues.apache.org/jira/browse/MAHOUT-1838 Project: Mahout

[jira] [Updated] (MAHOUT-1838) Provide plotting capabilities for Mahout matrices and DRMs

2016-04-28 Thread Andrew Palumbo (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Palumbo updated MAHOUT-1838: --- Assignee: (was: Andrew Palumbo) > Provide plotting capabilities for Mahout matrices

[jira] [Updated] (MAHOUT-1838) Provide plotting capabilities for Mahout matrices and DRMs

2016-04-28 Thread Andrew Palumbo (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Palumbo updated MAHOUT-1838: --- Attachment: drmSamplePlot2d.png > Provide plotting capabilities for Mahout matrices and
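A distributed row matrix can be far too large to plot directly, so a 2-D plot like the attached sample would plausibly be drawn from a sample of rows. A minimal plain-Python sketch of that idea, assuming uniform reservoir sampling (Algorithm R) over a row stream and projection onto two chosen columns; it is not the approach taken in MAHOUT-1838, and the function names and data are hypothetical.

```python
import random

def reservoir_sample(rows, k, seed=0):
    """Uniformly sample up to k rows from an iterable of unknown length
    (Algorithm R), so a large matrix can be thinned before plotting."""
    rng = random.Random(seed)
    sample = []
    for i, row in enumerate(rows):
        if i < k:
            sample.append(row)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = row
    return sample

def xy_points(rows, x_col=0, y_col=1):
    """Project sampled rows onto two columns for a 2-D scatter plot."""
    return [(row[x_col], row[y_col]) for row in rows]

# Hypothetical 100,000-row, 3-column matrix streamed row by row
rows = ([float(i), float(i % 7), 1.0] for i in range(100_000))
points = xy_points(reservoir_sample(rows, k=500))
```

Reservoir sampling needs only one pass and memory for k rows, which suits data that arrives as a stream or partition by partition.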

Re: About reuters-fkmeans-centroids

2016-04-28 Thread Prakash Poudyal
Hi! Thank you for your emails! Actually, I need to use fuzzy clustering to cluster sentences in my research; this is my goal. I started using Mahout's fuzzy k-means clustering last week. I found several blog links and many other helpful documents that I was going through, as

Re: spark-itemsimilarity runs orders of times slower from Mahout 0.11 onwards

2016-04-28 Thread Pat Ferrel
Hmm, images can't get through the Apache mail servers. The image is here: https://drive.google.com/file/d/0B4cAk1SMC1ChWFZiRG9DSEpkdzg/view?usp=sharing On Apr 28, 2016, at 11:55 AM, Pat Ferrel wrote: Actually, on your advice, Dmitriy, I think these changes went in about

Re: spark-itemsimilarity runs orders of times slower from Mahout 0.11 onwards

2016-04-28 Thread Dmitriy Lyubimov
(Sorry for the repetition; the list rejects my previous replies due to the quoted message size.) "Auto" just reclusters the input according to the given _configured cluster capacity_ (there's some safeguard there, though; I think it doesn't blow up the # of splits if the initial number of splits is ridiculously small
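The description of "auto" is cut off, so the exact rule is not stated here. A purely hypothetical heuristic that captures the two ideas mentioned, sizing the split count from the configured cluster capacity while capping growth when the initial split count is tiny, might look like this. The parameters `splits_per_core` and `max_growth` are invented for illustration and are not Mahout settings.

```python
def auto_partitions(current_splits, total_cores, splits_per_core=2, max_growth=4):
    """Hypothetical heuristic: target a few splits per configured core, but never
    grow the split count by more than `max_growth` times the current one, so a
    tiny input with very few splits is not blown up into many tiny tasks."""
    if current_splits < 1 or total_cores < 1:
        raise ValueError("current_splits and total_cores must be positive")
    target = total_cores * splits_per_core
    return max(1, min(target, current_splits * max_growth))

print(auto_partitions(2, 64))    # small input: capped by max_growth
print(auto_partitions(500, 64))  # large input: shrunk to the capacity target
```

The cap trades some parallelism for avoiding a flood of near-empty tasks when the input is small.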

Re: About reuters-fkmeans-centroids

2016-04-28 Thread Prakash Poudyal
Dear Suneel, Dmitriy and Ted, This is just a gentle reminder about the confusion I mentioned in my previous email. It would be great if you could respond soon, so that I can go ahead. Thank you so much. Prakash On Thu, Apr 28, 2016 at 8:02 PM, Prakash Poudyal