Re: Problem with FileSystem in Kmeans

2014-03-12 Thread Sebastian Schelter
Hi Bikash, Have you tried adding hdfs:// to your input path? Maybe that helps. --sebastian On 03/11/2014 11:22 AM, Bikash Gupta wrote: Hi, I am running Kmeans in a cluster where I am setting the configuration of fs.hdfs.impl and fs.file.impl beforehand as mentioned below

Re: Problem with FileSystem in Kmeans

2014-03-12 Thread Bikash Gupta
Hi, The problem is not with the input path, it's the way Kmeans is getting executed. Let me explain. I have created CSV-Sequence using map-reduce, hence my data is in HDFS. After this I have run Canopy MR, hence that data is also in HDFS. Now these two things are getting pushed into Kmeans MR. If you check

Website, urgent help needed

2014-03-12 Thread Sebastian Schelter
Hi, As you've probably noticed, I've put in a lot of effort over the last few days to kickstart cleaning up our website. I've thrown out a lot of stuff and have been startled by the amount of outdated and incorrect information on our website, as well as links pointing to nowhere. I think our

Re: Website, urgent help needed

2014-03-12 Thread Juan José Ramos
Hi Sebastian, I am afraid I am only familiar with the recommendation part. In previous posts, I pointed out a couple of errors in this wiki page: https://cwiki.apache.org/confluence/display/MAHOUT/Quick+tour+of+text+analysis+using+the+Mahout+command+line If you are planning to keep it in the new

Re: Website, urgent help needed

2014-03-12 Thread Pavan Kumar N
I'll help with the clustering algorithms documentation. Do send me the old documentation and I will check it and remove errors. Or better, let me know how to proceed. Pavan On Mar 12, 2014 12:35 PM, Sebastian Schelter s...@apache.org wrote: Hi, As you've probably noticed, I've put in a lot of effort

Re: Website, urgent help needed

2014-03-12 Thread Pavan Kumar N
Hi. I read the whole email just now, as I was travelling earlier. I am on it. On Mar 12, 2014 12:35 PM, Sebastian Schelter s...@apache.org wrote: Hi, As you've probably noticed, I've put in a lot of effort over the last few days to kickstart cleaning up our website. I've thrown out a lot of

Re: Website, urgent help needed

2014-03-12 Thread Kevin Moulart
I can confirm what Sebastian said. I'm fairly new to this and I did find myself so desperate at some point that I almost gave up on Mahout due to the lack of documentation, but my feeling is that it doesn't only concern the website: the API is too sparsely documented as well. At this point there are no

Re: Website, urgent help needed

2014-03-12 Thread Sebastian Schelter
We don't exactly have that page, but we have pages that touch parts of it, such as https://mahout.apache.org/users/basics/creating-vectors-from-text.html It would be great if you could create a jira ticket which lists the errors. I'll fix them then. Best, Sebastian On 03/12/2014 08:42 AM,

Re: Website, urgent help needed

2014-03-12 Thread Sebastian Schelter
Hi Pavan, Awesome that you're willing to help. The documentation consists of the pages listed under Clustering in the navigation bar on mahout.apache.org If you start working on one of the pages listed there (e.g. the k-Means doc), please create a jira ticket in our issue tracker with a title

Re: Website, urgent help needed

2014-03-12 Thread Sebastian Schelter
Hi Manoj, Awesome that you're willing to help. I suggest we proceed analogously to the clustering cleanup: The documentation consists of the pages listed under Classification in the navigation bar on mahout.apache.org If you start working on one of the pages listed there (e.g. the Naive Bayes

Re: Website, urgent help needed

2014-03-12 Thread Pavan Kumar N
Hi Kevin, go to the Eclipse Marketplace and install m2eclipse. After you do a mvn install on your Mahout, import the compiled Mahout. I'll try to write detailed documentation with screenshots, but for the moment use the above as a starting point. On 12 March 2014 15:29, Sebastian Schelter

Re: Website, urgent help needed

2014-03-12 Thread Sebastian Schelter
Hi Kevin, Thank you for offering to help! Feel free to ask questions here about how to set up the sources in Eclipse. If you succeed, you could write up what you did and we could add this to the website, as I'm sure a lot of others will have the same problem. It would be great if you could start

Re: Website, urgent help needed

2014-03-12 Thread pramit choudhary
Hi All, I would also like to participate in cleaning up the documentation. Since I am fairly new to the Mahout infrastructure, it will in turn help me understand things better. Do we already have a Jira ticket for organizing the cleanup of the documentation? I just want to be sure that I am

Re: Website, urgent help needed

2014-03-12 Thread Sebastian Schelter
Here you can see all issues (resolved and unresolved) for the next release: https://issues.apache.org/jira/browse/MAHOUT-1413?jql=project%20%3D%20MAHOUT%20AND%20fixVersion%20%3D%201.0%20ORDER%20BY%20priority%20DESC When you start to work on the cleanup of a page, make sure that there is no

Re: Website, urgent help needed

2014-03-12 Thread Kevin Moulart
Thanks, I'll do that partly in my free time since I'm working on other things at work right now :) Kévin Moulart 2014-03-12 11:07 GMT+01:00 Sebastian Schelter s...@apache.org: Hi Kevin, Thank you for offering to help! Feel free to ask questions here about how to set up the sources in Eclipse. If you

Re: Website, urgent help needed

2014-03-12 Thread pramit choudhary
Thanks Sebastian, that's a great help. #Pramit On Wed, Mar 12, 2014 at 3:37 AM, Kevin Moulart kevinmoul...@gmail.com wrote: Thanks, I'll do that partly in my free time since I'm working on other things at work right now :) Kévin Moulart 2014-03-12 11:07 GMT+01:00 Sebastian Schelter

Re: Problem with FileSystem in Kmeans

2014-03-12 Thread Bikash Gupta
Should I raise a JIRA? On Wed, Mar 12, 2014 at 12:31 PM, Bikash Gupta bikash.gupt...@gmail.com wrote: Hi, The problem is not with the input path, it's the way Kmeans is getting executed. Let me explain. I have created CSV-Sequence using map-reduce, hence my data is in HDFS. After this I have run

Re: Website, urgent help needed

2014-03-12 Thread Scott C. Cote
I took the tour of the text analysis and pushed through despite the problems on the page. Committers helped me over the hump where others might have just given up (to your point). When I did it, I made shell scripts so that my steps would be repeatable, in anticipation of updating the page.

Re: Website, urgent help needed

2014-03-12 Thread Scott C. Cote
I’ll make it work. Don’t know markdown (assume some reduced mark”up” language) - but I’ll figure it out. I will assume that I can check with my consulting buddy “Google” and find it. :) Thank you for your contributions - glad that I can give “something” back. I’ll start off by sending the doc to

Re: Problem with FileSystem in Kmeans

2014-03-12 Thread Andrew Musselman
Yes please; if you're seeing confusing behavior when you leave the hdfs protocol off the URI then it may need some tending. On Mar 12, 2014, at 7:22 AM, Bikash Gupta bikash.gupt...@gmail.com wrote: Should I raise JIRA ? On Wed, Mar 12, 2014 at 12:31 PM, Bikash Gupta
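
A minimal sketch of the scheme question discussed in this thread: a path written without `hdfs://` carries no scheme of its own, so Hadoop resolves it against whatever default filesystem the job configuration names. The snippet uses only `java.net.URI` from the JDK, not the Hadoop API, and the paths and the `namenode:8020` host are made up for illustration.

```java
import java.net.URI;

class PathSchemeCheck {
    // Returns the scheme of a path string, or "(none)" when the path carries no scheme.
    static String schemeOf(String path) {
        String scheme = URI.create(path).getScheme();
        return scheme == null ? "(none)" : scheme;
    }

    public static void main(String[] args) {
        // Hypothetical paths, not taken from the thread.
        String[] paths = {
            "/user/demo/kmeans/input",
            "hdfs://namenode:8020/user/demo/kmeans/input",
            "file:///tmp/kmeans/input"
        };
        for (String p : paths) {
            System.out.println(schemeOf(p) + " <- " + p);
        }
    }
}
```

Printing the scheme of every input, cluster, and output path before submitting a job is a cheap way to see which ones depend on the default filesystem setting.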

Re: Website, urgent help needed

2014-03-12 Thread Andrew Musselman
Thanks Scott; please just attach your work to an issue in the Jira system; if there's not one already you could file a new issue. On Mar 12, 2014, at 7:44 AM, Scott C. Cote scottcc...@gmail.com wrote: I’ll make it work. Don’t know markdown (assume some reduced mark”up” language) - but I’ll

Re: Website, urgent help needed

2014-03-12 Thread Scott C. Cote
ok On 3/12/14, 9:58 AM, Andrew Musselman andrew.mussel...@gmail.com wrote: Thanks Scott; please just attach your work to an issue in the Jira system; if there's not one already you could file a new issue. On Mar 12, 2014, at 7:44 AM, Scott C. Cote scottcc...@gmail.com wrote: I’ll make it

Compiling Mahout with maven in Eclipse

2014-03-12 Thread Kevin Moulart
Hi, I tried to fix all the problems I had configuring Eclipse in order to compile Mahout in it, using maven clean package as the goal. First I had to make a change in mahout core in the class GroupTree.java, line 171: stack = new ArrayDeque<GroupTree>(); Then I tried compiling with eclipse (I

Re: Compiling Mahout with maven in Eclipse

2014-03-12 Thread Kevin Moulart
Never mind, I found where the problem lay: I deleted the full content of .m2 and retried as a non-root user, and it worked. Trying in Eclipse now, with tests. I'll let you know if it doesn't work. Kévin Moulart 2014-03-12 16:45 GMT+01:00 Kevin Moulart kevinmoul...@gmail.com: Hi, I tried to

Re: Problem with FileSystem in Kmeans

2014-03-12 Thread Bikash Gupta
MAHOUT-1452 has been raised On Wed, Mar 12, 2014 at 8:26 PM, Andrew Musselman andrew.mussel...@gmail.com wrote: Yes please; if you're seeing confusing behavior when you leave the hdfs protocol off the URI then it may need some tending. On Mar 12, 2014, at 7:22 AM, Bikash Gupta

Automation of Canopy Clustering seeding t1 and t2

2014-03-12 Thread Bikash Gupta
Hi, Finding the right T1 and T2 in canopy is a time-consuming task that requires manual intervention. I am planning to automate the calculation. The idea is that I would increment T1 and T2 by x times 3.1 and x times 2.1, and would collect the approximate T1 and T2 for each K cluster. Not sure if this is
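
One possible reading of the proposal above, as a rough sketch only: take T1 = 3.1 * x and T2 = 2.1 * x for x = 1, 2, ... and record how many canopies each pair produces. That interpretation, the toy 1-D data, and the class and method names are all assumptions made for illustration. The counting routine is a plain in-memory version of the textbook canopy procedure, not Mahout's implementation.

```java
import java.util.ArrayList;
import java.util.List;

class CanopyThresholdSweep {
    // Assumed reading of the proposal: T1 = 3.1 * x, T2 = 2.1 * x (so T1 > T2 always holds).
    static double[] thresholds(int x) {
        return new double[] {3.1 * x, 2.1 * x};
    }

    // Counts canopies on 1-D points: take the first remaining point as a center,
    // then drop every point within t2 of it from the candidate list.
    // T1 only decides loose canopy membership, so it does not affect the count.
    static int countCanopies(double[] points, double t2) {
        List<Double> remaining = new ArrayList<>();
        for (double p : points) {
            remaining.add(p);
        }
        int canopies = 0;
        while (!remaining.isEmpty()) {
            double center = remaining.remove(0);
            remaining.removeIf(p -> Math.abs(p - center) <= t2);
            canopies++;
        }
        return canopies;
    }

    public static void main(String[] args) {
        // Toy data, invented for this sketch.
        double[] points = {0, 1, 2, 10, 11, 12, 20, 21, 30};
        for (int x = 1; x <= 5; x++) {
            double[] t = thresholds(x);
            System.out.printf("x=%d T1=%.1f T2=%.1f canopies=%d%n",
                    x, t[0], t[1], countCanopies(points, t[1]));
        }
    }
}
```

In this basic form the number of canopies, and therefore the K handed on to k-means, is driven by T2 alone, so a sweep over a fixed T1/T2 ratio explores only one dimension of the threshold space.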

Re: Automation of Canopy Clustering seeding t1 and t2

2014-03-12 Thread Suneel Marthi
Is there any rationale to what you are proposing? It's better to go with Streaming KMeans than the combination of Canopy - KMeans clustering. Moreover, Canopy clustering (due to a single reducer in Canopy Generation phase) is more likely to fail with large datasets and that's a behavior that's

Re: Automation of Canopy Clustering seeding t1 and t2

2014-03-12 Thread Bikash Gupta
Not exactly. I was trying to build the logic for this calculation, but before that I thought I would ask everyone for suggestions. Anyway, I will give Streaming KMeans a try. On Thu, Mar 13, 2014 at 3:43 AM, Suneel Marthi suneel_mar...@yahoo.com wrote: Is there any rationale to what you are