Question Regarding Entropy calculation in Mahout

2014-04-22 Thread Darshan Sonagara
I am a final-year BE student from Gujarat, India, currently studying in the Information Technology branch. My final-year project is document clustering using Hadoop. At this stage I am able to get the final result from the cluster dump command, in which I can see the number of documents in a particular cluster and

Does Mahout handle missing values in train and test data, for Decision Forest?

2014-04-22 Thread Himanshu
In Weka it is possible to mark a field with a question mark (?) for unknown values, and these are handled. Is there a similar way to mark unknown/missing field values in Mahout training and test data as well? I would appreciate any suggestions/pointers. Breiman talks about two ways to handle missing

Re: Does Mahout handle missing values in train and test data, for Decision Forest?

2014-04-22 Thread Sean Owen
From looking at the code recently: no, it is not handled. On Tue, Apr 22, 2014 at 1:27 PM, Himanshu himanshu.ash...@gmail.com wrote: In Weka it is possible to mark a field with a question mark (?) for unknown values, and these are handled. Is there a similar way to mark unknown/missing field
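
Since the reply above says missing values are not handled, one common workaround is to fill them in before the data reaches Mahout. The following is only a minimal sketch of that idea, not a Mahout feature: it assumes the data is already loaded as rows of strings, that `?` is the missing-value marker (borrowed from the Weka convention mentioned in the question), and the helper names `impute_column` and `impute_rows` are hypothetical. Numeric columns are filled with the column median and other columns with the most frequent value.

```python
from collections import Counter
from statistics import median

MISSING = "?"  # assumption: Weka-style marker for an unknown value


def _is_number(text):
    """Return True if the string parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def impute_column(values):
    """Fill MISSING entries in one column (hypothetical helper).

    Numeric columns get the median of the known values; all other
    columns get the most frequent known value.
    """
    known = [v for v in values if v != MISSING]
    if not known:
        return list(values)  # nothing to learn a fill value from
    if all(_is_number(v) for v in known):
        fill = str(median(float(v) for v in known))
    else:
        fill = Counter(known).most_common(1)[0][0]
    return [fill if v == MISSING else v for v in values]


def impute_rows(rows):
    """Apply impute_column to every column of a list of equal-length rows."""
    columns = [impute_column(col) for col in zip(*rows)]
    return [list(row) for row in zip(*columns)]
```

In practice the label column should be left out of the imputation, and the fill values would normally be computed on the training data and reused for the test data; both are omitted here to keep the sketch short.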

Re: Is there any website documentation repository or tool for Apache Mahout?

2014-04-22 Thread tuxdna
Also, can anyone explain how the .mdtext files are eventually converted into HTML for the current Mahout website? I guess there is a static site generator written in Perl (lib/view.pm and lib/path.pm). But what actually invokes the site generation, in terms of the entry point? I was able to

Getting error in qualcluster command

2014-04-22 Thread Darshan Sonagara
I want to analyze the clusters I produced in Mahout with the k-means algorithm. The qualcluster command has a command-line argument -c; what kind of file do I need to give as input for the k-means algorithm? I did it for streaming k-means and it worked. But every time I run qualcluster I am getting

Re: Getting error in qualcluster command

2014-04-22 Thread Suneel Marthi
What is the error you are seeing? The output from KMeans is (IntWritable, ClusterWritable), and for Streaming KMeans it is (IntWritable, CentroidWritable). QualCluster may be expecting the latter, and hence works for Streaming KMeans. Could you post the error you are seeing? On Tue, Apr 22, 2014 at 9:12 AM,

Re: Question Regarding Entropy calculation in Mahout

2014-04-22 Thread Ted Dunning
On Tue, Apr 22, 2014 at 12:11 AM, Darshan Sonagara darshan.sonag...@gmail.com wrote: But the problem is that I want to check whether my clustering is good or bad, so for that I need to calculate the entropy value. I do not have any idea how to calculate entropy in Mahout or by other
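
To make the entropy question above concrete, here is a minimal sketch of one common definition of clustering entropy: the Shannon entropy of the true class labels inside each cluster, averaged over clusters weighted by cluster size. It is not a Mahout API. It assumes a known category label exists for each document and that (cluster id, label) pairs have been exported by some other means; the function names are hypothetical. Lower values mean purer clusters, and 0 means every cluster contains a single class.

```python
import math
from collections import Counter, defaultdict


def cluster_entropy(labels):
    """Shannon entropy, in bits, of the class labels inside one cluster."""
    total = len(labels)
    counts = Counter(labels)
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


def clustering_entropy(assignments):
    """Size-weighted average of per-cluster entropies.

    `assignments` is an iterable of (cluster_id, true_label) pairs;
    the true labels are an assumption, not something clustering produces.
    """
    by_cluster = defaultdict(list)
    for cluster_id, label in assignments:
        by_cluster[cluster_id].append(label)
    total = sum(len(labels) for labels in by_cluster.values())
    return sum(
        len(labels) / total * cluster_entropy(labels)
        for labels in by_cluster.values()
    )
```

For example, `clustering_entropy([(0, "sports"), (0, "sports"), (1, "sports"), (1, "politics")])` averages a pure cluster (entropy 0) with an evenly mixed one (entropy 1 bit). Without ground-truth labels this measure cannot be computed, and an internal measure such as intra-cluster distance would be needed instead.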

Re: Getting error in qualcluster command

2014-04-22 Thread Darshan Sonagara
Yes, exactly, sir: it is expecting CentroidWritable, so the error is a ClassCastException. I will send a snapshot soon. But can you tell me one thing: every time I run qualcluster for Streaming KMeans, it shows different output. Why is that? And as you instructed earlier, I checked

Re: Question Regarding Entropy calculation in Mahout

2014-04-22 Thread Darshan Sonagara
Thanks for the reply, sir. Actually, I am doing clustering to gather similar kinds of documents in the same cluster as much as possible. I can see this from the cluster dump output file by observing the top terms. I also figured out that it differs when varying the distance measure technique. But I want some

Re: Getting error in qualcluster command

2014-04-22 Thread Darshan Sonagara
Waiting for the reply, sir. On Tue, Apr 22, 2014 at 7:00 PM, Darshan Sonagara darshan.sonag...@gmail.com wrote: Yes, exactly, sir: it is expecting CentroidWritable, so the error is a ClassCastException. I will send a snapshot soon. But can you tell me one thing: every time I run

Re: Question Regarding Entropy calculation in Mahout

2014-04-22 Thread Darshan Sonagara
Waiting for the reply, sir. On Tue, Apr 22, 2014 at 7:13 PM, Darshan Sonagara darshan.sonag...@gmail.com wrote: Thanks for the reply, sir. Actually, I am doing clustering to gather similar kinds of documents in the same cluster as much as possible. I can see this from the cluster dump output file

Re: Spark Mahout with a CLI?

2014-04-22 Thread Pat Ferrel
Sebastian created an example import around this that is really nice for several reasons, so anyone interested should check out https://issues.apache.org/jira/browse/MAHOUT-1518. Make sure to look at the patches; the comment thread is a bit cluttered. 1) Spark is awesome because of its use of