I am a final-year BE student from Gujarat, India, currently studying in the
Information Technology branch. My final-year project is Document
Clustering using Hadoop.
At this stage I am able to get the final result from the clusterdump command, in
which I can see the number of documents in each cluster and
In Weka it is possible to mark a field with a question mark (?) for unknown
values, and these are handled. Is there a similar way to mark
unknown/missing field values in Mahout training and test data as well?
I'd appreciate any suggestions/pointers. Breiman talks about two ways to handle
missing
From looking at the code recently: no, it is not handled.
On Tue, Apr 22, 2014 at 1:27 PM, Himanshu himanshu.ash...@gmail.com wrote:
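Since Mahout does not handle "?" fields, one generic workaround is to fill the missing values before the data ever reaches Mahout. The sketch below is an illustration under stated assumptions, not a Mahout feature and not necessarily either of Breiman's methods: it replaces each "?" in a numeric column with the mean of that column's known values, and assumes every other field parses as a double.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ImputeMissing {

    /**
     * Replaces every "?" with the mean of the known values in the same column.
     * Assumption: all rows have the same number of fields and every
     * non-"?" field is numeric.
     */
    public static List<double[]> impute(List<String[]> rows) {
        int cols = rows.get(0).length;
        double[] sum = new double[cols];
        int[] count = new int[cols];

        // First pass: accumulate sums and counts of the known values per column.
        for (String[] row : rows) {
            for (int c = 0; c < cols; c++) {
                if (!row[c].equals("?")) {
                    sum[c] += Double.parseDouble(row[c]);
                    count[c]++;
                }
            }
        }

        // Column means; a column with no known values falls back to 0.0.
        double[] mean = new double[cols];
        for (int c = 0; c < cols; c++) {
            mean[c] = count[c] > 0 ? sum[c] / count[c] : 0.0;
        }

        // Second pass: build numeric rows, substituting the column mean for "?".
        List<double[]> out = new ArrayList<>();
        for (String[] row : rows) {
            double[] values = new double[cols];
            for (int c = 0; c < cols; c++) {
                values[c] = row[c].equals("?") ? mean[c] : Double.parseDouble(row[c]);
            }
            out.add(values);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> rows = List.of(
            new String[] {"1.0", "?"},
            new String[] {"3.0", "4.0"},
            new String[] {"?", "8.0"});
        // Prints [1.0, 6.0], [3.0, 4.0], [2.0, 8.0]
        for (double[] r : impute(rows)) {
            System.out.println(Arrays.toString(r));
        }
    }
}
```

Mean filling is the crudest option: it shrinks the variance of columns with many gaps, so with heavily incomplete data a per-class fill may be a better starting point.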
Also, can anyone explain how the .mdtext files are eventually converted
into HTML for the current Mahout website?
I guess there is a static site generator written in Perl (lib/view.pm
and lib/path.pm). But what actually invokes the site generation, i.e. what
is the entry point?
I was able to
I want to analyze the clusters I produced in Mahout with the k-means algorithm.
The qualcluster command has a command-line argument, -c: what kind of
file do I need to give it as input for k-means?
I did it for streaming k-means and it worked. But every time I run qualcluster I am
getting
What is the error you are seeing?
The output from KMeans is (IntWritable, ClusterWritable),
and for Streaming KMeans it's (IntWritable, CentroidWritable).
QualCluster may be expecting the latter, and hence it works for Streaming KMeans.
Could you post the error you are seeing?
On Tue, Apr 22, 2014 at 9:12 AM,
On Tue, Apr 22, 2014 at 12:11 AM, Darshan Sonagara
darshan.sonag...@gmail.com wrote:
But the problem is that I want to check whether my clustering is good or
bad, and for that I need to calculate the entropy value. I have no
idea how to calculate entropy in Mahout or by other
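Entropy as a quality measure needs a ground-truth class label for each document (for example, a known category); that is an assumption here, since clustering does not produce labels by itself. With n_ij documents of class i in cluster j, n_j documents in cluster j and n documents overall, a common definition is E = sum over j of (n_j / n) * H_j, where H_j = -sum over i of (n_ij / n_j) * log2(n_ij / n_j). Lower is better, and 0 means every cluster contains a single class. A minimal plain-Java sketch (not a Mahout API), assuming each document's cluster id and label have already been collected into two arrays:

```java
import java.util.HashMap;
import java.util.Map;

public class ClusterEntropy {

    /**
     * Size-weighted cluster entropy in bits.
     * clusterOf[d] is the cluster id of document d; labelOf[d] is its known class label.
     */
    public static double entropy(int[] clusterOf, String[] labelOf) {
        int n = clusterOf.length;
        // counts.get(cluster).get(label) = number of documents with that label in that cluster
        Map<Integer, Map<String, Integer>> counts = new HashMap<>();
        for (int d = 0; d < n; d++) {
            counts.computeIfAbsent(clusterOf[d], k -> new HashMap<>())
                  .merge(labelOf[d], 1, Integer::sum);
        }

        double total = 0.0;
        for (Map<String, Integer> byLabel : counts.values()) {
            int size = 0;
            for (int c : byLabel.values()) {
                size += c;
            }
            // H_j = -sum p * log2(p) over the labels present in this cluster
            double h = 0.0;
            for (int c : byLabel.values()) {
                double p = (double) c / size;
                h -= p * Math.log(p) / Math.log(2);
            }
            total += ((double) size / n) * h;  // weight each cluster by its share of documents
        }
        return total;
    }

    public static void main(String[] args) {
        int[] clusters = {0, 0, 1, 1};
        String[] labels = {"sports", "sports", "sports", "politics"};
        // Cluster 0 is pure (0 bits), cluster 1 is a 50/50 mix (1 bit): 0.5 * 0 + 0.5 * 1
        System.out.println(entropy(clusters, labels));
    }
}
```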
Yes, exactly, sir:
it is expecting CentroidWritable,
so the error is a ClassCastException.
I will send a snapshot soon.
But can you tell me one thing: every time I run qualcluster for
Streaming KMeans, it shows different output. Why is that?
And as you instructed earlier, I checked
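On the differing qualcluster results: one likely cause, assuming the streaming k-means run involves random choices (k-means initialization usually does), is that each run starts from different random centers and so can converge to a different clustering. The toy 1-D k-means below is not Mahout code; it only shows that a run driven by java.util.Random with a fixed seed is repeatable, while changing the seed can change the result.

```java
import java.util.Arrays;
import java.util.Random;

public class SeededKMeans {

    /** Plain Lloyd's k-means on 1-D points, with initial centers drawn at random. */
    public static double[] kmeans(double[] points, int k, int iterations, long seed) {
        Random rng = new Random(seed);
        double[] centers = new double[k];
        for (int j = 0; j < k; j++) {
            // Random initial center; the same point may be drawn twice, which hurts the result.
            centers[j] = points[rng.nextInt(points.length)];
        }
        for (int it = 0; it < iterations; it++) {
            double[] sum = new double[k];
            int[] count = new int[k];
            // Assignment step: each point goes to its nearest center.
            for (double p : points) {
                int best = 0;
                for (int j = 1; j < k; j++) {
                    if (Math.abs(p - centers[j]) < Math.abs(p - centers[best])) {
                        best = j;
                    }
                }
                sum[best] += p;
                count[best]++;
            }
            // Update step: move each non-empty cluster's center to its mean.
            for (int j = 0; j < k; j++) {
                if (count[j] > 0) {
                    centers[j] = sum[j] / count[j];
                }
            }
        }
        double[] sorted = centers.clone();
        Arrays.sort(sorted);
        return sorted;
    }

    public static void main(String[] args) {
        double[] pts = {1, 2, 3, 10, 11, 12, 20, 21, 22};
        // Same seed twice: identical centers.
        System.out.println(Arrays.toString(kmeans(pts, 3, 10, 42L)));
        System.out.println(Arrays.toString(kmeans(pts, 3, 10, 42L)));
        // A different seed may converge to different centers.
        System.out.println(Arrays.toString(kmeans(pts, 3, 10, 7L)));
    }
}
```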
Thanks for the reply, sir.
Actually, I am clustering to gather documents of a similar kind into the
same cluster as much as possible.
I can check this in the clusterdump output file by looking at the top terms.
I also found that the result differs when I vary the distance measure.
But I want some
Waiting for your reply, sir.
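The observation that results change with the distance measure can be illustrated on a few term-count vectors: Euclidean distance is sensitive to document length, while cosine distance (1 minus cosine similarity) depends only on the direction of the vectors. This is a plain-Java sketch, not Mahout's implementation, and the vectors are made up for illustration.

```java
public class DistanceComparison {

    /** Straight-line distance between two term vectors. */
    public static double euclidean(double[] a, double[] b) {
        double s = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            s += d * d;
        }
        return Math.sqrt(s);
    }

    /** Cosine distance = 1 - cos(angle between a and b); ignores vector length. */
    public static double cosine(double[] a, double[] b) {
        double dot = 0.0, na = 0.0, nb = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return 1.0 - dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    public static void main(String[] args) {
        // Hypothetical term counts over the same 3-term vocabulary.
        double[] shortDoc = {1, 2, 0};
        double[] longDoc = {10, 20, 0};  // same topic, ten times longer
        double[] otherDoc = {0, 2, 1};   // different topic, similar length to shortDoc

        System.out.println("euclidean(short, long)  = " + euclidean(shortDoc, longDoc));
        System.out.println("euclidean(short, other) = " + euclidean(shortDoc, otherDoc));
        System.out.println("cosine(short, long)     = " + cosine(shortDoc, longDoc));
        System.out.println("cosine(short, other)    = " + cosine(shortDoc, otherDoc));
    }
}
```

Here Euclidean distance puts the short document much closer to the different-topic one (about 1.41 versus 20.12), while cosine distance puts it with the longer same-topic document (about 0 versus 0.2). For text clustering, where document lengths vary a lot, this is one reason the chosen measure changes which documents end up together.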
On Tue, Apr 22, 2014 at 7:00 PM, Darshan Sonagara
darshan.sonag...@gmail.com wrote:
Waiting for your reply, sir.
On Tue, Apr 22, 2014 at 7:13 PM, Darshan Sonagara
darshan.sonag...@gmail.com wrote:
Sebastian created an example import around this that is really nice for several
reasons, so anyone interested should check out
https://issues.apache.org/jira/browse/MAHOUT-1518. Make sure to look at the
patches; the comment thread is a bit cluttered.
1) Spark is awesome because of its use of