I don't know why ClusterDumper is not working, but I can suggest an
alternative.
Run ClusterOutputPostProcessor (clusterpp) on the clusters-*-final
directory: https://cwiki.apache.org/MAHOUT/top-down-clustering.html
It will arrange the vectors into their respective cluster directories.
However, the output will still be in the form of sequence files.
It's very simple to read a sequence file and write it out in a
human-readable format.
The classes in the org.apache.mahout.common.iterator.sequencefile package
make it easy to read sequence files.
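As a rough sketch of the "write in a human readable format" step, the snippet below turns key/value records into one tab-separated line each. It is self-contained, so the records are made-up stand-ins rather than real Mahout output; the class name ReadableDump and the format method are hypothetical names for illustration. The comment in main shows how the records would probably be obtained with SequenceFileDirIterable from the package mentioned above, but treat that usage as an assumption and check it against your Mahout version.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class ReadableDump {

    /**
     * Formats each record as "key<TAB>value" on its own line, relying on
     * toString() of the key and value objects.
     */
    public static String format(Iterable<? extends Map.Entry<?, ?>> records) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<?, ?> record : records) {
            sb.append(record.getKey()).append('\t')
              .append(record.getValue()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Stand-in data (made up for illustration): cluster ID -> vector text.
        // With Mahout and Hadoop on the classpath, the records would instead
        // come from iterating the sequence files, e.g. (assumed usage, please
        // verify against your version):
        //   new SequenceFileDirIterable<Writable, Writable>(
        //       dir, PathType.LIST, PathFilters.partFilter(), conf)
        // which yields Pair objects exposing getFirst() and getSecond().
        List<Map.Entry<Integer, String>> records = new ArrayList<>();
        records.add(new SimpleEntry<>(0, "{0:1.0,3:2.5}"));
        records.add(new SimpleEntry<>(1, "{2:4.0}"));

        System.out.print(format(records));
    }
}
```

The formatting is kept separate from the reading so the same method can be pointed at whatever the sequence file iterator returns, and the result can be written to a file instead of the console.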
On 07-08-2012 12:50, Videnova, Svetlana wrote:
I already generated the points directory when I ran the clustering (kmeans in my case).
But for the moment I can't generate the cluster dump because of an error on this line:
ClusterDumper.readPoints(new Path("output/kmeans/clusters-0"), 2, conf);
The second parameter is a double, but it wants an int, yet it does not accept an int ... well,
I'm pretty confused ...
-----Original Message-----
From: kiran kumar [mailto:[email protected]]
Sent: Monday, August 6, 2012 18:01
To: [email protected]
Subject: Re: ClusterDumper eclipse human readable output kmeans
Hello,
Clusterdump shows you the top terms and the vectors of the centroid and of each
document. But to identify which vectors belong to your documents, you need to
generate the points directory when running the clustering algorithm, and then
use that points directory when generating the cluster dump.
Thanks,
Kiran Bushireddy.
On Mon, Aug 6, 2012 at 10:33 AM, Videnova, Svetlana <
[email protected]> wrote:
Hi,
My goal is to transform the vectors created by lucene.vector (and then
clustered with kmeans) into a human-readable format. For that I am using
the ClusterDumper function in Eclipse. But that code does not generate
any files. What am I missing? What is the best approach to transform
the output of kmeans into a human-readable format? (No Unix commands
please; I am on Windows using Eclipse and Cygwin.)
This is the code:
Map<Integer, List<WeightedVectorWritable>> result =
    ClusterDumper.readPoints(new Path("output/kmeans/clusters-0"), 2, conf);
System.out.println(result.get(0).toString());
for (int j = 0; j < result.size(); j++) {
    List<WeightedVectorWritable> list = result.get(j);
    for (WeightedVectorWritable vector : list) {
        System.out.println(vector.getVector().asFormatString());
    }
}
Error :
Exception in thread "main" java.lang.ClassCastException:
org.apache.mahout.clustering.iterator.ClusterWritable cannot be cast
to org.apache.mahout.clustering.classify.WeightedVectorWritable
at main.LuceneDemo.main(LuceneDemo.java:260)
Thank you
Think green - keep it on the screen.
This e-mail and any attachment is for authorised use by the intended
recipient(s) only. It may contain proprietary material, confidential
information and/or be subject to legal privilege. It should not be
copied, disclosed to, retained or used by, any other party. If you are
not an intended recipient then please promptly delete this e-mail and
any attachment and all copies and inform the sender. Thank you.
--
Thanks & Regards,
Kiran Kumar