I just succeeded in getting my app to work. I had to use ClusterDumperWriter.getTopFeatures(arg1, arg2, arg3), and that gave me the top words in a human-readable format :D
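For readers wondering what such a helper does, here is a stdlib-only Java sketch of the idea. It is not Mahout's code: it ranks the non-zero entries of a cluster centroid by weight and maps each index back to a term through the dictionary produced during vectorization. The three arguments are not spelled out above, so this assumes they are a centroid vector, a term dictionary and a term count. `TopTermsSketch`, `topTerms`, the `Map<Integer, Double>` stand-in for a Mahout `Vector` and the sample terms are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical, stdlib-only illustration of a "top features" helper;
// this is NOT Mahout's implementation. A Map<Integer, Double> stands in
// for a sparse centroid vector (index -> weight).
final class TopTermsSketch {

    /** Returns up to numTerms "term:weight" strings, heaviest first. */
    static List<String> topTerms(Map<Integer, Double> centroid,
                                 String[] dictionary,
                                 int numTerms) {
        List<Map.Entry<Integer, Double>> entries = new ArrayList<>(centroid.entrySet());
        // Highest weight first; ties broken by the smaller index so the order is deterministic.
        Comparator<Map.Entry<Integer, Double>> order =
                Map.Entry.<Integer, Double>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<Integer, Double>comparingByKey());
        entries.sort(order);

        List<String> result = new ArrayList<>();
        for (Map.Entry<Integer, Double> e : entries) {
            if (result.size() >= numTerms) {
                break;
            }
            int index = e.getKey();
            // Fall back to the raw index when the dictionary has no term for it.
            String term = (index >= 0 && index < dictionary.length)
                    ? dictionary[index]
                    : String.valueOf(index);
            result.add(term + ":" + e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical dictionary (index -> term) and centroid weights.
        String[] dictionary = {"apache", "cluster", "kmeans", "lucene", "vector"};
        Map<Integer, Double> centroid = Map.of(0, 0.1, 1, 0.05, 2, 0.7, 4, 0.4);
        System.out.println(topTerms(centroid, dictionary, 3));
    }
}
```

Running `main` prints `[kmeans:0.7, vector:0.4, apache:0.1]`, the three heaviest dimensions. Keeping the weight next to each term makes it easier to judge how strongly a word characterises the cluster.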
-----Original Message-----
From: Paritosh Ranjan [mailto:[email protected]]
Sent: Tuesday, 7 August 2012 10:32
To: [email protected]
Subject: Re: ClusterDumper eclipse human readable output kmeans

I don't know why ClusterDumper is not working, but I can give an alternative solution. Use ClusterOutputPostProcessor (clusterpp) on the clusters-*-final directory.

https://cwiki.apache.org/MAHOUT/top-down-clustering.html

It will arrange the vectors into their respective directories. However, they will still be in the form of sequence files. It's very simple to read a sequence file and write it out in a human-readable format. The classes in the org.apache.mahout.common.iterator.sequencefile package can help you read sequence files easily.

On 07-08-2012 12:50, Videnova, Svetlana wrote:
> I already generated the points directory when I ran the clustering (k-means in my case).
> But for the moment I can't generate the cluster dump because of an error on this line:
> ClusterDumper.readPoints(new Path("output/kmeans/clusters-0"), 2, conf);
> The second parameter is a double, but it wants an int, and yet it does not accept an int...
> I'm pretty confused.
>
> -----Original Message-----
> From: kiran kumar [mailto:[email protected]]
> Sent: Monday, 6 August 2012 18:01
> To: [email protected]
> Subject: Re: ClusterDumper eclipse human readable output kmeans
>
> Hello,
> Clusterdump actually shows you the top terms and the vectors of the centroid and of each
> document. But to identify which vectors belong to your documents, you need to
> generate the points directory when running the clustering algorithm, and then use the
> points directory generated in that step when generating the cluster dump.
>
> Thanks,
> Kiran Bushireddy.
>
> On Mon, Aug 6, 2012 at 10:33 AM, Videnova, Svetlana <
> [email protected]> wrote:
>
>> Hi,
>>
>> My goal is to transform the vectors created by lucene.vector (clustered
>> with k-means) into a human-readable format. For that I am using the
>> ClusterDumper function in Eclipse, but that code does not generate
>> any files.
>> What am I missing? What is the best approach to transform the
>> output of k-means into a human-readable format? (No Unix commands, please; I am on
>> Windows, using Eclipse and Cygwin.)
>> This is the code:
>>
>> Code:
>>
>> Map<Integer, List<WeightedVectorWritable>> result =
>>     ClusterDumper.readPoints(new Path("output/kmeans/clusters-0"), 2, conf);
>>
>> System.out.println(result.get(0).toString());
>> for (int j = 0; j < result.size(); j++) {
>>     List<WeightedVectorWritable> list = result.get(j);
>>     for (WeightedVectorWritable vector : list) {
>>         System.out.println(vector.getVector().asFormatString());
>>     }
>> }
>>
>> Error:
>>
>> Exception in thread "main" java.lang.ClassCastException:
>> org.apache.mahout.clustering.iterator.ClusterWritable cannot be cast
>> to org.apache.mahout.clustering.classify.WeightedVectorWritable
>>     at main.LuceneDemo.main(LuceneDemo.java:260)
>>
>> Thank you
>>
>> Think green - keep it on the screen.
>>
>> This e-mail and any attachment is for authorised use by the intended
>> recipient(s) only. It may contain proprietary material, confidential
>> information and/or be subject to legal privilege. It should not be
>> copied, disclosed to, retained or used by, any other party. If you
>> are not an intended recipient then please promptly delete this e-mail
>> and any attachment and all copies and inform the sender. Thank you.
>>
>
> --
> Thanks & Regards,
> Kiran Kumar
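The ClassCastException in the first message is consistent with what the replies suggest: clusters-0 holds ClusterWritable values (the centroids), while readPoints expects the clustered points, which is why Kiran points to the points directory (typically clusteredPoints under the k-means output, assuming the default layout). Even with the right directory, the loop in that message calls result.get(j) for j from 0 to size() - 1, and nothing in that code guarantees the cluster IDs used as keys fall in that range. The stdlib-only sketch below walks the map by its entries instead and writes one readable block per cluster, in the spirit of Paritosh's "read it and write it in a human-readable format" advice. `ClusterReportSketch`, `render` and the sample data are hypothetical, and plain strings stand in for WeightedVectorWritable values.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, stdlib-only sketch of walking a "cluster ID -> points" map
// like the one ClusterDumper.readPoints returns. Plain strings stand in for
// the formatted WeightedVectorWritable values.
final class ClusterReportSketch {

    /** Renders one block per cluster, visiting only the keys that actually exist. */
    static String render(Map<Integer, List<String>> pointsByCluster) {
        StringBuilder out = new StringBuilder();
        // Copy into a TreeMap so clusters print in ascending ID order.
        // Iterating entries avoids calling get(j) for IDs that may not exist.
        for (Map.Entry<Integer, List<String>> entry : new TreeMap<>(pointsByCluster).entrySet()) {
            out.append("Cluster ").append(entry.getKey())
               .append(" (size ").append(entry.getValue().size()).append(")\n");
            for (String point : entry.getValue()) {
                out.append("  ").append(point).append('\n');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Hypothetical data: cluster IDs 3 and 17, not 0 and 1.
        Map<Integer, List<String>> points = Map.of(
                17, List.of("doc2 {1:0.5}"),
                3, List.of("doc1 {0:1.0}", "doc3 {2:0.25}"));
        System.out.print(render(points));
    }
}
```

With keys 3 and 17, the original index-based loop would call get(0) and get(1) and receive null for both. Iterating over the entries avoids that, and the TreeMap copy only makes the output order stable.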
