Hi there,

Thanks for the reply.
I did as you said, so this is my command:

bin/mahout kmeans -i ~/project/lucene/vectorsDir/ -k 5 -c ~/project/clustering/ -o ~/project/resultCluster -x 10

and now I'm getting this exception:

Exception in thread "main" java.lang.ClassCastException: org.apache.hadoop.io.IntWritable cannot be cast to org.apache.mahout.math.VectorWritable
    at org.apache.mahout.clustering.kmeans.RandomSeedGenerator.buildRandom(RandomSeedGenerator.java:100)
    at org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:101)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.mahout.clustering.kmeans.KMeansDriver.main(KMeansDriver.java:47)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
    at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
    at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:152)
    at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)

I'm using the vectors directory that was generated by Mahout's "lucene.vector" command.

Thanks.

2015-10-10 11:08 GMT-05:00 Ankit Goel <ankitgoel2...@gmail.com>:

> Hi,
> In kmeans we need to specify the number of clusters and the directory of
> initial vectors. When you want random initial vectors, specify k (-k 5) and
> the directory for the initial vectors, or in this case where they will be
> saved. This is specified by -c ./cluster-directory/initial (that's my
> preference). You can obviously specify any location.
>
> > On 10-Oct-2015, at 7:47 pm, Cristian Barrientos Montoya <cs3...@gmail.com> wrote:
> >
> > Hi there,
> > I've been trying to run kmeans clustering on a lucene index, after
> > creating the vectors with the command-line tool "lucene.vector", but the
> > kmeans algorithm also needs a clusters input "-c", and I don't know where
> > or how to get these. Would you give me some advice or another way to do
> > the kmeans clustering?
> >
> > My case scenario is:
> > A lot of resources gotten from apache nutch. The resources are on apache
> > solr (v 5.2), so I exported them to a json file to create an index on
> > lucene (v 4.6). The resources are something like:
> >
> > {
> >   "title": "Title #1",
> >   "summary": "summary of the resource",
> >   "url": "www.urlresources.com/resourceId.jpg",
> >   "description": "Some description",
> >   "extension": "jpg",
> >   "subject": "Subject of the resource",
> >   "area": "resource area"
> > }
> >
> > This is how I am indexing to lucene:
> > https://gist.github.com/ColadaFF/1d6557ebaa147753bc9f
> >
> > And the way I am generating vectors is the same as the example on the
> > mahout page:
> > https://mahout.apache.org/users/basics/creating-vectors-from-text.html
> >
> > Am I in the right direction or should I use classification?
> >
> > I'm also reading some resources, but none of them say what to do with
> > the lucene vectors, so any resource you can give would be great.
> >
> > Thanks all of you!
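
The trace above shows RandomSeedGenerator meeting an IntWritable value where it expected a VectorWritable, so one possible check is to see which value class each file under the -i directory declares. The following is only a minimal Python sketch and is not part of Mahout: it assumes the files sit on the local filesystem and are Hadoop SequenceFiles of version 4 or later, whose header holds the key and value class names as length-prefixed strings. The helper names `read_sequencefile_classes` and `report` are hypothetical.

```python
from pathlib import Path


def read_sequencefile_classes(stream):
    """Return (key_class, value_class) read from a SequenceFile header.

    Assumption: header layout is b"SEQ", one version byte, then the key and
    value class names, each written as a one-byte length followed by UTF-8
    bytes (true for class names shorter than 128 bytes).
    """
    magic = stream.read(4)
    if len(magic) < 4 or magic[:3] != b"SEQ":
        raise ValueError("not a SequenceFile (missing 'SEQ' magic bytes)")
    version = magic[3]
    if version < 4:
        # Older versions encode the class names differently; not handled here.
        raise ValueError(f"unsupported SequenceFile version {version}")

    def read_name():
        length_byte = stream.read(1)
        if not length_byte or length_byte[0] > 127:
            # Multi-byte or negative lengths are outside this sketch's scope.
            raise ValueError("unexpected class-name length encoding")
        length = length_byte[0]
        raw = stream.read(length)
        if len(raw) != length:
            raise ValueError("truncated header")
        return raw.decode("utf-8")

    key_class = read_name()
    value_class = read_name()
    return key_class, value_class


def report(directory):
    """Print the declared key/value classes of every file in a directory."""
    for path in sorted(Path(directory).expanduser().iterdir()):
        if not path.is_file():
            continue
        try:
            with path.open("rb") as handle:
                key_class, value_class = read_sequencefile_classes(handle)
            print(f"{path.name}: key={key_class} value={value_class}")
        except ValueError as err:
            print(f"{path.name}: skipped ({err})")


# Example call (path taken from the command above):
# report("~/project/lucene/vectorsDir")
```

If any file in the directory reports a value class other than org.apache.mahout.math.VectorWritable, that file is a likely candidate for the cast error, and pointing -i at the vector file alone may be worth trying.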