Hi,

In kmeans you need to specify the number of clusters and a directory for the initial cluster vectors. If you want random initial vectors, pass k (for example -k 5) together with -c; in that case -c is the directory where the generated initial vectors will be saved, rather than one you have to prepare yourself. I use -c ./cluster-directory/initial (that's my preference), but you can specify any location.
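For reference, a rough sketch of what such a run could look like. Only -k 5 and -c ./cluster-directory/initial come from what I described above; the input and output paths and the -x (max iterations) value are placeholders I made up, so adjust them to your setup and check the flags against "mahout kmeans --help" for your version. The snippet only prints the command; remove the "echo" to actually run it.

```shell
# Placeholder paths: only INITIAL and K come from the explanation above.
VECTORS=./vectors/part-out.vec        # assumed: the file written by lucene.vector
INITIAL=./cluster-directory/initial   # random initial vectors get saved here
OUTPUT=./cluster-directory/output     # assumed: where the clustering result goes
K=5                                   # number of clusters
MAX_ITER=10                           # assumed iteration cap, tune as needed

# Dry run: print the command instead of executing it.
echo mahout kmeans -i "$VECTORS" -c "$INITIAL" -o "$OUTPUT" -k "$K" -x "$MAX_ITER"
```

Because -k is given, nothing needs to exist under ./cluster-directory/initial beforehand.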
> On 10-Oct-2015, at 7:47 pm, Cristian Barrientos Montoya <cs3...@gmail.com> wrote:
>
> Hi there,
> I've been trying to run kmeans clustering on a lucene index, after creating
> the vectors with the command tool "lucene.vector", but the kmeans algorithm
> also needs a clusters input "-c", and I don't know where or how to get these.
> Would you give me some advice or another way to do the kmeans clustering?
>
> My case scenario is:
> A lot of resources gotten from apache nutch, the resources are on apache
> solr (v 5.2), so I exported them to a json file to create an index on lucene
> (v 4.6). The resources are something like:
>
> {
>   "title": "Title #1",
>   "summary": "summary of the resource",
>   "url": "www.urlresources.com/resourceId.jpg",
>   "description": "Some description",
>   "extension": "jpg",
>   "subject": "Subject of the resource",
>   "area": "resource area"
> }
>
> This is how I am indexing to lucene:
> https://gist.github.com/ColadaFF/1d6557ebaa147753bc9f
>
> And the way I am generating vectors is the same as the example on the
> mahout page:
> https://mahout.apache.org/users/basics/creating-vectors-from-text.html
>
> Am I in the right direction or should I use classification?
>
> I'm also reading some resources, but none of them say what to do with
> the lucene vectors, so any resource you can give would be great.
>
> Thanks all of you!