Re: Mahout clustering from lucene index

Cristian Barrientos Montoya Sat, 10 Oct 2015 09:44:58 -0700

Hi there,

Thanks for the reply.


I did as you suggested. This is my command:

bin/mahout kmeans -i ~/project/lucene/vectorsDir/ -k 5 -c ~/project/clustering/ -o ~/project/resultCluster -x 10

Now I'm getting this exception:

Exception in thread "main" java.lang.ClassCastException: org.apache.hadoop.io.IntWritable cannot be cast to org.apache.mahout.math.VectorWritable
    at org.apache.mahout.clustering.kmeans.RandomSeedGenerator.buildRandom(RandomSeedGenerator.java:100)
    at org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:101)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.mahout.clustering.kmeans.KMeansDriver.main(KMeansDriver.java:47)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
    at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
    at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:152)
    at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)

I'm using the vectors directory that was generated by Mahout's "lucene.vector"
command.
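
The exception indicates that at least one file read from the input directory stores IntWritable values rather than VectorWritable ones. One way to narrow this down is to look at the header of every file in the directory, since a Hadoop SequenceFile records its key and value class names there. The following is only a rough sketch under several assumptions: the files sit on the local filesystem (not HDFS), they use the version 6 header layout, and the class names are short enough for a one-byte length prefix. The helper names read_seqfile_classes and scan_directory are made up for illustration and are not part of Mahout or Hadoop.

```python
import os


def _read_short_string(f):
    # Assumption: the string is stored as a length prefix followed by UTF-8
    # bytes, and the length is below 128 so the prefix is a single byte.
    first = f.read(1)
    if len(first) != 1 or first[0] > 127:
        raise ValueError("unexpected or multi-byte length prefix")
    data = f.read(first[0])
    if len(data) != first[0]:
        raise ValueError("truncated header")
    return data.decode("utf-8")


def read_seqfile_classes(path):
    """Return (key_class, value_class) read from a SequenceFile header."""
    with open(path, "rb") as f:
        if f.read(3) != b"SEQ":
            raise ValueError("not a SequenceFile: " + path)
        version = f.read(1)
        # Assumption: only the version 6 header layout is handled here.
        if len(version) != 1 or version[0] != 6:
            raise ValueError("unsupported SequenceFile version: " + path)
        key_class = _read_short_string(f)
        value_class = _read_short_string(f)
        return key_class, value_class


def scan_directory(directory):
    """Map each regular file name to its (key, value) classes, or None."""
    results = {}
    for name in sorted(os.listdir(directory)):
        full = os.path.join(directory, name)
        if not os.path.isfile(full):
            continue
        try:
            results[name] = read_seqfile_classes(full)
        except ValueError:
            # Not a SequenceFile this sketch can parse (e.g. a text file).
            results[name] = None
    return results
```

Calling scan_directory(os.path.expanduser("~/project/lucene/vectorsDir/")) and printing the result would show which file, if any, reports org.apache.hadoop.io.IntWritable as its value class. If such a file turns up, pointing -i at a location that contains only the vector output may be worth trying.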

Thanks.

2015-10-10 11:08 GMT-05:00 Ankit Goel <ankitgoel2...@gmail.com>:

> Hi,
> In kmeans we need to specify the number of clusters and the directory of
> initial vectors. When you want random initial vectors, specify k (-k 5) and
> a directory for the initial vectors (in this case, where they will be saved).
> This is specified by -c ./cluster-directory/initial (that's my preference).
> You can obviously specify any location.
>
> > On 10-Oct-2015, at 7:47 pm, Cristian Barrientos Montoya <cs3...@gmail.com> wrote:
> >
> > Hi there,
> > I've been trying to run kmeans clustering on a lucene index, after creating
> > the vectors with the command tool "lucene.vector", but the kmeans algorithm
> > also needs a clusters input "-c", and I don't know where or how to get these.
> > Would you give me some advice or another way to do the kmeans clustering?
> >
> > My scenario is:
> > I have a lot of resources crawled with apache nutch and stored in apache
> > solr (v 5.2). I exported them to a json file to create an index in lucene
> > (v 4.6). The resources look something like:
> >
> > {
> > "title": "Title #1",
> > "summary": "summary of the resource",
> > "url": "www.urlresources.com/resourceId.jpg",
> > "description": "Some description",
> > "extension": "jpg",
> > "subject": "Subject of the resource",
> > "area": "resource area"
> > }
> >
> > This is how I am indexing to lucene:
> > https://gist.github.com/ColadaFF/1d6557ebaa147753bc9f
> >
> > And the way I am generating vectors is the same as the example on the
> > mahout page:
> > https://mahout.apache.org/users/basics/creating-vectors-from-text.html
> >
> > Am I heading in the right direction, or should I use classification?
> >
> > I'm also reading some resources, but none of them say what to do with
> > the lucene vectors, so any resource you can point me to would be great.
> >
> > Thanks all of you!
>
>
