Thanks for the hint. Now I get this exception:

$ mahout seq2sparse -i ~/run/posts2.seq -o ~/run/posts2-vec -seq -nv

Nov 19, 2012 6:09:22 PM org.apache.hadoop.mapred.LocalJobRunner$Job run
WARNING: job_local_0001
java.lang.IllegalStateException: java.lang.reflect.InvocationTargetException
    at org.apache.mahout.common.ClassUtils.instantiateAs(ClassUtils.java:70)
    at org.apache.mahout.common.ClassUtils.instantiateAs(ClassUtils.java:28)
    at org.apache.mahout.vectorizer.document.SequenceFileTokenizerMapper.setup(SequenceFileTokenizerMapper.java:58)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:532)
    at org.apache.mahout.common.ClassUtils.instantiateAs(ClassUtils.java:62)
    ... 6 more
Caused by: java.lang.NoSuchFieldError: LUCENE_36
    at org.apache.mahout.vectorizer.DefaultAnalyzer.<init>(DefaultAnalyzer.java:34)
    ... 11 more

Any idea what causes this?

Thanks,
Chris


On Sun, Nov 18, 2012 at 10:11 PM, DAN HELM <[email protected]> wrote:

> Chris,
>
> I assume you ran the kmeans algorithm?
>
> I believe the clusteredPoints file should prefix the document vectors with
> the text version of the processed documents (assuming seq2sparse was run
> with the named vector (-nv) option),
> as shown in step 3 of "Cluster documents using kmeans" here:
>
> https://cwiki.apache.org/MAHOUT/quick-tour-of-text-analysis-using-the-mahout-command-line.html
>
> But for the cluster id part (the Key), I believe one does have to map that
> numeric key to the corresponding ids in the main cluster results (i.e., in
> the "clusters-<n>-final" results).
> As I recall, the corresponding keys in the "final" folder will be CL-<id>
> or VL-<id>, specifying the state of the final cluster (converged or not):
> http://lucene.472066.n3.nabble.com/retrieve-k-means-result-td1386091.html
> I believe you just need to parse the ids from the clusteredPoints output
> (the Key) and map them to the number following "CL-" or "VL-" in the
> "final" output to identify the corresponding clusters.
>
> Dan
>
>   *From:* Christopher Laux <[email protected]>
> *To:* [email protected]
> *Sent:* Sunday, November 18, 2012 11:37 AM
> *Subject:* Conversion of point numbers to key strings
>
> Hi all,
>
> I can read mahout's output in "clusteredPoints" but that only provides
> point numbers. When I input the data to a sequence file I used strings as
> keys. Is there any way of recovering the key strings from the point
> numbers? Or do I have to keep track of that myself?
>
> Thanks,
> Chris
>
>
>
>
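
A minimal sketch of the id mapping Dan describes in the quoted message, assuming the clusteredPoints output and the "final" cluster identifiers have already been dumped to plain text and read into Python. The function names and the (key, name) pair layout are hypothetical and not part of Mahout; only the "CL-<id>" / "VL-<id>" identifier form comes from the thread.

```python
import re
from collections import defaultdict

# Identifiers in the "final" output look like CL-<id> or VL-<id> (per the thread).
CLUSTER_ID = re.compile(r"^(CL|VL)-(\d+)")


def index_final_clusters(identifiers):
    """Map the numeric part of each CL-/VL- identifier to the identifier itself."""
    index = {}
    for ident in identifiers:
        match = CLUSTER_ID.match(ident)
        if match:  # skip anything that is not a CL-/VL- identifier
            index[int(match.group(2))] = match.group(0)
    return index


def group_points(clustered_points, final_index):
    """Group point names by final cluster.

    clustered_points is assumed to be an iterable of (numeric_key, point_name)
    pairs parsed from the clusteredPoints dump (hypothetical layout).
    """
    groups = defaultdict(list)
    for key, name in clustered_points:
        # Keys with no matching final cluster are kept under a placeholder label.
        label = final_index.get(key, "unknown-%d" % key)
        groups[label].append(name)
    return dict(groups)
```

With named vectors (-nv), the point names here would be the original string keys, so the grouping recovers which input documents ended up in which cluster.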
