Hello,
After I made the changes, I still get the ClassNotFoundException. I created
my project using Maven and Eclipse, and the jar was generated with Eclipse's
jar export. Do you have any other ideas on how to resolve this problem?
This is the command I run:
mahout kmeans -i /user/rhadoop/mahout/abac-out/sequence \
-c /user/rhadoop/mahout/abac-out/canopy-centroids/clusters-0 \
-o /user/rhadoop/mahout/abac-out/clusters-out/ \
-x 10 \
-dm clustering.AbacDistanceMeasure \
-ow
** how the $CLASSPATH looks after line 120:
CLASSPATH
/usr/lib/sqoop/postgresql-9.2-1002.jdbc4.jar:/etc/mahout/conf.dist:/usr/lib/mahout/lib/abacDistance.jar
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Running on hadoop, using /usr/lib/hadoop/bin/hadoop and
HADOOP_CONF_DIR=/etc/hadoop/conf
MAHOUT-JOB: /usr/lib/mahout/mahout-examples-0.7-cdh4.1.2-job.jar
HADOOP_CLASSPATH
/etc/mahout/conf.dist:/usr/lib/mahout/mahout-examples-*-job.jar:/usr/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming-2.0.0-mr1-cdh4.1.2.jar
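Comparing the two dumps above, the custom jar shows up in CLASSPATH but does not seem to appear in HADOOP_CLASSPATH. The sketch below only makes that comparison explicit. `classpath_has` and `report` are hypothetical helpers written for this illustration (they are not part of Mahout or Hadoop), and the two strings are copied from the output above.

```shell
#!/bin/sh
# Hypothetical helper (not part of Mahout or Hadoop): succeeds when the
# colon-separated classpath in $1 contains exactly the entry $2.
classpath_has() {
  case ":$1:" in
    *":$2:"*) return 0 ;;
    *)        return 1 ;;
  esac
}

# Values copied from the script output above.
CP_DUMP="/usr/lib/sqoop/postgresql-9.2-1002.jdbc4.jar:/etc/mahout/conf.dist:/usr/lib/mahout/lib/abacDistance.jar"
HCP_DUMP="/etc/mahout/conf.dist:/usr/lib/mahout/mahout-examples-*-job.jar:/usr/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming-2.0.0-mr1-cdh4.1.2.jar"
JAR="/usr/lib/mahout/lib/abacDistance.jar"

# Hypothetical helper: print whether the classpath $2 (labelled $1) lists $JAR.
report() {
  if classpath_has "$2" "$JAR"; then
    echo "$1: jar listed"
  else
    echo "$1: jar missing"
  fi
}

report CLASSPATH "$CP_DUMP"
report HADOOP_CLASSPATH "$HCP_DUMP"
```

If the jar really is missing from HADOOP_CLASSPATH, exporting it there before calling mahout may be worth a try. Whether the mahout script preserves a pre-set HADOOP_CLASSPATH, and whether `hadoop jar` looks at CLASSPATH at all, are assumptions that would need checking on this install.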
** the exec command:
CMD: /usr/lib/hadoop/bin/hadoop jar
/usr/lib/mahout/mahout-examples-0.7-cdh4.1.2-job.jar
org.apache.mahout.driver.MahoutDriver kmeans -i
/user/rhadoop/mahout/abac-out/sequence -c
/user/rhadoop/mahout/abac-out/canopy-centroids/clusters-0 -o
/user/rhadoop/mahout/abac-out/clusters-out/ -x 10 -dm
clustering.AbacDistanceMeasure -ow
13/02/13 17:59:08 INFO common.AbstractJob: Command line arguments:
{--clusters=[/user/rhadoop/mahout/abac-out/canopy-centroids/clusters-0],
--convergenceDelta=[0.5], --distanceMeasure=[clustering.AbacDistanceMeasure],
--endPhase=[2147483647], --input=[/user/rhadoop/mahout/abac-out/sequence],
--maxIter=[10], --method=[mapreduce],
--output=[/user/rhadoop/mahout/abac-out/clusters-out/], --overwrite=null,
--startPhase=[0], --tempDir=[temp]}
Exception in thread "main" java.lang.IllegalStateException: java.lang.ClassNotFoundException: clustering.AbacDistanceMeasure
at org.apache.mahout.common.ClassUtils.instantiateAs(ClassUtils.java:30)
at org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:92)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.mahout.clustering.kmeans.KMeansDriver.main(KMeansDriver.java:49)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:144)
at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
Caused by: java.lang.ClassNotFoundException: clustering.AbacDistanceMeasure
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:169)
at org.apache.mahout.common.ClassUtils.instantiateAs(ClassUtils.java:28)
... 15 more
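Since the jar came from the Eclipse export, one more thing that could be checked is whether it really contains the class under the entry path the class loader expects. For a class named clustering.AbacDistanceMeasure that entry would be clustering/AbacDistanceMeasure.class. A minimal sketch follows. `fqcn_to_entry` and `jar_has_class` are hypothetical helper names, and the second one assumes the JDK `jar` tool is on the PATH.

```shell
#!/bin/sh
# Hypothetical helper: turn a fully qualified class name into the entry
# path it must have inside a jar (dots become slashes, plus ".class").
fqcn_to_entry() {
  printf '%s.class\n' "$(printf '%s' "$1" | tr '.' '/')"
}

# Hypothetical helper: succeed if jar $1 has an entry for class $2.
# Assumes the JDK `jar` tool is available; `jar tf` lists one entry per line.
jar_has_class() {
  jar tf "$1" | grep -Fxq "$(fqcn_to_entry "$2")"
}

# Prints the entry the exported jar should contain.
fqcn_to_entry clustering.AbacDistanceMeasure

# Example call (jar path taken from the CLASSPATH output earlier in this mail):
# jar_has_class /usr/lib/mahout/lib/abacDistance.jar clustering.AbacDistanceMeasure && echo found
```

If the entry is missing, or sits under an extra top-level folder, the export settings would be the first thing to look at.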
Thank you,
Mihai Josan
-----Original Message-----
From: Dan Filimon [mailto:[email protected]]
Sent: Wednesday, February 13, 2013 1:25 PM
To: [email protected]
Subject: Re: how to use a custom distance measure with kmeans?
Sure, that sounds like an even better solution!
I didn't read the entire script. :)
On Wed, Feb 13, 2013 at 6:40 AM, Mahesh Balija <[email protected]>
wrote:
> Hi Dan,
>
> If we copy the jar containing the custom classes to the
> MAHOUT_HOME/lib folder, won't that work fine?
> Because at line 147 the mahout script reads all jars
> under the lib folder and puts them on the classpath.
>
> If this won't work, there should probably be some better
> way to add custom classes to the classpath than having users modify
> the script file.
>
> Thanks,
> Mahesh Balija,
> Calsoft Labs.
>
> On Tue, Feb 12, 2013 at 10:18 PM, Dan Filimon
> <[email protected]> wrote:
>
>> You need to add the JAR containing the distance measure you want to
>> the classpath.
>> By default the CLASSPATH is set in line 120 of the mahout script.
>> (the script itself is in the bin/ folder of your Mahout installation).
>>
>> Sadly I don't think the script lets you extend the classpath out of
>> the box, but it should be a simple addition.
>> You can either:
>> a. add the path to your JAR/class folder manually at line 120, or
>> b. (the cleaner way) add a new variable called something like
>> MAHOUT_EXTRA_CLASSPATH to line 120, which you can set to whatever
>> you need.
>>
>> Option b. is a bit cleaner, but you need to modify the script either way.
>>
>> Alternatively, if you dislike fudging with the script you can have a
>> closer look at it and see that running 'mahout classpath' gives you
>> the classpath it builds. Then you can run the hadoop script directly
>> like in line 252 of the script and edit the HADOOP_CLASSPATH (see
>> http://stackoverflow.com/questions/3799679/how-to-run-a-hadoop-program).
>>
>> This should really be better documented. Sorry you're having trouble!
>>
>> Good luck! :)
>>
>> On Tue, Feb 12, 2013 at 6:30 PM, Mihai Josan
>> <[email protected]> wrote:
>> > This is the error I receive:
>> >
>> > mahout kmeans -i /user/rhadoop/in/sequence/ \
>> >   -c /user/rhadoop/out/canopy-centroids/clusters-0 \
>> >   -o /user/rhadoop/out/clusters-out/ \
>> >   -x 10 \
>> >   -dm /home/rhadoop/projects/workspace/mahout_abac/target/classes/clustering/AbacDistanceMeasure.class
>> >
>> > MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
>> > Running on hadoop, using /usr/lib/hadoop/bin/hadoop and HADOOP_CONF_DIR=/etc/hadoop/conf
>> > MAHOUT-JOB: /usr/lib/mahout/mahout-examples-0.7-cdh4.1.2-job.jar
>> > 13/02/12 17:05:57 INFO common.AbstractJob: Command line arguments: {--clusters=[/user/rhadoop/out/canopy-centroids/clusters-0], --convergenceDelta=[0.5], --distanceMeasure=[/home/rhadoop/projects/workspace/mahout_abac/target/classes/clustering/AbacDistanceMeasure.class], --endPhase=[2147483647], --input=[/user/rhadoop/in/sequence/], --maxIter=[10], --method=[mapreduce], --output=[/user/rhadoop/out/clusters-out2/], --startPhase=[0], --tempDir=[temp]}
>> > Exception in thread "main" java.lang.IllegalStateException: java.lang.ClassNotFoundException: /home/rhadoop/projects/workspace/mahout_abac/target/classes/clustering/AbacDistanceMeasure.class
>> > at org.apache.mahout.common.ClassUtils.instantiateAs(ClassUtils.java:30)
>> > at org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:92)
>> > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>> > at org.apache.mahout.clustering.kmeans.KMeansDriver.main(KMeansDriver.java:49)
>> > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> > at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>> > at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> > at java.lang.reflect.Method.invoke(Method.java:597)
>> > at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
>> > at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:144)
>> > at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
>> > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> > at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>> > at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> > at java.lang.reflect.Method.invoke(Method.java:597)
>> > at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
>> > Caused by: java.lang.ClassNotFoundException: /home/rhadoop/projects/besmart/workspace/mahout_abac/target/classes/clustering/AbacDistanceMeasure.class
>> > at java.lang.Class.forName0(Native Method)
>> > at java.lang.Class.forName(Class.java:169)
>> > at org.apache.mahout.common.ClassUtils.instantiateAs(ClassUtils.java:28)
>> > ... 15 more
>> >
>> >
>> > Is this the proper way to use the custom distance measure, or
>> > should I package the class? And if so, how?
>> >
>> > Thank you in advance,
>> > Mihai Josan
>> >
>> >> Are you getting any errors?
>> >> Can you specify the fully qualified class name of your distance
>> >> measure (like com.xxx.MyDistanceMeasure) and check?
>> >>
>> >> Best,
>> >> Mahesh Balija,
>> >> Calsoft Labs.
>> >>
>> >>
>> >> On Tue, Feb 12, 2013 at 2:28 PM, Mihai Josan <[email protected]> wrote:
>> >>
>> >> > Hello,
>> >> >
>> >> > Can you please tell me how I can use a custom distance measure
>> >> > with Mahout on the command line?
>> >> > I am trying to run a clustering with this distance measure, like:
>> >> >
>> >> > mahout kmeans -i in/sequence/ \
>> >> > -c out/centroids/clusters-0 \
>> >> > -o out/clusters-out/ \
>> >> > -x 10 \
>> >> > -dm MyDistanceMeasure \
>> >> > -ow
>> >> >
>> >> > Thank you in advance,
>> >> > Mihai
>> >> >
>>