You also need to specify a fully-qualified class name for -dm (package
plus class name), not the path to the .class file.
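As a rough illustration, here is a minimal sketch of how a compiled class file path maps to a fully-qualified name. It assumes the class was compiled into a standard classes root (target/classes in the quoted command) and that the directory below that root mirrors the package declaration; the helper name class_file_to_fqcn is made up for this example and is not part of Mahout.

```shell
# Hypothetical helper (not part of Mahout): derive a fully-qualified class
# name from a .class file path, given the classes root it was compiled into.
# Assumes the directory layout below the root mirrors the Java package.
class_file_to_fqcn() {
  classes_root="${1%/}"                  # drop a trailing slash, if any
  class_file="$2"
  rel="${class_file#"$classes_root"/}"   # strip the classes root prefix
  rel="${rel%.class}"                    # strip the .class suffix
  printf '%s\n' "$rel" | tr '/' '.'      # path separators become package dots
}

# Example using the path from the quoted command:
class_file_to_fqcn \
  /home/rhadoop/projects/workspace/mahout_abac/target/classes \
  /home/rhadoop/projects/workspace/mahout_abac/target/classes/clustering/AbacDistanceMeasure.class
```

If the class really is declared in package clustering, the value to pass to -dm would be the name printed above rather than the file path. The class still has to be on the classpath, as described in the quoted reply below.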
On 2/12/13 11:48 AM, Dan Filimon wrote:
You need to add the JAR containing the distance measure you want to
the classpath.
By default the CLASSPATH is set in line 120 of the mahout script (the
script itself is in the bin/ folder of your Mahout installation).
Sadly I don't think the script lets you extend the classpath out of
the box, but it should be a simple addition.
You can either:
a. add the path to your JAR/class folder manually at line 120, or
b. add a new variable called something like MAHOUT_EXTRA_CLASSPATH to
line 120, which you can set to whatever you need.
Option b. is a bit cleaner, but you need to modify the script either way.
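Option b. could look roughly like the sketch below. It assumes the script keeps its classpath in a shell variable named CLASSPATH (check what line 120 of your copy actually does). MAHOUT_EXTRA_CLASSPATH is the suggested new variable, not something the stock script reads, and the helper name with_extra_classpath is made up for this example.

```shell
# Hypothetical helper: join the classpath the script already built with an
# optional user-supplied extra entry. Neither the helper nor
# MAHOUT_EXTRA_CLASSPATH exists in the stock mahout script.
with_extra_classpath() {
  base="$1"
  extra="${2:-}"
  if [ -z "$extra" ]; then
    printf '%s\n' "$base"              # nothing to add, keep the classpath as is
  elif [ -z "$base" ]; then
    printf '%s\n' "$extra"             # avoid a leading ':' (an empty entry)
  else
    printf '%s:%s\n' "$base" "$extra"
  fi
}

# Inside the script, right after CLASSPATH is first set (assumption):
CLASSPATH=$(with_extra_classpath "${CLASSPATH:-}" "${MAHOUT_EXTRA_CLASSPATH:-}")
```

With that in place you would export MAHOUT_EXTRA_CLASSPATH (pointing at your JAR or classes folder) before calling mahout, and leave it unset to get the original behaviour.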
Alternatively, if you dislike fudging with the script, have a closer
look at it: running 'mahout classpath' prints the classpath it builds.
You can then run the hadoop script directly, like in line 252 of the
mahout script, and set HADOOP_CLASSPATH yourself (see
http://stackoverflow.com/questions/3799679/how-to-run-a-hadoop-program).
This should really be better documented. Sorry you're having trouble!
Good luck! :)
On Tue, Feb 12, 2013 at 6:30 PM, Mihai Josan
<[email protected]> wrote:
This is the error I receive:
mahout kmeans -i /user/rhadoop/in/sequence/ \
-c /user/rhadoop/out/canopy-centroids/clusters-0 \
-o /user/rhadoop/out/clusters-out/ \
-x 10 \
-dm /home/rhadoop/projects/workspace/mahout_abac/target/classes/clustering/AbacDistanceMeasure.class
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Running on hadoop, using /usr/lib/hadoop/bin/hadoop and
HADOOP_CONF_DIR=/etc/hadoop/conf
MAHOUT-JOB: /usr/lib/mahout/mahout-examples-0.7-cdh4.1.2-job.jar
13/02/12 17:05:57 INFO common.AbstractJob: Command line arguments:
{--clusters=[/user/rhadoop/out/canopy-centroids/clusters-0],
--convergenceDelta=[0.5],
--distanceMeasure=[/home/rhadoop/projects/workspace/mahout_abac/target/classes/clustering/AbacDistanceMeasure.class],
--endPhase=[2147483647], --input=[/user/rhadoop/in/sequence/], --maxIter=[10],
--method=[mapreduce], --output=[/user/rhadoop/out/clusters-out2/],
--startPhase=[0], --tempDir=[temp]}
Exception in thread "main" java.lang.IllegalStateException: java.lang.ClassNotFoundException: /home/rhadoop/projects/workspace/mahout_abac/target/classes/clustering/AbacDistanceMeasure.class
    at org.apache.mahout.common.ClassUtils.instantiateAs(ClassUtils.java:30)
    at org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:92)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.mahout.clustering.kmeans.KMeansDriver.main(KMeansDriver.java:49)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
    at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:144)
    at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
Caused by: java.lang.ClassNotFoundException: /home/rhadoop/projects/besmart/workspace/mahout_abac/target/classes/clustering/AbacDistanceMeasure.class
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:169)
    at org.apache.mahout.common.ClassUtils.instantiateAs(ClassUtils.java:28)
    ... 15 more
Is this the proper way to use a custom distance measure, or should I
package the class? If so, how?
Thank you in advance,
Mihai Josan
Are you getting any errors?
Can you specify the fully qualified class name of your distance measure
(like com.xxx.MyDistanceMeasure) and check?
Best,
Mahesh Balija,
Calsoft Labs.
On Tue, Feb 12, 2013 at 2:28 PM, Mihai Josan <[email protected]> wrote:
Hello,
Can you please tell me how I can use a custom-made distance measure with
Mahout on the command line?
I am trying to run a clustering using this distance measure, like:
mahout kmeans -i in/sequence/ \
-c out/centroids/clusters-0 \
-o out/clusters-out/ \
-x 10 \
-dm MyDistanceMeasure \
-ow
Thank you in advance,
Mihai