On Dec 20, 2013, at 2:35pm, Sameer Tilak <[email protected]> wrote:

> Hi All,
> I was able to resolve this issue by adding the following to my code:
>
> DistributedCache.addFileToClassPath(new Path("/scratch/mahout-math-0.9-SNAPSHOT.jar"), conf, fs);
> DistributedCache.addFileToClassPath(new Path("/scratch/mahout-core-0.9-SNAPSHOT.jar"), conf, fs);
> DistributedCache.addFileToClassPath(new Path("/scratch/mahout-core-0.9-SNAPSHOT-job.jar"), conf, fs);
>
> Note: I did not use Tool or ToolRunner in my code.
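For context, here is a minimal sketch of how those calls might be wired into the client-side setup of the job, assuming the three Mahout jars have already been copied into HDFS under /scratch. The wrapper class name and the Configuration/FileSystem setup are illustrative assumptions, not Sameer's actual code:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical client-side setup, not the poster's actual code.
public class AddMahoutJarsToCache {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    // Each call registers a jar (already sitting in HDFS under /scratch) on the
    // task classpath via the distributed cache, so map/reduce tasks can resolve
    // classes such as org.apache.mahout.math.Vector.
    DistributedCache.addFileToClassPath(new Path("/scratch/mahout-math-0.9-SNAPSHOT.jar"), conf, fs);
    DistributedCache.addFileToClassPath(new Path("/scratch/mahout-core-0.9-SNAPSHOT.jar"), conf, fs);
    DistributedCache.addFileToClassPath(new Path("/scratch/mahout-core-0.9-SNAPSHOT-job.jar"), conf, fs);

    // ... then submit the job with this same conf, e.g. KMeansDriver.run(conf, ...).
  }
}

The important detail is that the same Configuration object is then passed to the job (here, to KMeansDriver.run()), so the cached jars end up on the classpath of the map and reduce tasks.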
In order for -libjars xxx to work, the main class needs to implement Tool and call ToolRunner.run(); a minimal sketch of this follows at the end of this message. Note that this isn't a Mahout-specific issue, it's generic Hadoop usage.

-- Ken

>> From: [email protected]
>> To: [email protected]
>> Subject: KMeansDriver and distributed cache
>> Date: Thu, 19 Dec 2013 17:05:26 -0800
>>
>> Hi All,
>> I am trying to execute the following command:
>>
>> hadoop jar /apps/analytics/myanalytics.jar myanalytics.SimpleKMeansClustering -libjars /apps/mahout/trunk/core/target/mahout-core-0.9-SNAPSHOT.jar:/apps/mahout/trunk/core/target/mahout-core-0.9-SNAPSHOT-job.jar:/apps/mahout/trunk/math/target/mahout-math-0.9-SNAPSHOT.jar
>>
>> I have called the following method in my SimpleKMeansClustering class:
>>
>> KMeansDriver.run(conf, new Path("/scratch/dummyvector.seq"),
>>     new Path("/scratch/dummyvector-initclusters/part-randomSeed/"),
>>     new Path("/scratch/dummyvectoroutput"),
>>     new EuclideanDistanceMeasure(), 0.001, 10, true, 1.0, false);
>>
>> Unfortunately I get the following error; I think somehow the jars are not being made available in the distributed cache. I use Vectors to represent my data and I write them to a sequence file. I then use that driver to analyze the data in MapReduce mode. I think locally all the required jar files are available, however somehow in MapReduce mode they are not. Any help with this would be great!
>>
>> 13/12/19 16:59:02 INFO kmeans.KMeansDriver: Input: /scratch/dummyvector.seq Clusters In: /scratch/dummyvector-initclusters/part-randomSeed Out: /scratch/dummyvectoroutput Distance: org.apache.mahout.common.distance.EuclideanDistanceMeasure
>> 13/12/19 16:59:02 INFO kmeans.KMeansDriver: convergence: 0.001 max Iterations: 10
>> 13/12/19 16:59:02 INFO util.NativeCodeLoader: Loaded the native-hadoop library
>> 13/12/19 16:59:02 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
>> 13/12/19 16:59:02 INFO compress.CodecPool: Got brand-new decompressor
>> 13/12/19 16:59:02 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
>> 13/12/19 16:59:02 INFO input.FileInputFormat: Total input paths to process : 1
>> 13/12/19 16:59:03 INFO mapred.JobClient: Running job: job_201311111627_0310
>> 13/12/19 16:59:04 INFO mapred.JobClient:  map 0% reduce 0%
>> 13/12/19 16:59:19 INFO mapred.JobClient: Task Id : attempt_201311111627_0310_m_000000_0, Status : FAILED
>> Error: java.lang.ClassNotFoundException: org.apache.mahout.math.Vector
>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>>     at java.security.AccessController.doPrivileged(Native Method)
>>     at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
>>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
>>     at java.lang.Class.forName0(Native Method)
>>     at java.lang.Class.forName(Class.java:264)
>>     at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:820)
>>     at org.apache.hadoop.io.WritableName.getClass(WritableName.java:71)
>>     at org.apache.hadoop.io.SequenceFile$Reader.getValueClass(SequenceFile.java:1671)
>>     at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1613)
>>     at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1486)
>>     at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1475)
>>     at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1470)
>>     at org.apache.hadoop.mapreduce.lib.input.SequenceFileRecordReader.initialize(SequenceFileRecordReader.java:50)
>>     at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:522)
>>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>>     at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>>     at java.security.AccessController.doPrivileged(Native Method)
>>     at javax.security.auth.Subject.doAs(Subject.java:415)
>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
>>     at org.apache.hadoop.mapred.Child.main(Child.java:249)
>>
>> 13/12/19 16:59:28 INFO mapred.JobClient: Task Id : attempt_201311111627_0310_m_000000_1, Status : FAILED
>> Error: java.lang.ClassNotFoundException: org.apache.mahout.math.Vector
>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>>     at java.security.AccessController.doPrivileged(Native Method)
>>     at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
>>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
>>     at java.lang.Class.forName0(Native Method)
>>     at java.lang.Class.forName(Class.java:264)
>>     at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:820)
>>     at org.apache.hadoop.io.WritableName.getClass(WritableName.java:71)
>>     at org.apache.hadoop.io.SequenceFile$Reader.getValueClass(SequenceFile.java:1671)
>>     at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1613)
>>     at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1486)
>>     at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1475)
>>     at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1470)
>>     at org.apache.hadoop.mapreduce.lib.input.SequenceFileRecordReader.initialize(SequenceFileRecordReader.java:50)
>>     at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:522)
>>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>>     at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>>     at java.security.AccessController.doPrivileged(Native Method)
>>     at javax.security.auth.Subject.doAs(Subject.java:415)
>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
>>     at org.apache.hadoop.mapred.Child.main(Child.java:249)
>>

--------------------------
Ken Krugler
+1 530-210-6378
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr
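For reference, a minimal sketch of Ken's suggestion. The class name, HDFS paths, and the KMeansDriver.run() arguments simply mirror the ones in the original message; this is an illustrative sketch, not the poster's actual code:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
import org.apache.mahout.clustering.kmeans.KMeansDriver;
import org.apache.mahout.common.distance.EuclideanDistanceMeasure;

// Sketch of a driver that implements Tool so that -libjars (and other
// generic options) are honored.
public class SimpleKMeansClustering extends Configured implements Tool {

  @Override
  public int run(String[] args) throws Exception {
    // getConf() returns the Configuration that GenericOptionsParser has
    // already populated from -libjars, -D, -files, etc.
    Configuration conf = getConf();
    KMeansDriver.run(conf,
        new Path("/scratch/dummyvector.seq"),
        new Path("/scratch/dummyvector-initclusters/part-randomSeed/"),
        new Path("/scratch/dummyvectoroutput"),
        new EuclideanDistanceMeasure(),
        0.001, 10, true, 1.0, false);
    return 0;
  }

  public static void main(String[] args) throws Exception {
    // ToolRunner parses and strips the generic options before handing the
    // remaining arguments to run().
    System.exit(ToolRunner.run(new Configuration(), new SimpleKMeansClustering(), args));
  }
}

With this in place, the job can be launched roughly as in the original command, except that GenericOptionsParser expects the -libjars value to be a comma-separated list of jars, e.g. -libjars /apps/mahout/trunk/core/target/mahout-core-0.9-SNAPSHOT.jar,/apps/mahout/trunk/core/target/mahout-core-0.9-SNAPSHOT-job.jar,/apps/mahout/trunk/math/target/mahout-math-0.9-SNAPSHOT.jar; a colon-separated list would be treated as a single (nonexistent) path.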
