On Dec 20, 2013, at 2:35pm, Sameer Tilak <[email protected]> wrote:

> Hi All,
> I was able to resolve this issue by adding the following to my code:
>
> DistributedCache.addFileToClassPath(new Path("/scratch/mahout-math-0.9-SNAPSHOT.jar"), conf, fs);
> DistributedCache.addFileToClassPath(new Path("/scratch/mahout-core-0.9-SNAPSHOT.jar"), conf, fs);
> DistributedCache.addFileToClassPath(new Path("/scratch/mahout-core-0.9-SNAPSHOT-job.jar"), conf, fs);
>
> Note: I did not use Tool or ToolRunner in my code.
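For context, here is a minimal sketch of how those calls might be wired into the client-side setup of the job, assuming the three Mahout jars have already been copied into HDFS under /scratch. The wrapper class name and the Configuration/FileSystem setup are illustrative assumptions, not Sameer's actual code:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical client-side setup, not the poster's actual code.
public class AddMahoutJarsToCache {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    // Each call registers a jar (already sitting in HDFS under /scratch) on the
    // task classpath via the distributed cache, so map/reduce tasks can resolve
    // classes such as org.apache.mahout.math.Vector.
    DistributedCache.addFileToClassPath(new Path("/scratch/mahout-math-0.9-SNAPSHOT.jar"), conf, fs);
    DistributedCache.addFileToClassPath(new Path("/scratch/mahout-core-0.9-SNAPSHOT.jar"), conf, fs);
    DistributedCache.addFileToClassPath(new Path("/scratch/mahout-core-0.9-SNAPSHOT-job.jar"), conf, fs);

    // ... then submit the job with this same conf, e.g. KMeansDriver.run(conf, ...).
  }
}

The important detail is that the same Configuration object is then passed to the job (here, to KMeansDriver.run()), so the cached jars end up on the classpath of the map and reduce tasks.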
In order for -libjars xxx to work, the main class needs to implement Tool and call ToolRunner.run(); a minimal sketch of this follows at the end of this message. Note that this isn't a Mahout-specific issue, it's generic Hadoop usage.

-- Ken

>> From: [email protected]
>> To: [email protected]
>> Subject: KMeansDriver and distributed cache
>> Date: Thu, 19 Dec 2013 17:05:26 -0800
>>
>> Hi All,
>> I am trying to execute the following command:
>>
>> hadoop jar /apps/analytics/myanalytics.jar myanalytics.SimpleKMeansClustering -libjars /apps/mahout/trunk/core/target/mahout-core-0.9-SNAPSHOT.jar:/apps/mahout/trunk/core/target/mahout-core-0.9-SNAPSHOT-job.jar:/apps/mahout/trunk/math/target/mahout-math-0.9-SNAPSHOT.jar
>>
>> I have called the following method in my SimpleKMeansClustering class:
>>
>> KMeansDriver.run(conf, new Path("/scratch/dummyvector.seq"),
>>     new Path("/scratch/dummyvector-initclusters/part-randomSeed/"),
>>     new Path("/scratch/dummyvectoroutput"),
>>     new EuclideanDistanceMeasure(), 0.001, 10, true, 1.0, false);
>>
>> Unfortunately I get the following error; I think somehow the jars are not being made available in the distributed cache. I use Vectors to represent my data and I write them to a sequence file. I then use that driver to analyze the data in MapReduce mode. I think locally all the required jar files are available, however somehow in MapReduce mode they are not. Any help with this would be great!
>>
>> 13/12/19 16:59:02 INFO kmeans.KMeansDriver: Input: /scratch/dummyvector.seq Clusters In: /scratch/dummyvector-initclusters/part-randomSeed Out: /scratch/dummyvectoroutput Distance: org.apache.mahout.common.distance.EuclideanDistanceMeasure
>> 13/12/19 16:59:02 INFO kmeans.KMeansDriver: convergence: 0.001 max Iterations: 10
>> 13/12/19 16:59:02 INFO util.NativeCodeLoader: Loaded the native-hadoop library
>> 13/12/19 16:59:02 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
>> 13/12/19 16:59:02 INFO compress.CodecPool: Got brand-new decompressor
>> 13/12/19 16:59:02 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
>> 13/12/19 16:59:02 INFO input.FileInputFormat: Total input paths to process : 1
>> 13/12/19 16:59:03 INFO mapred.JobClient: Running job: job_201311111627_0310
>> 13/12/19 16:59:04 INFO mapred.JobClient:  map 0% reduce 0%
>> 13/12/19 16:59:19 INFO mapred.JobClient: Task Id : attempt_201311111627_0310_m_000000_0, Status : FAILED
>> Error: java.lang.ClassNotFoundException: org.apache.mahout.math.Vector
>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>>     at java.security.AccessController.doPrivileged(Native Method)
>>     at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
>>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
>>     at java.lang.Class.forName0(Native Method)
>>     at java.lang.Class.forName(Class.java:264)
>>     at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:820)
>>     at org.apache.hadoop.io.WritableName.getClass(WritableName.java:71)
>>     at org.apache.hadoop.io.SequenceFile$Reader.getValueClass(SequenceFile.java:1671)
>>     at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1613)
>>     at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1486)
>>     at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1475)
>>     at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1470)
>>     at org.apache.hadoop.mapreduce.lib.input.SequenceFileRecordReader.initialize(SequenceFileRecordReader.java:50)
>>     at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:522)
>>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>>     at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>>     at java.security.AccessController.doPrivileged(Native Method)
>>     at javax.security.auth.Subject.doAs(Subject.java:415)
>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
>>     at org.apache.hadoop.mapred.Child.main(Child.java:249)
>>
>> 13/12/19 16:59:28 INFO mapred.JobClient: Task Id : attempt_201311111627_0310_m_000000_1, Status : FAILED
>> Error: java.lang.ClassNotFoundException: org.apache.mahout.math.Vector
>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>>     at java.security.AccessController.doPrivileged(Native Method)
>>     at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
>>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
>>     at java.lang.Class.forName0(Native Method)
>>     at java.lang.Class.forName(Class.java:264)
>>     at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:820)
>>     at org.apache.hadoop.io.WritableName.getClass(WritableName.java:71)
>>     at org.apache.hadoop.io.SequenceFile$Reader.getValueClass(SequenceFile.java:1671)
>>     at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1613)
>>     at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1486)
>>     at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1475)
>>     at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1470)
>>     at org.apache.hadoop.mapreduce.lib.input.SequenceFileRecordReader.initialize(SequenceFileRecordReader.java:50)
>>     at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:522)
>>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>>     at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>>     at java.security.AccessController.doPrivileged(Native Method)
>>     at javax.security.auth.Subject.doAs(Subject.java:415)
>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
>>     at org.apache.hadoop.mapred.Child.main(Child.java:249)
>>

--------------------------
Ken Krugler
+1 530-210-6378
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr
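For reference, a minimal sketch of Ken's suggestion. The class name, HDFS paths, and the KMeansDriver.run() arguments simply mirror the ones in the original message; this is an illustrative sketch, not the poster's actual code:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
import org.apache.mahout.clustering.kmeans.KMeansDriver;
import org.apache.mahout.common.distance.EuclideanDistanceMeasure;

// Sketch of a driver that implements Tool so that -libjars (and other
// generic options) are honored.
public class SimpleKMeansClustering extends Configured implements Tool {

  @Override
  public int run(String[] args) throws Exception {
    // getConf() returns the Configuration that GenericOptionsParser has
    // already populated from -libjars, -D, -files, etc.
    Configuration conf = getConf();
    KMeansDriver.run(conf,
        new Path("/scratch/dummyvector.seq"),
        new Path("/scratch/dummyvector-initclusters/part-randomSeed/"),
        new Path("/scratch/dummyvectoroutput"),
        new EuclideanDistanceMeasure(),
        0.001, 10, true, 1.0, false);
    return 0;
  }

  public static void main(String[] args) throws Exception {
    // ToolRunner parses and strips the generic options before handing the
    // remaining arguments to run().
    System.exit(ToolRunner.run(new Configuration(), new SimpleKMeansClustering(), args));
  }
}

With this in place, the job can be launched roughly as in the original command, except that GenericOptionsParser expects the -libjars value to be a comma-separated list of jars, e.g. -libjars /apps/mahout/trunk/core/target/mahout-core-0.9-SNAPSHOT.jar,/apps/mahout/trunk/core/target/mahout-core-0.9-SNAPSHOT-job.jar,/apps/mahout/trunk/math/target/mahout-math-0.9-SNAPSHOT.jar; a colon-separated list would be treated as a single (nonexistent) path.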
