Hi Ken,
Thanks. I was going down that route. I was wondering whether the approach 
that implements Tool and calls ToolRunner.run() has any advantage over the 
one that uses DistributedCache.addFileToClassPath. Perhaps the former is 
more generic and helps with things beyond adding jar files. In any case, 
with DistributedCache.addFileToClassPath I was able to make the jars 
visible on the worker nodes.
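For anyone following the thread, a minimal sketch of the Tool/ToolRunner pattern Ken describes might look like the following. This is an illustration, not the actual SimpleKMeansClustering code from this thread; the class body is a placeholder, and it assumes the same Hadoop-1-era APIs used elsewhere in the messages below:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// Implementing Tool lets ToolRunner pass the command line through
// GenericOptionsParser, which is what makes -libjars (and -files,
// -D, etc.) take effect before your own code sees the arguments.
public class SimpleKMeansClustering extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        Configuration conf = getConf(); // generic options (incl. -libjars) already applied
        // ... set up and launch the KMeans job here ...
        return 0;
    }

    public static void main(String[] args) throws Exception {
        // ToolRunner strips the generic options and hands only the
        // remaining application arguments to run().
        int exitCode = ToolRunner.run(new Configuration(),
                                      new SimpleKMeansClustering(), args);
        System.exit(exitCode);
    }
}
```

With that in place the jars can be supplied on the command line, e.g. `hadoop jar myanalytics.jar myanalytics.SimpleKMeansClustering -libjars a.jar,b.jar <app args>`. Note that -libjars takes a comma-separated list (not a colon-separated classpath), and the generic options must appear before the application's own arguments.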

> From: [email protected]
> Subject: Re: KMeansDriver and distributed cache
> Date: Fri, 20 Dec 2013 14:47:13 -0800
> To: [email protected]
> 
> 
> On Dec 20, 2013, at 2:35pm, Sameer Tilak <[email protected]> wrote:
> 
> > Hi All,
> > I was able to resolve this issue by adding the following to my code:
> > 
> >     DistributedCache.addFileToClassPath(new Path("/scratch/mahout-math-0.9-SNAPSHOT.jar"), conf, fs);
> >     DistributedCache.addFileToClassPath(new Path("/scratch/mahout-core-0.9-SNAPSHOT.jar"), conf, fs);
> >     DistributedCache.addFileToClassPath(new Path("/scratch/mahout-core-0.9-SNAPSHOT-job.jar"), conf, fs);
> > 
> > Note, I did not use Tool or ToolRunner in my code.
> 
> In order for -libjars xxx to work, the main class needs to implement Tool and 
> call ToolRunner.run()
> 
> Note that this isn't a Mahout-specific issue, it's generic Hadoop usage. 
> 
> -- Ken
> 
> 
> 
> >> From: [email protected]
> >> To: [email protected]
> >> Subject: KMeansDriver and distributed cache
> >> Date: Thu, 19 Dec 2013 17:05:26 -0800
> >> 
> >> Hi All,
> >> I am trying to execute the following command:
> >> 
> >> hadoop jar /apps/analytics/myanalytics.jar 
> >> myanalytics.SimpleKMeansClustering -libjars 
> >> /apps/mahout/trunk/core/target/mahout-core-0.9-SNAPSHOT.jar 
> >> /:/apps/mahout/trunk/core/target/mahout-core-0.9-SNAPSHOT-job.jar:/apps/mahout/trunk/math/target/mahout-math-0.9-SNAPSHOT.jar
> >> 
> >> I call the following method in my SimpleKMeansClustering class:
> >> 
> >>     KMeansDriver.run(conf, new Path("/scratch/dummyvector.seq"),
> >>                      new Path("/scratch/dummyvector-initclusters/part-randomSeed/"),
> >>                      new Path("/scratch/dummyvectoroutput"),
> >>                      new EuclideanDistanceMeasure(), 0.001, 10,
> >>                      true, 1.0, false);
> >> 
> >> 
> >> Unfortunately I get the following error; I think the jars are somehow 
> >> not being made available in the distributed cache. I use Vectors to 
> >> represent my data and write them to a sequence file, then use 
> >> KMeansDriver to analyze that in MapReduce mode. Locally all the 
> >> required jar files are available, but somehow in MapReduce mode they 
> >> are not. Any help with this would be great!
> >> 
> >> 13/12/19 16:59:02 INFO kmeans.KMeansDriver: Input: 
> >> /scratch/dummyvector.seq Clusters In: 
> >> /scratch/dummyvector-initclusters/part-randomSeed Out: 
> >> /scratch/dummyvectoroutput Distance: 
> >> org.apache.mahout.common.distance.EuclideanDistanceMeasure
> >> 13/12/19 16:59:02 INFO kmeans.KMeansDriver: convergence: 0.001 max 
> >> Iterations: 10
> >> 13/12/19 16:59:02 INFO util.NativeCodeLoader: Loaded the native-hadoop 
> >> library
> >> 13/12/19 16:59:02 INFO zlib.ZlibFactory: Successfully loaded & initialized 
> >> native-zlib library
> >> 13/12/19 16:59:02 INFO compress.CodecPool: Got brand-new decompressor
> >> 13/12/19 16:59:02 WARN mapred.JobClient: Use GenericOptionsParser for 
> >> parsing the arguments. Applications should implement Tool for the same.
> >> 13/12/19 16:59:02 INFO input.FileInputFormat: Total input paths to process 
> >> : 1
> >> 13/12/19 16:59:03 INFO mapred.JobClient: Running job: job_201311111627_0310
> >> 13/12/19 16:59:04 INFO mapred.JobClient:  map 0% reduce 0%
> >> 13/12/19 16:59:19 INFO mapred.JobClient: Task Id : 
> >> attempt_201311111627_0310_m_000000_0, Status : FAILED
> >> Error: java.lang.ClassNotFoundException: org.apache.mahout.math.Vector
> >>    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
> >>    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
> >>    at java.security.AccessController.doPrivileged(Native Method)
> >>    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
> >>    at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
> >>    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
> >>    at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
> >>    at java.lang.Class.forName0(Native Method)
> >>    at java.lang.Class.forName(Class.java:264)
> >>    at 
> >> org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:820)
> >>    at org.apache.hadoop.io.WritableName.getClass(WritableName.java:71)
> >>    at 
> >> org.apache.hadoop.io.SequenceFile$Reader.getValueClass(SequenceFile.java:1671)
> >>    at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1613)
> >>    at 
> >> org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1486)
> >>    at 
> >> org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1475)
> >>    at 
> >> org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1470)
> >>    at 
> >> org.apache.hadoop.mapreduce.lib.input.SequenceFileRecordReader.initialize(SequenceFileRecordReader.java:50)
> >>    at 
> >> org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:522)
> >>    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
> >>    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
> >>    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
> >>    at java.security.AccessController.doPrivileged(Native Method)
> >>    at javax.security.auth.Subject.doAs(Subject.java:415)
> >>    at 
> >> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
> >>    at org.apache.hadoop.mapred.Child.main(Child.java:249)
> >> 
> >> 13/12/19 16:59:28 INFO mapred.JobClient: Task Id : 
> >> attempt_201311111627_0310_m_000000_1, Status : FAILED
> >> Error: java.lang.ClassNotFoundException: org.apache.mahout.math.Vector
> >>    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
> >>    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
> >>    at java.security.AccessController.doPrivileged(Native Method)
> >>    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
> >>    at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
> >>    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
> >>    at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
> >>    at java.lang.Class.forName0(Native Method)
> >>    at java.lang.Class.forName(Class.java:264)
> >>    at 
> >> org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:820)
> >>    at org.apache.hadoop.io.WritableName.getClass(WritableName.java:71)
> >>    at 
> >> org.apache.hadoop.io.SequenceFile$Reader.getValueClass(SequenceFile.java:1671)
> >>    at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1613)
> >>    at 
> >> org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1486)
> >>    at 
> >> org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1475)
> >>    at 
> >> org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1470)
> >>    at 
> >> org.apache.hadoop.mapreduce.lib.input.SequenceFileRecordReader.initialize(SequenceFileRecordReader.java:50)
> >>    at 
> >> org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:522)
> >>    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
> >>    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
> >>    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
> >>    at java.security.AccessController.doPrivileged(Native Method)
> >>    at javax.security.auth.Subject.doAs(Subject.java:415)
> >>    at 
> >> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
> >>    at org.apache.hadoop.mapred.Child.main(Child.java:249)
> >> 
> >>
> >
> 
> --------------------------
> Ken Krugler
> +1 530-210-6378
> http://www.scaleunlimited.com
> custom big data solutions & training
> Hadoop, Cascading, Cassandra & Solr
> 