According to http://www.java2s.com/Code/Jar/m/Downloadmahoutmath07jar.htm , the Vector class is in the mahout-math jar. You can pass the location of the mahout-math jar through the "tmpjars" config, as shown in the quoted message below.
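A minimal sketch of how the "tmpjars" value could be assembled, assuming the jars sit on the local disk of the machine that submits the job. The class name TmpJars, the helper toTmpJars and the /opt/mahout/... paths are made up for illustration. As far as I know, entries without a scheme may be resolved against the default filesystem (HDFS in this setup), so the sketch turns each local path into a file: URI first.

```java
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Hadoop or Mahout.
class TmpJars {

    /** Joins local jar paths into a comma-separated list of file: URIs. */
    static String toTmpJars(List<String> localJars) {
        return localJars.stream()
                // An absolute, normalized path gives a stable file: URI.
                .map(p -> Paths.get(p).toAbsolutePath().normalize().toUri().toString())
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        // Placeholder locations; substitute the real jar paths.
        String value = toTmpJars(Arrays.asList(
                "/opt/mahout/mahout-math.jar",
                "/opt/mahout/mahout-core.jar"));
        System.out.println(value);
        // In the driver this string would then be passed on as:
        //   conf.set("tmpjars", value);
    }
}
```

The value has to be set on the same Configuration object that is handed to the driver, before the job is submitted.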
Cheers

On Fri, Jan 9, 2015 at 7:59 AM, mw <[email protected]> wrote:

> Ty, I figured out that the jar file that the CVB0Driver of mahout sends
> to yarn does not contain the class org.apache.mahout.math.Vector.
> Is there a way to include it in the jar through the config, or maybe
> another way?
>
> Best,
> Max
>
>
> On 01/09/2015 04:23 PM, Ted Yu wrote:
>
> Pardon me, setJarByClass() should point to your class.
> You can pass a comma-separated list of jars (including the mahout jar) this way:
>
> conf.set("tmpjars", list-of-jars);
>
> Cheers
>
> On Fri, Jan 9, 2015 at 7:17 AM, Ted Yu <[email protected]> wrote:
>
>> In your driver, you can call:
>>
>> job.setJarByClass(theclass.class);
>>
>> where theclass can be one of the classes in the mahout jar.
>>
>> FYI
>>
>> On Fri, Jan 9, 2015 at 1:30 AM, mw <[email protected]> wrote:
>>
>>> Hello,
>>>
>>> I am working on a web application that should execute lda on an external
>>> yarn cluster.
>>>
>>> I am uploading all the relevant sequence files onto the yarn cluster.
>>> This is how I try to remotely execute lda on the cluster.
>>>
>>> try {
>>>     ugi.doAs(new PrivilegedExceptionAction<Void>() {
>>>         public Void run() throws Exception {
>>>             Configuration hdoopConf = new Configuration();
>>>             hdoopConf.set("fs.defaultFS",
>>>                     "hdfs://xxx.xxx.xxx.xxx:9000/user/xx");
>>>             hdoopConf.set("yarn.resourcemanager.hostname",
>>>                     "xxx.xxx.xxx.xxx");
>>>             hdoopConf.set("mapreduce.framework.name", "yarn");
>>>             hdoopConf.set("mapred.framework.name", "yarn");
>>>             hdoopConf.set("mapred.job.tracker",
>>>                     "xxx.xxx.xxx.xxx");
>>>             hdoopConf.set("dfs.permissions.enabled", "false");
>>>             hdoopConf.set("hadoop.job.ugi", "xx");
>>>             hdoopConf.set("mapreduce.jobhistory.address", "xxx.xxx.xxx.xxx:10020");
>>>             CVB0Driver driver = new CVB0Driver();
>>>             try {
>>>                 driver.run(hdoopConf,
>>>                         sparseVectorIn.suffix("/matrix"),
>>>                         topicsOut, k, numTerms,
>>>                         doc_topic_smoothening, term_topic_smoothening,
>>>                         maxIter, iteration_block_size,
>>>                         convergenceDelta,
>>>                         sparseVectorIn.suffix("/dictionary.file-0"),
>>>                         topicsOut.suffix("/DocumentTopics/"), sparseVectorIn,
>>>                         seed, testFraction, numTrainThreads,
>>>                         numUpdateThreads, maxItersPerDoc,
>>>                         numReduceTasks, backfillPerplexity);
>>>             } catch (ClassNotFoundException e) {
>>>                 e.printStackTrace();
>>>             } catch (InterruptedException e) {
>>>                 e.printStackTrace();
>>>             }
>>>             return null;
>>>         }
>>>     });
>>> } catch (InterruptedException e) {
>>>     e.printStackTrace();
>>> }
>>>
>>> I am getting the following error message:
>>>
>>> Error: java.lang.ClassNotFoundException: org.apache.mahout.math.Vector
>>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
>>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>     at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>>>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>>>     at java.lang.Class.forName0(Native Method)
>>>     at java.lang.Class.forName(Class.java:344)
>>>     at org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:1844)
>>>     at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1809)
>>>     at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1903)
>>>     at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1929)
>>>     at org.apache.hadoop.mapred.JobConf.getMapOutputValueClass(JobConf.java:837)
>>>     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.init(MapTask.java:983)
>>>     at org.apache.hadoop.mapred.MapTask.createSortingCollector(MapTask.java:391)
>>>     at org.apache.hadoop.mapred.MapTask.access$100(MapTask.java:80)
>>>     at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:675)
>>>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:747)
>>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
>>>     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>     at javax.security.auth.Subject.doAs(Subject.java:422)
>>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
>>>     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
>>>
>>> Error: java.lang.ClassNotFoundException: org.apache.mahout.math.Vector
>>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
>>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>     at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>>>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>>>     at java.lang.Class.forName0(Native Method)
>>>     at java.lang.Class.forName(Class.java:344)
>>>     at org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:1844)
>>>     at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1809)
>>>     at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1903)
>>>     at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1929)
>>>     at org.apache.hadoop.mapred.JobConf.getMapOutputValueClass(JobConf.java:837)
>>>     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.init(MapTask.java:983)
>>>     at org.apache.hadoop.mapred.MapTask.createSortingCollector(MapTask.java:391)
>>>     at org.apache.hadoop.mapred.MapTask.access$100(MapTask.java:80)
>>>     at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:675)
>>>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:747)
>>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
>>>     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>     at javax.security.auth.Subject.doAs(Subject.java:422)
>>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
>>>     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
>>>
>>> Error: java.lang.ClassNotFoundException: org.apache.mahout.math.Vector
>>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
>>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>     at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>>>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>>>     at java.lang.Class.forName0(Native Method)
>>>     at java.lang.Class.forName(Class.java:344)
>>>     at org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:1844)
>>>     at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1809)
>>>     at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1903)
>>>     at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1929)
>>>     at org.apache.hadoop.mapred.JobConf.getMapOutputValueClass(JobConf.java:837)
>>>     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.init(MapTask.java:983)
>>>     at org.apache.hadoop.mapred.MapTask.createSortingCollector(MapTask.java:391)
>>>     at org.apache.hadoop.mapred.MapTask.access$100(MapTask.java:80)
>>>     at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:675)
>>>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:747)
>>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
>>>     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>     at javax.security.auth.Subject.doAs(Subject.java:422)
>>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
>>>     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
>>>
>>> Error: java.lang.ClassNotFoundException: org.apache.mahout.math.Vector
>>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
>>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>     at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>>>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>>>     at java.lang.Class.forName0(Native Method)
>>>     at java.lang.Class.forName(Class.java:344)
>>>     at org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:1844)
>>>     at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1809)
>>>     at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1903)
>>>     at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1929)
>>>     at org.apache.hadoop.mapred.JobConf.getMapOutputValueClass(JobConf.java:837)
>>>     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.init(MapTask.java:983)
>>>     at org.apache.hadoop.mapred.MapTask.createSortingCollector(MapTask.java:391)
>>>     at org.apache.hadoop.mapred.MapTask.access$100(MapTask.java:80)
>>>     at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:675)
>>>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:747)
>>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
>>>     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>     at javax.security.auth.Subject.doAs(Subject.java:422)
>>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
>>>     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
>>>
>>> java.lang.InterruptedException: Failed to complete iteration 1 stage 1
>>>     at org.apache.mahout.clustering.lda.cvb.CVB0Driver.runIteration(CVB0Driver.java:502)
>>>     at org.apache.mahout.clustering.lda.cvb.CVB0Driver.run(CVB0Driver.java:319)
>>>     ...
>>>
>>> I was told on the user-mahout mailing list that the lda jar mahout sends
>>> to yarn should contain all relevant classes. Any idea why this error is
>>> thrown nevertheless?
>>>
>>> Best,
>>> Max
>>>
