I got it working. I took me a moment to figure out that tmpjars is
actually an attribute :)
Thank you very much!
Best,
Max
On 01/09/2015 04:23 PM, Ted Yu wrote:
Pardon me, setJarByClass() should point to your class.
You can pass comma separated list of jars (including mahout jar) this way:
conf.set("tmpjars", list-of-jars);
Cheers
On Fri, Jan 9, 2015 at 7:17 AM, Ted Yu <[email protected]
<mailto:[email protected]>> wrote:
In your driver, you can call:
job.setJarByClass(theclass.class);
where theclass can be one of the classes in mahout jar.
FYI
On Fri, Jan 9, 2015 at 1:30 AM, mw <[email protected]
<mailto:[email protected]>> wrote:
Hello,
i am working on a web application that should execute lda on a
external yarn cluster.
I am uploading all the relevant sequence files onto the yarn
cluter.
This is how it try to remotely execute lda on the cluster.
try {
ugi.doAs(new PrivilegedExceptionAction<Void>() {
public Void run() throws Exception {
Configuration hdoopConf = new
Configuration();
hdoopConf.set("fs.defaultFS",
"hdfs://xxx.xxx.xxx.xxx:9000/user/xx");
hdoopConf.set("yarn.resourcemanager.hostname",
"xxx.xxx.xxx.xxx");
hdoopConf.set("mapreduce.framework.name
<http://mapreduce.framework.name>", "yarn");
hdoopConf.set("mapred.framework.name
<http://mapred.framework.name>", "yarn");
hdoopConf.set("mapred.job.tracker", "xxx.xxx.xxx.xxx");
hdoopConf.set("dfs.permissions.enabled", "false");
hdoopConf.set("hadoop.job.ugi", "xx");
hdoopConf.set("mapreduce.jobhistory.address","xxx.xxx.xxx.xxx:10020"
);
CVB0Driver driver = new CVB0Driver();
try {
driver.run(hdoopConf,
sparseVectorIn.suffix("/matrix"),
topicsOut, k, numTerms,
doc_topic_smoothening, term_topic_smoothening,
maxIter, iteration_block_size,
convergenceDelta,
sparseVectorIn.suffix("/dictionary.file-0"),
topicsOut.suffix("//DocumentTopics//"), sparseVectorIn,
seed, testFraction,
numTrainThreads, numUpdateThreads, maxItersPerDoc,
numReduceTasks,
backfillPerplexity);
} catch (ClassNotFoundException e) {
e.printStackTrace();
} catch (InterruptedException e) {
e.printStackTrace();
}
return null;
}
});
} catch (InterruptedException e) {
e.printStackTrace();
}
I am getting the following error message:
Error: java.lang.ClassNotFoundException:
org.apache.mahout.math.Vector
at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at
sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:344)
at
org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:1844)
at
org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1809)
at
org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1903)
at
org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1929)
at
org.apache.hadoop.mapred.JobConf.getMapOutputValueClass(JobConf.java:837)
at
org.apache.hadoop.mapred.MapTask$MapOutputBuffer.init(MapTask.java:983)
at
org.apache.hadoop.mapred.MapTask.createSortingCollector(MapTask.java:391)
at
org.apache.hadoop.mapred.MapTask.access$100(MapTask.java:80)
at
org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:675)
at
org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:747)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
at
org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at
org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Error: java.lang.ClassNotFoundException:
org.apache.mahout.math.Vector
at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at
sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:344)
at
org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:1844)
at
org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1809)
at
org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1903)
at
org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1929)
at
org.apache.hadoop.mapred.JobConf.getMapOutputValueClass(JobConf.java:837)
at
org.apache.hadoop.mapred.MapTask$MapOutputBuffer.init(MapTask.java:983)
at
org.apache.hadoop.mapred.MapTask.createSortingCollector(MapTask.java:391)
at
org.apache.hadoop.mapred.MapTask.access$100(MapTask.java:80)
at
org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:675)
at
org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:747)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
at
org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at
org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Error: java.lang.ClassNotFoundException:
org.apache.mahout.math.Vector
at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at
sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:344)
at
org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:1844)
at
org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1809)
at
org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1903)
at
org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1929)
at
org.apache.hadoop.mapred.JobConf.getMapOutputValueClass(JobConf.java:837)
at
org.apache.hadoop.mapred.MapTask$MapOutputBuffer.init(MapTask.java:983)
at
org.apache.hadoop.mapred.MapTask.createSortingCollector(MapTask.java:391)
at
org.apache.hadoop.mapred.MapTask.access$100(MapTask.java:80)
at
org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:675)
at
org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:747)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
at
org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at
org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Error: java.lang.ClassNotFoundException:
org.apache.mahout.math.Vector
at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at
sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:344)
at
org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:1844)
at
org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1809)
at
org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1903)
at
org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1929)
at
org.apache.hadoop.mapred.JobConf.getMapOutputValueClass(JobConf.java:837)
at
org.apache.hadoop.mapred.MapTask$MapOutputBuffer.init(MapTask.java:983)
at
org.apache.hadoop.mapred.MapTask.createSortingCollector(MapTask.java:391)
at
org.apache.hadoop.mapred.MapTask.access$100(MapTask.java:80)
at
org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:675)
at
org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:747)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
at
org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at
org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
java.lang.InterruptedException: Failed to complete iteration 1
stage 1
at
org.apache.mahout.clustering.lda.cvb.CVB0Driver.runIteration(CVB0Driver.java:502)
at
org.apache.mahout.clustering.lda.cvb.CVB0Driver.run(CVB0Driver.java:319)
...
I was told on the user-mahout mailing list that the lda jar
mahout sents to yarn should contain all relevant classes. Any
idea why this error is thrown nevertheless?
Best,
Max