Re: Submitting mahout jobs to map/reduce cluster with fair scheduling

Yazan Boshmaf Sat, 10 Nov 2012 16:07:30 -0800

Thanks, Sean.

So I added the line:


MAHOUT_OPTS="$MAHOUT_OPTS -Dmapred.fairscheduler.pool=si.highpri_pipelines"

to $MAHOUT_HOME/bin/mahout and then issued

$MAHOUT_HOME/bin/mahout org.apache.mahout.clustering.syntheticcontrol.kmeans.Job

but I still ended up with the same error. Moreover, I am still getting this
annoying NoClassDefFoundError. How can I fix it? Any thoughts on the two
issues?

.
.
.
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Exception in thread "main" java.lang.NoClassDefFoundError: classpath
Caused by: java.lang.ClassNotFoundException: classpath
at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:252)
at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:320)
Could not find the main class: classpath.  Program will exit.
Running on hadoop, using /mnt/vol/hadoop/bin/hadoop and HADOOP_CONF_DIR=/mnt/vol/hadoop/conf/
MAHOUT-JOB: /usr/local/mahout-0.8/trunk/examples/target/mahout-examples-0.8-SNAPSHOT-job.jar
12/11/10 15:48:14 WARN driver.MahoutDriver: No org.apache.mahout.clustering.syntheticcontrol.kmeans.Job.props found on classpath, will use command-line arguments only
12/11/10 15:48:14 INFO kmeans.Job: Running with default arguments
.
.
.
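
A note on the NoClassDefFoundError above: the message "Could not find the main class: classpath" suggests that a JVM was started with the literal word "classpath" as its main class. One possible cause, which is a guess and not confirmed anywhere in this thread, is that some script ran "hadoop classpath" against a Hadoop build whose launcher does not implement that subcommand and so treats the word as a class name. A minimal sketch of how one might check this follows. The function name supports_classpath and the variable HADOOP_BIN are hypothetical, and the default path is simply the one printed in the log above.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if running "<launcher> classpath"
# exits with status 0. Output is discarded; only the exit status is used.
supports_classpath() {
  "$1" classpath >/dev/null 2>&1
}

# Launcher path taken from the log above; HADOOP_BIN is a name made up
# for this sketch and can be set to point at a different launcher.
hadoop_bin="${HADOOP_BIN:-/mnt/vol/hadoop/bin/hadoop}"

if [ -x "$hadoop_bin" ]; then
  if supports_classpath "$hadoop_bin"; then
    echo "classpath subcommand works"
  else
    echo "classpath subcommand failed; this may explain the error"
  fi
else
  echo "no executable launcher at $hadoop_bin"
fi
```

If the check fails, the error may be cosmetic for this particular run, since the log shows the script still reaching "Running on hadoop" afterwards; that reading is an inference from the log, not something verified here.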


On Thu, Nov 8, 2012 at 11:28 PM, Sean Owen wrote:

> Is this not another case where the -D arguments have to be passed
> separately to the Java process, not with program arguments? Try setting
> these in MAHOUT_OPTS.
>
>
> On Fri, Nov 9, 2012 at 5:10 AM, Yazan Boshmaf wrote:
>
> > Hi Jeff,
> >
> > I tried running:
> >
> > $MAHOUT_HOME/bin/mahout
> > org.apache.mahout.clustering.syntheticcontrol.kmeans.Job -t1 0.1 -t2
> > 0.00001 -x -Dmapred.input.dir=testdata -Dmapred.output.dir=output
> > -Dmapred.fairscheduler.pool=my_group.my_pool
> >
> > But I still end up with the same error. The other arguments are parsed as
> > shown by
> >
> > 12/11/08 21:00:38 INFO kmeans.Job: Running with only user-supplied
> > arguments
> > 12/11/08 21:00:38 INFO common.AbstractJob: Command line arguments:
> > {--convergenceDelta=[0.5],
> > --distanceMeasure=[org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure],
> > --endPhase=[2147483647], --maxIter=[-1], --startPhase=[0], --t1=[0.1],
> > --t2=[0.00001], --tempDir=[temp]}
> > 12/11/08 21:00:38 INFO kmeans.Job: Preparing Input
> >
> > And the job gets a session
> >
> > 12/11/08 21:00:39 INFO corona.SessionDriver: Got session ID
> > 201211051809.443899
> >
> > Then there is this interesting warning about the generic options (which
> > include the -D options for the JobClient)
> >
> > 12/11/08 21:00:39 WARN mapred.JobClient: Use GenericOptionsParser for
> > parsing the arguments. Applications should implement Tool for the same.
> >
> > Interestingly, the HDFS input/output arguments are correctly parsed, as
> > shown by
> >
> > 12/11/08 21:00:40 INFO FileSystem.collect: makeAbsolute: output/data
> > working directory: hdfs://my_cluster:my_port/absolute_path
> > 12/11/08 21:00:40 INFO input.FileInputFormat: Total input paths to
> > process : 1
> >
> > But I still get
> >
> > 12/11/08 21:00:43 ERROR mapred.CoronaJobTracker: UNCAUGHT: Thread main
> > got an uncaught exception
> > java.io.IOException: InvalidSessionHandle(handle:This cluster is
> > operating in configured pools only mode.  The pool group and pool was
> > specified as 'default.defaultpool' and is not part of this cluster.
> > Please use the Corona parameter mapred.fairscheduler.pool to set a valid
> > pool group and pool in the format <poolgroup>.<pool>)
> > at org.apache.hadoop.corona.SessionDriver.startSession(SessionDriver.java:275)
> > ...
> >
> > Any thoughts on this?
> >
> > Regards,
> > Yazan
> >
> >
> >
> > On Thu, Nov 8, 2012 at 5:11 PM, Jeff Eastman wrote:
> >
> > > That Job extends org.apache.mahout.common.AbstractJob, so it probably
> > > will accept a -D argument to set "mapred.fairscheduler.pool=...". Have
> > > you tried this?
> > >
> > >
> > >
> > > On 11/8/12 3:41 PM, Yazan Boshmaf wrote:
> > >
> > >> Hello,
> > >>
> > >> I'm trying to run the ASF Email example here:
> > >> https://cwiki.apache.org/confluence/display/MAHOUT/ASFEmail
> > >>
> > >> I am using an existing Hive/Hadoop cluster.
> > >>
> > >> When I run:
> > >>
> > >> $MAHOUT_HOME/bin/mahout org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
> > >>
> > >> I get:
> > >>
> > >> MAHOUT-JOB: /usr/local/mahout-0.8/trunk/examples/target/mahout-examples-0.8-SNAPSHOT-job.jar
> > >> 12/11/08 12:13:54 WARN driver.MahoutDriver: No
> > >> org.apache.mahout.clustering.syntheticcontrol.kmeans.Job.props found on
> > >> classpath, will use command-line arguments only
> > >> 12/11/08 12:13:54 INFO kmeans.Job: Running with default arguments
> > >> 12/11/08 12:13:55 INFO FileSystem.collect: makeAbsolute: output working
> > >> directory: hdfs://my_cluster:my_port/
> > >> 12/11/08 12:13:55 INFO kmeans.Job: Preparing Input
> > >> 12/11/08 12:13:55 INFO FileSystem.collect: make Qualify non absolute
> > >> path: testdata working directory: dfs://cluster:port_num/
> > >> 12/11/08 12:13:55 INFO corona.SessionDriver: My serverSocketPort port_num
> > >> 12/11/08 12:13:55 INFO corona.SessionDriver: My Address ip_addrs:port_num
> > >> 12/11/08 12:13:55 INFO corona.SessionDriver: Connecting to cluster
> > >> manager at data_manager:port_num
> > >> 12/11/08 12:13:55 INFO corona.SessionDriver: Got session ID
> > >> 201211051809.387193
> > >> 12/11/08 12:13:55 WARN mapred.JobClient: Use GenericOptionsParser for
> > >> parsing the arguments. Applications should implement Tool for the same.
> > >> 12/11/08 12:13:56 INFO FileSystem.collect: makeAbsolute: output/data
> > >> working directory: dfs://cluster:port_num/
> > >> 12/11/08 12:13:56 INFO input.FileInputFormat: Total input paths to
> > >> process : 1
> > >> 12/11/08 12:13:56 INFO lzo.GPLNativeCodeLoader: Loaded native gpl library
> > >> 12/11/08 12:13:56 INFO lzo.LzoCodec: Successfully loaded & initialized
> > >> native-lzo library [hadoop-lzo rev fatal: Not a git repository (or any
> > >> of the parent directories): .git]
> > >> 12/11/08 12:13:57 ERROR mapred.CoronaJobTracker: UNCAUGHT: Thread main
> > >> got an uncaught exception
> > >> java.io.IOException: InvalidSessionHandle(handle:This cluster is
> > >> operating in configured pools only mode.  The pool group and pool was
> > >> specified as 'default.defaultpool' and is not part of this cluster.
> > >> Please use the Corona parameter mapred.fairscheduler.pool to set a valid
> > >> pool group and pool in the format <poolgroup>.<pool>)
> > >> at org.apache.hadoop.corona.SessionDriver.startSession(SessionDriver.java:275)
> > >> at org.apache.hadoop.mapred.CoronaJobTracker.startFullTracker(CoronaJobTracker.java:670)
> > >> at org.apache.hadoop.mapred.CoronaJobTracker.submitJob(CoronaJobTracker.java:1898)
> > >> at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:1259)
> > >> at org.apache.hadoop.mapreduce.Job.submit(Job.java:459)
> > >> at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:474)
> > >> at org.apache.mahout.clustering.conversion.InputDriver.runJob(InputDriver.java:108)
> > >> at org.apache.mahout.clustering.syntheticcontrol.kmeans.Job.run(Job.java:129)
> > >> at org.apache.mahout.clustering.syntheticcontrol.kmeans.Job.main(Job.java:59)
> > >> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > >> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> > >> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> > >> at java.lang.reflect.Method.invoke(Method.java:597)
> > >> at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
> > >> at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
> > >> at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
> > >> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > >> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> > >> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> > >> at java.lang.reflect.Method.invoke(Method.java:597)
> > >> at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> > >>
> > >> My question is: How do I configure Mahout to use pools? That is, where
> > >> do I set the Corona "mapred.fairscheduler.pool" JobConf?
> > >>
> > >>
> > >
> >
>
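
On the earlier attempt in this thread that put the -D options after -t1 and -t2: Hadoop's GenericOptionsParser stops at the first argument it does not recognize, so generic options such as -D are normally expected before a program's own arguments. Whether that ordering alone is enough here is not confirmed in the thread; the stack trace shows the failing submission coming from InputDriver.runJob, which may build its own configuration. A minimal sketch of the reordering follows, where generic_opts_first is a hypothetical helper and my_group.my_pool is the placeholder pool name used above. It only prints a command line and does not run anything.

```shell
#!/bin/sh
# Hypothetical helper: move every -Dkey=value argument ahead of the
# remaining arguments, keeping the relative order within each group.
# Limitation: arguments that contain spaces are not preserved as single
# words, because the result is returned as one space-separated string.
generic_opts_first() {
  generic=""
  rest=""
  for arg in "$@"; do
    case "$arg" in
      -D*) generic="$generic $arg" ;;
      *)   rest="$rest $arg" ;;
    esac
  done
  # Strip the leading space left by the concatenation above.
  printf '%s\n' "$generic$rest" | sed 's/^ *//'
}

# Print (not run) a command line with the -D option moved to the front.
printf '%s %s %s\n' \
  '$MAHOUT_HOME/bin/mahout' \
  'org.apache.mahout.clustering.syntheticcontrol.kmeans.Job' \
  "$(generic_opts_first -t1 0.1 -t2 0.00001 -Dmapred.fairscheduler.pool=my_group.my_pool)"
```

If reordering does not help, another avenue to consider (also an assumption, not tested here) is setting mapred.fairscheduler.pool in the client-side configuration under HADOOP_CONF_DIR, so that any Configuration created inside the job picks it up.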
