Is this not another case where the -D arguments have to be passed separately to the Java process, not with program arguments? Try setting these in MAHOUT_OPTS.
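A sketch of the two things I'd try (whether Corona picks up the pool from a JVM system property, and whether the job routes through GenericOptionsParser, are both assumptions worth verifying — the pool name is just the placeholder from your command):

```shell
# Option 1 (assumption): export the pool as a JVM system property via
# MAHOUT_OPTS before invoking bin/mahout. Substitute a pool group/pool
# that actually exists on your Corona cluster.
export MAHOUT_OPTS="-Dmapred.fairscheduler.pool=my_group.my_pool"

# Option 2 (assumption): if the job does go through GenericOptionsParser,
# the generic -D options must come *before* the program's own arguments:
#   $MAHOUT_HOME/bin/mahout org.apache.mahout.clustering.syntheticcontrol.kmeans.Job \
#       -Dmapred.fairscheduler.pool=my_group.my_pool \
#       -t1 0.1 -t2 0.00001 -x
```

In your command below the -D options came after -t1/-t2, so even if the parser does see them, they may be treated as program arguments rather than configuration.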
On Fri, Nov 9, 2012 at 5:10 AM, Yazan Boshmaf <[email protected]> wrote:
> Hi Jeff,
>
> I tried running:
>
> $MAHOUT_HOME/bin/mahout org.apache.mahout.clustering.syntheticcontrol.kmeans.Job \
>     -t1 0.1 -t2 0.00001 -x -Dmapred.input.dir=testdata \
>     -Dmapred.output.dir=output -Dmapred.fairscheduler.pool=my_group.my_pool
>
> But I still end up with the same error. The other arguments are parsed, as shown by:
>
> 12/11/08 21:00:38 INFO kmeans.Job: Running with only user-supplied arguments
> 12/11/08 21:00:38 INFO common.AbstractJob: Command line arguments: {--convergenceDelta=[0.5], --distanceMeasure=[org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure], --endPhase=[2147483647], --maxIter=[-1], --startPhase=[0], --t1=[0.1], --t2=[0.00001], --tempDir=[temp]}
> 12/11/08 21:00:38 INFO kmeans.Job: Preparing Input
>
> And the job gets a session:
>
> 12/11/08 21:00:39 INFO corona.SessionDriver: Got session ID 201211051809.443899
>
> Then there is this interesting warning about the generic options (which include the -D arguments for the JobClient):
>
> 12/11/08 21:00:39 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
>
> Interestingly, the HDFS input/output arguments are correctly parsed, as shown by:
>
> 12/11/08 21:00:40 INFO FileSystem.collect: makeAbsolute: output/data working directory: hdfs://my_cluster:my_port/absolute_path
> 12/11/08 21:00:40 INFO input.FileInputFormat: Total input paths to process : 1
>
> But I still get:
>
> 12/11/08 21:00:43 ERROR mapred.CoronaJobTracker: UNCAUGHT: Thread main got an uncaught exception
> java.io.IOException: InvalidSessionHandle(handle:This cluster is operating in configured pools only mode. The pool group and pool was specified as 'default.defaultpool' and is not part of this cluster. Please use the Corona parameter mapred.fairscheduler.pool to set a valid pool group and pool in the format <poolgroup>.<pool>)
> at org.apache.hadoop.corona.SessionDriver.startSession(SessionDriver.java:275)
> ...
>
> Any thoughts on this?
>
> Regards,
> Yazan
>
> On Thu, Nov 8, 2012 at 5:11 PM, Jeff Eastman <[email protected]> wrote:
> > That Job extends org.apache.mahout.common.AbstractJob, so it probably will accept a -D argument to set "mapred.fairscheduler.pool=...". Have you tried this?
> >
> > On 11/8/12 3:41 PM, Yazan Boshmaf wrote:
> >> Hello,
> >>
> >> I'm trying to run the ASF Email example here:
> >> https://cwiki.apache.org/confluence/display/MAHOUT/ASFEmail
> >>
> >> I am using an existing Hive/Hadoop cluster.
> >>
> >> When I run:
> >>
> >> $MAHOUT_HOME/bin/mahout org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
> >>
> >> I get:
> >>
> >> MAHOUT-JOB: /usr/local/mahout-0.8/trunk/examples/target/mahout-examples-0.8-SNAPSHOT-job.jar
> >> 12/11/08 12:13:54 WARN driver.MahoutDriver: No org.apache.mahout.clustering.syntheticcontrol.kmeans.Job.props found on classpath, will use command-line arguments only
> >> 12/11/08 12:13:54 INFO kmeans.Job: Running with default arguments
> >> 12/11/08 12:13:55 INFO FileSystem.collect: makeAbsolute: output working directory: hdfs://my_cluster:my_port/
> >> 12/11/08 12:13:55 INFO kmeans.Job: Preparing Input
> >> 12/11/08 12:13:55 INFO FileSystem.collect: make Qualify non absolute path: testdata working directory: dfs://cluster:port_num/
> >> 12/11/08 12:13:55 INFO corona.SessionDriver: My serverSocketPort port_num
> >> 12/11/08 12:13:55 INFO corona.SessionDriver: My Address ip_addrs:port_num
> >> 12/11/08 12:13:55 INFO corona.SessionDriver: Connecting to cluster manager at data_manager:port_num
> >> 12/11/08 12:13:55 INFO corona.SessionDriver: Got session ID 201211051809.387193
> >> 12/11/08 12:13:55 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
> >> 12/11/08 12:13:56 INFO FileSystem.collect: makeAbsolute: output/data working directory: dfs://cluster:port_num/
> >> 12/11/08 12:13:56 INFO input.FileInputFormat: Total input paths to process : 1
> >> 12/11/08 12:13:56 INFO lzo.GPLNativeCodeLoader: Loaded native gpl library
> >> 12/11/08 12:13:56 INFO lzo.LzoCodec: Successfully loaded & initialized native-lzo library [hadoop-lzo rev fatal: Not a git repository (or any of the parent directories): .git]
> >> 12/11/08 12:13:57 ERROR mapred.CoronaJobTracker: UNCAUGHT: Thread main got an uncaught exception
> >> java.io.IOException: InvalidSessionHandle(handle:This cluster is operating in configured pools only mode. The pool group and pool was specified as 'default.defaultpool' and is not part of this cluster. Please use the Corona parameter mapred.fairscheduler.pool to set a valid pool group and pool in the format <poolgroup>.<pool>)
> >> at org.apache.hadoop.corona.SessionDriver.startSession(SessionDriver.java:275)
> >> at org.apache.hadoop.mapred.CoronaJobTracker.startFullTracker(CoronaJobTracker.java:670)
> >> at org.apache.hadoop.mapred.CoronaJobTracker.submitJob(CoronaJobTracker.java:1898)
> >> at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:1259)
> >> at org.apache.hadoop.mapreduce.Job.submit(Job.java:459)
> >> at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:474)
> >> at org.apache.mahout.clustering.conversion.InputDriver.runJob(InputDriver.java:108)
> >> at org.apache.mahout.clustering.syntheticcontrol.kmeans.Job.run(Job.java:129)
> >> at org.apache.mahout.clustering.syntheticcontrol.kmeans.Job.main(Job.java:59)
> >> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> >> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >> at java.lang.reflect.Method.invoke(Method.java:597)
> >> at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
> >> at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
> >> at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
> >> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> >> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >> at java.lang.reflect.Method.invoke(Method.java:597)
> >> at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> >>
> >> My question is: How do I configure Mahout to use pools? That is, where do I set the Corona "mapred.fairscheduler.pool" JobConf?
