Hello, I'm trying to run the ASF Email example here: https://cwiki.apache.org/confluence/display/MAHOUT/ASFEmail
I am using an existing Hive/Hadoop cluster. When I run:

    $MAHOUT_HOME/bin/mahout org.apache.mahout.clustering.syntheticcontrol.kmeans.Job

I get:

    MAHOUT-JOB: /usr/local/mahout-0.8/trunk/examples/target/mahout-examples-0.8-SNAPSHOT-job.jar
    12/11/08 12:13:54 WARN driver.MahoutDriver: No org.apache.mahout.clustering.syntheticcontrol.kmeans.Job.props found on classpath, will use command-line arguments only
    12/11/08 12:13:54 INFO kmeans.Job: Running with default arguments
    12/11/08 12:13:55 INFO FileSystem.collect: makeAbsolute: output working directory: hdfs://my_cluster:my_port/
    12/11/08 12:13:55 INFO kmeans.Job: Preparing Input
    12/11/08 12:13:55 INFO FileSystem.collect: make Qualify non absolute path: testdata working directory: dfs://cluster:port_num/
    12/11/08 12:13:55 INFO corona.SessionDriver: My serverSocketPort port_num
    12/11/08 12:13:55 INFO corona.SessionDriver: My Address ip_addrs:port_num
    12/11/08 12:13:55 INFO corona.SessionDriver: Connecting to cluster manager at data_manager:port_num
    12/11/08 12:13:55 INFO corona.SessionDriver: Got session ID 201211051809.387193
    12/11/08 12:13:55 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
    12/11/08 12:13:56 INFO FileSystem.collect: makeAbsolute: output/data working directory: dfs://cluster:port_num/
    12/11/08 12:13:56 INFO input.FileInputFormat: Total input paths to process : 1
    12/11/08 12:13:56 INFO lzo.GPLNativeCodeLoader: Loaded native gpl library
    12/11/08 12:13:56 INFO lzo.LzoCodec: Successfully loaded & initialized native-lzo library [hadoop-lzo rev fatal: Not a git repository (or any of the parent directories): .git]
    12/11/08 12:13:57 ERROR mapred.CoronaJobTracker: UNCAUGHT: Thread main got an uncaught exception
    java.io.IOException: InvalidSessionHandle(handle:This cluster is operating in configured pools only mode. The pool group and pool was specified as 'default.defaultpool' and is not part of this cluster.
    Please use the Corona parameter mapred.fairscheduler.pool to set a valid pool group and pool in the format <poolgroup>.<pool>)
        at org.apache.hadoop.corona.SessionDriver.startSession(SessionDriver.java:275)
        at org.apache.hadoop.mapred.CoronaJobTracker.startFullTracker(CoronaJobTracker.java:670)
        at org.apache.hadoop.mapred.CoronaJobTracker.submitJob(CoronaJobTracker.java:1898)
        at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:1259)
        at org.apache.hadoop.mapreduce.Job.submit(Job.java:459)
        at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:474)
        at org.apache.mahout.clustering.conversion.InputDriver.runJob(InputDriver.java:108)
        at org.apache.mahout.clustering.syntheticcontrol.kmeans.Job.run(Job.java:129)
        at org.apache.mahout.clustering.syntheticcontrol.kmeans.Job.main(Job.java:59)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
        at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
        at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

My question is: How do I configure Mahout to use pools? That is, where do I set the Corona "mapred.fairscheduler.pool" JobConf property?
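In case it helps to show what I have in mind, here is what I was going to try next. Both approaches are guesses on my part: the pool name "mygroup.mypool" is a placeholder, and I don't know whether Mahout's driver actually forwards generic Hadoop -D options to the JobConf (the WARN about GenericOptionsParser in the log above makes me suspect this job class doesn't implement Tool):

    # Guess 1: pass the pool as a generic Hadoop option on the command line.
    # This relies on GenericOptionsParser picking up -D properties, which may
    # not happen if the job doesn't implement Tool.
    $MAHOUT_HOME/bin/mahout org.apache.mahout.clustering.syntheticcontrol.kmeans.Job \
        -Dmapred.fairscheduler.pool=mygroup.mypool

    # Guess 2: set it cluster-wide in mapred-site.xml instead:
    #   <property>
    #     <name>mapred.fairscheduler.pool</name>
    #     <value>mygroup.mypool</value>
    #   </property>

Is one of these the right place, or is there a Mahout-specific properties file I should be using instead?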
