Hello, I'm trying to run the ASF Email example here: https://cwiki.apache.org/confluence/display/MAHOUT/ASFEmail
I am using an existing Hive/Hadoop cluster. When I run:

    $MAHOUT_HOME/bin/mahout org.apache.mahout.clustering.syntheticcontrol.kmeans.Job

I get:

    MAHOUT-JOB: /usr/local/mahout-0.8/trunk/examples/target/mahout-examples-0.8-SNAPSHOT-job.jar
    12/11/08 12:13:54 WARN driver.MahoutDriver: No org.apache.mahout.clustering.syntheticcontrol.kmeans.Job.props found on classpath, will use command-line arguments only
    12/11/08 12:13:54 INFO kmeans.Job: Running with default arguments
    12/11/08 12:13:55 INFO FileSystem.collect: makeAbsolute: output working directory: hdfs://my_cluster:my_port/
    12/11/08 12:13:55 INFO kmeans.Job: Preparing Input
    12/11/08 12:13:55 INFO FileSystem.collect: make Qualify non absolute path: testdata working directory: dfs://cluster:port_num/
    12/11/08 12:13:55 INFO corona.SessionDriver: My serverSocketPort port_num
    12/11/08 12:13:55 INFO corona.SessionDriver: My Address ip_addrs:port_num
    12/11/08 12:13:55 INFO corona.SessionDriver: Connecting to cluster manager at data_manager:port_num
    12/11/08 12:13:55 INFO corona.SessionDriver: Got session ID 201211051809.387193
    12/11/08 12:13:55 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
    12/11/08 12:13:56 INFO FileSystem.collect: makeAbsolute: output/data working directory: dfs://cluster:port_num/
    12/11/08 12:13:56 INFO input.FileInputFormat: Total input paths to process : 1
    12/11/08 12:13:56 INFO lzo.GPLNativeCodeLoader: Loaded native gpl library
    12/11/08 12:13:56 INFO lzo.LzoCodec: Successfully loaded & initialized native-lzo library [hadoop-lzo rev fatal: Not a git repository (or any of the parent directories): .git]
    12/11/08 12:13:57 ERROR mapred.CoronaJobTracker: UNCAUGHT: Thread main got an uncaught exception
    java.io.IOException: InvalidSessionHandle(handle:This cluster is operating in configured pools only mode. The pool group and pool was specified as 'default.defaultpool' and is not part of this cluster.
    Please use the Corona parameter mapred.fairscheduler.pool to set a valid pool group and pool in the format <poolgroup>.<pool>)
        at org.apache.hadoop.corona.SessionDriver.startSession(SessionDriver.java:275)
        at org.apache.hadoop.mapred.CoronaJobTracker.startFullTracker(CoronaJobTracker.java:670)
        at org.apache.hadoop.mapred.CoronaJobTracker.submitJob(CoronaJobTracker.java:1898)
        at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:1259)
        at org.apache.hadoop.mapreduce.Job.submit(Job.java:459)
        at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:474)
        at org.apache.mahout.clustering.conversion.InputDriver.runJob(InputDriver.java:108)
        at org.apache.mahout.clustering.syntheticcontrol.kmeans.Job.run(Job.java:129)
        at org.apache.mahout.clustering.syntheticcontrol.kmeans.Job.main(Job.java:59)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
        at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
        at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

My question is: How do I configure Mahout to use pools? That is, where do I set the Corona "mapred.fairscheduler.pool" JobConf property?
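In case it helps to show what I have in mind, here is what I was going to try next. Both approaches are guesses on my part: the pool name "mygroup.mypool" is a placeholder, and I don't know whether Mahout's driver actually forwards generic Hadoop -D options to the JobConf (the WARN about GenericOptionsParser in the log above makes me suspect this job class doesn't implement Tool):

    # Guess 1: pass the pool as a generic Hadoop option on the command line.
    # This relies on GenericOptionsParser picking up -D properties, which may
    # not happen if the job doesn't implement Tool.
    $MAHOUT_HOME/bin/mahout org.apache.mahout.clustering.syntheticcontrol.kmeans.Job \
        -Dmapred.fairscheduler.pool=mygroup.mypool

    # Guess 2: set it cluster-wide in mapred-site.xml instead:
    #   <property>
    #     <name>mapred.fairscheduler.pool</name>
    #     <value>mygroup.mypool</value>
    #   </property>

Is one of these the right place, or is there a Mahout-specific properties file I should be using instead?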
