Is this not another case where the -D arguments have to be passed separately to the Java process, not with program arguments? Try setting these in MAHOUT_OPTS.
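A sketch of the two things I'd try (whether Corona picks up the pool from a JVM system property, and whether the job routes through GenericOptionsParser, are both assumptions worth verifying — the pool name is just the placeholder from your command):

```shell
# Option 1 (assumption): export the pool as a JVM system property via
# MAHOUT_OPTS before invoking bin/mahout. Substitute a pool group/pool
# that actually exists on your Corona cluster.
export MAHOUT_OPTS="-Dmapred.fairscheduler.pool=my_group.my_pool"

# Option 2 (assumption): if the job does go through GenericOptionsParser,
# the generic -D options must come *before* the program's own arguments:
#   $MAHOUT_HOME/bin/mahout org.apache.mahout.clustering.syntheticcontrol.kmeans.Job \
#       -Dmapred.fairscheduler.pool=my_group.my_pool \
#       -t1 0.1 -t2 0.00001 -x
```

In your command below the -D options came after -t1/-t2, so even if the parser does see them, they may be treated as program arguments rather than configuration.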
On Fri, Nov 9, 2012 at 5:10 AM, Yazan Boshmaf <[email protected]> wrote:
> Hi Jeff,
>
> I tried running:
>
> $MAHOUT_HOME/bin/mahout org.apache.mahout.clustering.syntheticcontrol.kmeans.Job \
>     -t1 0.1 -t2 0.00001 -x -Dmapred.input.dir=testdata \
>     -Dmapred.output.dir=output -Dmapred.fairscheduler.pool=my_group.my_pool
>
> But I still end up with the same error. The other arguments are parsed, as shown by:
>
> 12/11/08 21:00:38 INFO kmeans.Job: Running with only user-supplied arguments
> 12/11/08 21:00:38 INFO common.AbstractJob: Command line arguments: {--convergenceDelta=[0.5], --distanceMeasure=[org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure], --endPhase=[2147483647], --maxIter=[-1], --startPhase=[0], --t1=[0.1], --t2=[0.00001], --tempDir=[temp]}
> 12/11/08 21:00:38 INFO kmeans.Job: Preparing Input
>
> And the job gets a session:
>
> 12/11/08 21:00:39 INFO corona.SessionDriver: Got session ID 201211051809.443899
>
> Then there is this interesting warning about the generic options (which include the -D arguments for the JobClient):
>
> 12/11/08 21:00:39 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
>
> Interestingly, the HDFS input/output arguments are correctly parsed, as shown by:
>
> 12/11/08 21:00:40 INFO FileSystem.collect: makeAbsolute: output/data working directory: hdfs://my_cluster:my_port/absolute_path
> 12/11/08 21:00:40 INFO input.FileInputFormat: Total input paths to process : 1
>
> But I still get:
>
> 12/11/08 21:00:43 ERROR mapred.CoronaJobTracker: UNCAUGHT: Thread main got an uncaught exception
> java.io.IOException: InvalidSessionHandle(handle:This cluster is operating in configured pools only mode. The pool group and pool was specified as 'default.defaultpool' and is not part of this cluster. Please use the Corona parameter mapred.fairscheduler.pool to set a valid pool group and pool in the format <poolgroup>.<pool>)
> at org.apache.hadoop.corona.SessionDriver.startSession(SessionDriver.java:275)
> ...
>
> Any thoughts on this?
>
> Regards,
> Yazan
>
> On Thu, Nov 8, 2012 at 5:11 PM, Jeff Eastman <[email protected]> wrote:
> > That Job extends org.apache.mahout.common.AbstractJob, so it probably will accept a -D argument to set "mapred.fairscheduler.pool=...". Have you tried this?
> >
> > On 11/8/12 3:41 PM, Yazan Boshmaf wrote:
> >> Hello,
> >>
> >> I'm trying to run the ASF Email example here:
> >> https://cwiki.apache.org/confluence/display/MAHOUT/ASFEmail
> >>
> >> I am using an existing Hive/Hadoop cluster.
> >>
> >> When I run:
> >>
> >> $MAHOUT_HOME/bin/mahout org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
> >>
> >> I get:
> >>
> >> MAHOUT-JOB: /usr/local/mahout-0.8/trunk/examples/target/mahout-examples-0.8-SNAPSHOT-job.jar
> >> 12/11/08 12:13:54 WARN driver.MahoutDriver: No org.apache.mahout.clustering.syntheticcontrol.kmeans.Job.props found on classpath, will use command-line arguments only
> >> 12/11/08 12:13:54 INFO kmeans.Job: Running with default arguments
> >> 12/11/08 12:13:55 INFO FileSystem.collect: makeAbsolute: output working directory: hdfs://my_cluster:my_port/
> >> 12/11/08 12:13:55 INFO kmeans.Job: Preparing Input
> >> 12/11/08 12:13:55 INFO FileSystem.collect: make Qualify non absolute path: testdata working directory: dfs://cluster:port_num/
> >> 12/11/08 12:13:55 INFO corona.SessionDriver: My serverSocketPort port_num
> >> 12/11/08 12:13:55 INFO corona.SessionDriver: My Address ip_addrs:port_num
> >> 12/11/08 12:13:55 INFO corona.SessionDriver: Connecting to cluster manager at data_manager:port_num
> >> 12/11/08 12:13:55 INFO corona.SessionDriver: Got session ID 201211051809.387193
> >> 12/11/08 12:13:55 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
> >> 12/11/08 12:13:56 INFO FileSystem.collect: makeAbsolute: output/data working directory: dfs://cluster:port_num/
> >> 12/11/08 12:13:56 INFO input.FileInputFormat: Total input paths to process : 1
> >> 12/11/08 12:13:56 INFO lzo.GPLNativeCodeLoader: Loaded native gpl library
> >> 12/11/08 12:13:56 INFO lzo.LzoCodec: Successfully loaded & initialized native-lzo library [hadoop-lzo rev fatal: Not a git repository (or any of the parent directories): .git]
> >> 12/11/08 12:13:57 ERROR mapred.CoronaJobTracker: UNCAUGHT: Thread main got an uncaught exception
> >> java.io.IOException: InvalidSessionHandle(handle:This cluster is operating in configured pools only mode. The pool group and pool was specified as 'default.defaultpool' and is not part of this cluster. Please use the Corona parameter mapred.fairscheduler.pool to set a valid pool group and pool in the format <poolgroup>.<pool>)
> >> at org.apache.hadoop.corona.SessionDriver.startSession(SessionDriver.java:275)
> >> at org.apache.hadoop.mapred.CoronaJobTracker.startFullTracker(CoronaJobTracker.java:670)
> >> at org.apache.hadoop.mapred.CoronaJobTracker.submitJob(CoronaJobTracker.java:1898)
> >> at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:1259)
> >> at org.apache.hadoop.mapreduce.Job.submit(Job.java:459)
> >> at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:474)
> >> at org.apache.mahout.clustering.conversion.InputDriver.runJob(InputDriver.java:108)
> >> at org.apache.mahout.clustering.syntheticcontrol.kmeans.Job.run(Job.java:129)
> >> at org.apache.mahout.clustering.syntheticcontrol.kmeans.Job.main(Job.java:59)
> >> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> >> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >> at java.lang.reflect.Method.invoke(Method.java:597)
> >> at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
> >> at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
> >> at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
> >> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> >> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >> at java.lang.reflect.Method.invoke(Method.java:597)
> >> at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> >>
> >> My question is: How do I configure Mahout to use pools? That is, where do I set the Corona "mapred.fairscheduler.pool" JobConf?
