Thanks, Sean. So I added the line:

MAHOUT_OPTS="$MAHOUT_OPTS -Dmapred.fairscheduler.pool=si.highpri_pipelines"

to $MAHOUT_HOME/bin/mahout and then issued

$MAHOUT_HOME/bin/mahout org.apache.mahout.clustering.syntheticcontrol.kmeans.Job

but I still ended up with the same error. Moreover, I am still getting this annoying NoClassDefFoundError. How can I fix it? Any thoughts on the two issues?

. . .
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Exception in thread "main" java.lang.NoClassDefFoundError: classpath
Caused by: java.lang.ClassNotFoundException: classpath
    at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:252)
    at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:320)
Could not find the main class: classpath. Program will exit.
Running on hadoop, using /mnt/vol/hadoop/bin/hadoop and HADOOP_CONF_DIR=/mnt/vol/hadoop/conf/
MAHOUT-JOB: /usr/local/mahout-0.8/trunk/examples/target/mahout-examples-0.8-SNAPSHOT-job.jar
12/11/10 15:48:14 WARN driver.MahoutDriver: No org.apache.mahout.clustering.syntheticcontrol.kmeans.Job.props found on classpath, will use command-line arguments only
12/11/10 15:48:14 INFO kmeans.Job: Running with default arguments
. . .

On Thu, Nov 8, 2012 at 11:28 PM, Sean Owen <[email protected]> wrote:

> Is this not another case where the -D arguments have to be passed
> separately to the Java process, not with program arguments? Try setting
> these in MAHOUT_OPTS.
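
For clarity, here is a minimal sketch of how I understand this suggestion, setting the option in the environment instead of editing the script. It assumes that bin/mahout appends to an existing MAHOUT_OPTS (as the line I added does) and that the script forwards MAHOUT_OPTS to the JVM it starts. I have not confirmed the second assumption for the "Running on hadoop" path shown in my log. The POOL variable is only a name I picked for illustration.

```shell
# Sketch only: the pool name is the one from my message above.
POOL="si.highpri_pipelines"

# Append the -D flag to whatever MAHOUT_OPTS already holds.
MAHOUT_OPTS="${MAHOUT_OPTS:-} -Dmapred.fairscheduler.pool=${POOL}"
export MAHOUT_OPTS
echo "MAHOUT_OPTS=${MAHOUT_OPTS}"

# Run the job only if the launcher is actually present on this machine.
if [ -x "${MAHOUT_HOME:-}/bin/mahout" ]; then
  "${MAHOUT_HOME}/bin/mahout" org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
fi
```

If the flag shows up in the echoed MAHOUT_OPTS but the job still reports 'default.defaultpool', then the option is presumably not reaching the JobConf through this route.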
>
> On Fri, Nov 9, 2012 at 5:10 AM, Yazan Boshmaf <[email protected]> wrote:
>
> > Hi Jeff,
> >
> > I tried running:
> >
> > $MAHOUT_HOME/bin/mahout
> > org.apache.mahout.clustering.syntheticcontrol.kmeans.Job -t1 0.1 -t2
> > 0.00001 -x -Dmapred.input.dir=testdata -Dmapred.output.dir=output
> > -Dmapred.fairscheduler.pool=my_group.my_pool
> >
> > But I still end up with the same error. The other arguments are parsed, as
> > shown by
> >
> > 12/11/08 21:00:38 INFO kmeans.Job: Running with only user-supplied arguments
> > 12/11/08 21:00:38 INFO common.AbstractJob: Command line arguments:
> > {--convergenceDelta=[0.5],
> > --distanceMeasure=[org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure],
> > --endPhase=[2147483647], --maxIter=[-1], --startPhase=[0], --t1=[0.1],
> > --t2=[0.00001], --tempDir=[temp]}
> > 12/11/08 21:00:38 INFO kmeans.Job: Preparing Input
> >
> > And the job gets a session
> >
> > 12/11/08 21:00:39 INFO corona.SessionDriver: Got session ID 201211051809.443899
> >
> > Then there is this interesting warning for the generic options (which
> > include the -D for the JobClient)
> >
> > 12/11/08 21:00:39 WARN mapred.JobClient: Use GenericOptionsParser for
> > parsing the arguments. Applications should implement Tool for the same.
> >
> > Interestingly, the HDFS input/output arguments are correctly parsed, as
> > shown by
> >
> > 12/11/08 21:00:40 INFO FileSystem.collect: makeAbsolute: output/data
> > working directory: hdfs://my_cluster:my_port/absolute_path
> > 12/11/08 21:00:40 INFO input.FileInputFormat: Total input paths to process : 1
> >
> > But I still get
> >
> > 12/11/08 21:00:43 ERROR mapred.CoronaJobTracker: UNCAUGHT: Thread main got
> > an uncaught exception
> > java.io.IOException: InvalidSessionHandle(handle:This cluster is operating
> > in configured pools only mode. The pool group and pool was specified as
> > 'default.defaultpool' and is not part of this cluster. Please use the
> > Corona parameter mapred.fairscheduler.pool to set a valid pool group and
> > pool in the format <poolgroup>.<pool>)
> > at org.apache.hadoop.corona.SessionDriver.startSession(SessionDriver.java:275)
> > ...
> >
> > Any thoughts on this?
> >
> > Regards,
> > Yazan
> >
> > On Thu, Nov 8, 2012 at 5:11 PM, Jeff Eastman <[email protected]> wrote:
> >
> > > That Job extends org.apache.mahout.common.AbstractJob, so it probably
> > > will accept a -D argument to set "mapred.fairscheduler.pool=...". Have
> > > you tried this?
> > >
> > > On 11/8/12 3:41 PM, Yazan Boshmaf wrote:
> > >
> > >> Hello,
> > >>
> > >> I'm trying to run the ASF Email example here:
> > >> https://cwiki.apache.org/confluence/display/MAHOUT/ASFEmail
> > >>
> > >> I am using an existing Hive/Hadoop cluster.
> > >>
> > >> When I run:
> > >>
> > >> $MAHOUT_HOME/bin/mahout
> > >> org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
> > >>
> > >> I get:
> > >>
> > >> MAHOUT-JOB:
> > >> /usr/local/mahout-0.8/trunk/examples/target/mahout-examples-0.8-SNAPSHOT-job.jar
> > >> 12/11/08 12:13:54 WARN driver.MahoutDriver: No
> > >> org.apache.mahout.clustering.syntheticcontrol.kmeans.Job.props found on
> > >> classpath, will use command-line arguments only
> > >> 12/11/08 12:13:54 INFO kmeans.Job: Running with default arguments
> > >> 12/11/08 12:13:55 INFO FileSystem.collect: makeAbsolute: output working
> > >> directory: hdfs://my_cluster:my_port/
> > >> 12/11/08 12:13:55 INFO kmeans.Job: Preparing Input
> > >> 12/11/08 12:13:55 INFO FileSystem.collect: make Qualify non absolute path:
> > >> testdata working directory: dfs://cluster:port_num/
> > >> 12/11/08 12:13:55 INFO corona.SessionDriver: My serverSocketPort port_num
> > >> 12/11/08 12:13:55 INFO corona.SessionDriver: My Address ip_addrs:port_num
> > >> 12/11/08 12:13:55 INFO corona.SessionDriver: Connecting to cluster manager
> > >> at data_manager:port_num
> > >> 12/11/08 12:13:55 INFO corona.SessionDriver: Got session ID 201211051809.387193
> > >> 12/11/08 12:13:55 WARN mapred.JobClient: Use GenericOptionsParser for
> > >> parsing the arguments. Applications should implement Tool for the same.
> > >> 12/11/08 12:13:56 INFO FileSystem.collect: makeAbsolute: output/data
> > >> working directory: dfs://cluster:port_num/
> > >> 12/11/08 12:13:56 INFO input.FileInputFormat: Total input paths to process : 1
> > >> 12/11/08 12:13:56 INFO lzo.GPLNativeCodeLoader: Loaded native gpl library
> > >> 12/11/08 12:13:56 INFO lzo.LzoCodec: Successfully loaded & initialized
> > >> native-lzo library [hadoop-lzo rev fatal: Not a git repository (or any of
> > >> the parent directories): .git]
> > >> 12/11/08 12:13:57 ERROR mapred.CoronaJobTracker: UNCAUGHT: Thread main got
> > >> an uncaught exception
> > >> java.io.IOException: InvalidSessionHandle(handle:This cluster is operating
> > >> in configured pools only mode. The pool group and pool was specified as
> > >> 'default.defaultpool' and is not part of this cluster. Please use the
> > >> Corona parameter mapred.fairscheduler.pool to set a valid pool group and
> > >> pool in the format <poolgroup>.<pool>)
> > >> at org.apache.hadoop.corona.SessionDriver.startSession(SessionDriver.java:275)
> > >> at org.apache.hadoop.mapred.CoronaJobTracker.startFullTracker(CoronaJobTracker.java:670)
> > >> at org.apache.hadoop.mapred.CoronaJobTracker.submitJob(CoronaJobTracker.java:1898)
> > >> at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:1259)
> > >> at org.apache.hadoop.mapreduce.Job.submit(Job.java:459)
> > >> at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:474)
> > >> at org.apache.mahout.clustering.conversion.InputDriver.runJob(InputDriver.java:108)
> > >> at org.apache.mahout.clustering.syntheticcontrol.kmeans.Job.run(Job.java:129)
> > >> at org.apache.mahout.clustering.syntheticcontrol.kmeans.Job.main(Job.java:59)
> > >> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > >> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> > >> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> > >> at java.lang.reflect.Method.invoke(Method.java:597)
> > >> at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
> > >> at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
> > >> at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
> > >> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > >> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> > >> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> > >> at java.lang.reflect.Method.invoke(Method.java:597)
> > >> at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> > >>
> > >> My question is: How do I configure Mahout to use pools? That is, where do
> > >> I set the Corona "mapred.fairscheduler.pool" JobConf?
