My first guess is that your step_args isn't actually passing multiple arguments: each space-separated string is handed to the jar as a single argument, flag and value glued together. What if you make it a flat array of strings, one per token, instead of one space-separated string per option?
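For example (untested, and I'm reusing the names from your script, so treat this as a sketch): build step_args with each flag and each value as its own list element, then hand the list to JarStep as before:

```python
# Example values standing in for the variables in your script.
main_bucket_name = 'recommendertest'
run_id = 'job2011Y01M31D17H01M52S'

# One list element per command-line token: each flag and each value
# is its own string, instead of '--input s3n://...' as one string.
step_args = [
    '--input',  's3n://' + main_bucket_name + '/data/' + run_id + '/aggregateWatched/',
    '--output', 's3n://' + main_bucket_name + '/data/' + run_id + '/similiarItems/',
    '--similarityClassname', 'SIMILARITY_PEARSON_CORRELATION',
]

# Then, as in your script:
# step2 = JarStep(name='Find similiar items',
#                 jar='s3n://' + main_bucket_name + '/mahout-core/mahout-core-0.4-job.jar',
#                 main_class='org.apache.mahout.cf.taste.hadoop.item.RecommenderJob',
#                 step_args=step_args)
```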
Not sure, I've not used the Python binding or this library, but I was bitten by a very similar sort of issue recently, so it comes to mind.

On Mon, Jan 31, 2011 at 5:14 PM, Thomas Söhngen <[email protected]> wrote:

> Hello fellow Mahout users,
>
> I have strange issues running Mahout on top of Amazon's Elastic MapReduce. I
> wrote a Python script using the boto library (see
> http://pastebin.com/UxKjmRF2 for the script). I define and run a step
> like this:
>
> [...]
> step2 = JarStep(name='Find similiar items',
>                 jar='s3n://'+ main_bucket_name +'/mahout-core/mahout-core-0.4-job.jar',
>                 main_class='org.apache.mahout.cf.taste.hadoop.item.RecommenderJob',
>                 step_args=['--input s3n://'+ main_bucket_name +'/data/' + run_id + '/aggregateWatched/',
>                            '--output s3n://'+ main_bucket_name +'/data/' + run_id + '/similiarItems/',
>                            '--similarityClassname SIMILARITY_PEARSON_CORRELATION'
>                            ])
> [...]
> jobid = emr_conn.run_jobflow(name=name,
>                              log_uri='s3n://'+ main_bucket_name +'/emr-logging/',
>                              enable_debugging=1,
>                              hadoop_version='0.20',
>                              steps=[step1, step2])
>
> The controller for the step gives me the following response:
>
> 2011-01-31T16:07:34.068Z INFO Fetching jar file.
> 2011-01-31T16:07:57.862Z INFO Working dir /mnt/var/lib/hadoop/steps/3
> 2011-01-31T16:07:57.862Z INFO Executing /usr/lib/jvm/java-6-sun/bin/java
> -cp /home/hadoop/conf:/usr/lib/jvm/java-6-sun/lib/tools.jar:/home/hadoop:/home/hadoop/hadoop-0.20-core.jar:/home/hadoop/hadoop-0.20-tools.jar:/home/hadoop/lib/*:/home/hadoop/lib/jetty-ext/*
> -Xmx1000m -Dhadoop.log.dir=/mnt/var/log/hadoop/steps/3
> -Dhadoop.log.file=syslog -Dhadoop.home.dir=/home/hadoop
> -Dhadoop.id.str=hadoop -Dhadoop.root.logger=INFO,DRFA
> -Djava.io.tmpdir=/mnt/var/lib/hadoop/steps/3/tmp
> -Djava.library.path=/home/hadoop/lib/native/Linux-i386-32
> org.apache.hadoop.util.RunJar
> /mnt/var/lib/hadoop/steps/3/mahout-core-0.4-job.jar
> org.apache.mahout.cf.taste.hadoop.item.RecommenderJob --input
> s3n://recommendertest/data/job2011Y01M31D17H01M52S/aggregateWatched/
> --output s3n://recommendertest/data/job2011Y01M31D17H01M52S/similiarItems/
> --similarityClassname SIMILARITY_PEARSON_CORRELATION
> 2011-01-31T16:08:01.880Z INFO Execution ended with ret val 0
> 2011-01-31T16:08:04.055Z INFO Step created jobs:
> 2011-01-31T16:08:04.055Z INFO Step succeeded
>
> But the syslog tells me:
>
> 2011-01-31 16:08:00,631 ERROR org.apache.mahout.common.AbstractJob
> (main): Unexpected --input
> s3n://recommendertest/data/job2011Y01M31D17H01M52S/aggregateWatched/
> while processing Job-Specific Options:
>
> ...producing no output at all, not even the directory.
>
> Next I try to run the jar as a single JobFlow from the AWS console. This is
> the controller output:
>
> 2011-01-31T16:33:57.030Z INFO Fetching jar file.
> 2011-01-31T16:34:19.520Z INFO Working dir /mnt/var/lib/hadoop/steps/2
> 2011-01-31T16:34:19.521Z INFO Executing /usr/lib/jvm/java-6-sun/bin/java
> -cp /home/hadoop/conf:/usr/lib/jvm/java-6-sun/lib/tools.jar:/home/hadoop:/home/hadoop/hadoop-0.20-core.jar:/home/hadoop/hadoop-0.20-tools.jar:/home/hadoop/lib/*:/home/hadoop/lib/jetty-ext/*
> -Xmx1000m -Dhadoop.log.dir=/mnt/var/log/hadoop/steps/2
> -Dhadoop.log.file=syslog -Dhadoop.home.dir=/home/hadoop
> -Dhadoop.id.str=hadoop -Dhadoop.root.logger=INFO,DRFA
> -Djava.io.tmpdir=/mnt/var/lib/hadoop/steps/2/tmp
> -Djava.library.path=/home/hadoop/lib/native/Linux-i386-32
> org.apache.hadoop.util.RunJar
> /mnt/var/lib/hadoop/steps/2/mahout-core-0.4-job.jar
> org.apache.mahout.cf.taste.hadoop.item.RecommenderJob --input
> s3n://recommendertest/data/job2011Y01M31D17H01M52S/aggregateWatched/
> --output s3n://recommendertest/data/job2011Y01M31D17H01M52S/similiarItems/
> --similarityClassname SIMILARITY_PEARSON_CORRELATION
> 2011-01-31T16:47:22.477Z INFO Execution ended with ret val 0
> 2011-01-31T16:47:24.616Z INFO Step created jobs:
> job_201101311631_0001,job_201101311631_0002,job_201101311631_0003,job_201101311631_0004,job_201101311631_0005,job_201101311631_0006,job_201101311631_0007,job_201101311631_0008,job_201101311631_0009,job_201101311631_0010,job_201101311631_0011
> 2011-01-31T16:47:47.642Z INFO Step succeeded
>
> As you can see, the execution (line 3) looks exactly the same (except for
> the step being step 3 in the first and step 2 in the second case), but this
> time the steps within the jar are executed and the syslog shows the progress
> of the map and reduce steps (see http://pastebin.com/Ezn3nGb4 ). The
> output directory is created, and there is a file in it, but with no content
> at all (the file size is 0 bytes). So although the JobFlow runs for about 16
> minutes and the logs clearly show that data is processed, the output is zero.
>
> These errors have been giving me headaches for some days now; I would really
> appreciate it if someone could give me a clue on this. I made the s3n folder
> public, if it helps: s3n://recommendertest/data/job2011Y01M31D17H01M52S/
>
> Thanks in advance,
> Thomas Söhngen
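The "Unexpected --input ..." line in your syslog fits this theory. Here is a rough Python sketch of the kind of flag/value parsing I suspect is happening on the Mahout side; the parser below is hypothetical (it is not Mahout's actual code, only the option names come from your command line), but it shows why a fused 'flag value' string trips things up:

```python
# Hypothetical flag/value parser, only to illustrate the failure mode;
# this is NOT Mahout's actual code.
KNOWN_FLAGS = {'--input', '--output', '--similarityClassname'}

def parse(argv):
    opts = {}
    it = iter(argv)
    for token in it:
        if token not in KNOWN_FLAGS:
            # With '--input s3n://...' arriving as ONE argv element, the
            # whole fused string fails the flag lookup -- much like the
            # "Unexpected --input s3n://..." line in your syslog.
            raise ValueError('Unexpected ' + token)
        opts[token] = next(it)   # the value is the NEXT argv element
    return opts

# Fused flag+value in one string: rejected.
try:
    parse(['--input s3n://bucket/in/'])
except ValueError as e:
    print(e)   # Unexpected --input s3n://bucket/in/

# Flag and value as separate elements: parses fine.
print(parse(['--input', 's3n://bucket/in/']))
```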
