My first guess is that your step_args isn't actually passing multiple
arguments: each element like '--input s3n://...' glues a flag and its value
together with a space, so the whole thing arrives as a single argument. What
if you split every flag and its value into separate array elements?

I'm not sure, as I haven't used the Python bindings or this library, but I
was bitten by a very similar issue recently, so it comes to mind.
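For what it's worth, here's a quick way to see the difference in plain Python
(no boto needed, and the s3n path is just a made-up placeholder): when a flag
and its value share one list element, the child process receives them as a
single argv token, which would explain Mahout complaining "Unexpected --input
s3n://..." instead of recognizing the flag.

```python
import subprocess
import sys

# A tiny child script that prints each of its argv entries on its own line.
child = (
    "import sys\n"
    "for a in sys.argv[1:]:\n"
    "    print(a)\n"
)

# Glued: flag and value in one list element -> one argv token in the child.
wrong = subprocess.run(
    [sys.executable, "-c", child, "--input s3n://bucket/data/"],
    capture_output=True, text=True,
)
print(wrong.stdout.splitlines())  # ['--input s3n://bucket/data/']

# Split: flag and value as separate elements -> two argv tokens.
right = subprocess.run(
    [sys.executable, "-c", child, "--input", "s3n://bucket/data/"],
    capture_output=True, text=True,
)
print(right.stdout.splitlines())  # ['--input', 's3n://bucket/data/']
```

If that's the problem, your step_args would become something like
['--input', 's3n://.../aggregateWatched/', '--output', '...',
'--similarityClassname', 'SIMILARITY_PEARSON_CORRELATION'], though that's
purely a guess on my part about how boto forwards those to Hadoop.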

On Mon, Jan 31, 2011 at 5:14 PM, Thomas Söhngen <[email protected]> wrote:

> Hello fellow Mahout users,
>
> I'm having strange issues running Mahout on top of Amazon's Elastic MapReduce.
> I wrote a Python script using the boto library (see
> http://pastebin.com/UxKjmRF2 for the script). I define and run a step
> like this:
>
>   [...]
>   step2 = JarStep(name='Find similiar items',
>                    jar='s3n://'+ main_bucket_name
>   +'/mahout-core/mahout-core-0.4-job.jar',
>
>  main_class='org.apache.mahout.cf.taste.hadoop.item.RecommenderJob',
>                    step_args=['--input s3n://'+ main_bucket_name
>   +'/data/' + run_id + '/aggregateWatched/',
>                               '--output s3n://'+ main_bucket_name
>   +'/data/' + run_id + '/similiarItems/',
>                               '--similarityClassname
>   SIMILARITY_PEARSON_CORRELATION'
>                              ])
>   [...]
>   jobid = emr_conn.run_jobflow(name = name,
>                             log_uri = 's3n://'+ main_bucket_name
>   +'/emr-logging/',
>                             enable_debugging=1,
>                             hadoop_version='0.20',
>                             steps=[step1,step2])
>
>
> The controller for the step gives me the following response:
>
>   2011-01-31T16:07:34.068Z INFO Fetching jar file.
>   2011-01-31T16:07:57.862Z INFO Working dir /mnt/var/lib/hadoop/steps/3
>   2011-01-31T16:07:57.862Z INFO Executing /usr/lib/jvm/java-6-sun/bin/java
> -cp
> /home/hadoop/conf:/usr/lib/jvm/java-6-sun/lib/tools.jar:/home/hadoop:/home/hadoop/hadoop-0.20-core.jar:/home/hadoop/hadoop-0.20-tools.jar:/home/hadoop/lib/*:/home/hadoop/lib/jetty-ext/*
> -Xmx1000m -Dhadoop.log.dir=/mnt/var/log/hadoop/steps/3
> -Dhadoop.log.file=syslog -Dhadoop.home.dir=/home/hadoop
> -Dhadoop.id.str=hadoop -Dhadoop.root.logger=INFO,DRFA
> -Djava.io.tmpdir=/mnt/var/lib/hadoop/steps/3/tmp
> -Djava.library.path=/home/hadoop/lib/native/Linux-i386-32
> org.apache.hadoop.util.RunJar
> /mnt/var/lib/hadoop/steps/3/mahout-core-0.4-job.jar
> org.apache.mahout.cf.taste.hadoop.item.RecommenderJob --input
> s3n://recommendertest/data/job2011Y01M31D17H01M52S/aggregateWatched/
> --output s3n://recommendertest/data/job2011Y01M31D17H01M52S/similiarItems/
> --similarityClassname SIMILARITY_PEARSON_CORRELATION
>   2011-01-31T16:08:01.880Z INFO Execution ended with ret val 0
>   2011-01-31T16:08:04.055Z INFO Step created jobs:
>   2011-01-31T16:08:04.055Z INFO Step succeeded
>
> But the syslog tells me:
>
>   2011-01-31 16:08:00,631 ERROR org.apache.mahout.common.AbstractJob
>   (main): Unexpected --input
>   s3n://recommendertest/data/job2011Y01M31D17H01M52S/aggregateWatched/
>   while processing Job-Specific Options:
>
> ...producing no output at all, not even the directory.
>
> Next I try to run the jar as a single JobFlow from the AWS console. This is
> the controller output:
>
>   2011-01-31T16:33:57.030Z INFO Fetching jar file.
>   2011-01-31T16:34:19.520Z INFO Working dir /mnt/var/lib/hadoop/steps/2
>   2011-01-31T16:34:19.521Z INFO Executing /usr/lib/jvm/java-6-sun/bin/java
> -cp
> /home/hadoop/conf:/usr/lib/jvm/java-6-sun/lib/tools.jar:/home/hadoop:/home/hadoop/hadoop-0.20-core.jar:/home/hadoop/hadoop-0.20-tools.jar:/home/hadoop/lib/*:/home/hadoop/lib/jetty-ext/*
> -Xmx1000m -Dhadoop.log.dir=/mnt/var/log/hadoop/steps/2
> -Dhadoop.log.file=syslog -Dhadoop.home.dir=/home/hadoop
> -Dhadoop.id.str=hadoop -Dhadoop.root.logger=INFO,DRFA
> -Djava.io.tmpdir=/mnt/var/lib/hadoop/steps/2/tmp
> -Djava.library.path=/home/hadoop/lib/native/Linux-i386-32
> org.apache.hadoop.util.RunJar
> /mnt/var/lib/hadoop/steps/2/mahout-core-0.4-job.jar
> org.apache.mahout.cf.taste.hadoop.item.RecommenderJob --input
> s3n://recommendertest/data/job2011Y01M31D17H01M52S/aggregateWatched/
> --output s3n://recommendertest/data/job2011Y01M31D17H01M52S/similiarItems/
> --similarityClassname SIMILARITY_PEARSON_CORRELATION
>   2011-01-31T16:47:22.477Z INFO Execution ended with ret val 0
>   2011-01-31T16:47:24.616Z INFO Step created jobs:
> job_201101311631_0001,job_201101311631_0002,job_201101311631_0003,job_201101311631_0004,job_201101311631_0005,job_201101311631_0006,job_201101311631_0007,job_201101311631_0008,job_201101311631_0009,job_201101311631_0010,job_201101311631_0011
>   2011-01-31T16:47:47.642Z INFO Step succeeded
>
> As you can see, the execution command (line 3 of the log) looks exactly the
> same (except that it is step 3 in the first case and step 2 in the second),
> but this time the jobs inside the jar are executed and the syslog shows the
> progress of the map and reduce phases (see http://pastebin.com/Ezn3nGb4 ).
> The output directory is created and contains a file, but the file is empty
> (its size is 0 bytes). So although the JobFlow runs for about 16 minutes and
> the logs clearly show that data is being processed, the output is empty.
>
> These errors have been giving me headaches for some days now; I would really
> appreciate it if someone could give me a clue. I made the s3n folder public,
> in case it helps: s3n://recommendertest/data/job2011Y01M31D17H01M52S/
>
> Thanks in advance,
> Thomas Söhngen