The example should work; I tested it yesterday. The simplest way to
execute it is to first build Mahout with

$ mvn -DskipTests clean install

Then download the MovieLens 1M dataset from
http://www.grouplens.org/node/73 and unzip it.

After that, go to examples/bin and point the script at the ratings.dat
file from the MovieLens dataset. Note that your log shows "MAHOUT_LOCAL
is not set", which means the jobs ran against HDFS while the converted
file only existed on the local filesystem; setting MAHOUT_LOCAL=true
keeps everything on the local filesystem.

$ export MAHOUT_LOCAL=true
$ bash factorize-movielens-1M.sh /path/to/ratings.dat
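For reference, your log ("Converting ratings...", "after sed") shows the script converts the raw file itself before running the jobs. The sketch below is not the script's exact sed command, just an equivalent illustration of that conversion: turning the "::"-delimited ratings.dat into the userID,itemID,rating CSV the ALS jobs expect.

```shell
# Sketch only: mimic the delimiter conversion the script does internally.
# Two sample lines in MovieLens 1M format (userID::movieID::rating::timestamp).
printf '1::1193::5::978300760\n1::661::3::978302109\n' > ratings.dat

# Keep the first three fields and switch the delimiter to a comma.
awk -F'::' '{print $1","$2","$3}' ratings.dat > ratings.csv

cat ratings.csv
# 1,1193,5
# 1,661,3
```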

Best,
Sebastian


On 19.01.2013 00:20, Kamal Ali wrote:
> I'm a newbie trying to get some Mahout command-line examples to work.
> 
> I tried executing factorize-movielens-1M.sh but I get an error "input path
> does not exist: /tmp/mahout-work-kali/movielens/ratings.csv"
> even after I manually created /tmp/mahout-work-kali/ and all its descendant
> directories and chmod'd them to 777.
> 
> I even modified factorize-movielens-1M.sh to do an "ls -l" on
> ratings.csv, which shows that /tmp/mahout-work-kali/movielens/ratings.csv
> exists.
> 
> [the input file u1.base already has "::" instead of \t as delimiters.]
> 
> I'm wondering if the error is actually something else being mis-reported:
> some intermediate script/program may just be getting a non-zero
> return status and falling back on a stock error message.
> 
> I am on a 64-bit Mac with JDK 1.7; my ssh keys were generated as user "kali".
> 
> Has anyone had success running factorize-movielens-1M.sh?
> 
> Does this factorize*.sh script only run in Mahout local mode?
> 
> Is factorize-movielens-1M.sh outdated, and should some other
> approach be used?
> 
> I'm primarily interested in getting the ALS methods to work.
> If someone knows where in the Mahout distribution to find the latest
> or most-tested ALS implementation (and the Maven command to run it),
> please let me know.
> 
> THANK YOU!
> kamal.
> 
> My hadoop-env.sh is at the end of this email.
> ================================================
> ./factorize-movielens-1M.sh $grouplens/ml-100k/u1.base   # $grouplens points to a directory containing the file u1.base
> creating work directory at /tmp/mahout-work-kali
> kamal: doing ls -l on movie lens dir:
> total 1544
> drwxrwxrwx  3 kali  wheel     102 Jan 18 12:20 dataset
> -rwxrwxrwx  1 kali  wheel  786544 Jan 18 13:46 ratings.csv
> kamal: doing wc -l on ratings.csv
>    80000 /tmp/mahout-work-kali/movielens/ratings.csv
> Converting ratings...
> after sed
> -rwxrwxrwx  1 kali  wheel  786544 Jan 18 13:47
> /tmp/mahout-work-kali/movielens/ratings.csv
> kamal: doing head on ratings.csv
> 1,1,5
> 1,2,3
> MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
> Warning: $HADOOP_HOME is deprecated.
> 
> Running on hadoop, using /Users/kali/hadoop/hadoop-1.0.4/bin/hadoop and
> HADOOP_CONF_DIR=/Users/kali/hadoop/hadoop-1.0.4/conf
> MAHOUT-JOB:
> /users/kali/mahout/mahout0.7/examples/target/mahout-examples-0.7-job.jar
> Warning: $HADOOP_HOME is deprecated.
> 
> 13/01/18 13:47:24 INFO common.AbstractJob: Command line arguments:
> {--endPhase=[2147483647],
> --input=[/tmp/mahout-work-kali/movielens/ratings.csv],
> --output=[/tmp/mahout-work-kali/dataset], --probePercentage=[0.1],
> --startPhase=[0], --tempDir=[/tmp/mahout-work-kali/dataset/tmp],
> --trainingPercentage=[0.9]}
> 2013-01-18 13:47:24.918 java[53562:1703] Unable to load realm info from
> SCDynamicStore
> 13/01/18 13:47:25 INFO mapred.JobClient: Cleaning up the staging area
> hdfs://localhost:9000/tmp/hadoop-kali/mapred/staging/kali/.staging/job_201301151900_0035
> 13/01/18 13:47:25 ERROR security.UserGroupInformation:
> PriviledgedActionException as:kali
> cause:org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input
> path does not exist: /tmp/mahout-work-kali/movielens/ratings.csv
> Exception in thread "main"
> org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path
> does not exist: /tmp/mahout-work-kali/movielens/ratings.csv
> at
> org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:235)
>  at
> org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:252)
> at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:962)
>  at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:979)
> at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:174)
>  at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:897)
> at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:850)
>  at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
>  at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
> at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:850)
>  at org.apache.hadoop.mapreduce.Job.submit(Job.java:500)
> at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:530)
>  at
> org.apache.mahout.cf.taste.hadoop.als.DatasetSplitter.run(DatasetSplitter.java:90)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>  at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
> at
> org.apache.mahout.cf.taste.hadoop.als.DatasetSplitter.main(DatasetSplitter.java:64)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>  at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:601)
>  at
> org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
> at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>  at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:601)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> after splitDataset
> -rwxrwxrwx  1 kali  wheel  786544 Jan 18 13:47
> /tmp/mahout-work-kali/movielens/ratings.csv
> MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
> Warning: $HADOOP_HOME is deprecated.
> 
> Running on hadoop, using /Users/kali/hadoop/hadoop-1.0.4/bin/hadoop and
> HADOOP_CONF_DIR=/Users/kali/hadoop/hadoop-1.0.4/conf
> MAHOUT-JOB:
> /users/kali/mahout/mahout0.7/examples/target/mahout-examples-0.7-job.jar
> Warning: $HADOOP_HOME is deprecated.
> 
> 13/01/18 13:47:31 INFO common.AbstractJob: Command line arguments:
> {--alpha=[40], --endPhase=[2147483647], --implicitFeedback=[false],
> --input=[/tmp/mahout-work-kali/dataset/trainingSet/], --lambda=[0.065],
> --numFeatures=[20], --numIterations=[10],
> --output=[/tmp/mahout-work-kali/als/out], --startPhase=[0],
> --tempDir=[/tmp/mahout-work-kali/als/tmp]}
> 2013-01-18 13:47:31.259 java[53605:1703] Unable to load realm info from
> SCDynamicStore
> 13/01/18 13:47:32 INFO mapred.JobClient: Cleaning up the staging area
> hdfs://localhost:9000/tmp/hadoop-kali/mapred/staging/kali/.staging/job_201301151900_0036
> 13/01/18 13:47:32 ERROR security.UserGroupInformation:
> PriviledgedActionException as:kali
> cause:org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input
> path does not exist: /tmp/mahout-work-kali/dataset/trainingSet
> Exception in thread "main"
> org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path
> does not exist: /tmp/mahout-work-kali/dataset/trainingSet
> at
> org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:235)
>  at
> org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:252)
> at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:962)
>  at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:979)
> at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:174)
>  at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:897)
> at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:850)
>  at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
>  at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
> at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:850)
>  at org.apache.hadoop.mapreduce.Job.submit(Job.java:500)
> at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:530)
>  at
> org.apache.mahout.cf.taste.hadoop.als.ParallelALSFactorizationJob.run(ParallelALSFactorizationJob.java:137)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>  at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
> at
> org.apache.mahout.cf.taste.hadoop.als.ParallelALSFactorizationJob.main(ParallelALSFactorizationJob.java:98)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>  at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:601)
>  at
> org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
> at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>  at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:601)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
> Warning: $HADOOP_HOME is deprecated.
> 
> Running on hadoop, using /Users/kali/hadoop/hadoop-1.0.4/bin/hadoop and
> HADOOP_CONF_DIR=/Users/kali/hadoop/hadoop-1.0.4/conf
> MAHOUT-JOB:
> /users/kali/mahout/mahout0.7/examples/target/mahout-examples-0.7-job.jar
> Warning: $HADOOP_HOME is deprecated.
> 
> 13/01/18 13:47:38 INFO common.AbstractJob: Command line arguments:
> {--endPhase=[2147483647],
> --input=[/tmp/mahout-work-kali/dataset/probeSet/],
> --itemFeatures=[/tmp/mahout-work-kali/als/out/M/],
> --output=[/tmp/mahout-work-kali/als/rmse/], --startPhase=[0],
> --tempDir=[/tmp/mahout-work-kali/als/tmp],
> --userFeatures=[/tmp/mahout-work-kali/als/out/U/]}
> 2013-01-18 13:47:38.142 java[53645:1703] Unable to load realm info from
> SCDynamicStore
> 13/01/18 13:47:38 INFO mapred.JobClient: Cleaning up the staging area
> hdfs://localhost:9000/tmp/hadoop-kali/mapred/staging/kali/.staging/job_201301151900_0037
> 13/01/18 13:47:38 ERROR security.UserGroupInformation:
> PriviledgedActionException as:kali
> cause:org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input
> path does not exist: /tmp/mahout-work-kali/dataset/probeSet
> Exception in thread "main"
> org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path
> does not exist: /tmp/mahout-work-kali/dataset/probeSet
> at
> org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:235)
>  at
> org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:252)
> at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:962)
>  at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:979)
> at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:174)
>  at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:897)
> at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:850)
>  at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
>  at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
> at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:850)
>  at org.apache.hadoop.mapreduce.Job.submit(Job.java:500)
> at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:530)
>  at
> org.apache.mahout.cf.taste.hadoop.als.FactorizationEvaluator.run(FactorizationEvaluator.java:91)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>  at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
> at
> org.apache.mahout.cf.taste.hadoop.als.FactorizationEvaluator.main(FactorizationEvaluator.java:68)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>  at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:601)
>  at
> org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
> at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>  at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:601)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
> Warning: $HADOOP_HOME is deprecated.
> 
> Running on hadoop, using /Users/kali/hadoop/hadoop-1.0.4/bin/hadoop and
> HADOOP_CONF_DIR=/Users/kali/hadoop/hadoop-1.0.4/conf
> MAHOUT-JOB:
> /users/kali/mahout/mahout0.7/examples/target/mahout-examples-0.7-job.jar
> Warning: $HADOOP_HOME is deprecated.
> 
> 13/01/18 13:47:44 INFO common.AbstractJob: Command line arguments:
> {--endPhase=[2147483647],
> --input=[/tmp/mahout-work-kali/als/out/userRatings/],
> --itemFeatures=[/tmp/mahout-work-kali/als/out/M/], --maxRating=[5],
> --numRecommendations=[6],
> --output=[/tmp/mahout-work-kali/recommendations/], --startPhase=[0],
> --tempDir=[temp], --userFeatures=[/tmp/mahout-work-kali/als/out/U/]}
> 2013-01-18 13:47:44.859 java[53687:1703] Unable to load realm info from
> SCDynamicStore
> 13/01/18 13:47:45 INFO mapred.JobClient: Cleaning up the staging area
> hdfs://localhost:9000/tmp/hadoop-kali/mapred/staging/kali/.staging/job_201301151900_0038
> 13/01/18 13:47:45 ERROR security.UserGroupInformation:
> PriviledgedActionException as:kali
> cause:org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input
> path does not exist: /tmp/mahout-work-kali/als/out/userRatings
> Exception in thread "main"
> org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path
> does not exist: /tmp/mahout-work-kali/als/out/userRatings
> at
> org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:235)
>  at
> org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:55)
> at
> org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:252)
>  at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:962)
> at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:979)
>  at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:174)
> at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:897)
>  at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:850)
> at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:415)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
>  at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:850)
> at org.apache.hadoop.mapreduce.Job.submit(Job.java:500)
>  at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:530)
> at
> org.apache.mahout.cf.taste.hadoop.als.RecommenderJob.run(RecommenderJob.java:95)
>  at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>  at
> org.apache.mahout.cf.taste.hadoop.als.RecommenderJob.main(RecommenderJob.java:69)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:601)
> at
> org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>  at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
> at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>  at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:601)
>  at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> 
> RMSE is:
> 
> cat: /tmp/mahout-work-kali/als/rmse/rmse.txt: No such file or directory
> 
> 
> 
> Sample recommendations:
> 
> cat: /tmp/mahout-work-kali/recommendations/part-m-00000: No such file or
> directory
> 
> 
> ==================================================
> # Set Hadoop-specific environment variables here.
> 
> # The only required environment variable is JAVA_HOME.  All others are
> # optional.  When running a distributed configuration it is best to
> # set JAVA_HOME in this file, so that it is correctly defined on
> # remote nodes.
> 
> # The java implementation to use.  Required.
> export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.7.0_10.jdk/Contents/Home/jre
> 
> # Extra Java CLASSPATH elements.  Optional.
> # export HADOOP_CLASSPATH=
> 
> # The maximum amount of heap to use, in MB. Default is 1000.
> # export HADOOP_HEAPSIZE=2000
> 
> # Extra Java runtime options.  Empty by default.
> # export HADOOP_OPTS=-server
> 
> # Command specific options appended to HADOOP_OPTS when specified
> export HADOOP_NAMENODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_NAMENODE_OPTS"
> export HADOOP_SECONDARYNAMENODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_SECONDARYNAMENODE_OPTS"
> export HADOOP_DATANODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_DATANODE_OPTS"
> export HADOOP_BALANCER_OPTS="-Dcom.sun.management.jmxremote $HADOOP_BALANCER_OPTS"
> export HADOOP_JOBTRACKER_OPTS="-Dcom.sun.management.jmxremote $HADOOP_JOBTRACKER_OPTS"
> # export HADOOP_TASKTRACKER_OPTS=
> # The following applies to multiple commands (fs, dfs, fsck, distcp etc)
> # export HADOOP_CLIENT_OPTS
> 
> # Extra ssh options.  Empty by default.
> # export HADOOP_SSH_OPTS="-o ConnectTimeout=1 -o SendEnv=HADOOP_CONF_DIR"
> 
> # Where log files are stored.  $HADOOP_HOME/logs by default.
> # export HADOOP_LOG_DIR=${HADOOP_HOME}/logs
> 
> # File naming remote slave hosts.  $HADOOP_HOME/conf/slaves by default.
> # export HADOOP_SLAVES=${HADOOP_HOME}/conf/slaves
> 
> # host:path where hadoop code should be rsync'd from.  Unset by default.
> # export HADOOP_MASTER=master:/home/$USER/src/hadoop
> 
> # Seconds to sleep between slave commands.  Unset by default.  This
> # can be useful in large clusters, where, e.g., slave rsyncs can
> # otherwise arrive faster than the master can service them.
> # export HADOOP_SLAVE_SLEEP=0.1
> 
> # The directory where pid files are stored. /tmp by default.
> # export HADOOP_PID_DIR=/var/hadoop/pids
> 
> # A string representing this instance of hadoop. $USER by default.
> # export HADOOP_IDENT_STRING=$USER
> 
> # The scheduling priority for daemon processes.  See 'man nice'.
> # export HADOOP_NICENESS=10
> 
