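Hello, if you are running on top of HDFS, did you use hadoop fs -put to create the /tmp/mahout-work-kali/movielens/ratings.csv file? A file that ls -l shows on the local disk is not necessarily present at the same path in HDFS, which is where the job looks for its input.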
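Here is a minimal sketch of that step. It assumes the Hadoop 1.x hadoop launcher is on the PATH and that its default filesystem is the HDFS instance seen in the log (hdfs://localhost:9000). The names LOCAL_RATINGS, HDFS_DIR and stage_ratings are illustrative only. They are not part of factorize-movielens-1M.sh.

```shell
# Sketch only: copy the locally generated ratings file to the same path in the
# default (HDFS) filesystem, which is where the Mahout job reads its input.
# LOCAL_RATINGS, HDFS_DIR and stage_ratings are illustrative names.
LOCAL_RATINGS=/tmp/mahout-work-kali/movielens/ratings.csv
HDFS_DIR=/tmp/mahout-work-kali/movielens

stage_ratings() {
  # In Hadoop 1.x, "fs -mkdir" also creates missing parent directories;
  # its error output is discarded in case the directory already exists.
  hadoop fs -mkdir "$HDFS_DIR" 2>/dev/null
  # "fs -put" refuses to overwrite an existing destination file.
  hadoop fs -put "$LOCAL_RATINGS" "$HDFS_DIR/ratings.csv" || return 1
  # "fs -test -e" exits with status 0 only if the path now exists.
  hadoop fs -test -e "$HDFS_DIR/ratings.csv"
}

# Usage: stage_ratings && hadoop fs -ls "$HDFS_DIR"
```

Because -put will not overwrite, rerunning the script after a partial run may require removing the old HDFS copy first.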
----
Dr. Simon Thompson

________________________________________
From: Kamal Ali [[email protected]]
Sent: 18 January 2013 23:20
To: [email protected]
Subject: factorize-movielens-1M.sh privilegedActionException: reports dir doesn't exist when it does exist

I'm a newbie trying to get some Mahout command-line examples to work. I tried running factorize-movielens-1M.sh but got the error "input path does not exist: /tmp/mahout-work-kali/movielens/ratings.csv". This happened even after I manually created /tmp/mahout-work-kali/ and all its descendant directories and chmod'ed them to 777. It also happened after I modified factorize-movielens-1M.sh to run "ls -l" on ratings.csv, which shows that /tmp/mahout-work-kali/movielens/ratings.csv exists. [The input file u1.base already has "::" instead of \t as delimiters.]

I'm wondering whether the error is something else and is being misreported, with some intermediate script or program just getting a non-zero return status and falling back on a stock error message.

I am on a 64-bit Mac with JDK 1.7. My ssh keys were generated using the user "kali".

Has anyone had success running factorize-movielens-1M.sh? Does this factorize*.sh script only run in Mahout local mode? Is factorize-movielens-1M.sh cruddy and old, and should some other way be used?
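The ls -l check described above only looks at the local disk. The log below shows the job client staging its work under hdfs://localhost:9000, so the input path is being resolved against HDFS instead. Below is a small diagnostic sketch that reports both views of the same path. The function name where_is is hypothetical, and the sketch assumes the Hadoop 1.x hadoop launcher is on the PATH.

```shell
# Hypothetical diagnostic: report whether a path exists on the local disk and
# on the filesystem that "hadoop fs" uses by default (HDFS in the log below).
where_is() {
  local path
  path=$1
  if [ -e "$path" ]; then
    echo "local: yes"
  else
    echo "local: no"
  fi
  # "hadoop fs -test -e" exits 0 only if the path exists on the default fs.
  if hadoop fs -test -e "$path" 2>/dev/null; then
    echo "default fs: yes"
  else
    echo "default fs: no"
  fi
}

# Usage: where_is /tmp/mahout-work-kali/movielens/ratings.csv
```

If ratings.csv reports "local: yes" and then "default fs: no", that matches the symptom in this thread. If the goal is to run without HDFS at all, note the "MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath." line in the log. It suggests that bin/mahout checks that variable, so exporting a non-empty MAHOUT_LOCAL before running the script may make it use the local filesystem. Treat that as something to verify rather than a confirmed fix.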
I'm primarily interested in getting ALS methods to work. If someone knows where in the Mahout distribution to find the latest or most tested ALS implementation (and the Maven command to run it), please let me know.

THANK YOU!
kamal.

My hadoop-env.sh is at the end of this email.

================================================
./factorize-movielens-1M.sh $grouplens/ml-100k/u1.base
# grouplens points to a directory containing the file u1.base
creating work directory at /tmp/mahout-work-kali
kamal: doing ls -l on movie lens dir:
total 1544
drwxrwxrwx 3 kali wheel 102 Jan 18 12:20 dataset
-rwxrwxrwx 1 kali wheel 786544 Jan 18 13:46 ratings.csv
kamal: doing wc -l on ratings.csv
80000 /tmp/mahout-work-kali/movielens/ratings.csv
Converting ratings...
after sed
-rwxrwxrwx 1 kali wheel 786544 Jan 18 13:47 /tmp/mahout-work-kali/movielens/ratings.csv
kamal: doing head on ratings.csv
1,1,5
1,2,3
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Warning: $HADOOP_HOME is deprecated.
Running on hadoop, using /Users/kali/hadoop/hadoop-1.0.4/bin/hadoop and HADOOP_CONF_DIR=/Users/kali/hadoop/hadoop-1.0.4/conf
MAHOUT-JOB: /users/kali/mahout/mahout0.7/examples/target/mahout-examples-0.7-job.jar
Warning: $HADOOP_HOME is deprecated.
13/01/18 13:47:24 INFO common.AbstractJob: Command line arguments: {--endPhase=[2147483647], --input=[/tmp/mahout-work-kali/movielens/ratings.csv], --output=[/tmp/mahout-work-kali/dataset], --probePercentage=[0.1], --startPhase=[0], --tempDir=[/tmp/mahout-work-kali/dataset/tmp], --trainingPercentage=[0.9]}
2013-01-18 13:47:24.918 java[53562:1703] Unable to load realm info from SCDynamicStore
13/01/18 13:47:25 INFO mapred.JobClient: Cleaning up the staging area hdfs://localhost:9000/tmp/hadoop-kali/mapred/staging/kali/.staging/job_201301151900_0035
13/01/18 13:47:25 ERROR security.UserGroupInformation: PriviledgedActionException as:kali cause:org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: /tmp/mahout-work-kali/movielens/ratings.csv
Exception in thread "main" org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: /tmp/mahout-work-kali/movielens/ratings.csv
	at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:235)
	at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:252)
	at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:962)
	at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:979)
	at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:174)
	at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:897)
	at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:850)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
	at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:850)
	at org.apache.hadoop.mapreduce.Job.submit(Job.java:500)
	at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:530)
	at org.apache.mahout.cf.taste.hadoop.als.DatasetSplitter.run(DatasetSplitter.java:90)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
	at org.apache.mahout.cf.taste.hadoop.als.DatasetSplitter.main(DatasetSplitter.java:64)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:601)
	at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
	at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
	at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:601)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
after splitDataset
-rwxrwxrwx 1 kali wheel 786544 Jan 18 13:47 /tmp/mahout-work-kali/movielens/ratings.csv
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Warning: $HADOOP_HOME is deprecated.
Running on hadoop, using /Users/kali/hadoop/hadoop-1.0.4/bin/hadoop and HADOOP_CONF_DIR=/Users/kali/hadoop/hadoop-1.0.4/conf
MAHOUT-JOB: /users/kali/mahout/mahout0.7/examples/target/mahout-examples-0.7-job.jar
Warning: $HADOOP_HOME is deprecated.
13/01/18 13:47:31 INFO common.AbstractJob: Command line arguments: {--alpha=[40], --endPhase=[2147483647], --implicitFeedback=[false], --input=[/tmp/mahout-work-kali/dataset/trainingSet/], --lambda=[0.065], --numFeatures=[20], --numIterations=[10], --output=[/tmp/mahout-work-kali/als/out], --startPhase=[0], --tempDir=[/tmp/mahout-work-kali/als/tmp]}
2013-01-18 13:47:31.259 java[53605:1703] Unable to load realm info from SCDynamicStore
13/01/18 13:47:32 INFO mapred.JobClient: Cleaning up the staging area hdfs://localhost:9000/tmp/hadoop-kali/mapred/staging/kali/.staging/job_201301151900_0036
13/01/18 13:47:32 ERROR security.UserGroupInformation: PriviledgedActionException as:kali cause:org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: /tmp/mahout-work-kali/dataset/trainingSet
Exception in thread "main" org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: /tmp/mahout-work-kali/dataset/trainingSet
	at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:235)
	at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:252)
	at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:962)
	at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:979)
	at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:174)
	at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:897)
	at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:850)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
	at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:850)
	at org.apache.hadoop.mapreduce.Job.submit(Job.java:500)
	at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:530)
	at org.apache.mahout.cf.taste.hadoop.als.ParallelALSFactorizationJob.run(ParallelALSFactorizationJob.java:137)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
	at org.apache.mahout.cf.taste.hadoop.als.ParallelALSFactorizationJob.main(ParallelALSFactorizationJob.java:98)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:601)
	at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
	at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
	at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:601)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Warning: $HADOOP_HOME is deprecated.
Running on hadoop, using /Users/kali/hadoop/hadoop-1.0.4/bin/hadoop and HADOOP_CONF_DIR=/Users/kali/hadoop/hadoop-1.0.4/conf
MAHOUT-JOB: /users/kali/mahout/mahout0.7/examples/target/mahout-examples-0.7-job.jar
Warning: $HADOOP_HOME is deprecated.
13/01/18 13:47:38 INFO common.AbstractJob: Command line arguments: {--endPhase=[2147483647], --input=[/tmp/mahout-work-kali/dataset/probeSet/], --itemFeatures=[/tmp/mahout-work-kali/als/out/M/], --output=[/tmp/mahout-work-kali/als/rmse/], --startPhase=[0], --tempDir=[/tmp/mahout-work-kali/als/tmp], --userFeatures=[/tmp/mahout-work-kali/als/out/U/]}
2013-01-18 13:47:38.142 java[53645:1703] Unable to load realm info from SCDynamicStore
13/01/18 13:47:38 INFO mapred.JobClient: Cleaning up the staging area hdfs://localhost:9000/tmp/hadoop-kali/mapred/staging/kali/.staging/job_201301151900_0037
13/01/18 13:47:38 ERROR security.UserGroupInformation: PriviledgedActionException as:kali cause:org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: /tmp/mahout-work-kali/dataset/probeSet
Exception in thread "main" org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: /tmp/mahout-work-kali/dataset/probeSet
	at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:235)
	at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:252)
	at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:962)
	at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:979)
	at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:174)
	at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:897)
	at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:850)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
	at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:850)
	at org.apache.hadoop.mapreduce.Job.submit(Job.java:500)
	at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:530)
	at org.apache.mahout.cf.taste.hadoop.als.FactorizationEvaluator.run(FactorizationEvaluator.java:91)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
	at org.apache.mahout.cf.taste.hadoop.als.FactorizationEvaluator.main(FactorizationEvaluator.java:68)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:601)
	at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
	at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
	at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:601)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Warning: $HADOOP_HOME is deprecated.
Running on hadoop, using /Users/kali/hadoop/hadoop-1.0.4/bin/hadoop and HADOOP_CONF_DIR=/Users/kali/hadoop/hadoop-1.0.4/conf
MAHOUT-JOB: /users/kali/mahout/mahout0.7/examples/target/mahout-examples-0.7-job.jar
Warning: $HADOOP_HOME is deprecated.
13/01/18 13:47:44 INFO common.AbstractJob: Command line arguments: {--endPhase=[2147483647], --input=[/tmp/mahout-work-kali/als/out/userRatings/], --itemFeatures=[/tmp/mahout-work-kali/als/out/M/], --maxRating=[5], --numRecommendations=[6], --output=[/tmp/mahout-work-kali/recommendations/], --startPhase=[0], --tempDir=[temp], --userFeatures=[/tmp/mahout-work-kali/als/out/U/]}
2013-01-18 13:47:44.859 java[53687:1703] Unable to load realm info from SCDynamicStore
13/01/18 13:47:45 INFO mapred.JobClient: Cleaning up the staging area hdfs://localhost:9000/tmp/hadoop-kali/mapred/staging/kali/.staging/job_201301151900_0038
13/01/18 13:47:45 ERROR security.UserGroupInformation: PriviledgedActionException as:kali cause:org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: /tmp/mahout-work-kali/als/out/userRatings
Exception in thread "main" org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: /tmp/mahout-work-kali/als/out/userRatings
	at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:235)
	at org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:55)
	at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:252)
	at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:962)
	at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:979)
	at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:174)
	at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:897)
	at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:850)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
	at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:850)
	at org.apache.hadoop.mapreduce.Job.submit(Job.java:500)
	at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:530)
	at org.apache.mahout.cf.taste.hadoop.als.RecommenderJob.run(RecommenderJob.java:95)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
	at org.apache.mahout.cf.taste.hadoop.als.RecommenderJob.main(RecommenderJob.java:69)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:601)
	at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
	at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
	at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:601)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

RMSE is:
cat: /tmp/mahout-work-kali/als/rmse/rmse.txt: No such file or directory

Sample recommendations:
cat: /tmp/mahout-work-kali/recommendations/part-m-00000: No such file or directory
==================================================
# Set Hadoop-specific environment variables here.

# The only required environment variable is JAVA_HOME. All others are
# optional. When running a distributed configuration it is best to
# set JAVA_HOME in this file, so that it is correctly defined on
# remote nodes.

# The java implementation to use. Required.
export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.7.0_10.jdk/Contents/Home/jre

# Extra Java CLASSPATH elements. Optional.
# export HADOOP_CLASSPATH=

# The maximum amount of heap to use, in MB. Default is 1000.
# export HADOOP_HEAPSIZE=2000

# Extra Java runtime options. Empty by default.
# export HADOOP_OPTS=-server

# Command specific options appended to HADOOP_OPTS when specified
export HADOOP_NAMENODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_NAMENODE_OPTS"
export HADOOP_SECONDARYNAMENODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_SECONDARYNAMENODE_OPTS"
export HADOOP_DATANODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_DATANODE_OPTS"
export HADOOP_BALANCER_OPTS="-Dcom.sun.management.jmxremote $HADOOP_BALANCER_OPTS"
export HADOOP_JOBTRACKER_OPTS="-Dcom.sun.management.jmxremote $HADOOP_JOBTRACKER_OPTS"
# export HADOOP_TASKTRACKER_OPTS=
# The following applies to multiple commands (fs, dfs, fsck, distcp etc)
# export HADOOP_CLIENT_OPTS

# Extra ssh options. Empty by default.
# export HADOOP_SSH_OPTS="-o ConnectTimeout=1 -o SendEnv=HADOOP_CONF_DIR"

# Where log files are stored. $HADOOP_HOME/logs by default.
# export HADOOP_LOG_DIR=${HADOOP_HOME}/logs

# File naming remote slave hosts. $HADOOP_HOME/conf/slaves by default.
# export HADOOP_SLAVES=${HADOOP_HOME}/conf/slaves

# host:path where hadoop code should be rsync'd from. Unset by default.
# export HADOOP_MASTER=master:/home/$USER/src/hadoop

# Seconds to sleep between slave commands. Unset by default. This
# can be useful in large clusters, where, e.g., slave rsyncs can
# otherwise arrive faster than the master can service them.
# export HADOOP_SLAVE_SLEEP=0.1

# The directory where pid files are stored. /tmp by default.
# export HADOOP_PID_DIR=/var/hadoop/pids

# A string representing this instance of hadoop. $USER by default.
# export HADOOP_IDENT_STRING=$USER

# The scheduling priority for daemon processes. See 'man nice'.
# export HADOOP_NICENESS=10
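Every failure in the log after the first one is a knock-on effect. splitDataset could not find ratings.csv on HDFS, so it never wrote dataset/trainingSet or dataset/probeSet. That left nothing under als/out for the evaluation and recommendation steps to read, and the final cat commands found nothing either. The script carries on after a failed step, which is what buries the real error.

Below is a minimal sketch of a stop-at-first-failure wrapper. It assumes a shell that supports local (bash or dash). run_step is a hypothetical helper, not part of Mahout or of factorize-movielens-1M.sh. The usage comment reuses the argument values printed in the log, and MAHOUT there stands for however the script invokes bin/mahout.

```shell
# Hypothetical helper: run one pipeline step, print its name, and return the
# step's non-zero exit status so the caller can stop at the first failure.
run_step() {
  local name status
  name=$1
  shift
  echo "== $name"
  "$@"
  status=$?
  if [ "$status" -ne 0 ]; then
    echo "step '$name' failed with exit status $status; stopping" >&2
    return "$status"
  fi
}

# Usage sketch (argument values copied from the log above):
#   run_step split "$MAHOUT" splitDataset \
#       --input /tmp/mahout-work-kali/movielens/ratings.csv \
#       --output /tmp/mahout-work-kali/dataset \
#       --trainingPercentage 0.9 --probePercentage 0.1 \
#       --tempDir /tmp/mahout-work-kali/dataset/tmp || exit 1
```

Chaining the steps with && (or appending || exit 1 to each) means only the first "Input path does not exist" is printed, and that is the one worth investigating.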
