Hello,

My test environment is Cloudera's Hadoop (CDH beta 3), using Whirr to spawn the EC2 cluster. I am spawning the cluster from another EC2 instance.
I'm attempting to run the Kmeans example following the instructions from the Quickstart guide. I put my testdata on HDFS and see:

drwxr-xr-x - ubuntu supergroup 0 2011-02-14 21:48 /user/ubuntu/Mahout-trunk

Within Mahout-trunk is /testdata/. Note the use of /user/ubuntu/. However, when I run the example, it seems to be looking under /home/ instead (see error log below). Looking through the code, it looks like there are functions such as getInput, so I assume there is a configuration setting of some sort, but it is not apparent to me.

no HADOOP_HOME set, running locally
Feb 14, 2011 10:05:14 PM org.slf4j.impl.JCLLoggerAdapter warn
WARNING: No org.apache.mahout.clustering.syntheticcontrol.canopy.Job.props found on classpath, will use command-line arguments only
Feb 14, 2011 10:05:14 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: Running with default arguments
Feb 14, 2011 10:05:14 PM org.apache.hadoop.metrics.jvm.JvmMetrics init
INFO: Initializing JVM Metrics with processName=JobTracker, sessionId=
Feb 14, 2011 10:05:14 PM org.apache.hadoop.mapred.JobClient configureCommandLineOptions
WARNING: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
Exception in thread "main" org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/home/ubuntu/Mahout-trunk/testdata
<trimmed>

Thanks in advance,
Jeff
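P.S. One thing I notice is the first log line, "no HADOOP_HOME set, running locally." My (possibly wrong) understanding is that when no Hadoop configuration is on the classpath, fs.default.name falls back to file:///, so a relative input path like testdata resolves against the local home directory, which would explain the file:/home/ubuntu/Mahout-trunk/testdata in the exception. For reference, this is roughly what I would expect the cluster's core-site.xml to contain — the hostname and port here are placeholders, not my actual values:

```xml
<!-- core-site.xml (sketch): with this on the classpath, relative paths
     such as "testdata" resolve under hdfs://.../user/ubuntu/ instead of
     the local filesystem. Hostname/port are placeholders. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://namenode-host:8020</value>
  </property>
</configuration>
```

Is the fix simply to point HADOOP_HOME (or the classpath) at the directory containing this configuration, or is there a Mahout-side setting I'm missing?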
