Hello,

My test environment runs Cloudera's Hadoop (CDH beta 3), using Whirr to
spawn the EC2 cluster.  I am launching the cluster from another EC2 instance.
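In case it matters, I bring the cluster up roughly like this (property file
contents elided; the command is from the Whirr docs, my exact paths may
differ):

```shell
# Sketch of my cluster launch step -- hadoop-ec2.properties holds my
# whirr.cluster-name, whirr.instance-templates, and EC2 credentials.
bin/whirr launch-cluster --config hadoop-ec2.properties
```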

I'm attempting to run the k-means example following the instructions in the
Quickstart guide.  I put my testdata on HDFS and see:

drwxr-xr-x   - ubuntu supergroup          0 2011-02-14 21:48
/user/ubuntu/Mahout-trunk

Within Mahout-trunk is /testdata/.  Note that the path is under /user/ubuntu/.
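For reference, this is roughly how I loaded the data (the exact local source
path may differ on my instance):

```shell
# Create the target directory on HDFS and copy the local test data in,
# then list it to confirm (this produces the listing shown above).
hadoop fs -mkdir /user/ubuntu/Mahout-trunk/testdata
hadoop fs -put ~/Mahout-trunk/testdata/* /user/ubuntu/Mahout-trunk/testdata
hadoop fs -ls /user/ubuntu/Mahout-trunk
```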

When I run the examples, they seem to be looking under /home/ on the local
filesystem instead (see error log below).  Looking through the code, it
looks like there are getInput functions, so I assume there is a
configuration setting of some sort, but it is not apparent to me.
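My guess from the "no HADOOP_HOME set, running locally" line below is that
the driver isn't picking up my cluster configuration, so it falls back to
the local filesystem.  Would something along these lines on the client be
the right fix?  (The paths are guesses for a Whirr/CDH install.)

```shell
# Guess at a fix: point the driver at the Hadoop install and the cluster
# config so input paths resolve against HDFS, not file:/home/...
export HADOOP_HOME=/usr/lib/hadoop          # path is a guess for CDH
export HADOOP_CONF_DIR=$HADOOP_HOME/conf    # where core-site.xml etc. live
bin/mahout org.apache.mahout.clustering.syntheticcontrol.canopy.Job
```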

no HADOOP_HOME set, running locally
Feb 14, 2011 10:05:14 PM org.slf4j.impl.JCLLoggerAdapter warn
WARNING: No org.apache.mahout.clustering.syntheticcontrol.canopy.Job.props
found on classpath, will use command-line arguments only
Feb 14, 2011 10:05:14 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: Running with default arguments
Feb 14, 2011 10:05:14 PM org.apache.hadoop.metrics.jvm.JvmMetrics init
INFO: Initializing JVM Metrics with processName=JobTracker, sessionId=
Feb 14, 2011 10:05:14 PM org.apache.hadoop.mapred.JobClient
configureCommandLineOptions
WARNING: Use GenericOptionsParser for parsing the arguments. Applications
should implement Tool for the same.
Exception in thread "main"
org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does
not exist: file:/home/ubuntu/Mahout-trunk/testdata
<trimmed>

Thanks in advance,
Jeff
