It seems that seqdirectory expects its input to be on HDFS rather than on
local disk. Running the command below writes an empty output directory to
HDFS:
MAHOUT_LOCAL=true $MAHOUT seqdirectory \
-i mahout-work/reuters-out \
-o mahout-work/reuters-out-seqdir \
-c UTF-8 -chunk 5
If I put the input directory into HDFS, everything works as expected.
Does seqdirectory expect its input to be on HDFS, i.e. is this the
expected behavior? If so, should the example be updated?
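
One way to confirm where each directory actually ends up is to test for it on both the local disk and HDFS. This is only a rough sketch: it assumes the hadoop CLI is on the PATH (the HDFS check is skipped otherwise), and where_is is a hypothetical helper name, not something that ships with Mahout.

```shell
#!/bin/sh
# Hypothetical helper: report whether a path exists locally, on HDFS, or both.
where_is() {
  path="$1"
  # Local filesystem check.
  if [ -e "$path" ]; then
    echo "local: $path"
  fi
  # Query HDFS only when the hadoop CLI is available.
  if command -v hadoop >/dev/null 2>&1 && hadoop fs -test -e "$path" 2>/dev/null; then
    echo "hdfs: $path"
  fi
}

# Paths taken from the seqdirectory command above.
where_is mahout-work/reuters-out
where_is mahout-work/reuters-out-seqdir
```

If the second call reports only "hdfs:", the output went to HDFS despite MAHOUT_LOCAL=true.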
On 6/5/11 11:07 AM, Mark wrote:
Hi all. I'm trying to run examples/bin/build-reuters.sh, but I keep
running into the following exception.
INFO: Deleting mahout-work/reuters-kmeans-clusters
Jun 5, 2011 10:29:37 AM org.apache.hadoop.util.NativeCodeLoader <clinit>
WARNING: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Jun 5, 2011 10:29:37 AM org.apache.hadoop.io.compress.CodecPool getCompressor
INFO: Got brand-new compressor
Exception in thread "main" java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
    at java.util.ArrayList.RangeCheck(ArrayList.java:547)
    at java.util.ArrayList.get(ArrayList.java:322)
    at org.apache.mahout.clustering.kmeans.RandomSeedGenerator.buildRandom(RandomSeedGenerator.java:108)
    at org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:101)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.mahout.clustering.kmeans.KMeansDriver.main(KMeansDriver.java:58)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
    at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
    at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:187)
I am also confused by the build-reuters.sh code itself. There seems to
be a mismatch between what is expected to be local and what should be
on HDFS. For example, the comments on lines 77-79 read:
# we know reuters-out-seqdir exists on a local disk at
# this point, if we're running in clustered mode,
# copy it up to hdfs
However, on inspection, reuters-out-seqdir is actually on HDFS. It
seems that seqdirectory never writes to local disk, even with
MAHOUT_LOCAL=true set.
Any ideas?
Thanks