Hello, all,
I have a small matrix stored in a local file, small_matrix, in ASCII format
as follows:
1 0 0 0 2
0 0 3 0 0
0 0 0 0 0
0 4 0 0 0
I run the following command to convert it to the SequenceFile format:
mahout seqdirectory -i <some local dir>/small_matrix -o small_matrix_seq -c ASCII -chunk 5
I see a new directory, "small_matrix_seq", in my root directory in HDFS,
with a single file, "chunk-0", inside. Then I launch the Mahout SVD with
the following command line:
mahout-distribution-0.6/bin/mahout svd -i <my root dir in HDFS>/small_matrix_seq/chunk-0 -o <my root dir in HDFS>/SVDOutput -nr 4 -nc 5 -r 4
This is the output:
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Running on hadoop, using HADOOP_HOME=/usr/local/hadoop-0.20.203.0
No HADOOP_CONF_DIR set, using /usr/local/hadoop-0.20.203.0/conf
MAHOUT-JOB: /home/lanet/Downloads/mahout-distribution-0.6/mahout-examples-0.6-job.jar
12/11/22 16:38:36 INFO common.AbstractJob: Command line arguments:
{--endPhase=2147483647, --inMemory=false,
--input=/user/lanet/small_matrix_seq/chunk-0, --maxError=0.05,
--minEigenvalue=0.0, --numCols=5, --numRows=4,
--output=/user/lanet/SVDOutput, --rank=4, --startPhase=0, --tempDir=temp}
12/11/22 16:38:37 INFO lanczos.LanczosSolver: Finding 4 singular vectors of
matrix with 4 rows, via Lanczos
12/11/22 16:38:37 INFO mapred.FileInputFormat: Total input paths to process : 1
12/11/22 16:38:38 INFO mapred.JobClient: Running job: job_201211121510_0027
12/11/22 16:38:39 INFO mapred.JobClient: map 0% reduce 0%
12/11/22 16:38:57 INFO mapred.JobClient: map 100% reduce 0%
12/11/22 16:39:08 INFO mapred.JobClient: map 100% reduce 100%
12/11/22 16:39:13 INFO mapred.JobClient: Job complete: job_201211121510_0027
12/11/22 16:39:13 INFO mapred.JobClient: Counters: 26
12/11/22 16:39:13 INFO mapred.JobClient: Job Counters
12/11/22 16:39:13 INFO mapred.JobClient: Launched reduce tasks=1
12/11/22 16:39:13 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=15207
12/11/22 16:39:13 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
12/11/22 16:39:13 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
12/11/22 16:39:13 INFO mapred.JobClient: Rack-local map tasks=1
12/11/22 16:39:13 INFO mapred.JobClient: Launched map tasks=1
12/11/22 16:39:13 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=11080
12/11/22 16:39:13 INFO mapred.JobClient: File Input Format Counters
12/11/22 16:39:13 INFO mapred.JobClient: Bytes Read=78
12/11/22 16:39:13 INFO mapred.JobClient: File Output Format Counters
12/11/22 16:39:13 INFO mapred.JobClient: Bytes Written=98
12/11/22 16:39:13 INFO mapred.JobClient: FileSystemCounters
12/11/22 16:39:13 INFO mapred.JobClient: FILE_BYTES_READ=6
12/11/22 16:39:13 INFO mapred.JobClient: HDFS_BYTES_READ=334
12/11/22 16:39:13 INFO mapred.JobClient: FILE_BYTES_WRITTEN=45393
12/11/22 16:39:13 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=98
12/11/22 16:39:13 INFO mapred.JobClient: Map-Reduce Framework
12/11/22 16:39:13 INFO mapred.JobClient: Map output materialized bytes=6
12/11/22 16:39:13 INFO mapred.JobClient: Map input records=0
12/11/22 16:39:13 INFO mapred.JobClient: Reduce shuffle bytes=0
12/11/22 16:39:13 INFO mapred.JobClient: Spilled Records=0
12/11/22 16:39:13 INFO mapred.JobClient: Map output bytes=0
12/11/22 16:39:13 INFO mapred.JobClient: Map input bytes=0
12/11/22 16:39:13 INFO mapred.JobClient: Combine input records=0
12/11/22 16:39:13 INFO mapred.JobClient: SPLIT_RAW_BYTES=108
12/11/22 16:39:13 INFO mapred.JobClient: Reduce input records=0
12/11/22 16:39:13 INFO mapred.JobClient: Reduce input groups=0
12/11/22 16:39:13 INFO mapred.JobClient: Combine output records=0
12/11/22 16:39:13 INFO mapred.JobClient: Reduce output records=0
12/11/22 16:39:13 INFO mapred.JobClient: Map output records=0
Exception in thread "main" java.util.NoSuchElementException
    at com.google.common.collect.AbstractIterator.next(AbstractIterator.java:152)
    at org.apache.mahout.math.hadoop.TimesSquaredJob.retrieveTimesSquaredOutputVector(TimesSquaredJob.java:190)
    at org.apache.mahout.math.hadoop.DistributedRowMatrix.timesSquared(DistributedRowMatrix.java:238)
    at org.apache.mahout.math.decomposer.lanczos.LanczosSolver.solve(LanczosSolver.java:104)
    at org.apache.mahout.math.hadoop.decomposer.DistributedLanczosSolver.run(DistributedLanczosSolver.java:200)
    at org.apache.mahout.math.hadoop.decomposer.DistributedLanczosSolver.run(DistributedLanczosSolver.java:123)
    at org.apache.mahout.math.hadoop.decomposer.DistributedLanczosSolver$DistributedLanczosSolverJob.run(DistributedLanczosSolver.java:283)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
    at org.apache.mahout.math.hadoop.decomposer.DistributedLanczosSolver.main(DistributedLanczosSolver.java:289)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:616)
    at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
    at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
    at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:188)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:616)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
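To sanity-check what seqdirectory actually wrote into chunk-0, I also put together a small reader that dumps the file's key/value classes and records. This is only a sketch — the class name is mine, the path is hard-coded to my layout, and it assumes the Hadoop jars are on the classpath:

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.util.ReflectionUtils;

/** Dumps the key/value classes and records of a SequenceFile. */
public class DumpSeq {

  static void dump(Path path, Configuration conf) throws IOException {
    SequenceFile.Reader reader =
        new SequenceFile.Reader(FileSystem.get(conf), path, conf);
    try {
      // The header records which Writable classes the file was written with.
      System.out.println("key class:   " + reader.getKeyClassName());
      System.out.println("value class: " + reader.getValueClassName());
      Writable key = (Writable) ReflectionUtils.newInstance(reader.getKeyClass(), conf);
      Writable value = (Writable) ReflectionUtils.newInstance(reader.getValueClass(), conf);
      while (reader.next(key, value)) {
        System.out.println(key + " => " + value);
      }
    } finally {
      reader.close();
    }
  }

  public static void main(String[] args) throws IOException {
    // Path to the file that seqdirectory produced; adjust to your layout.
    Path path = new Path(args.length > 0 ? args[0] : "small_matrix_seq/chunk-0");
    dump(path, new Configuration());
  }
}
```

If the key/value classes it reports are not what svd expects, that would presumably explain the failure.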
What does the exception imply?
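My own guess is the input format: as far as I understand, svd expects each matrix row stored as an <IntWritable, VectorWritable> pair, whereas seqdirectory writes Text keys and values (filename => file contents). Here is a minimal sketch of writing my 4x5 matrix that way — the class name and output path are my own, and I'm assuming the Mahout 0.6 and Hadoop jars are on the classpath:

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.mahout.math.DenseVector;
import org.apache.mahout.math.VectorWritable;

/** Writes the 4x5 example matrix as a SequenceFile of <IntWritable, VectorWritable>. */
public class MatrixToSeq {

  public static void main(String[] args) throws IOException {
    double[][] rows = {
        {1, 0, 0, 0, 2},
        {0, 0, 3, 0, 0},
        {0, 0, 0, 0, 0},
        {0, 4, 0, 0, 0}
    };
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path out = new Path("small_matrix_seq/matrix");
    SequenceFile.Writer writer = SequenceFile.createWriter(
        fs, conf, out, IntWritable.class, VectorWritable.class);
    try {
      for (int i = 0; i < rows.length; i++) {
        // Key = row index, value = the row wrapped as a Mahout vector.
        writer.append(new IntWritable(i), new VectorWritable(new DenseVector(rows[i])));
      }
    } finally {
      writer.close();
    }
  }
}
```

Is that the right approach, or is there a built-in tool for this conversion?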
Thanks,
Chiu
On Thu, Nov 22, 2012 at 8:13 AM, Chui-Hui Chiu <[email protected]> wrote:
> Hello, all,
>
> I read the introduction page of the SVD in Mahout. The SVD application
> requires that the input matrix be in the SequenceFile format. Now, I have
> a matrix of real-valued elements in ASCII format, with rows separated by
> newline characters and columns separated by spaces. The matrix looks like
>
> 1.1 1.2 1.3
> 2.1 2.2 2.3
> 3.1 3.2 3.3
>
> How is the matrix stored in SequenceFiles? How do I convert my matrix
> into the appropriate format for the SVD application?
>
>
> Thanks,
> Chiu
>