Hello, all,
I have a small matrix stored in a local file, small_matrix, in ASCII format
as follows:
1 0 0 0 2
0 0 3 0 0
0 0 0 0 0
0 4 0 0 0
I run the following command to convert it to the SequenceFile format:
mahout seqdirectory -i <some local dir>/small_matrix -o small_matrix_seq -c ASCII -chunk 5
I see a new directory, "small_matrix_seq", in my root directory in HDFS,
with a single file, "chunk-0", inside. Then I launch the Mahout SVD with
the following command line:
mahout-distribution-0.6/bin/mahout svd -i <my root dir in HDFS>/small_matrix_seq/chunk-0 -o <my root dir in HDFS>/SVDOutput -nr 4 -nc 5 -r 4
This is the output:
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Running on hadoop, using HADOOP_HOME=/usr/local/hadoop-0.20.203.0
No HADOOP_CONF_DIR set, using /usr/local/hadoop-0.20.203.0/conf
MAHOUT-JOB: /home/lanet/Downloads/mahout-distribution-0.6/mahout-examples-0.6-job.jar
12/11/22 16:38:36 INFO common.AbstractJob: Command line arguments:
{--endPhase=2147483647, --inMemory=false,
--input=/user/lanet/small_matrix_seq/chunk-0, --maxError=0.05,
--minEigenvalue=0.0, --numCols=5, --numRows=4,
--output=/user/lanet/SVDOutput, --rank=4, --startPhase=0, --tempDir=temp}
12/11/22 16:38:37 INFO lanczos.LanczosSolver: Finding 4 singular vectors of
matrix with 4 rows, via Lanczos
12/11/22 16:38:37 INFO mapred.FileInputFormat: Total input paths to process : 1
12/11/22 16:38:38 INFO mapred.JobClient: Running job: job_201211121510_0027
12/11/22 16:38:39 INFO mapred.JobClient: map 0% reduce 0%
12/11/22 16:38:57 INFO mapred.JobClient: map 100% reduce 0%
12/11/22 16:39:08 INFO mapred.JobClient: map 100% reduce 100%
12/11/22 16:39:13 INFO mapred.JobClient: Job complete: job_201211121510_0027
12/11/22 16:39:13 INFO mapred.JobClient: Counters: 26
12/11/22 16:39:13 INFO mapred.JobClient: Job Counters
12/11/22 16:39:13 INFO mapred.JobClient: Launched reduce tasks=1
12/11/22 16:39:13 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=15207
12/11/22 16:39:13 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
12/11/22 16:39:13 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
12/11/22 16:39:13 INFO mapred.JobClient: Rack-local map tasks=1
12/11/22 16:39:13 INFO mapred.JobClient: Launched map tasks=1
12/11/22 16:39:13 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=11080
12/11/22 16:39:13 INFO mapred.JobClient: File Input Format Counters
12/11/22 16:39:13 INFO mapred.JobClient: Bytes Read=78
12/11/22 16:39:13 INFO mapred.JobClient: File Output Format Counters
12/11/22 16:39:13 INFO mapred.JobClient: Bytes Written=98
12/11/22 16:39:13 INFO mapred.JobClient: FileSystemCounters
12/11/22 16:39:13 INFO mapred.JobClient: FILE_BYTES_READ=6
12/11/22 16:39:13 INFO mapred.JobClient: HDFS_BYTES_READ=334
12/11/22 16:39:13 INFO mapred.JobClient: FILE_BYTES_WRITTEN=45393
12/11/22 16:39:13 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=98
12/11/22 16:39:13 INFO mapred.JobClient: Map-Reduce Framework
12/11/22 16:39:13 INFO mapred.JobClient: Map output materialized bytes=6
12/11/22 16:39:13 INFO mapred.JobClient: Map input records=0
12/11/22 16:39:13 INFO mapred.JobClient: Reduce shuffle bytes=0
12/11/22 16:39:13 INFO mapred.JobClient: Spilled Records=0
12/11/22 16:39:13 INFO mapred.JobClient: Map output bytes=0
12/11/22 16:39:13 INFO mapred.JobClient: Map input bytes=0
12/11/22 16:39:13 INFO mapred.JobClient: Combine input records=0
12/11/22 16:39:13 INFO mapred.JobClient: SPLIT_RAW_BYTES=108
12/11/22 16:39:13 INFO mapred.JobClient: Reduce input records=0
12/11/22 16:39:13 INFO mapred.JobClient: Reduce input groups=0
12/11/22 16:39:13 INFO mapred.JobClient: Combine output records=0
12/11/22 16:39:13 INFO mapred.JobClient: Reduce output records=0
12/11/22 16:39:13 INFO mapred.JobClient: Map output records=0
Exception in thread "main" java.util.NoSuchElementException
    at com.google.common.collect.AbstractIterator.next(AbstractIterator.java:152)
    at org.apache.mahout.math.hadoop.TimesSquaredJob.retrieveTimesSquaredOutputVector(TimesSquaredJob.java:190)
    at org.apache.mahout.math.hadoop.DistributedRowMatrix.timesSquared(DistributedRowMatrix.java:238)
    at org.apache.mahout.math.decomposer.lanczos.LanczosSolver.solve(LanczosSolver.java:104)
    at org.apache.mahout.math.hadoop.decomposer.DistributedLanczosSolver.run(DistributedLanczosSolver.java:200)
    at org.apache.mahout.math.hadoop.decomposer.DistributedLanczosSolver.run(DistributedLanczosSolver.java:123)
    at org.apache.mahout.math.hadoop.decomposer.DistributedLanczosSolver$DistributedLanczosSolverJob.run(DistributedLanczosSolver.java:283)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
    at org.apache.mahout.math.hadoop.decomposer.DistributedLanczosSolver.main(DistributedLanczosSolver.java:289)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:616)
    at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
    at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
    at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:188)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:616)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
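To sanity-check what seqdirectory actually wrote into chunk-0, I also put together a small reader that dumps the file's key/value classes and records. This is only a sketch — the class name is mine, the path is hard-coded to my layout, and it assumes the Hadoop jars are on the classpath:

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.util.ReflectionUtils;

/** Dumps the key/value classes and records of a SequenceFile. */
public class DumpSeq {

  static void dump(Path path, Configuration conf) throws IOException {
    SequenceFile.Reader reader =
        new SequenceFile.Reader(FileSystem.get(conf), path, conf);
    try {
      // The header records which Writable classes the file was written with.
      System.out.println("key class:   " + reader.getKeyClassName());
      System.out.println("value class: " + reader.getValueClassName());
      Writable key = (Writable) ReflectionUtils.newInstance(reader.getKeyClass(), conf);
      Writable value = (Writable) ReflectionUtils.newInstance(reader.getValueClass(), conf);
      while (reader.next(key, value)) {
        System.out.println(key + " => " + value);
      }
    } finally {
      reader.close();
    }
  }

  public static void main(String[] args) throws IOException {
    // Path to the file that seqdirectory produced; adjust to your layout.
    Path path = new Path(args.length > 0 ? args[0] : "small_matrix_seq/chunk-0");
    dump(path, new Configuration());
  }
}
```

If the key/value classes it reports are not what svd expects, that would presumably explain the failure.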
What does the exception imply?
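My own guess is the input format: as far as I understand, svd expects each matrix row stored as an <IntWritable, VectorWritable> pair, whereas seqdirectory writes Text keys and values (filename => file contents). Here is a minimal sketch of writing my 4x5 matrix that way — the class name and output path are my own, and I'm assuming the Mahout 0.6 and Hadoop jars are on the classpath:

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.mahout.math.DenseVector;
import org.apache.mahout.math.VectorWritable;

/** Writes the 4x5 example matrix as a SequenceFile of <IntWritable, VectorWritable>. */
public class MatrixToSeq {

  public static void main(String[] args) throws IOException {
    double[][] rows = {
        {1, 0, 0, 0, 2},
        {0, 0, 3, 0, 0},
        {0, 0, 0, 0, 0},
        {0, 4, 0, 0, 0}
    };
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path out = new Path("small_matrix_seq/matrix");
    SequenceFile.Writer writer = SequenceFile.createWriter(
        fs, conf, out, IntWritable.class, VectorWritable.class);
    try {
      for (int i = 0; i < rows.length; i++) {
        // Key = row index, value = the row wrapped as a Mahout vector.
        writer.append(new IntWritable(i), new VectorWritable(new DenseVector(rows[i])));
      }
    } finally {
      writer.close();
    }
  }
}
```

Is that the right approach, or is there a built-in tool for this conversion?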
Thanks,
Chiu
On Thu, Nov 22, 2012 at 8:13 AM, Chui-Hui Chiu <[email protected]> wrote:
> Hello, all,
>
> I read the introduction page of the SVD in Mahout. The SVD application
> requires that the input matrix be in the SequenceFile format. Now, I have
> a matrix of real-valued elements in ASCII format, with rows separated by
> newline characters and columns separated by spaces. The matrix looks like
>
> 1.1 1.2 1.3
> 2.1 2.2 2.3
> 3.1 3.2 3.3
>
> How is the matrix stored in SequenceFiles? How do I convert my matrix
> into the appropriate format for the SVD application?
>
>
> Thanks,
> Chiu
>