Hey Avishay,

Attached files are stripped from Apache mailing list postings, so I didn't see your CSVtoSeq.java. Given the error, though, I'll bet you a million to one that the cause is this: in constructing your SequentialAccessSparseVector instances from the CSV data, you use the constructor which does not specify what the cardinality of the vector is going to be. This causes the default cardinality (Integer.MAX_VALUE = 2^31 - 1 = 2147483647, the number in your exception) to be used, and that does not match the 3282 columns you passed with -nc.
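In case it helps to see the mechanism, here is a toy sketch in plain Java. ToyVector and everything in it are made-up stand-ins, not Mahout's classes or signatures; the only thing it mimics is a dot() that refuses vectors of different cardinality, plus a hypothetical copy-with-cardinality helper. The 3282 is taken from your -nc option.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy stand-in for a sparse vector. This is NOT Mahout's implementation;
// all names here are hypothetical and exist only to illustrate the failure.
class CardinalityDemo {

    static final class ToyVector {
        final int cardinality;                                 // declared dimension
        final Map<Integer, Double> entries = new TreeMap<>();  // index -> non-zero value

        // Mimics a constructor that never says how long the vector is.
        ToyVector() {
            this(Integer.MAX_VALUE);
        }

        ToyVector(int cardinality) {
            this.cardinality = cardinality;
        }

        void set(int index, double value) {
            if (index < 0 || index >= cardinality) {
                throw new IndexOutOfBoundsException("index " + index);
            }
            entries.put(index, value);
        }

        // Refuses to combine vectors whose declared dimensions differ.
        double dot(ToyVector other) {
            if (cardinality != other.cardinality) {
                throw new IllegalArgumentException("My cardinality is: " + cardinality
                        + ", but the other is: " + other.cardinality);
            }
            double sum = 0.0;
            for (Map.Entry<Integer, Double> e : entries.entrySet()) {
                Double v = other.entries.get(e.getKey());
                if (v != null) {
                    sum += e.getValue() * v;
                }
            }
            return sum;
        }

        // Hypothetical helper: same entries, but with an explicit cardinality.
        ToyVector copyWithCardinality(int newCardinality) {
            ToyVector copy = new ToyVector(newCardinality);
            for (Map.Entry<Integer, Double> e : entries.entrySet()) {
                copy.set(e.getKey(), e.getValue());
            }
            return copy;
        }
    }

    public static void main(String[] args) {
        int numCols = 3282;                        // the -nc value from the command line

        ToyVector row = new ToyVector();           // cardinality silently becomes 2147483647
        row.set(5, 2.0);
        ToyVector basis = new ToyVector(numCols);  // cardinality stated up front
        basis.set(5, 3.0);

        try {
            row.dot(basis);
        } catch (IllegalArgumentException e) {
            System.out.println("mismatch: " + e.getMessage());
        }

        ToyVector fixed = row.copyWithCardinality(numCols);
        System.out.println("dot after fix: " + fixed.dot(basis));
    }
}
```

The point of the sketch is only that the data in the vector can be perfectly fine while the declared dimension is wrong; nothing complains until two vectors meet in dot().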
Make sure that you know what cardinality your vectors should have at construction time, and use the constructor which sets this value properly. Alternatively, copy the values from a vector with the default cardinality into a new vector with the correct cardinality once you know it.

This latter idea can be very helpful if you want to build up your vectors as RandomAccessSparseVector instances (these are fast to mutate, as they are map-based), and then "seal" them into immutable SequentialAccessSparseVector instances at the end. The problem with this latter part is that we don't currently have a "copy" constructor which takes both a specified cardinality and another vector. It's about a 2 line patch to add this, and it's a good idea to do so, for exactly this kind of case.

Let me know if you find this was or was not the problem.

-jake

On Mon, Jun 28, 2010 at 10:54 AM, Avishay Livne1 <[email protected]> wrote:

>
>
> Hi,
>
> I'm trying to use Mahout's SVD with no success so far.
> I converted my input from CSV format using the attached class.
> Then I run the following command
> hadoop jar mahout-examples-0.3.job
> org.apache.mahout.math.hadoop.decomposer.DistributedLanczosSolver
> -i /hdfs/data/svd/user_doc_score -o /hdfs/data/svd/svd-output -r 10 -nr
> 6040 -nc 3282 -sym 0
>
> and get this error:
> org.apache.mahout.math.CardinalityException: My cardinality is: 2147483647,
> but the other is: 3282
> at org.apache.mahout.math.RandomAccessSparseVector.dot
> (RandomAccessSparseVector.java:275)
> at org.apache.mahout.math.hadoop.TimesSquaredJob
> $TimesSquaredMapper.scale(TimesSquaredJob.java:200)
> at org.apache.mahout.math.hadoop.TimesSquaredJob
> $TimesSquaredMapper.map(TimesSquaredJob.java:191)
> at org.apache.mahout.math.hadoop.TimesSquaredJob
> $TimesSquaredMapper.map(TimesSquaredJob.java:147)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
> at org.apache.hadoop.mapred.Child.main(Child.java:170)
>
> Any ideas/suggestions?
>
> Thanks,
> Avishay
>
> (See attached file: CSVtoSeq.java)
