On Thu, Jun 10, 2010 at 10:28 AM, Kris Jack <[email protected]> wrote:

> Thanks very much for the help.  I looked into the problem a little deeper
> and found that the org.apache.mahout.utils.vectors.lucene.Driver was writing
> out LongWriters instead of IntWriters so I just changed the code in there.
> Should this code be using IntWriters or LongWriters?

The reason the Lucene Driver uses long is that Solr encodes uids as long.
It's kinda backwards that Mahout wants ints and Solr wants longs, but that's
the way it is.

Maybe the Lucene Driver could take a boolean flag for whether to encode the
keys as long or int?  Anyone have opinions on this?

> After writing the to a sequence file and running your matrix transposition
> and multiplication, I get an output called part-0000.  If I read it using
> $ mahout seqdumper --seqFile part-00000 then it outputs:

I would use "mahout vectordump" instead of "mahout seqdumper" and you'll get
nicer output.

  -jake
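
To make the flag idea concrete, here is a minimal sketch in plain Java. It is
not the actual Driver code: the class KeyNarrowing, the method encodeKey, and
the useIntKeys parameter are all hypothetical names, and boxed Integer/Long
values stand in for the Hadoop key types. The one real design point is that
narrowing a long to an int should fail loudly on overflow instead of silently
truncating.

```java
// Hypothetical helper, not part of Mahout: picks the key width from a flag.
final class KeyNarrowing {
    private KeyNarrowing() {}

    /**
     * Returns the document id as an Integer when useIntKeys is true,
     * otherwise as a Long. Boxed types stand in for the writable key
     * classes a real driver would emit.
     */
    static Number encodeKey(long docId, boolean useIntKeys) {
        if (useIntKeys) {
            // Math.toIntExact throws ArithmeticException if docId does not
            // fit in an int, so oversized ids are never silently truncated.
            return Integer.valueOf(Math.toIntExact(docId));
        }
        return Long.valueOf(docId);
    }
}
```

With this shape, a caller that needs int keys downstream would pass true and
handle the ArithmeticException for ids outside the int range.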
