Vector dump doesn't seem to dump a key:text, value:vectorwritable
$ mahout dumpTxtVec -s /user/khuang/trial_01252012/vec_named/tf-vectors/part-r-00000
Input Path: /user/trial_01252012/vec_named/tf-vectors/part-r-00000
Key class: first book nature specialword boxes Value class: org.apache.mahout.math.NamedVector@2
Key class: fourth fake example with fake Value class: org.apache.mahout.math.NamedVector@40000002
Key class: second book fun Value class: org.apache.mahout.math.NamedVector@2
12/01/25 19:18:33 INFO driver.MahoutDriver: Program took 351 ms

On 1/25/12 7:10 PM, "Suneel Marthi" <[email protected]> wrote:

>________________________________
> From: Katherine Huang <[email protected]>
>To: "[email protected]" <[email protected]>
>Sent: Wednesday, January 25, 2012 9:52 PM
>Subject: seq2sparse generated dictionary is missing words
>
>I am doing a trial run starting with a sequence file that contains: (this
>is from seqdumper and I just made my key the same as my value):
>
>Key class: class org.apache.hadoop.io.Text Value Class: class
>org.apache.hadoop.io.Text
>Key: first book nature specialword boxes: Value: first book nature
>specialword boxes
>Key: fourth fake example with fake: Value: fourth fake example with fake
>Key: second book fun: Value: second book fun
>Key: third unique document item: Value: third unique document item
>Key: fifth bag of words: Value: fifth bag of words
>Count: 5
>
>
>When I run
>mahout seq2sparse -i /user/trial_01252012/processed_doc_trial/ -o
>/khuang/trial_01252012/keyword_Vectors_461_named -ow -md 1 -a
>org.apache.lucene.analysis.WhitespaceAnalyzer -wt tf -seq nv
>
>And I look dump tokenized vectors:
>mahout seqdumper -s /user/trial_01252012/vec_named/tf-vectors/part-r-00000
>
>Did you mean to call vectordump to dump your vectors?
>
>I only have three of my 'orig' documents:
>
>Input Path: /user/khuang/trial_01252012/vec_named/tf-vectors/part-r-00000
>Key class: class org.apache.hadoop.io.Text Value Class: class
>org.apache.mahout.math.VectorWritable
>Key: first book nature specialword boxes: Value:
>org.apache.mahout.math.VectorWritable@e5d391d
>Key: fourth fake example with fake: Value:
>org.apache.mahout.math.VectorWritable@e5d391d
>Key: second book fun: Value: org.apache.mahout.math.VectorWritable@e5d391d
>Count: 3
>
>
>In addition, the dictionary is missing words. Is there a reason for this?
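
One possible explanation, which nothing in this thread confirms, is that seq2sparse drops terms whose total count across the corpus falls below a minimum-support threshold (the -s option, assumed here to default to 2) and then skips documents whose vectors end up empty. The Python sketch below is not Mahout code; it only simulates that assumed filtering on the five documents listed above, and the function names are made up for illustration.

```python
from collections import Counter

# The five documents from the seqdumper output above (key == value).
docs = [
    "first book nature specialword boxes",
    "fourth fake example with fake",
    "second book fun",
    "third unique document item",
    "fifth bag of words",
]


def build_dictionary(docs, min_support):
    """Keep terms whose total corpus count is at least min_support.

    Assumption: this mimics a minimum-support filter; it is not Mahout's code.
    """
    counts = Counter(tok for text in docs for tok in text.split())
    return sorted(term for term, n in counts.items() if n >= min_support)


def tf_vectors(docs, dictionary):
    """Build sparse term-frequency vectors, skipping documents left empty."""
    index = {term: i for i, term in enumerate(dictionary)}
    vectors = {}
    for text in docs:
        tf = Counter(index[tok] for tok in text.split() if tok in index)
        if tf:  # assumption: empty vectors are not written out
            vectors[text] = dict(tf)
    return vectors


if __name__ == "__main__":
    for support in (2, 1):
        dictionary = build_dictionary(docs, support)
        vectors = tf_vectors(docs, dictionary)
        print(f"min support {support}: {len(dictionary)} terms, "
              f"{len(vectors)} documents")
        for key, vec in vectors.items():
            print(f"  {key}: {vec}")
```

Under that assumption only "book" and "fake" reach the dictionary, which leaves exactly the three documents seen in the dump. If this is what is happening, rerunning seq2sparse with -s 1 should bring back all five documents and the full dictionary.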
