Vector dump doesn't seem to handle a sequence file with Text keys and
VectorWritable values:

$ mahout dumpTxtVec -s /user/khuang/trial_01252012/vec_named/tf-vectors/part-r-00000

Input Path: /user/trial_01252012/vec_named/tf-vectors/part-r-00000
Key class: first book nature specialword boxes  Value class:
org.apache.mahout.math.NamedVector@2
Key class: fourth fake example with fake  Value class:
org.apache.mahout.math.NamedVector@40000002
Key class: second book fun  Value class:
org.apache.mahout.math.NamedVector@2
12/01/25 19:18:33 INFO driver.MahoutDriver: Program took 351 ms




On 1/25/12 7:10 PM, "Suneel Marthi" <[email protected]> wrote:

> From: Katherine Huang <[email protected]>
>To: "[email protected]" <[email protected]>
>Sent: Wednesday, January 25, 2012 9:52 PM
>Subject: seq2sparse generated dictionary is missing words
> 
>I am doing a trial run starting with a sequence file that contains: (this
>is from seqdumper and I just made my key the same as my value):
>
>Key class: class org.apache.hadoop.io.Text Value Class: class
>org.apache.hadoop.io.Text
>Key: first book nature specialword boxes: Value: first book nature
>specialword boxes
>Key: fourth fake example with fake: Value: fourth fake example with fake
>Key: second book fun: Value: second book fun
>Key: third unique document item: Value: third unique document item
>Key: fifth bag of words: Value: fifth bag of words
>Count: 5
>
>
>When I run
>mahout seq2sparse -i /user/trial_01252012/processed_doc_trial/ -o
>/khuang/trial_01252012/keyword_Vectors_461_named -ow -md 1 -a
>org.apache.lucene.analysis.WhitespaceAnalyzer -wt tf -seq -nv
>
>And then I dump the tokenized vectors:
>mahout seqdumper -s /user/trial_01252012/vec_named/tf-vectors/part-r-00000
>
>Did you mean to call vectordump to dump your vectors?
>
>I only see three of my original documents:
>
>Input Path: /user/khuang/trial_01252012/vec_named/tf-vectors/part-r-00000
>Key class: class org.apache.hadoop.io.Text Value Class: class
>org.apache.mahout.math.VectorWritable
>Key: first book nature specialword boxes: Value:
>org.apache.mahout.math.VectorWritable@e5d391d
>Key: fourth fake example with fake: Value:
>org.apache.mahout.math.VectorWritable@e5d391d
>Key: second book fun: Value: org.apache.mahout.math.VectorWritable@e5d391d
>Count: 3
>
>
>In addition, the dictionary is missing words. Is there a reason for this?
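
One possible explanation, not confirmed anywhere in this thread, is a
minimum-support cutoff on term counts. The sketch below is plain Python, not
Mahout code. It assumes that terms occurring fewer than 2 times across the
whole corpus are dropped from the dictionary (seq2sparse has a -s option for
this, which the command above does not set; the value 2 is an assumption
here) and that documents left with no terms produce no vector. The function
names are hypothetical.

```python
from collections import Counter

# The five documents from the seqdumper output quoted above.
docs = [
    "first book nature specialword boxes",
    "fourth fake example with fake",
    "second book fun",
    "third unique document item",
    "fifth bag of words",
]

# Assumption: a minimum-support threshold of 2 is in effect.
MIN_SUPPORT = 2


def build_dictionary(documents, min_support):
    """Keep terms whose total count over the corpus reaches min_support."""
    counts = Counter(tok for doc in documents for tok in doc.split())
    return {term for term, n in counts.items() if n >= min_support}


def vectorize(documents, dictionary):
    """Term-frequency vectors restricted to the dictionary.

    Documents with no surviving terms are skipped (assumed behaviour).
    """
    vectors = {}
    for doc in documents:
        tf = Counter(tok for tok in doc.split() if tok in dictionary)
        if tf:
            vectors[doc] = dict(tf)
    return vectors


dictionary = build_dictionary(docs, MIN_SUPPORT)
vectors = vectorize(docs, dictionary)

print(sorted(dictionary))
for key, vec in vectors.items():
    print(key, vec)
```

Under these assumptions only "book" and "fake" survive, so the third and
fifth documents end up with no terms at all, which would be consistent with
the Count: 3 reported above. Rerunning seq2sparse with -s 1 would be one way
to check whether this is actually the cause.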
