That output was from cvb-topic-doc. I just ran it over cvb-output and got:
{how:0.04017873894220033,you:0.02145053718356672,your:0.015258172557645477,open:0.013308164155725721,use:0.012769472622541729,search:0.011104235407141033,web:0.011054257715475113,up:0.008681792828858947,do:0.007301515628267762,install:0.00724558241579199}
{ruby:0.013622365167924151,search:0.009151768347011088,ucd:0.009024998353413487,street:0.008153757865114717,information:0.008010912906951214,symfony2:0.007610946012031929,college:0.007488257104426453,get:0.006831742925644331,form:0.006798612982669133,us:0.006420604369821055}
There were only the 2 rows above, so going by your last mail it only found 2
topics?
If so, how come it only output 2 topics when I gave it -k 20 as a parameter?
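For what it's worth, rows like the two above can be turned into the "topic N -> top terms" listing from the Mahout in Action table with a short script. A minimal Python sketch, assuming the brace-delimited {term:prob,...} format vectordump printed above (and that terms contain no ':' or ',' characters):

```python
# Sketch: parse vectordump rows of the form {term:prob,term:prob,...}
# and print one "topic N -> term1, term2, ..." line per row.

def top_terms(row, n=5):
    """Return the n highest-probability terms from one vectordump row."""
    pairs = []
    for entry in row.strip().strip("{}").split(","):
        term, prob = entry.rsplit(":", 1)  # split on the last ':' only
        pairs.append((term, float(prob)))
    # Sort descending by probability; sort is stable, so ties keep file order.
    pairs.sort(key=lambda tp: tp[1], reverse=True)
    return [term for term, _ in pairs[:n]]

# Abbreviated copies of the two cvb-output rows quoted above.
rows = [
    "{how:0.0401,you:0.0214,your:0.0152,open:0.0133,use:0.0127}",
    "{ruby:0.0136,search:0.0091,ucd:0.0090,street:0.0081,information:0.0080}",
]
for i, row in enumerate(rows):
    print("topic %d -> %s" % (i, ", ".join(top_terms(row, 3))))
```

Each row of the lda output is one topic, so the row index serves as the topic number.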
On 19 Apr 2013, at 13:55, David LaBarbera wrote:
> Did you run vectordump with the lda output directory (cvb-output in your
> case) or document topic output (cvb-topic-doc)?
> Depending on which you're looking at, you'll have
>
> lda output:
> each row corresponds to a topic and the elements are (term
> index:probability). The terms correspond to what's in the dictionary
> (contentDataDir/sparseVectors/dictionary.file-0). You can add the dictionary
> to the command line, so the output will be (term:probability). The flag
> should be
> --dictionary ./contentDataDir/sparseVectors/dictionary.file-0
> --dictionaryType sequencefile
>
> dt output:
> each row is a document and the elements are (topic:probability).
>
>
> David
>
> On Apr 19, 2013, at 8:30 AM, Chris Harrington <[email protected]> wrote:
>
>> Just ran vectordump over the output from cvb, but I have no idea what I'm
>> looking at.
>>
>> {1.0:0.0689751034234147,0hu:0.052798138507741114,06:0.046108327846619585,091:0.04079964524901706,1:0.03488226667358313,10g:0.03471651100042406,07:0.03051583712303273,10.30am:0.029957963431693112,1171:0.028424194208528646,10.4.10:0.028173810240271588}
>>
>> Can someone give me an explanation of the above?
>>
>> In the Mahout in Action book there was a table which displayed each topic
>> with its top terms; how would I go from the above to something like that? i.e.
>> topic 0 -> term1, term2 term3….termN
>> topic 1 -> term1, term2 term3….termN
>> etc.
>>
>>
>> On 19 Apr 2013, at 10:19, Chris Harrington wrote:
>>
>>> Found the issue: it was the folder I gave it for outputting the matrix in
>>> the rowid command. For cvb I gave it ./contentDataDir/matrix as the
>>> matrix location; instead I should have supplied
>>> ./contentDataDir/matrix/matrix
>>>
>>> On 17 Apr 2013, at 12:46, Chris Harrington wrote:
>>>
>>>> So I've got 0.8 now, but I'm running into an error:
>>>>
>>>> ../../workspace2/trunk/bin/mahout seqdirectory -i
>>>> ./contentDataDir/output-content-segment -o ./contentDataDir/sequenced
>>>>
>>>> ../../workspace2/trunk/bin/mahout seq2sparse -i ./contentDataDir/sequenced
>>>> -o ./contentDataDir/sparseVectors --namedVector -wt tf
>>>>
>>>> ../../workspace2/trunk/bin/mahout rowid -i
>>>> ./contentDataDir/sparseVectors/tf-vectors/ -o ./contentDataDir/matrix
>>>>
>>>> ../../workspace2/trunk/bin/mahout cvb -i ./contentDataDir/matrix -o
>>>> cvb-output -k 100 -x 1 -dict
>>>> ./contentDataDir/sparseVectors/dictionary.file-0 -dt cvb-topic-doc -mt
>>>> cvb-topic-model
>>>>
>>>> but the cvb command hits a class cast exception
>>>>
>>>> java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to
>>>> org.apache.mahout.math.VectorWritable
>>>> at
>>>> org.apache.mahout.clustering.lda.cvb.CachingCVB0Mapper.map(CachingCVB0Mapper.java:55)
>>>> at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>>>> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>>>> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>>>> at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>>>> at java.security.AccessController.doPrivileged(Native Method)
>>>> at javax.security.auth.Subject.doAs(Subject.java:396)
>>>> at
>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1136)
>>>> at org.apache.hadoop.mapred.Child.main(Child.java:249)
>>>>
>>>> I thought seq2sparse took care of turning Hadoop's Text into Mahout's
>>>> VectorWritable. Where have I gone wrong?
>>>>
>>>>
>>>>
>>>> On 16 Apr 2013, at 14:45, Jake Mannix wrote:
>>>>
>>>>> You should just be building off of trunk (0.8-snapshot) in which case you
>>>>> should be working just fine.
>>>>>
>>>>>
>>>>> On Tue, Apr 16, 2013 at 6:43 AM, Chris Harrington
>>>>> <[email protected]>wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> I've been trying to get the vector dumper to work on the output from cvb,
>>>>>> but it's throwing lots of errors.
>>>>>>
>>>>>> I found several old mails on the mailing list regarding this issue,
>>>>>> specifically this:
>>>>>>
>>>>>>
>>>>>> http://mail-archives.apache.org/mod_mbox/mahout-user/201211.mbox/%3CCAHSfFsy2oWRuzwVzGW57LRYaJ+LuudNu-W5EO0wnV_ff=uy...@mail.gmail.com%3E
>>>>>>
>>>>>> That thread is a bit old, so I was wondering whether there was a patch to
>>>>>> fix it, or do I need to use the 0.8-snapshot?
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>>
>>>>> -jake
>>>>
>>>
>>
>