Thanks Dan, that solved it.
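
For anyone who finds this thread later: the fix amounts to re-keying the vectors. seq2sparse writes vectors keyed by Text, while cvb expects IntWritable keys. The rowid utility assigns each vector a sequential integer id and, as far as I understand, also writes a docIndex that maps each id back to the original Text key. Below is a rough plain-Java sketch of that idea only. It does not use the Hadoop or Mahout APIs, and the names RowIdSketch, Rekeyed and rekey are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustration only: mimics the re-keying idea with in-memory maps,
// not with Hadoop SequenceFiles or Mahout classes.
public class RowIdSketch {

    /** Vectors keyed by integer row id, plus an index from row id to the original key. */
    public static final class Rekeyed<V> {
        public final Map<Integer, V> matrix = new LinkedHashMap<>();
        public final Map<Integer, String> docIndex = new LinkedHashMap<>();
    }

    /** Assigns sequential integer ids (0, 1, 2, ...) in iteration order. */
    public static <V> Rekeyed<V> rekey(Map<String, V> textKeyed) {
        Rekeyed<V> out = new Rekeyed<>();
        int rowId = 0;
        for (Map.Entry<String, V> entry : textKeyed.entrySet()) {
            out.matrix.put(rowId, entry.getValue());  // what a consumer needing int keys reads
            out.docIndex.put(rowId, entry.getKey());  // lets you recover the original document id
            rowId++;
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical term-frequency vectors keyed by text ids.
        Map<String, double[]> tfVectors = new LinkedHashMap<>();
        tfVectors.put("/tweet-0", new double[] {1, 0, 2});
        tfVectors.put("/tweet-1", new double[] {0, 3, 0});

        Rekeyed<double[]> rekeyed = rekey(tfVectors);
        System.out.println(rekeyed.docIndex);
    }
}
```

The docIndex half matters once topics have been computed: the per-document output is keyed by the integer ids, so the index is what lets you map results back to the original tweets.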

On Sun, Oct 28, 2012 at 10:40 PM, DAN HELM <[email protected]> wrote:
> Hi Diego,
> A number of us had the same issue when first working with the new CVB
> algorithm. The vector keys for CVB need to be Integers. You can use the
> rowid utility to convert the output from seq2sparse to the form needed by
> CVB, e.g.,
> http://comments.gmane.org/gmane.comp.apache.mahout.user/13112
> Dan
>
> From: Diego Ceccarelli <[email protected]>
> To: [email protected]
> Sent: Sunday, October 28, 2012 5:21 PM
> Subject: Using LDA in Mahout 0.0.7
>
> Dear all,
>
> I'm trying to use the LDA framework in Mahout and I'm running into
> some trouble.
> I saw these tutorials [1,2] and decided to apply LDA to a collection of
> 1M tweets to see how it works. I indexed them with Lucene as suggested
> in [2]. Then I discovered that this is not supported in the latest version
> and that I had to use a sequence file.
> I saw the 'seqdirectory' utility in [2], but it is impractical to create
> one million documents, each containing a single tweet. So I wrote a small
> Java app that takes a file where each line is a document and creates a
> <Text,Text> sequence file containing the id (line number) and the tweet.
> Then I used the seq2sparse utility:
>
> ./bin/mahout seq2sparse -i ../lda-hello-world/tweet-sequence-file -o
> /tmp/vector -wt tf -a org.apache.lucene.analysis.WhitespaceAnalyzer -ow
>
> This created the vectors without problems.
>
> Then I discovered that LDA is now called cvb (why did you change the name?
> It is a bit confusing), so I tried to run that command, but I got this
> error:
>
> java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to
> org.apache.hadoop.io.IntWritable
> (full stack trace here [3])
>
> I also tried the local version:
>
> ./bin/mahout cvb0_local -i /tmp/vector/tf-vectors  -d
> /tmp/vector/dictionary.file-0 --numTopics 100 --docOutputFile /tmp/out
> --topicOutputFile /tmp/topic
>
> (Why are the parameter names different?)
> But I got a similar error:
> Exception in thread "main" java.lang.ClassCastException: java.lang.Integer
> cannot be cast to java.lang.String
> (full stack trace here [4])
>
> Where am I going wrong? Could you please help me?
> Thanks
> Diego
>
> [1] https://cwiki.apache.org/MAHOUT/latent-dirichlet-allocation.html
> [2] https://cwiki.apache.org/MAHOUT/creating-vectors-from-text.html
> [3] http://pastebin.com/nV3T74fe
> [4] http://pastebin.com/JH1xQHuC
>
>



-- 
Computers are useless. They can only give you answers.
(Pablo Picasso)
_______________
Diego Ceccarelli
High Performance Computing Laboratory
Information Science and Technologies Institute (ISTI)
Italian National Research Council (CNR)
Via Moruzzi, 1
56124 - Pisa - Italy

Phone: +39 050 315 3055
Fax: +39 050 315 2040
________________________________________
