Found the issue: it was the path I gave cvb for the matrix written by the rowid command. I gave cvb ./contentDataDir/matrix as the matrix location, when I should have supplied ./contentDataDir/matrix/matrix.
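For anyone hitting the same ClassCastException, here is a minimal sketch of the corrected cvb invocation. It assumes, as far as I understand rowid's behaviour, that rowid writes two sequence files under its -o directory: a matrix file of vectors and a docIndex file whose values are Text. Pointing cvb at the whole directory would then feed it docIndex too, which would explain the cast error. The variable names and the cvb_cmd helper are only illustrative; the paths and flags are the ones from the commands quoted below. The sketch prints the command instead of running it.

```shell
#!/bin/sh
# Launcher and rowid output directory, as used in the quoted commands.
MAHOUT=../../workspace2/trunk/bin/mahout
ROWID_OUT=./contentDataDir/matrix

# Assumption: rowid leaves two files under its -o directory,
#   $ROWID_OUT/matrix    (vectors, which is what cvb expects)
#   $ROWID_OUT/docIndex  (Text values, which cvb cannot read)
# so cvb should be given the matrix file, not the directory.
CVB_INPUT="$ROWID_OUT/matrix"

# Illustrative helper: print the cvb command (dry run).
cvb_cmd() {
  echo "$MAHOUT" cvb -i "$CVB_INPUT" -o cvb-output -k 100 -x 1 \
    -dict ./contentDataDir/sparseVectors/dictionary.file-0 \
    -dt cvb-topic-doc -mt cvb-topic-model
}

cvb_cmd
```

Removing the echo inside cvb_cmd runs the job for real; listing the contents of ./contentDataDir/matrix first is a quick way to check that the two files are actually there.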

On 17 Apr 2013, at 12:46, Chris Harrington wrote:

> So I've got 0.8 now but I'm running into an error,
>
> ../../workspace2/trunk/bin/mahout seqdirectory -i ./contentDataDir/output-content-segment -o ./contentDataDir/sequenced
>
> ../../workspace2/trunk/bin/mahout seq2sparse -i ./contentDataDir/sequenced -o ./contentDataDir/sparseVectors --namedVector -wt tf
>
> ../../workspace2/trunk/bin/mahout rowid -i ./contentDataDir/sparseVectors/tf-vectors/ -o ./contentDataDir/matrix
>
> ../../workspace2/trunk/bin/mahout cvb -i ./contentDataDir/matrix -o cvb-output -k 100 -x 1 -dict ./contentDataDir/sparseVectors/dictionary.file-0 -dt cvb-topic-doc -mt cvb-topic-model
>
> but the cvb command hits a class cast exception
>
> java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.mahout.math.VectorWritable
>     at org.apache.mahout.clustering.lda.cvb.CachingCVB0Mapper.map(CachingCVB0Mapper.java:55)
>     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>     at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:396)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1136)
>     at org.apache.hadoop.mapred.Child.main(Child.java:249)
>
> I thought seq2sparse took care of turning Hadoop Text into Mahout's
> VectorWritable. Where have I gone wrong?
>
> On 16 Apr 2013, at 14:45, Jake Mannix wrote:
>
>> You should just be building off of trunk (0.8-snapshot) in which case you
>> should be working just fine.
>>
>> On Tue, Apr 16, 2013 at 6:43 AM, Chris Harrington <[email protected]>wrote:
>>
>>> Hi all,
>>>
>>> I've been trying to get the vector dumper to work on the output from cvb
>>> but it's throwing lots of errors.
>>>
>>> I found several old mails on the mailing list regarding this issue,
>>> specifically this:
>>>
>>> http://mail-archives.apache.org/mod_mbox/mahout-user/201211.mbox/%3CCAHSfFsy2oWRuzwVzGW57LRYaJ+LuudNu-W5EO0wnV_ff=uy...@mail.gmail.com%3E
>>>
>>> That thread is a bit old, so I was wondering whether there is a patch or
>>> anything to fix it, or do I need to use the 0.8-snapshot?
>>
>> --
>>
>> -jake
