Christopher - I had the same confusion with vectordump output on a Hadoop cluster. The explanation is that vectordump is not trying to write a file to your HDFS: -o goes to the local filesystem. So when I just named a file (it did not want to create a local directory), the output wound up in the /bin directory I was working out of.
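A rough sketch of that, assuming the input is a single part file as Suneel suggested (the part-xxxx name is left elided, as in his message). MAHOUT_BIN, INPUT_PART, LOCAL_OUT, RUN and dump_vectors are illustrative names, not part of Mahout, and only the -i and -o flags from the original call are used. By default it just prints the command; set RUN=1 to actually run it.

```shell
#!/bin/sh
# Sketch only: the variable names and default paths are assumptions for illustration.
MAHOUT_BIN="${MAHOUT_BIN:-bin/mahout}"
# Input is one part file (on the cluster), not the whole output directory.
INPUT_PART="${INPUT_PART:-/opt/mahout/cvb-output-topic/part-xxxx}"
# Output is a file on the LOCAL filesystem, not an HDFS path.
LOCAL_OUT="${LOCAL_OUT:-./vectordump-out.txt}"

dump_vectors() {
  if [ "${RUN:-0}" = "1" ]; then
    # Create the local parent directory up front, since vectordump
    # did not seem willing to create one itself.
    mkdir -p "$(dirname "$LOCAL_OUT")"
    "$MAHOUT_BIN" vectordump -i "$INPUT_PART" -o "$LOCAL_OUT"
  else
    # Dry run: show the command that would be executed.
    echo "$MAHOUT_BIN vectordump -i $INPUT_PART -o $LOCAL_OUT"
  fi
}

dump_vectors
```

Afterwards the dump should be readable with ordinary local tools (for example `head "$LOCAL_OUT"`), with no hadoop fs command needed.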
Best,
Liz

On Thu, Aug 8, 2013 at 9:47 PM, Suneel Marthi <[email protected]> wrote:

> Seems like you are specifying a directory as input to vectordump.
> It should be a 'file', something like
> /opt/mahout/cvb-output-topic/part-xxxx in your case.
>
> Give that a try.
>
> ________________________________
> From: Christopher Schindler <[email protected]>
> To: "[email protected]" <[email protected]>
> Sent: Thursday, August 8, 2013 8:35 PM
> Subject: RE: Using CVB; LdaTopics confusion
>
> Thank you Suneel, I appreciate the pointer. I am using Mahout 0.8 but I
> was following the wiki and not the examples/*.
> I've gotten CVB to run successfully but now vectordump is giving me
> trouble. The call:
> bin/mahout vectordump -i /opt/mahout/cvb-output-topic -o /opt/mahout/output -p true -c /opt/mahout/output/vectors.csv -dt sequencefile
> The error returned is either:
> Exception in thread "main" java.io.FileNotFoundException: /opt/mahout/output/ (No such file or directory)
> [this variant is triggered if I specify -c]
> OR
> Exception in thread "main" java.io.FileNotFoundException: /opt/mahout/output (Permission denied)
> [no -c param specified]
> Which is odd for several reasons. First, that's an HDFS directory and the
> utilities have been writing and creating directories in that location just
> fine through the prior steps. Second, the output directory does exist in
> HDFS. I've tried various combinations (referencing a directory that
> does/doesn't exist, appending an actual file to the path and others) with
> no success.
> Any insight?
> Cheers!
> Chris
>
> > Date: Wed, 7 Aug 2013 01:58:52 -0700
> > From: [email protected]
> > Subject: Re: Using CVB; LdaTopics confusion
> > To: [email protected]
> >
> > If u r using Mahout 0.8, suggest that you look at the CVB invocation in
> > examples/bin/cluster-reuters.sh as reference for the sequence of steps
> > (and other command line options for each step).
> >
> > ldatopics has been deprecated (in 0.8) and removed completely (in 0.9).
> >
> > Anyways, the input vectors directory in ur case would be -
> > '/opt/mahout/cvb-output/topic_dist.out', but I would desist from using it
> > as it's been deprecated.
> >
> > ________________________________
> > From: Christopher Schindler <[email protected]>
> > To: "[email protected]" <[email protected]>
> > Sent: Wednesday, August 7, 2013 2:34 AM
> > Subject: Using CVB; LdaTopics confusion
> >
> > Hi all,
> > A noob question I'm sure but I'm stuck. I'm using CVB to cluster a text
> > index of articles.
> > Here's the CVB call:
> > bin/mahout cvb \
> >   -i /opt/mahout/lucene-sparse-vectors-cvb/matrix \
> >   -dict /opt/mahout/cvb-output/dict.file-* \
> >   -o /opt/mahout/cvb-output/topic_terms.out \
> >   -dt /opt/mahout/cvb-output/topic_dist.out \
> >   -k 200 \
> >   -mt /opt/mahout/output/iterations/ \
> >   -x 20 -a .25 -ow
> > I'm trying to access the topics using ldatopics per
> > https://cwiki.apache.org/confluence/display/MAHOUT/Latent+Dirichlet+Allocation .
> > My latest combination was:
> > bin/mahout ldatopics -i opt/mahout/cvb-output/ -d /opt/mahout/cvb-output/dict.file-*
> > However, it returns an error stating:
> > ERROR driver.MahoutDriver: : Try the new Collapsed Variation Bayes LDA, try bin/mahout cvb or bin/mahout cvb0_local
> > The spec is:
> > bin/mahout ldatopics \
> >   -i <input vectors directory> \
> >   -d <input dictionary file> \
> > What is the vectors directory supposed to be? Many thanks in advance.
> > Cheers!
> > Chris
