Thanks Ted for the answer. "Should be sparse, but I can't say for sure."
Could anybody confirm? In the quickstart-kmeans.sh script there's a step to convert the data to SequenceFile format (seqdirectory) and then a second step to convert the SequenceFiles to sparse vector format (seq2sparse). That's why I'm asking.

On Sat, Nov 20, 2010 at 3:45 PM, Ted Dunning <[email protected]> wrote:

> On Sat, Nov 20, 2010 at 8:47 AM, Mike Perry <[email protected]> wrote:
>
> > Hello all,
> >
> > Does the script to convert a Lucene index to Mahout vectors write sequence
> > files in sparse vector representation? my impression is that it doesn't but
> > I want to verify that.
>
> Should be sparse, but I can't say for sure.
>
> > Also, SparseVectorsFromSequenceFiles is used to convert the vectors to
> > sparse format (I know about the seq2sparse option). Could someone point out
> > where in the code it actually constructs the sparse vectors? it seems to me
> > that one of the methods in DictionaryVectorizer generates the vectors but I
> > couldn't find where exactly.
>
> Look for VectorWritable.
