Excellent. Thanks!

On Sun, Nov 21, 2010 at 2:22 PM, Drew Farris <[email protected]> wrote:
> Per o.a.m.utils.vectors.lucene.TFDFMapper, which is called from
> o.a.m.utils.vectors.lucene.Driver, the vectors created are instances
> of RandomAccessSparseVector
>
> On Sun, Nov 21, 2010 at 9:28 AM, Mike Perry <[email protected]> wrote:
> > Thanks Ted for the answer.
> >
> > "Should be sparse, but I can't say for sure."
> >
> > Could anybody confirm? in the quickstart-kmeans.sh script there's a step
> > to convert the data to SequenceFile format (seqdirectory) and then
> > a second step to convert the SequenceFiles to sparse vector format
> > (seq2sparse). That's why I'm asking.
> >
> >
> > On Sat, Nov 20, 2010 at 3:45 PM, Ted Dunning <[email protected]> wrote:
> >
> >> On Sat, Nov 20, 2010 at 8:47 AM, Mike Perry <[email protected]> wrote:
> >>
> >> > Hello all,
> >> >
> >> > Does the script to convert a Lucene index to Mahout vectors write
> >> > sequence files in sparse vector representation? my impression is
> >> > that it doesn't but I want to verify that.
> >> >
> >>
> >> Should be sparse, but I can't say for sure.
> >>
> >>
> >> > Also, SparseVectorsFromSequenceFiles is used to convert the vectors to
> >> > sparse format (I know about the seq2sparse option). Could someone
> >> > point out where in the code it actually constructs the sparse vectors?
> >> > it seems to me that one of the methods in DictionaryVectorizer
> >> > generates the vectors but I couldn't find where exactly.
> >> >
> >>
> >> Look for VectorWritable.
> >>
> >
>
