Excellent. Thanks!

On Sun, Nov 21, 2010 at 2:22 PM, Drew Farris <[email protected]> wrote:

> Per o.a.m.utils.vectors.lucene.TFDFMapper, which is called from
> o.a.m.utils.vectors.lucene.Driver, the vectors created are instances
> of RandomAccessSparseVector
>
> On Sun, Nov 21, 2010 at 9:28 AM, Mike Perry <[email protected]>
> wrote:
> > Thanks Ted for the answer.
> >
> > "Should be sparse, but I can't say for sure."
> >
> > Could anybody confirm? In the quickstart-kmeans.sh script there's a step to
> > convert the data to SequenceFile format (seqdirectory) and then
> > a second step to convert the SequenceFiles to sparse vector format
> > (seq2sparse). That's why I'm asking.
> >
> >
> > On Sat, Nov 20, 2010 at 3:45 PM, Ted Dunning <[email protected]> wrote:
> >
> >> On Sat, Nov 20, 2010 at 8:47 AM, Mike Perry <[email protected]> wrote:
> >>
> >> > Hello all,
> >> >
> >> > Does the script to convert a Lucene index to Mahout vectors write sequence
> >> > files in sparse vector representation? My impression is that it doesn't, but
> >> > I want to verify that.
> >> >
> >>
> >> Should be sparse, but I can't say for sure.
> >>
> >>
> >> > Also, SparseVectorsFromSequenceFiles is used to convert the vectors to
> >> > sparse format (I know about the seq2sparse option). Could someone point out
> >> > where in the code it actually constructs the sparse vectors? It seems to me
> >> > that one of the methods in DictionaryVectorizer generates the vectors, but I
> >> > couldn't find where exactly.
> >> >
> >>
> >> Look for VectorWritable.
> >>
> >
>
