Yes.

The UDF could emit either a dense or a sparse vector format.  The clusterer
can handle either kind of vector.  Sparse is usually more appropriate, since
most entries tend to be zero once categorical fields are encoded.
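
To make the dense/sparse distinction concrete, here is a minimal sketch in
plain Python.  It is not Mahout's API and not a Hive UDF; the function names,
the customer row, and the 10000-wide encoding are all made up for
illustration.  It only shows why a mostly-zero row is cheaper to keep as
index/value pairs, and that distances can be computed on that form directly.

```python
# Hypothetical example, not Mahout code: one customer row after one-hot
# encoding a categorical column with many possible values.

def to_dense(row, cardinality):
    """Return a full-length list in which every zero is stored explicitly."""
    dense = [0.0] * cardinality
    for index, value in row.items():
        dense[index] = value
    return dense


def to_sparse(dense):
    """Keep only the non-zero entries as an {index: value} mapping."""
    return {i: v for i, v in enumerate(dense) if v != 0.0}


def squared_distance(a, b):
    """Squared Euclidean distance between two sparse vectors."""
    indices = set(a) | set(b)
    return sum((a.get(i, 0.0) - b.get(i, 0.0)) ** 2 for i in indices)


# Assumed layout: index 0 = age, index 1 = total spend, the rest = one-hot region.
customer = {0: 34.0, 1: 1250.5, 4017: 1.0}
dense = to_dense(customer, 10000)   # 10000 stored values
sparse = to_sparse(dense)           # 3 stored values
```

The dense form stores all 10000 slots while the sparse form stores three, and
the distance function only ever touches indices that are non-zero in at least
one of the two vectors.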


On Tue, Jun 25, 2013 at 2:28 PM, Chirag Lakhani <[email protected]> wrote:

> Just to clarify, the UDF would convert the data into a dense or sparse
> vector format?
>
>
> On Mon, Jun 24, 2013 at 12:55 PM, Ted Dunning <[email protected]>
> wrote:
>
> > Better would be to build a Hive UDF that vectorizes your data directly
> > from the Hive table and produces a sequence file with vectors ready to
> > cluster.  Then use the streaming k-means stuff.
> >
> >
> >
> > On Mon, Jun 24, 2013 at 4:43 PM, Chirag Lakhani <[email protected]>
> > wrote:
> >
> > > What database interfaces are there for Mahout?  The website mentions
> > > MongoDB and Cassandra.  I get the feeling these are for recommender
> > > systems only.  Are there any databases that Mahout can interface with
> > > directly in order to perform clustering?
> > >
> > > I am thinking of an example where I have a large table in Hive of
> > > customer data and I want to do customer segmentation.  Normally I make
> > > a CSV file of this data and then manually import it into some Java
> > > code.  Is there a better way of doing that?
> > >
> >
>
