Just to clarify: would the UDF convert the data into a dense or a sparse
vector format?
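
To make the dense/sparse distinction concrete, here is a minimal sketch of
the per-row conversion such a UDF would have to do. It deliberately uses
plain Java only: RowVectorizer and its methods are hypothetical names, not
part of Hive or Mahout, and treating blank fields as 0.0 is an assumption.
In a real UDF the results would presumably be wrapped in Mahout vector
classes (e.g. DenseVector or RandomAccessSparseVector) before being written
to the sequence file.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for illustration; not part of Hive or Mahout.
final class RowVectorizer {
    private RowVectorizer() {}

    // Parse one column value. Null or blank fields become 0.0 (an assumption).
    static double parse(String field) {
        if (field == null || field.trim().isEmpty()) {
            return 0.0;
        }
        return Double.parseDouble(field.trim());
    }

    // Dense form: one slot per column, zeros stored explicitly.
    static double[] toDense(String[] fields) {
        double[] values = new double[fields.length];
        for (int i = 0; i < fields.length; i++) {
            values[i] = parse(fields[i]);
        }
        return values;
    }

    // Sparse form: only non-zero entries, keyed by column index.
    static Map<Integer, Double> toSparse(String[] fields) {
        Map<Integer, Double> nonZero = new TreeMap<>();
        for (int i = 0; i < fields.length; i++) {
            double value = parse(fields[i]);
            if (value != 0.0) {
                nonZero.put(i, value);
            }
        }
        return nonZero;
    }
}
```

For a customer table with a handful of mostly populated numeric columns the
dense form is probably the simpler choice; the sparse form only pays off
when most columns are zero for most rows.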


On Mon, Jun 24, 2013 at 12:55 PM, Ted Dunning <[email protected]> wrote:

> Better would be to build a Hive UDF that vectorizes your data directly from
> the Hive table and produces a sequence file with vectors ready to cluster.
>  Then use the streaming k-means stuff.
>
>
>
> On Mon, Jun 24, 2013 at 4:43 PM, Chirag Lakhani <[email protected]>
> wrote:
>
> > What database interfaces are there for Mahout?  The website mentions
> > MongoDB and Cassandra.  I get the feeling these are for recommender
> > systems only.  Are there any databases that Mahout can interface with
> > directly in order to perform clustering?
> >
> > I am thinking of an example where I have a large table in Hive of
> > customer data and I want to do customer segmentation.  Normally I make
> > a CSV file of this data and then manually import it into some Java
> > code.  Is there a better way of doing that?
> >
>
