Just to clarify, would the UDF convert the data into a dense or a sparse vector format?
On Mon, Jun 24, 2013 at 12:55 PM, Ted Dunning <[email protected]> wrote:

> Better would be to build a Hive UDF that vectorizes your data directly from
> the Hive table and produces a sequence file with vectors ready to cluster.
> Then use the streaming k-means stuff.
>
> On Mon, Jun 24, 2013 at 4:43 PM, Chirag Lakhani <[email protected]>
> wrote:
>
> > What database interfaces are there for Mahout? The website mentions
> > MongoDB and Cassandra. I get the feeling these are for recommender systems
> > only. Are there any databases that Mahout can interface with directly in
> > order to perform clustering?
> >
> > I am thinking of an example where I have a large table in Hive of customer
> > data and I want to do customer segmentation. Normally I make a CSV file of
> > this data and then manually import it into some Java code. Is there a
> > better way of doing that?
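To make the dense-vs-sparse question concrete, here is a minimal plain-Java sketch of the per-row encoding step such a UDF might perform. It is not Mahout or Hive API: the class name, the column meanings, and the CATEGORIES vocabulary are all hypothetical. Both methods produce the same logical vector (numeric columns first, then one-hot slots for a categorical column); they differ only in storage. In a real job the result would presumably be wrapped in Mahout's DenseVector or RandomAccessSparseVector and written as VectorWritable values to a SequenceFile, which this sketch does not show.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of encoding one customer row as a vector.
// Column meanings and the category vocabulary are illustrative assumptions.
public class CustomerVectorizer {
    // Assumed vocabulary for a single categorical column (e.g. favourite product category).
    static final String[] CATEGORIES = {"books", "music", "games"};

    // Dense layout: [numeric columns..., one-hot category slots...], every slot stored.
    public static double[] toDense(double[] numeric, String category) {
        double[] v = new double[numeric.length + CATEGORIES.length];
        System.arraycopy(numeric, 0, v, 0, numeric.length);
        int c = Arrays.asList(CATEGORIES).indexOf(category);
        if (c >= 0) { // unknown categories leave all one-hot slots at zero
            v[numeric.length + c] = 1.0;
        }
        return v;
    }

    // Sparse layout: same indices as the dense layout, but only non-zero entries are stored.
    public static Map<Integer, Double> toSparse(double[] numeric, String category) {
        Map<Integer, Double> v = new TreeMap<>();
        for (int i = 0; i < numeric.length; i++) {
            if (numeric[i] != 0.0) {
                v.put(i, numeric[i]);
            }
        }
        int c = Arrays.asList(CATEGORIES).indexOf(category);
        if (c >= 0) {
            v.put(numeric.length + c, 1.0);
        }
        return v;
    }

    public static void main(String[] args) {
        double[] numeric = {34.0, 1200.0, 0.0}; // e.g. age, annual spend, returns
        System.out.println(Arrays.toString(toDense(numeric, "music")));
        System.out.println(toSparse(numeric, "music"));
    }
}
```

Running main prints [34.0, 1200.0, 0.0, 0.0, 1.0, 0.0] and then {0=34.0, 1=1200.0, 4=1.0}. As a rough rule, a dense layout suits tables of mostly numeric columns, while a sparse layout pays off once many one-hot or other mostly-zero features are involved.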
