Yes. The UDF would convert each row into either a dense or a sparse vector format. The clusterer can handle either kind of vector. Sparse is usually more appropriate.
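To make the dense/sparse distinction concrete, here is a minimal sketch in plain Python. It is not Mahout or Hive code. The column names (age, spend, region, plan) and the category lists are made up for illustration, and the one-hot encoding is just one possible way to vectorize a customer row. In Mahout itself the corresponding classes would be DenseVector and RandomAccessSparseVector from org.apache.mahout.math.

```python
from typing import Dict, List

# Hypothetical customer schema, invented for this sketch:
# two numeric columns followed by two one-hot encoded categorical columns.
REGIONS = ["north", "south", "east", "west"]
PLANS = ["basic", "plus", "pro"]
DIM = 2 + len(REGIONS) + len(PLANS)


def to_dense(row: dict) -> List[float]:
    """Dense form: one slot per dimension, zeros stored explicitly."""
    vec = [0.0] * DIM
    vec[0] = float(row["age"])
    vec[1] = float(row["spend"])
    vec[2 + REGIONS.index(row["region"])] = 1.0
    vec[2 + len(REGIONS) + PLANS.index(row["plan"])] = 1.0
    return vec


def to_sparse(row: dict) -> Dict[int, float]:
    """Sparse form: only the non-zero entries, keyed by dimension index."""
    return {i: v for i, v in enumerate(to_dense(row)) if v != 0.0}


row = {"age": 34, "spend": 120.5, "region": "east", "plan": "pro"}
dense = to_dense(row)
sparse = to_sparse(row)

print(len(dense), "dense slots vs", len(sparse), "sparse entries")
```

With only a few categories the saving is small, but once categorical columns have hundreds or thousands of distinct values, nearly every slot of the dense form is zero and the sparse form stores only a handful of entries per customer.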

On Tue, Jun 25, 2013 at 2:28 PM, Chirag Lakhani <[email protected]> wrote:

> Just to clarify the UDF would convert the data into a dense or sparse
> vector format?
>
> On Mon, Jun 24, 2013 at 12:55 PM, Ted Dunning <[email protected]> wrote:
>
> > Better would be to build a Hive UDF that vectorizes your data directly from
> > the Hive table and produces a sequence file with vectors ready to cluster.
> > Then use the streaming k-means stuff.
> >
> > On Mon, Jun 24, 2013 at 4:43 PM, Chirag Lakhani <[email protected]> wrote:
> >
> > > What data base interfaces are there for Mahout? The website mentions
> > > MongoDB and Cassandra. I get the feeling these are for recommender systems
> > > only. Are there any database that Mahout can interface directly in order
> > > to perform clustering?
> > >
> > > I am thinking of an example where I have a large table in Hive of customer
> > > data and I want to do customer segmentation. Normally I make a CSV file of
> > > this data and then manually import it into some Java code. Is there a
> > > better way of doing that?
