A better approach would be to write a Hive UDF that vectorizes your data directly from the Hive table and writes out a sequence file of vectors ready to cluster. You can then run the streaming k-means implementation on that file.
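
As a rough sketch of just the vectorization step (plain Java, no Hive or Mahout dependencies), the logic inside such a UDF's evaluate method might look something like the following. The column layout (age, annual spend, region), the list of regions, and the class name are all made up for illustration; your table will differ. In a real UDF you would then wrap the resulting array in a Mahout vector (for example a DenseVector inside a VectorWritable) before writing it to the sequence file.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: the column layout (age, annual spend, region) is an
// assumption for illustration and does not come from the original thread.
class CustomerRowVectorizer {
    // Assumed set of values for the categorical "region" column.
    static final List<String> REGIONS = Arrays.asList("north", "south", "east", "west");

    // Turns one row into a dense feature vector: two numeric features
    // followed by a one-hot encoding of the region.
    static double[] vectorize(String age, String annualSpend, String region) {
        double[] v = new double[2 + REGIONS.size()];
        v[0] = Double.parseDouble(age.trim());
        v[1] = Double.parseDouble(annualSpend.trim());
        int idx = REGIONS.indexOf(region.trim().toLowerCase(Locale.ROOT));
        if (idx >= 0) {
            v[2 + idx] = 1.0; // an unknown region leaves the one-hot block all zeros
        }
        return v;
    }

    public static void main(String[] args) {
        double[] v = vectorize("34", "1250.50", "East");
        System.out.println(Arrays.toString(v));
    }
}
```

Note that the numeric columns are passed through unscaled here. Since k-means is distance based, you would probably want to normalize them so that a large-valued column such as spend does not dominate the clustering.
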

On Mon, Jun 24, 2013 at 4:43 PM, Chirag Lakhani <[email protected]> wrote:

> What data base interfaces are there for Mahout? The website mentions
> MongoDB and Cassandra. I get the feeling these are for recommender systems
> only. Are there any database that Mahout can interface directly in order
> to perform clustering?
>
> I am thinking of an example where I have a large table in Hive of customer
> data and I want to do customer segmentation. Normally I make a CSV file of
> this data and then manually import it into some Java code. Is there a
> better way of doing that?
