A better approach would be to build a Hive UDF that vectorizes your data
directly from the Hive table and writes a sequence file of vectors ready to
cluster. Then run Mahout's streaming k-means on those vectors.
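A minimal sketch of the vectorizing step, written in plain Java with no Hive or Mahout dependencies so it runs on its own. Each customer row becomes a dense double[]: numeric columns are copied through and one categorical column is one-hot encoded. The class name, the "region" column, and the category list are hypothetical. In a real UDF this logic would sit inside the UDF's evaluate method, and the resulting arrays would then be wrapped in Mahout vectors (for example DenseVector inside VectorWritable) and written to a SequenceFile for clustering; that wiring is assumed and not shown here.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical vectorizer for customer rows: numeric columns + one-hot region.
class CustomerVectorizer {
    // Maps each known region value to its slot in the one-hot block.
    private final Map<String, Integer> regionIndex = new HashMap<>();
    private final int numNumeric;

    CustomerVectorizer(List<String> regions, int numNumeric) {
        for (int i = 0; i < regions.size(); i++) {
            regionIndex.put(regions.get(i), i);
        }
        this.numNumeric = numNumeric;
    }

    // Builds [numeric..., one-hot(region)...]; an unknown region leaves the
    // one-hot block all zeros rather than failing the whole row.
    double[] vectorize(double[] numeric, String region) {
        if (numeric.length != numNumeric) {
            throw new IllegalArgumentException(
                "expected " + numNumeric + " numeric columns, got " + numeric.length);
        }
        double[] v = new double[numNumeric + regionIndex.size()];
        System.arraycopy(numeric, 0, v, 0, numNumeric);
        Integer idx = regionIndex.get(region);
        if (idx != null) {
            v[numNumeric + idx] = 1.0;
        }
        return v;
    }

    public static void main(String[] args) {
        // Hypothetical row: age, yearly spend, region.
        CustomerVectorizer vz =
            new CustomerVectorizer(Arrays.asList("north", "south", "west"), 2);
        double[] v = vz.vectorize(new double[] {34.0, 1200.5}, "south");
        System.out.println(Arrays.toString(v)); // prints [34.0, 1200.5, 0.0, 1.0, 0.0]
    }
}
```

Because k-means is distance based, a column like yearly spend will dominate the distance unless the numeric columns are scaled (e.g. standardized) before the vectors are written out.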

On Mon, Jun 24, 2013 at 4:43 PM, Chirag Lakhani <[email protected]> wrote:

> What database interfaces are there for Mahout?  The website mentions
> MongoDB and Cassandra.  I get the feeling these are for recommender systems
> only.  Are there any databases that Mahout can interface with directly in
> order to perform clustering?
>
> I am thinking of an example where I have a large table in Hive of customer
> data and I want to do customer segmentation.  Normally I make a CSV file of
> this data and then manually import it into some Java code.  Is there a
> better way of doing that?
>
