Here is how Twitter does it with Pig: http://www.umiacs.umd.edu/~jimmylin/publications/Lin_Kolcz_SIGMOD2012.pdf
We use a similar approach, and I think Pig, being somewhat lower-level and having better support for nested objects, is a better tool for this than Hive. It should be possible to do something similar with Hive, but we haven't tried.

The trick is to implement the learner as a serializer. The number of reducers then determines how many learners (bags) run in parallel.

igor
decide.com

On Thu, Jan 17, 2013 at 1:23 PM, qiaoresearcher <qiaoresearc...@gmail.com> wrote:

> How can I run machine learning algorithms (any ML algorithm) directly
> in Hive? Assume the input and output are already stored as Hive tables.
>
> PS: I know Mahout is available, but I would prefer to run machine
> learning algorithms directly in Hive.
>
> Many thanks,
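
For what it's worth, here is a rough sketch of the "learner as a serializer" idea in plain Python, outside of Pig or Hive. It only simulates the data flow: records are scattered across a number of partitions (standing in for reducers), and each partition feeds its records to a sink that trains on them instead of writing them out, then emits a model at the end. All names here (LearnerSink, put_next, close, train_ensemble, predict) are made up for illustration, and the choice of logistic regression trained with one pass of SGD is my assumption, not something taken from the paper or from our setup.

```python
import math
import random
from collections import defaultdict


class LearnerSink:
    """Hypothetical 'serializer' that trains on records instead of storing them."""

    def __init__(self, n_features, lr=0.1):
        self.w = [0.0] * n_features
        self.lr = lr

    def put_next(self, features, label):
        # One SGD step of logistic regression per incoming record.
        z = sum(wi * xi for wi, xi in zip(self.w, features))
        z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
        p = 1.0 / (1.0 + math.exp(-z))
        g = p - label
        self.w = [wi - self.lr * g * xi for wi, xi in zip(self.w, features)]

    def close(self):
        # A real storage function would write this out: the model, not the data.
        return self.w


def train_ensemble(records, n_reducers, n_features, seed=0):
    """Scatter records over n_reducers partitions and train one model per partition."""
    rng = random.Random(seed)
    partitions = defaultdict(list)
    for rec in records:
        # Random partitioning stands in for the shuffle to reducers.
        partitions[rng.randrange(n_reducers)].append(rec)

    models = []
    for part in range(n_reducers):
        sink = LearnerSink(n_features)
        for features, label in partitions[part]:
            sink.put_next(features, label)
        models.append(sink.close())
    return models


def predict(models, features):
    """Combine the per-partition models by strict majority vote."""
    votes = 0
    for w in models:
        z = sum(wi * xi for wi, xi in zip(w, features))
        if z > 0:
            votes += 1
    return 1 if 2 * votes > len(models) else 0
```

Calling train_ensemble(records, n_reducers=4, n_features=2) returns four weight vectors, one per simulated reducer, which is the sense in which the reducer count sets the number of parallel learners. In a real job the partitions would run concurrently on separate reducers rather than in a loop.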