On Fri, Dec 28, 2012 at 5:30 PM, Adam Baron <[email protected]> wrote:

> I'm trying to get familiar with the parallel MapReduce Classification
> algorithms offered in Mahout.  ... Online Passive Aggressive and Hidden
> Markov Models might be
> ready to explore as well.


I don't think that either of these really got to full production quality in
Mahout. The HMM, in particular, may converge slowly on large problems,
which is exactly where you would want a parallel program.

I thought that the Online Passive Aggressive code never made it very far,
either.


> Also, is there a parallel version of Logistic Regression officially in
> Mahout?


Nope.


> ... I ask because I
> came across this parallel Logistic Regression implementation which is
> apparently based off of Mahout, though not in Mahout:
> https://github.com/jpatanooga/KnittingBoar/wiki/Code-Development-Notes
>

Yes.  That is a personal project of Josh Patterson's.  He should comment on
it.

It appears to be based on parameter averaging [1], which is an OK approach,
but I think that you can do better.  I would generally recommend an
alternative with asynchronous parameter updates.  Jeff Dean describes a
nice implementation in [2].  Josh's work is based on an experimental
map-reduce+ implementation (where + indicates iterated reduce similar to
BSP).  The Google learner can be implemented using the standard hack of
long-lived mappers that simply re-read their inputs repeatedly in an
asynchronous way.
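
To make the parameter averaging idea concrete, here is a minimal sketch in
plain Python. It is a toy under stated assumptions, not Josh's code and not
the exact method of [1]: each "mapper" trains logistic regression with SGD on
its own shard, and a single "reduce" step averages the weight vectors once.
The names train_shard, average_parameters and make_data, and the synthetic
data, are made up for illustration.

```python
import math
import random

def sigmoid(z):
    # Split on sign so math.exp never sees a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_shard(shard, dim, epochs=5, rate=0.1):
    """Plain SGD for logistic regression on one shard (one 'mapper')."""
    w = [0.0] * dim
    for _ in range(epochs):
        for x, y in shard:
            err = y - sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            for i in range(dim):
                w[i] += rate * err * x[i]
    return w

def average_parameters(models):
    """The 'reduce' step: element-wise mean of the per-shard weights."""
    n = len(models)
    return [sum(ws) / n for ws in zip(*models)]

def make_data(n, seed=0):
    """Hypothetical toy data: label is 1 when x1 + x2 > 0."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x1, x2 = rng.uniform(-1, 1), rng.uniform(-1, 1)
        data.append(([1.0, x1, x2], 1 if x1 + x2 > 0 else 0))
    return data

data = make_data(400)
shards = [data[i::4] for i in range(4)]           # four disjoint shards
w = average_parameters([train_shard(s, 3) for s in shards])

correct = sum(
    (sigmoid(sum(wi * xi for wi, xi in zip(w, x))) > 0.5) == (y == 1)
    for x, y in data
)
accuracy = correct / len(data)
```

The shards never exchange information until the final average, which is what
makes this approach easy to fit into map-reduce and also what limits it.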

An alternative BSP implementation can be found in Giraph [3].  All BSP
implementations tend to use batch synchronous update.

GraphLab [4] uses asynchronous updates.  I don't know the details of what
they have available.
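
For contrast with batch synchronous update, here is a toy sketch of
asynchronous updates, again in plain Python. It does not use GraphLab's API
and is not the system described in [2]; it only assumes that "asynchronous"
means workers apply SGD steps to one shared weight vector without a barrier
or lock between them. The names async_worker, train_async and make_data are
hypothetical, and threads stand in for separate machines.

```python
import math
import random
import threading

def sigmoid(z):
    # Split on sign so math.exp never sees a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def async_worker(shared_w, shard, passes, rate):
    """Repeatedly re-read one shard and update the shared weights in place.

    No lock is taken, so a worker may read weights that another worker is
    in the middle of changing. That staleness is the point of the sketch.
    """
    for _ in range(passes):
        for x, y in shard:
            err = y - sigmoid(sum(wi * xi for wi, xi in zip(shared_w, x)))
            for i, xi in enumerate(x):
                shared_w[i] += rate * err * xi

def train_async(shards, dim, passes=5, rate=0.1):
    shared_w = [0.0] * dim                 # the single shared parameter vector
    threads = [
        threading.Thread(target=async_worker, args=(shared_w, s, passes, rate))
        for s in shards
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()                           # wait only at the very end
    return shared_w

def make_data(n, seed=0):
    """Hypothetical toy data: label is 1 when x1 + x2 > 0."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x1, x2 = rng.uniform(-1, 1), rng.uniform(-1, 1)
        data.append(([1.0, x1, x2], 1 if x1 + x2 > 0 else 0))
    return data

data = make_data(400)
w = train_async([data[i::4] for i in range(4)], dim=3)

correct = sum(
    (sigmoid(sum(wi * xi for wi, xi in zip(w, x))) > 0.5) == (y == 1)
    for x, y in data
)
accuracy = correct / len(data)
```

Because thread scheduling varies, the exact weights differ from run to run.
Each worker sees the others' progress immediately instead of waiting for an
averaging step.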


> Also, are there any other parallel MapReduce Classification algorithms in
> Mahout which I failed to mention worth checking out?
>

I think you did a good survey.


[1] http://www.aclweb.org/anthology-new/N/N10/N10-1069.pdf
[2] http://techtalks.tv/talks/57639/
[3] http://incubator.apache.org/giraph/
[4] http://graphlab.org/
