On Wed, May 2, 2012 at 11:13 AM, Ted Dunning <[email protected]> wrote:

> On Wed, May 2, 2012 at 11:06 AM, Timothy Potter <[email protected]
> >wrote:
>
> > We're really keen on Ted's pig-vector project
> > (https://github.com/tdunning/pig-vector) as we're building a number of
> > classifiers on Mahout's SGD framework, with the bulk of our data being
> > in Cassandra processed almost entirely with Pig. We'd love to hear
> > about any planned features for the pig-vector project we can help out
> > on. Any similar Pig-Mahout projects we should know about?
> >
>
> The huge problem with pig-vector is that dependency on elephant bird makes
> it really almost impossible to build.  Elephant bird has obscure
> dependencies on things like yaml-beans.  That is a problem because the
> yaml-beans maintainer has a forceful way of expressing his distaste for all
> things to do with Maven and thus refuses to publish any artifacts in
> standard ways.  Actually, the maintainer has a rather forceful manner that
> he applies to all interactions as far as I can tell.
>
> On the other hand, the necessary capabilities that pig-vector needs from
> Elephant bird are quite minor and could probably be reasonably extracted.  I
> am underwater, however, and thus cannot finish that right away.  I can and
> will assist anybody who has the necessary time and enthusiasm.  This might
> make a very nice pig day effort.
>
>
> > In general, we're reaching out today to see who else in the community
> > is interested in better Pig / Mahout integration and what types of
> > challenges they're facing? Any cool UDFs you'd like to share?
> >
>
> Praneet at UCI ([email protected]) has been doing some interesting
> work here to do with feature sharding in pig.  Perhaps he can speak up.
>

Hello Timothy,

I have tried writing sharded versions of classifiers and they seem to work
well. But my code requires a pre-processing step before the classification
and re-aggregation of results (which was easy when I worked with Weka).
However, to be able to do the same in Mahout, I need something like
pig-vector to take care of the pre-processing part.

So yes, I am very interested in Pig / Mahout integration! But admittedly I
only have introductory knowledge of Pig. And as far as the integration part
goes, my contribution so far has been limited to testing the code Ted has
written.

But the idea of a Pig-Mahout hackday sounds great! I would definitely
like to be involved in it.



-- 
Praneet Mhatre
Graduate Student
Donald Bren School of ICS
University of California, Irvine
