This is entirely in-core MapReduce. That's valid, but one of the major
points of MapReduce as we know it is distributing the computation over many
machines (e.g. Hadoop). Eventually you outgrow just one computer. That
said... we continue to see bigger and bigger machines available. I can rent
a machine with 224GB of RAM on EC2 now...

In practical terms -- this is "MapReduce" but implemented on a completely
different framework. It would have nothing to do with Mahout, though you
might be able to reimplement it. If you were going to implement it again
anyway -- MapReduce is probably not the best choice. It is worth its price
in complexity when it lets you leverage Hadoop and the like to deal with
machine failure; you don't have that problem here. You also don't
necessarily have to structure the computation so that workers have no means
of communicating, because it's in-core. M/R implementations have to
compromise to meet these constraints, and if they're not actually
constraints (inside a GPU), the result is just more complex and sub-optimal.
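To make the point concrete: on a shared-memory machine, the "map" and
"reduce" phases can collapse into an ordinary parallel computation with no
shuffle, no serialization, and no distributed file system. A minimal sketch
using plain Java parallel streams (a toy word count; the names and data are
mine, not from any framework):

```java
import java.util.Arrays;
import java.util.Map;
import java.util.stream.Collectors;

public class InCoreMapReduce {
    public static void main(String[] args) {
        // All data lives in one address space -- no HDFS, no shuffle,
        // no serialization between "map" and "reduce".
        String[] lines = { "a b a", "b c", "a c c" };
        Map<String, Long> counts = Arrays.stream(lines)
                .parallel()                                // workers share memory
                .flatMap(l -> Arrays.stream(l.split(" ")))  // "map": line -> tokens
                .collect(Collectors.groupingByConcurrent(   // "reduce": token -> count
                        t -> t, Collectors.counting()));
        System.out.println(counts.get("a")); // prints 3
    }
}
```

The fault tolerance and no-communication constraints that a real M/R
framework pays for simply don't arise here; the runtime hands out work to
threads that share one heap.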



On Wed, Feb 20, 2013 at 11:05 AM, shruti ranade <[email protected]> wrote:

> That was of great help. Thanks for the input. There is something called
> *Mars* for accelerating MapReduce using GPUs. I am not quite sure about
> this, but here's the link <http://www.cse.ust.hk/gpuqp/Mars_tr.pdf>. This
> paper might give you a better idea of what we are trying to achieve. And
> we are thinking of using JCUDA for parallelizing things on an Nvidia
> Tesla. We are still at ground zero. But if we can accelerate the
> performance of Mahout's k-means on a GPU, I guess it will be a huge thing.
>
> P.S.: Perhaps the paper will turn the artificial marriage into something
> better.
>
>
> On Wed, Feb 20, 2013 at 4:08 PM, Sean Owen <[email protected]> wrote:
>
> > I think this is quite possible too. I just think there's little point
> > in matching this up with Hadoop. They represent entirely different
> > architectures for large-scale computation. I mean, you can probably
> > write an M/R job that uses GPUs on workers, but I imagine it would be
> > an artificial marriage of technologies, with Hadoop being used simply
> > to distribute data.
> >
> > If you want to use a GPU, and want to use it properly, most of your
> > work is to create an effective in-core parallel implementation, not one
> > distributed across computers and distributed file systems. You use JNI
> > or CUDA bindings in Java to push computations into the hardware.
> >
> > This is an exercise in a) modifying a matrix/vector library to use native
> > hardware, then b) writing algorithms that use that library. I think your
> > best starting point in Java may be something more general like Commons
> > Math.
> >
> >
> >
> >
> > On Wed, Feb 20, 2013 at 10:22 AM, 万代豊 <[email protected]> wrote:
> >
> > > This is an agenda that I'm interested in too.
> > > I believe Item-Based Recommendation in Mahout (not only in Mahout,
> > > though) spends some time doing multiplication of the cooccurrence
> > > matrix and the user preference vector.
> > > If we could offload this multiplication task to a GPGPU, that would
> > > be a great acceleration.
> > > What I'm not really clear on is how a double-precision multiplication
> > > task inside the Java Virtual Machine can take advantage of the HW
> > > accelerator. (I mean, how can you make the GPGPU visible to Mahout
> > > through the JVM?)
> > >
> > > If we could get over this, in addition to what Ted Dunning presented
> > > the other day on Solr involvement in building/loading the
> > > cooccurrence matrix for Mahout recommendation, it should be a big
> > > leap in innovating Mahout recommendation.
> > >
> > > Am I missing something or just dreaming?
> > > Regards,
> > > Y.Mandai
> > >
> > > 2013/2/20 Sean Owen <[email protected]>
> > >
> > > > I think all of the code uses double-precision floats. I imagine
> > > > much of it could work as well with single-precision floats.
> > > >
> > > > MapReduce and a GPU are very different things though, and I'm not
> > > > sure how you would use both together effectively.
> > > >
> > > >
> > > > On Wed, Feb 20, 2013 at 7:10 AM, shruti ranade
> > > > <[email protected]> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > I am a beginner in Mahout. I am working on the k-means MR
> > > > > implementation and trying to run it on a GPGPU. I wanted to know
> > > > > if Mahout computations are all double precision or single
> > > > > precision.
> > > > >
> > > > > Please suggest any documentation that I should refer to.
> > > > >
> > > > > Thanks,
> > > > > Shruti
> > > > >
> > > >
> > >
> >
>
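As an aside on the cooccurrence multiplication mentioned in the thread: the
item-based scoring kernel is just a matrix-vector product, score = C * p,
where C is the item-item cooccurrence matrix and p is a user's preference
vector. That is exactly the kind of kernel a GPU BLAS routine accelerates.
A toy in-core Java version for illustration (the names and data are mine,
not Mahout's actual API):

```java
public class CooccurrenceScore {
    // score = C * p: C is the item-item cooccurrence matrix,
    // p the user preference vector. This dense loop is what one
    // would push to GPU BLAS (e.g. via CUDA bindings) for large C.
    static double[] multiply(double[][] c, double[] p) {
        double[] score = new double[c.length];
        for (int i = 0; i < c.length; i++) {
            double s = 0.0;
            for (int j = 0; j < p.length; j++) {
                s += c[i][j] * p[j];
            }
            score[i] = s;
        }
        return score;
    }

    public static void main(String[] args) {
        double[][] c = { {0, 2, 1}, {2, 0, 3}, {1, 3, 0} };
        double[] p = { 1, 0, 1 };  // user liked items 0 and 2
        double[] s = multiply(c, p);
        System.out.println(s[1]);  // prints 5.0: item 1 scores highest
    }
}
```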
