I would like to endorse this point. If your sparse data fits in memory on a single machine, it is very unlikely that you will be able to improve on the cost of doing a stochastic projection on that one machine using any Hadoop based solution.
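For anyone who wants to try the single-machine route, here is a minimal sketch of a stochastic projection SVD in NumPy, along the lines of the randomized method in the Halko dissertation mentioned further down the thread. It is not Mahout's SSVD code. The function name `stochastic_svd` and the parameters `oversample` and `power_iters` are my own illustrative choices, and `A` is assumed to be anything that supports `A @ X` and `A.T @ X` (a dense array here; a scipy.sparse matrix should behave the same way).

```python
import numpy as np


def stochastic_svd(A, k, oversample=10, power_iters=1, seed=0):
    """Approximate rank-k SVD of A via a random projection (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    ell = min(k + oversample, m, n)  # width of the random sketch

    # Project A onto a random ell-dimensional subspace and orthonormalize.
    omega = rng.standard_normal((n, ell))
    Q, _ = np.linalg.qr(A @ omega)

    # Optional power iterations; they help when singular values decay slowly.
    for _ in range(power_iters):
        Z, _ = np.linalg.qr(A.T @ Q)
        Q, _ = np.linalg.qr(A @ Z)

    # Solve the small (ell x n) problem exactly, then map back to m dimensions.
    B = (A.T @ Q).T  # equals Q.T @ A, written so only A.T @ X is required
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ Ub
    return U[:, :k], s[:k], Vt[:k, :]


if __name__ == "__main__":
    # Small made-up test matrix of exact rank 5.
    rng = np.random.default_rng(1)
    A = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 100))
    U, s, Vt = stochastic_svd(A, k=5)
    print("relative error:", np.linalg.norm(U * s @ Vt - A) / np.linalg.norm(A))
```

The only passes over `A` are the products `A @ X` and `A.T @ X`, which is why this stays cheap when the sparse matrix already sits in memory.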
Even with MPI and crazy RDMA networking, I doubt that you would beat it by much, if at all.

On Wed, Aug 1, 2012 at 12:36 PM, Dmitriy Lyubimov <[email protected]> wrote:

> also as Lance mentioned, usually "coefficient of performance" per core
> for distributed methods is lower than that of an iterative method. It
> is hard (if even possible) to achieve 100% scalability here. Simply
> put, if you have 5 computers to solve the same problem, it will not be
> solved 5 times faster than a comparable method on a single computer.
>
> On Wed, Aug 1, 2012 at 11:29 AM, Dmitriy Lyubimov <[email protected]> wrote:
> > I only know of comparisons between parallel algorithms. There's a
> > performance and accuracy comparison between Mahout's SSVD and Lanczos
> > done in the dissertation of N. Halko (see link at SSVD page on Mahout
> > wiki). There's also a "Heigen" SVD paper that discusses a distributed
> > modified Lanczos method of a proprietary Hadoop-based implementation
> > at Yahoo. Even though it doesn't draw side-by-side comparisons, it
> > does present benchmark figures for the Heigen implementation so one
> > can approximately draw comparisons between Heigen and Mahout methods.
> >
> > w.r.t. parallel vs. non-parallel, IMO the bottom line is
> > practicality, not necessarily speed. There are some SVD problems for
> > which one might argue that a single-computer solution is not practical
> > and which a distributed algorithm may actually shift into the realm of
> > practical solutions (in the sense that you don't need days to solve
> > it). But IMO direct comparison still doesn't make a lot of sense.
> >
> > On Sat, Jul 28, 2012 at 9:27 AM, mohsen jadidi <[email protected]> wrote:
> >> Thank you for your replies. What I am interested to know is: if I
> >> want to compute the SVD of a huge matrix, how much faster does my
> >> computation get by using Mahout?
> >>
> >> On Fri, Jul 27, 2012 at 8:12 PM, Dmitriy Lyubimov <[email protected]> wrote:
> >>
> >>> IMO it doesn't make much sense to compare a non-parallel and a parallel
> >>> algorithm (assuming they are running approximately the same flops-sized
> >>> computation). Which is probably why there are not so many (i don't know
> >>> any).
> >>>
> >>> However, there are studies comparing parallel approaches (e.g. certain
> >>> mahout vs. giraph methods) given the same amount of flops capacity in a
> >>> cluster, but i think you need to be more specific because there are
> >>> too many areas of interest you are talking about.
> >>>
> >>> On Fri, Jul 27, 2012 at 8:57 AM, mohsen jadidi <[email protected]> wrote:
> >>> > Hey all,
> >>> >
> >>> > I am looking for case studies that have evaluated some of Mahout's
> >>> > algorithm implementations, such as the different decompositions or
> >>> > classifiers. I just want to know how much faster Mahout is compared
> >>> > with regular non-parallel algorithms. I couldn't find anything useful.
> >>> >
> >>> > Thanks in advance,
> >>> >
> >>> > --
> >>> > Mohsen Jadidi
> >>>
> >>
> >> --
> >> Mohsen Jadidi
