Oh, my commit for the 922 branch did not go through in its entirety for some reason. Need to fix this.
On Sat, Dec 17, 2011 at 3:25 PM, Dmitriy Lyubimov <[email protected]> wrote:
> Yeah, ok, I requested a slightly smaller k+p, so not 4 mins for AB', but it
> should still be slightly under the Bt-job running time (8 mins, perhaps).
>
> On Sat, Dec 17, 2011 at 3:13 PM, Dmitriy Lyubimov <[email protected]> wrote:
>> In my tests, the ABt-job took under 3 minutes per mapper and practically
>> no time for reducing, so it should run at about 4 minutes on a cluster
>> with sufficient capacity (in your case, something like 10-11 nodes, it
>> seemed). Ok, I'll rerun in our QA on Monday to see what's happening.
>>
>> On Sat, Dec 17, 2011 at 3:04 PM, Dmitriy Lyubimov <[email protected]> wrote:
>>> ABt-job 37 mins, 22 sec
>>>
>>> This one should run in under the Bt-job time (under 9 minutes in your
>>> case), I think; in my tests it did. Is this with the 922 patch?
>>>
>>> And it should be mentioned that the cluster couldn't accommodate all
>>> the generated tasks at once. Is this a correct assessment?
>>>
>>> On Sat, Dec 17, 2011 at 2:58 PM, Sebastian Schelter <[email protected]> wrote:
>>>> On 17.12.2011 17:27, Dmitriy Lyubimov wrote:
>>>>> Interesting.
>>>>>
>>>>> Well, so how did your decomposition go?
>>>>
>>>> I tested the decomposition of the wikipedia pagelink graph (130M edges,
>>>> 5.6M vertices, making approx. a quarter of a billion non-zeros in the
>>>> symmetric adjacency matrix) on a 6-machine hadoop cluster.
>>>>
>>>> I got these running times for k = 10, p = 5 and one power iteration:
>>>>
>>>> Q-job     1 min, 41 sec
>>>> Bt-job    9 mins, 30 sec
>>>> ABt-job  37 mins, 22 sec
>>>> Bt-job    9 mins, 41 sec
>>>> U-job    30 sec
>>>>
>>>> I think I'd need a couple more machines to handle the twitter graph,
>>>> though...
>>>>
>>>> --sebastian
>>>>
>>>>> On Dec 17, 2011 6:00 AM, "Sebastian Schelter" <[email protected]> wrote:
>>>>>
>>>>>> Hi there,
>>>>>>
>>>>>> I've lately been playing with Mahout to decompose the adjacency
>>>>>> matrices of large graphs. I stumbled on a paper by Christos Faloutsos
>>>>>> that describes a variation of the Lanczos algorithm they run for this
>>>>>> on top of Hadoop. They even explicitly mention Mahout:
>>>>>>
>>>>>> "Very recently (March 2010), the Mahout project [2] provides
>>>>>> SVD on top of HADOOP. Due to insufficient documentation, we were not
>>>>>> able to find the input format and run a head-to-head comparison. But,
>>>>>> reading the source code, we discovered that Mahout suffers from two
>>>>>> major issues: (a) it assumes that the vector (b, with n=O(billion)
>>>>>> entries) fits in the memory of a single machine, and (b) it implements
>>>>>> the full re-orthogonalization which is inefficient."
>>>>>>
>>>>>> http://www.cs.cmu.edu/~ukang/papers/HeigenPAKDD2011.pdf
>>>>>>
>>>>>> --sebastian
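[Editor's note: for reference, the job sequence in the timings above maps onto the randomized-SVD formulation (Halko et al.), which Mahout's SSVD is, as far as I know, based on. A rough sketch; the formula-to-job mapping is my own reading rather than something stated in the thread. Here A is the m x n input and Omega an n x (k+p) random matrix:

    Q-job:    Y_0 = A \Omega,             Q_0 R_0 = Y_0   (thin QR)
    Bt-job:   B_0^T = A^T Q_0
    ABt-job:  Y_1 = A B_0^T = A A^T Q_0,  Q_1 R_1 = Y_1   (one power iteration)
    Bt-job:   B_1^T = A^T Q_1
    U-job:    B_1 = \hat{U} \Sigma V^T  (small SVD),  U = Q_1 \hat{U}

This would also explain why one power iteration re-runs the Bt-job: each pass through A A^T needs a fresh B^T computed against the re-orthogonalized Q.]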

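[Editor's note: for context on points (a) and (b) in the quoted passage, which refer to Mahout's Lanczos solver rather than SSVD: the Lanczos recurrence builds an orthonormal basis one length-n vector at a time,

    \beta_{j+1} v_{j+1} = A v_j - \alpha_j v_j - \beta_j v_{j-1},
    \alpha_j = v_j^T A v_j,

and full re-orthogonalization additionally subtracts the projection onto every previous basis vector,

    v_{j+1} <- v_{j+1} - \sum_{i=1}^{j} (v_i^T v_{j+1}) v_i,

which means keeping all j basis vectors of n = O(billion) entries available and doing O(jn) extra work per step. This is a sketch of the textbook recurrence; how exactly Mahout implemented it at the time cannot be confirmed from the thread.]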