On Sunday, 2 September 2012, Dmitriy Lyubimov wrote:
> I'll take a look although it may take me a while to find time.
>
> I have SSVD flow with power iterations in R and i collate results from
> that and java version. I don't immediately have a code to convert from
> csv/R to Mahout (only in reverse direction)

Is this from Danny Bickson useful here?
http://bickson.blogspot.com/2011/02/mahout-svd-matrix-factorization.html

Dan

(pls excuse formatting; using webmail on a phone)

> which is a shame, i
> should've made more progress on R and Mahout integration. Which is why
> it will take me time.
>
> Indeed, SSVD is for large matrices and we did accuracy comparisons on
> large data sets (e.g. wikipedia). LLNL doesn't work as good on small
> matrices.
>
> However if i find that uniform 0-mean distribution is problematic for
> small problems, i may want to do a patch to address that.
>
> Also the fact that your data in the tail has a flat spectrum adds to
> error a lot. I.e. what you have is one principal direction and a lot
> of random noise around it, and like i said before, random data is not
> going to produce good results beyond that one principal direction.
> Detecting a trend in what is mostly noise is not working well with
> this method esp. in conjunction with so few samples.
>
> but there is still a concern in a sense that power iterations
> should've helped more than they did. I'll take a closer look but it
> will take me a while to figure if there's something we can improve
> here.
>
> One thing is the way to read U matrix: strictly speaking, rows of U
> matrix are not necessarily coming in the same order as rows of A in
> the final output. But they are keyed by the same keys (so it is
> possible that what you think is U[1,] is actually something else). But
> it will take me some time to verify that.
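On the row-order point just quoted: a minimal sketch of lining up keyed rows of U with the row order of A before comparing them against another SVD. It assumes the keys are the integer row indices used when A was written, and that the rows of U have already been read into a map. KeyedRows and sortByKey are hypothetical names for illustration, not Mahout API.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper (not part of Mahout): orders keyed rows of U by key. */
final class KeyedRows {

    /**
     * Returns a copy of the rows in ascending key order.
     * Assumption: the keys are the same integer row indices as in A.
     */
    static double[][] sortByKey(Map<Integer, double[]> keyedRows) {
        // A TreeMap iterates in ascending key order regardless of insertion order.
        TreeMap<Integer, double[]> sorted = new TreeMap<>(keyedRows);
        double[][] dense = new double[sorted.size()][];
        int i = 0;
        for (double[] row : sorted.values()) {
            dense[i++] = row.clone();
        }
        return dense;
    }

    public static void main(String[] args) {
        // Rows arrive in arbitrary order, as they might from a distributed job.
        Map<Integer, double[]> u = new HashMap<>();
        u.put(2, new double[] {0.3, 0.4});
        u.put(0, new double[] {0.1, 0.2});
        u.put(1, new double[] {0.5, 0.6});

        double[][] aligned = sortByKey(u);
        // After sorting, aligned[0] is the row that was stored under key 0.
        System.out.println(Arrays.toString(aligned[0]));
    }
}
```

Sorting by key rather than trusting output order means the row compared as U[1,] is the one that corresponds to the first row of A, whatever order the rows were written in.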
>
>
> On Sat, Sep 1, 2012 at 9:26 PM, Ahmed Elgohary <[email protected]> wrote:
> > - I am using k = 30 and p = 2 so (k+p)<99 (Rank(A))
> > - I am attaching the csv file of the matrix A
> > - yes, the difference is significant. Here is the output of the
> >   sequential SSVD:
> > u(1,1:10):
> > -0.0987 0.1334 0.1676 -0.0251 -0.2201 -0.0629 -0.0601 -0.0575 0.0079 0.0519
> >
> > and the output of matlab's svd:
> > u(1,1:10):
> > -0.0987 -0.1320 0.1662 0.0492 -0.1828 0.1156 -0.0678 0.0504 -0.0160 0.0350
> >
> > and the output of mahout's SSVD:
> > u(1,1:10):
> > 0.0962 0.1924 -0.2125 0.1668 -0.0188 0.0867 -0.0908 0.0264 -0.0443 0.0207
> >
> >
> > - The code I am using is like this:
> > //convert A.csv to a sequencefile /data/A
> > SSVDSolver ssvdSolver = new SSVDSolver(new Configuration(),
> >     new Path[] { new Path("/data/A") }, new Path("/ssvd/output"), 1000, 30, 2, 1);
> > ssvdSolver.setQ(1);
> > ssvdSolver.run();
> > //convert ssvdSolver.getUPath() to a csv file
> >
> > I am not sure what you mean:
> >
> > "Did you account for the fact that your matrix is small enough that it
> > probably wasn't divided correctly?"
> >
> > --ahmed
> >
> >
> > On Sat, Sep 1, 2012 at 10:52 AM, Dmitriy Lyubimov <[email protected]> wrote:
> >>
> >> No its zero mean uniform of course. A murmur scaled to -1...1 range.
> >>
> >> I used to use normal too but you advised there were not much difference
> >> and i actually did not see much either.
> >>
> >> I also think that in this case me moving the input to R via decimals
> >> actually created precision errors too. I will double check. And my
> >> synthetic test input has a flat tail in the lower singular numbers which
> >> of course messes up some singular vectors in the tail but doesnt affect
> >> singular values. I will check for these things and look again.
> >> But i dont see a fundamental problem with the results i see, they are
> >> the same down to eighth digit after the dot, so there is no fundamental
> >> problem here.
> >> On Sep 1, 2012 1:03 AM, "Ted Dunning" <[email protected]> wrote:
> >>
> >> > Oho...
> >> >
> >> > If the uniform randoms have non-zero means, then this could be a
> >> > significant effect that leads to some loss of significance in the
> >> > results. For small matrices the resulting difference shouldn't be
> >> > huge but it might well be observable.
> >> >
> >> > On Sat, Sep 1, 2012 at 3:45 AM, Dmitriy Lyubimov <[email protected]> wrote:
> >> >
> >> > > sorry, i meant "random trinary"
> >> > >
> >> > > On Sat, Sep 1, 2012 at 12:39 AM, Dmitriy Lyubimov <[email protected]> wrote:
> >> > > > Hm. there is slight error between R full rank SVD and Mahout MR SSVD
> >> > > > for my unit test modified for 100x100 k= 3 p=10.
> >> > > >
> >> > > > First left vector (R/SSVD) :
> >> > > > > s$u[,1]
> >> > > > [1] -0.050741660 -0.083985411 0.078767108 -0.044487425 -0.010380367
> >> > > > [6] 0.069635451 0.158337400 0.029102044 -0.168156173 -0.127921554
> >> > > > [11] 0.012698809 -0.027140724 0.069357925 -0.015605283 0.076614201
> >> > > > [16] -0.158582188 0.143656275 0.033886221 -0.055111330 -0.029299261
> >> > > > [21] 0.059667350 0.039205405 0.042027376 0.048541162 0.158267382
> >> > > > [
