OK, so that's what I suspected. The method is generally not intended to run on inputs whose rank is smaller than k+p. The MR version doesn't even check for it.
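For what it's worth, a pre-flight check along these lines could be sketched as below. This is a standalone illustration in plain Python, not Mahout code: the names `numerical_rank` and `check_ssvd_params` and the tolerance `tol` are made up for the example, and Gaussian elimination with a fixed tolerance is only a rough stand-in for a proper SVD-based rank estimate.

```python
def numerical_rank(rows, tol=1e-9):
    """Estimate the rank of a dense matrix (list of rows) by Gaussian
    elimination with partial pivoting. `tol` is an assumed relative
    threshold below which a pivot is treated as zero."""
    m = [[float(x) for x in r] for r in rows]
    n_rows = len(m)
    n_cols = len(m[0]) if m else 0
    # Scale the threshold by the largest absolute entry of the input.
    scale = max((abs(x) for r in m for x in r), default=0.0)
    if scale == 0.0:
        return 0  # all-zero (or empty) matrix
    thresh = tol * scale
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        # Choose the remaining row with the largest entry in this column.
        pivot = max(range(rank, n_rows), key=lambda i: abs(m[i][col]))
        if abs(m[pivot][col]) <= thresh:
            continue  # no usable pivot: column depends on earlier ones
        m[rank], m[pivot] = m[pivot], m[rank]
        for i in range(rank + 1, n_rows):
            f = m[i][col] / m[rank][col]
            if f != 0.0:
                for j in range(col, n_cols):
                    m[i][j] -= f * m[rank][j]
        rank += 1
    return rank


def check_ssvd_params(rows, k, p):
    """Return (estimated rank, whether k + p fits within that rank)."""
    r = numerical_rank(rows)
    return r, (k + p <= r)


if __name__ == "__main__":
    # The all-zero 2x2 matrix has rank 0, so k=2 with p=0 does not fit.
    print(check_ssvd_params([[0, 0], [0, 0]], k=2, p=0))
```

For a matrix with only 64 columns this kind of check is cheap enough to run before picking k, although the result depends on the tolerance chosen.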
However, as I mentioned in the manual, I did run tests with -q=0, in which case the corresponding right singular vectors should be reset to 0.0, not NaNs. It is possible that with -q=1 the power iterations do something inadmissible in that situation. Just for the record, what -q setting did you use?

On Mon, Nov 3, 2014 at 2:00 PM, Yang <[email protected]> wrote:

> it does have something to do with K. previously I used a formula to
> determine my rank to use by
>
> rank = N - p - 1 = 64 - 5 - 1 = 58, where N is the number of columns of
> the original matrix.
>
> then I tried using rank = 50, it worked.
>
> well.... as I write this email, I realized that the reason might be that
> the actual rank R of the original matrix may be much smaller than N, that
> could be the reason. but it is a bit difficult to figure out that R
> beforehand.
>
>
> thanks
> Yang
>
> On Fri, Oct 31, 2014 at 5:01 PM, Dmitriy Lyubimov <[email protected]>
> wrote:
>
> > is the matrix by any chance constructed so that it may have rank < k? I
> > think MR code is not checking for that.
> >
> > In spark shell i have :
> >
> > mahout> val a = dense( (0,0),(0,0) )
> > a: org.apache.mahout.math.DenseMatrix =
> > {
> >  0 => {}
> >  1 => {}
> > }
> > mahout> svd(a)
> > res0: (org.apache.mahout.math.Matrix, org.apache.mahout.math.Matrix,
> > org.apache.mahout.math.DenseVector) =
> > ({
> >  0 => {0:1.0}
> >  1 => {1:1.0}
> > },{
> >  0 => {0:-1.0}
> >  1 => {1:-1.0}
> > },{})
> >
> > But :
> >
> > mahout> ssvd(a,2,0)
> >
> > java.lang.AssertionError: assertion failed: Rank-deficiency detected during
> > s-SVD
> >
> > or
> > mahout> val drmA = drmParallelize(a,2)
> > mahout> dssvd(drmA, k=2)
> > java.lang.IllegalArgumentException: R is rank-deficient.
> >
> >
> > the MR version doesn't check for these effects and it may create some
> > degenerate results, although i thought those should be 0s, at least when
> > -q=0. I am not sure for -q=1,2...
> >
> > On Thu, Oct 30, 2014 at 10:35 PM, Yang <[email protected]> wrote:
> >
> > > i am talking about the MR one.
> > >
> > > thanks
> > > yang
> > > On Oct 30, 2014 8:16 PM, "Dmitriy Lyubimov" <[email protected]> wrote:
> > >
> > > > This is not a known problem...
> > > >
> > > > there are few ssvd here, sequential, MR and spark one. for the record,
> > > > which one are you running?
> > > >
> > > > On Thu, Oct 30, 2014 at 4:37 PM, Yang <[email protected]> wrote:
> > > >
> > > > > we are running ssvd on a dataset (this one is relatively small, with
> > > > > 8000 rows, number of columns is 64 ), we ran it with rank = 58, since
> > > > > sampling p=5.
> > > > >
> > > > > the result had NaN on multiple columns.
> > > > >
> > > > > why would this appear ?
> > > > >
> > > > > I am now running with lower rank=20 , to see if it goes away.
> > > > >
> > > > >
> > > > > Thanks
> > > > > Yang
