It does have something to do with k. Previously I used a formula to determine the rank:

rank = N - p - 1 = 64 - 5 - 1 = 58

where N is the number of columns of the original matrix. Then I tried rank = 50, and it worked.

Well... as I write this email, I realize that the reason might be that the actual rank R of the original matrix may be much smaller than N. But it is a bit difficult to figure out R beforehand.

Thanks,
Yang

On Fri, Oct 31, 2014 at 5:01 PM, Dmitriy Lyubimov <[email protected]> wrote:

> is the matrix by any chance constructed so that it may have rank < k? I
> think MR code is not checking for that.
>
> In spark shell i have :
>
> mahout> val a = dense( (0,0),(0,0) )
> a: org.apache.mahout.math.DenseMatrix =
> {
>  0 => {}
>  1 => {}
> }
> mahout> svd(a)
> res0: (org.apache.mahout.math.Matrix, org.apache.mahout.math.Matrix,
> org.apache.mahout.math.DenseVector) =
> ({
>  0 => {0:1.0}
>  1 => {1:1.0}
> },{
>  0 => {0:-1.0}
>  1 => {1:-1.0}
> },{})
>
> But :
>
> mahout> ssvd(a,2,0)
>
> java.lang.AssertionError: assertion failed: Rank-deficiency detected during
> s-SVD
>
> or
>
> mahout> val drmA = drmParallelize(a,2)
> mahout> dssvd(drmA, k=2)
> java.lang.IllegalArgumentException: R is rank-deficient.
>
> the MR version doesn't check for these effects and it may create some
> degenerate results, although i thought those should be 0s, at least when
> -q=0. I am not sure for -q=1,2...
>
> On Thu, Oct 30, 2014 at 10:35 PM, Yang <[email protected]> wrote:
>
> > i am talking about the MR one.
> >
> > thanks
> > yang
> >
> > On Oct 30, 2014 8:16 PM, "Dmitriy Lyubimov" <[email protected]> wrote:
> >
> > > This is not a known problem...
> > >
> > > there are few ssvd here, sequential, MR and spark one. for the record,
> > > which one are you running?
> > >
> > > On Thu, Oct 30, 2014 at 4:37 PM, Yang <[email protected]> wrote:
> > >
> > > > we are running ssvd on a dataset (this one is relatively small, with
> > > > 8000 rows, number of columns is 64 ), we ran it with rank = 58, since
> > > > sampling p=5.
> > > >
> > > > the result had NaN on multiple columns.
> > > >
> > > > why would this appear ?
> > > >
> > > > I am now running with lower rank=20 , to see if it goes away.
> > > >
> > > > Thanks
> > > > Yang
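
On figuring out R beforehand: for a matrix this small (64 columns), one rough option is to estimate the numerical rank in memory before choosing k. The sketch below is plain Scala and does not use any Mahout API. `RankSketch`, `numericalRank` and `rankBudget` are hypothetical names introduced only for illustration, the tolerance is an absolute threshold that would need tuning to the scale of the data, and reusing the formula above with R in place of N is my assumption, not a documented Mahout rule.

```scala
// Hypothetical helper, not part of Mahout: estimates the numerical rank of a
// small dense matrix by Gaussian elimination with partial pivoting.
object RankSketch {

  /** Counts the pivots whose absolute value exceeds `tol` (absolute tolerance). */
  def numericalRank(a: Array[Array[Double]], tol: Double = 1e-9): Int = {
    val m = a.map(_.clone)                 // work on a copy, leave the input intact
    val rows = m.length
    val cols = if (rows == 0) 0 else m(0).length
    var rank = 0
    var col = 0
    while (col < cols && rank < rows) {
      // Row at or below `rank` with the largest entry in this column.
      val pivot = (rank until rows).maxBy(r => math.abs(m(r)(col)))
      if (math.abs(m(pivot)(col)) > tol) {
        val tmp = m(pivot); m(pivot) = m(rank); m(rank) = tmp
        // Eliminate this column from every row below the pivot row.
        for (r <- rank + 1 until rows) {
          val f = m(r)(col) / m(rank)(col)
          for (c <- col until cols) m(r)(c) -= f * m(rank)(c)
        }
        rank += 1
      }
      col += 1
    }
    rank
  }

  /** Assumption: the formula from the message above, with R substituted for N. */
  def rankBudget(r: Int, p: Int): Int = math.max(0, r - p - 1)
}

// The all-zero matrix from the quoted shell session has rank 0.
println(RankSketch.numericalRank(Array(Array(0.0, 0.0), Array(0.0, 0.0))))  // prints 0
// The second row is twice the first, so the rank is 1.
println(RankSketch.numericalRank(Array(Array(1.0, 2.0), Array(2.0, 4.0))))  // prints 1
// With R = N = 64 and p = 5 this reproduces the value used originally.
println(RankSketch.rankBudget(64, 5))                                       // prints 58
```

If the estimated R comes out below 64, `rankBudget(R, 5)` would give a smaller k to try than 58. Whether that is enough to avoid the NaN columns in the MR version is something I have not verified.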
