In 1), I meant rank(A) <= k+p, of course.
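To make 1) and 2) below a bit more concrete, here is a minimal sanity check you could run on your own geometry before submitting the job -- plain Java, nothing from the Mahout API, and the names (m, n, splitRows, etc.) are purely illustrative, not something the SSVD code itself exposes:

    // Sanity check for the SSVD geometry constraints discussed below (illustrative only).
    // m, n      : dimensions of A
    // k, p      : decomposition rank and oversampling
    // splitRows : number of rows of A that each mapper split (or input file) contributes
    public final class SsvdGeometryCheck {
      public static void check(long m, long n, int k, int p, long[] splitRows) {
        long kp = (long) k + p;
        // 1) k+p must not exceed min(m, n)
        if (kp > Math.min(m, n)) {
          throw new IllegalArgumentException(
              "k+p=" + kp + " exceeds min(m,n)=" + Math.min(m, n));
        }
        // 2) every mapper split (and hence every input file) must have at least k+p rows
        for (long rows : splitRows) {
          if (rows < kp) {
            throw new IllegalArgumentException(
                "a split has only " + rows + " rows, need at least k+p=" + kp);
          }
        }
      }
    }

If 2) is what is failing, the usual remedy is to raise the minimum split size (on older Hadoop that is the mapred.min.split.size property; on newer versions, mapreduce.input.fileinputformat.split.minsize). Whether the ssvd command line itself exposes an equivalent option depends on your Mahout version, so treat that part as something to verify against your build.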
On Sun, Oct 7, 2012 at 3:17 PM, Dmitriy Lyubimov <[email protected]> wrote:
> So, to summarize the answer to your question:
>
> 1) k+p <= min(m,n). (SVD rank deficiency is a necessary but not actually
> sufficient requirement. Most strictly, rank(A)<=min(k+p) -- but this
> stricter requirement will not cause the problem message you are
> seeing.)
> 2) The number of rows of A in _every mapper split_ of the input A is at
> least k+p. Note that in a situation with multiple input files, this also
> implies that the number of rows in every input file is at least k+p.
>
> -d
>
> On Sun, Oct 7, 2012 at 3:02 PM, Dmitriy Lyubimov <[email protected]> wrote:
>> PS. Also keep in mind that if you use multiple files as input, and
>> their sizes are smaller than the HDFS split size, some splits will
>> have reduced size even if the total input size looks benign. I think
>> there was at least one case (which also pertains to the "problem too
>> small" case) where a user discovered that one of the input files of
>> the distributed row matrix had fewer than k+p rows, hence setting off
>> this block height deficiency problem.
>>
>> -d
>>
>> On Sun, Oct 7, 2012 at 2:55 PM, Dmitriy Lyubimov <[email protected]> wrote:
>>> Ahmed, if you are getting this, in all cases where people have
>>> reported it, it meant their problem was too small. If A has m x n
>>> geometry, then it must be true that k+p<=min(m,n).
>>>
>>> Another possible reason is if the height of the blocks of A created
>>> in the mappers is less than k+p. In practice we have yet to see a
>>> problem that actually runs into this condition (although it is
>>> definitely possible if you occasionally have very dense, very large
>>> row vectors that take up enough space to create a split block height
>>> problem). If it is indeed a split block height problem, then the
>>> remedy is to increase the split size, either by a Hadoop parameter or
>>> (I think) one of the SSVD command line parameters. Although, like I
>>> said, nobody has run into the block height deficiency problem yet, so
>>> I have no knowledge of a verified resolution of this problem by means
>>> of manipulating the Hadoop parameter setup in Mahout.
>>>
>>> -d
>>>
>>> On Sun, Oct 7, 2012 at 2:16 PM, Ahmed Elgohary <[email protected]> wrote:
>>>> Hi,
>>>>
>>>> Can someone list all the constraints on the parameters (k, p &
>>>> aBlockRows) that should be satisfied in order for the Q-job in ssvd
>>>> to work fine? I tried many values and made sure that k+p<=m, k+p<=n,
>>>> and p is in the range 20..200, but I am still getting the errors:
>>>> "Givens thin QR: must be true: m>=n" or "new m can't be less than n".
>>>>
>>>> thanks,
>>>>
>>>> --ahmed
