So, to summarize the answer to your question: 1) k+p <= min(m,n). (The SVD rank deficiency condition is necessary but not actually sufficient. Most strictly, k+p <= rank(A) -- but violating this stricter requirement will not cause the problem message you are seeing). 2) the number of rows of A in _every mapper split_ of the input A is at least k+p. Note that with multiple input files, this also implies that the number of rows in every input file is at least k+p.
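For concreteness, the two checks above can be sketched as a small stand-alone validation (this is a hypothetical helper, not part of Mahout; m, n and the per-split row counts are assumed to be known to the caller):

```python
def check_ssvd_geometry(m, n, k, p, split_row_counts):
    """Sanity-check SSVD rank parameters against the geometry of A.

    m, n             -- geometry of the input matrix A (m x n)
    k, p             -- decomposition rank and oversampling parameter
    split_row_counts -- number of rows of A landing in each mapper split
    """
    errors = []
    # condition 1: k+p must not exceed min(m, n)
    if k + p > min(m, n):
        errors.append("k+p=%d exceeds min(m,n)=%d" % (k + p, min(m, n)))
    # condition 2: every mapper split (and hence every input file) must
    # contribute at least k+p rows, or the QR block height is deficient
    for i, rows in enumerate(split_row_counts):
        if rows < k + p:
            errors.append("split %d has only %d rows (< k+p=%d)"
                          % (i, rows, k + p))
    return errors

# example: a 10000 x 500 matrix, k=50, p=15, three input splits,
# one of which is too short
print(check_ssvd_geometry(10000, 500, 50, 15, [4000, 4000, 30]))
```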
-d

On Sun, Oct 7, 2012 at 3:02 PM, Dmitriy Lyubimov <[email protected]> wrote:
> PS. Also keep in mind that if you use multiple files as input, and
> their sizes are smaller than the hdfs split size, some splits will
> have reduced size even if the total input size looks benign. I think
> there was at least one case (which also pertains to the "problem too
> small" case) where a user discovered that one of the input files of
> the distributed row matrix had fewer than k+p rows, hence setting off
> this block height deficiency problem.
>
> -d
>
> On Sun, Oct 7, 2012 at 2:55 PM, Dmitriy Lyubimov <[email protected]> wrote:
>> Ahmed, in all cases where people have reported this, it meant their
>> problem was too small. If A has m x n geometry, then it must be true
>> that k+p <= min(m,n).
>>
>> Another possible reason is that the height of the blocks of A created
>> in the mappers is less than k+p. In practice we have yet to see a
>> problem that actually runs into this condition (although it is
>> definitely possible if you occasionally have very dense, very large
>> row vectors, so that they take up enough space to create a split
>> block height problem). If it is indeed a split block height problem,
>> then the remedy is to increase the split size, either via a Hadoop
>> parameter or (I think) one of the SSVD command line parameters.
>> Although, like I said, nobody has run into the block height
>> deficiency problem yet, so I have no knowledge of a verified
>> resolution of it by manipulating the Hadoop parameter setup in
>> Mahout.
>>
>> -d
>>
>> On Sun, Oct 7, 2012 at 2:16 PM, Ahmed Elgohary <[email protected]> wrote:
>>> Hi,
>>>
>>> Can someone list all the constraints on the parameters (k, p & aBlockRows)
>>> that should be satisfied in order for the Q-job in ssvd to work fine? I
>>> tried many values, made sure that (k+p<=m & k+p<=n & p is in the range 20
>>> .. 200), but I am still getting the errors: "Givens thin QR: must be
>>> true: m>=n" or "new m can't be less than n".
>>>
>>> thanks,
>>>
>>> --ahmed
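As a sketch of the split-size remedy mentioned above (the option and property names here are assumptions -- verify them against `mahout ssvd --help` and the configuration docs for your Hadoop version):

```shell
# Raise the minimum split size so each mapper sees at least k+p rows.
# mapred.min.split.size is the old-API Hadoop property name; newer
# versions use mapreduce.input.fileinputformat.split.minsize instead.
mahout ssvd \
  -Dmapred.min.split.size=268435456 \
  --input /path/to/A --output /path/to/ssvd-out \
  -k 50 -p 15
```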
