Re: SSVD Parameters Constraints

Dmitriy Lyubimov Sun, 07 Oct 2012 15:20:44 -0700

Second correction: in 1), I meant rank(A) >= k+p, of course. (I am quite
distracted, I guess; sorry.) I should probably put this into the
manual. Some things that are obvious to me are not so obvious to other
people.
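A minimal sketch of the two constraints summarized in the quoted thread, with 1) read per the correction above (rank(A) >= k+p). The function name and its arguments are hypothetical, not part of Mahout. It assumes the per-split row counts are already known (for example, counted separately), and it treats rank(A) as optional because it is usually unknown in advance.

```python
def ssvd_constraint_violations(m, n, k, p, split_rows, rank_a=None):
    """Hypothetical pre-flight check (not a Mahout API) for the SSVD
    constraints discussed in this thread. Returns human-readable problems."""
    kp = k + p
    problems = []
    # 1) k+p must not exceed the smaller dimension of the m x n input A.
    if kp > min(m, n):
        problems.append(f"k+p={kp} exceeds min(m, n)={min(m, n)}")
    # Stricter form of 1): rank(A) >= k+p. Usually unknown, hence optional;
    # per the thread, violating it does not produce the QR error messages.
    if rank_a is not None and rank_a < kp:
        problems.append(f"rank(A)={rank_a} is less than k+p={kp}")
    # 2) Every mapper split (and hence every input file) needs >= k+p rows.
    for idx, rows in enumerate(split_rows):
        if rows < kp:
            problems.append(f"split {idx} has {rows} rows, fewer than k+p={kp}")
    return problems
```

For example, `ssvd_constraint_violations(m=10000, n=500, k=100, p=15, split_rows=[9950, 50])` returns `["split 1 has 50 rows, fewer than k+p=115"]`: the matrix as a whole is large enough, but one split is not.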


On Sun, Oct 7, 2012 at 3:18 PM, Dmitriy Lyubimov <[email protected]> wrote:
> In 1), I meant rank(A)<=k+p, of course.
>
> On Sun, Oct 7, 2012 at 3:17 PM, Dmitriy Lyubimov <[email protected]> wrote:
>> So, to summarize the answer to your question:
>>
>> 1) k+p <= min(m,n). (This is a necessary but not actually sufficient
>> requirement for avoiding SVD rank deficiency. Most strictly,
>> rank(A)<=min(k+p), but this stricter requirement will not cause the
>> error message you are seeing.)
>> 2) The number of rows of A in _every mapper split_ of the input A is at
>> least k+p. Note that with multiple input files, this also implies that
>> the number of rows in every input file is at least k+p.
>>
>> -d
>>
>> On Sun, Oct 7, 2012 at 3:02 PM, Dmitriy Lyubimov <[email protected]> wrote:
>>> PS. Also keep in mind that if you use multiple files as input and
>>> their sizes are smaller than the HDFS split size, some splits will be
>>> smaller even if the total input size looks benign. I think there was
>>> at least one case (which also pertains to the "problem too small"
>>> case) where a user discovered that one of the input files of the
>>> distributed row matrix had fewer than k+p rows, hence setting off this
>>> block height deficiency problem.
>>>
>>> -d
>>>
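To see how several small input files can produce an undersized split even when the total input is large, here is a rough estimator. It is a sketch under stated assumptions: splitting is modeled on our reading of Hadoop's `FileInputFormat` (splits never span files, and the last split of a file may run up to 10% over the split size), rows are assumed to be of uniform byte size within each file, and the function names are hypothetical.

```python
import math

# Modeled on Hadoop's FileInputFormat: the final split of a file may be up to
# 10% larger than the split size instead of leaving a tiny tail split.
SPLIT_SLOP = 1.1

def split_sizes(file_bytes, split_bytes):
    """Approximate byte sizes of the splits one file is cut into.
    Assumes FileInputFormat-style splitting: a split never spans files."""
    sizes = []
    remaining = file_bytes
    while remaining / split_bytes > SPLIT_SLOP:
        sizes.append(split_bytes)
        remaining -= split_bytes
    if remaining > 0:
        sizes.append(remaining)
    return sizes

def min_rows_per_split(files, split_bytes):
    """files: (total_bytes, row_count) per input file. Estimates the smallest
    number of rows any split receives, assuming uniform row size per file."""
    estimates = []
    for total_bytes, rows in files:
        bytes_per_row = total_bytes / rows
        for size in split_sizes(total_bytes, split_bytes):
            estimates.append(math.floor(size / bytes_per_row))
    return min(estimates)
```

With a 64 MiB split size, a 1,000,000,000-byte file of 100,000 rows plus one 200,000-byte file of 20 rows gives `min_rows_per_split([(1_000_000_000, 100_000), (200_000, 20)], 64 * 1024 * 1024) == 20`: the large file yields splits of 6,000+ rows, but the small file forms its own 20-row split, which violates constraint 2) for any k+p above 20.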
>>> On Sun, Oct 7, 2012 at 2:55 PM, Dmitriy Lyubimov <[email protected]> wrote:
>>>> Ahmed, if you are getting this: in all the cases people have reported,
>>>> it meant their problem was too small. If A has m x n geometry, then it
>>>> must be true that k+p<=min(m,n).
>>>>
>>>> Another possible reason is that the height of the blocks of A created
>>>> in the mappers is less than k+p. In practice we have yet to see a
>>>> problem that actually runs into this condition (although it is
>>>> definitely possible if you occasionally have very dense, very large
>>>> row vectors that take up enough space to create the split block height
>>>> problem). If it is indeed the split block height problem, then the
>>>> remedy is to increase the split size, either with a Hadoop parameter or
>>>> (I think) with one of the SSVD command line parameters. Although, like
>>>> I said, nobody has run into the block height deficiency problem yet, so
>>>> I have no knowledge of a verified resolution of this problem by
>>>> manipulating the Hadoop parameter setup in Mahout.
>>>>
>>>> -d
>>>>
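The dense-row case above is simple arithmetic: a dense row of n doubles costs about 8n bytes, so a split holds roughly split_size / (8n) rows, and the split size must be at least (k+p) * 8n bytes. The sketch below uses hypothetical helper names and ignores serialization overhead (keys, vector headers, compression), so treat it as a lower-bound estimate. The remedy itself would be raising Hadoop's minimum split size (`mapred.min.split.size` in Hadoop 1.x) or the corresponding SSVD option, which, as noted above, is unverified.

```python
def dense_rows_per_split(split_bytes, n_cols, bytes_per_value=8):
    """Rows of a dense matrix (8-byte doubles by default) that fit in one
    split. Ignores keys, headers and compression: a rough estimate only."""
    return split_bytes // (n_cols * bytes_per_value)

def min_split_bytes(k, p, n_cols, bytes_per_value=8):
    """Smallest split size that still gives every split at least k+p
    dense rows, under the same no-overhead assumption."""
    return (k + p) * n_cols * bytes_per_value
```

For instance, with dense rows of 1,000,000 columns, a 64 MiB split holds only `dense_rows_per_split(64 * 1024 * 1024, 1_000_000) == 8` rows, while k=100, p=15 would need splits of at least `min_split_bytes(100, 15, 1_000_000) == 920_000_000` bytes (about 877 MiB).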
>>>> On Sun, Oct 7, 2012 at 2:16 PM, Ahmed Elgohary <[email protected]> wrote:
>>>>> Hi,
>>>>>
>>>>> Can someone list all the constraints on the parameters (k, p &
>>>>> aBlockRows) that should be satisfied in order for the Q-job in ssvd to
>>>>> work fine? I tried many values and made sure that k+p<=m, k+p<=n, and
>>>>> p is in the range 20..200, but I am still getting the errors: "Givens
>>>>> thin QR: must be true: m>=n" or "new m can't be less than n".
>>>>>
>>>>> thanks,
>>>>>
>>>>> --ahmed
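Why a short block triggers "Givens thin QR: must be true: m>=n" can be illustrated with a toy Givens-rotation thin QR. This is not Mahout's implementation; `givens_thin_r` is a hypothetical pure-Python helper that computes only the n x n upper-triangular R of an m x n block. In the SSVD setting, as described in the thread, n corresponds to k+p and m to the rows in one mapper's block, so a block with fewer than k+p rows fails the precondition.

```python
import math

def givens_thin_r(block):
    """Hypothetical helper: the n x n R factor of a thin QR of an m x n block,
    computed with Givens rotations. Illustrative only; not Mahout's code."""
    m = len(block)
    n = len(block[0]) if m else 0
    if m < n:
        # Same precondition as the error message: this thin QR needs at least
        # as many rows (m) as columns (n, i.e. k+p in SSVD).
        raise ValueError(f"thin QR requires m >= n, got m={m}, n={n}")
    a = [[float(x) for x in row] for row in block]  # work on a copy
    for j in range(n):
        # Zero column j below the diagonal, sweeping from the bottom row up.
        for i in range(m - 1, j, -1):
            x, y = a[i - 1][j], a[i][j]
            if y == 0.0:
                continue
            r = math.hypot(x, y)
            c, s = x / r, y / r
            a[i - 1][j], a[i][j] = r, 0.0
            # Apply the same rotation to the remaining columns of both rows.
            for col in range(j + 1, n):
                u, v = a[i - 1][col], a[i][col]
                a[i - 1][col] = c * u + s * v
                a[i][col] = -s * u + c * v
    # Rows n..m-1 are now all zero; the top n rows hold R.
    return a[:n]
```

`givens_thin_r([[3, 0], [4, 0], [0, 5]])` returns R = [[5, 0], [0, 5]], whereas a one-row block such as `givens_thin_r([[1, 2, 3]])` raises `ValueError`, which is the toy analogue of a mapper split holding fewer than k+p rows.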
