On Thu, Apr 4, 2013 at 9:36 AM, Sean Owen <[email protected]> wrote:

> Yeah, I've got the pivoting part down -- I think. The problem is that
> I can't seem to identify it by simple thresholding. For example, a
> diagonal like "10 9 8 0.0001 0.0000001" obviously has a problem. But
> so might "100 90 80 10"... I think rank deficiency is actually only
> one possible cause of this problem I'm not describing well.
>
> Yes I also used an SVD, same kind of thing; this happens with
> 'healthy' full-rank factorizations.
>
> I think I'm trying to capture something about the action of the
> operator defined by Y * (Y' * Y)^-1, and the operator norm certainly
> does that, and "> 1" seems to be a good criterion, but I am far enough
> away from the areas I'm solidly comfortable with to doubt that this is
> necessarily a reliable way to think about it.
>
>
So, the problem is that the k x k matrix Y' * Y is ill-conditioned, or
is there more to it?
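For concreteness, a minimal numpy sketch of that failure mode on toy data (illustrative code, not from the thread; numpy assumed available): when two columns of Y are nearly dependent, the k x k Gram matrix Y' * Y becomes ill-conditioned, and solutions routed through it blow up.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 100, 5
Y = rng.standard_normal((n, k))
# Make the last column nearly dependent on the first, so the k x k
# Gram matrix Y' * Y is nearly singular.
Y[:, -1] = Y[:, 0] + 1e-6 * rng.standard_normal(n)

G = Y.T @ Y
cond = np.linalg.cond(G)                 # sigma_max / sigma_min of G

# An ill-conditioned G amplifies a generic right-hand side enormously,
# which is the "solution becomes very large" symptom described below.
x = np.linalg.solve(G, rng.standard_normal(k))
```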


> On Thu, Apr 4, 2013 at 2:13 PM, Ted Dunning <[email protected]> wrote:
> > Typically, to deal with this kind of problem, you need to follow one of
> two
> > courses.
> >
> > First, you can use a so-called rank-revealing QR which uses a pivoting
> > strategy to push all of the small elements of R as far down the diagonal
> as
> > possible.  This gives you a reliable way of finding the problems and can
> > give you approximate solutions of a limited rank decomposition of A.
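As a sketch of the pivoting strategy described above (a hypothetical numpy implementation of column-pivoted classical Gram-Schmidt, not the Mahout code): at each step, orthogonalize the remaining column of largest norm, which pushes the small entries of diag(R) to the bottom-right.

```python
import numpy as np

def pivoted_qr(A):
    """Classical Gram-Schmidt QR with column pivoting: each step picks
    the remaining column of largest norm, so small diagonal entries of
    R sink to the bottom (a simple rank-revealing strategy)."""
    A = A.astype(float).copy()
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    perm = np.arange(n)
    for j in range(n):
        # Pivot: swap in the remaining column with the largest norm.
        norms = np.linalg.norm(A[:, j:], axis=0)
        p = j + int(np.argmax(norms))
        A[:, [j, p]] = A[:, [p, j]]
        R[:, [j, p]] = R[:, [p, j]]
        perm[[j, p]] = perm[[p, j]]
        R[j, j] = np.linalg.norm(A[:, j])
        if R[j, j] > 0:
            Q[:, j] = A[:, j] / R[j, j]
        # Orthogonalize the remaining columns against the new Q column.
        R[j, j+1:] = Q[:, j] @ A[:, j+1:]
        A[:, j+1:] -= np.outer(Q[:, j], R[j, j+1:])
    return Q, R, perm

# Toy rank-deficient example: column 3 duplicates column 0.
rng = np.random.default_rng(2)
A = rng.standard_normal((8, 4))
A[:, 3] = A[:, 0]
Q, R, perm = pivoted_qr(A)
```

With this pivoting, diag(R) comes out non-increasing, so the numerical rank can be read off its tail rather than hunting for a small entry anywhere on the diagonal.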
> >
> > Typically, it is better to use SVD instead of QR in these cases.  You can
> > truncate S (the matrix with the singular values) at whatever point you
> deem
> > correct and get an optimal least squares solution.
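A small numpy illustration of that truncation (an assumed sketch; `tol` is a made-up relative threshold): drop singular values below the cutoff and invert only the rest, which gives the minimum-norm least-squares solution.

```python
import numpy as np

def truncated_svd_solve(A, b, tol=1e-10):
    """Minimum-norm least-squares solution of A x ~ b, discarding
    singular values below tol * s_max (similar in spirit to
    np.linalg.pinv with rcond=tol)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    keep = s > tol * s[0]
    # Invert only the retained singular values.
    return Vt[keep].T @ ((U[:, keep].T @ b) / s[keep])

# Toy rank-deficient system: column 3 duplicates column 0.
rng = np.random.default_rng(3)
A = rng.standard_normal((10, 4))
A[:, 3] = A[:, 0]
b = rng.standard_normal(10)
x = truncated_svd_solve(A, b)
```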
> >
> > The Mahout QR that I whipped up a couple of months ago is not rank
> > revealing, but it is pretty easy to convert the Gram-Schmidt algorithm to
> > be such.  The SVD we have should work just fine.
> >
> >
> >
> > On Thu, Apr 4, 2013 at 12:41 PM, Sean Owen <[email protected]> wrote:
> >
> >> This is more of a linear algebra question, but I thought it worth
> >> posing to the group --
> >>
> >> As part of a process like ALS, you solve a system like A = X * Y' for
> >> X or for Y, given the other two. A is sparse (m x n); X and Y are
> >> tall and skinny (m x k and n x k, where k << m,n).
> >>
> >> For example to solve for X, just:   X = A * Y * (Y' * Y)^-1
> >>
> >> This fails if the k x k matrix Y' * Y is not invertible of course.
> >> This can happen if the data is tiny and k is actually large relative
> >> to m,n.
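For concreteness, that update step in numpy on toy shapes (an illustrative sketch, not Mahout code): forming the explicit inverse matches the formula above, but solving the k x k system is the numerically safer equivalent.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 20, 15, 3

# Toy data: A is m x n, Y is n x k (tall and skinny, k << m,n).
A = rng.standard_normal((m, n))
Y = rng.standard_normal((n, k))

# Literal form of the update: X = A * Y * (Y' * Y)^-1.
X_inv = A @ Y @ np.linalg.inv(Y.T @ Y)

# Preferable in practice: solve the k x k system (Y'Y) X' = (A Y)'
# instead of forming the inverse explicitly.
X_solve = np.linalg.solve(Y.T @ Y, (A @ Y).T).T

assert np.allclose(X_inv, X_solve)
```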
> >>
> >> It also goes badly if Y' * Y is nearly singular. The solution for X
> >> can become very large, for example, for a small A, which is
> >> "obviously wrong". You can -- often -- detect this by looking at the
> >> diagonal of R in a QR decomposition for near-zero values.
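A quick numpy illustration of that check (toy data, with an assumed relative threshold): with a near-dependent column, one entry of diag(R) collapses toward zero.

```python
import numpy as np

rng = np.random.default_rng(4)
n, k = 50, 4
Y = rng.standard_normal((n, k))
Y[:, 3] = Y[:, 1] + 1e-10 * rng.standard_normal(n)  # near-dependence

# A near-zero diagonal entry of R flags (near-)rank deficiency.  Note
# that without column pivoting this check is not guaranteed to surface
# every pattern of dependence, matching the "often" above.
_, R = np.linalg.qr(Y)
diag = np.abs(np.diag(R))
flagged = diag < 1e-8 * diag.max()
```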
> >>
> >> However I find a similar behavior even when the rank k seems
> >> intuitively fine (easily low enough given the data), but when, for
> >> example, the regularization term is way too high. X and Y are so
> >> constrained that the inverse above becomes a badly behaved operator
> >> too.
> >>
> >> I think I understand the reasons for this intuitively. The goal isn't
> >> to create a valid solution since there is none here; the goal is to
> >> define and detect this "bad" situation reliably and suggest a fix to
> >> parameters if possible.
> >>
> >> I have had better success looking at the operator norm of (Y' * Y)^-1
> >> (its largest singular value) to get a sense of when it is going to
> >> scale its input greatly, since that's a sign it's bad, but I feel
> >> like I'm missing a rigorous understanding of what to do with that
> >> information. I'm looking for a principled cutoff for that singular
> >> value beyond which the solution should be rejected (> 1?), but I
> >> think I have some unknown-unknowns in this space.
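One way to make that threshold question concrete (a numpy sketch on toy data; the interpretation is offered as an assumption, not a settled answer): the operator norm of (Y'Y)^-1 equals 1 over the smallest singular value of Y'Y, so an absolute cutoff like "> 1" is sensitive to the overall scale of Y, whereas the condition number sigma_max / sigma_min is scale-free.

```python
import numpy as np

rng = np.random.default_rng(5)
n, k = 100, 4
Y = rng.standard_normal((n, k))

G = Y.T @ Y
s = np.linalg.svd(G, compute_uv=False)   # singular values, descending

# Operator norm of G^-1 is 1 / smallest singular value of G, so it can
# be read off the SVD of G without forming the inverse.
op_norm_inv = 1.0 / s[-1]
assert np.isclose(op_norm_inv, np.linalg.norm(np.linalg.inv(G), ord=2))

# Replacing Y by c*Y scales op_norm_inv by 1/c^2, so any absolute
# cutoff on it is scale-dependent.  The scale-free diagnostic is the
# condition number of G:
cond = s[0] / s[-1]
```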
> >>
> >> Any insights or pointers into the next concept that's required here
> >> are appreciated.
> >>
> >> Sean
> >>
>