That's interesting! If we could get rid of the QR step, or at least decrease
the flop complexity there, that would constitute an improvement novel enough
to be worthy of a separate paper!
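
For context, here is a minimal sequential sketch of the step being discussed:
the randomized range finder forms Y = A*Omega and then orthonormalizes Y (the
QR step in question). This is plain illustrative Java with made-up names, not
the Mahout implementation, which does the QR progressively in blocks:

import java.util.Random;

public class RangeFinderSketch {

  // Y = A * Omega, where Omega is an n x (k+p) Gaussian random matrix.
  static double[][] project(double[][] a, int kp, long seed) {
    int m = a.length, n = a[0].length;
    Random rnd = new Random(seed);
    double[][] omega = new double[n][kp];
    for (double[] row : omega) {
      for (int j = 0; j < kp; j++) {
        row[j] = rnd.nextGaussian();
      }
    }
    double[][] y = new double[m][kp];
    for (int i = 0; i < m; i++) {
      for (int l = 0; l < n; l++) {
        double v = a[i][l];
        if (v != 0.0) { // each non-zero costs k+p multiply-adds
          for (int j = 0; j < kp; j++) {
            y[i][j] += v * omega[l][j];
          }
        }
      }
    }
    return y;
  }

  // The QR step in question: modified Gram-Schmidt orthonormalization of
  // Y's columns, costing O(m * (k+p)^2) flops.
  static void orthonormalize(double[][] y) {
    int m = y.length, kp = y[0].length;
    for (int j = 0; j < kp; j++) {
      double norm = 0.0;
      for (int i = 0; i < m; i++) norm += y[i][j] * y[i][j];
      norm = Math.sqrt(norm);
      for (int i = 0; i < m; i++) y[i][j] /= norm;
      for (int j2 = j + 1; j2 < kp; j2++) {
        double dot = 0.0;
        for (int i = 0; i < m; i++) dot += y[i][j] * y[i][j2];
        for (int i = 0; i < m; i++) y[i][j2] -= dot * y[i][j];
      }
    }
  }
}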
 On Aug 13, 2011 2:29 PM, "Ted Dunning" <[email protected]> wrote:
> Dmitriy,
>
> I have had some thoughts on this code and I think it is possible to
> eliminate the progressive QR decomposition of Y entirely and gain
> significant speed. I have some preliminary sequential code but need to
> work up a larger example and a parallel implementation.
>
> On Sat, Aug 13, 2011 at 2:11 PM, Dmitriy Lyubimov <[email protected]>
> wrote:
>
>> NP.
>>
>> Thanks for testing it out.
>>
>> I would appreciate it if you could let me know how it goes with non-full
>> rank decomposition and perhaps at larger scale.
>>
>> One thing to keep in mind is that it projects the input into an
>> m x (k+p) _dense_ matrix, on the assumption that k+p is much less than
>> the number of non-zero elements in a sparse row vector. If that is not
>> the case, you would actually create more computation, not less, with a
>> random projection. One person tried to use it with m = millions, but
>> the rows were so sparse that there were only a handful (~10 on average)
>> of non-zero items per row (somewhat typical for user ratings), yet he
>> tried to compute hundreds of singular values, which of course created
>> more intermediate work than something like Lanczos probably would.
>> That's not a good application of this method.
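>>
>> A rough back-of-envelope, as an illustrative sketch in plain Java (not
>> Mahout code; names made up):
>>
>> class ProjectionCost {
>>   // Flops to project one sparse row with nnz non-zeros into a dense
>>   // (k+p)-wide row: each non-zero hits a (k+p)-wide row of Omega.
>>   static long projectionFlopsPerRow(int nnz, int k, int p) {
>>     return (long) nnz * (k + p);
>>   }
>> }
>>
>> With nnz = 10 and k+p in the hundreds, a 10-entry sparse input row
>> becomes a dense row of several hundred entries, so the projection
>> inflates the data rather than compressing it.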
>>
>> Another thing is that you need to have good singular value decay in
>> your data; otherwise this method can be surprisingly far from the true
>> vectors (in my experiments).
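>>
>> (For reference, the expected spectral-norm error bound from Halko,
>> Martinsson & Tropp's analysis of this family of methods is roughly
>>
>>   E|| (I - Q Q') A ||  <=  (1 + sqrt(k/(p-1))) * sigma_{k+1}
>>                            + (e * sqrt(k+p) / p) * sqrt(sum_{j>k} sigma_j^2),
>>
>> so a heavy singular-value tail past sigma_k shows up directly in the
>> error, which matches the observation above.)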
>>
>> -d
>>
>>
>> On Sat, Aug 13, 2011 at 1:48 PM, Eshwaran Vijaya Kumar <
>> [email protected]> wrote:
>>
>> > Dmitriy,
>> > That sounds great. I eagerly await the patch.
>> > Thanks
>> > Esh
>> > On Aug 13, 2011, at 1:37 PM, Dmitriy Lyubimov wrote:
>> >
>> > > OK, I got u0 working.
>> > >
>> > > The problem is of course that something called the BBt job has to be
>> > > coerced to have 1 reducer. (That's fine: no mapper will yield more
>> > > than an upper-triangular matrix of (k+p) x (k+p) geometry, so even if
>> > > you end up having thousands of them, the reducer will sum them up
>> > > just fine.)
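>> > >
>> > > A sketch of that accumulation, to illustrate (this is not the actual
>> > > BBt job code; names are made up):
>> > >
>> > > class UpperTriangularSum {
>> > >   // Each mapper emits its partial product as a packed upper-triangular
>> > >   // (k+p) x (k+p) matrix of length (k+p)*(k+p+1)/2; the single reducer
>> > >   // sums the packed arrays element-wise.
>> > >   static double[] sum(Iterable<double[]> partials, int kp) {
>> > >     double[] acc = new double[kp * (kp + 1) / 2];
>> > >     for (double[] t : partials) {
>> > >       for (int i = 0; i < acc.length; i++) {
>> > >         acc[i] += t[i];
>> > >       }
>> > >     }
>> > >     return acc;
>> > >   }
>> > > }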
>> > >
>> > > It apparently worked before because the configuration held 1 reducer
>> > > by default if not set explicitly; I am not quite sure whether it's
>> > > something in the Hadoop MR client or a Mahout change that now
>> > > precludes it from working.
>> > >
>> > > Anyway, I have a patch (really a one-liner), and an example
>> > > equivalent to yours worked fine for me with 3 reducers.
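>> > >
>> > > (Guessing at the exact fix here: the one-liner is presumably along
>> > > the lines of pinning the reducer count on the BBt job explicitly,
>> > > e.g. job.setNumReduceTasks(1), instead of relying on the default.)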
>> > >
>> > > Also, the tests request 3 reducers too, but the reason it works in
>> > > the tests and not in distributed mapred is that local mapred doesn't
>> > > support multiple reducers. I investigated this issue before, and
>> > > apparently there were a couple of patches floating around, but for
>> > > some reason those changes did not make it into cdh3u0.
>> > >
>> > > I will publish the patch in a JIRA shortly and will commit it
>> > > Sunday-ish.
>> > >
>> > > Thanks.
>> > > -d
>> > >
>> > >
>> > > On Fri, Aug 5, 2011 at 7:06 PM, Eshwaran Vijaya Kumar <
>> > > [email protected]> wrote:
>> > >
>> > >> OK. So to add more info to this: I tried setting the number of
>> > >> reducers to 1, and now I don't get that particular error. The
>> > >> singular values and the left and right singular vectors do appear
>> > >> to be correct (verified using Matlab).
>> > >>
>> > >> On Aug 5, 2011, at 1:55 PM, Eshwaran Vijaya Kumar wrote:
>> > >>
>> > >>> All,
>> > >>> I am trying to test Stochastic SVD and am facing some errors; it
>> > >>> would be great if someone could clarify what is going on. I am
>> > >>> trying to feed the solver a DistributedRowMatrix with the exact
>> > >>> same parameters that the test in LocalSSVDSolverSparseSequentialTest
>> > >>> uses, i.e., generate a 1000 x 100 DRM with SequentialSparseVectors
>> > >>> and then ask for blockHeight = 251, p (oversampling) = 60,
>> > >>> k (rank) = 40. I get the following error:
>> > >>>
>> > >>> Exception in thread "main" java.io.IOException: Unexpected overrun
>> > >>> in upper triangular matrix files
>> > >>>   at org.apache.mahout.math.hadoop.stochasticsvd.SSVDSolver.loadUpperTriangularMatrix(SSVDSolver.java:471)
>> > >>>   at org.apache.mahout.math.hadoop.stochasticsvd.SSVDSolver.run(SSVDSolver.java:268)
>> > >>>   at com.mozilla.SSVDCli.run(SSVDCli.java:89)
>> > >>>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>> > >>>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>> > >>>   at com.mozilla.SSVDCli.main(SSVDCli.java:129)
>> > >>>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> > >>>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>> > >>>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> > >>>   at java.lang.reflect.Method.invoke(Method.java:597)
>> > >>>   at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
>> > >>>
>> > >>> Also, I am using CDH3 with Mahout recompiled to work with CDH3
>> > >>> jars.
>> > >>>
>> > >>> Thanks
>> > >>> Esh
>> > >>>
>> > >>
>> > >>
>> >
>> >
>>
