On Aug 13, 2011, at 2:11 PM, Dmitriy Lyubimov wrote:

> NP.
> 
> thanks for testing it out.
> 
> I would appreciate if you could let me know how it goes with non-full rank
> decomposition and perhaps at larger scale.
> 

Sure thing.
> One thing to keep in mind is that it projects the input into an m x (k+p)
> _dense_ matrix, assuming that k+p is much smaller than the number of
> non-zero elements in a sparse row vector. If that is not the case, you
> would actually create more computation, not less, with a random
> projection. One person tried to use it with m = millions, but the rows
> were so sparse that there were only a handful (~10 avg) of non-zero items
> per row (somewhat typical for user ratings), yet he tried to compute
> hundreds of singular values, which of course created more intermediate
> work than something like Lanczos probably would. That's not a good
> application of this method.
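The trade-off described above can be sketched numerically. This is a minimal illustration of the reasoning, not anything from the Mahout code; the helper name and the example numbers (other than the ~10 non-zeros per rating row from the email) are mine.

```java
// Rough heuristic for when a dense random projection pays off:
// projecting one sparse row with nnz non-zeros into a dense k+p vector
// produces k+p dense entries, so it only shrinks the data (and the
// downstream work) when k+p is well below nnz.
public class ProjectionCost {
    static boolean projectionShrinksRow(int nnz, int k, int p) {
        return (k + p) < nnz;
    }

    public static void main(String[] args) {
        // ~10 non-zeros per row (user-rating-like data) with hundreds of
        // requested singular values: the projection inflates each row.
        System.out.println(projectionShrinksRow(10, 200, 60));   // false
        // denser rows with a modest k+p: the projection compresses.
        System.out.println(projectionShrinksRow(5000, 40, 60));  // true
    }
}
```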

So this is a bit surprising: in my situation, k would be relatively low
(< 20). Since I am working with text data, I suspect that the rows are
pretty sparse, although I have not instrumented the row non-zero element
distributions yet. Based on your notes, I was planning to set k + p = 500
(or less, depending on the width of the matrix) so that I would get
reasonably good singular vectors. I guess I will do some more tuning.
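The "or less depending on width of matrix" caveat amounts to a simple clamp: k + p can never exceed the number of columns. A tiny sketch of that logic (the helper name is mine, not Mahout's):

```java
// Clamp the planned k+p sampling dimension to the matrix width.
public class KpChoice {
    static int effectiveKPlusP(int k, int p, int numCols) {
        // cannot sample more directions than the column space has
        return Math.min(k + p, numCols);
    }

    public static void main(String[] args) {
        // planned k + p = 500 against a narrow 300-column matrix
        System.out.println(effectiveKPlusP(20, 480, 300));     // 300
        // against a wide matrix, the plan stands
        System.out.println(effectiveKPlusP(20, 480, 100000));  // 500
    }
}
```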

> Another thing is that you need good singular value decay in your data;
> otherwise this method's results would be surprisingly far from the true
> vectors (in my experiments).
> 

I am not too sure offhand whether this is true for my dataset.
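One cheap way to check is to look at singular values from a small sample (e.g. a Matlab run on a subset) and see how fast they fall off. This is an illustrative sketch only; the 0.1 cutoff and the helper name are my assumptions, not a rule from the method.

```java
// Check whether singular values decay fast enough for a randomized
// method to be a good fit: compare the (k+1)-th value to the largest.
public class DecayCheck {
    static boolean goodDecay(double[] sigma, int k, double cutoff) {
        // sigma is sorted descending; fast decay means sigma[k] is
        // small relative to sigma[0]
        return sigma[k] / sigma[0] < cutoff;
    }

    public static void main(String[] args) {
        double[] fast = {100, 40, 15, 5, 1, 0.3};   // sharp decay
        double[] flat = {100, 95, 90, 88, 85, 83};  // nearly flat spectrum
        System.out.println(goodDecay(fast, 4, 0.1)); // true  (1/100 < 0.1)
        System.out.println(goodDecay(flat, 4, 0.1)); // false (85/100 >= 0.1)
    }
}
```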


> -d
> 
> 
> On Sat, Aug 13, 2011 at 1:48 PM, Eshwaran Vijaya Kumar <
> [email protected]> wrote:
> 
>> Dmitriy,
>> That sounds great. I eagerly await the patch.
>> Thanks
>> Esh
>> On Aug 13, 2011, at 1:37 PM, Dmitriy Lyubimov wrote:
>> 
>>> OK, I got u0 working.
>>> 
>>> The problem is of course that something called the BBt job has to be
>>> coerced to have 1 reducer. (That's fine: no mapper will yield more than
>>> an upper-triangular matrix of (k+p) x (k+p) geometry, so even if you end
>>> up having thousands of them, the reducer will sum them up just fine.)
>>> 
>>> It worked before apparently because the configuration holds 1 reducer by
>>> default if not set explicitly; I am not quite sure whether it's something
>>> in the Hadoop MR client or a Mahout change that now precludes it from
>>> working.
>>> 
>>> Anyway, I got a patch (really a one-liner), and an example equivalent to
>>> yours worked fine for me with 3 reducers.
>>> 
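The reason one reducer suffices, as described above, is that each mapper emits at most one small upper-triangular partial product and the reducer just sums them entry-wise. A hypothetical sketch of that summation (packed storage and names are mine, not the actual Mahout code):

```java
// Why a single reducer is enough for the BBt job: each mapper emits at
// most one (k+p) x (k+p) upper-triangular partial, stored here in packed
// row-major form with KP*(KP+1)/2 entries, and the reducer sums them.
public class BBtSum {
    static final int KP = 4; // k + p, tiny for illustration

    static double[] sum(double[][] partials) {
        double[] acc = new double[KP * (KP + 1) / 2];
        for (double[] part : partials)
            for (int i = 0; i < acc.length; i++)
                acc[i] += part[i]; // entry-wise accumulation
        return acc;
    }

    public static void main(String[] args) {
        int len = KP * (KP + 1) / 2;
        double[] a = new double[len], b = new double[len];
        java.util.Arrays.fill(a, 1.0); // partial from mapper 1
        java.util.Arrays.fill(b, 2.0); // partial from mapper 2
        System.out.println(sum(new double[][] {a, b})[0]); // 3.0
    }
}
```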
>>> Also, the tests request 3 reducers too, but the reason it works in tests
>>> and not in distributed mapred is that local mapred doesn't support
>>> multiple reducers. I investigated this issue before, and apparently
>>> there were a couple of patches floating around, but for some reason
>>> those changes did not take hold in cdh3u0.
>>> 
>>> I will publish the patch in a JIRA shortly and will commit it Sunday-ish.
>>> 
>>> Thanks.
>>> -d
>>> 
>>> 
>>> On Fri, Aug 5, 2011 at 7:06 PM, Eshwaran Vijaya Kumar <
>>> [email protected]> wrote:
>>> 
>>>> OK. So to add more info to this: I tried setting the number of reducers
>>>> to 1, and now I don't get that particular error. The singular values and
>>>> the left and right singular vectors appear to be correct (verified using
>>>> Matlab).
>>>> 
>>>> On Aug 5, 2011, at 1:55 PM, Eshwaran Vijaya Kumar wrote:
>>>> 
>>>>> All,
>>>>> I am trying to test Stochastic SVD and am running into an error; it
>>>>> would be great if someone could clarify what is going on. I am trying
>>>>> to feed the solver a DistributedRowMatrix with the exact same
>>>>> parameters that the test in LocalSSVDSolverSparseSequentialTest uses,
>>>>> i.e., generate a 1000 x 100 DRM with SequentialSparseVectors and then
>>>>> ask for blockHeight = 251, p (oversampling) = 60, k (rank) = 40. I get
>>>>> the following error:
>>>>> 
>>>>> Exception in thread "main" java.io.IOException: Unexpected overrun in upper triangular matrix files
>>>>>      at org.apache.mahout.math.hadoop.stochasticsvd.SSVDSolver.loadUpperTriangularMatrix(SSVDSolver.java:471)
>>>>>      at org.apache.mahout.math.hadoop.stochasticsvd.SSVDSolver.run(SSVDSolver.java:268)
>>>>>      at com.mozilla.SSVDCli.run(SSVDCli.java:89)
>>>>>      at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>>>>>      at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>>>>>      at com.mozilla.SSVDCli.main(SSVDCli.java:129)
>>>>>      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>>>      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>>>      at java.lang.reflect.Method.invoke(Method.java:597)
>>>>>      at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
>>>>> 
>>>>> Also, I am using CDH3 with Mahout recompiled to work with CDH3 jars.
>>>>> 
>>>>> Thanks
>>>>> Esh
>>>>> 
>>>> 
>>>> 
>> 
>> 
