Re: SSVD for dimensional reduction + Kmeans

Pat Ferrel Fri, 10 Aug 2012 09:22:50 -0700

There seems to be some internal constraint on k and/or p, which is making a 
test difficult. The test has a very small input doc set, and with the wrong 
k it is very easy to get a failure with this message:

        java.lang.IllegalArgumentException: new m can't be less than n
                at org.apache.mahout.math.hadoop.stochasticsvd.qr.GivensThinSolver.adjust(GivensThinSolver.java:109)

I have a working test but I had to add some docs to the test data and have 
tried to reverse engineer the value for k (desiredRank). I came up with the 
following but I think it is only an accident that it works.

        int p = 15; // default value for CLI
        // number of docs - p - 1; not sure why this works
        int desiredRank = sampleData.size() - p - 1;

This is probably an issue only because the data set is very small, given the 
relationship between rows, columns, p, and k. But for the purposes of creating 
a test, if someone (Dmitriy?) could tell me how to calculate a reasonable p and 
k from the dimensions of the tiny data set, it would help.
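
To make the question concrete, here is a minimal sketch of the kind of guard I 
have in mind for the test. It assumes the constraint is k + p <= min(rows, 
columns), which is only my guess from the error message and from the fact that 
"docs - p - 1" happens to work; I have not confirmed it against the SSVD code. 
The class and method names (SsvdRankGuess, clampRank) are hypothetical and not 
part of Mahout.

```java
// Hypothetical helper, not a Mahout API.
final class SsvdRankGuess {

    private SsvdRankGuess() {
    }

    /**
     * Returns the largest rank k <= requestedK such that k + p <= min(rows, cols).
     * ASSUMPTION: that inequality is the real constraint behind
     * "new m can't be less than n"; this is unverified.
     */
    static int clampRank(int rows, int cols, int p, int requestedK) {
        if (requestedK < 1 || p < 0) {
            throw new IllegalArgumentException("need requestedK >= 1 and p >= 0");
        }
        // Room left for k once the oversampling columns p are accounted for.
        int limit = Math.min(rows, cols) - p;
        if (limit < 1) {
            throw new IllegalArgumentException(
                "p=" + p + " leaves no room for k with " + rows + "x" + cols + " input");
        }
        return Math.min(requestedK, limit);
    }

    public static void main(String[] args) {
        int p = 15; // default value for CLI
        // 20 docs, 100 terms, asking for rank 10: only 20 - 15 = 5 fits.
        System.out.println(clampRank(20, 100, p, 10));
    }
}
```

If the assumption holds, a tiny test set would need either more docs or a 
smaller p than the CLI default before any useful k is possible.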

This test is derived from a non-active SVD test, but I'd be up for cleaning it 
up and including it as an example in the working but non-active tests. I also 
fixed a couple of trivial bugs in the non-active Lanczos tests, for what it's worth.

On Aug 9, 2012, at 4:47 PM, Dmitriy Lyubimov <[email protected]> wrote:

Reading the "overview and usage" doc linked on that page
https://cwiki.apache.org/confluence/display/MAHOUT/Stochastic+Singular+Value+Decomposition
should help to clarify outputs and usage.

On Thu, Aug 9, 2012 at 4:44 PM, Dmitriy Lyubimov <[email protected]> wrote:
> On Thu, Aug 9, 2012 at 4:34 PM, Pat Ferrel <[email protected]> wrote:
>> Quoth Grant Ingersoll:
>>> To put this in bin/mahout speak, this would look like, munging some names 
>>> and taking liberties with the actual argument to be passed in:
>>> 
>>> bin/mahout svd (original -> svdOut)
>>> bin/mahout cleansvd ...
>>> bin/mahout transpose svdOut -> svdT
>>> bin/mahout transpose original -> originalT
>>> bin/mahout matrixmult originalT svdT -> newMatrix
>>> bin/mahout kmeans newMatrix
>> 
>> I'm trying to create a test case from testKmeansDSVD2 to use SSVDSolver. 
>> Does SSVD require the EigenVerificationJob to clean the eigen vectors?
> 
> No
> 
>> if so where does SSVD put the equivalent of 
>> DistributedLanczosSolver.RAW_EIGENVECTORS? Seems like they should be in V* 
>> but SSVD creates V so should I transpose V* then run it through the 
>> EigenVerificationJob?
> no
> 
> SSVD is SVD, meaning it produces U and V with no further need to clean that
> 
> >> I get errors when I do so, so I'm trying to figure out if I'm on the wrong track.
