Hi Dan,

Which files are you looking for? Aren't they present in <output_dir>/calculations/?
Aniruddha

-----Original Message-----
From: Dan Brickley [mailto:[email protected]]
Sent: Thursday, July 26, 2012 2:35 PM
To: [email protected]
Cc: [email protected]
Subject: Re: eigendecomposition of very large matrices

On 26 Jul 2012, at 21:12, Aniruddha Basak <[email protected]> wrote:

> Hi,
> I am trying to use SSVD instead of Lanczos as part of spectral k-means.

I started looking at this. I wanted to find a way to recycle the existing first steps of the spectral k-means code, at least as far as generating the Laplacian re-representation of the affinity matrix. I thought we could persuade it not to clean/delete the intermediate working files... How are you approaching this?

Dan

> However, I could not find the relation between the eigenvectors and the U, V matrices.
> Can someone please tell me how to retrieve the eigenvectors from the SSVD decomposition?
>
> Thanks,
> Aniruddha
>
>
> -----Original Message-----
> From: Dmitriy Lyubimov [mailto:[email protected]]
> Sent: Thursday, July 19, 2012 10:53 PM
> To: [email protected]
> Subject: RE: eigendecomposition of very large matrices
>
> PPS: if you do insist on having a lot of k, then you'll benefit from a smaller HDFS block size, not a larger one.
> On Jul 19, 2012 10:50 PM, "Dmitriy Lyubimov" <[email protected]> wrote:
>
>> Yeah, I see, OK. Both experiments conducted with Mahout SSVD that I am
>> familiar with dealt with input sizes greater than yours element-wise,
>> on a quite modest node count. So I don't think your input size will
>> be a problem. But the number of singular values will be.
>>
>> I doubt any input will yield anything useful beyond k=200 but
>> statistical noise, even if you have a good decay of the singular values.
>> But I bet you don't need that many. You can fit significantly more
>> 'clusters' in a 'fairly small'-dimensional space.
>> On Jul 19, 2012 6:33 PM, "Aniruddha Basak" <[email protected]> wrote:
>>
>>> Thanks, Dmitriy, for your reply.
>>> The matrix I am working on has 10-20 non-zero entries per row, so
>>> it is very sparse.
>>> I am trying to do spectral clustering, which involves eigendecomposition.
>>> I am wondering whether anyone has tried to do spectral clustering
>>> using Mahout for a very large affinity matrix (input).
>>>
>>> Aniruddha
>>>
>>>
>>> -----Original Message-----
>>> From: Dmitriy Lyubimov [mailto:[email protected]]
>>> Sent: Thursday, July 19, 2012 6:28 PM
>>> To: [email protected]
>>> Subject: Re: eigendecomposition of very large matrices
>>>
>>> Very significant sparsity may be a problem, though, for -q >= 1 parameters.
>>> Again, it depends on the hardware you have and the number of non-zero
>>> elements in the input, but -q=1 is still the most recommended setting here.
>>>
>>>
>>> On Thu, Jul 19, 2012 at 6:20 PM, Dmitriy Lyubimov <[email protected]> wrote:
>>>> You may try SSVD:
>>>> https://cwiki.apache.org/confluence/display/MAHOUT/Stochastic+Singular+Value+Decomposition
>>>>
>>>> But 4k eigenvectors (or, rather, singular values) is still kind of
>>>> a lot, and may push the precision out of the error estimates.
>>>> I don't think we have had a precision study for that many. You also need
>>>> quite a bit of memory to compute that (not to mention flops). More
>>>> realistically, you may try 1k singular values. You may try
>>>> more if you have access to more powerful hardware than we did in
>>>> the studies, but the distributed computation time will grow at about
>>>> k^1.5, i.e. faster than linearly, even if you have enough nodes for
>>>> the tasks.
>>>>
>>>> -d
>>>>
>>>> On Thu, Jul 19, 2012 at 6:12 PM, Aniruddha Basak <[email protected]> wrote:
>>>>> Hi,
>>>>> I am working on a clustering problem which involves determining
>>>>> the largest "k" eigenvectors of a very large matrix. The matrices
>>>>> I work on are typically of the order of 10^6 by 10^6.
>>>>> Trying to do this using the Lanczos solver available in Mahout, I
>>>>> found it is very slow: it takes around 1.5 minutes to compute each
>>>>> eigenvector. Hence, to get 4000 eigenvectors it takes 100 hours, or
>>>>> about 4 days!
>>>>>
>>>>> So I am looking for something faster to solve the eigendecomposition
>>>>> problem for a very large sparse matrix. Please suggest what I
>>>>> should use.
>>>>>
>>>>>
>>>>> Thanks,
>>>>> Aniruddha
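[Editor's note on the question asked mid-thread — how eigenvectors relate to the U and V matrices of an SVD. For a symmetric matrix, such as the graph Laplacian used in spectral k-means, the columns of U are eigenvectors, and each eigenvalue is the corresponding singular value up to sign; the sign can be recovered from whether u_i and v_i agree or are negated. A minimal NumPy sketch of this relation (illustrative only, not Mahout code):]

```python
# Recovering an eigendecomposition from an SVD, assuming the input
# matrix is symmetric (as a graph Laplacian is). Illustrative sketch,
# not Mahout/SSVD code.
import numpy as np

rng = np.random.default_rng(0)

# Build a small symmetric "affinity-like" matrix.
B = rng.random((6, 6))
A = (B + B.T) / 2

# SVD: A = U @ diag(s) @ Vt. For symmetric A, each column of U is an
# eigenvector; the eigenvalue is +s[i] or -s[i], where the sign is
# given by whether u_i and v_i point the same way (u_i . v_i = +-1).
U, s, Vt = np.linalg.svd(A)
signs = np.sign(np.sum(U * Vt.T, axis=0))  # u_i . v_i for each i
eigvals = signs * s

# Cross-check against a direct symmetric eigendecomposition.
w, V = np.linalg.eigh(A)
assert np.allclose(np.sort(eigvals), np.sort(w))

# Each column of U satisfies A u = lambda u.
for i in range(6):
    assert np.allclose(A @ U[:, i], eigvals[i] * U[:, i])
```

[For a positive semi-definite Laplacian the signs are all +1, so the top-k eigenvectors are simply the first k columns of U. Note that SSVD computes the *largest* singular values, whereas spectral clustering typically wants the *smallest* eigenvalues of the Laplacian, so the Laplacian is usually transformed (e.g. shifted) so that the desired eigenvectors correspond to the largest singular values.]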
