Re: Problems Running Mahout SSVD

2013-02-19 Thread K.D.P. Ross
Just to follow up: I now have my real data, which is much sparser than the totally-random data … and, unsurprisingly, it exhibits a good bit more regularity, so it's compressible to the point that the on-disc SequenceFile is small enough that there's only a single map job, which, of course, means

Re: Problems Running Mahout SSVD

2013-02-19 Thread Dmitriy Lyubimov
Well, even with sparse data, your problem is probably still quite small for this. Btw, if I have time I will probably put this method into Spark RDDs and Bagel, which should speed things up by removing some inevitable sorting overhead. In fact, methinks, having Mahout sparse vectors and matrices as

Re: Problems Running Mahout SSVD

2013-02-14 Thread K.D.P. Ross
Appreciate the replies! "Yes, this problem has been pretty much beaten to shreds. In fact, so much so that I wrote it into the troubleshooting in section 5 of the manual (https://cwiki.apache.org/confluence/download/attachments/27832158/SSVD-CLI.pdf?version=17modificationDate=134085000)." Aha, it

Re: Problems Running Mahout SSVD

2013-02-11 Thread Dmitriy Lyubimov
Yes, this problem has been pretty much beaten to shreds. In fact, so much so that I wrote it into the troubleshooting in section 5 of the manual ( https://cwiki.apache.org/confluence/download/attachments/27832158/SSVD-CLI.pdf?version=17modificationDate=134085000 ). Are you sure those are not you

Re: Problems Running Mahout SSVD

2013-02-11 Thread Dmitriy Lyubimov
Also, Mahout's distributed algebra operates on the distributed row matrix (DRM) format, which is a sequence file of Vectors. I am a little confused about how you are able to run it on text input. Most likely this file is just ignored because it is not a sequence file and your input ends up being

Re: Problems Running Mahout SSVD

2013-02-11 Thread Dmitriy Lyubimov
Ok, so you are using the DRM. But basically what it means is that the block QR solver cannot solve it, due to rank deficiency, if any of your splits contains fewer than k+p rows of input -- I suggest you investigate your splitting along those lines. I agree the message is internal to the QR solver and

Re: Problems Running Mahout SSVD

2013-02-11 Thread Dmitriy Lyubimov
Perhaps I can suggest, as a first measure, running a simple local MR job on your file which just counts the number of rows in every map split. You should not see any split with fewer than k+p (110?) rows. Since you are using local mode and not actual HDFS blocks, there may be some irregularities. Also since random
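
The row-count check suggested above can be sketched outside Hadoop as follows. This is only an illustration in plain Python, not a MapReduce job and not part of Mahout: the function names are hypothetical, each split is modelled as an in-memory collection of rows, and the values k=100 and p=10 are assumed purely so that k+p matches the 110 guessed at in the message.

```python
from typing import Iterable, List, Sequence, Tuple


def count_rows_per_split(splits: Iterable[Iterable[object]]) -> List[int]:
    """Count the rows in each split (the role a counting mapper would play)."""
    return [sum(1 for _ in split) for split in splits]


def undersized_splits(row_counts: Sequence[int], k: int, p: int) -> List[Tuple[int, int]]:
    """Return (split index, row count) for every split with fewer than k+p rows."""
    threshold = k + p
    return [(i, n) for i, n in enumerate(row_counts) if n < threshold]


if __name__ == "__main__":
    # Hypothetical input: three splits holding 500, 110 and 42 rows.
    splits = [range(500), range(110), range(42)]
    counts = count_rows_per_split(splits)
    # Assumed values: k=100 (rank) and p=10 (oversampling), so k+p = 110.
    for index, rows in undersized_splits(counts, k=100, p=10):
        print(f"split {index} has only {rows} rows, fewer than k+p")
```

Any split this reports would be a candidate for the rank-deficiency failure described earlier in the thread; a split with exactly k+p rows passes the check.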