Just to follow up: I now have my real data, which is much
sparser than the totally-random data … and, unsurprisingly,
it exhibits a good bit more regularity, so it's compressible
to the point that the on-disc SequenceFile is small enough
that there's only a single map job, which, of course, means
Well, even with sparse data, your problem is probably still quite small for
this.
Btw, if I have time I will probably put this method into Spark RDD and Bagel,
which should speed things up by removing some inevitable sorting overhead.
In fact, methinks, having Mahout sparse vectors and matrices as
Appreciate the replies!
Yes, this problem has been pretty much beaten to shreds. In
fact, so much so that I wrote it into the troubleshooting
part, section 5 of the manual
(https://cwiki.apache.org/confluence/download/attachments/27832158/SSVD-CLI.pdf?version=17modificationDate=134085000).
Aha, it
Are you sure those are not you
Also, Mahout's distributed algebra operates on the distributed row matrix
(DRM) format, which is a sequence file of Vectors. I am a little bit confused
about how you are able to run that stuff on text input. Most likely this file
is just ignored because it is not a sequence file and your input ends up
being
Ok, so you are using the DRM.
but basically what it means is that the block QR solver cannot solve it due
to rank deficiency if any of your splits contains fewer than k+p rows of
input -- I suggest you investigate your splitting along those lines. I
agree the message is internal to the QR solver and
Perhaps I can suggest, as a first measure, running a simple local MR job on
your file which just counts the number of rows in every map split. You should
not see any split with fewer than k+p rows (110?). Since you are using local
mode and not actual HDFS blocks, there may be some irregularities.
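
Once you have the per-split row counts, the check itself is trivial. Here is a minimal Python sketch of it, not the MR job. The split names and counts are made up for illustration, and k=100, p=10 is only my guess at how the 110 above breaks down; substitute your own values.

```python
from typing import Dict, List


def undersized_splits(rows_per_split: Dict[str, int], k: int, p: int) -> List[str]:
    """Return the names of splits holding fewer than k+p rows."""
    min_rows = k + p  # each split needs at least this many rows for the block QR
    return [name for name, rows in rows_per_split.items() if rows < min_rows]


# Hypothetical counts, as a row-counting job might report them.
counts = {"split-0": 5000, "split-1": 4800, "split-2": 37}

# k=100 and p=10 are assumed values that add up to the 110 mentioned above.
bad = undersized_splits(counts, k=100, p=10)
print(bad)
```

Any name that shows up in the result is a split the block QR step would choke on, so that is where to look at how the input is being split.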
Also since random