For the third time: in the context of LSA, ssvd is a faster and hence perhaps better alternative to Lanczos. Is there any specific reason you want to use the Lanczos solver for LSA?
-d

On Sun, Feb 26, 2012 at 6:40 AM, Peyman Mohajerian <[email protected]> wrote:
> Hi Guys,
>
> Per your advice I did upgrade to Mahout 0.6 and did a bunch of API
> changes, and in the meantime realized I had a bug with my input matrix:
> zero rows were read from Solr b/c multiple fields in Solr were indexed and
> not just the one I was interested in. That issue is fixed and I have
> a matrix with these dimensions: (.numCols mat) 1000 (.numRows mat)
> 15932 (or the transpose)
> Unfortunately I'm getting the below error now. In the context of some
> other Mahout algorithm there was a mention of '/tmp' vs '/_tmp'
> causing this issue, but in this particular case the matrix is in
> memory!! I'm using this google package: guava-r09.jar
>
> SEVERE: java.util.NoSuchElementException
>        at com.google.common.collect.AbstractIterator.next(AbstractIterator.java:152)
>        at org.apache.mahout.math.hadoop.TimesSquaredJob.retrieveTimesSquaredOutputVector(TimesSquaredJob.java:190)
>        at org.apache.mahout.math.hadoop.DistributedRowMatrix.timesSquared(DistributedRowMatrix.java:238)
>        at org.apache.mahout.math.decomposer.lanczos.LanczosSolver.solve(LanczosSolver.java:104)
>        at lsa4solr.mahout_matrix$decompose_svd.invoke(mahout_matrix.clj:165)
>
>
> Any suggestion?
> Thanks,
> Peyman
>
>
>
> On Mon, Feb 20, 2012 at 10:38 AM, Dmitriy Lyubimov <[email protected]> wrote:
>> Peyman,
>>
>>
>> Yes, what Ted said. Please take 0.6 release. Also try ssvd, it may
>> benefit you in some regards compared to Lanczos.
>>
>> -d
>>
>> On Sun, Feb 19, 2012 at 10:34 AM, Peyman Mohajerian <[email protected]> wrote:
>>> Hi Dmitriy & Others,
>>>
>>> Dmitriy thanks for your previous response.
>>> I have a follow up question to my LSA project. I have managed to
>>> upload 1,500 documents from two different news groups (one about
>>> graphics and one about Atheism
>>> http://people.csail.mit.edu/jrennie/20Newsgroups/) to Solr. However my
>>> LanczosSolver in Mahout 0.4 does not find any eigenvalues (there are
>>> eigenvectors as you see in the follow up logs).
>>> The only thing I'm doing differently from
>>> (https://github.com/algoriffic/lsa4solr) is that I'm not using the
>>> 'Summary' field but rather the actual 'text' field in Solr. I'm
>>> assuming the issue is that the Summary field already removes the noise and
>>> makes the clustering work and the raw index data does not do that. Am I
>>> correct or are there other potential explanations? For the desired
>>> rank I'm using values between 10-100 and looking for #clusters between
>>> 2-10 (different values for different trials), but always the same
>>> result comes out, no clusters found.
>>> If my issue is related to not having summarization done, how can that
>>> be done in Solr? I wasn't able to find a Summary field in Solr.
>>>
>>> Thanks
>>> Peyman
>>>
>>>
>>> Feb 19, 2012 3:25:20 AM org.apache.mahout.math.decomposer.lanczos.LanczosSolver solve
>>> INFO: Lanczos iteration complete - now to diagonalize the tri-diagonal auxiliary matrix.
>>> Feb 19, 2012 3:25:20 AM org.apache.mahout.math.decomposer.lanczos.LanczosSolver solve
>>> INFO: Eigenvector 0 found with eigenvalue 0.0
>>> Feb 19, 2012 3:25:20 AM org.apache.mahout.math.decomposer.lanczos.LanczosSolver solve
>>> INFO: Eigenvector 1 found with eigenvalue 0.0
>>> Feb 19, 2012 3:25:20 AM org.apache.mahout.math.decomposer.lanczos.LanczosSolver solve
>>> INFO: Eigenvector 2 found with eigenvalue 0.0
>>> Feb 19, 2012 3:25:20 AM org.apache.mahout.math.decomposer.lanczos.LanczosSolver solve
>>> INFO: Eigenvector 3 found with eigenvalue 0.0
>>> Feb 19, 2012 3:25:20 AM org.apache.mahout.math.decomposer.lanczos.LanczosSolver solve
>>> INFO: Eigenvector 4 found with eigenvalue 0.0
>>> Feb 19, 2012 3:25:20 AM org.apache.mahout.math.decomposer.lanczos.LanczosSolver solve
>>> INFO: Eigenvector 5 found with eigenvalue 0.0
>>> Feb 19, 2012 3:25:20 AM org.apache.mahout.math.decomposer.lanczos.LanczosSolver solve
>>> INFO: Eigenvector 6 found with eigenvalue 0.0
>>> Feb 19, 2012 3:25:20 AM org.apache.mahout.math.decomposer.lanczos.LanczosSolver solve
>>> INFO: Eigenvector 7 found with eigenvalue 0.0
>>> Feb 19, 2012 3:25:20 AM org.apache.mahout.math.decomposer.lanczos.LanczosSolver solve
>>> INFO: Eigenvector 8 found with eigenvalue 0.0
>>> Feb 19, 2012 3:25:20 AM org.apache.mahout.math.decomposer.lanczos.LanczosSolver solve
>>> INFO: Eigenvector 9 found with eigenvalue 0.0
>>> Feb 19, 2012 3:25:20 AM org.apache.mahout.math.decomposer.lanczos.LanczosSolver solve
>>> INFO: Eigenvector 10 found with eigenvalue 0.0
>>> Feb 19, 2012 3:25:20 AM org.apache.mahout.math.decomposer.lanczos.LanczosSolver solve
>>> INFO: LanczosSolver finished.
>>>
>>>
>>> On Sun, Jan 1, 2012 at 10:06 PM, Dmitriy Lyubimov <[email protected]> wrote:
>>>> In Mahout, an LSA pipeline is possible with the seqdirectory, seq2sparse and ssvd
>>>> commands. The nuances are understanding the dictionary format and LLR analysis of
>>>> n-grams, and perhaps using a slightly better lemmatizer than the default one.
>>>>
>>>> With the indexing part you are on your own at this point.
>>>> On Jan 1, 2012 2:28 PM, "Peyman Mohajerian" <[email protected]> wrote:
>>>>
>>>>> Hi Guys,
>>>>>
>>>>> I'm interested in this work:
>>>>>
>>>>> http://www.ccri.com/blog/2010/4/2/latent-semantic-analysis-in-solr-using-clojure.html
>>>>>
>>>>> I looked at some of the comments and noticed that there was interest
>>>>> in incorporating it into Mahout, back in 2010. I'm also having issues
>>>>> running this code due to dependencies on an older version of Mahout.
>>>>>
>>>>> I was wondering if LSA is now directly available in Mahout? Also if I
>>>>> upgrade to the latest Mahout would this Clojure code work?
>>>>>
>>>>> Thanks
>>>>> Peyman
>>>>>
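The thread mentions an input matrix whose rows came back empty from Solr, and a solver run that reported only 0.0 eigenvalues. An all-zero matrix has only zero singular values, so an input that is mostly empty is one plausible cause worth ruling out before handing the matrix to any solver. The following is a minimal sketch of such a check, not Mahout or lsa4solr code. It assumes the term-document matrix is available as a list of sparse rows, and the names `SparseRow` and `empty_rows` are hypothetical.

```python
from typing import Dict, List

# Hypothetical layout: one dict per document, mapping column index -> weight.
SparseRow = Dict[int, float]


def empty_rows(rows: List[SparseRow]) -> List[int]:
    """Return the indices of rows that have no non-zero entry."""
    return [
        i
        for i, row in enumerate(rows)
        if not any(value != 0.0 for value in row.values())
    ]


if __name__ == "__main__":
    # Toy data: row 1 has no entries, row 2 only stores an explicit zero.
    sample = [{0: 1.0, 3: 2.5}, {}, {2: 0.0}, {1: 0.7}]
    bad = empty_rows(sample)
    print(f"{len(bad)} of {len(sample)} rows are empty: {bad}")
```

If most rows show up in the result, the problem is likely in how the matrix is built (for example, which Solr field is read) rather than in the choice of Lanczos or ssvd.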
