Re: Lucene & LSA

2006-12-14 Thread Miles Efron
University of Tennessee professor Michael Berry maintains a good site on software for computing the SVD of large, sparse matrices: http://www.cs.utk.edu/~lsi/ The site also points to the LSI patent. FWIW, it's very easy to extract term-document counts from a Lucene index and format them for softw
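To illustrate the kind of formatting step mentioned above, here is a minimal sketch. It does not use Lucene's API. It assumes the term-document counts have already been read out of the index into a plain dictionary (term -> {document number: count}), and builds that dictionary from three toy strings instead. The function names `term_doc_counts` and `to_matrix_market` are made up for this example, and the Matrix Market coordinate format is only one common sparse interchange format; a given SVD package may expect something else.

```python
from collections import Counter


def term_doc_counts(docs):
    """Return (terms, postings), where postings[term] = {doc_index: count}.

    Stand-in for reading postings out of a real index: tokenization here
    is just lowercase + whitespace split.
    """
    postings = {}
    for doc_id, text in enumerate(docs):
        for term, freq in Counter(text.lower().split()).items():
            postings.setdefault(term, {})[doc_id] = freq
    return sorted(postings), postings


def to_matrix_market(terms, postings, n_docs):
    """Serialize a term-by-document count matrix as sparse triples.

    Rows are terms, columns are documents, and indices are 1-based,
    as the Matrix Market coordinate format requires.
    """
    nnz = sum(len(docs_for_term) for docs_for_term in postings.values())
    lines = [
        "%%MatrixMarket matrix coordinate integer general",
        f"{len(terms)} {n_docs} {nnz}",
    ]
    for row, term in enumerate(terms, start=1):
        for doc_id in sorted(postings[term]):
            lines.append(f"{row} {doc_id + 1} {postings[term][doc_id]}")
    return "\n".join(lines)


# Toy data (hypothetical, not from the thread).
docs = ["lucene index search", "index terms index", "latent semantic search"]
terms, postings = term_doc_counts(docs)
mm = to_matrix_market(terms, postings, len(docs))
print(mm)
```

Only the nonzero counts are written, so the output grows with the number of postings rather than with terms times documents.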

Re: Lucene & LSA

2006-12-14 Thread Marvin Humphrey
On Dec 14, 2006, at 11:16 AM, Soeren Pekrul wrote:
> Is it possible to extract the matrix from the index file?
> I don’t know of any API to extract the matrix from the index file directly.
How could we make it work to write an open source decomposed vector model search engine a la LSA witho

Re: Lucene & LSA

2006-12-14 Thread Soeren Pekrul
mariolone wrote:
> I succeeded in extracting the matrix. But for collections of large documents, isn't this too expensive a solution?
I have a fairly small collection of 14,960 documents and 29,828 unique terms. If I remember correctly, it took a few minutes on a normal laptop computer to
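For a rough sense of the cost question, the arithmetic below compares dense and sparse storage for a matrix of the size quoted above (14,960 documents by 29,828 terms). The document and term counts come from the message; the average of 100 distinct terms per document is purely an assumption for illustration, as are the 8-byte values and 4-byte indices.

```python
n_docs = 14_960   # from the message above
n_terms = 29_828  # from the message above

# Dense storage: one 8-byte float per (term, document) cell.
dense_cells = n_docs * n_terms
dense_bytes = dense_cells * 8

# ASSUMPTION (not from the thread): about 100 distinct terms per document.
avg_terms_per_doc = 100
nnz = n_docs * avg_terms_per_doc

# Sparse triples: 8-byte value + 4-byte row index + 4-byte column index.
sparse_bytes = nnz * (8 + 4 + 4)

print(f"dense cells:  {dense_cells:,}")
print(f"dense bytes:  {dense_bytes:,} (~{dense_bytes / 1e9:.2f} GB)")
print(f"sparse bytes: {sparse_bytes:,} (~{sparse_bytes / 1e6:.1f} MB)")
```

Under these assumptions the dense matrix needs a few gigabytes while the sparse triples need tens of megabytes, which is why SVD software for this kind of data typically works on sparse input.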

Re: Lucene & LSA

2006-12-14 Thread mariolone
Thanks for the help, Sören! I succeeded in extracting the matrix. But for collections of large documents, isn't this too expensive a solution? Is it possible to extract the matrix from the index file?
Mario
Sören Pekrul wrote:
> Hello Mario,
> I had a similar problem a few

Re: Lucene & LSA

2006-12-14 Thread Soeren Pekrul
Hello Mario, I had a similar problem a few weeks ago (thread "How to get Term Weights (document term matrix)?", 2006-11-02, http://www.gossamer-threads.com/lists/lucene/java-user/41726). I don't think there is a simple function for creating or accessing a document-term matrix. I extracted the matr