Re: OutOfMemoryError with Lucene 1.4 final

2004-12-10 Thread Xiangyu Jin
I am not sure, but I guess there are three possibilities. (1) I see that you use Field.Text(contents, stringBuffer.toString()). This will store all of your text in the Document object, and it might be long... I do not know the details of how Lucene implements this. I think you can try using unstored

Re: OutOfMemoryError with Lucene 1.4 final

2004-12-10 Thread Xiangyu Jin
OK, I see. It seems most people think it is the third possibility. On Fri, 10 Dec 2004, Xiangyu Jin wrote: I am not sure, but I guess there are three possibilities. (1) I see that you use Field.Text(contents, stringBuffer.toString()). This will store all of your text in the Document object

Re: similarity matrix - more clear

2004-11-30 Thread Xiangyu Jin
I have the same task as you. As I understand it, if there are N documents, your approach will take N^2 similarity calculations. Although there are only N(N-1)/2 distinct document pairs, the similarity calculation in Lucene is (as far as I understand) asymmetric, so
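A toy sketch of why the full N x N matrix is needed once the score is asymmetric. The scoring function below is hypothetical and is not Lucene's actual formula; it is only asymmetric in the same general way, because just the document side is length-normalized. With N documents it makes N^2 score() calls, N(N-1) of them off the diagonal, which is twice the N(N-1)/2 distinct pairs. All class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class AsymmetricSimilarityMatrix {

    // Lower-cases and splits on whitespace, counting how often each term occurs.
    static Map<String, Integer> termFreqs(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String t : text.toLowerCase().split("\\s+")) {
            if (!t.isEmpty()) {
                tf.merge(t, 1, Integer::sum);
            }
        }
        return tf;
    }

    // Hypothetical score of document d for query q: sum of sqrt(tf in d) over the
    // query terms found in d, divided by sqrt(length of d). Only the document side
    // is length-normalized, so score(a, b) != score(b, a) in general.
    static double score(Map<String, Integer> q, Map<String, Integer> d) {
        int docLength = 0;
        for (int f : d.values()) {
            docLength += f;
        }
        if (docLength == 0) {
            return 0.0;
        }
        double sum = 0.0;
        for (String term : q.keySet()) {
            Integer f = d.get(term);
            if (f != null) {
                sum += Math.sqrt(f);
            }
        }
        return sum / Math.sqrt(docLength);
    }

    // m[i][j] = score with document i used as the query and document j as the target.
    // Because the score is not symmetric, all N * N entries are computed.
    static double[][] matrix(String[] docs) {
        List<Map<String, Integer>> tfs = new ArrayList<>();
        for (String doc : docs) {
            tfs.add(termFreqs(doc));
        }
        int n = docs.length;
        double[][] m = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                m[i][j] = score(tfs.get(i), tfs.get(j));
            }
        }
        return m;
    }

    public static void main(String[] args) {
        double[][] m = matrix(new String[] {"apple banana", "apple banana cherry date"});
        // m[0][1] and m[1][0] differ, so the pair must be scored in both directions.
        System.out.println(m[0][1] + " vs " + m[1][0]);
    }
}
```

For the two sample documents, scoring the short one against the long one gives 1.0, while the reverse direction gives sqrt(2), so keeping only the upper triangle of the matrix would lose information.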

Lucene's ranking function VS Standard VSM model

2004-11-30 Thread Xiangyu Jin
whether I understand correctly, but the main reason is Lucene's query parser. By default it treats each term as appearing once. If we repeat a query term several times in the query string, it produces some unexpected results. For detailed information, please refer to the attached link. Thanks, Xiangyu
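A toy illustration of what is lost if every query term is counted only once, taking the post's description of the parser as an assumption. It compares a standard VSM cosine that keeps the query term frequency with the same cosine after giving every distinct query term weight 1. Raw term frequencies are used with no idf, so this is not Lucene's scoring formula, and the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class QueryTermRepetition {

    // Lower-cases and splits on whitespace, counting how often each term occurs.
    static Map<String, Integer> termFreqs(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String t : text.toLowerCase().split("\\s+")) {
            if (!t.isEmpty()) {
                tf.merge(t, 1, Integer::sum);
            }
        }
        return tf;
    }

    // Cosine similarity between two sparse vectors of raw term weights.
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            int wa = e.getValue();
            normA += (double) wa * wa;
            Integer wb = b.get(e.getKey());
            if (wb != null) {
                dot += (double) wa * wb;
            }
        }
        for (int wb : b.values()) {
            normB += (double) wb * wb;
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Query vector in which every distinct term gets weight 1 (repetitions ignored).
    static Map<String, Integer> binary(Map<String, Integer> tf) {
        Map<String, Integer> out = new HashMap<>();
        for (String t : tf.keySet()) {
            out.put(t, 1);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Integer> query = termFreqs("apple apple banana");
        Map<String, Integer> doc = termFreqs("apple cherry");
        System.out.println("VSM with query tf:    " + cosine(query, doc));
        System.out.println("each term once:       " + cosine(binary(query), doc));
    }
}
```

For the query "apple apple banana" against "apple cherry", the first cosine is 2/sqrt(10), about 0.63, and the second is 0.5: the repeated term only raises the score when query term frequency is kept.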

Does Lucene perform ranking in the retrieved set?

2004-11-30 Thread Xiangyu Jin
those candidate docs, and then I can perform my own similarity calculations (since I might need to rewrite the normalization factor, modifying only the similarity model does not seem to be enough). Or is there a document that describes the procedure Lucene follows to perform a search? Thanks, Xiangyu Jin
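A minimal sketch of the two-stage approach asked about, written in plain Java rather than against Lucene. A stand-in first stage picks the candidate documents (any document containing a query term, assumed here to approximate what the engine would return), and a second stage re-scores only those candidates with a custom normalization factor. Pivoted document-length normalization is swapped in purely as an example of such a replacement factor; the scoring, the slope parameter s, and all names are hypothetical, and none of this is Lucene's API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RerankCandidates {

    // Lower-cases and splits on whitespace, counting how often each term occurs.
    static Map<String, Integer> termFreqs(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String t : text.toLowerCase().split("\\s+")) {
            if (!t.isEmpty()) {
                tf.merge(t, 1, Integer::sum);
            }
        }
        return tf;
    }

    // Total number of tokens in a document.
    static int length(Map<String, Integer> tf) {
        int n = 0;
        for (int f : tf.values()) {
            n += f;
        }
        return n;
    }

    // Stage 1 (stand-in for the engine's search): ids of docs containing any query term.
    static List<Integer> candidates(List<Map<String, Integer>> docs, Map<String, Integer> query) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < docs.size(); i++) {
            for (String t : query.keySet()) {
                if (docs.get(i).containsKey(t)) {
                    ids.add(i);
                    break;
                }
            }
        }
        return ids;
    }

    // Stage 2: custom score. The numerator sums (1 + ln tf) over matching query terms;
    // the denominator is a pivoted length normalization with slope s (s = 0 disables it).
    static double score(Map<String, Integer> query, Map<String, Integer> doc, double avgLen, double s) {
        double sum = 0.0;
        for (String t : query.keySet()) {
            Integer f = doc.get(t);
            if (f != null) {
                sum += 1.0 + Math.log(f);
            }
        }
        double norm = (1.0 - s) + s * (length(doc) / avgLen);
        return sum / norm;
    }

    // Runs stage 1, then sorts the candidates by the stage-2 score, best first.
    static List<Integer> rerank(List<Map<String, Integer>> docs, Map<String, Integer> query, double s) {
        if (docs.isEmpty()) {
            return new ArrayList<>();
        }
        double total = 0.0;
        for (Map<String, Integer> d : docs) {
            total += length(d);
        }
        final double avgLen = total / docs.size();
        List<Integer> ids = candidates(docs, query);
        ids.sort(Comparator.comparingDouble((Integer id) -> score(query, docs.get(id), avgLen, s)).reversed());
        return ids;
    }

    public static void main(String[] args) {
        List<Map<String, Integer>> docs = new ArrayList<>();
        for (String text : new String[] {
                "lucene search",
                "lucene lucene index java library tools",
                "cooking recipes"}) {
            docs.add(termFreqs(text));
        }
        Map<String, Integer> query = termFreqs("lucene");
        System.out.println(rerank(docs, query, 0.0)); // prints [1, 0]
        System.out.println(rerank(docs, query, 0.5)); // prints [0, 1]
    }
}
```

With the sample documents, s = 0 (no length normalization) ranks the longer document first, while s = 0.5 ranks the shorter one first; the unrelated third document never reaches the second stage. Keeping the two stages separate means the normalization can be replaced freely without touching how candidates are found.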