I am not sure, but I guess there are three possibilities:
(1) I see that you use
Field.Text(contents, stringBuffer.toString())
This stores the entire text string in the document object,
and it might be long...
I do not know the details of how Lucene implements this.
I think you could try using an unstored field (Field.UnStored) instead.
OK, I see. It seems most people think it is the third possibility.
On Fri, 10 Dec 2004, Xiangyu Jin wrote:
I am not sure, but I guess there are three possibilities:
(1) I see that you use
Field.Text(contents, stringBuffer.toString())
This stores the entire text string in the document object
I also have the same task as you do. As I understand it,
if there are N documents, your approach requires N^2 similarity
calculations.
Although there are only N(N-1)/2 distinct document pairs,
the similarity calculation in Lucene is (as I understand it)
asymmetric, so
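To make the counting argument concrete, here is a small Java sketch (not Lucene code; the class and method names are hypothetical) that enumerates the comparisons. A symmetric score needs one computation per unordered pair, N(N-1)/2; an asymmetric score, where scoring a as the query against b can differ from scoring b against a, needs one per ordered pair, N(N-1), or N^2 if each document is also scored against itself.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch: enumerates the score computations an all-pairs comparison needs.
 * Symmetric score: each unordered pair once. Asymmetric score: each ordered
 * (query, document) pair separately.
 */
public class PairCounts {

    /** Unordered pairs {i, j} with i < j: N(N-1)/2 of them. */
    static List<int[]> symmetricPairs(int n) {
        List<int[]> pairs = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                pairs.add(new int[] {i, j});
            }
        }
        return pairs;
    }

    /** Ordered pairs (query i, document j); includes i == j only if asked. */
    static List<int[]> asymmetricPairs(int n, boolean includeSelf) {
        List<int[]> pairs = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                if (i != j || includeSelf) {
                    pairs.add(new int[] {i, j});
                }
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        int n = 5;
        System.out.println("symmetric:            " + symmetricPairs(n).size());
        System.out.println("asymmetric (no self): " + asymmetricPairs(n, false).size());
        System.out.println("asymmetric (all):     " + asymmetricPairs(n, true).size());
    }
}
```

For N = 5 this prints 10, 20 and 25.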
whether I understand correctly, but the
major reason comes from Lucene's query parser. By default it
assumes each term appears once. If we repeat a query term several
times in the query string, the results may be unexpected.
For details, please refer to the attached link.
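If the goal is to turn a document's words into a query, one way to avoid repeating a term is to emit each distinct term once and carry its count as a boost, using the query parser's `term^boost` syntax. This is a minimal sketch under that assumption: the class and method names are hypothetical, the terms are assumed to be already tokenized, and whether a boost of k reproduces the weighting you wanted from k repetitions depends on Lucene's scoring, so treat it as something to test rather than an equivalence.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class BoostedQueryBuilder {

    /** Counts occurrences of each term, preserving first-seen order. */
    static Map<String, Integer> termCounts(String[] terms) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String t : terms) {
            counts.merge(t, 1, Integer::sum);
        }
        return counts;
    }

    /** Emits each distinct term once; a repeated term gets a ^count boost. */
    static String buildQuery(String[] terms) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Integer> e : termCounts(terms).entrySet()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(e.getKey());
            if (e.getValue() > 1) {
                sb.append('^').append(e.getValue());
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] terms = {"lucene", "search", "lucene", "index", "lucene"};
        System.out.println(buildQuery(terms));
    }
}
```

With the sample terms this builds `lucene^3 search index`. Terms containing query-parser special characters would need escaping first.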
thanks
xiangyu
those candidate docs,
then I can perform my own similarity calculations (since
I might need to rewrite the normalization factor, modifying
only the similarity model does not seem to be enough).
Or is there a document that describes the procedure
Lucene uses to perform a search?
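One way to do the re-scoring step yourself is to compute a score over term-frequency vectors of the candidate documents with a normalization you control. The sketch below assumes plain whitespace tokenization (not a Lucene analyzer) and uses cosine similarity with a pluggable norm; all names are hypothetical, and you would substitute your own normalization for `l2Norm`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.ToDoubleFunction;

public class CandidateRescorer {

    /** Term frequencies of a lowercased, whitespace-tokenized text. */
    static Map<String, Integer> termFreqs(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String tok : text.toLowerCase().split("\\s+")) {
            if (!tok.isEmpty()) {
                tf.merge(tok, 1, Integer::sum);
            }
        }
        return tf;
    }

    /** Dot product of two term-frequency vectors. */
    static double dot(Map<String, Integer> a, Map<String, Integer> b) {
        double sum = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            Integer other = b.get(e.getKey());
            if (other != null) {
                sum += e.getValue() * (double) other;
            }
        }
        return sum;
    }

    /** Euclidean (L2) length: one possible normalization. */
    static double l2Norm(Map<String, Integer> v) {
        double s = 0;
        for (int x : v.values()) {
            s += (double) x * x;
        }
        return Math.sqrt(s);
    }

    /** Score = dot(a, b) / (norm(a) * norm(b)), with a caller-chosen norm. */
    static double score(Map<String, Integer> a, Map<String, Integer> b,
                        ToDoubleFunction<Map<String, Integer>> norm) {
        double denom = norm.applyAsDouble(a) * norm.applyAsDouble(b);
        return denom == 0 ? 0 : dot(a, b) / denom;
    }

    public static void main(String[] args) {
        Map<String, Integer> q = termFreqs("lucene similarity score");
        Map<String, Integer> d = termFreqs("lucene score normalization");
        System.out.println(score(q, d, CandidateRescorer::l2Norm));
    }
}
```

For the two sample texts the shared terms are `lucene` and `score`, so the cosine score is 2 / (sqrt(3) * sqrt(3)), about 0.667.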
thanks
xiangyu jin