Re: Too many open files issue

2004-11-26 Thread Doug Cutting
John Wang wrote:
In the Lucene code, I don't see where the reader specified when
creating a field is closed. That holds on to the file.
I am looking at DocumentWriter.invertDocument().
It is closed in a finally clause on line 170, when the TokenStream is 
closed.
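
The pattern described here can be sketched in plain Java. This is not the
actual DocumentWriter code: ReaderBackedStream and consume() are hypothetical
names used only for illustration, standing in for a token stream that owns
the Reader it was built from.

```java
import java.io.IOException;
import java.io.Reader;

public class CloseInFinallySketch {

    /** Hypothetical stand-in for a token stream that owns its Reader. */
    static class ReaderBackedStream {
        private final Reader input;

        ReaderBackedStream(Reader input) {
            this.input = input;
        }

        /** Returns the next character, or -1 at end of input. */
        int next() throws IOException {
            return input.read();
        }

        /** Closing the stream also closes the underlying Reader. */
        void close() throws IOException {
            input.close();
        }
    }

    /** Reads the stream to the end and always closes it, even if reading throws. */
    static int consume(Reader reader) throws IOException {
        ReaderBackedStream stream = new ReaderBackedStream(reader);
        int count = 0;
        try {
            while (stream.next() != -1) {
                count++;
            }
        } finally {
            stream.close(); // releases the Reader, and any file handle behind it
        }
        return count;
    }
}
```

Because the close happens in the finally block, the Reader is released
whether or not reading completes normally.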

Doug
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Dutch Analyzer dictionary format?

2004-11-26 Thread Twan Kogels
Hello all,
I'm using Lucene to search through a couple of documents to find the 
interesting ones. Most of the documents are in Dutch. I saw that the 
default Snowball stemmer wasn't doing well on text written in a foreign 
language. Luckily, I found a Dutch text analyzer in the Lucene sandbox project.

I've read the javadoc and found out that it needs a stem dictionary. You can load 
this dictionary with the following function:
DutchAnalyzer.setStemDictionary(File f)

The format needs to be a tab-separated list (word [tab] stem).
To be sure I do everything correctly, I have a question about the dictionary:
Can I just get:
http://snowball.tartarus.org/dutch/diffs.txt
and convert it to a tab-separated list and then feed it to the 
setStemDictionary() function?

Kind regards,
Twan Kogels 




Re: Dutch Analyzer dictionary format?

2004-11-26 Thread Otis Gospodnetic
Judging from everything you've said, the answer is yes.  I don't use
Dutch Analyzer, so I'm not 100% sure about this, but it sounds easy
enough to try.
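
A minimal sketch of that conversion, under some assumptions that are not
confirmed in this thread: each useful line of diffs.txt holds exactly two
whitespace-separated tokens (word, then stem), and the file can be read as
UTF-8. The class name StemDictConverter and the command-line arguments are
made up for illustration.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class StemDictConverter {

    /**
     * Converts one "word   stem" line to "word<TAB>stem".
     * Returns null for lines that do not hold exactly two tokens.
     */
    static String toTabSeparated(String line) {
        String[] parts = line.trim().split("\\s+");
        if (parts.length != 2 || parts[0].isEmpty()) {
            return null;
        }
        return parts[0] + "\t" + parts[1];
    }

    /** Usage (hypothetical): java StemDictConverter diffs.txt stemdict.txt */
    public static void main(String[] args) throws IOException {
        Path in = Paths.get(args[0]);
        Path out = Paths.get(args[1]);
        // Assumption: the input is UTF-8; change the charset if the file is not.
        try (BufferedReader reader = Files.newBufferedReader(in, StandardCharsets.UTF_8);
             BufferedWriter writer = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String converted = toTabSeparated(line);
                if (converted != null) { // skip blank or malformed lines
                    writer.write(converted);
                    writer.newLine();
                }
            }
        }
    }
}
```

The resulting file could then be passed to setStemDictionary(File f) as
described in the original question. Whether the analyzer accepts it as-is
would still need to be checked against the javadoc.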

Otis






Are similarity scores computed when using sort?

2004-11-26 Thread Aphinyanaphongs, Yindalon
I have a search application that is very performance-sensitive. I've looked 
through the IndexSearcher code and haven't been able to determine whether a 
similarity score is calculated when the results are sorted by some numerical 
field value. Basically, it would be preferable not to incur the computational 
cost of generating a similarity score if it is never used.

Thanks
Yin