RE: File Handle Leaks During Lucene 3.0.2 Merge

2010-09-30 Thread Uwe Schindler
The finalize() approach does not work reliably: a reader that is not
explicitly closed is still referenced (and still references other objects),
so it is never garbage collected and finalize() is never called.

You must close the reader explicitly, that's all. So just close it after use.
With near-real-time search, you normally get an IndexReader, wrap it with an
IndexSearcher, do your search, and close it after that. You can even call
writer.getReader() from different threads; reference counting will close the
readers correctly. So for each request, take a new one and close it after
use.
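In Lucene 3.0.x API terms, the per-request pattern described above looks roughly like this (a sketch only; `writer` is assumed to be an already-open IndexWriter and `query` an already-built Query):

```java
// Per-request NRT search: take a fresh reader, search, and close it.
// IndexWriter.getReader() is cheap to call repeatedly; internal
// reference counting keeps shared segment readers alive as needed.
IndexReader reader = writer.getReader();      // NRT view of the current index state
try {
    IndexSearcher searcher = new IndexSearcher(reader);
    TopDocs hits = searcher.search(query, 10);
    // ... render hits ...
} finally {
    reader.close();   // explicit close -- never rely on finalize()
}
```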

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


> -Original Message-
> From: Jamie [mailto:ja...@stimulussoft.com]
> Sent: Wednesday, September 29, 2010 11:50 AM
> To: java-user@lucene.apache.org
> Subject: Re: File Handle Leaks During Lucene 3.0.2 Merge
> 
>   Hi Uwe
> 
> Thanks in advance for your help. Well, I just tried searching again and it
> made no difference. My LuceneIndex getReader() function will call
> writer.getReader() on occasion or return a cached copy. To make sure that
> IndexReaders are closed when they are no longer needed, I wrap the
> IndexReader as follows:
> 
> public class VolumeIndexReader extends FilterIndexReader {
> 
>  public VolumeIndexReader(IndexReader in) {
>  super(in);
>  }
> 
>  public void finalize() {
>  try { in.close(); } catch (Exception e) {}
>  }
> 
>  public IndexReader reopen(boolean readonly) throws IOException {
>  return super.reopen(readonly);
>  }
> }
> 
> You'll notice the finalizer calls IndexReader.close(). After users conduct
> multiple searches, the index reader should be closed in time. Therefore, it's
> confusing to me to see that open handles are still present. Clearly, I am
> doing something wrong, but what?
> 
> Jamie
> 
> 
> 
> On 2010/09/29 8:21 PM, Uwe Schindler wrote:
> > The "deleted" files are only freed by the OS kernel when no IndexReader
> > accesses them any longer. Did you get a new realtime reader after
> > merging and *closed* the old one?
> >
> > -
> > Uwe Schindler
> > H.-H.-Meier-Allee 63, D-28213 Bremen
> > http://www.thetaphi.de
> > eMail:u...@thetaphi.de



-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: File Handle Leaks During Lucene 3.0.2 Merge

2010-09-30 Thread Jamie

 Uwe

If I recall correctly, when you call writer.getReader(), the returned 
IndexReader can consume a lot of memory with large indexes. To ensure 
that the same index reader is reused across multiple search threads, I 
keep a cached copy of the reader and return it. If a search thread 
closes the reader, then it will be closed for the other search threads 
and their searches will fail. From my test, the finalize method in the 
VolumeIndexReader example I gave you is called. The file handle leaks 
are coming from the core index loop, where I call .commit() as opposed 
to closing the index. Since the writer stays open, files left behind by 
merge operations are never deleted. A solution is to close the index 
periodically to force the handles to be swept up by the OS.


Jamie

On 2010/09/30 10:55 AM, Uwe Schindler wrote:

The finalize() approach does not work reliably: a reader that is not
explicitly closed is still referenced (and still references other objects),
so it is never garbage collected and finalize() is never called.

You must close the reader explicitly, that's all. So just close it after use.
With near-real-time search, you normally get an IndexReader, wrap it with an
IndexSearcher, do your search, and close it after that. You can even call
writer.getReader() from different threads; reference counting will close the
readers correctly. So for each request, take a new one and close it after
use.

Uwe







Re: File Handle Leaks During Lucene 3.0.2 Merge

2010-09-30 Thread Michael McCandless
Opening an NRT reader per-search can be too costly if you have a high
search rate.

It's better to rate-limit the reopens in that case, e.g. to at most 10
per second (one every 100 msec).  There's a useful class in the Lucene in
Action 2 source code (NOTE: I am a co-author), SearcherManager, which
simplifies this for you.  You can download the source code from
http://manning.com/lucene, but we are also in the process of donating
this source code to Lucene.

Also note that you need not worry about when Lucene does merges
under-the-hood.  Ie, Lucene takes care of this, and there's nothing
the app needs to "do" to handle merges & NRT readers, unless you want
to install a segment warmer that pre-warms newly merged segments
before making them visible to the next NRT reader (the SearcherManager
also makes this easy -- subclass it and override the warm method).

Mike

On Thu, Sep 30, 2010 at 4:55 AM, Uwe Schindler  wrote:
> The finalize() approach does not work reliably: a reader that is not
> explicitly closed is still referenced (and still references other objects),
> so it is never garbage collected and finalize() is never called.
>
> You must close the reader explicitly, that's all. So just close it after use.
> With near-real-time search, you normally get an IndexReader, wrap it with an
> IndexSearcher, do your search, and close it after that. You can even call
> writer.getReader() from different threads; reference counting will close the
> readers correctly. So for each request, take a new one and close it after
> use.
>
> Uwe
>
> -
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>
>
>> -Original Message-
>> From: Jamie [mailto:ja...@stimulussoft.com]
>> Sent: Wednesday, September 29, 2010 11:50 AM
>> To: java-user@lucene.apache.org
>> Subject: Re: File Handle Leaks During Lucene 3.0.2 Merge
>>
>>   Hi Uwe
>>
>> Thanks in advance for your help. Well, I just tried searching again and it
>> made no difference. My LuceneIndex getReader() function will call
>> writer.getReader() on occasion or return a cached copy. To make sure that
>> IndexReaders are closed when they are no longer needed, I wrap the
>> IndexReader as follows:
>>
>> public class VolumeIndexReader extends FilterIndexReader {
>>
>>      public VolumeIndexReader(IndexReader in) {
>>          super(in);
>>      }
>>
>>      public void finalize() {
>>          try { in.close(); } catch (Exception e) {}
>>      }
>>
>>      public IndexReader reopen(boolean readonly) throws IOException {
>>          return super.reopen(readonly);
>>      }
>> }
>>
>> You'll notice the finalizer calls IndexReader.close(). After users conduct
>> multiple searches, the index reader should be closed in time. Therefore, it's
>> confusing to me to see that open handles are still present. Clearly, I am
>> doing something wrong, but what?
>>
>> Jamie
>>
>>
>>
>> On 2010/09/29 8:21 PM, Uwe Schindler wrote:
>> > The "deleted" files are only freed by the OS kernel when no IndexReader
>> > accesses them any longer. Did you get a new realtime reader after
>> > merging and *closed* the old one?
>> >
>> > -
>> > Uwe Schindler
>> > H.-H.-Meier-Allee 63, D-28213 Bremen
>> > http://www.thetaphi.de
>> > eMail:u...@thetaphi.de
>
>
>



Re: File Handle Leaks During Lucene 3.0.2 Merge

2010-09-30 Thread Michael McCandless
Comments inline...

On Thu, Sep 30, 2010 at 5:26 AM, Jamie  wrote:
>  Uwe
>
> If I recall correctly when you call writer.getReader(), the returned
> IndexReader can consume a lot of memory with large indexes

The reopened reader shares sub-readers with the previous one, so, if
all that has changed since the last reopen is the flush of a small
segment, then the additional resources consumed will be small.

> To ensure that
> the same index reader is reused across multiple search threads, I keep a
> cached copy of the reader and return it. If a search thread closes the
> reader, then it will be closed for the other search threads and the search
> will fail.

It's good to cache the reader, but, finalize would worry me too since
you have no control over when GC gets around to calling it... you risk
tying up resources for longer than necessary.

> From my test, the finalize method in VolumeIndexReader example I
> gave you is called. The file handle leaks are coming from the core index
> loop, where I call .commit() as opposed to closing the index. Since the
> writer stays open, handles left by merge operations are never deleted. A
> solution is to close the index periodically to force the handles to be
> swept up by the OS.

IndexWriter has a reader pool, internally, where it holds open
SegmentReaders for the still-live segments in the index.  This is used
by IndexReader.reopen to share open SegmentReaders.

But the open files should correspond only to segments still "live" in
the index.  After segments are merged away, these readers are dropped.
 Is this what you are seeing?

Mike




Re: File Handle Leaks During Lucene 3.0.2 Merge

2010-09-30 Thread Jamie

 Hi Michael / Uwe

>It's good to cache the reader, but, finalize would worry me too since
>you have no control over when GC gets around to calling it... you risk
>tying up resources for longer than necessary.

I did it this way, as I didn't want to overcomplicate the code by 
introducing mechanisms to track the number of search threads using a 
shared IndexReader. Admittedly, it's not a very clean solution, but in my 
case it does work. Is there a particular technique for knowing when to 
close a reader when there are multiple search threads using that reader? 
Should I keep some kind of counter and override the close method of the 
reader such that the underlying reader is only closed when everyone's 
done with it?

IndexWriter has a reader pool, internally, where it holds open
SegmentReaders for the still-live segments in the index.  This is used
by IndexReader.reopen to share open SegmentReaders.

But the open files should correspond only to segments still "live" in
the index.  After segments are merged away, these readers are dropped.
  Is this what you are seeing?

I don't fully understand your explanation/question. When I run lsof, I am 
seeing the following:


/usr/local/mailarchiva/server/webapps/ROOT/WEB-INF/logs/index/_jyr.cfs 
(deleted)
/usr/local/mailarchiva/server/webapps/ROOT/WEB-INF/logs/index/_jyp.cfs 
(deleted)


I assume these were left behind after the merge operation tried to 
delete old segments; the OS is unable to delete the files. I think it's 
because our new code never closes the IndexWriter, but rather uses the 
indexwriter.commit() method to apply the changes. Is this correct?


Jamie





RE: File Handle Leaks During Lucene 3.0.2 Merge

2010-09-30 Thread Uwe Schindler
Hi Jamie,
> >It's good to cache the reader, but, finalize would worry me too since
> >you have no control over when GC gets around to calling it... you risk
> >tying up resources for longer than necessary.
> 
> I did it this way, as I didn't want to overcomplicate the code by
> introducing mechanisms to track the number of search threads using a
> shared indexreader. Admittedly, it's not a very clean solution, but in my
> case it does work. Is there a particular technique for knowing when to
> close a reader when there are multiple search threads using that reader?
> Should I keep some kind of counter and override the close method of the
> reader such that the underlying reader is only closed when everyone's
> done with it?

The easiest would be an AtomicInteger for each cached reader that gets
incremented before you start a search and decremented on finishing the
search. You can safely close the reader when the counter reaches 0.
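A minimal sketch of that idea, using a plain java.io.Closeable in place of the cached IndexReader (the class and method names here are illustrative, not Lucene API):

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;

// The count starts at 1 for the cache's own reference; each search
// thread acquires before searching and releases when done. The resource
// is closed exactly once, by whoever drops the count to zero.
public class RefCounted {
    private final Closeable resource;
    private final AtomicInteger refCount = new AtomicInteger(1);

    public RefCounted(Closeable resource) {
        this.resource = resource;
    }

    public void acquire() {
        refCount.incrementAndGet();
    }

    public void release() throws IOException {
        if (refCount.decrementAndGet() == 0) {
            resource.close();   // last reference gone: safe to close
        }
    }
}
```

When the cache swaps in a reopened reader, it calls release() on the old wrapper to drop its own reference; searches still in flight keep the old reader open until they finish.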

Uwe





Merge policy, optimization for small frequently changing indexes.

2010-09-30 Thread Naveen Kumar
Hi
I have a very large number (say 3 million) of frequently changing small
indexes. About 90% of these indexes contain around 50 documents, while a
few (2-3%) have about 100,000 documents each (these being the more
frequently used indexes).
Each index belongs to a signed-in user, and thus can see unpredictable
changes over short periods of time, or can be stagnant for a long time.
What kind of indexing policy (mergeFactor, maxMergeDocs) would be optimal
for this kind of index? Is optimizing such an index needed, or wise? If
yes, what would be a good way to optimize (please note the number of
indexes is very large)?

Any suggestions would be very helpful.

-- 
Thanks
Naveen Kumar


How Does Fuzzy Query Work ??

2010-09-30 Thread ahmed algohary
Hi all,

I wonder how Lucene's FuzzyQuery works, as it seems to take much longer
than a normal query. Does it generate all the possible terms and search
for them?

--
Ahmed Elgohary


how to get the first term from index?

2010-09-30 Thread Sahin Buyrukbilen
Hi all,

I need to get the first term in my index and iterate over the terms from
there. Can anybody help me?

Best.


Re: how to get the first term from index?

2010-09-30 Thread Anshum
Hi Sahin,
In case you intend to get an enumerator over the terms in an index, you could
use the IndexReader.terms() call to get the enumerator over terms and just
iterate.

http://lucene.apache.org/java/3_0_1/api/core/org/apache/lucene/index/IndexReader.html#terms()
Hope this is what you intended!
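In the 3.0.x API that iteration looks roughly like this (a sketch; `reader` is assumed to be an already-open IndexReader):

```java
// Enumerate every term in the index in order, starting from the first one.
TermEnum terms = reader.terms();
try {
    while (terms.next()) {
        Term t = terms.term();
        System.out.println(t.field() + ":" + t.text()
                + " (docFreq=" + terms.docFreq() + ")");
    }
} finally {
    terms.close();   // TermEnum holds index resources; close it explicitly
}
```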

--
Anshum Gupta
http://ai-cafe.blogspot.com


On Thu, Sep 30, 2010 at 11:54 PM, Sahin Buyrukbilen <
sahin.buyrukbi...@gmail.com> wrote:

> Hi all,
>
> I need to get the first term in my index and iterate it. Can anybody help
> me?
>
> Best.
>


Re: how to get the first term from index?

2010-09-30 Thread Sahin Buyrukbilen
Thank you Anshum, it seems to be working, I need to play with it.



On Thu, Sep 30, 2010 at 2:34 PM, Anshum  wrote:

> Hi Sahin,
> Incase you intend to get an enumerator on the terms in an index, you could
> use the following call [indexreader.terms()] from IndexReader to get the
> enumerator on terms and just iterate.
>
>
> http://lucene.apache.org/java/3_0_1/api/core/org/apache/lucene/index/IndexReader.html#terms()
> Hope this is what you intended!
>
> --
> Anshum Gupta
> http://ai-cafe.blogspot.com
>
>
> On Thu, Sep 30, 2010 at 11:54 PM, Sahin Buyrukbilen <
> sahin.buyrukbi...@gmail.com> wrote:
>
> > Hi all,
> >
> > I need to get the first term in my index and iterate it. Can anybody help
> > me?
> >
> > Best.
> >
>


Re: How Does Fuzzy Query Work ??

2010-09-30 Thread Robert Muir
On Thu, Sep 30, 2010 at 8:41 AM, ahmed algohary wrote:

> Hi all,
>
> I wonder how lucene FuzzyQuery works as it seems to take much longer time
> than a normal query. Does it generate all the possible terms and search for
> them ??
>
>
In current versions of lucene it is documented to be slow: "Warning: this
query is not very scalable with its default prefix length of 0 - in this
case, *every* term will be enumerated and cause an edit score calculation."
http://lucene.apache.org/java/3_0_2/api/core/org/apache/lucene/search/FuzzyQuery.html

If you want it to be faster, use lucene trunk, which uses a different, more
sophisticated algorithm:
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.16.652
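The per-term work behind that warning is essentially a Levenshtein edit-distance computation; with a prefix length of 0 it runs against every term in the dictionary. A plain-Java sketch of that computation (not Lucene's actual implementation, which also prunes and normalizes the distance into a similarity score):

```java
// Classic two-row dynamic-programming Levenshtein distance: O(|a| * |b|)
// time per term, which is why enumerating every term gets expensive.
public class EditDistance {
    public static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;  // edits from the empty prefix
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = (a.charAt(i - 1) == b.charAt(j - 1)) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1,   // insertion
                                            prev[j] + 1),      // deletion
                                   prev[j - 1] + cost);        // substitution
            }
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.length()];
    }
}
```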

-- 
Robert Muir
rcm...@gmail.com


RE: Problem searching in the same sentence

2010-09-30 Thread Sirish Vadala

I have tried the below code:

Field field = new Field(fieldName, validFieldValue,
(store) ? Field.Store.YES : Field.Store.NO,
(tokenize) ? Field.Index.ANALYZED : Field.Index.NOT_ANALYZED,
Field.TermVector.WITH_POSITIONS_OFFSETS);

However, I still have the same problem: it doesn't return the highlight
snippets. Any other hints would be highly appreciated.

Thanks.
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Problem-searching-in-the-same-sentence-tp1501269p168.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.




Re: File Handle Leaks During Lucene 3.0.2 Merge

2010-09-30 Thread Michael McCandless
You can also use the IndexReader's incRef/decRef methods.

Mike

On Thu, Sep 30, 2010 at 6:12 AM, Uwe Schindler  wrote:
> Hi Jamie,
>> >It's good to cache the reader, but, finalize would worry me too since
>> >you have no control over when GC gets around to calling it... you risk
>> >tying up resources for longer than necessary.
>>
>> I did it this way, as I didn't want to overcomplicate the code by
>> introducing mechanisms to track the number of search threads using a
>> shared indexreader. Admittedly, it's not a very clean solution, but in my
>> case it does work. Is there a particular technique for knowing when to
>> close a reader when there are multiple search threads using that reader?
>> Should I keep some kind of counter and override the close method of the
>> reader such that the underlying reader is only closed when everyone's
>> done with it?
>
> The easiest would be an AtomicInteger for each cached reader that gets
> incremented before you start a search and decremented on finishing the
> search. You can safely close the reader when the counter reaches 0.
>
> Uwe
>
>



Re: File Handle Leaks During Lucene 3.0.2 Merge

2010-09-30 Thread Michael McCandless
On Thu, Sep 30, 2010 at 5:59 AM, Jamie  wrote:
>  Hi Michael / Uwe
>
>>It's good to cache the reader, but, finalize would worry me too since
>>you have no control over when GC gets around to calling it... you risk
>>tying up resources for longer than necessary.
>
> I did it this way, as I didn't want to overcomplicate the code by
> introducing mechanisms to track the number of search threads using a shared
> indexreader. Admittedly, it's not a very clean solution, but in my case it
> does work. Is there a particular technique for knowing when to close a
> reader when there are multiple search threads using that reader? Should I
> keep some kind of counter and override the close method of the reader such
> that the underlying reader is only closed when everyone's done with it?

See Uwe's response (or SearcherManager).

>> IndexWriter has a reader pool, internally, where it holds open
>> SegmentReaders for the still-live segments in the index.  This is used
>> by IndexReader.reopen to share open SegmentReaders.
>>
>> But the open files should correspond only to segments still "live" in
>> the index.  After segments are merged away, these readers are dropped.
>>  Is this what you are seeing?
>>
> I don't fully understand your explanation/question. When I run lsof, I am
> seeing the following:
>
> /usr/local/mailarchiva/server/webapps/ROOT/WEB-INF/logs/index/_jyr.cfs
> (deleted)
> /usr/local/mailarchiva/server/webapps/ROOT/WEB-INF/logs/index/_jyp.cfs
> (deleted)
>
> I assume these were left behind after the merge operation tried to delete
> old segments; the OS is unable to delete the files. I think it's because our
> new code never closes the indexwriter, but rather uses the
> indexwriter.commit() method to apply the changes. Is this correct?

Ahh I see they are deleted but held open... hmmm.

Though this is also what you'd see if there were still a reader open.
Are you certain all readers were closed (finalized) when you ran lsof?

Mike




Looking for advice on using Lucene to semantically compare two documents

2010-09-30 Thread Jonathan Ciampi
Advice on comparing two documents.
 
Summary
This project is not a search engine but a semantic comparison between two 
documents.  The purpose of this application is to assist users in modifying the 
text in a document to improve its relevancy rank against another document.  For 
example, the user would want to compare Document A to Document B to identify 
the text in Document A that is relevant to Document B.  Then, the user would 
want the ability to identify the text to modify to improve the relevancy 
rating.
 
 
Description: 
 
Both documents are XML with tags identifying the keywords or blocks of text in 
the document.  

 
Sample Structure
 
Document A
DocumentA
This is keyword 1 
Keywords can be any length 
Some keywords will match Document B 
Some keywords will not match 
Keywords can contain text, numbers, and symbols 
 
Document B
DocumentB
This is Document B keyword 1 
Document B serves as the basis or standard for comparing 
Document A will be modified by the user to match the keywords in 
Document B 

Document A and Document B will always be compared to each 
other 

This application is to help users add text, numbers and symbols to 
improve their relevancy ranking 

 
We believe we need to use Lucene to do semantic searches to determine 
relevance.  Our preferred output would be to show the user the words from each 
document with their relevancy.  To remove excessive data, the output would show 
all keywords from Document B, and only those keywords from Document A that 
have a relevancy ranking.
 
Sample Output (columns: Document B | Document A | Relevancy)

This is Document B keyword 1 | This is keyword 1 | .25
This is Document B keyword 1 | Keywords can be any length | .25
This is Document B keyword 1 | Some keywords will match Document B | .25
This is Document B keyword 1 | Some keywords will not match | .25
This is Document B keyword 1 | Keywords can contain text, numbers, and symbols | .25
Document B serves as the basis or standard for comparing | Some keywords will match Document B | .5
Document A will be modified by the user to match the keywords in Document B | This is keyword 1 | .1
Document A will be modified by the user to match the keywords in Document B | Keywords can be any length | .1
Document A will be modified by the user to match the keywords in Document B | Some keywords will not match | .1
Document A will be modified by the user to match the keywords in Document B | Some keywords will match Document B | .75
Document A will be modified by the user to match the keywords in Document B | Keywords can contain text, numbers, and symbols | .1
This application is to help users add text, numbers and symbols to improve their relevancy ranking | Keywords can contain text, numbers, and symbols | .9

  
  Jon Ciampi
Mobile (415) 990-3151


Re: File Handle Leaks During Lucene 3.0.2 Merge

2010-09-30 Thread Jamie

 Hi Mike

 I managed to get hold of a copy of your book through Safari Books. 
Quite an impressive online reading system they have there! I integrated 
your SearcherManager class into our code, but I am still seeing file 
handles marked deleted in the index directory. I am running the 
following command on Linux:


sudo watch -n 0 "lsof | grep /var/index | grep deleted | wc -l"

Every 0.1s: lsof | grep /var/index | grep deleted |...  Fri Oct  1 
09:37:36 2010


54

The deleted file handles fluctuate up and down. 54 -> 102 -> 64 -> 32, 
etc. They seem stable though. Is this to be expected when using NRT search?


 I am pretty certain that all Searchers are released at the end of 
every search. I double checked it at least twenty times.


Jamie



On 2010/09/30 11:56 PM, Michael McCandless wrote:

On Thu, Sep 30, 2010 at 5:59 AM, Jamie  wrote:

  Hi Michael / Uwe


It's good to cache the reader, but, finalize would worry me too since
you have no control over when GC gets around to calling it... you risk
tying up resources for longer than necessary.

I did it this way, as I didn't want to overcomplicate the code by
introducing mechanisms to track the number of search threads using a shared
indexreader. Admittedly, it's not a very clean solution, but in my case it
does work. Is there a particular technique for knowing when to close a
reader when there are multiple search threads using that reader? Should I
keep some kind of counter and override the close method of the reader such
that the underlying reader is only closed when everyone's done with it?

See Uwe's response (or SearcherManager).


IndexWriter has a reader pool, internally, where it holds open
SegmentReaders for the still-live segments in the index.  This is used
by IndexReader.reopen to share open SegmentReaders.

But the open files should correspond only to segments still "live" in
the index.  After segments are merged away, these readers are dropped.
  Is this what you are seeing?


I don't fully understand your explanation/question. When I run lsof, I am
seeing the following:

/usr/local/mailarchiva/server/webapps/ROOT/WEB-INF/logs/index/_jyr.cfs
(deleted)
/usr/local/mailarchiva/server/webapps/ROOT/WEB-INF/logs/index/_jyp.cfs
(deleted)

I assume these were left behind after the merge operation tried to delete
old segments; the OS is unable to delete the files. I think it's because our
new code never closes the indexwriter, but rather uses the
indexwriter.commit() method to apply the changes. Is this correct?

Ahh I see they are deleted but held open... hmmm.

Though this is also what you'd see if there were still a reader open.
Are you certain all readers were closed (finalized) when you ran lsof?

Mike




RE: File Handle Leaks During Lucene 3.0.2 Merge

2010-09-30 Thread Uwe Schindler
Hi Jamie,

Yes, it's expected, for the reasons described above (segments are still
referenced by the open IndexReaders, but the files were already deleted by
IndexWriter). The number of open but already-deleted files should stay
approximately stable.

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


> -Original Message-
> From: Jamie [mailto:ja...@stimulussoft.com]
> Sent: Friday, October 01, 2010 7:41 AM
> To: java-user@lucene.apache.org
> Subject: Re: File Handle Leaks During Lucene 3.0.2 Merge
> 
>   Hi Mike
> 
>   I managed to get hold of a copy of your book through Safari Books.
> Quite an impressive online reading system they have there! I integrated
> your SearcherManager class into our code, but I am still seeing file
> handles marked deleted in the index directory. I am running the following
> command on Linux:
> 
> sudo watch -n 0 "lsof | grep /var/index | grep deleted | wc -l"
> 
> Every 0.1s: lsof | grep /var/index | grep deleted |...  Fri Oct  1
> 09:37:36 2010
> 
> 54
> 
> The deleted file handles fluctuate up and down. 54 -> 102 -> 64 -> 32, etc.
> They seem stable though. Is this to be expected when using NRT search?
> 
>   I am pretty certain that all Searchers are released at the end of every
> search. I double checked it at least twenty times.
> 
> Jamie
> 
> 
> 
> On 2010/09/30 11:56 PM, Michael McCandless wrote:
> > On Thu, Sep 30, 2010 at 5:59 AM, Jamie  wrote:
> >>   Hi Michael / Uwe
> >>
> >>> It's good to cache the reader, but, finalize would worry me too
> >>> since you have no control over when GC gets around to calling it...
> >>> you risk tying up resources for longer than necessary.
> >> I did it this way, as I didn't want to over complicate the code by
> >> introducing mechanisms to track the number of search threads using a
> >> shared indexreader. Admittedly, its not a very clean solution but in
> >> my case it does work. Is there a particular technique for knowing
> >> when to a close a reader when there are multiple search threads using
> >> that reader? Should I keep some kind of counter and override the
> >> close method of the reader such that the underlying reader is only
> >> closed when everyone's done with it?
> > See Uwe's response (or SearcherManager).
> >
> >>> IndexWriter has a reader pool, internally, where it holds open
> >>> SegmentReaders for the still-live segments in the index.  This is
> >>> used by IndexReader.reopen to share open SegmentReaders.
> >>>
> >>> But the open files should correspond only to segments still "live"
> >>> in the index.  After segments are merged away, these readers are
> >>> dropped.
> >>>   Is this what you are seeing?
> >>>
> >> I dont fully understand your explanation/question. When I run lsof, I
> >> am seeing the following:
> >>
> >> /usr/local/mailarchiva/server/webapps/ROOT/WEB-INF/logs/index/_jyr.cfs
> >> (deleted)
> >> /usr/local/mailarchiva/server/webapps/ROOT/WEB-INF/logs/index/_jyp.cfs
> >> (deleted)
> >>
> >> I assume these are left by the OS after the merge operation tried to
> >> delete old segments. The OS is unable to delete the files. I think
> >> its because our new code never closes the indexwriter, but rather
> >> uses the
> >> indexwriter.commit() method to apply the changes. Is this correct?
> > Ahh I see they are deleted but held open... hmmm.
> >
> > Though this is also what you'd see if there were still a reader open.
> > Are you certain all readers were closed (finalized) when you ran lsof?
> >
> > Mike
> >
> >
> 
> 