Re: CachedSearcher

2002-07-17 Thread Doug Cutting

Halácsy Péter wrote:
 I made an IndexReaderCache class from the code you have sent (the code in 
demo/Search.jhtml).
 But this causes exception:
 IndexSearcher searcher = new IndexSearcher(cache.getReader("/data/index"));
 searcher.close();
 
 
 searcher = new IndexSearcher(cache.getReader("/data/index"));
 searcher.search(aQuery);
 
 when I call the close method the searcher closes the indexreader

You don't need to close the searcher.  If you don't close it, you won't 
have this problem.  Finalizers will close the open files.

Doug



--
To unsubscribe, e-mail:   mailto:[EMAIL PROTECTED]
For additional commands, e-mail: mailto:[EMAIL PROTECTED]




Re: CachedSearcher

2002-07-17 Thread David Smiley


On Tuesday, July 16, 2002, at 01:23  PM, Scott Ganyo wrote:

 Point taken.  Indeed, these were general recommendations that may/may not
 have a strong impact on Lucene's specific use of finalization.  My only
 specific performance claim is that there will be a negative impact of some
 degree using finalizers.  Whether that impact is noticeable or not will
 probably depend upon a number of factors.  So I will avoid making any
 further judgements on the impact of finalization in Lucene on the
 performance until I have proof.

 Benchmarks aside, my point on the file handles is something that hit us
 square between the eyes.  Before we started caching and explicitly closing
 our Searchers we would regularly run out of file handles because of Lucene.
 This was despite increasing our allocated file handles to ludicrous levels
 in the OS.  I would recommend that, in general, Java developers would be
 well advised to explicitly release external resources when done with them
 rather than allowing finalization to take care of it.

 Scott


Ahh, I take back my last comment about renaming close() to 
dispose().  If the IndexReader simply had a bunch of in-memory data, 
then dispose() would be appropriate.  If it holds onto resources 
outside of the VM (typical examples are Window objects, file streams, 
network sockets, etc.), then close() should be one of those mandatory 
methods to be invoked when done with it.  In general one should *not* 
/rely/ on the GC to clean up external resources.  That's an important 
lesson, repeated in various articles, books, and testimonials, that I've 
learned over years of Java development.

This might clear up the issues some people have been having with not 
having enough file handles available on their OS.
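
The advice above can be sketched with plain JDK code. This is a minimal illustration, not Lucene code: `TrackedResource` is a hypothetical stand-in for anything that owns a file handle (an IndexReader, a file stream), and its `OPEN` counter stands in for the operating system's handle table. The try/finally releases the handle as soon as the work is done instead of waiting for the garbage collector.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for an object that holds an external resource such as a file handle. */
class TrackedResource {
    /** Number of instances currently open; stands in for the OS file-handle count. */
    static final AtomicInteger OPEN = new AtomicInteger();

    private boolean closed = false;

    TrackedResource() {
        OPEN.incrementAndGet();          // "acquire" the handle
    }

    /** Releases the handle; safe to call more than once. */
    void close() {
        if (!closed) {
            closed = true;
            OPEN.decrementAndGet();
        }
    }
}

class ExplicitCloseDemo {
    /** Opens a resource, runs the work, and always releases the resource, even if the work throws. */
    static void useAndClose(Runnable work) {
        TrackedResource r = new TrackedResource();
        try {
            work.run();
        } finally {
            r.close();                   // released now, not whenever the GC gets to it
        }
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1000; i++) {
            useAndClose(() -> { });
        }
        System.out.println("open handles: " + TrackedResource.OPEN.get());
    }
}
```

With this pattern the number of open handles is bounded by the number of callers currently inside `useAndClose`, whatever the collector happens to be doing.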

~ David Smiley






RE: CachedSearcher

2002-07-16 Thread Halácsy Péter



 -----Original Message-----
 From: Doug Cutting [mailto:[EMAIL PROTECTED]]
 Sent: Tuesday, July 16, 2002 1:00 AM
 To: Lucene Users List
 Subject: Re: CachedSearcher
 
 
 Why is this more complicated than the code in demo/Search.jhtml 
 (included below)?  FSDirectory closes files as they're GC'd, so you 
 don't have to explicitly close the IndexReaders or Searchers.

I'll check this code, but I think it could hang with a lot of open IndexReaders.
http://developer.java.sun.com/developer/TechTips/2000/tt0124.html

(That is, if a lot of searchers are requested while a writer is constantly modifying the index.)

peter





Re: CachedSearcher

2002-07-16 Thread Hang Li

Halácsy Péter wrote:

 Hello!
 A lot of people have requested code to cache open Searcher objects for as long as the 
index is not modified. The first version of this was written by Scott Ganyo and submitted 
to the list as IndexAccessControl.

 Now I've decoupled the logic that is needed to manage the searcher.

 The usage is very simple:
 IndexSearcherCache isc = new IndexSearcherCache(new File("/path/to/the/index"));
 for (int i = 0; i < 100; i++) {
   Searcher searcher = isc.getSearcher();
   // search here
   searcher.close();
 }

 Only one Searcher will be opened here if no other thread is writing to the index; if 
the index has been modified, getSearcher() will close the old one and create a new one.

 Unfortunately, to compile and use this code one has to modify the Lucene source:

 1. Change all package-protected abstract methods to public in Searcher.java. Before:

   /** Frees resources associated with this Searcher. */
   abstract public void close() throws IOException;

   abstract int docFreq(Term term) throws IOException;
   abstract int maxDoc() throws IOException;
   abstract TopDocs search(Query query, Filter filter, int n)
       throws IOException;

 After:

   /** Frees resources associated with this Searcher. */
   public abstract void close() throws IOException;

   public abstract int docFreq(Term term) throws IOException;
   public abstract int maxDoc() throws IOException;
   public abstract TopDocs search(Query query, Filter filter, int n)
       throws IOException;

 2. Change the package-protected TopDocs to public (in TopDocs.java):
 final class TopDocs {  ->  public final class TopDocs {

 Or you can use the modified files I've attached.

 I hope this code is helpful.

 The main idea is to have an interface, SearcherSource, similar to DataSource in 
javax.sql. A SearcherSource is responsible for creating searcher objects. One 
implementation is SearcherCache, which encapsulates the logic of caching a searcher. 
IndexSearcherCache - as you might guess - can cache IndexSearcher objects. 
Someone could implement a MultiSearcherCache class that manages... (recreates the 
searcher if one of the searchers needs reopening).

 I create the IndexSearcherCache in my init method and pass the object as a 
SearcherSource to the working methods. In the destroy process I call the release() 
method. This way I can later change the implementation of the cache, as long as it 
implements SearcherSource.

 peter

 ps: of course you can change the code and the class/method/package/... names.
 Unfortunately there is a lot of System.out.println debugging code, but it is very 
helpful for understanding the behaviour.

 Attachments:
   CachedSearcher.zip  (application/x-zip-compressed, base64)
   TopDocs.java        (application/octet-stream, base64)
   Searcher.java       (application/octet-stream, base64)

I am new here, and I am sorry if this question has been asked before. Why are there so 
many final and package-protected methods?  I want to change the way TermQuery does 
scoring. Ideally, I would like to have subclasses of TermQuery and TermScorer, and 
place them in my OWN package. Currently, I have to put these two in a lucene package, 
and I have to copy almost every line of the TermQuery class into my new query class 
except the line that returns the Scorer. Note, this may be a bad example, but I still 
want to know if we can make Lucene more extensible from outside in the future.






RE: CachedSearcher

2002-07-16 Thread Scott Ganyo

I'd like to see the finalize() methods removed from Lucene entirely.  In a
system with heavy load and lots of gc, using finalize() causes problems.  To
wit:

1) I was at a talk at JavaOne last year where the gc performance experts
from Sun (the engineers actually writing the HotSpot gc) were giving
performance advice.  They specifically stated that finalize() should be
avoided if at all possible because the following steps have to happen for
finalized objects:
  a) register the object when created
  b) notice the object when it becomes unreachable
  c) finalize the object
  d) notice the object when it becomes unreachable (again)
  e) reclaim the object

This leads to the following effects in the vm:
  a) allocation is slower
  b) heap is bigger
  c) gc pauses are longer

The Sun engineers recommended that if you really do need an automatic clean
up process, that Weak references are *much* more efficient and should be
used in preference to finalize().

2) External resources (i.e. file handles) are not released until the reader
is closed.  And, as many have found, Lucene eats file handles for breakfast,
lunch, and dinner.
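
For readers who want to see the reference-based clean-up from point 1 in code, here is a minimal sketch. It rests on an assumption: it uses `java.lang.ref.Cleaner`, which arrived in Java 9, long after this thread, and is built on phantom references rather than the weak references the Sun engineers mentioned. `Handle` and `State` are hypothetical names. Explicit `close()` remains the primary path; the Cleaner is only a safety net for handles that are forgotten.

```java
import java.lang.ref.Cleaner;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical resource holder that uses a Cleaner as a safety net instead of finalize(). */
class Handle implements AutoCloseable {
    /** Counts how many times the release action has run. */
    static final AtomicInteger RELEASED = new AtomicInteger();
    private static final Cleaner CLEANER = Cleaner.create();

    /** Cleanup state; it must not reference the Handle, or the Handle could never become unreachable. */
    private static final class State implements Runnable {
        @Override
        public void run() {
            RELEASED.incrementAndGet();   // a real implementation would close the file here
        }
    }

    private final Cleaner.Cleanable cleanable;

    Handle() {
        this.cleanable = CLEANER.register(this, new State());
    }

    /** Deterministic release; the cleaning action runs at most once. */
    @Override
    public void close() {
        cleanable.clean();
    }
}

class CleanerDemo {
    public static void main(String[] args) {
        Handle h = new Handle();
        h.close();
        h.close();   // second call is a no-op
        System.out.println("released: " + Handle.RELEASED.get());
    }
}
```

If a `Handle` is dropped without `close()`, the same action runs some time after the collector notices it is unreachable, so the file-handle concern in point 2 still argues for closing explicitly.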

Scott

 -----Original Message-----
 From: Halácsy Péter [mailto:[EMAIL PROTECTED]]
 Sent: Tuesday, July 16, 2002 12:43 AM
 To: Lucene Users List
 Subject: RE: CachedSearcher
 
 
 
 
  -----Original Message-----
  From: Doug Cutting [mailto:[EMAIL PROTECTED]]
  Sent: Tuesday, July 16, 2002 1:00 AM
  To: Lucene Users List
  Subject: Re: CachedSearcher
  
  
  Why is this more complicated than the code in demo/Search.jhtml 
  (included below)?  FSDirectory closes files as they're GC'd, so you 
  don't have to explicitly close the IndexReaders or Searchers.
 
 I'll check this code, but I think it could hang with a lot of open IndexReaders.
 http://developer.java.sun.com/developer/TechTips/2000/tt0124.html
 
 (That is, if a lot of searchers are requested while a writer is constantly 
 modifying the index.)
 
 peter
 
 



Re: CachedSearcher

2002-07-16 Thread Doug Cutting

Scott Ganyo wrote:
 I'd like to see the finalize() methods removed from Lucene entirely.  In a
 system with heavy load and lots of gc, using finalize() causes problems.
  [ ... ]
  External resources (i.e. file handles) are not released until the reader
 is closed.  And, as many have found, Lucene eats file handles for breakfast,
 lunch, and dinner.

Lucene does open and close lots of files relative to many other applications, 
but the number of files opened is still many orders of magnitude less than the 
number of other objects allocated.  I would be very surprised if finalizers for 
the hundreds of files that Lucene might open in a session would have any 
measurable impact on garbage collector performance given the millions of other 
objects that the garbage collector might process in that session.

As usual, one should not make performance claims without performing benchmarks. 
  It would be a simple matter to comment out the finalize() methods, recompile 
and compare indexing and search speed.  If the improvement is significant, then 
we can consider removing finalize methods.

Doug






RE: CachedSearcher

2002-07-16 Thread Scott Ganyo

Point taken.  Indeed, these were general recommendations that may/may not
have a strong impact on Lucene's specific use of finalization.  My only
specific performance claim is that there will be a negative impact of some
degree using finalizers.  Whether that impact is noticeable or not will
probably depend upon a number of factors.  So I will avoid making any
further judgements on the impact of finalization in Lucene on the
performance until I have proof.

Benchmarks aside, my point on the file handles is something that hit us
square between the eyes.  Before we started caching and explicitly closing
our Searchers we would regularly run out of file handles because of Lucene.
This was despite increasing our allocated file handles to ludicrous levels
in the OS.  I would recommend that, in general, Java developers would be
well advised to explicitly release external resources when done with them
rather than allowing finalization to take care of it.

Scott

 -----Original Message-----
 From: Doug Cutting [mailto:[EMAIL PROTECTED]]
 Sent: Tuesday, July 16, 2002 11:56 AM
 To: Lucene Users List
 Subject: Re: CachedSearcher
 
 
 Scott Ganyo wrote:
  I'd like to see the finalize() methods removed from Lucene entirely.  In a
  system with heavy load and lots of gc, using finalize() causes problems.
   [ ... ]
   External resources (i.e. file handles) are not released until the reader
  is closed.  And, as many have found, Lucene eats file handles for breakfast,
  lunch, and dinner.
 
 Lucene does open and close lots of files relative to many other applications, 
 but the number of files opened is still many orders of magnitude less than the 
 number of other objects allocated.  I would be very surprised if finalizers for 
 the hundreds of files that Lucene might open in a session would have any 
 measurable impact on garbage collector performance given the millions of other 
 objects that the garbage collector might process in that session.
 
 As usual, one should not make performance claims without performing benchmarks. 
   It would be a simple matter to comment out the finalize() methods, recompile 
 and compare indexing and search speed.  If the improvement is significant, then 
 we can consider removing finalize methods.
 
 Doug
 
 



Re: CachedSearcher

2002-07-16 Thread Doug Cutting

Hang Li wrote:
 Why are there so many final and package-protected methods?

The package private stuff was motivated by Javadoc.  When I wrote Lucene I 
wanted the Javadoc to make it easy to use.  Thus I did not want the Javadoc 
cluttered with lots of methods that 99% of users did not need to know about.

So a problem is how to distinguish methods that are meant for end users from 
those that only may rarely be needed by an expert developer.  Perhaps we could 
establish a Javadoc convention for those methods that most users don't need to 
know about.  For example, their documentation could begin with "Expert:" or 
something.  What do folks think of that?
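
To make the proposal concrete, here is a small sketch of how such a convention might look in source. `ExampleSearcher` and both of its methods are hypothetical and are not part of Lucene; the only point is that the Javadoc of the rarely needed method opens with "Expert:" so it stands out in the generated documentation.

```java
/** Hypothetical class illustrating the proposed Javadoc convention; not part of Lucene. */
class ExampleSearcher {
    private final int[] docFreqs = {3, 1, 4};

    /** Returns the number of indexed terms. Intended for all users. */
    public int termCount() {
        return docFreqs.length;
    }

    /**
     * Expert: returns the raw document frequency for the term with the given ordinal.
     * Most applications never need to call this directly.
     */
    public int docFreq(int termOrdinal) {
        return docFreqs[termOrdinal];
    }
}
```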

Also, many package private methods really are internal methods that are not 
designed to be called outside of the implementation.  Trying to override them 
probably won't work.  When stuff that is tricky to use is documented and easy 
to use, folks will use it, it won't work, and they'll complain, wasting 
everyone's time.  So we must be careful about what is made public.  I would 
rather err on the side of exposing less than more--folks who know what they're 
doing can always add code into a lucene package.  It's not ideal, but it works.

Some 'final' declarations made a performance difference when javac did 
inlining, but no longer do, and should probably be removed now.  Some still 
keep people from subclassing things that are not designed to be subclassed.  So 
these should also be considered on a case-by-case basis.

  I want to change the way TermQuery does scoring.

Could you please make a proposal to the lucene-dev list of which methods and 
classes should be made public or protected or non-final, and what documentation 
should be added?

Thanks,

Doug






Re: CachedSearcher

2002-07-16 Thread David Smiley


On Monday, July 15, 2002, at 10:19  PM, Kelvin Tan wrote:

 FSDirectory closes files as they're GC'd, so you
 don't have to explicitly close the IndexReaders or Searchers.

 Doug


 hmmm...is this documented somewhere? I go through quite a bit of trouble
 just to close Searchers (because Hits become invalid when the Searcher is
 closed).

 If the object has a close() method with public modifier, isn't it a common
 idiom that client code needs to invoke close() explicitly?

I absolutely agree.  If letting it get GC'ed is fine, then just about 
any other name, like dispose might be better.

 If there's no
 real need to call close, maybe it can be changed to protected?

I wouldn't go that far.

~ David Smiley






RE: CachedSearcher

2002-07-16 Thread Halácsy Péter



 -----Original Message-----
 From: Doug Cutting [mailto:[EMAIL PROTECTED]]
 Sent: Tuesday, July 16, 2002 6:44 PM
 To: Lucene Users List
 Subject: Re: CachedSearcher
 
 
 Kelvin Tan wrote:
  If the object has a close() method with public modifier, isn't it a common 
  idiom that client code needs to invoke close() explicitly? If there's no 
  real need to call close, maybe it can be changed to protected?
 
 Yes, that is a common idiom.  In the case of Lucene's FSDirectory, it's still a 
 good idea to close it when you know it's no longer needed, to minimize the 
 number of open files, but sometimes it is difficult to know when it is no 
 longer needed.  Finalizers are intended for precisely this purpose.  But you're 
 right, probably this should be better documented.
 
 Doug
 


Doug!
I made an IndexReaderCache class from the code you have sent (the code in 
demo/Search.jhtml).
But this causes exception:
IndexSearcher searcher = new IndexSearcher(cache.getReader("/data/index"));
searcher.close();


searcher = new IndexSearcher(cache.getReader("/data/index"));
searcher.search(aQuery);

when I call the close method the searcher closes the indexreader but the cache (or 
your getReader method) returns the closed reader one more time

that's why I made a subclass of searcher that can be closed if the user doesn't want 
to use it any more

you wrote: "sometimes it is difficult to know when it is no longer needed"

I think: use a cache and you don't have to know when it is no longer needed! ;)
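
One way to get both behaviours (callers may call close(), yet the cache never hands out a closed reader) is to count users per cached reader. What follows is a minimal sketch under stated assumptions, not the attached IndexReaderCache and not Lucene API: `SharedReader` stands in for IndexReader, and `ReaderCache`, `Handle`, and the `version` supplier (for example the index's last-modified time) are hypothetical names.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Hypothetical stand-in for an expensive-to-open object such as an IndexReader. */
class SharedReader {
    boolean closed = false;
    void close() { closed = true; }
}

/** Caches one SharedReader and re-opens it when the index version changes. */
class ReaderCache {
    private final Supplier<SharedReader> opener;   // how to open a fresh reader
    private final LongSupplier version;            // e.g. the index's last-modified time
    private Entry current;

    /** A cached reader plus the number of outstanding handles that still use it. */
    private static final class Entry {
        final SharedReader reader;
        final long version;
        int users = 0;
        boolean retired = false;
        Entry(SharedReader reader, long version) { this.reader = reader; this.version = version; }
    }

    /** What callers get; closing it gives up their claim but not necessarily the reader. */
    final class Handle {
        private final Entry entry;
        private boolean released = false;
        Handle(Entry entry) { this.entry = entry; }
        SharedReader reader() { return entry.reader; }
        void close() {
            synchronized (ReaderCache.this) {
                if (released) return;              // closing twice is harmless
                released = true;
                entry.users--;
                // Only a stale reader with no remaining users is really closed.
                if (entry.retired && entry.users == 0) entry.reader.close();
            }
        }
    }

    ReaderCache(Supplier<SharedReader> opener, LongSupplier version) {
        this.opener = opener;
        this.version = version;
    }

    /** Returns a handle on an up-to-date reader, re-opening if the version changed. */
    synchronized Handle acquire() {
        long v = version.getAsLong();
        if (current == null || current.version != v) {
            if (current != null) {
                current.retired = true;            // stale: close as soon as it is idle
                if (current.users == 0) current.reader.close();
            }
            current = new Entry(opener.get(), v);
        }
        current.users++;
        return new Handle(current);
    }
}
```

A caller would do `ReaderCache.Handle h = cache.acquire();`, search through `h.reader()`, and call `h.close()` in a finally block. A search that is still running when the index changes keeps its old reader open until it finishes.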

peter



Attachment: IndexReaderCache.java



Re: CachedSearcher

2002-07-15 Thread Doug Cutting

Halácsy Péter wrote:
 A lot of people have requested code to cache open Searcher objects for as long as the 
index is not modified. The first version of this was written by Scott Ganyo and submitted 
to the list as IndexAccessControl.
 
 Now I've decoupled the logic that is needed to manage the searcher.
 
 Unfortunately, to compile and use this code one has to modify the Lucene source: 

Why is this more complicated than the code in demo/Search.jhtml 
(included below)?  FSDirectory closes files as they're GC'd, so you 
don't have to explicitly close the IndexReaders or Searchers.

Doug

   /** Keep a cache of open IndexReaders, so that an index does not
    * have to be opened for each query.  The cache re-opens an index when
    * it has changed so that additions and deletions are visible ASAP.
    */

   static Hashtable indexCache = new Hashtable();  // name -> CachedIndex

   class CachedIndex {                             // a cache entry
     IndexReader reader;                           // an open reader
     long modified;                                // reader's mod. date

     CachedIndex(String name) throws IOException {
       modified = IndexReader.lastModified(name);  // get mod. date
       reader = IndexReader.open(name);            // open reader
     }
   }

   IndexReader getReader(String name) throws ServletException {
     CachedIndex index =                           // look in cache
       (CachedIndex)indexCache.get(name);

     try {
       if (index != null &&                        // check up-to-date
           (index.modified == IndexReader.lastModified(name)))
         return index.reader;                      // cache hit
       else {
         index = new CachedIndex(name);            // cache miss
       }
     } catch (IOException e) {
       throw new ServletException("Could not open index " + name + ": " +
                                  e.getClass().getName() + "--" +
                                  e.getMessage());
     }

     indexCache.put(name, index);                  // add to cache
     return index.reader;
   }






Re: CachedSearcher

2002-07-15 Thread Kelvin Tan

FSDirectory closes files as they're GC'd, so you
don't have to explicitly close the IndexReaders or Searchers.

Doug


hmmm...is this documented somewhere? I go through quite a bit of trouble
just to close Searchers (because Hits become invalid when the Searcher is
closed).

If the object has a close() method with public modifier, isn't it a common
idiom that client code needs to invoke close() explicitly? If there's no
real need to call close, maybe it can be changed to protected?

Regards,
Kelvin

