Re: CachedSearcher
Halácsy Péter wrote:

> I made an IndexReaderCache class from the code you have sent (the code in demo/Search.jhtml). But this causes an exception:
>
>     IndexSearcher searcher = new IndexSearcher(cache.getReader("/data/index"));
>     searcher.close();
>     searcher = new IndexSearcher(cache.getReader("/data/index"));
>     searcher.search(aQuery);
>
> When I call the close method, the searcher closes the IndexReader.

You don't need to close the searcher. If you don't close it, you won't have this problem. Finalizers will close the open files.

Doug

--
To unsubscribe, e-mail: mailto:[EMAIL PROTECTED]
For additional commands, e-mail: mailto:[EMAIL PROTECTED]
Re: CachedSearcher
On Tuesday, July 16, 2002, at 01:23 PM, Scott Ganyo wrote:

> Point taken. Indeed, these were general recommendations that may or may not have a strong impact on Lucene's specific use of finalization. My only specific performance claim is that there will be a negative impact of some degree using finalizers. Whether that impact is noticeable or not will probably depend upon a number of factors. So I will avoid making any further judgements on the impact of finalization in Lucene on the performance until I have proof.
>
> Benchmarks aside, my point on the file handles is something that hit us square between the eyes. Before we started caching and explicitly closing our Searchers we would regularly run out of file handles because of Lucene. This was despite increasing our allocated file handles to ludicrous levels in the OS.
>
> I would recommend that, in general, Java developers would be well advised to explicitly release external resources when done with them rather than allowing finalization to take care of it.
>
> Scott

Ahh, I take back my last comment about renaming close() to dispose(). If the IndexReader simply had a bunch of in-memory data, then dispose() would be appropriate. If it holds onto resources outside of the VM (typical examples are Window objects, file streams, network sockets, etc.), then close() should be one of those mandatory methods to be invoked when done with it.

In general one should *not* /rely/ on the GC to clean up external resources. That's an important lesson, repeated in various articles, books, and testimonials, that I've learned over years of Java development. This might clear up the issues some people have been having with not having enough file handles available on their OS.

~ David Smiley
RE: CachedSearcher
-----Original Message-----
From: Doug Cutting [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, July 16, 2002 1:00 AM
To: Lucene Users List
Subject: Re: CachedSearcher

> Why is this more complicated than the code in demo/Search.jhtml (included below)? FSDirectory closes files as they're GC'd, so you don't have to explicitly close the IndexReaders or Searchers.

I'll check this code, but I think it could hang with a lot of open IndexReaders.

http://developer.java.sun.com/developer/TechTips/2000/tt0124.html

(That is, if a lot of searchers are requested and a writer is constantly modifying the index.)

peter
Re: CachedSearcher
Halácsy Péter wrote:

> Hello!
>
> A lot of people requested code to cache open Searcher objects until the index is modified. The first version of this was written by Scott Ganyo and submitted as IndexAccessControl to the list. Now I've decoupled the logic that is needed to manage searchers.
>
> The usage is very simple:
>
>     IndexSearcherCache isc = new IndexSearcherCache(new File("/path/to/the/index"));
>     for (int i = 0; i < 100; i++) {
>         Searcher searcher = isc.getSearcher();
>         // search here
>         searcher.close();
>     }
>
> Only one Searcher will be opened here if no other thread is writing the index; if the index was modified, getSearcher() will close the old one and create a new one.
>
> Unfortunately, to compile and use this code one has to modify the Lucene source:
>
> 1. Change all package-protected abstract methods to public in Searcher.java.
>
>    Before:
>
>     /** Frees resources associated with this Searcher. */
>     abstract public void close() throws IOException;
>     abstract int docFreq(Term term) throws IOException;
>     abstract int maxDoc() throws IOException;
>     abstract TopDocs search(Query query, Filter filter, int n) throws IOException;
>
>    After:
>
>     /** Frees resources associated with this Searcher. */
>     public abstract void close() throws IOException;
>     public abstract int docFreq(Term term) throws IOException;
>     public abstract int maxDoc() throws IOException;
>     public abstract TopDocs search(Query query, Filter filter, int n) throws IOException;
>
> 2. Change package-protected TopDocs to public (in TopDocs.java):
>
>     final class TopDocs {  -->  public final class TopDocs {
>
> Or you can use the modified files I've attached.
>
> I hope this code is helpful. The main idea is to have an interface, SearcherSource, similar to DataSource in javax.sql. A SearcherSource is responsible for creating searcher objects. One implementation is SearcherCache, which encapsulates the logic of caching a searcher. IndexSearcherCache, as you might figure out, can cache IndexSearcher objects. Someone could implement a MultiSearcherCache class that manages... (recreates the searcher if one of the searchers needs reopening).
>
> I create the IndexSearcherCache in my init method and pass the object as a SearcherSource to the working methods. In the destroy process I call the release() method. This way I can later change the implementation of the cache as long as it implements SearcherSource.
>
> peter
>
> ps: Of course you can change the code and the class/method/package/.. names. Unfortunately a lot of System.out.println debugging code is used, but it is very good for understanding the behaviour.
>
> Attachments:
>   CachedSearcher.zip (application/x-zip-compressed)
>   TopDocs.java (application/octet-stream)
>   Searcher.java (application/octet-stream)

I am new here, and I am sorry if this question has been asked before.

Why are there so many final and package-protected methods? I want to change the way TermQuery does scoring. Ideally, I would like to have subclasses of TermQuery and TermScorer, and place them in my OWN package. Currently, I have to put these two in the lucene package, and I have to copy almost every line of the TermQuery class into my new query class except the line that returns the Scorer.

Note, this may be a bad example, but I still want to know if we can make Lucene more extendable from outside in the future.
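The SearcherSource idea described in this message can be sketched without touching Lucene at all. What follows is only a hypothetical illustration, not the attached CachedSearcher code: the names Source and ReopeningCache and the opener/version parameters are invented here, and a plain java.io.Closeable stands in for a Searcher. It assumes the caller can supply some "version stamp" (for an index, something like a last-modified time) that changes whenever the underlying data is modified.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Hypothetical stand-in for the SearcherSource interface described above. */
interface Source<T> {
    T get();          // hand out the current (possibly reopened) resource
    void release();   // close whatever is cached; call once at shutdown
}

/** Caches one resource and reopens it whenever the version stamp changes. */
final class ReopeningCache<T extends Closeable> implements Source<T> {
    private final Supplier<T> opener;    // assumption: opens a fresh resource
    private final LongSupplier version;  // assumption: e.g. a last-modified stamp
    private T current;
    private long currentVersion;

    ReopeningCache(Supplier<T> opener, LongSupplier version) {
        this.opener = opener;
        this.version = version;
    }

    @Override
    public synchronized T get() {
        long v = version.getAsLong();
        if (current == null || v != currentVersion) {
            closeCurrent();              // the stale instance is closed here
            current = opener.get();
            currentVersion = v;
        }
        return current;
    }

    @Override
    public synchronized void release() {
        closeCurrent();
    }

    private void closeCurrent() {
        if (current == null) return;
        try {
            current.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            current = null;
        }
    }
}
```

In this sketch callers never close what get() returns; only the cache does. One known weakness is that a stale instance is closed as soon as a newer version is seen, even if another thread is still using it, so a real implementation would need some way to defer that close.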
RE: CachedSearcher
I'd like to see the finalize() methods removed from Lucene entirely. In a system with heavy load and lots of gc, using finalize() causes problems. To wit:

1) I was at a talk at JavaOne last year where the gc performance experts from Sun (the engineers actually writing the HotSpot gc) were giving performance advice. They specifically stated that finalize() should be avoided if at all possible because the following steps have to happen for finalized objects:

   a) register the object when created
   b) notice the object when it becomes unreachable
   c) finalize the object
   d) notice the object when it becomes unreachable (again)
   e) reclaim the object

   This leads to the following effects in the vm:

   a) allocation is slower
   b) heap is bigger
   c) gc pauses are longer

   The Sun engineers recommended that if you really do need an automatic clean up process, Weak references are *much* more efficient and should be used in preference to finalize().

2) External resources (i.e. file handles) are not released until the reader is closed. And, as many have found, Lucene eats file handles for breakfast, lunch, and dinner.

Scott

-----Original Message-----
From: Halácsy Péter [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, July 16, 2002 12:43 AM
To: Lucene Users List
Subject: RE: CachedSearcher

> -----Original Message-----
> From: Doug Cutting [mailto:[EMAIL PROTECTED]]
> Sent: Tuesday, July 16, 2002 1:00 AM
> To: Lucene Users List
> Subject: Re: CachedSearcher
>
>> Why is this more complicated than the code in demo/Search.jhtml (included below)? FSDirectory closes files as they're GC'd, so you don't have to explicitly close the IndexReaders or Searchers.
>
> I'll check this code, but I think it could hang with a lot of open IndexReaders.
>
> http://developer.java.sun.com/developer/TechTips/2000/tt0124.html
>
> (That is, if a lot of searchers are requested and a writer is constantly modifying the index.)
>
> peter
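The reference-based cleanup recommended in point 1 can be sketched with java.lang.ref.Cleaner. Note the assumptions: Cleaner is a Java 9+ API built on phantom references and did not exist when this thread was written, and TrackedHandle and its counter are hypothetical names that merely simulate an open file handle. The sketch shows explicit close() as the normal path, with the Cleaner as a safety net that runs the same action if the object is garbage collected without being closed.

```java
import java.lang.ref.Cleaner;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical resource that counts "open handles" instead of opening real files. */
final class TrackedHandle implements AutoCloseable {
    private static final Cleaner CLEANER = Cleaner.create();
    static final AtomicInteger OPEN_HANDLES = new AtomicInteger();

    /** Cleanup action; it must not reference the TrackedHandle itself. */
    private static final class Release implements Runnable {
        @Override
        public void run() {
            OPEN_HANDLES.decrementAndGet();   // stands in for closing a file
        }
    }

    private final Cleaner.Cleanable cleanable;

    TrackedHandle() {
        OPEN_HANDLES.incrementAndGet();       // stands in for opening a file
        cleanable = CLEANER.register(this, new Release());
    }

    /** Explicit release; the Cleaner guarantees the action runs at most once. */
    @Override
    public void close() {
        cleanable.clean();
    }
}
```

Because the class implements AutoCloseable, it can be used in a try-with-resources statement, which releases the handle deterministically; the Cleaner only matters when a caller forgets to do so, and even then the timing depends on the garbage collector.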
Re: CachedSearcher
Scott Ganyo wrote:

> I'd like to see the finalize() methods removed from Lucene entirely. In a system with heavy load and lots of gc, using finalize() causes problems.
> [ ... ]
> External resources (i.e. file handles) are not released until the reader is closed. And, as many have found, Lucene eats file handles for breakfast, lunch, and dinner.

Lucene does open and close lots of files relative to many other applications, but the number of files opened is still many orders of magnitude less than the number of other objects allocated. I would be very surprised if finalizers for the hundreds of files that Lucene might open in a session would have any measurable impact on garbage collector performance given the millions of other objects that the garbage collector might process in that session.

As usual, one should not make performance claims without performing benchmarks. It would be a simple matter to comment out the finalize() methods, recompile and compare indexing and search speed. If the improvement is significant, then we can consider removing finalize methods.

Doug
RE: CachedSearcher
Point taken. Indeed, these were general recommendations that may or may not have a strong impact on Lucene's specific use of finalization. My only specific performance claim is that there will be a negative impact of some degree using finalizers. Whether that impact is noticeable or not will probably depend upon a number of factors. So I will avoid making any further judgements on the impact of finalization in Lucene on the performance until I have proof.

Benchmarks aside, my point on the file handles is something that hit us square between the eyes. Before we started caching and explicitly closing our Searchers we would regularly run out of file handles because of Lucene. This was despite increasing our allocated file handles to ludicrous levels in the OS.

I would recommend that, in general, Java developers would be well advised to explicitly release external resources when done with them rather than allowing finalization to take care of it.

Scott

-----Original Message-----
From: Doug Cutting [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, July 16, 2002 11:56 AM
To: Lucene Users List
Subject: Re: CachedSearcher

> Scott Ganyo wrote:
>
>> I'd like to see the finalize() methods removed from Lucene entirely. In a system with heavy load and lots of gc, using finalize() causes problems.
>> [ ... ]
>> External resources (i.e. file handles) are not released until the reader is closed. And, as many have found, Lucene eats file handles for breakfast, lunch, and dinner.
>
> Lucene does open and close lots of files relative to many other applications, but the number of files opened is still many orders of magnitude less than the number of other objects allocated. I would be very surprised if finalizers for the hundreds of files that Lucene might open in a session would have any measurable impact on garbage collector performance given the millions of other objects that the garbage collector might process in that session.
>
> As usual, one should not make performance claims without performing benchmarks. It would be a simple matter to comment out the finalize() methods, recompile and compare indexing and search speed. If the improvement is significant, then we can consider removing finalize methods.
>
> Doug
Re: CachedSearcher
Hang Li wrote:

> Why there are so many final and package-protected methods?

The package private stuff was motivated by Javadoc. When I wrote Lucene I wanted the Javadoc to make it easy to use. Thus I did not want the Javadoc cluttered with lots of methods that 99% of users did not need to know about. So a problem is how to distinguish methods that are meant for end users from those that may only rarely be needed by an expert developer. Perhaps we could establish a Javadoc convention for those methods that most users don't need to know about. For example, their documentation could begin "Expert:" or something. What do folks think of that?

Also, many package private methods really are internal methods that are not designed to be called outside of the implementation. Trying to override them probably won't work. When stuff that is tricky to use is documented and easy to use, folks will use it, it won't work, and they'll complain, wasting everyone's time. So we must be careful about what is made public. I would rather err on the side of exposing less than more--folks who know what they're doing can always add code into a lucene package. It's not ideal, but it works.

Some 'final' declarations made a performance difference when javac did inlining, but no longer do, and should probably be removed now. Some still keep people from subclassing things that are not designed to be subclassed. So these should also be considered on a case-by-case basis.

> I want to change the way TermQuery doing scores.

Could you please make a proposal to the lucene-dev list of which methods and classes should be made public or protected or non-final, and what documentation should be added?

Thanks,

Doug
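To make the proposed Javadoc convention concrete, here is a small hypothetical sketch. DocumentedSearcher and its method signatures are invented for illustration and are not Lucene's actual Searcher API; the only point is that a method intended for advanced callers is public but its documentation opens with "Expert:", so ordinary users can skip it when reading the generated docs.

```java
/** Hypothetical searcher API used only to illustrate the documentation convention. */
abstract class DocumentedSearcher {
    /** Returns the identifiers of the top {@code n} documents matching the query. */
    public abstract int[] search(String query, int n);

    /**
     * Expert: returns the number of documents containing {@code term}.
     * Most applications never need to call this; it is public so that
     * custom scoring code outside this package can use it.
     */
    public abstract int docFreq(String term);
}
```

With such a convention, a subclass living in the user's own package could override or call docFreq without copying code into the library's package, which is the use case raised in the question above.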
Re: CachedSearcher
On Monday, July 15, 2002, at 10:19 PM, Kelvin Tan wrote:

>> FSDirectory closes files as they're GC'd, so you don't have to explicitly close the IndexReaders or Searchers.
>>
>> Doug
>
> hmmm...is this documented somewhere? I go through quite a bit of trouble just to close Searchers (because Hits become invalid when the Searcher is closed).
>
> If the object has a close() method with public modifier, isn't it a common idiom that client code needs to invoke close() explicitly?

I absolutely agree. If letting it get GC'ed is fine, then just about any other name, like dispose(), might be better.

> If there's no real need to call close, maybe it can be changed to protected?

I wouldn't go that far.

~ David Smiley
RE: CachedSearcher
-----Original Message-----
From: Doug Cutting [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, July 16, 2002 6:44 PM
To: Lucene Users List
Subject: Re: CachedSearcher

> Kelvin Tan wrote:
>
>> If the object has a close() method with public modifier, isn't it a common idiom that client code needs to invoke close() explicitly? If there's no real need to call close, maybe it can be changed to protected?
>
> Yes, that is a common idiom. In the case of Lucene's FSDirectory, it's still a good idea to close it when you know it's no longer needed, to minimize the number of open files, but sometimes it is difficult to know when it is no longer needed. Finalizers are intended for precisely this purpose.
>
> But you're right, probably this should be better documented.
>
> Doug

Doug!

I made an IndexReaderCache class from the code you have sent (the code in demo/Search.jhtml). But this causes an exception:

    IndexSearcher searcher = new IndexSearcher(cache.getReader("/data/index"));
    searcher.close();
    searcher = new IndexSearcher(cache.getReader("/data/index"));
    searcher.search(aQuery);

When I call the close method, the searcher closes the IndexReader, but the cache (or your getReader method) returns the closed reader one more time. That's why I made a subclass of Searcher that can be closed if the user doesn't want to use it any more.

You wrote:

> sometimes it is difficult to know when it is no longer needed

I think: use a cache and you don't have to know when it is no longer needed! ;)

peter

Attachment: IndexReaderCache.java
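One common way to avoid handing out an already-closed shared reader is reference counting: every user acquires the resource before use and releases it afterwards, and the underlying close() happens only when the last reference goes away. The sketch below is a hypothetical illustration of that technique, not the attached IndexReaderCache or the Searcher subclass mentioned in this message; SharedResource and its methods are invented names, and a java.io.Closeable stands in for the IndexReader.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical wrapper that closes a shared resource only after its last user is done. */
final class SharedResource<T extends Closeable> {
    private final T resource;
    private int refs = 1;                 // the cache itself holds one reference

    SharedResource(T resource) {
        this.resource = resource;
    }

    /** Called by each user before working with the resource. */
    synchronized T acquire() {
        if (refs == 0) throw new IllegalStateException("already closed");
        refs++;
        return resource;
    }

    /** Called by each user when done, and once by the cache on eviction. */
    synchronized void release() {
        if (refs == 0) throw new IllegalStateException("released too often");
        if (--refs == 0) {
            try {
                resource.close();         // last reference gone: really close
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    synchronized boolean isClosed() {
        return refs == 0;
    }
}
```

Under this scheme a user calling release() plays the role of searcher.close() in the failing snippet, but the shared reader stays open as long as the cache or any other user still holds a reference.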
Re: CachedSearcher
Halácsy Péter wrote:

> A lot of people requested code to cache open Searcher objects until the index is modified. The first version of this was written by Scott Ganyo and submitted as IndexAccessControl to the list. Now I've decoupled the logic that is needed to manage searchers.
>
> Unfortunately, to compile and use this code one has to modify the Lucene source:

Why is this more complicated than the code in demo/Search.jhtml (included below)? FSDirectory closes files as they're GC'd, so you don't have to explicitly close the IndexReaders or Searchers.

Doug

    /** Keep a cache of open IndexReader's, so that an index does not
     * have to be opened for each query.  The cache re-opens an index when
     * it has changed so that additions and deletions are visible ASAP.
     */
    static Hashtable indexCache = new Hashtable();    // name->CachedIndex

    class CachedIndex {                               // a cache entry
      IndexReader reader;                             // an open reader
      long modified;                                  // reader's mod. date

      CachedIndex(String name) throws IOException {
        modified = IndexReader.lastModified(name);    // get mod. date
        reader = IndexReader.open(name);              // open reader
      }
    }

    IndexReader getReader(String name) throws ServletException {
      CachedIndex index =                             // look in cache
        (CachedIndex)indexCache.get(name);

      try {
        if (index != null &&                          // check up-to-date
            (index.modified == IndexReader.lastModified(name)))
          return index.reader;                        // cache hit
        else {
          index = new CachedIndex(name);              // cache miss
        }
      } catch (IOException e) {
        StringWriter writer = new StringWriter();
        PrintWriter pw = new PrintWriter(writer);
        throw new ServletException("Could not open index " + name + ": " +
                                   e.getClass().getName() + "--" +
                                   e.getMessage());
      }

      indexCache.put(name, index);                    // add to cache
      return index.reader;
    }
Re: CachedSearcher
> FSDirectory closes files as they're GC'd, so you don't have to explicitly close the IndexReaders or Searchers.
>
> Doug

hmmm...is this documented somewhere? I go through quite a bit of trouble just to close Searchers (because Hits become invalid when the Searcher is closed).

If the object has a close() method with public modifier, isn't it a common idiom that client code needs to invoke close() explicitly? If there's no real need to call close, maybe it can be changed to protected?

Regards,
Kelvin