Re: QueryFilter vs CachingWrapperFilter vs RangeQuery
If you run the same query again, the IndexSearcher will go all the way
to the index again - no caching. Some caching will be done by your file
system, possibly, but that's it. Lucene is fast, so don't optimize
early.

Otis

--- Ben Rooney <[EMAIL PROTECTED]> wrote:

> thanks chris,
>
> you are correct that i'm not sure if i need the caching ability. it is
> more to understand right now so that if we do need to implement it, i am
> able to.
>
> the reason for the caching is that we will have listing pages for
> certain content types. for example a listing page of articles. this
> listing will be generated against lucene engine using a basic query.
> the page will also have the ability to filter the articles based on date
> range as one example. so caching those results could be beneficial.
>
> however, we will also potentially want to cache the basic query so that
> subsequent queries will hit a cache. when new content is published or
> content is removed from the site, the caches will need to be invalidated
> so new results are created.
>
> for the basic query, is there any caching mechanism built into the
> SearchIndexer or do we need to build our own caching mechanism?
>
> thanks
> ben
>
> On Tue, 2004-07-12 at 12:29 -0800, Chris Hostetter wrote:
> > : > executes the search, i would keep a static reference to SearchIndexer
> > : > and then when i want to invalidate the cache, set it to null or create
> >
> > : design of your system. But, yes, you do need to keep a reference to it
> > : for the cache to work properly. If you use a new IndexSearcher
> > : instance (I'm simplifying here, you could have an IndexReader instance
> > : yourself too, but I'm ignoring that possibility) then the filtering
> > : process occurs for each search rather than using the cache.
> >
> > Assuming you have a finite number of Filters, and assuming those Filters
> > are expensive enough to be worth it...
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
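Since the searcher does not keep results between calls, the "build our own caching mechanism" option from the question could look roughly like the sketch below. It is an illustration only, not part of Lucene: the class name QueryResultCache, the use of the query string as the cache key, and the generic result type R (whatever the application extracts from a search) are all assumptions, and it uses a newer Java than the one this thread was written against.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical application-level cache for basic query results; not a Lucene class. */
class QueryResultCache<R> {
    // Results keyed by the query string the listing page was built from.
    private final Map<String, R> cache = new ConcurrentHashMap<>();
    // Assumed hook that runs the real search against the index.
    private final Function<String, R> search;

    QueryResultCache(Function<String, R> search) {
        this.search = search;
    }

    /** Returns the cached result, running the search only on a cache miss. */
    R get(String queryString) {
        return cache.computeIfAbsent(queryString, search);
    }

    /** Call when content is published or removed, so stale results are dropped. */
    void invalidateAll() {
        cache.clear();
    }
}
```

Calling invalidateAll() from the publishing code is the counterpart of "set it to null or create a new instance" discussed elsewhere in the thread. Whether such a cache is worth having is exactly the measurement Otis suggests doing first.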
Re: QueryFilter vs CachingWrapperFilter vs RangeQuery
thanks chris,

you are correct that i'm not sure if i need the caching ability. it is
more to understand right now so that if we do need to implement it, i am
able to.

the reason for the caching is that we will have listing pages for
certain content types. for example a listing page of articles. this
listing will be generated against the lucene engine using a basic query.
the page will also have the ability to filter the articles based on date
range as one example. so caching those results could be beneficial.

however, we will also potentially want to cache the basic query so that
subsequent queries will hit a cache. when new content is published or
content is removed from the site, the caches will need to be invalidated
so new results are created.

for the basic query, is there any caching mechanism built into the
SearchIndexer or do we need to build our own caching mechanism?

thanks
ben

On Tue, 2004-07-12 at 12:29 -0800, Chris Hostetter wrote:
> : > executes the search, i would keep a static reference to SearchIndexer
> : > and then when i want to invalidate the cache, set it to null or create
>
> : design of your system. But, yes, you do need to keep a reference to it
> : for the cache to work properly. If you use a new IndexSearcher
> : instance (I'm simplifying here, you could have an IndexReader instance
> : yourself too, but I'm ignoring that possibility) then the filtering
> : process occurs for each search rather than using the cache.
>
> Assuming you have a finite number of Filters, and assuming those Filters
> are expensive enough to be worth it...
>
> Another approach you can take to "share" the cache among multiple
> IndexReaders is to explicitly call the bits method on your filter(s) once,
> and then cache the resulting BitSet anywhere you want (ie: serialize it to
> disk if you so choose), and then implement a "BitsFilter" class that you
> can construct directly from a BitSet regardless of the IndexReader.
Re: QueryFilter vs CachingWrapperFilter vs RangeQuery
erik,

thanks for the reply

i get the filter now and understand how the caching works. however the
caching is only on the filtering level which means i can cache results
that are filtered. but if i do a basic search against the index and
want to cache that, do i need to create my own caching mechanism or does
the SearchIndexer cache the results already? if it caches them already,
then to clear the cache, is it again removing any references to the
SearchIndexer instance?

thanks again,
ben

On Tue, 2004-07-12 at 15:18 -0500, Erik Hatcher wrote:
> On Dec 7, 2004, at 3:06 PM, Ben Rooney wrote:
> > i'm trying to understand the difference/effects between QueryFilter vs
> > CachingWrapperFilter and when you would use one vs the other and how
> > they work exactly.
>
> QueryFilter caches the results (bit set of documents) of a query by
> IndexReader.
>
> CachingWrapperFilter does not actually do any filtering of its own, but
> merely wraps the results of another non-caching filter, such as
> DateFilter. CachingWrapperFilter was added to disconnect caching from
> filtering. QueryFilter is the exception as it came first and already
> does caching. If you're using QueryFilter, there is no need to concern
> yourself with CachingWrapperFilter.
>
> > also, when exactly will the cache be cleared. looking at the source
> > code, it appears when the IndexReader is released it would be cleared.
> > does this mean i should keep a reference to the SearchIndexer until i
> > want the results to be cleared? for example, in a class file that
> > executes the search, i would keep a static reference to SearchIndexer
> > and then when i want to invalidate the cache, set it to null or create
> > a new instance of it?
>
> How you keep a reference to the IndexSearcher instance is up to the
> design of your system. But, yes, you do need to keep a reference to it
> for the cache to work properly.
Re: QueryFilter vs CachingWrapperFilter vs RangeQuery
: > executes the search, i would keep a static reference to SearchIndexer
: > and then when i want to invalidate the cache, set it to null or create

: design of your system. But, yes, you do need to keep a reference to it
: for the cache to work properly. If you use a new IndexSearcher
: instance (I'm simplifying here, you could have an IndexReader instance
: yourself too, but I'm ignoring that possibility) then the filtering
: process occurs for each search rather than using the cache.

Assuming you have a finite number of Filters, and assuming those Filters
are expensive enough to be worth it...

Another approach you can take to "share" the cache among multiple
IndexReaders is to explicitly call the bits method on your filter(s) once,
and then cache the resulting BitSet anywhere you want (ie: serialize it to
disk if you so choose), and then implement a "BitsFilter" class that you
can construct directly from a BitSet regardless of the IndexReader. The
downside of this approach is that it will *ONLY* work if you are certain
that the index is never being modified. If any documents get added, or
the index gets re-optimized, you must regenerate all of the BitSets.

(That's why the CachingWrapperFilter's cache is keyed off of the
IndexReader ... as long as you're re-using the same IndexReader, it knows
that the cached BitSet must still be valid, because an IndexReader
always sees the same index as when it was opened, even if another
thread/process modifies it.)

   class BitsFilter extends Filter {
      BitSet bits;
      public BitsFilter(BitSet bits) {
         this.bits = bits;
      }
      public BitSet bits(IndexReader r) {
         // clone() returns Object, so cast; return a copy so that
         // callers cannot alter the cached bits
         return (BitSet) bits.clone();
      }
   }

-Hoss
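The "serialize it to disk" step mentioned above can be sketched with plain Java serialization, since java.util.BitSet implements Serializable. This is a minimal illustration, not code from the thread: the class name BitSetStore and its method names are made up, and it round-trips through a byte array; writing those bytes to a file is left to the application.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.BitSet;

/** Hypothetical helper that turns a filter's BitSet into bytes and back. */
class BitSetStore {

    /** Serializes the BitSet; the bytes can then be written wherever the application likes. */
    static byte[] toBytes(BitSet bits) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(bits);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Restores a BitSet previously produced by toBytes. */
    static BitSet fromBytes(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (BitSet) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The same caveat applies as for BitsFilter: a restored BitSet is only meaningful while the index it was computed from has not changed.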
Re: QueryFilter vs CachingWrapperFilter vs RangeQuery
On Dec 7, 2004, at 3:06 PM, Ben Rooney wrote:
> i'm trying to understand the difference/effects between QueryFilter vs
> CachingWrapperFilter and when you would use one vs the other and how
> they work exactly.

QueryFilter caches the results (bit set of documents) of a query by
IndexReader.

CachingWrapperFilter does not actually do any filtering of its own, but
merely wraps the results of another non-caching filter, such as
DateFilter. CachingWrapperFilter was added to disconnect caching from
filtering. QueryFilter is the exception as it came first and already
does caching. If you're using QueryFilter, there is no need to concern
yourself with CachingWrapperFilter.

> also, when exactly will the cache be cleared. looking at the source
> code, it appears when the IndexReader is released it would be cleared.
> does this mean i should keep a reference to the SearchIndexer until i
> want the results to be cleared? for example, in a class file that
> executes the search, i would keep a static reference to SearchIndexer
> and then when i want to invalidate the cache, set it to null or create
> a new instance of it?

How you keep a reference to the IndexSearcher instance is up to the
design of your system. But, yes, you do need to keep a reference to it
for the cache to work properly. If you use a new IndexSearcher
instance (I'm simplifying here, you could have an IndexReader instance
yourself too, but I'm ignoring that possibility) then the filtering
process occurs for each search rather than using the cache.

Erik
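The reason a fresh searcher defeats the cache can be shown with a small stand-alone sketch of a cache keyed by reader. This is not Lucene's source: the class name PerReaderCache is invented, the type parameter K stands in for IndexReader, and the weak-keyed map is an assumption chosen to match the observation above that the cache goes away when the reader is released.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Function;

/** Hypothetical per-reader cache; K plays the role of IndexReader. */
class PerReaderCache<K> {
    // Weak keys: once nothing else references a reader, its entry can be collected.
    private final Map<K, BitSet> cache = new WeakHashMap<>();
    // Assumed hook that performs the expensive filtering for one reader.
    private final Function<K, BitSet> compute;

    PerReaderCache(Function<K, BitSet> compute) {
        this.compute = compute;
    }

    /** Returns the cached bits for this reader, computing them on first use. */
    synchronized BitSet bits(K reader) {
        return cache.computeIfAbsent(reader, compute);
    }
}
```

With one long-lived reader the filtering runs once; handing in a new reader object on every search means every lookup is a miss, which is the behaviour described above.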
QueryFilter vs CachingWrapperFilter vs RangeQuery
hello,

hope someone can help explain things to me. i've been searching for
some time and i have not been able to find anything to answer my
questions.

i'm trying to understand the difference/effects between QueryFilter vs
CachingWrapperFilter and when you would use one vs the other and how
they work exactly.

also, when exactly will the cache be cleared. looking at the source
code, it appears when the IndexReader is released it would be cleared.
does this mean i should keep a reference to the SearchIndexer until i
want the results to be cleared? for example, in a class file that
executes the search, i would keep a static reference to SearchIndexer
and then when i want to invalidate the cache, set it to null or create a
new instance of it?

on top of this, using the RangeQuery object in a search does not seem to
be prudent as the time is almost 4 times that of using a filter. i
basically can dig on this as when doing a query, lucene needs to do
scoring for all the documents that match whereas using a filter it
ignores scoring.

to test them out, i created an index against a 2 document repository
where the files in the repository are simply properties files. in the
properties files, i set the publishDate property so that all documents
are of year 2004.

my test runs 4 queries. the first test is a basic one that returns all
documents in the index that contain the word 'document'. the second
test adds the query from the first test to a BooleanQuery along with a
RangeQuery for the year 2004. the third test uses the query from the
first test along with a QueryFilter constructed using the RangeQuery.
the final test is the same as the third test but the QueryFilter is
wrapped in a CachingWrapperFilter class. each test runs a search
against the index 100 times with the same configuration.
the output from my test is as follows:

2004-12-07 20:30:03,888 DEBUG (SearchManager.java: main:138) - 2 total matching documents
2004-12-07 20:30:04,602 INFO (SearchManager.java: main:141) - query 1 - all docs - total time (ms): 768
2004-12-07 20:30:04,653 DEBUG (SearchManager.java: main:146) - 2 total matching documents
2004-12-07 20:30:06,598 INFO (SearchManager.java: main:149) - query 2 - 2004 range query - no cache - total time (ms): 1996
2004-12-07 20:30:06,614 DEBUG (SearchManager.java: main:155) - 2 total matching documents
2004-12-07 20:30:07,223 INFO (SearchManager.java: main:158) - query 3 - 2004 docs filter - no cache - total time (ms): 623
2004-12-07 20:30:07,230 DEBUG (SearchManager.java: main:164) - 2 total matching documents
2004-12-07 20:30:07,838 INFO (SearchManager.java: main:167) - query 4 - 2004 docs filter - cached - total time (ms): 613

as can be seen, there is not much difference between the third and
fourth queries and hence my confusion with the two types of filters.
looking at the source code, there is not much difference between them
either.
the following is the test source code:

package com.blastradius.search;

import java.io.File;
import java.util.Date;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.CachingWrapperFilter;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.QueryFilter;
import org.apache.lucene.search.RangeQuery;
import org.apache.lucene.search.Searcher;

import com.blastradius.search.parsers.PropertiesParser;

/**
 *
 * @author brooney
 */
public class SearchManager {

    public final static String INDEX_DIR = "index";
    public final static String ROOT_DIR = "webroot";
    public final static File rootDir = new File(SearchManager.ROOT_DIR);

    private final static Log logger = LogFactory.getLog(SearchManager.class);

    public static void main(String[] args) {
        Date start = null;
        Date end = null;
        Hits hits = null;

        try {
            Searcher searcher = new IndexSearcher(SearchManager.INDEX_DIR);
            Analyzer analyzer = new StandardAnalyzer();
            Query query = QueryParser.parse("document", "contents", analyzer);
            Query rangeQuery = new R