Re: lucene 2.9 migration issues -- MultiReader vs IndexReader document ids

Ryan McKinley Thu, 23 Apr 2009 16:38:10 -0700

thanks!


On Apr 23, 2009, at 6:32 PM, Mark Miller wrote:

Looks like it's my fault. Auto resolution was moved up to IndexSearcher in Lucene, and it looks like SolrIndexSearcher is not tickling it first. I'll take a look.
- Mark

Ryan McKinley wrote:
Ok, not totally resolved....
Things work fine when I have my custom Filter alone or with other Filters, however if I add a query string to the mix it breaks with an IllegalStateException:
java.lang.IllegalStateException: Auto should be resolved before now
   at org.apache.lucene.search.FieldSortedHitQueue$1.createValue(FieldSortedHitQueue.java:216)
   at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:73)
   at org.apache.lucene.search.FieldSortedHitQueue.getCachedComparator(FieldSortedHitQueue.java:168)
   at org.apache.lucene.search.FieldSortedHitQueue.<init>(FieldSortedHitQueue.java:58)
   at org.apache.solr.search.SolrIndexSearcher.getDocListAndSetNC(SolrIndexSearcher.java:1214)
   at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:924)
   at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:345)
   at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:171)
   at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
   at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1330)
   at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
This is for a query:
 /solr/flat/select?q=SGID&bounds=-144 2.4 -72 67 WITHIN
bounds=XXX triggers my custom filter to kick in.
Any thoughts on where to look? This error is new since upgrading the Lucene libs (in recent Solr).
Thanks!
ryan


On Apr 20, 2009, at 7:14 PM, Ryan McKinley wrote:
thanks!
everything got better when I removed my logic to cache based on the index modification time.
On Apr 20, 2009, at 4:51 PM, Yonik Seeley wrote:
On Mon, Apr 20, 2009 at 4:17 PM, Ryan McKinley <ryan...@gmail.com> wrote:
This issue started on java-user, but I am moving it to solr-dev:
http://www.lucidimagination.com/search/document/46481456bc214ccb/bitset_filter_arrayindexoutofboundsexception
I am using Solr trunk and building an RTree from stored document fields.
This process worked fine until a recent change in 2.9 that uses a different
document id strategy than I was used to.

In that thread, Yonik suggested:
- pop back to the top level from the sub-reader, if you really need a single
set
- if a set-per-reader will work, then cache per segment (better for
incremental updates anyway)

I'm not quite sure what you mean by a "set-per-reader".
I meant RTree per reader (per segment reader).
Previously I was
building a single RTree and using it until the last modified time had
changed. This avoided building an index anytime a new reader was opened and
the index had not changed.
I *think* that our use of re-open will return the same IndexReader
instance if nothing has changed... so you shouldn't have to try and do
that yourself.
I'm fine building a new RTree for each reader if
that is required.
If that works just as well, it will put you in a better position for
faster incremental updates... new RTrees will be built only for those
segments that have changed.
Is there any existing code that deals with this situation?
To cache an RTree per reader, you could use the same logic as
FieldCache uses... a weak map with the reader as the key.
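A minimal sketch of that weak-map idea, not the FieldCache code itself.
PerReaderCache and its builder function are hypothetical names; R stands
in for the segment reader type and V for the RTree. It assumes the reader
does not override equals/hashCode, so lookups are effectively by instance.

```java
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Function;

/** Hypothetical helper: caches one value (e.g. an RTree) per reader instance. */
final class PerReaderCache<R, V> {
    // WeakHashMap holds its keys weakly, so an entry can be collected once
    // nothing else references the reader. The cached value must not hold a
    // strong reference back to its reader, or the entry will never go away.
    private final Map<R, V> cache = new WeakHashMap<R, V>();
    private final Function<R, V> builder;

    PerReaderCache(Function<R, V> builder) {
        this.builder = builder;
    }

    /** Returns the cached value for this reader, building it on first use. */
    synchronized V get(R reader) {
        V value = cache.get(reader);
        if (value == null) {
            value = builder.apply(reader);
            cache.put(reader, value);
        }
        return value;
    }
}
```

With a cache like this, a reopen that keeps unchanged segment readers
would only trigger a build for the segments that are actually new.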
If a single top-level RTree that covers the entire index works better
for you, then you can cache the RTree based on the top level multi
reader and translate the ids... that was my fix for ExternalFileField.
See FileFloatSource.getValues() for the implementation.
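The id translation itself is just offset arithmetic. The sketch below is
not the FileFloatSource code; DocBaseTable is a hypothetical helper, and
it assumes you can obtain each segment's maxDoc (which counts deleted
docs) in the same order the top-level reader uses.

```java
/** Hypothetical helper: maps segment-local doc ids to top-level ids and back. */
final class DocBaseTable {
    // starts[i] is the first top-level id of segment i;
    // starts[n] is the total number of doc ids across all segments.
    private final int[] starts;

    DocBaseTable(int[] segmentMaxDocs) {
        starts = new int[segmentMaxDocs.length + 1];
        for (int i = 0; i < segmentMaxDocs.length; i++) {
            starts[i + 1] = starts[i] + segmentMaxDocs[i];
        }
    }

    int totalDocs() {
        return starts[starts.length - 1];
    }

    /** Segment-local id -> top-level id. */
    int toTopLevel(int segment, int localDoc) {
        int size = starts[segment + 1] - starts[segment];
        if (localDoc < 0 || localDoc >= size) {
            throw new IllegalArgumentException("doc " + localDoc + " is outside segment " + segment);
        }
        return starts[segment] + localDoc;
    }

    /** Index of the segment that contains the given top-level id. */
    int segmentOf(int topLevelDoc) {
        if (topLevelDoc < 0 || topLevelDoc >= totalDocs()) {
            throw new IllegalArgumentException("doc " + topLevelDoc + " is out of range");
        }
        // A linear scan is enough for a sketch; empty segments are skipped naturally.
        for (int i = 0; i < starts.length - 1; i++) {
            if (topLevelDoc < starts[i + 1]) {
                return i;
            }
        }
        throw new AssertionError("unreachable");
    }

    /** Top-level id -> id local to its segment. */
    int toLocal(int topLevelDoc) {
        return topLevelDoc - starts[segmentOf(topLevelDoc)];
    }
}
```

For example, with segments whose maxDoc values are 5, 3 and 4, local doc 2
in the second segment corresponds to top-level doc 7.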
- - - -

Yonik also suggested:
Relatively new in 2.9, you can pass null to enumerate over all non-deleted
docs:
TermDocs td = reader.termDocs(null);
It would probably be a lot faster to iterate over indexed values though.
If I iterate over indexed values (from the FieldCache, I presume) then how do I
get access to the document id?
IndexReader.terms(Term t) returns a TermEnum that can iterate over
terms, starting at t.
IndexReader.termDocs(Term t or TermEnum te) will give you the list of
documents that match a term.


-Yonik
--
- Mark

http://www.lucidimagination.com
