Collating results from multiple indexes

2010-01-25 Thread Aaron McKee
Is there any somewhat convenient way to collate/integrate fields from separate indices during result writing, if the indices use the same unique keys? Basically, some sort of cross-index JOIN? As a bit of background, I have a rather heavyweight dataset of every US business (~25m records, an

ArrayIndexOutOfBoundsException during indexing

2009-10-19 Thread Aaron McKee
I was wondering if anyone might have any insight on the following problem. I'm using the latest Solr code from SVN and indexing around 17m XML records via DIH. With perfect reproducibility, the following exception is thrown on the same aggregate file (#236, and each XML file has ~50k records),

De-basing / re-basing docIDs, or how to effectively pass calculated values from a Scorer or Filter up to (Solr's) QueryComponent.process

2009-10-06 Thread Aaron McKee
(Posted here, per Yonik's suggestion) In the code I'm working with, I generate a cache of calculated values as a by-product within a Filter.getDocIdSet implementation (and within a Query-ized version of the filter and its Scorer method). These values are keyed off the IndexReader's docID

Re: Disabling tf (term frequency) during indexing and/or scoring

2009-09-18 Thread Aaron McKee
:50 PM, Aaron McKee ucbmc...@gmail.com wrote: Hello, Let me preface this by admitting that I'm still fairly new to Lucene and Solr, so I apologize if any of this sounds naive and I'm open to thinking about my problem differently. I'm currently responsible for a rather large dataset of business

Re: Disabling tf (term frequency) during indexing and/or scoring

2009-09-18 Thread Aaron McKee
touch a rather large number of code points. Best regards, Aaron Yonik Seeley wrote: On Fri, Sep 18, 2009 at 9:38 AM, Aaron McKee ucbmc...@gmail.com wrote: I suppose I'm curious why the omitTfAndPositions option conflates two apparently independent features. This relates to the index

Re: Disabling tf (term frequency) during indexing and/or scoring

2009-09-18 Thread Aaron McKee
on IDF. Of course, I'm sure there are others who probably wouldn't need or care about IDF, either, but still want phrase matching. Cheers, Aaron Yonik Seeley wrote: On Fri, Sep 18, 2009 at 11:05 AM, Aaron McKee ucbmc...@gmail.com wrote: I wonder, though, if it could also make sense to support

Disabling tf (term frequency) during indexing and/or scoring

2009-09-14 Thread Aaron McKee
Hello, Let me preface this by admitting that I'm still fairly new to Lucene and Solr, so I apologize if any of this sounds naive and I'm open to thinking about my problem differently. I'm currently responsible for a rather large dataset of business records that I'm trying to build a