Thanks Mike. I'm not sure this _should_ be fixed mind you, but thought I'd ask.
On Thu, Sep 22, 2016 at 10:16 AM, Michael McCandless
wrote:
> You could index the prefix terms (edge ngrams), assuming your queries
> are prefix queries; this way there would typically be far fewer terms
> to visit than all 200 M terms.
>
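To make the edge-ngram suggestion concrete, here is a minimal sketch in plain Java. It is not Lucene code (in practice an analysis-chain filter such as EdgeNGramTokenFilter would presumably do this), and the class and method names are hypothetical. It only shows why anchoring the grams at the start of the token yields far fewer terms than ngramming every position, using the 3-32 gram sizes from the original message.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; not part of Lucene.
class EdgeNGramSketch {

    // Prefixes of the token with length in [minGram, maxGram].
    static List<String> edgeNGrams(String token, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        int upper = Math.min(maxGram, token.length());
        for (int len = minGram; len <= upper; len++) {
            grams.add(token.substring(0, len));
        }
        return grams;
    }

    // Every substring of the token with length in [minGram, maxGram].
    static List<String> allNGrams(String token, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        int upper = Math.min(maxGram, token.length());
        for (int len = minGram; len <= upper; len++) {
            for (int start = 0; start + len <= token.length(); start++) {
                grams.add(token.substring(start, start + len));
            }
        }
        return grams;
    }

    public static void main(String[] args) {
        // prints [luc, luce, lucen, lucene]
        System.out.println(edgeNGrams("lucene", 3, 32));
        // prints edge: 4, all: 10
        System.out.println("edge: " + edgeNGrams("lucene", 3, 32).size()
                + ", all: " + allNGrams("lucene", 3, 32).size());
    }
}
```

The gap widens with token length: edge grams grow linearly with the token, full ngrams roughly quadratically. The trade-off is that edge grams only serve prefix-style queries.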
> Auto-prefix terms also tried to solve this more "automatically", so
> you don't have to mess with edge ngrams, but we reverted it because of
> the added code complexity and lack of real-world use cases, especially
> once we switched numerics from postings to dimensional points.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> On Thu, Sep 22, 2016 at 1:01 PM, Erick Erickson
> wrote:
>> In MultiTermConstantScoreWrapper there's this block around line 174 in 6x:
>>
>>   do {
>>     docs = termsEnum.postings(docs, PostingsEnum.NONE);
>>     builder.add(docs);
>>   } while (termsEnum.next() != null);
>>
>> In the case of lots and lots of terms in a multiValued field this can
>> take quite a bit of time. In my test case I have 100K docs with 200M
>> terms (pathological I understand, but it illustrates the issue). If
>> I'm reading this right it loops through all the terms and, for each
>> term, creates a sub-list of docs for the term and adds the sub-list to
>> the "master list". So a query like 'field:*' takes 20+ seconds.
>>
>> Is there anything we can/should do to short circuit this kind of
>> thing? In this case I got 200M terms by ngramming 3-32 (again, far too
>> many ngrams I understand). It's not clear to me whether it's an easy
>> check to say "stop when all the docs have been added to the master
>> list".
>>
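A rough sketch of the "stop when all the docs have been added" idea, in plain Java rather than Lucene classes. Everything here is an assumption made for illustration: the postings are modelled as int arrays, the "master list" as a java.util.BitSet, and the names are hypothetical. It is not how MultiTermConstantScoreWrapper or its builder actually work; it only shows the shape of the check.

```java
import java.util.BitSet;

// Hypothetical illustration class; not part of Lucene.
class ShortCircuitUnionSketch {

    // Unions per-term doc-id lists into result, stopping as soon as every
    // document in [0, maxDoc) is present. Returns the number of terms visited.
    static int unionUntilFull(int[][] postingsPerTerm, int maxDoc, BitSet result) {
        int matched = result.cardinality();
        int termsVisited = 0;
        for (int[] postings : postingsPerTerm) {
            if (matched == maxDoc) {
                break; // every doc already matches; remaining terms add nothing
            }
            termsVisited++;
            for (int doc : postings) {
                if (!result.get(doc)) { // count each doc only once
                    result.set(doc);
                    matched++;
                }
            }
        }
        return termsVisited;
    }

    public static void main(String[] args) {
        int[][] postings = { {0, 1}, {1, 2}, {0, 2}, {1} };
        BitSet result = new BitSet(3);
        int visited = unionUntilFull(postings, 3, result);
        // prints 2 of 4 terms visited, matched={0, 1, 2}
        System.out.println(visited + " of " + postings.length
                + " terms visited, matched=" + result);
    }
}
```

The check only pays off when the terms collectively cover every document, as in the 'field:*' case; otherwise the per-document bookkeeping is pure overhead. Whether the real builder can track such a count cheaply is the open question.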
>> I can raise a JIRA if it makes sense.
>>
>> For supporting this particular use-case, we could index a separate
>> field "has_field1_value" but the general case still holds.
>>
>> Erick
>>
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: dev-h...@lucene.apache.org