Mark,
I know you've already committed a patch along these lines (LUCENE-494) and
I can see how in a lot of cases that would be a great solution, but I'm
still interested in the original idea you proposed (a 'maxDf' in
TermQuery) because I anticipate situations in which you don't want to
ignore th
[ http://issues.apache.org/jira/browse/LUCENE-494?page=all ]
Mark Harwood updated LUCENE-494:
Attachment: QueryAutoStopWordAnalyzerTest.java
> Analyzer for preventing overload of search service by queries with common
> terms in large indexes
> -
[ http://issues.apache.org/jira/browse/LUCENE-494?page=all ]
Mark Harwood updated LUCENE-494:
Attachment: QueryAutoStopWordAnalyzer.java
> Analyzer for preventing overload of search service by queries with common
> terms in large indexes
> -
Analyzer for preventing overload of search service by queries with common terms
in large indexes
Key: LUCENE-494
URL: http://issues.apache.org/jira/browse/LUCENE-494
Project: Lu
[Answering my own question]
I think a reasonable solution is to have a generic analyzer for use at
query-time that can wrap my application's choice of analyzer and
automatically filter out what it sees as stop words. It would initialize
itself from an IndexReader and create a StopFilter for th
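
A minimal sketch of the idea described above, not the code attached to LUCENE-494. It assumes the per-term document frequencies have already been read out of the index (for example via IndexReader.docFreq) into a plain map, so it runs without Lucene on the classpath. The class name AutoStopWords, the method names, and the maxFraction parameter are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of Lucene: models the "automatic stop word"
// idea with a plain map of term -> document frequency.
class AutoStopWords {

    // Returns every term that occurs in more than maxFraction of the documents.
    static Set<String> findStopWords(Map<String, Integer> docFreqs,
                                     int numDocs, double maxFraction) {
        double maxDocFreq = numDocs * maxFraction;
        Set<String> stopWords = new HashSet<>();
        for (Map.Entry<String, Integer> entry : docFreqs.entrySet()) {
            if (entry.getValue() > maxDocFreq) {
                stopWords.add(entry.getKey());
            }
        }
        return stopWords;
    }

    // Drops stop words from an already-analyzed token list, preserving order.
    static List<String> filter(List<String> tokens, Set<String> stopWords) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (!stopWords.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }
}
```

In a real analyzer the filter step would sit after the wrapped analyzer's token stream, so the comparison is made on the same token forms the index holds.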
mark harwood wrote:
For these outlier situations is it worth adding a
"maxDf" property to TermQuery like BooleanQuery's
maxClauseCount query-time control? I could fix my problem
in my own app-specific query construction code but I
wonder if others would find it a useful fix to add to
TermQuery in the
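
One possible reading of the "maxDf" proposal, sketched as application-side query construction code rather than a change to TermQuery. It assumes the limit should reject the query, by analogy with the way BooleanQuery refuses to exceed its clause limit. The message does not say whether too-common terms should be rejected or silently dropped. MaxDfGuard and TermTooCommonException are hypothetical names, and the map stands in for document frequencies looked up from the index.

```java
import java.util.Map;

// Hypothetical guard, not part of Lucene: refuses query terms whose
// document frequency exceeds a configured ceiling.
class MaxDfGuard {

    // Hypothetical exception type for terms that are too common to search.
    static class TermTooCommonException extends RuntimeException {
        TermTooCommonException(String term, int docFreq, int maxDf) {
            super("term '" + term + "' has docFreq " + docFreq
                    + ", above maxDf " + maxDf);
        }
    }

    // Throws if any term's document frequency is greater than maxDf.
    // Terms missing from the map are treated as having docFreq 0.
    static void check(Iterable<String> terms,
                      Map<String, Integer> docFreqs, int maxDf) {
        for (String term : terms) {
            int docFreq = docFreqs.getOrDefault(term, 0);
            if (docFreq > maxDf) {
                throw new TermTooCommonException(term, docFreq, maxDf);
            }
        }
    }
}
```

Throwing lets the caller decide what to do with the query (drop the term, ask the user to refine it, or run it anyway), which is the main difference from filtering the term out in the analyzer.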
I've just been doing some benchmarking on a reasonably
large-scale system (38 million docs) and ran into an
issue where certain *very* common terms would
dramatically slow query responses.
Some terms were abnormally common because I had
constructed the index by taking several copies and
merging th
Hello,
Given a query, I want to be able to, for each query term, get the number of
occurrences of the term. I have tried what I'm including below and it does not
seem to provide reliable results. Seems to work fine with exact matching but
as soon as stemming kicks in, all bets are off as to v