Re: Preventing "killer" queries

2006-02-07 Thread Chris Hostetter
Mark, I know you've already committed a patch along these lines (LUCENE-494), and I can see how in a lot of cases that would be a great solution, but I'm still interested in the original idea you proposed (a 'maxDf' in TermQuery) because I anticipate situations in which you don't want to ignore th

[jira] Updated: (LUCENE-494) Analyzer for preventing overload of search service by queries with common terms in large indexes

2006-02-07 Thread Mark Harwood (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-494?page=all ] Mark Harwood updated LUCENE-494: Attachment: QueryAutoStopWordAnalyzerTest.java

[jira] Updated: (LUCENE-494) Analyzer for preventing overload of search service by queries with common terms in large indexes

2006-02-07 Thread Mark Harwood (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-494?page=all ] Mark Harwood updated LUCENE-494: Attachment: QueryAutoStopWordAnalyzer.java

[jira] Created: (LUCENE-494) Analyzer for preventing overload of search service by queries with common terms in large indexes

2006-02-07 Thread Mark Harwood (JIRA)
Analyzer for preventing overload of search service by queries with common terms in large indexes Key: LUCENE-494 URL: http://issues.apache.org/jira/browse/LUCENE-494 Project: Lu

Re: Preventing "killer" queries

2006-02-07 Thread markharw00d
[Answering my own question] I think a reasonable solution is to have a generic analyzer for use at query-time that can wrap my application's choice of analyzer and automatically filter out what it sees as stop words. It would initialize itself from an IndexReader and create a StopFilter for th
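
The wrapping idea described in this message can be sketched roughly as follows. This is a plain-Python illustration only, not the LUCENE-494 code and not the Lucene API: the class name `AutoStopWordFilter`, the `max_doc_fraction` parameter, and the toy list-of-term-lists "index" are all assumptions standing in for an IndexReader and the application's chosen analyzer.

```python
from collections import Counter
from typing import Callable, Iterable, List, Set


def whitespace_analyzer(text: str) -> List[str]:
    """Hypothetical stand-in for the application's own analyzer."""
    return text.lower().split()


class AutoStopWordFilter:
    """Wraps an analyzer and drops query terms that are too common in the index.

    `docs` is a toy stand-in for what an index reader would expose:
    an iterable of already-analyzed documents (lists of terms).
    """

    def __init__(
        self,
        analyzer: Callable[[str], List[str]],
        docs: Iterable[List[str]],
        max_doc_fraction: float = 0.4,  # assumed threshold, not from the thread
    ) -> None:
        self.analyzer = analyzer
        doc_sets = [set(doc) for doc in docs]
        # Document frequency: number of documents containing each term.
        doc_freq = Counter(term for doc in doc_sets for term in doc)
        limit = max_doc_fraction * len(doc_sets)
        # Terms found in more than `limit` documents are treated as stop words.
        self.stop_words: Set[str] = {t for t, n in doc_freq.items() if n > limit}

    def analyze(self, query_text: str) -> List[str]:
        """Run the wrapped analyzer, then filter out the derived stop words."""
        return [t for t in self.analyzer(query_text) if t not in self.stop_words]


docs = [
    ["the", "quick", "fox"],
    ["the", "lazy", "dog"],
    ["the", "fox", "sleeps"],
    ["a", "dog", "barks"],
]
query_analyzer = AutoStopWordFilter(whitespace_analyzer, docs, max_doc_fraction=0.5)
terms = query_analyzer.analyze("The quick dog")
```

In this toy index only "the" appears in more than half of the documents, so it is the only term removed from the query. Deriving the stop list once at construction time keeps the per-query cost to a set lookup per term.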

Re: Preventing "killer" queries

2006-02-07 Thread Doug Cutting
mark harwood wrote: For these outlier situations is it worth adding a "maxDf" property to TermQuery like BooleanQuery's maxClause query-time control? I could fix my problem in my own app-specific query construction code but I wonder if others would find it a useful fix to add to TermQuery in the

Preventing "killer" queries

2006-02-07 Thread mark harwood
I've just been doing some benchmarking on a reasonably large-scale system (38 million docs) and ran into an issue where certain *very* common terms would dramatically slow query responses. Some terms were abnormally common because I had constructed the index by taking several copies and merging th

How to map terms to frequencies for a doc - ?

2006-02-07 Thread Dmitry Goldenberg
Hello, Given a query, I want to be able to, for each query term, get the number of occurrences of the term. I have tried what I'm including below and it does not seem to provide reliable results. Seems to work fine with exact matching but as soon as stemming kicks in, all bets are off as to v
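
One common reason counts drift once stemming is involved is that the query terms and the document text are not passed through the same analysis. The sketch below illustrates that point in plain Python; it is not the poster's code (which is not shown here) and does not use the Lucene API. The helper names `toy_stem`, `analyze`, and `query_term_frequencies` are hypothetical, and the stemmer is a deliberately crude suffix stripper.

```python
from collections import Counter
from typing import Dict, List


def toy_stem(token: str) -> str:
    """Crude suffix stripper, for illustration only (not a real stemmer)."""
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def analyze(text: str) -> List[str]:
    """Lowercase, split on whitespace, and stem every token."""
    return [toy_stem(t) for t in text.lower().split()]


def query_term_frequencies(query: str, doc_text: str) -> Dict[str, int]:
    """Count occurrences of each analyzed query term in the analyzed document."""
    doc_counts = Counter(analyze(doc_text))
    # Counter returns 0 for terms that never occur in the document.
    return {term: doc_counts[term] for term in analyze(query)}


freqs = query_term_frequencies("foxes jumping", "The fox is jumping over foxes")
```

Because both sides go through `analyze`, the resulting counts are keyed by the stemmed form ("fox", "jump") rather than by the surface form typed in the query.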