Re: Weird time results doing wildcard queries

2005-09-09 Thread J.J. Larrea
> Just to clarify, I meant take the call to getMoreDocs(50), which is currently in the Hits constructor, and refactor it out and into the "Searcher.search" methods. That way the behavior is the same as before for all existing clients, but new subclasses can change the behavior so that the "search

Re: Weird time results doing wildcard queries

2005-09-08 Thread Chris Hostetter
: > * move the call to getMoreDocs(int) from Hits to Searcher.search : : Hmm... Hits is passed to the caller and works as a standalone cache. : While it maintains a reference to the Searcher, it only uses that to : resolve Documents upon misses. Perhaps the current separation of : concerns is ac

Re: Weird time results doing wildcard queries

2005-09-08 Thread J.J. Larrea
At 8:01 PM -0700 9/8/05, Chris Hostetter wrote: >: Which makes me wonder whether the caching logic of Hits, optimized for >: random- rather than linear-access, and not tuneable or controllable in >: 1.4.3, should be reviewed for a subsequent release, at least the >: API-breaking 2.0. I'll wager th

Re: Weird time results doing wildcard queries

2005-09-08 Thread Chris Hostetter
: Which makes me wonder whether the caching logic of Hits, optimized for : random- rather than linear-access, and not tuneable or controllable in : 1.4.3, should be reviewed for a subsequent release, at least the : API-breaking 2.0. I'll wager that a majority of applications do nothing : other th

Re: Weird time results doing wildcard queries

2005-09-08 Thread J.J. Larrea
t methods (a HitCollector would probably be >best) > > > > > > >: Date: Thu, 8 Sep 2005 17:05:18 -0600 >: From: Richard Krenek <[EMAIL PROTECTED]> >: Reply-To: java-user@lucene.apache.org >: To: java-user@lucene.apache.org >: Subject: Re: Weird time

Re: Weird time results doing wildcard queries

2005-09-08 Thread Yonik Seeley
A HitCollector returns docs in the order they are found (in the index, not by relevance). Use a search method that returns TopDocs if you want the first n documents without executing the query more than once (Hits uses this internally). -Yonik On 9/8/05,
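To illustrate the difference Yonik describes, here is a minimal sketch of how a "top n by relevance" result can be built in a single pass from callbacks that arrive in index order. It is a toy model, not Lucene's implementation: the class `TopNCollectorSketch`, the `topN` method, the `scoresByDoc` array, and the rule that non-positive scores mean "no match" are all assumptions made for the example.

```java
import java.util.PriorityQueue;

// Toy model, not Lucene code: documents are "collected" in index order
// (doc 0, 1, 2, ...), and a bounded min-heap keeps only the n best scores.
class TopNCollectorSketch {

    // scoresByDoc[doc] is a hypothetical score for each doc id;
    // a score <= 0 is treated as "did not match" (assumption of this sketch).
    static int[] topN(float[] scoresByDoc, int n) {
        // Smallest element = worst hit: lowest score, and on ties the higher doc id.
        PriorityQueue<Integer> heap = new PriorityQueue<>((a, b) -> {
            int byScore = Float.compare(scoresByDoc[a], scoresByDoc[b]);
            return byScore != 0 ? byScore : Integer.compare(b, a);
        });

        // One pass in index order, which is the order a collector callback sees.
        for (int doc = 0; doc < scoresByDoc.length; doc++) {
            if (scoresByDoc[doc] <= 0f) continue;
            heap.add(doc);
            if (heap.size() > n) heap.poll(); // evict the current worst hit
        }

        // Drain worst-first into the back of the array so the best hit ends up first.
        int[] result = new int[heap.size()];
        for (int i = result.length - 1; i >= 0; i--) result[i] = heap.poll();
        return result;
    }

    public static void main(String[] args) {
        int[] best = topN(new float[] {0.2f, 0.9f, 0f, 0.5f, 0.9f}, 3);
        System.out.println(java.util.Arrays.toString(best));
    }
}
```

The point of the sketch is that the query is evaluated once and memory stays bounded by n, whereas a plain collector that only records docs as they arrive gives index order with no ranking.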

Re: Weird time results doing wildcard queries

2005-09-08 Thread Richard Krenek
This answers a lot of questions and observations. We looked in the source code of the Hits object and found the getMoreDocs(int min) method which does what you stated below. We are assuming you meant for us to use a HitCollector instead. This brings up a new question: does the Searcher call the

Re: Weird time results doing wildcard queries

2005-09-08 Thread Chris Hostetter
om: Richard Krenek <[EMAIL PROTECTED]> : Reply-To: java-user@lucene.apache.org : To: java-user@lucene.apache.org : Subject: Re: Weird time results doing wildcard queries : : I did the change and here are the results: : : Query (default field is COMP_PART_NUMBER): 2444* : Query: COMP_PART_NUMBER:2

Re: Weird time results doing wildcard queries

2005-09-08 Thread Daniel Naber
On Friday 09 September 2005 00:40, Chris Hostetter wrote: > 1) How similar, and how many? ... If I remember correctly, the Hits > constructor does some work to pre-fetch the first 100 results. What's really expensive in fetching documents is the disk access (often one disk seek per matching docu

Re: Weird time results doing wildcard queries

2005-09-08 Thread Richard Krenek
I did the change and here are the results:
Query (default field is COMP_PART_NUMBER): 2444*
Query: COMP_PART_NUMBER:2444*
Query Time: 328 ms - time for query to run.
383 total matching documents.
Cycle Time: 141 ms - time to run through hits.
Query (default field is COMP_PART_NUMBER): *91822* Qu

Re: Weird time results doing wildcard queries

2005-09-08 Thread Yonik Seeley
The Hits class collects the document ids from the query in batches. If you iterate beyond what was collected, the query is re-executed to collect more ids. You can use the expert level search methods on IndexSearcher if this isn't what you want. -Yonik On 9/8/05, Richard Krenek <[EMAIL PROTEC
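A rough way to picture the batching behavior described here is the toy model below. It is not the Lucene source: the class `BatchedHitsSketch`, the doubling policy, and the first batch size passed in are assumptions chosen for illustration, and `runQuery` is a stand-in that just counts how often the "query" is executed.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model, not Lucene code: a result cache that re-runs the whole query
// with a larger limit whenever the caller reads past what is cached.
class BatchedHitsSketch {
    private final int totalMatches;   // how many documents the query matches
    private List<Integer> cached;     // doc ids collected so far
    private int executions = 0;       // how many times the query has been run

    BatchedHitsSketch(int totalMatches, int firstBatch) {
        if (firstBatch < 1) throw new IllegalArgumentException("firstBatch must be >= 1");
        this.totalMatches = totalMatches;
        this.cached = runQuery(firstBatch);
    }

    // Stand-in for executing the query and keeping only the top `limit` ids.
    private List<Integer> runQuery(int limit) {
        executions++;
        List<Integer> top = new ArrayList<>();
        for (int id = 0; id < Math.min(limit, totalMatches); id++) top.add(id);
        return top;
    }

    // Reading past the cached range triggers a re-execution with a doubled limit
    // (the doubling is an assumption of this sketch).
    int doc(int i) {
        if (i < 0 || i >= totalMatches) throw new IndexOutOfBoundsException("hit " + i);
        while (i >= cached.size()) cached = runQuery(cached.size() * 2);
        return cached.get(i);
    }

    int executions() { return executions; }

    // Iterate over every hit, as a typical "loop through Hits" client would.
    static int executionsForFullScan(int totalMatches, int firstBatch) {
        BatchedHitsSketch hits = new BatchedHitsSketch(totalMatches, firstBatch);
        for (int i = 0; i < totalMatches; i++) hits.doc(i);
        return hits.executions();
    }

    public static void main(String[] args) {
        System.out.println(executionsForFullScan(383, 100));
        System.out.println(executionsForFullScan(10_000, 100));
    }
}
```

Under these assumptions, walking through all results of an expensive query pays the query cost several times over, which is one plausible reason the time to cycle through the hits tracks the cost of the query itself.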

Re: Weird time results doing wildcard queries

2005-09-08 Thread Chris Hostetter
: is if the query starts with a wildcard. In the case where it starts with a : wildcard, Lucene has no option but to linearly go over every term in the : index to see if it matches your pattern. It must visit every single term in That would explain why the search itself takes a while, but not why a

Re: Weird time results doing wildcard queries

2005-09-08 Thread Richard Krenek
I understand that for the query, but why does it matter once you have the Hits object? That is the part I'm baffled by. The query with the wildcard in the front takes a lot longer, but we expected that. On 9/8/05, Jeremy Meyer <[EMAIL PROTECTED]> wrote: > > The issue isn't with multiple wildcar

Re: Weird time results doing wildcard queries

2005-09-08 Thread Jeremy Meyer
The issue isn't with multiple wildcards exactly. Specifically, the problem is if the query starts with a wildcard. In the case where it starts with a wildcard, Lucene has no option but to linearly go over every term in the index to see if it matches your pattern. It must visit every single term i
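The cost difference between a trailing and a leading wildcard can be sketched with a sorted term dictionary. This is a toy model using a `TreeSet`, not Lucene's term enumeration; the class `WildcardScanSketch`, its method names, and the sample terms are made up for the example.

```java
import java.util.Arrays;
import java.util.NavigableSet;
import java.util.TreeSet;

// Toy model, not Lucene code: counts how many terms of a sorted dictionary
// must be examined for a prefix pattern (2444*) versus an infix pattern (*91822*).
class WildcardScanSketch {

    // Prefix pattern: seek to the prefix, then stop at the first non-matching term.
    // Returns {termsVisited, termsMatched}.
    static int[] scanPrefix(NavigableSet<String> terms, String prefix) {
        int visited = 0, matched = 0;
        for (String term : terms.tailSet(prefix, true)) {
            visited++;
            if (!term.startsWith(prefix)) break; // sorted order: no later term can match
            matched++;
        }
        return new int[] {visited, matched};
    }

    // Leading wildcard: sort order gives no starting point, so every term is tested.
    // Returns {termsVisited, termsMatched}.
    static int[] scanInfix(NavigableSet<String> terms, String infix) {
        int visited = 0, matched = 0;
        for (String term : terms) {
            visited++;
            if (term.contains(infix)) matched++;
        }
        return new int[] {visited, matched};
    }

    // Hypothetical part numbers, for illustration only.
    static NavigableSet<String> sampleTerms() {
        return new TreeSet<>(Arrays.asList(
            "1000", "2443", "2444A", "2444B", "2445", "9182200", "X91822Y"));
    }

    public static void main(String[] args) {
        NavigableSet<String> terms = sampleTerms();
        System.out.println(Arrays.toString(scanPrefix(terms, "2444")));
        System.out.println(Arrays.toString(scanInfix(terms, "91822")));
    }
}
```

With a prefix, the work is proportional to the number of matching terms; with a leading wildcard, it is proportional to the size of the whole dictionary, regardless of how few terms match.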