Re: Document-Map, Hits-List
Yes, it's not wise to pull all Document instances from a Hits instance
unless you really need them all. I don't do that; I just provide a
wrapper, like this:

    import java.io.IOException;
    import java.util.AbstractList;

    import org.apache.lucene.search.Hits;

    /**
     * A simple List implementation wrapping a Hits object.
     *
     * @author Otis Gospodnetic
     * @version $Id: HitList.java,v 1.4 2004/11/11 14:08:33 otis Exp $
     */
    public class HitList extends AbstractList
    {
        private Hits _hits;

        /**
         * Creates a new HitList instance.
         *
         * @param hits Hits to wrap
         */
        public HitList(Hits hits)
        {
            _hits = hits;
        }

        /**
         * Fetches the Document on demand; nothing is loaded up front.
         *
         * @see java.util.List#get(int)
         */
        public Object get(int index)
        {
            try {
                return _hits.doc(index);
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
        }

        /**
         * @see java.util.List#size()
         */
        public int size()
        {
            return _hits.length();
        }
        ...
        ...

Otis

--- Luke Francl <[EMAIL PROTECTED]> wrote:

> On Wed, 2004-12-01 at 10:27, Otis Gospodnetic wrote:
>
> > This is very similar to what I do - I create a List of Maps from
> > Hits and its Documents. So I think this change may be handy, if
> > doable (I didn't look into changing the two Lucene classes,
> > actually).
>
> How do you avoid the problem Erik just mentioned, iterating through
> all the Hits at once to populate this data structure?
>
> I do a similar thing, creating a List of asset references from a
> field in each Lucene Document in my Hits list (actual data for
> display retrieved from a separate datastore). I was not aware of any
> performance problems from doing this, but now I am wondering about
> the implications.
>
> Thanks,
> Luke

To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
Re: Document-Map, Hits-List
Hello,

--- Erik Hatcher <[EMAIL PROTECTED]> wrote:

> On Dec 1, 2004, at 11:31 AM, Luke Francl wrote:
> > I do a similar thing, creating a List of asset references from a
> > field in each Lucene Document in my Hits list (actual data for
> > display retrieved from a separate datastore). I was not aware of
> > any performance problems from doing this, but now I am wondering
> > about the implications.
>
> The performance "concern" (let's not say "problem") is when you get
> 10,000,000 (or so :) results back from a search. No user wants to
> see all of that, only the first 20, perhaps. Calling Hits.doc(i)
> pulls the document data from the index and populates a Document
> instance. There is file I/O involved, and doing lots of unnecessary
> Hits.doc(i) calls may potentially be noticeable. If you're only
> getting 100 hits back then you'll likely not even notice. (all
> numbers quoted here are just random figures - don't quote me on
> actual performance numbers :).

Somewhat related and interesting post from Tim Bray:
http://tbray.org/ongoing/When/200x/2004/11/26/SearchSort

> In my current application, I have a paging feature. Each new page
> does a search again using the same query, but I only iterate through
> the 20 that should display on that page and build a highlighted data
> structure to hand to the presentation of only the appropriate ones
> for the range.

Same here. I make use of List's subList method a lot.

Otis
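A minimal sketch of the subList approach mentioned above, under stated assumptions: LazyList and PagingDemo are hypothetical names, and the loader function merely stands in for Hits.doc(i) so the example runs without Lucene. Because AbstractList.subList returns a view rather than a copy, iterating one page should only trigger loads for the elements on that page.

```java
import java.util.AbstractList;
import java.util.List;
import java.util.function.IntFunction;

/** Hypothetical lazy, read-only list: element i is produced only when get(i) is called. */
class LazyList<T> extends AbstractList<T> {
    private final int size;
    private final IntFunction<T> loader; // assumption: stands in for Hits.doc(i)
    int loads = 0;                       // counts loader calls, for illustration only

    LazyList(int size, IntFunction<T> loader) {
        this.size = size;
        this.loader = loader;
    }

    @Override
    public T get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        loads++;
        return loader.apply(index);
    }

    @Override
    public int size() {
        return size;
    }
}

class PagingDemo {
    /** Returns a view of one page; page numbers start at 0. */
    static <T> List<T> page(List<T> all, int page, int pageSize) {
        int from = Math.min(page * pageSize, all.size());
        int to = Math.min(from + pageSize, all.size());
        return all.subList(from, to); // a view backed by 'all', not a copy
    }

    public static void main(String[] args) {
        LazyList<String> hits = new LazyList<>(10_000_000, i -> "doc-" + i);
        for (String doc : page(hits, 2, 20)) {
            System.out.println(doc);
        }
        // Only the 20 elements of the requested page were loaded.
        System.out.println("loads: " + hits.loads);
    }
}
```

The wrapper itself costs nothing to construct, so the size of the full result set does not matter until something actually calls get().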
Re: Document-Map, Hits-List
On Dec 1, 2004, at 11:31 AM, Luke Francl wrote:
> I do a similar thing, creating a List of asset references from a
> field in each Lucene Document in my Hits list (actual data for
> display retrieved from a separate datastore). I was not aware of any
> performance problems from doing this, but now I am wondering about
> the implications.

The performance "concern" (let's not say "problem") is when you get
10,000,000 (or so :) results back from a search. No user wants to see
all of that, only the first 20, perhaps. Calling Hits.doc(i) pulls the
document data from the index and populates a Document instance. There
is file I/O involved, and doing lots of unnecessary Hits.doc(i) calls
may potentially be noticeable. If you're only getting 100 hits back
then you'll likely not even notice. (all numbers quoted here are just
random figures - don't quote me on actual performance numbers :).

In my current application, I have a paging feature. Each new page does
a search again using the same query, but I only iterate through the 20
that should display on that page and build a highlighted data structure
to hand to the presentation of only the appropriate ones for the range.

Erik
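The paging scheme described above might look roughly like the following. This is only a sketch, not Erik's actual code: HitSource is a hypothetical stand-in for the two Hits methods involved (length() and doc(i)), PageWindow and PageBuilder are invented names, and a plain String takes the place of the highlighted data structure.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the parts of Hits used here. */
interface HitSource {
    int length();
    String doc(int i);
}

/** Index range [start, end) of the hits shown on one page; pages are numbered from 1. */
class PageWindow {
    final int start; // inclusive
    final int end;   // exclusive

    PageWindow(int pageNumber, int pageSize, int totalHits) {
        if (pageNumber < 1 || pageSize < 1 || totalHits < 0) {
            throw new IllegalArgumentException("invalid paging arguments");
        }
        long first = (long) (pageNumber - 1) * pageSize; // long avoids int overflow
        this.start = (int) Math.min(first, totalHits);
        this.end = (int) Math.min(first + pageSize, totalHits);
    }
}

class PageBuilder {
    /** Touches only the hits that belong on the requested page. */
    static List<String> buildPage(HitSource hits, int pageNumber, int pageSize) {
        PageWindow w = new PageWindow(pageNumber, pageSize, hits.length());
        List<String> rows = new ArrayList<>(w.end - w.start);
        for (int i = w.start; i < w.end; i++) {
            rows.add(hits.doc(i)); // the only per-document fetches that happen
        }
        return rows;
    }
}
```

With 45 hits and a page size of 20, page 3 covers indices 40 through 44, and any later page comes back empty rather than throwing.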
Re: Document-Map, Hits-List
On Dec 01, 2004, at 17:41, Luke Francl wrote:
> Yes, but Otis hasn't implemented that interface. He's wrapping his
> Hits with a List of Maps.

Right... I'm sure that Otis knows what he is doing :)

As far as implementation goes, you have at least 3 options:

- Implement List and Map directly in Lucene's relevant objects (e.g.
  Hits and Document)
- Extend Hits and Document to achieve the same
- Wrap Hits and Document in another class which implements the
  relevant interfaces (e.g. HitsList and DocumentMap)

The point of the exercise being to provide a standard interface while
still benefiting from the underlying Lucene optimizations.

PA.
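For the third option, a DocumentMap wrapper could be sketched along these lines. Everything here is an assumption made for illustration: FieldSource is a hypothetical stand-in for the parts of a Lucene Document being exposed, field names are assumed to be unique, and the map is read-only.

```java
import java.util.AbstractMap;
import java.util.AbstractSet;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for the parts of a Lucene Document used here. */
interface FieldSource {
    List<String> fieldNames();
    String get(String name); // returns null when the field is absent
}

/** Read-only Map view over a FieldSource; no field values are copied. */
class DocumentMap extends AbstractMap<String, String> {
    private final FieldSource doc;

    DocumentMap(FieldSource doc) {
        this.doc = doc;
    }

    /** Delegates straight to the wrapped object instead of scanning entrySet(). */
    @Override
    public String get(Object key) {
        return (key instanceof String) ? doc.get((String) key) : null;
    }

    @Override
    public boolean containsKey(Object key) {
        return get(key) != null;
    }

    @Override
    public Set<Map.Entry<String, String>> entrySet() {
        return new AbstractSet<Map.Entry<String, String>>() {
            @Override
            public Iterator<Map.Entry<String, String>> iterator() {
                final Iterator<String> names = doc.fieldNames().iterator();
                return new Iterator<Map.Entry<String, String>>() {
                    @Override
                    public boolean hasNext() {
                        return names.hasNext();
                    }

                    @Override
                    public Map.Entry<String, String> next() {
                        String name = names.next();
                        // The value is looked up only when the entry is requested.
                        return new AbstractMap.SimpleImmutableEntry<>(name, doc.get(name));
                    }
                };
            }

            @Override
            public int size() {
                return doc.fieldNames().size();
            }
        };
    }
}
```

Extending AbstractMap keeps the wrapper small: only entrySet() is required, and get() is overridden so a lookup by field name does not have to walk every entry.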
Re: Document-Map, Hits-List
On Wed, 2004-12-01 at 10:39, petite_abeille wrote:

> You don't need to iterate through anything upfront... you simply do
> it on-demand... e.g. when invoking List.get() you would invoke the
> underlying Hits.doc()...
>
> In other words, there is _no_ new data structure... simply an
> additional interface...

Yes, but Otis hasn't implemented that interface. He's wrapping his Hits
with a List of Maps.

Luke
Re: Document-Map, Hits-List
On Dec 01, 2004, at 17:31, Luke Francl wrote:
> How do you avoid the problem Erik just mentioned, iterating through
> all the Hits at once to populate this data structure?

You don't need to iterate through anything upfront... you simply do it
on-demand... e.g. when invoking List.get() you would invoke the
underlying Hits.doc()...

In other words, there is _no_ new data structure... simply an
additional interface...

PA.
Re: Document-Map, Hits-List
On Wed, 2004-12-01 at 10:27, Otis Gospodnetic wrote:

> This is very similar to what I do - I create a List of Maps from Hits
> and its Documents. So I think this change may be handy, if doable (I
> didn't look into changing the two Lucene classes, actually).

How do you avoid the problem Erik just mentioned, iterating through all
the Hits at once to populate this data structure?

I do a similar thing, creating a List of asset references from a field
in each Lucene Document in my Hits list (actual data for display
retrieved from a separate datastore). I was not aware of any
performance problems from doing this, but now I am wondering about the
implications.

Thanks,
Luke