Re: ValueListHandler pattern with Lucene
On Monday 12 April 2004 20:54, [EMAIL PROTECTED] wrote:
> On Sunday 11 April 2004 17:46, Erik Hatcher wrote:
>> In other words, you need to invent your own pattern here?! :)
>
> I just experimented a bit and came up with the ValueListSupplier, which replaces the ValueList in the VLH. Seems to work so far... :-)
>
> Comments are greatly appreciated!

FYI http://www.nitwit.de/vlh2/

- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: ValueListHandler pattern with Lucene
On Sunday 11 April 2004 17:46, Erik Hatcher wrote:
> In other words, you need to invent your own pattern here?! :)

I just experimented a bit and came up with the ValueListSupplier, which replaces the ValueList in the VLH. Seems to work so far... :-)

Comments are greatly appreciated!

Timo

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.BitSet;
    import java.util.Collections;
    import java.util.List;

    import org.apache.commons.logging.Log;
    import org.apache.commons.logging.LogFactory;
    import org.apache.lucene.search.Hits;

    // IValueListIterator and SearchResultAdapter are application classes
    // (not shown in this post).
    public class ValueListSupplier implements IValueListIterator {

        private final Log log = LogFactory.getLog(this.getClass());

        // TODO junit test case
        private Hits hits;
        protected BitSet fetched; // bit i set => list slot i has been loaded
        protected List list;      // one slot per hit, filled on demand
        protected int index;      // current cursor position

        public ValueListSupplier(Hits hits) {
            int size = hits.length();
            // pre-fill with nulls so list.set(i, ...) works for every hit
            this.list = new ArrayList(Collections.nCopies(size, null));
            this.fetched = new BitSet(size);
            this.hits = hits;
            this.index = 0;
        }

        public List getList() { return list; }

        public int size() { return list.size(); }

        public boolean hasPrevious() { return index > 0; }

        public boolean hasNext() { return index < size(); }

        /**
         * @param index new cursor position
         */
        public synchronized void move(int index) { this.index = index; }

        public void reset() { move(0); }

        public Object current() {
            validate(index, index + 1);
            return list.get(index);
        }

        public List previous(int count) {
            int from = Math.max(0, index - count);
            int to = index;
            validate(from, to);
            move(from);
            return list.subList(from, to);
        }

        public List next(int count) {
            int from = index;
            // 'to' is exclusive: clamp to size(), not size() - 1, or the
            // last hit can never be returned
            int to = Math.min(size(), index + count);
            validate(from, to);
            move(to);
            return list.subList(from, to);
        }

        /**
         * Loads every not-yet-fetched hit in [from, to).
         *
         * @param from starting index (inclusive)
         * @param to   ending index (exclusive)
         */
        private void validate(int from, int to) {
            while ((from = fetched.nextClearBit(from)) < to) {
                log.debug("fetching #" + from);
                try {
                    list.set(from, SearchResultAdapter.wrap(hits.doc(from)));
                    fetched.set(from);
                } catch (IOException e) {
                    // TODO potentially bug: the slot stays null; skip it so
                    // the loop cannot spin on the same index forever
                    log.error("could not fetch hit #" + from, e);
                    from++;
                }
            }
        }
    }
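The core trick in ValueListSupplier is a BitSet that remembers which slots are already loaded, with nextClearBit() jumping straight to the next missing one. A minimal, stdlib-only sketch of just that mechanism, assuming a hypothetical IntFunction loader as a stand-in for hits.doc(i) plus the wrapping step, could look like this:

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.Collections;
import java.util.List;
import java.util.function.IntFunction;

// Stdlib-only model of the lazy-fill idea: slots start out null, and a BitSet
// records which ones have been loaded. The IntFunction "loader" is a
// hypothetical stand-in for hits.doc(i); it is not a Lucene API.
class LazyValueList<T> {
    private final List<T> slots;
    private final BitSet fetched = new BitSet();
    private final IntFunction<T> loader;
    int loads = 0; // how many times the loader was actually called

    LazyValueList(int size, IntFunction<T> loader) {
        this.slots = new ArrayList<>(Collections.<T>nCopies(size, null));
        this.loader = loader;
    }

    int size() { return slots.size(); }

    // Returns the elements in [from, to), loading only the missing ones.
    // Assumes 0 <= from <= min(to, size()).
    List<T> range(int from, int to) {
        to = Math.min(to, slots.size());
        for (int i = fetched.nextClearBit(from); i < to; i = fetched.nextClearBit(i + 1)) {
            slots.set(i, loader.apply(i));
            fetched.set(i);
            loads++;
        }
        return slots.subList(from, to);
    }
}
```

Asking for positions 0-9 and then 5-14 triggers only 15 loads in total, because positions 5-9 are already marked in the BitSet.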
Re: ValueListHandler pattern with Lucene
On Saturday 10 April 2004 20:40, Erik Hatcher wrote:
> That's the beauty: it is up to you to load the doc iff you want it.

As I want all of them, I don't see why this should be any faster...
Re: ValueListHandler pattern with Lucene
On Apr 11, 2004, at 5:25 AM, [EMAIL PROTECTED] wrote:
> On Saturday 10 April 2004 20:40, Erik Hatcher wrote:
>> That's the beauty: it is up to you to load the doc iff you want it.
>
> As I want all of them, I don't see why this should be any faster...

Then have a look at the Hits class. It does extra work for caching and keeps a most-recently-used collection of documents around. By using a HitCollector you bypass those mechanisms. Whether it is measurably faster depends on several other factors.

Erik
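To illustrate the kind of bookkeeping described above, here is a small most-recently-used document cache built on LinkedHashMap in access order. This is only a sketch of the general technique, not Lucene's actual Hits implementation; the IntFunction loader is a hypothetical stand-in for searcher.doc(id).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.IntFunction;

// Illustrative only: a bounded most-recently-used cache in front of a document
// loader. The loader is a hypothetical stand-in for searcher.doc(id).
class MruDocCache<D> {
    private final IntFunction<D> loader;
    private final Map<Integer, D> cache;
    int misses = 0; // number of times the loader had to be called

    MruDocCache(int capacity, IntFunction<D> loader) {
        this.loader = loader;
        // accessOrder = true: iteration runs from least to most recently accessed
        this.cache = new LinkedHashMap<Integer, D>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, D> eldest) {
                return size() > capacity; // evict the least recently used entry
            }
        };
    }

    D doc(int id) {
        D d = cache.get(id); // a hit also marks the entry as most recently used
        if (d == null) {
            misses++;
            d = loader.apply(id);
            cache.put(id, d);
        }
        return d;
    }

    // containsKey does not count as an access, so it does not reorder entries
    boolean isCached(int id) { return cache.containsKey(id); }
}
```

A HitCollector that reads each document exactly once gains nothing from such a cache, which is one reason the bypass may or may not show up as a measurable speedup.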
Re: ValueListHandler pattern with Lucene
On Sunday 11 April 2004 13:40, Erik Hatcher wrote:
> By using a HitCollector you are bypassing those mechanisms. Whether it is measurably faster would depend on several other factors.

Well, it is hardly faster, so this is no real solution :-\
Re: ValueListHandler pattern with Lucene
On Saturday 10 April 2004 20:40, Erik Hatcher wrote:
> That's the beauty: it is up to you to load the doc iff you want it.

Well, there's another problem with HitCollector: the list I build is not sorted by score :-(
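Since collect(int doc, float score) hands over the score along with the id, ordering can be restored in the collector itself. One common approach, sketched below under the assumption that only the best n hits are needed, keeps a bounded min-heap keyed on score. The collect method mirrors Lucene's HitCollector callback, but these classes are stdlib-only and hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// A (doc id, score) pair as delivered to collect(int doc, float score).
class ScoredDoc {
    final int doc;
    final float score;
    ScoredDoc(int doc, float score) { this.doc = doc; this.score = score; }
}

// Keeps only the n best-scoring hits. collect(int, float) mirrors Lucene's
// HitCollector callback; this class does not extend the Lucene type.
class TopScoreCollector {
    private final int n;
    // min-heap on score: the weakest of the current top n sits at the head
    private final PriorityQueue<ScoredDoc> heap =
        new PriorityQueue<>(Comparator.comparingDouble((ScoredDoc s) -> s.score));

    TopScoreCollector(int n) {
        if (n < 1) throw new IllegalArgumentException("n must be >= 1");
        this.n = n;
    }

    public void collect(int doc, float score) {
        if (heap.size() < n) {
            heap.add(new ScoredDoc(doc, score));
        } else if (score > heap.peek().score) {
            heap.poll(); // drop the current weakest hit
            heap.add(new ScoredDoc(doc, score));
        }
    }

    // Best hit first; equal scores come back in no particular order.
    List<ScoredDoc> topDocs() {
        List<ScoredDoc> out = new ArrayList<>(heap);
        out.sort(Comparator.comparingDouble((ScoredDoc s) -> s.score).reversed());
        return out;
    }
}
```

If every hit is wanted, collecting all pairs into a list and sorting it once by descending score works just as well; the heap only pays off when n is much smaller than the number of matches.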
Re: ValueListHandler pattern with Lucene
On Apr 11, 2004, at 9:32 AM, [EMAIL PROTECTED] wrote:
> On Saturday 10 April 2004 20:40, Erik Hatcher wrote:
>> That's the beauty: it is up to you to load the doc iff you want it.
>
> Well, there's another problem with HitCollector: the list I build is not sorted by score :-(

HitCollector was just an option, and apparently not the right one for your use.

Erik
Re: ValueListHandler pattern with Lucene
On Apr 11, 2004, at 10:00 AM, [EMAIL PROTECTED] wrote:
> On Sunday 11 April 2004 15:56, Erik Hatcher wrote:
>> HitCollector was just an option, and apparently not the right one for your use.
>
> So, any other option? :-)

Well, yes: the one we already discussed. Let your presentation tier talk directly to Hits, so you are as efficient as possible with access to documents and only fetch what you need. Again, don't let patterns get in your way.

Erik
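"Only fetch what you need" usually means loading just the documents for the results page being rendered. A minimal sketch, assuming totalHits plays the role of hits.length() and a hypothetical IntFunction loader stands in for hits.doc(i):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntFunction;

// Loads only the hits that fall on one results page. totalHits corresponds to
// hits.length(); the loader is a hypothetical stand-in for hits.doc(i).
class ResultPage {
    static <D> List<D> page(int totalHits, int pageNo, int pageSize, IntFunction<D> loader) {
        int from = Math.max(0, pageNo) * pageSize;     // first position, inclusive
        int to = Math.min(totalHits, from + pageSize); // last position, exclusive
        List<D> out = new ArrayList<>();
        for (int i = from; i < to; i++) {
            out.add(loader.apply(i));
        }
        return out; // empty if the page lies past the last hit
    }
}
```

For 3000 hits and 10 results per page, rendering page 2 costs 10 document loads instead of 3000.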
Re: ValueListHandler pattern with Lucene
On Sunday 11 April 2004 17:16, Erik Hatcher wrote:
> Well, yes: the one we already discussed. Let your presentation tier talk directly to Hits, so you are as efficient as possible with access to documents and only fetch what you need. Again, don't let patterns get in your way.

Well, the point of tiers and (BTW: language-independent) patterns is to modularize software and make things exchangeable. This way, neither the presentation tier nor the search engine is exchangeable.

The actual problem is that the VLH is designed around a static list of VOs. The VLH needs to evolve to support something like a data provider that may add data dynamically. The problem so far is that the fail-fast Iterators of the java.util collections throw a ConcurrentModificationException if the backing data is modified; but since data in a VLH is never removed, only added, this should be possible to implement.

Timo
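The append-only observation can be made concrete. Because elements are never removed or replaced, a position that was valid once stays valid, so an index-based iterator can simply re-check size() on every hasNext() instead of failing fast. The sketch below is a hypothetical class, not part of any VLH blueprint, and it is not thread-safe: a reader and an appending thread would still need synchronization.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Append-only value list whose iterator tolerates appends made during
// iteration. Not thread-safe: concurrent readers/writers need synchronization.
class AppendOnlyList<T> implements Iterable<T> {
    private final List<T> items = new ArrayList<>();

    void append(T item) { items.add(item); }

    int size() { return items.size(); }

    @Override
    public Iterator<T> iterator() {
        return new Iterator<T>() {
            private int next = 0; // index of the next element to return

            @Override
            public boolean hasNext() { return next < items.size(); } // sees new appends

            @Override
            public T next() {
                if (!hasNext()) throw new NoSuchElementException();
                return items.get(next++);
            }
        };
    }
}
```

Appending an element while a for-each loop is running over this list simply makes the loop see the new element, where a plain ArrayList iterator would typically throw a ConcurrentModificationException.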
Re: ValueListHandler pattern with Lucene
On Apr 11, 2004, at 11:28 AM, [EMAIL PROTECTED] wrote:
> On Sunday 11 April 2004 17:16, Erik Hatcher wrote:
>> Well, yes: the one we already discussed. Let your presentation tier talk directly to Hits, so you are as efficient as possible with access to documents and only fetch what you need. Again, don't let patterns get in your way.
>
> Well, the point of tiers and (BTW: language-independent) patterns is to modularize software and make things exchangeable. This way, neither the presentation tier nor the search engine is exchangeable.
>
> The actual problem is that the VLH is designed around a static list of VOs. The VLH needs to evolve to support something like a data provider that may add data dynamically. The problem so far is that the fail-fast Iterators of the java.util collections throw a ConcurrentModificationException if the backing data is modified; but since data in a VLH is never removed, only added, this should be possible to implement.

In other words, you need to invent your own pattern here?! :)

The benefit of agility is knowing that any decision you make now does not prohibit you from changing it later. Do you really think you're going to plug-and-play with search engines? Or will you be sticking with Lucene for the foreseeable future? Are you trying to plan for a future without Lucene when there is no use case for doing so? If you code with coupling to Lucene, do you see that as making life harder in the future, or are you smart enough and flexible enough to change your software as times change?

Throw your patterns away when they don't solve the problem. Be pragmatic _and_ agile.

Erik
Re: ValueListHandler pattern with Lucene
On Friday 09 April 2004 23:59, Ype Kingma wrote:
> When you need 3000 hits and their stored fields, you might consider using the lower level search API with your own HitCollector.

I apologize for the stupid question, but... where's the actual result in HitCollector? :-)

    collect(int doc, float score)

where doc is the index and score is its score. And where's the Document?

Timo
Re: ValueListHandler pattern with Lucene
On Apr 10, 2004, at 5:08 AM, [EMAIL PROTECTED] wrote:
> On Friday 09 April 2004 23:59, Ype Kingma wrote:
>> When you need 3000 hits and their stored fields, you might consider using the lower level search API with your own HitCollector.
>
> I apologize for the stupid question, but... where's the actual result in HitCollector? :-)
>
>     collect(int doc, float score)
>
> where doc is the index and score is its score. And where's the Document?

That's the beauty: it is up to you to load the doc iff you want it. In many situations, loading the doc would slow things down dramatically. For example, QueryFilter uses a HitCollector internally, but couldn't care less about the actual Document object, just its id (which you get from the int doc). To get the doc:

    Document document = searcher.doc(doc);

(I'd use 'id' for the int, personally.)

Erik
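The QueryFilter case shows why collect() hands out only an id: a collector that just needs to know which documents matched never has to load a Document at all. A stdlib-only sketch of such a collector follows; with the Lucene 1.x API the same method body would go into a HitCollector subclass passed to Searcher.search(Query, HitCollector), but the class below is hypothetical and does not extend the Lucene type.

```java
import java.util.BitSet;

// Records which documents matched and never loads a Document. collect(int, float)
// mirrors Lucene's HitCollector callback; this class is stdlib-only.
class MatchingDocsCollector {
    final BitSet matches = new BitSet();

    public void collect(int doc, float score) {
        matches.set(doc); // the score is ignored; only the id matters here
    }
}
```

Loading a document then becomes a separate, explicit step (searcher.doc(id)) that runs only for the ids that actually need it.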
ValueListHandler pattern with Lucene
Hi!

I implemented the VLH pattern for Lucene's search hits but noticed that hits.doc() is quite slow (3000+ hits took about 500 ms). So I want to ask people here for a solution.

I thought about something like a wrapper for the VO (value/transfer object), i.e. the VO would not actually contain the value but a reference to Lucene's Hits instance. But this is somewhat of a hack...

Any ideas?

Timo
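The wrapper idea can be sketched as a value object that keeps only a hit position and a way to load it, and materializes the real value on first access. The class name and the IntFunction loader (standing in for something like hits.doc(i)) are hypothetical; whatever the loader wraps has to remain usable for as long as any such VO might still be read.

```java
import java.util.function.IntFunction;

// A value object that holds a hit position plus a loader and fetches the real
// value lazily, at most once. The loader is a hypothetical stand-in for
// something like hits.doc(i).
class LazyHitVO<V> {
    private final int position;
    private final IntFunction<V> loader;
    private V value;         // null until first access
    private boolean loaded;  // true once the loader has run

    LazyHitVO(int position, IntFunction<V> loader) {
        this.position = position;
        this.loader = loader;
    }

    synchronized V get() {
        if (!loaded) {
            value = loader.apply(position);
            loaded = true;
        }
        return value;
    }

    synchronized boolean isLoaded() { return loaded; }
}
```

Creating 3000 of these is cheap; the cost of loading is paid only for the VOs the client actually reads.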
Re: ValueListHandler pattern with Lucene
On Apr 9, 2004, at 3:18 PM, [EMAIL PROTECTED] wrote:
> I implemented the VLH pattern for Lucene's search hits but noticed that hits.doc() is quite slow (3000+ hits took about 500 ms). So I want to ask people here for a solution.
>
> I thought about something like a wrapper for the VO (value/transfer object), i.e. the VO would not actually contain the value but a reference to Lucene's Hits instance. But this is somewhat of a hack...
>
> Any ideas?

This is an interesting architecture question. If you are trying to decouple things so much that you want to package up all documents in another data structure and ship them to another tier, you're asking for a heap of resources for a large Hits collection.

Do you really need *all* documents from Hits? If not, then you should not be pulling them all with hits.doc(). If you truly do need all hits, use a HitCollector instead of Hits (see the other search() methods).

Packaging up a Hits instance could be problematic: you need to be sure the *same* IndexSearcher is still around when you start navigating through the hits.

Erik
Re: ValueListHandler pattern with Lucene
Patterns are work-arounds for language deficiencies :) Don't use patterns because some book said so; use them if they are the pragmatic choice. Flattening data for reports or search results, and perhaps being a little more coupled to Lucene between tiers in order to avoid performance problems, seems a wise way to approach it. Or go straight to Lucene from the presentation tier; no one said you had to proxy it through some other layer.

I would highly recommend *against* loading all documents from a search into a collection and passing it across tiers; you're only asking for trouble.

Erik

On Apr 9, 2004, at 4:06 PM, [EMAIL PROTECTED] wrote:
> On Friday 09 April 2004 21:30, Erik Hatcher wrote:
>> Do you really need *all* documents from Hits? If not, then you should
>
> Only the user knows ;-) Well, no, I very likely only need one or a few, but nevertheless I have to pull all hit results to the presentation tier... That's just the problem. Using a VLH, I have to fetch all hits from the Hits instance and put them into the VLH, whereas ordinarily you would lazily fetch only the hits you actually need, at the time you need them. That's exactly my question :-)
>
> So, to repeat, my idea was to use a wrapper for the VOs in order to fetch only some hits at a time... It's actually a drawback of the VLH pattern. Maybe I should ask the blueprint people ;-)
>
> Timo
Re: ValueListHandler pattern with Lucene
On Friday 09 April 2004 21:18, [EMAIL PROTECTED] wrote:
> Hi!
>
> I implemented the VLH pattern for Lucene's search hits but noticed that hits.doc() is quite slow (3000+ hits took about 500 ms). So I want to ask people here for a solution.
>
> I thought about something like a wrapper for the VO (value/transfer object), i.e. the VO would not actually contain the value but a reference to Lucene's Hits instance. But this is somewhat of a hack...

Lucene's Hits already wraps quite a bit. Under the hood it will redo your search in case you need more than 100 results. Hits was designed for displaying a few web pages of search results.

When you need 3000 hits and their stored fields, you might consider using the lower level search API with your own HitCollector. This will allow you to do a single search, and retrieve the stored document fields in order of document number after the search. Documents are stored physically in document number order, so retrieval in that order is normally close to optimal. Actual savings depend a lot on the circumstances, though.

I checked the VLH pattern very briefly. The lower level search API of Lucene seems to fit in quite well for the retrieval side of it, i.e. the DataAccessObject, for a larger number of results. However, you'll have to spend somewhat more RAM than Hits does to bridge the difference between Lucene's physical order and the order in which the client needs to iterate the data.

Kind regards,
Ype
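The approach described above has two phases: collect (id, score) pairs in one search, then load stored fields in ascending document-number order, and finally hand them to the client in score order. The map that joins the two orders is the extra RAM mentioned. A stdlib-only sketch, assuming a hypothetical IntFunction loader in place of searcher.doc(id):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.IntFunction;

// Loads stored documents in document-number order, then returns them in
// descending score order. The loader is a hypothetical stand-in for
// searcher.doc(id); ids are assumed distinct, as a collector delivers them.
class DocOrderRetrieval {
    static <D> List<D> retrieve(int[] ids, float[] scores, IntFunction<D> loader) {
        // 1. load in physical (document number) order
        int[] byDocNumber = ids.clone();
        Arrays.sort(byDocNumber);
        Map<Integer, D> loaded = new HashMap<>(); // the RAM cost of reordering
        for (int id : byDocNumber) {
            loaded.put(id, loader.apply(id));
        }
        // 2. hand results back in descending score order
        Integer[] byScore = new Integer[ids.length];
        for (int i = 0; i < ids.length; i++) byScore[i] = i;
        Arrays.sort(byScore, Comparator.comparingDouble((Integer i) -> scores[i]).reversed());
        List<D> out = new ArrayList<>();
        for (int i : byScore) out.add(loaded.get(ids[i]));
        return out;
    }
}
```

For ids {42, 7, 19} with scores {0.9, 0.2, 0.5}, the loader is called for 7, 19, 42 in that order, while the client receives 42, 19, 7.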