Re: ValueListHandler pattern with Lucene

2004-04-26 Thread lucene
On Monday 12 April 2004 20:54, [EMAIL PROTECTED] wrote:
> On Sunday 11 April 2004 17:46, Erik Hatcher wrote:
> > In other words, you need to invent your own pattern here?!  :)
>
> I just experimented a bit and came up with the ValueListSupplier which
> replaces the ValueList in the VLH. Seems to work so far... :-) Comments are
> greatly appreciated!

FYI http://www.nitwit.de/vlh2/

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: ValueListHandler pattern with Lucene

2004-04-12 Thread lucene
On Sunday 11 April 2004 17:46, Erik Hatcher wrote:
> In other words, you need to invent your own pattern here?!  :)

I just experimented a bit and came up with the ValueListSupplier which 
replaces the ValueList in the VLH. Seems to work so far... :-) Comments are 
greatly appreciated!

Timo

import java.io.IOException;
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.lucene.search.Hits;

// IValueListIterator and SearchResultAdapter are application classes
// that are not shown in this message.
public class ValueListSupplier implements IValueListIterator
{
    private final Log log = LogFactory.getLog(this.getClass());

    // TODO junit test case
    private Hits hits;
    protected BitSet fetched;
    protected List list;
    protected int index;

    public ValueListSupplier(Hits hits)
    {
        int size = hits.length();
        this.list = new ArrayList(size);
        // ArrayList(int) only reserves capacity, so pre-fill with nulls
        // to make set(i, ...) legal for every index.
        for (int i = 0; i < size; i++) list.add(null);
        this.fetched = new BitSet();
        this.hits = hits;
        this.index = 0;
    }

    public List getList()
    {
        return list;
    }

    public int size()
    {
        return list.size();
    }

    public boolean hasPrevious()
    {
        return index > 0;
    }

    public boolean hasNext()
    {
        return index < size();
    }

    /**
     * @param index
     *            the new cursor position
     */
    public synchronized void move(int index)
    {
        this.index = index;
    }

    public void reset()
    {
        move(0);
    }

    public Object current()
    {
        validate(index, index + 1);
        return list.get(index);
    }

    public List previous(int count)
    {
        int from = Math.max(0, index - count);
        int to = index;

        validate(from, to);
        move(from);
        return list.subList(from, to);
    }

    public List next(int count)
    {
        int from = index;
        // 'to' is exclusive, so clamp it to size() rather than size() - 1;
        // otherwise the last hit is never returned.
        int to = Math.min(size(), index + count);

        validate(from, to);
        move(to);
        return list.subList(from, to);
    }

    /**
     * Fetches every hit in the range that has not been fetched yet.
     *
     * @param from
     *            starting index (inclusive)
     * @param to
     *            ending index (exclusive)
     */
    private void validate(int from, int to)
    {
        while ((from = fetched.nextClearBit(from)) < to)
        {
            log.debug("fetching #" + from);

            try
            {
                list.set(from,
                        SearchResultAdapter.wrap(hits.doc(from)));
                fetched.set(from);
            }
            catch (IOException e)
            {
                // Rethrow: swallowing the error would leave the bit clear
                // and make this loop retry the same index forever.
                throw new RuntimeException("could not fetch hit #" + from, e);
            }
        }
    }

}
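
For readers without the surrounding application classes, the lazy-fetch idea can be tried in isolation. The following is only a minimal sketch under stated assumptions: LazyWindowList and its loader function are hypothetical names, the loader stands in for the hits.doc(i) call, and the loads counter exists purely for illustration. It assumes Java 8 or later.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.Collections;
import java.util.List;
import java.util.function.IntFunction;

/** Hypothetical stand-in for the lazy-fetch idea; not part of Lucene or the VLH blueprint. */
class LazyWindowList<T> {
    private final List<T> slots;
    private final BitSet fetched = new BitSet();
    private final IntFunction<T> loader; // plays the role of hits.doc(i) plus wrapping
    int loads = 0;                       // counts loader calls, for illustration only

    LazyWindowList(int size, IntFunction<T> loader) {
        // Pre-fill with nulls so that set(i, ...) is legal for every index.
        this.slots = new ArrayList<>(Collections.<T>nCopies(size, null));
        this.loader = loader;
    }

    /** Returns elements [from, to), loading only those not fetched before. */
    List<T> window(int from, int to) {
        int end = Math.min(to, slots.size());
        int start = Math.max(0, Math.min(from, end));
        for (int i = fetched.nextClearBit(start); i < end; i = fetched.nextClearBit(i + 1)) {
            slots.set(i, loader.apply(i));
            fetched.set(i);
            loads++;
        }
        return slots.subList(start, end);
    }

    public static void main(String[] args) {
        LazyWindowList<String> list = new LazyWindowList<>(3000, i -> "doc" + i);
        System.out.println(list.window(20, 30) + " after " + list.loads + " loads");
    }
}
```

Asking for the same window twice does not call the loader again, which is the property the BitSet provides in the class above.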




Re: ValueListHandler pattern with Lucene

2004-04-11 Thread lucene
On Saturday 10 April 2004 20:40, Erik Hatcher wrote:
> That's the beauty: it is up to you to load the doc iff you want it.

As I want all of them I don't see why this should be faster at all...




Re: ValueListHandler pattern with Lucene

2004-04-11 Thread Erik Hatcher
On Apr 11, 2004, at 5:25 AM, [EMAIL PROTECTED] wrote:
> On Saturday 10 April 2004 20:40, Erik Hatcher wrote:
> > That's the beauty: it is up to you to load the doc iff you want it.
>
> As I want all of them I don't see why this should be faster at all...

Then have a look at the Hits class.  It is doing more work for caching 
and keeping a most recently used collection of documents around.  By 
using a HitCollector you are bypassing those mechanisms.  Whether it is 
measurably faster would depend on several other factors.

	Erik



Re: ValueListHandler pattern with Lucene

2004-04-11 Thread lucene
On Sunday 11 April 2004 13:40, Erik Hatcher wrote:
> using a HitCollector you are bypassing those mechanisms.  Whether it is
> measurably faster would depend on several other factors.

Well, it is hardly faster, so this is no real solution :-\




Re: ValueListHandler pattern with Lucene

2004-04-11 Thread lucene
On Saturday 10 April 2004 20:40, Erik Hatcher wrote:
> That's the beauty: it is up to you to load the doc iff you want it.

Well, there's another problem with HitCollector: the list I build is not 
sorted by score :-(
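
One way around the ordering issue is to keep the score next to each document number while collecting and to sort afterwards. This is only a sketch under stated assumptions: ScoreSortingCollector and ScoredDoc are hypothetical names that mimic the collect(int doc, float score) callback shape without depending on Lucene, and ties are broken by document number as an arbitrary choice.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical collector that mirrors the collect(int doc, float score) callback shape. */
class ScoreSortingCollector {
    static final class ScoredDoc {
        final int doc;
        final float score;

        ScoredDoc(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    private final List<ScoredDoc> collected = new ArrayList<>();

    /** Called once per matching document, in whatever order the search visits them. */
    void collect(int doc, float score) {
        collected.add(new ScoredDoc(doc, score));
    }

    /** Returns a copy ordered by score, best first; equal scores keep doc-number order. */
    List<ScoredDoc> byScoreDescending() {
        List<ScoredDoc> sorted = new ArrayList<>(collected);
        sorted.sort((a, b) -> {
            int byScore = Float.compare(b.score, a.score);
            return byScore != 0 ? byScore : Integer.compare(a.doc, b.doc);
        });
        return sorted;
    }

    public static void main(String[] args) {
        ScoreSortingCollector collector = new ScoreSortingCollector();
        collector.collect(0, 0.2f);
        collector.collect(1, 0.9f);
        collector.collect(2, 0.5f);
        for (ScoredDoc d : collector.byScoreDescending()) {
            System.out.println(d.doc + " " + d.score);
        }
    }
}
```

Sorting after the fact costs memory for every collected pair, so it pays off only when all hits are really needed.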




Re: ValueListHandler pattern with Lucene

2004-04-11 Thread Erik Hatcher
On Apr 11, 2004, at 9:32 AM, [EMAIL PROTECTED] wrote:
> On Saturday 10 April 2004 20:40, Erik Hatcher wrote:
> > That's the beauty: it is up to you to load the doc iff you want it.
>
> Well, there's another problem with HitCollector: the list I build is not
> sorted by score :-(

HitCollector was just an option - and apparently not the right one for 
your use.

	Erik



Re: ValueListHandler pattern with Lucene

2004-04-11 Thread Erik Hatcher
On Apr 11, 2004, at 10:00 AM, [EMAIL PROTECTED] wrote:
> On Sunday 11 April 2004 15:56, Erik Hatcher wrote:
> > HitCollector was just an option - and apparently not the right one for
> > your use.
>
> So, any other option? :-)

Well, yes, the one we already discussed.  Let your presentation tier 
talk directly to Hits, so you are as efficient as possible with access 
to documents, and only fetch what you need.

Again, don't let patterns get in your way.

	Erik



Re: ValueListHandler pattern with Lucene

2004-04-11 Thread lucene
On Sunday 11 April 2004 17:16, Erik Hatcher wrote:
> Well, yes the one we already discussed.  Let your presentation tier
> talk directly to Hits, so you are as efficient as possible with access
> to documents, and only fetch what you need.
>
> Again, don't let patterns get in your way.

Well, the point of tiers and (BTW: language-independent) patterns is to 
modularize software and make things exchangeable. This way
neither the presentation tier nor the search engine is exchangeable.

The actual problem is that VLH is designed to have a static list of VOs. VLH 
needs to evolve to support something like a data provider that may add data 
dynamically. The problem so far is that an Iterator must throw a 
ConcurrentModificationException if the backing data is modified, but as data 
in a VLH is never removed, only added, this should be possible to 
implement.
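
A rough sketch of how an append-only list could hand out cursors that survive later additions is given below. It is an assumption-laden illustration, not part of the VLH blueprint: AppendOnlyValueList and Cursor are hypothetical names. The cursor addresses elements by position instead of relying on the fail-fast iterators of collections such as ArrayList, so appends never invalidate it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical append-only value list; elements are added but never removed. */
class AppendOnlyValueList<T> {
    private final List<T> items = new ArrayList<>();

    synchronized void add(T item) {
        items.add(item);
    }

    synchronized int size() {
        return items.size();
    }

    synchronized T get(int i) {
        return items.get(i);
    }

    /** Position-based cursor: it re-checks size() on every call, so it sees new data. */
    final class Cursor {
        private int position = 0;

        boolean hasNext() {
            return position < size();
        }

        T next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            return get(position++);
        }
    }

    Cursor cursor() {
        return new Cursor();
    }

    public static void main(String[] args) {
        AppendOnlyValueList<String> list = new AppendOnlyValueList<>();
        list.add("first");
        AppendOnlyValueList<String>.Cursor cursor = list.cursor();
        System.out.println(cursor.next());
        list.add("second"); // added while the cursor is live
        System.out.println(cursor.next());
    }
}
```

This only works because nothing is ever removed or reordered; a list that allowed removals would need the usual fail-fast check or a snapshot.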

Timo




Re: ValueListHandler pattern with Lucene

2004-04-11 Thread Erik Hatcher
On Apr 11, 2004, at 11:28 AM, [EMAIL PROTECTED] wrote:
> On Sunday 11 April 2004 17:16, Erik Hatcher wrote:
> > Well, yes the one we already discussed.  Let your presentation tier
> > talk directly to Hits, so you are as efficient as possible with access
> > to documents, and only fetch what you need.
> >
> > Again, don't let patterns get in your way.
>
> Well, the point of tiers and (BTW: language-independent) patterns is to
> modularize software and make things exchangeable. This way
> neither the presentation tier nor the search engine is exchangeable.
>
> The actual problem is that VLH is designed to have a static list of VOs.
> VLH needs to evolve to support something like a data provider that may add
> data dynamically. The problem so far is that an Iterator must throw a
> ConcurrentModificationException if the backing data is modified, but as
> data in a VLH is never removed, only added, this should be possible to
> implement.

In other words, you need to invent your own pattern here?!  :)

The benefit of agility is to know that any decision you make now is not 
something that prohibits you from change later.  Do you really think 
you're going to plug-and-play with search engines?  Or will you be 
sticking with Lucene for the foreseeable future?  Are you trying to 
plan for a future without Lucene when there is no use-case for doing 
so?  If you code with coupling to Lucene, do you see that as making 
life harder in the future, or are you smart enough and flexible enough 
to change your software as times change?

Throw your patterns away when they don't solve the problem.  Be 
pragmatic _and_ agile.

	Erik



Re: ValueListHandler pattern with Lucene

2004-04-10 Thread lucene
On Friday 09 April 2004 23:59, Ype Kingma wrote:
> When you need 3000 hits and their stored fields, you might
> consider using the lower level search API with your own HitCollector.

I apologize for the stupid question but ... where's the actual result in 
HitCollector? :-) 

  collect(int doc, float score) 

Where doc is the index and score is its score - and where's the Document?

Timo




Re: ValueListHandler pattern with Lucene

2004-04-10 Thread Erik Hatcher
On Apr 10, 2004, at 5:08 AM, [EMAIL PROTECTED] wrote:
> On Friday 09 April 2004 23:59, Ype Kingma wrote:
> > When you need 3000 hits and their stored fields, you might
> > consider using the lower level search API with your own HitCollector.
>
> I apologize for the stupid question but ... where's the actual result in
> HitCollector? :-)
>
>   collect(int doc, float score)
>
> Where doc is the index and score is its score - and where's the Document?

That's the beauty: it is up to you to load the doc iff you want it.
In many situations, loading the doc would slow things down
dramatically.  For example, QueryFilter uses a HitCollector internally,
but couldn't care less about the actual document object, just its id
(which you get from the int doc).  To get the doc:

	 Document document = searcher.doc(doc);

(I'd use 'id' for the int, personally).

	Erik



ValueListHandler pattern with Lucene

2004-04-09 Thread lucene
Hi!

I implemented a VLH pattern for Lucene's search hits but noticed that hits.doc() 
is quite slow (3000+ hits took about 500ms).

So, I want to ask people here for a solution. I thought about something like a 
wrapper for the VO (value/transfer object), i.e. that the VO does not 
actually contain the value but a reference to Lucene's Hits instance. But 
this is somewhat of a hack...

Any ideas?

Timo




Re: ValueListHandler pattern with Lucene

2004-04-09 Thread Erik Hatcher
On Apr 9, 2004, at 3:18 PM, [EMAIL PROTECTED] wrote:
> I implemented a VLH pattern for Lucene's search hits but noticed that
> hits.doc() is quite slow (3000+ hits took about 500ms).
>
> So, I want to ask people here for a solution. I thought about something
> like a wrapper for the VO (value/transfer object), i.e. that the VO does not
> actually contain the value but a reference to Lucene's Hits instance. But
> this is somewhat of a hack...
>
> Any ideas?

This is an interesting architecture question.  If you are trying to 
decouple things so much that you want to package up all documents in 
another data structure and ship them to another tier, you're asking for 
a heap of resources for a large Hits collection.

Do you really need *all* documents from Hits?  If not, then you should 
not be pulling them all with hits.doc().

If you truly do need all hits, use a HitCollector instead of Hits (see 
the other search() methods).

Packaging up a Hits instance could be problematic - you need to be sure 
the *same* IndexSearcher is around when you start navigating through 
the hits.

	Erik



Re: ValueListHandler pattern with Lucene

2004-04-09 Thread Erik Hatcher
Patterns are work-arounds for language deficiencies :)

Don't use patterns because some book said so - use them if they are the 
pragmatic choice.  Flattening data for reports or search results and 
perhaps being a little more coupled to Lucene between tiers in order to 
avoid performance problems seems a wise way to approach it.  Or go 
straight to Lucene from the presentation tier - no one said you had to 
proxy it through some other layer.

I would highly recommend *against* loading all documents from a search 
into a collection and passing it across tiers - you're only asking for 
trouble.

	Erik

On Apr 9, 2004, at 4:06 PM, [EMAIL PROTECTED] wrote:

> On Friday 09 April 2004 21:30, Erik Hatcher wrote:
> > Do you really need *all* documents from Hits?  If not, then you should
>
> Only the user knows ;-) Well, no, I very likely only need one or a few but
> nevertheless I have to pull all hit results to the presentation tier...
>
> That's just the problem. Using a VLH I have to fetch all hits from the Hits
> instance and put them into the VLH - ordinarily you would lazily fetch only
> the hits you actually need, at the time you need them.
>
> That's just my question :-)
>
> So, to repeat, my idea was to use a wrapper for the VOs in order to fetch
> only some hits at a time...
>
> It's actually a VLH pattern drawback. Maybe I should ask the blueprint
> people ;-)
>
> Timo



Re: ValueListHandler pattern with Lucene

2004-04-09 Thread Ype Kingma
On Friday 09 April 2004 21:18, [EMAIL PROTECTED] wrote:
> Hi!
>
> I implemented a VLH pattern for Lucene's search hits but noticed that
> hits.doc() is quite slow (3000+ hits took about 500ms).
>
> So, I want to ask people here for a solution. I thought about something like
> a wrapper for the VO (value/transfer object), i.e. that the VO does not
> actually contain the value but a reference to Lucene's Hits instance. But
> this is somewhat of a hack...

Lucene's Hits already wraps quite a bit. Under the hood it will
redo your search in case you need more than 100 results.
Hits was designed for displaying a few web pages of search results.

When you need 3000 hits and their stored fields, you might
consider using the lower level search API with your own HitCollector.

This will allow you to do a single search, and retrieve the stored
document fields in order of document number after the search.
Documents are stored physically in document number order,
so retrieval in that order is normally close to optimal.

Actual savings depend a lot on the circumstances, though.

I checked the VLH pattern very briefly. The lower level search
API of Lucene seems to fit in quite well for the retrieval side
of it, ie. the DataAccessObject, for a larger number of results.
However, you'll have to throw some more RAM than Hits does
at the difference between the physical order of
Lucene and the order in which the client needs to iterate
the data.
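
As a rough illustration of that trade-off, the sketch below loads values in ascending document-number order and then hands them back in the client's rank order. It is a minimal sketch under stated assumptions: DocOrderFetcher and its loader function are hypothetical, the loader stands in for reading a stored document by number, and the intermediate map is the extra RAM mentioned above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.IntFunction;

/** Hypothetical helper: load in storage order, deliver in rank order. */
class DocOrderFetcher {
    /**
     * @param rankedDocs document numbers in the order the client wants them (best first)
     * @param loader     stand-in for reading the stored fields of one document
     */
    static <T> List<T> fetchRanked(int[] rankedDocs, IntFunction<T> loader) {
        int[] byDocNumber = rankedDocs.clone();
        Arrays.sort(byDocNumber); // ascending doc number, i.e. physical storage order

        // Buffer every loaded value; this is the memory cost of reordering.
        Map<Integer, T> loaded = new HashMap<>();
        for (int doc : byDocNumber) {
            loaded.put(doc, loader.apply(doc));
        }

        // Hand the values back in the rank order the client asked for.
        List<T> result = new ArrayList<>(rankedDocs.length);
        for (int doc : rankedDocs) {
            result.add(loaded.get(doc));
        }
        return result;
    }

    public static void main(String[] args) {
        int[] ranked = {7, 2, 5};
        System.out.println(fetchRanked(ranked, doc -> "stored fields of doc " + doc));
    }
}
```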

Kind regards,
Ype

