Re: Beyond Lucene 2.0 Index Design

2007-01-11 Thread Ming Lei
Marvin, several posts back in this thread, I described an algorithm that uses impact-sorted posting lists for conjunctive boolean queries. Your concerns about impact-sorting in the boolean retrieval model are valid. In practice, though, the approximation (as in my original post) should work well enough for large corp

Re: Beyond Lucene 2.0 Index Design

2007-01-10 Thread Ming Lei
The idea of "impact" and "impact-sorted posting lists" should work in practice with the boolean model, by approximation, in the following way: (1) Index Structure. Inverted-Index: * posting-list: + (sorted by impact) occurrence: position (2) Retrieval Algorithm for boolean query "a AND b": set an impa
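The message is cut off before the retrieval step is fully spelled out, so the following is only a minimal Java sketch of one way an impact-ordered approximation of "a AND b" can work, not necessarily the algorithm proposed here. It assumes each posting stores a doc id and a single impact value per (doc, term) pair, that each list is sorted by impact in descending order, and that only postings at or above a hypothetical `threshold` are read. `Posting`, `ImpactAndQuery` and `andQuery` are illustrative names, not Lucene classes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** One posting: a doc id and the term's impact in that doc (hypothetical layout). */
final class Posting {
    final int doc;
    final float impact;

    Posting(int doc, float impact) {
        this.doc = doc;
        this.impact = impact;
    }
}

class ImpactAndQuery {
    /**
     * Approximate "a AND b" over posting lists sorted by impact, descending.
     * Only postings with impact >= threshold are read from either list.
     * Returns (doc, summed impact) pairs, highest score first.
     */
    static List<Map.Entry<Integer, Float>> andQuery(List<Posting> a, List<Posting> b, float threshold) {
        // Read the high-impact head of list a.
        Map<Integer, Float> headOfA = new HashMap<>();
        for (Posting p : a) {
            if (p.impact < threshold) break; // list is sorted, so the rest is lower
            headOfA.put(p.doc, p.impact);
        }
        // Walk the high-impact head of list b and keep docs present in both heads.
        Map<Integer, Float> scores = new HashMap<>();
        for (Posting p : b) {
            if (p.impact < threshold) break;
            Float impactInA = headOfA.get(p.doc);
            if (impactInA != null) {
                scores.put(p.doc, impactInA + p.impact);
            }
        }
        // Rank the surviving docs by summed impact.
        List<Map.Entry<Integer, Float>> ranked = new ArrayList<>(scores.entrySet());
        ranked.sort(Map.Entry.<Integer, Float>comparingByValue().reversed());
        return ranked;
    }
}
```

With a = [(1, 0.9), (2, 0.7), (3, 0.2)], b = [(2, 0.8), (3, 0.5), (1, 0.4)] and a threshold of 0.3, the sketch returns doc 2 (score about 1.5) ahead of doc 1 (about 1.3). Doc 3 contains both terms but is dropped because its impact for a falls below the threshold; that missed match is the kind of deviation from strict boolean semantics the approximation accepts in exchange for reading only the head of each list.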

Re: Beyond Lucene 2.0 Index Design

2007-01-10 Thread Ming Lei
the aggregated significance into "impact". Then you can do away with fields in a vector-space model of retrieval. But there are usually some field semantics in a boolean model and in semi-structured information retrieval that you cannot get rid of. Michael --- Ming Lei <[EMAIL PROT

Re: Beyond Lucene 2.0 Index Design

2007-01-10 Thread Ming Lei
Just my two cents: I think what he meant by "single field" is the following. If the concept of "field" was introduced to differentiate the significance of term occurrences in different regions of a document (e.g., an occurrence in the title is more important than one in the body), that significance can b
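A minimal sketch of folding field significance into one impact per (doc, term) pair, assuming a simple weighted sum of per-field term frequencies. The field names and weights in `WEIGHTS` are hypothetical, and a real scoring function would probably also apply length normalization or damping.

```java
import java.util.Map;

class FieldImpact {
    // Hypothetical weights: a title occurrence counts three times as much as a body occurrence.
    static final Map<String, Float> WEIGHTS = Map.of("title", 3.0f, "body", 1.0f);

    /** Collapse per-field term frequencies for one (doc, term) pair into a single impact. */
    static float impact(Map<String, Integer> tfByField) {
        float sum = 0f;
        for (Map.Entry<String, Integer> e : tfByField.entrySet()) {
            // Fields without an explicit weight default to 1.0.
            sum += WEIGHTS.getOrDefault(e.getKey(), 1.0f) * e.getValue();
        }
        return sum;
    }
}
```

For a term that occurs once in the title and twice in the body, the impact is 3 × 1 + 1 × 2 = 5. Once collapsed like this, a single posting list per term is enough for ranking, but a query restricted to one field (for example, only the title) can no longer be answered from that list alone.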

Re: Beyond Lucene 2.0 Index Design

2007-01-10 Thread Ming Lei
I have a couple of questions about the original post on the new index design: (1) Question on the posting list > > f. ,],...[docN, freq > > > > ,]) What is the "impact" per posting list? I am under the impression that "impact" or "frequency" is per pair of doc and term. And it seems that "impact

Repost: cleanup remotesearchable object

2007-01-08 Thread Ming Lei
Can anyone help answer the question, or at least point out whether it is too vague or should be directed somewhere else? Thanks --- Ming Lei <[EMAIL PROTECTED]> wrote: > Can I solely rely on RMI's remote object cleanup > mechanism for this? > It seems that RemoteSeac

cleanup remotesearchable object

2007-01-07 Thread Ming Lei
Can I rely solely on RMI's remote object cleanup mechanism for this? It seems that RemoteSearchable.close() has to be called separately. Should we add a finalize() to RemoteSearchable that calls close()? Am I missing anything here? Please shed some light on this. Thanks
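On the cleanup question: as far as I know, the RMI runtime never calls close() on its own. Its distributed garbage collector only drops the server-side reference after client leases expire (`java.rmi.dgc.leaseValue` defaults to 10 minutes), and finalize() is not guaranteed to run promptly, if at all. One standard hook is `java.rmi.server.Unreferenced`, whose `unreferenced()` method the runtime calls once no client holds a remote reference any more. The sketch below is hypothetical (`CloseOnUnreferenced` is not a Lucene class) and leaves out exporting the object and implementing the remote interface; it only shows an idempotent close that can be triggered either explicitly or by `unreferenced()`. An object that stays bound in an RMI registry will typically not become unreferenced, so it would also need to be unbound.

```java
import java.io.Closeable;
import java.io.IOException;
import java.rmi.server.Unreferenced;
import java.util.concurrent.atomic.AtomicBoolean;

/**
 * Hypothetical wrapper that closes the wrapped resource (e.g. a local searcher)
 * when RMI reports that no remote references remain. Exporting the object and
 * implementing the remote interface are omitted.
 */
class CloseOnUnreferenced implements Unreferenced, Closeable {
    private final Closeable delegate;
    private final AtomicBoolean closed = new AtomicBoolean(false);

    CloseOnUnreferenced(Closeable delegate) {
        this.delegate = delegate;
    }

    /** Called by the RMI runtime when the remote reference set becomes empty. */
    @Override
    public void unreferenced() {
        try {
            close();
        } catch (IOException e) {
            // unreferenced() cannot throw checked exceptions; a real server would log this.
            System.err.println("close failed: " + e);
        }
    }

    /** Idempotent: safe to call explicitly and again from unreferenced(). */
    @Override
    public void close() throws IOException {
        if (closed.compareAndSet(false, true)) {
            delegate.close();
        }
    }

    boolean isClosed() {
        return closed.get();
    }
}
```

Calling close() explicitly when a client session ends is still the more predictable option; the Unreferenced hook is a fallback for clients that disappear without cleaning up.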

A search server runs on an index periodically refreshed by an indexer

2007-01-05 Thread Ming Lei
Question 1: A search server runs on an index that is periodically refreshed with newer versions. For example, it starts with c:/lucene/ind_dir_0, then later on the indexer creates c:/lucene/ind_dir_1 and so on. I would like the search server to automatically pick up the latest version when it is a
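For the first question, a minimal sketch of locating the newest index directory, assuming the `ind_dir_N` naming from the question and that the indexer only creates `ind_dir_N` once the index inside it is complete (for example, by building under a temporary name and renaming). `LatestIndexDir` and `latest` are hypothetical names; the suffix is compared numerically so that `ind_dir_10` wins over `ind_dir_9`.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Stream;

class LatestIndexDir {
    // Naming scheme taken from the question: ind_dir_0, ind_dir_1, ...
    private static final Pattern NAME = Pattern.compile("ind_dir_(\\d+)");

    /** Returns the sub-directory of root with the highest numeric suffix, if any. */
    static Optional<Path> latest(Path root) throws IOException {
        try (Stream<Path> children = Files.list(root)) {
            Path best = null;
            long bestN = -1;
            for (Path p : (Iterable<Path>) children::iterator) {
                Matcher m = NAME.matcher(p.getFileName().toString());
                if (Files.isDirectory(p) && m.matches()) {
                    long n = Long.parseLong(m.group(1)); // numeric, not lexicographic
                    if (n > bestN) {
                        bestN = n;
                        best = p;
                    }
                }
            }
            return Optional.ofNullable(best);
        }
    }
}
```

A server could poll this periodically, open a new searcher when the result changes, swap it in atomically (for example, through an `AtomicReference`), and close the old searcher only after the searches still running against it have finished.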