> If your indexes change a lot, sorting and merging two
> ID lists (one from the index and one from the DB) in
> Java space will certainly be the most effective solution.

I can't fit either list into memory, but I could imagine maintaining an
identifier mapping table. Since Lucene doesn't use stable identifiers,
this table would of course have to be rebuilt every time the index is
updated...


> That's the easy part. Once you have the document IDs you can always
> retrieve the data from the DB (if you paginate the results you even
> only have to retrieve the data page after page).

After paging through the results, a user may decide to download all
matched documents in full. The only strategy I have been able to come up
with that performs acceptably is to load objects as blobs from a single
table and use the other tables only to restrict the result set. Full-text
query results obviously don't fit in well unless I store them in a
temporary table. Unfortunately, getting them into the temporary table
isn't very efficient, especially in those cases where no more than the
first ten items are ever viewed anyway.
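
One way to avoid paying for the temporary table up front might be to
defer filling it until a full download is actually requested. The
following is only a rough sketch under several assumptions: the class
DeferredSpool and its methods are made up for illustration, the hit
identifiers arrive through a plain Iterator, and the batchSink callback
stands in for a batched INSERT into the temporary table. It also keeps
the identifiers of already-viewed pages in memory, which is only
reasonable while few pages have been viewed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper: serves result pages directly from the hit iterator
// and writes to the temporary table only when a full download is requested.
public class DeferredSpool {
    private final Iterator<Long> hits;             // hit IDs in search order
    private final Consumer<List<Long>> batchSink;  // stand-in for a batched INSERT
    private final int batchSize;
    private final List<Long> served = new ArrayList<>(); // IDs already paged out
    private List<Long> batch = new ArrayList<>();

    public DeferredSpool(Iterator<Long> hits, Consumer<List<Long>> batchSink, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.hits = hits;
        this.batchSink = batchSink;
        this.batchSize = batchSize;
    }

    // Returns the next page of IDs without touching the temporary table.
    public List<Long> nextPage(int pageSize) {
        List<Long> page = new ArrayList<>();
        while (page.size() < pageSize && hits.hasNext()) {
            page.add(hits.next());
        }
        served.addAll(page);
        return page;
    }

    // Spools the already-served IDs and all remaining hits, in batches.
    public void spoolAll() {
        for (Long id : served) {
            add(id);
        }
        served.clear();
        while (hits.hasNext()) {
            add(hits.next());
        }
        if (!batch.isEmpty()) {
            flush();
        }
    }

    private void add(Long id) {
        batch.add(id);
        if (batch.size() == batchSize) {
            flush();
        }
    }

    private void flush() {
        batchSink.accept(batch);
        batch = new ArrayList<>(); // fresh list so the sink may keep the old one
    }

    public static void main(String[] args) {
        List<Long> ids = Arrays.asList(1L, 2L, 3L, 4L, 5L, 6L, 7L);
        DeferredSpool spool = new DeferredSpool(ids.iterator(),
                b -> System.out.println("insert batch " + b), 3);
        System.out.println("page " + spool.nextPage(2)); // nothing spooled yet
        spool.spoolAll(); // only now is the sink called
    }
}
```

With this arrangement a user who looks at the first ten items and leaves
never causes a write to the temporary table.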


> This is the hard part in any retrieval system. If the sorting
> attributes are not too big I still would extract them with the IDs and
> do the sorting in Java space. If they are really big the TMP table you
> choose is certainly the best option.

I would already be quite happy if results could be obtained sorted by
identifier rather than relevance; I hope future versions of Lucene will
be a bit more flexible here...
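
To illustrate why identifier order would help: if both the index and the
database could deliver their identifiers in ascending order, the two
lists could be merged as streams, holding only one identifier per side in
memory at any time. This is a minimal sketch under that assumption, not
something Lucene offers out of the box as far as I know; the class and
method names are invented for the example, and both inputs are assumed to
be free of duplicates.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.LongConsumer;

// Hypothetical helper: streaming intersection of two ascending ID sequences.
public class SortedIdIntersection {

    // Passes every ID present in both sequences to the sink.
    // Both iterators must yield strictly ascending values.
    public static void intersect(Iterator<Long> indexIds,
                                 Iterator<Long> dbIds,
                                 LongConsumer sink) {
        if (!indexIds.hasNext() || !dbIds.hasNext()) {
            return;
        }
        long a = indexIds.next();
        long b = dbIds.next();
        while (true) {
            if (a == b) {
                sink.accept(a);
                if (!indexIds.hasNext() || !dbIds.hasNext()) {
                    return;
                }
                a = indexIds.next();
                b = dbIds.next();
            } else if (a < b) {
                // The index side is behind: advance it.
                if (!indexIds.hasNext()) {
                    return;
                }
                a = indexIds.next();
            } else {
                // The database side is behind: advance it.
                if (!dbIds.hasNext()) {
                    return;
                }
                b = dbIds.next();
            }
        }
    }

    public static void main(String[] args) {
        List<Long> fromIndex = Arrays.asList(2L, 5L, 8L, 13L);
        List<Long> fromDb = Arrays.asList(1L, 5L, 13L, 21L);
        List<Long> matches = new ArrayList<>();
        intersect(fromIndex.iterator(), fromDb.iterator(), matches::add);
        System.out.println(matches); // prints [5, 13]
    }
}
```

In practice the iterators would wrap a JDBC result set with ORDER BY on
the identifier column and whatever ordered access the index can provide.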


--
Eric Jain

_______________________________________________
sapdb.general mailing list
[EMAIL PROTECTED]
http://listserv.sap.com/mailman/listinfo/sapdb.general
