Thanks for leading this forward.
On 21 Jan 2004, at 11:19, Wallmer, Martin wrote:
Hi,
to summarize the discussion (as I understood it):
- we need a fast search engine for content
- we need a fast search engine for properties
- security issues must be taken into account
- it should be integrated with the SEARCH helper
agreed
- if we have two different implementations for properties
and content, the results must be merged, which might have a
performance drawback
correct
- if we use one implementation for both, it might be necessary
to replicate data, or content search might not perform well
data replication doesn't concern me if stores can be made somewhat "layered" via interception.
- we should be open to thinking about content-type-specific
search engines. AFAIK Lucene is perfect for text-based documents,
but think of an engine that finds text in scanned documents (jpeg)
using OCR, or finds symphonies in F major in sound files :-)
Yes. Just like we can plug in different stores in different parts of the tree, we should be able to associate different search engines to these stores.
so pros and cons for both scenarios.
I'd like to suggest the following:
1. Introduce the index store (as Christophe has already done). In principle, the index store has the methods
   - index (uri, nodeRevisionDescriptor, nodeRevisionContent)
   - drop (uri, nodeRevisionDescriptor)
   They are called at create, update, and delete. No search method!
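As a rough sketch of the contract proposed above, the two methods could look something like this in Java. The parameter types are stand-ins (a String for the URI, a Map for the descriptor, a byte array for the content) rather than Slide's actual classes, and the names Indexer, InMemoryIndexer and isIndexed are hypothetical, used only for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical contract: method names follow the proposal, types are placeholders.
interface Indexer {
    // Called on create and on update.
    void index(String uri, Map<String, String> descriptor, byte[] content);

    // Called on delete.
    void drop(String uri, Map<String, String> descriptor);
}

// Toy in-memory implementation, only to show when each method would be called.
class InMemoryIndexer implements Indexer {
    private final Map<String, Map<String, String>> entries = new HashMap<>();

    @Override
    public void index(String uri, Map<String, String> descriptor, byte[] content) {
        // An update simply replaces whatever was indexed for this URI before.
        entries.put(uri, new HashMap<>(descriptor));
    }

    @Override
    public void drop(String uri, Map<String, String> descriptor) {
        entries.remove(uri);
    }

    // Helper for illustration only; note there is deliberately no search method.
    boolean isIndexed(String uri) {
        return entries.containsKey(uri);
    }
}
```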
I'm not sure I agree that it should be the store that does the indexing. Aren't indexing and storing two orthogonal aspects of content management?
Just like we have a way to plug in different stores at different parts of the tree, shouldn't we allow plugging in different indexers for different parts of the tree?
For example, we could have
<stores>
  /foo/** -> FileStore
  /bar/** -> RDBMSStore
</stores>
then
<indexers>
  /foo/images/** -> OCRIndexer
  /foo/symphonies/** -> SoundIndexer
  *.xml -> XMLIndexer
  @* -> RDBMSIndexer
  * -> LuceneIndexer
  * -> RDBMSIndexer
</indexers>
this also means that, for example, accessing "/blah/mydocument.xml" would yield three potential query engines: Lucene, RDBMS and XML, and one can decide which one to use depending on the type of query one has to perform.
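To make the lookup concrete, here is a minimal sketch in Java of how such a mapping could be resolved. The pattern semantics are assumptions on my part: "/**" is taken to match a whole subtree, "*.ext" a file suffix, and both "*" and "@*" every resource (with "@*" presumably addressing properties rather than content). The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class IndexerRouting {
    // Each rule is a {pattern, indexerName} pair, kept in declaration order.
    private final List<String[]> rules = new ArrayList<>();

    void add(String pattern, String indexer) {
        rules.add(new String[] {pattern, indexer});
    }

    // Assumed pattern semantics, see the note above the code.
    static boolean matches(String pattern, String uri) {
        if (pattern.equals("*") || pattern.equals("@*")) {
            return true; // matches every resource
        }
        if (pattern.endsWith("/**")) {
            // "/foo/images/**" matches anything below "/foo/images/"
            return uri.startsWith(pattern.substring(0, pattern.length() - 2));
        }
        if (pattern.startsWith("*.")) {
            // "*.xml" matches any URI ending in ".xml"
            return uri.endsWith(pattern.substring(1));
        }
        return pattern.equals(uri);
    }

    // Collects every indexer whose pattern matches, without duplicates.
    Set<String> indexersFor(String uri) {
        Set<String> result = new LinkedHashSet<>();
        for (String[] rule : rules) {
            if (matches(rule[0], uri)) {
                result.add(rule[1]);
            }
        }
        return result;
    }

    // The mapping from the example above.
    static IndexerRouting fromExample() {
        IndexerRouting r = new IndexerRouting();
        r.add("/foo/images/**", "OCRIndexer");
        r.add("/foo/symphonies/**", "SoundIndexer");
        r.add("*.xml", "XMLIndexer");
        r.add("@*", "RDBMSIndexer");
        r.add("*", "LuceneIndexer");
        r.add("*", "RDBMSIndexer");
        return r;
    }
}
```

Under these assumptions, IndexerRouting.fromExample().indexersFor("/blah/mydocument.xml") collects the XML, RDBMS and Lucene indexers, and the caller would then pick among them based on the kind of query.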
What do you think?
-- Stefano.
