Hi,
to summarize the discussion (as I understood it):
- we need a fast search engine for content
- we need a fast search engine for properties
- security issues must be taken into account
- it should be integrated with the SEARCH helper
- if we have two different implementations for properties
and content, the results must be merged, which might have a
performance drawback
- if we use one implementation for both, it might be necessary
to replicate data, or content search might not perform well
- we should be open to think about content type specific
search engines. AFAIK Lucene is perfect for text-based documents,
but think of an engine that finds text in scanned documents (jpeg)
using OCR, or finds symphonies in F major in sound files :-)
So there are pros and cons for both scenarios.
I'd like to suggest the following:
1. Introduce the index store (as Christophe has already done).
The index store has, in principle, the methods
- index (uri, nodeRevisionDescriptor, nodeRevisionContent)
- drop (uri, nodeRevisionDescriptor)
They are called on create, update and delete. No search method!
2. Make it possible to have separate search engines for properties
and content, as the underlying datastore might have very
distinct querying capabilities (Currently you may only have one
search engine for a store).
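
A minimal sketch of the index store from point 1, just to make the idea
concrete. The interface and class names are hypothetical, and a plain
Map and a byte array stand in for the nodeRevisionDescriptor and
nodeRevisionContent; this is not the existing Slide API.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical index store: only index and drop, deliberately no search method.
interface IndexStore {
    // Called on create and update. "properties" stands in for the
    // nodeRevisionDescriptor, "content" for the nodeRevisionContent.
    void index(String uri, Map<String, String> properties, byte[] content);

    // Called on delete.
    void drop(String uri, Map<String, String> properties);
}

// Toy in-memory implementation (an assumption for illustration only);
// a real one would hand the data to Lucene or rely on the RDBMS.
class InMemoryIndexStore implements IndexStore {
    private final Map<String, String> indexedText = new HashMap<>();

    @Override
    public void index(String uri, Map<String, String> properties, byte[] content) {
        // Re-indexing the same uri simply overwrites the previous entry.
        indexedText.put(uri, new String(content, StandardCharsets.UTF_8));
    }

    @Override
    public void drop(String uri, Map<String, String> properties) {
        indexedText.remove(uri);
    }

    // Helper for inspection only; searching stays with the search engines.
    boolean isIndexed(String uri) {
        return indexedText.containsKey(uri);
    }
}
```

Keeping search out of this interface is what allows separate search
engines for properties and content, as proposed in point 2.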
So, let's consider two scenarios: one with content and metadata on the
filesystem, and the other with content on the filesystem and metadata
in an RDBMS.
Scenario 1:
Create a resource - the index method creates indexes for
content and properties using Lucene.
SEARCH for property and content - the search engine for this
store implements both property and content. It issues the
calls to Lucene that deliver the resource ids; the resources
can then be loaded and returned in a SearchQueryResult.
Scenario 2:
Create a resource - the index method creates an index for
the content, no index for properties. The RDBMS does this for
you.
SEARCH for properties - the property search engine creates
an SQL statement, which is performed by the RDBMS. The resource
ids are returned, the resources are loaded and returned as a
SearchQueryResult.
SEARCH for content - same as scenario 1
SEARCH for property and content - both search engines do their
part, both SearchQueryResults are merged.
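
The merge step in scenario 2 could be sketched like this, assuming each
engine delivers a set of resource ids (URIs) and that "property and
content" means both conditions must hold (AND). The class and method
names are made up for illustration.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper that merges the hits of two search engines.
class ResultMerger {
    // Returns the URIs found by BOTH engines (AND semantics, an assumption),
    // keeping the iteration order of the property hits.
    static Set<String> mergeAnd(Set<String> propertyHits, Set<String> contentHits) {
        Set<String> merged = new LinkedHashSet<>(propertyHits);
        merged.retainAll(contentHits); // drop everything the content engine did not find
        return merged;
    }
}
```

Only the merged ids need to be loaded as resources, which keeps the
cost of the merge down when one of the two result sets is large.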
If we agree to express the SEARCH as XML (<basicsearch> as defined
by DASL), a lot of the infrastructure is already present.
Security:
In the current implementation the search engine delivers all results,
hidden resources are filtered in a second step. This is true for all
methods that deliver resources (propfind, ...). Of course it would be
more performant to delegate this to the search engine. However, this
depends heavily on how security is configured, e.g. whether you store
the ACLs in your RDBMS (or replicate them into Lucene??). I'd suggest
postponing this issue.
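
For reference, the current two-step approach amounts to something like
the following sketch. The names are hypothetical, and the permission
check is passed in as a predicate because how ACLs are evaluated depends
on the security configuration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical second-step filter applied after the search engine has
// delivered all results.
class SecurityFilter {
    // Keeps only the URIs the current user may read; "mayRead" stands in
    // for whatever ACL check the store is configured with.
    static List<String> filterVisible(List<String> hits, Predicate<String> mayRead) {
        List<String> visible = new ArrayList<>();
        for (String uri : hits) {
            if (mayRead.test(uri)) {
                visible.add(uri);
            }
        }
        return visible;
    }
}
```

This keeps the search engines free of any security logic, at the price
of fetching hits that are thrown away afterwards.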
What do you think about that?
Regards,
Martin
__________________________
Martin Wallmer
Research & Development
Software AG ++49 6151 92 1831
Uhlandstr. 12
D 64297 Darmstadt