Integrate Indexstore and SEARCH (was Indexing store)

Wallmer, Martin Wed, 21 Jan 2004 03:20:28 -0800

Hi,

to summarize the discussion (as I understood it):


  - we need a fast search engine for content 
  - we need a fast search engine for properties
  - security issues must be taken into account
  - it should be integrated with the SEARCH helper

  - if we have two different implementations for properties 
    and content, the results must be merged, which might hurt 
    performance

  - if we use one implementation for both, it might be necessary 
    to replicate data, or content search might not perform well

  - we should be open to content-type-specific search engines. 
    AFAIK Lucene is perfect for text-based documents, but think 
    of an engine that finds text in scanned documents (JPEG)
    using OCR, or finds symphonies in F major in sound files :-)
    
So there are pros and cons for both approaches.



I'd like to suggest the following:

 1. Introduce the index store (as Christophe has already done). 
    In principle the index store has these methods:
    - index (uri, nodeRevisionDescriptor, nodeRevisionContent)
    - drop  (uri, nodeRevisionDescriptor)
    They are called on create, update and delete. No search method!
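
As a rough illustration of point 1, the index store could look like the
sketch below. Everything in it is an assumption made for the example: the
interface and class names are hypothetical, and plain Map and byte[] stand
in for the descriptor and content objects, so this is not the actual Slide
API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch only; names and parameter types are illustrative.
interface IndexStore {
    // Called on create and update: (re)index one revision of a resource.
    void index(String uri, Map<String, String> descriptor, byte[] content);

    // Called on delete: remove that revision from the index.
    void drop(String uri, Map<String, String> descriptor);

    // Deliberately no search method: querying is left to the search engine.
}

// Toy in-memory implementation, just enough to show the call pattern.
class InMemoryIndexStore implements IndexStore {
    private final Map<String, Map<String, String>> propertiesByUri = new HashMap<>();
    private final Map<String, byte[]> contentByUri = new HashMap<>();

    @Override
    public void index(String uri, Map<String, String> descriptor, byte[] content) {
        // An update simply overwrites the previous entry for the same URI.
        propertiesByUri.put(uri, new HashMap<>(descriptor));
        contentByUri.put(uri, content.clone());
    }

    @Override
    public void drop(String uri, Map<String, String> descriptor) {
        propertiesByUri.remove(uri);
        contentByUri.remove(uri);
    }

    // Helper for the example only; not part of the proposed interface.
    boolean isIndexed(String uri) {
        return contentByUri.containsKey(uri);
    }
}
```

A store for scenario 2 would implement index() for the content only and
leave the properties to the RDBMS.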

 2. Make it possible to have separate search engines for properties 
    and content, as the underlying datastore might have very 
    distinct querying capabilities (Currently you may only have one 
    search engine for a store).

So, let's consider two scenarios: one with content and metadata on the 
filesystem, and the other with content on the filesystem and metadata 
in an RDBMS.

Scenario 1:
   Create a resource - the index method creates indexes for 
   content and properties using Lucene.

   SEARCH for property and content - the search engine for this 
   store handles both properties and content. It issues the 
   calls to Lucene, which deliver the resource ids; the resources 
   can then be loaded and returned in a SearchQueryResult.

Scenario 2:
   Create a resource - the index method creates an index for 
   the content, but no index for properties. The RDBMS does 
   this for you.

   SEARCH for properties - the property search engine creates
   an SQL statement, which is executed by the RDBMS. The resource 
   ids are returned; the resources are loaded and returned as a 
   SearchQueryResult.

   SEARCH for content - same as scenario 1

   SEARCH for property and content - both search engines do their
   part, both SearchQueryResults are merged.
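
The merge step in scenario 2 could be sketched as follows. The proposal does
not say how the two results are combined; this example assumes a query that
must match on both property and content, so it intersects the two hit lists.
The class and method names are hypothetical, and plain URI strings stand in
for the entries of a SearchQueryResult.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper; not part of Slide.
final class ResultMerger {
    private ResultMerger() {
    }

    // Keeps only the URIs found by both engines (AND semantics),
    // in the order the property engine returned them.
    static List<String> mergeAnd(List<String> propertyHits, List<String> contentHits) {
        Set<String> contentSet = new HashSet<>(contentHits);
        List<String> merged = new ArrayList<>();
        for (String uri : propertyHits) {
            if (contentSet.contains(uri) && !merged.contains(uri)) {
                merged.add(uri);
            }
        }
        return merged;
    }
}
```

The cost of this step grows with the size of both hit lists, which is the
performance drawback mentioned in the summary above.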


If we agree to express the SEARCH as XML (<basicsearch> as defined
by DASL), a lot of the infrastructure is already present. 


Security:
In the current implementation the search engine delivers all results;
hidden resources are filtered out in a second step. This is true for all
methods that deliver resources (PROPFIND, ...). Of course it would perform
better if this were delegated to the search engine. However, that depends
heavily on how security is configured, e.g. whether you store the ACLs in
your RDBMS (or replicate them into Lucene??). I'd suggest postponing 
this issue.
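
The two-step approach described above (search first, filter afterwards) can
be sketched like this. The names are hypothetical and the permission check
is passed in as a predicate, since how ACLs are stored and evaluated is
exactly the open question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical helper; not part of Slide.
final class VisibilityFilter {
    private VisibilityFilter() {
    }

    // Second step after the search: drop every hit the caller may not read.
    static List<String> filterVisible(List<String> hits, Predicate<String> canRead) {
        List<String> visible = new ArrayList<>();
        for (String uri : hits) {
            if (canRead.test(uri)) {
                visible.add(uri);
            }
        }
        return visible;
    }
}
```

Pushing the check into the search engine would avoid loading the hidden
hits at all, but requires the engine to have access to the ACLs.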

 
What do you think about that?

Regards,
Martin



__________________________
Martin Wallmer
Research & Development
Software AG    ++49 6151 92 1831
Uhlandstr. 12
D 64297 Darmstadt

