RE: Integrate Indexstore and SEARCH (was Indexing store)

Wallmer, Martin Wed, 21 Jan 2004 07:05:36 -0800

Hi Stefano,

> -----Original Message-----
> From: Stefano Mazzocchi [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, 21 January 2004 12:53
> To: Slide Developers Mailing List
> Subject: Re: Integrate Indexstore and SEARCH (was Indexing store)
> 
> 
> Martin,
> 
> thanks for leading this forward.
> 
> On 21 Jan 2004, at 11:19, Wallmer, Martin wrote:
> 
> > Hi,
> >
> > to summarize the discussion (as I understood it):
> >
> >   - we need a fast search engine for content
> >   - we need a fast search engine for properties
> >   - security issues must be taken into account
> >   - it should be integrated with the SEARCH helper
> 
> agreed
> 
> >   - if we have two different implementations for properties
> >     and content, the results must be merged, which might have a
> >     performance drawback
> 
> correct
> 
> >   - if we use one implementation for both, it might be necessary
> >     to replicate data, or content search might not perform well
> 
> data replication doesn't concern me if stores can be made somewhat 
> "layered" via interception.
> 
> >   - we should be open to thinking about content-type-specific
> >     search engines. AFAIK Lucene is perfect for text-based documents,
> >     but think of an engine that finds text in scanned documents (JPEG)
> >     using OCR, or finds symphonies in F major in sound files :-)
> 
> Yes. Just like we can plug in different stores in different parts of 
> the tree, we should be able to associate different search engines to 
> these stores.
> 
> > so pros and cons for both scenarios.
> >
> >
> >
> > I'd like to suggest the following:
> >
> >  1. Introduce the index store (as Christophe has already done).
> >     The index store has, in principle, the methods
> >     - index (uri, nodeRevisionDescriptor, nodeRevisionContent)
> >     - drop  (uri, nodeRevisionDescriptor)
> >     They are called at create, update, delete. No search method!
> 
> I'm not sure I agree that it should be the store that does the
> indexing. Aren't indexing and storing two orthogonal aspects of
> content management?


If you like, we could just call it "Indexer" :-)
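
To make the proposed contract concrete, here is a minimal Java sketch of such an "Indexer" with only the two operations named above. RevisionInfo is a hypothetical stand-in for Slide's NodeRevisionDescriptor, and InMemoryIndexer is a toy implementation invented for illustration; neither is existing Slide code.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for Slide's NodeRevisionDescriptor: only the fields
// this sketch needs.
final class RevisionInfo {
    final String revision;
    final String contentType;

    RevisionInfo(String revision, String contentType) {
        this.revision = revision;
        this.contentType = contentType;
    }
}

// The two operations proposed in the thread, called on create, update and
// delete. There is deliberately no search method.
interface Indexer {
    void index(String uri, RevisionInfo descriptor, byte[] content);

    void drop(String uri, RevisionInfo descriptor);
}

// Toy implementation that only remembers which (uri, revision) pairs are indexed.
final class InMemoryIndexer implements Indexer {
    private final Map<String, Integer> indexed = new ConcurrentHashMap<>();

    private static String key(String uri, RevisionInfo descriptor) {
        return uri + "#" + descriptor.revision;
    }

    @Override
    public void index(String uri, RevisionInfo descriptor, byte[] content) {
        // Re-indexing the same revision simply overwrites the previous entry.
        indexed.put(key(uri, descriptor), content.length);
    }

    @Override
    public void drop(String uri, RevisionInfo descriptor) {
        indexed.remove(key(uri, descriptor));
    }

    // Sorted view of the indexed keys, for inspection only.
    Set<String> indexedKeys() {
        return new TreeSet<>(indexed.keySet());
    }
}
```

Keeping search out of this interface means the search engine can be chosen independently of whatever maintains the index.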

> 
> Just like we have a way to plug different stores at different parts of
> the tree, shouldn't we allow plugging in different indexers for
> different parts of the tree?
> 
> For example, we could have
> 
> <stores>
>   /foo/** -> FileStore
>   /bar/** -> RDBMSStore
> </stores>
> 
> then
> 
> <indexers>
>   /foo/images/** -> OCRIndexer
>   /foo/symphonies/** -> SoundIndexer
>   *.xml -> XMLIndexer
>   @* -> RDBMSIndexer
>   * -> LuceneIndexer
>   * -> RDBMSIndexer
> </indexers>
> 

You could write an indexer that knows how to index different content
types; the index method could get this information from the
NodeRevisionDescriptor. So you could have one indexer per store.
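
A rough sketch of that idea in Java: one indexer per store that picks a type-specific handler from the content type. The names TypeHandler and DispatchingIndexer are hypothetical, and the content type is passed in as a plain string here, on the assumption that it would be read from the NodeRevisionDescriptor in a real implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical handler for one family of content types (text, images, sound, ...).
interface TypeHandler {
    void handle(String uri, byte[] content);
}

// One indexer per store that delegates to a handler chosen by content type.
final class DispatchingIndexer {
    // LinkedHashMap keeps registration order, so earlier prefixes win.
    private final Map<String, TypeHandler> handlersByPrefix = new LinkedHashMap<>();
    private final TypeHandler fallback;

    DispatchingIndexer(TypeHandler fallback) {
        this.fallback = fallback;
    }

    // Register a handler for all content types starting with the given prefix,
    // for example "image/" or "audio/".
    void register(String contentTypePrefix, TypeHandler handler) {
        handlersByPrefix.put(contentTypePrefix, handler);
    }

    // The first registered prefix that matches wins; otherwise the fallback is used.
    void index(String uri, String contentType, byte[] content) {
        if (contentType != null) {
            for (Map.Entry<String, TypeHandler> entry : handlersByPrefix.entrySet()) {
                if (contentType.startsWith(entry.getKey())) {
                    entry.getValue().handle(uri, content);
                    return;
                }
            }
        }
        fallback.handle(uri, content);
    }
}
```

With this shape, an OCR or sound handler is just another registration, and the store configuration still names a single indexer class.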


> this also means that, for example, accessing "/blah/mydocument.xml"
> would yield three potential query engines: Lucene, RDBMS and XML, and
> one can decide which one to use depending on the type of query one has
> to perform.

Is it possible to integrate a custom indexer with Lucene, that is, to extract the
metadata and pass it to Lucene to store? Then you would only need one Lucene search engine.
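
As a sketch of what "extract the metadata and pass it on" could look like, the Java snippet below flattens a resource's properties and its extracted text into a single field map. It deliberately does not use the Lucene API; the class FlatDocument and the field names "uri", "content" and the "prop." prefix are assumptions made for illustration, and a real indexer would translate such a map into the full-text engine's own document type.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: one flat set of named fields per resource, so that a
// single full-text engine can index properties and content together.
final class FlatDocument {
    private FlatDocument() {
    }

    // Builds one field map from a resource's properties and its extracted text.
    static Map<String, String> build(String uri, Map<String, String> properties, String text) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("uri", uri);
        for (Map.Entry<String, String> property : properties.entrySet()) {
            // Prefix property names so they cannot collide with "uri" or "content".
            fields.put("prop." + property.getKey(), property.getValue());
        }
        fields.put("content", text);
        return fields;
    }
}
```

The trade-off is the one already mentioned in the thread: the property values are replicated into the full-text index in addition to the store that owns them.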

In Domain.xml we could have something like:

<store name="jdbc" classname="org.apache.slide.store.BindingStore">
  <nodestore classname="org.apache.slide.store.impl.rdbms.JDBCStore">
    <parameter name="driver">com.mysql.jdbc.Driver</parameter>
    <parameter name="url">jdbc:mysql://localhost:3306/test?autoReconnect=true</parameter>
    <parameter name="user">wam</parameter>
    <parameter name="password"/>
    <parameter name="adapter">org.apache.slide.store.impl.rdbms.MySqlRDBMSAdapter</parameter>
  </nodestore>
  <securitystore>
    <reference store="nodestore"/>
  </securitystore>
  <lockstore>
    <reference store="nodestore"/>
  </lockstore>
  <revisiondescriptorsstore>
    <reference store="nodestore"/>
  </revisiondescriptorsstore>
  <revisiondescriptorstore>
    <reference store="nodestore"/>
  </revisiondescriptorstore>
  <contentstore classname="org.apache.slide.store.txfile.TxFileContentStore">
    <parameter name="rootpath">mysql/store/content</parameter>
    <parameter name="workpath">mysql/work/content</parameter>
  </contentstore>
  <indexer classname="my.lucene.Indexer"/>
  <searchengine>
    <parameter name="propertySearchClass">my.sql.SearchEngine</parameter>
    <parameter name="contentSearchClass">my.lucene.SearchEngine</parameter>
  </searchengine>
</store>
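
With two search classes configured like this, the results have to be merged somewhere. The Java sketch below shows one possible merge, assuming both engines return URIs and that a query must match in both (AND semantics). UriSearch and CombinedSearch are hypothetical names, not part of Slide or of the SEARCH helper.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical common shape for a property search engine and a content
// search engine: each returns the URIs of the matching resources.
interface UriSearch {
    Set<String> search(String query);
}

// Combines the two engines named in the <searchengine> element.
final class CombinedSearch {
    private final UriSearch propertySearch;
    private final UriSearch contentSearch;

    CombinedSearch(UriSearch propertySearch, UriSearch contentSearch) {
        this.propertySearch = propertySearch;
        this.contentSearch = contentSearch;
    }

    // Returns the resources matched by both engines, keeping the order in
    // which the property search returned them.
    Set<String> search(String propertyQuery, String contentQuery) {
        Set<String> result = new LinkedHashSet<>(propertySearch.search(propertyQuery));
        result.retainAll(contentSearch.search(contentQuery));
        return result;
    }
}
```

This is the merge step whose cost was raised earlier in the thread: both result sets are fully materialized before they are intersected, which is simple but may not scale to large result sets.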

Does this make sense?


Regards,
Martin

