Hi Stefano,


>my humble suggestion would be to implement a full-text search using
>Lucene as the backend and connecting it through DASL. That would be, IMO,
>the most elegant way to add full-text indexing of documents.
>In order to get the content indexed, I would write an interceptor and
>feed Lucene with content every time some new content gets in. Note that
>Lucene is very modular, so it would even be possible to write
>mime-type-aware parsers and tokenizers (for example, indexing PDF
>documents or Word documents (through POI)). But if you just want to do
>text, HTML and XML, I think Lucene ships with those tokenizers already.
>
>So, the idea is
>
>  document --(PUT)---> interceptor ---> Store
>                           |
>                           v
>                         lucene
>                           ^
>                           |
>  request --(SEARCH)---> LuceneSearchImpl ---\
>                                             |
>  response --(PROPFIND-like) <---------------/
>
>NOTE: in a DeltaV-aware repository you also need to intercept CHECKIN.
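
To make the write/read split in the diagram above concrete, here is a minimal Java sketch of the pipeline. Every name in it (ContentInterceptor, ToyIndex, IndexingInterceptor, Repository) is hypothetical: it is neither Slide's interceptor API nor Lucene's API, and a tiny in-memory inverted index stands in for Lucene. It only shows where the hook sits: each store of content (a PUT or, in a DeltaV repository, a CHECKIN) is handed to the interceptor, which re-indexes the resource, and a search intersects the posting lists of the query terms.

```java
import java.util.*;

// Hypothetical hook invoked after content is stored. NOT Slide's actual interceptor API.
interface ContentInterceptor {
    void postStore(String uri, String mimeType, String content);
}

// Toy inverted index (term -> URIs) standing in for Lucene. NOT the Lucene API.
class ToyIndex {
    private final Map<String, Set<String>> postings = new HashMap<>();

    // Re-indexing a resource first drops its old terms, so updates replace stale entries.
    void index(String uri, String text) {
        remove(uri);
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, t -> new TreeSet<>()).add(uri);
        }
    }

    void remove(String uri) {
        for (Set<String> uris : postings.values()) {
            uris.remove(uri);
        }
    }

    // Returns the URIs that contain every query term (boolean AND), in sorted order.
    Set<String> search(String query) {
        Set<String> result = null;
        for (String term : tokenize(query)) {
            Set<String> uris = postings.getOrDefault(term, Collections.emptySet());
            if (result == null) {
                result = new TreeSet<>(uris);
            } else {
                result.retainAll(uris);
            }
        }
        return result == null ? new TreeSet<>() : result;
    }

    // Lower-cases and splits on anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }
}

// Feeds the index on every write. Only text/* types are handled; markup is not stripped,
// and PDF or Word files would need their own parsers (e.g. Word via POI).
class IndexingInterceptor implements ContentInterceptor {
    private final ToyIndex index;

    IndexingInterceptor(ToyIndex index) {
        this.index = index;
    }

    @Override
    public void postStore(String uri, String mimeType, String content) {
        if (mimeType.startsWith("text/")) {
            index.index(uri, content);
        }
    }
}

// Stand-in for the persistence store; every write (PUT, or CHECKIN under DeltaV)
// goes through storeContent, which then runs the registered interceptors.
class Repository {
    private final Map<String, String> store = new HashMap<>();
    private final List<ContentInterceptor> interceptors = new ArrayList<>();

    void addInterceptor(ContentInterceptor interceptor) {
        interceptors.add(interceptor);
    }

    void storeContent(String uri, String mimeType, String content) {
        store.put(uri, content);
        for (ContentInterceptor interceptor : interceptors) {
            interceptor.postStore(uri, mimeType, content);
        }
    }

    String retrieve(String uri) {
        return store.get(uri);
    }
}
```

Because the hook in this sketch sits on the store path rather than in the WebDAV method handlers, it does not depend on whether a write came from a WebDAV request or from a direct API call; how well that maps onto Slide's kernel is an open question.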


OK, if I understand your design correctly, all interaction with Lucene (or another search
engine) happens in the Slide WebDAV layer, mainly the org.apache.slide.webdav.method
package. Correct?

I'm not using WebDAV with Slide; I'm using the Slide helper classes directly in
our application. Maybe I'm the only one on this mailing list, but we have integrated
Slide into an application via the helper classes :-). WebDAV is not required for every
CMS solution. So wouldn't it be worthwhile to handle the interaction with Lucene in the
Slide kernel, via a helper class or something similar?


>this allows you to keep whatever store you want for persistence and
>allows DASL clients to interoperate directly (note that DASL has
>pluggable search languages, for example, (modified from the DASL spec))


It sounds good to keep a separate data store for properties and other information that
is not really tied to the search engine. But what about searching on properties AND content?

The simplest solution may be to write a Lucene implementation of the RevisionDescriptor
store, because Lucene can manage properties. At this point I don't know whether that is
a good idea. Otherwise, if we want to search on properties and content, we need to keep
the Lucene index synchronized with the persistence layer.
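
One way to picture the second option is a write-through store that updates the property/content data and the search index in the same call, so a single query can combine property conditions with content terms. The sketch is hypothetical throughout (SynchronizedSearchStore, IndexedRevision and the "author" property in the usage are illustrative names, not Slide or Lucene APIs); it keeps everything in memory and scans all entries linearly for brevity.

```java
import java.util.*;

// Hypothetical record combining a revision's properties and its content terms,
// loosely like a search-engine document with several fields. NOT a Lucene or Slide class.
final class IndexedRevision {
    final String uri;
    final Map<String, String> properties;
    final Set<String> contentTerms;

    IndexedRevision(String uri, Map<String, String> properties, String content) {
        this.uri = uri;
        this.properties = Collections.unmodifiableMap(new HashMap<>(properties));
        this.contentTerms = tokenize(content);
    }

    // Lower-cases and splits on anything that is not a letter or digit.
    static Set<String> tokenize(String text) {
        Set<String> terms = new HashSet<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }
}

// Hypothetical write-through store: each store/remove updates the primary data and the
// index inside one synchronized method, so in this in-memory sketch they never disagree.
class SynchronizedSearchStore {
    private final Map<String, Map<String, String>> descriptors = new HashMap<>(); // properties
    private final Map<String, String> contents = new HashMap<>();                 // content
    private final Map<String, IndexedRevision> index = new HashMap<>();           // search side

    synchronized void store(String uri, Map<String, String> properties, String content) {
        descriptors.put(uri, new HashMap<>(properties));
        contents.put(uri, content);
        index.put(uri, new IndexedRevision(uri, properties, content));
    }

    synchronized void remove(String uri) {
        descriptors.remove(uri);
        contents.remove(uri);
        index.remove(uri);
    }

    synchronized String contentOf(String uri) {
        return contents.get(uri);
    }

    // Properties AND content: every filter property must match exactly and every
    // query term must occur in the content. Linear scan for brevity.
    synchronized SortedSet<String> search(Map<String, String> propertyFilter, String contentQuery) {
        Set<String> wanted = IndexedRevision.tokenize(contentQuery);
        SortedSet<String> hits = new TreeSet<>();
        for (IndexedRevision rev : index.values()) {
            boolean propertiesMatch = true;
            for (Map.Entry<String, String> e : propertyFilter.entrySet()) {
                if (!e.getValue().equals(rev.properties.get(e.getKey()))) {
                    propertiesMatch = false;
                    break;
                }
            }
            if (propertiesMatch && rev.contentTerms.containsAll(wanted)) {
                hits.add(rev.uri);
            }
        }
        return hits;
    }
}
```

With a real persistent store and a real Lucene index the two writes are separate operations, so a failure between them would still have to be handled, for example by re-indexing on startup or by tying both writes to the same transaction.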

Thanks for this first round of brainstorming!

 
