Hello Christophe,

Stefano Mazzocchi <[EMAIL PROTECTED]> writes:

> my humble suggestion would be to implement a full-text search using
> lucene as the backend and connecting it thru DASL. That would be, IMO,
> the most elegant way to add full-text indexing of documents.
> 
> 
> In order to get the content indexed, I would write an interceptor and
> feed lucene with content everytime some new content gets in. Note that
> lucene is very modular, so it would be possible to even write
> mime-type-aware parsers and tokenizers (for example, indexing PDF
> documents or Word documents (thru POI)). But if you just want to do
> text, HTML and XML, I think lucene ships with those tokenizers already.
> 
> 
> So, the idea is
> 
>   document --(PUT)---> interceptor ---> Store
>                            |
>                            v
>                          lucene
>                            ^
>                            |
>   request --(SEARCH)---> LuceneSearchImpl ---\
>                                              |
>   response --(PROPFIND-like) <---------------/
> 
> NOTE: you need to intercept also CHECKIN in a deltaV-aware repository.

Depending on your requirements, you may also access the Lucene 
files directly for querying and use the WebDAV layer only for 
indexing. I use this architecture for my project. 
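
If you go that route, the query side is plain Lucene code against the
index directory. A minimal sketch using the Lucene 1.x-style API (the
index path, the "contents" field and the stored "uri" field are only
examples, not anything Slide gives you):

  import org.apache.lucene.analysis.standard.StandardAnalyzer;
  import org.apache.lucene.queryParser.QueryParser;
  import org.apache.lucene.search.Hits;
  import org.apache.lucene.search.IndexSearcher;
  import org.apache.lucene.search.Query;

  public class DirectLuceneQuery {
      public static void main(String[] args) throws Exception {
          // Open the index files directly; WebDAV/DASL is not involved here.
          IndexSearcher searcher = new IndexSearcher("/var/slide/lucene-index");

          // Query the same field the indexing interceptor wrote to.
          Query query = QueryParser.parse(args[0], "contents",
                                          new StandardAnalyzer());

          Hits hits = searcher.search(query);
          for (int i = 0; i < hits.length(); i++) {
              // "uri" is assumed to be a stored field with the resource path.
              System.out.println(hits.doc(i).get("uri")
                                 + "  score=" + hits.score(i));
          }
          searcher.close();
      }
  }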

However, if you use the interceptor, you will run into problems with
rollbacks. I think the interceptors are not informed if a rollback
happens.

By the way, there is still an issue with NodeRevisionContent. In an
interceptor's "pre" methods you must not call streamContent or
readContent, only getContent or getContentBytes; otherwise the store
may get an exhausted stream. On the other hand, if you call any of
those read methods in an interceptor's "post" method, you may not get
any content at all.

Martin

