Hello Christophe,

Stefano Mazzocchi <[EMAIL PROTECTED]> writes:
> my humble suggestion would be to implement a full-text search using
> lucene as the backend and connecting it thru DASL. That would be, IMO,
> the most elegant way to add full-text indexing of documents.
>
> In order to get the content indexed, I would write an interceptor and
> feed lucene with content every time some new content gets in. Note that
> lucene is very modular, so it would be possible to even write
> mime-type-aware parsers and tokenizers (for example, indexing PDF
> documents or Word documents (thru POI)). But if you just want to do
> text, HTML and XML, I think lucene ships with those tokenizers already.
>
> So, the idea is
>
>   document --(PUT)---> interceptor ---> Store
>                             |
>                             v
>                           lucene
>                             ^
>                             |
>   request --(SEARCH)---> LuceneSearchImpl ---\
>                                              |
>   response --(PROPFIND-like) <---------------/
>
> NOTE: you need to intercept also CHECKIN in a deltaV-aware repository.

Depending on your requirements, you may also access the Lucene files
directly for querying and use the WebDAV layer only for indexing. I use
this architecture in my project.

However, if you use the interceptor, you will have problems with
rollbacks. I think the interceptors are not informed if a rollback
happens.

Btw. there is still an issue with NodeRevisionContent. You must not call
streamContent or readContent in an interceptor's "pre" methods, but only
getContent or getContentBytes. Otherwise the store may get an exhausted
stream. On the other hand, if you call any of those read methods in an
interceptor's "post" method, you may not get any content at all.

Martin
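
For what it's worth, here is a minimal sketch of the indexing side,
written against the Lucene 1.x API of that era (IndexWriter,
Field.Keyword, Field.Text). The class name LuceneContentIndexer and the
field names "uri", "mime-type" and "contents" are only illustrative
assumptions; how you extract plain text from the stored bytes and where
exactly you call this from the interceptor's store hooks is left open.

import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

public class LuceneContentIndexer {

    private final String indexPath;

    public LuceneContentIndexer(String indexPath) {
        this.indexPath = indexPath;
    }

    /**
     * Index one revision of a resource. If this runs from a "pre" store
     * hook, obtain the text via the byte-based accessors only, as
     * discussed above, so the store does not see an exhausted stream.
     */
    public void index(String uri, String mimeType, String text)
            throws IOException {
        // false = append to an existing index (create it once elsewhere
        // with create=true). Re-indexing an updated resource also needs
        // a prior delete of the old entry via IndexReader.delete(Term).
        IndexWriter writer =
            new IndexWriter(indexPath, new StandardAnalyzer(), false);
        try {
            Document doc = new Document();
            // Keyword fields are stored, not tokenized: exact lookup by URI.
            doc.add(Field.Keyword("uri", uri));
            doc.add(Field.Keyword("mime-type", mimeType));
            // Text fields are tokenized and full-text searchable.
            doc.add(Field.Text("contents", text));
            writer.addDocument(doc);
        } finally {
            writer.close();
        }
    }
}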

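And the corresponding query side, if you go the route of reading the
Lucene index directly rather than through DASL SEARCH. Again a sketch
against the Lucene 1.x API, reusing the field names assumed above; a
SEARCH implementation would do essentially the same thing server-side
and map the hits back into a PROPFIND-like multistatus response.

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;

public class LuceneContentSearcher {

    public static void main(String[] args) throws Exception {
        String indexPath = args[0];   // directory the indexer writes to
        String queryText = args[1];   // e.g. "full-text search"

        IndexSearcher searcher = new IndexSearcher(indexPath);
        // Parse against the "contents" field written by the indexer,
        // using the same analyzer that was used at indexing time.
        Query query =
            QueryParser.parse(queryText, "contents", new StandardAnalyzer());
        Hits hits = searcher.search(query);

        for (int i = 0; i < hits.length(); i++) {
            Document doc = hits.doc(i);
            // The stored "uri" field identifies the matching WebDAV resource.
            System.out.println(hits.score(i) + "  " + doc.get("uri"));
        }
        searcher.close();
    }
}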