On 25 Nov 2003, at 23:22, Christophe Lombart wrote:

I want to restart a debate on Lucene. My customer has asked me to add a search engine to our application in the short term.

I know there is an implementation for searching documents based on WebDAV search, but in my case I'm not using the WebDAV part of Slide.
We use the helper classes directly, so in such a context I think a tool like Lucene would suit me well. I haven't fully checked the current search implementation, but what about its performance? What about the lookup in the Slide object tree when a search is started? What about the performance of a full-text search? Lucene is clearly robust in that area.

For now, I don't know whether this integration is complex, but why not try it? Please clarify your position; it is not clear to me. Why are you not interested in Lucene? If you give me some recommendations, I'm ready to start this integration.

Christophe,

My humble suggestion would be to implement full-text search using Lucene as the backend and connecting it through DASL. That would be, IMO, the most elegant way to add full-text indexing of documents.

In order to get the content indexed, I would write an interceptor and feed Lucene with content every time new content comes in. Note that Lucene is very modular, so it would even be possible to write MIME-type-aware parsers and tokenizers (for example, for indexing PDF documents, or Word documents through POI). But if you just want to handle text, HTML and XML, I think Lucene already ships with those tokenizers.
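To make the MIME-type-aware idea concrete, here is a minimal Java sketch. It does not use Lucene at all: the class name `MimeAwareTokenizer`, the extractor registry and the regex-based markup stripper are hypothetical stand-ins for the parsers and analyzers you would actually plug into Lucene. A PDF or Word extractor (e.g. through POI) would just be one more entry in the map.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

public class MimeAwareTokenizer {

    // Hypothetical registry: MIME type -> function turning raw content into plain text.
    private static final Map<String, Function<String, String>> EXTRACTORS = Map.of(
        "text/plain", s -> s,
        "text/html", MimeAwareTokenizer::stripMarkup,
        "text/xml", MimeAwareTokenizer::stripMarkup);

    // Naive stripper: replaces every tag with a space. Not a real HTML/XML parser.
    static String stripMarkup(String s) {
        return s.replaceAll("<[^>]*>", " ");
    }

    /** Lower-cased word tokens, or an empty list when no extractor is registered. */
    public static List<String> tokens(String mimeType, String content) {
        // Ignore parameters such as "; charset=utf-8".
        String type = mimeType.split(";")[0].trim().toLowerCase(Locale.ROOT);
        Function<String, String> extractor = EXTRACTORS.get(type);
        if (extractor == null) {
            return List.of(); // e.g. application/pdf would need a dedicated parser
        }
        List<String> out = new ArrayList<>();
        for (String t : extractor.apply(content).toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }
}
```

For example, `tokens("text/html; charset=utf-8", "<p>Slide and <b>DASL</b></p>")` yields `[slide, and, dasl]`, while an unregistered type is simply skipped.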

So, the idea is

 document --(PUT)---> interceptor ---> Store
                          |
                          v
                        lucene
                          ^
                          |
 request --(SEARCH)---> LuceneSearchImpl ---\
                                            |
 response --(PROPFIND-like) <---------------/

NOTE: in a DeltaV-aware repository you also need to intercept CHECKIN.
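The flow in the diagram can be sketched in plain Java as below. This is neither Slide's interceptor API nor Lucene: `ContentInterceptor`, `afterStore` and the in-memory inverted index are hypothetical names assumed for illustration. The point is that the same hook fires after a PUT and after a CHECKIN, and that re-indexing a URI replaces its old terms.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class SearchIndexSketch {

    /** Hypothetical hook, called after a PUT or a CHECKIN has written content to the store. */
    public interface ContentInterceptor {
        void afterStore(String uri, String text);
    }

    /** Toy inverted index standing in for Lucene: term -> URIs containing it. */
    public static class InMemoryIndex implements ContentInterceptor {
        private final Map<String, Set<String>> postings = new HashMap<>();
        private final Map<String, Set<String>> termsByUri = new HashMap<>();

        @Override
        public void afterStore(String uri, String text) {
            // New content replaces the previous revision, so drop the old postings first.
            Set<String> old = termsByUri.remove(uri);
            if (old != null) {
                for (String term : old) {
                    Set<String> uris = postings.get(term);
                    uris.remove(uri);
                    if (uris.isEmpty()) {
                        postings.remove(term);
                    }
                }
            }
            Set<String> terms = new HashSet<>();
            for (String tok : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
                if (!tok.isEmpty()) {
                    terms.add(tok);
                }
            }
            for (String term : terms) {
                postings.computeIfAbsent(term, k -> new HashSet<>()).add(uri);
            }
            termsByUri.put(uri, terms);
        }

        /** Evaluates a purely conjunctive query such as "slide AND DASL"; returns sorted URIs. */
        public Set<String> search(String query) {
            Set<String> result = null;
            for (String term : query.trim().split("\\s+AND\\s+")) {
                Set<String> hits = postings.getOrDefault(term.toLowerCase(Locale.ROOT), Set.of());
                if (result == null) {
                    result = new TreeSet<>(hits);
                } else {
                    result.retainAll(hits);
                }
            }
            return result;
        }
    }
}
```

In a real integration, `afterStore` would roughly hand a Lucene document to an `IndexWriter` instead of filling a `HashMap`, and the SEARCH handler would run the parsed query through an `IndexSearcher`.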

This lets you keep whatever store you want for persistence, and it lets DASL clients interoperate directly. Note that DASL has pluggable search languages; for example (modified from the DASL spec):

SEARCH / HTTP/1.1
Host: jakarta.apache.org
Content-Type: application/xml
Content-Length: xxx

<?xml version="1.0" encoding="UTF-8"?>
<d:searchrequest xmlns:d="DAV:"
  xmlns:text="http://apache.org/slide/search/fulltext/1.0">
  <text:query>slide AND DASL</text:query>
</d:searchrequest>
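On the client side, such a request could be built with the JDK 11 `java.net.http` API, as in this sketch. The grammar namespace is the made-up one from the example above and `DaslSearchRequest` is a hypothetical class name. Note that `java.net.http` does not let you set `Host` or `Content-Length` yourself; the client computes them.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class DaslSearchRequest {

    // Hypothetical full-text grammar namespace, copied from the example request.
    static final String FULLTEXT_NS = "http://apache.org/slide/search/fulltext/1.0";

    /** Builds the XML body; the query text is escaped so it cannot break the markup. */
    public static String body(String query) {
        return "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
             + "<d:searchrequest xmlns:d=\"DAV:\" xmlns:text=\"" + FULLTEXT_NS + "\">\n"
             + "  <text:query>" + escape(query) + "</text:query>\n"
             + "</d:searchrequest>\n";
    }

    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** SEARCH is not a built-in verb, so it goes through the generic method(...) call. */
    public static HttpRequest build(URI target, String query) {
        return HttpRequest.newBuilder(target)
            .method("SEARCH", HttpRequest.BodyPublishers.ofString(body(query)))
            .header("Content-Type", "application/xml")
            .build();
    }
}
```

Sending it would be `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`; a server that supports the grammar would answer with a Multi-Status body.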

might yield

HTTP/1.1 207 Multi-Status
Content-Type: text/xml; charset="utf-8"
Content-Length: xxx

<?xml version="1.0" encoding="UTF-8"?>
<D:multistatus xmlns:D="DAV:" xmlns:meta="http://whatever.org/">
  <D:response>
    <D:href>http://jakarta.apache.org/slide/index.html</D:href>
    <D:propstat>
      <D:prop>
        ...
      </D:prop>
      <D:status>HTTP/1.1 200 OK</D:status>
    </D:propstat>
  </D:response>
  ...
</D:multistatus>
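A client mostly needs the `D:href` of each `D:response` to know which resources matched. Here is a minimal sketch using the JDK's namespace-aware DOM parser; `MultistatusReader` is a hypothetical name. Matching on the `DAV:` namespace rather than on the `D:` prefix matters, since a server may choose any prefix.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class MultistatusReader {

    private static final String DAV = "DAV:";

    /** Returns the first href of every DAV:response element, in document order. */
    public static List<String> hrefs(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // match on the namespace URI, not the prefix
            Document doc = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList responses = doc.getElementsByTagNameNS(DAV, "response");
            List<String> out = new ArrayList<>();
            for (int i = 0; i < responses.getLength(); i++) {
                Element response = (Element) responses.item(i);
                NodeList href = response.getElementsByTagNameNS(DAV, "href");
                if (href.getLength() > 0) {
                    out.add(href.item(0).getTextContent().trim());
                }
            }
            return out;
        } catch (Exception e) {
            throw new IllegalArgumentException("not a parsable multistatus body", e);
        }
    }
}
```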

Note that query language capability discovery is done through the OPTIONS call:

OPTIONS / HTTP/1.1
Host: jakarta.apache.org

might return

HTTP/1.1 200 OK
Allow: OPTIONS, GET, HEAD, POST, PUT, DELETE, TRACE, COPY, MOVE
Allow: MKCOL, PROPFIND, PROPPATCH, LOCK, UNLOCK, SEARCH
DASL: <DAV:basicsearch>
DASL: <http://apache.org/slide/search/fulltext/1.0>

and the last line indicates that we support that specific type of query language.
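Checking for that grammar on the client could look like this sketch, assuming the `DASL` header values have already been collected (with `java.net.http` that would be `response.headers().allValues("DASL")`). Each grammar is advertised as a URI in angle brackets, and one header may carry several of them separated by commas; `DaslCapabilities` is a hypothetical name.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DaslCapabilities {

    // A grammar is advertised as a URI enclosed in angle brackets.
    private static final Pattern CODED_URL = Pattern.compile("<([^>]+)>");

    /** Collects every advertised grammar URI, in the order the server listed them. */
    public static Set<String> grammars(List<String> daslHeaderValues) {
        Set<String> out = new LinkedHashSet<>();
        for (String value : daslHeaderValues) {
            Matcher m = CODED_URL.matcher(value);
            while (m.find()) {
                out.add(m.group(1).trim());
            }
        }
        return out;
    }
}
```

With the OPTIONS response above, `grammars(...)` would contain both `DAV:basicsearch` and the full-text grammar URI, so the client knows it may send the full-text query.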

Hope this helps.

--
Stefano.
