Hi Murray,

Thanks for the info.

You're right, the only reason to store plain text is to permit searching.

I think your approach will work for me. I don't know anything about
Lucene, so I have a lot to read and investigate.
I'll be back soon with more questions ... :)

Regards,

Xoan

2005/4/22, Murray Altheim <[EMAIL PROTECTED]>:

> Xoan,
> 
> All searches happen this way, but that process of indexing goes
> on *before* the user does the search, which is why it seems fast.
> I've integrated Lucene into my Xindice collections, with a
> listener that notes when a document is created, changed or deleted.
> There's an initial cost of indexing the whole collection (if the
> database is populated all at once), but the cost is incremental
> and almost unnoticeable otherwise.
> 
> Because Lucene uses a model whereby you feed documents to various
> indexers depending on their type (so a text document goes to a
> different one than an HTML document, which needs a text stripper
> to remove the markup), you don't need a separate text document
> stored for each HTML document, if the only reason you're doing
> that is having the text available for searching. You only create
> the text temporarily for the indexer to function, then dump it.
> 
> Murray
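
To make the approach described above concrete, here is a minimal sketch in Python. It is not Lucene and not Murray's actual integration: `ToyIndex`, `TextStripper`, and the `on_document_saved` / `on_document_deleted` hooks are hypothetical names standing in for the listener and the indexer. It only illustrates the two ideas: the index is updated when a document changes (not at search time), and the stripped text exists only temporarily while indexing.

```python
from collections import defaultdict
from html.parser import HTMLParser
import re


class TextStripper(HTMLParser):
    """Collects only the text nodes of an HTML document."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def strip_markup(html):
    """Return the plain text of an HTML string, markup removed."""
    stripper = TextStripper()
    stripper.feed(html)
    stripper.close()
    return " ".join(stripper.chunks)


class ToyIndex:
    """Hypothetical in-memory inverted index; a stand-in, not Lucene."""

    def __init__(self):
        # term -> ids of the documents that contain it
        self.postings = defaultdict(set)

    def on_document_saved(self, doc_id, html):
        # Called on create or change: drop stale entries, then re-index.
        self.on_document_deleted(doc_id)
        text = strip_markup(html)  # temporary; never stored anywhere
        for term in re.findall(r"\w+", text.lower()):
            self.postings[term].add(doc_id)

    def on_document_deleted(self, doc_id):
        for ids in self.postings.values():
            ids.discard(doc_id)

    def search(self, term):
        # Only a lookup happens here; the indexing work was done earlier.
        return sorted(self.postings.get(term.lower(), ()))
```

In this sketch, a database listener would call `on_document_saved` or `on_document_deleted` for each event, so only the initial bulk load pays the cost of indexing the whole collection.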
