Hi Murray,

Thanks for the info.

You're right, the only reason to store plain text is to permit searching. I think your approach will work for me. I don't know anything about Lucene, so I have a lot to read and investigate. I'll come back soon with more questions ... :)

Regards,
Xoan

2005/4/22, Murray Altheim <[EMAIL PROTECTED]>:
> Xoan,
>
> All searches happen this way, but that process of indexing goes
> on *before* the user does the search, which is why it seems fast.
> I've integrated Lucene into my Xindice collections, with a
> listener that notes when a document is created, changed or deleted.
> There's an initial cost of indexing the whole collection (if the
> database is populated all at once), but the cost is incremental
> and almost unnoticeable otherwise.
>
> Because Lucene uses a model whereby you feed documents to various
> indexers depending on their type (so a text document goes to a
> different one than an HTML document, which needs a text stripper
> to remove the markup), you don't need a separate text document
> stored for each HTML document, if the only reason you're doing
> that is having the text available for searching. You only create
> the text temporarily for the indexer to function, then dump it.
>
> Murray
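
The approach quoted above (strip the markup, index the resulting text, then throw the text away, all driven by create/change/delete events) can be sketched roughly as follows. This is only a minimal illustration in plain Python and does not use Lucene or Xindice: ToyIndex, strip_markup, on_saved and on_deleted are hypothetical names invented for the example, and the "index" is just an in-memory dictionary of term -> document ids.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser


class TextStripper(HTMLParser):
    """Collects only the text nodes of an HTML document."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_markup(html):
    """Return the text content of an HTML string (hypothetical helper)."""
    stripper = TextStripper()
    stripper.feed(html)
    stripper.close()
    return " ".join(stripper.parts)


class ToyIndex:
    """Hypothetical stand-in for a real indexer: term -> set of document ids."""

    def __init__(self):
        self.postings = defaultdict(set)

    def on_saved(self, doc_id, html):
        # Listener hook for "document created or changed".
        self.on_deleted(doc_id)          # drop stale terms first
        text = strip_markup(html)        # temporary plain text
        for term in re.findall(r"\w+", text.lower()):
            self.postings[term].add(doc_id)
        # 'text' goes out of scope here; it is never stored.

    def on_deleted(self, doc_id):
        # Listener hook for "document deleted".
        for ids in self.postings.values():
            ids.discard(doc_id)

    def search(self, term):
        return sorted(self.postings.get(term.lower(), set()))
```

With this sketch, calling on_saved once per stored HTML document builds the index incrementally, and search only ever consults the index, never the documents themselves. A real setup would hand the stripped text to Lucene instead of a dictionary.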