Jim Fulton wrote at 2005-5-27 10:45 -0400:
> ...
>> You cannot make text extraction cheap (as it handles potentially large
>> data).
>
>You can't make it cheap in all applications. For most applications,
>text extraction and comparison is very cheap.
>
>I'm guessing that you are referring to indexing large (book size)
>documents. I would argue that this is pretty specialized.

No, I am speaking about a repository of office documents (letters,
reports, drafts, documentation, ...), which is apparently not too rare,
at least in a Plone-like context.

>And it is usually not the case that text extraction is expensive.

Last year I analysed text extraction from office documents.
For documents of about 1 MB, wvWare extraction took on the order
of seconds; OpenOffice text extraction took on the order of 10 seconds
(after optimization; the standard, unoptimized setup takes twice as long).

I definitely do not want to pay this cost for every change to a
metadatum or every workflow change.

While a user accepts a delay of some seconds when he uploads a large
document, he finds it unacceptable to wait for seconds when he
performs, e.g., a workflow action on such a document.

-- 
Dieter
_______________________________________________
Zope3-dev mailing list
Zope3email@example.com
Unsub: http://mail.zope.org/mailman/options/zope3-dev/archive%40mail-archive.com
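
To illustrate the concern about metadata and workflow changes, here is a
minimal sketch of one possible way to avoid repeated extraction: remember a
digest of the document body and run the expensive extractor only when that
digest changes. This is not how Zope or Plone implement indexing; the names
LazyTextIndexer, reindex and extract are hypothetical and exist only for
this example.

```python
import hashlib


class LazyTextIndexer:
    """Hypothetical helper: re-extract text only when the body changes."""

    def __init__(self, extract):
        # `extract` is an assumed expensive callable: bytes -> str
        # (e.g. a wrapper around an external office-document converter).
        self._extract = extract
        self._cache = {}       # doc_id -> (body digest, extracted text)
        self.extractions = 0   # how often the expensive step really ran

    def reindex(self, doc_id, body):
        """Return the searchable text for `body`, reusing cached text
        when the body is byte-for-byte the same as last time."""
        digest = hashlib.sha256(body).hexdigest()
        cached = self._cache.get(doc_id)
        if cached is not None and cached[0] == digest:
            # Only metadata or workflow state changed: skip extraction.
            return cached[1]
        self.extractions += 1
        text = self._extract(body)
        self._cache[doc_id] = (digest, text)
        return text


# Stand-in extractor for the example; a real one would be far slower.
indexer = LazyTextIndexer(lambda body: body.decode("utf-8", "replace"))
indexer.reindex("report-1", b"quarterly report")  # upload: extraction runs
indexer.reindex("report-1", b"quarterly report")  # workflow action: cached
```

Hashing the body still reads the whole document, but that is typically far
cheaper than converting it, so the user pays the extraction cost on upload
and not on every later workflow action.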