Jim Fulton wrote at 2005-5-27 10:45 -0400:
>> You cannot make text extraction cheap (as it handles potentially large
>You can't make it cheap in all applications. For most applications,
>text extraction and comparison is very cheap.
>I'm guessing that you are referring to indexing large (book-size)
>documents. I would argue that this is pretty specialized.
No, I am speaking about a repository of office documents (letters,
reports, drafts, documentation, ...), which apparently is not too
rare, at least in a Plone-like context.
>And it is usually not the case that text extraction is expensive.
Last year I analysed text extraction from office documents.
wvWare extraction for documents on the order of 1 MB in size
took on the order of seconds; OpenOffice text extraction took
on the order of 10 seconds (after optimization; standard - twice
I definitely do not want to pay this time for every metadata change
or workflow change. While a user accepts a delay of some seconds
when he uploads a large document, he finds it unacceptable to
wait for seconds when he performs, e.g., a workflow action on such
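
One way to keep the extraction cost off metadata and workflow changes is to re-extract only when the document body itself has changed. The following is a minimal sketch, not anything Zope or Plone provides: the ExtractionCache class and the caller-supplied extract function are hypothetical names used for illustration only.

```python
import hashlib


class ExtractionCache:
    """Hypothetical helper: re-run a slow text extractor only when the body changes."""

    def __init__(self, extract):
        self._extract = extract   # slow extractor supplied by the caller (assumption)
        self._digests = {}        # document id -> digest of the last extracted body
        self._texts = {}          # document id -> text extracted from that body
        self.extractions = 0      # how many times the slow path actually ran

    def text_for(self, doc_id, body):
        """Return the extracted text for `body` (bytes), reusing the cache if possible."""
        digest = hashlib.sha256(body).hexdigest()
        if self._digests.get(doc_id) != digest:
            # The body is new or has changed: pay the extraction cost once.
            self._texts[doc_id] = self._extract(body)
            self._digests[doc_id] = digest
            self.extractions += 1
        return self._texts[doc_id]
```

With such a cache, a workflow action that leaves the body untouched costs one digest computation rather than a full extraction; whether that trade-off suits a given index is a separate question.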
Zope3-dev mailing list