--On 12 December 2005 14:54:09 -0500 "Garth B." <[EMAIL PROTECTED]> wrote:
- Digging further in this file, "mimetype" is only defined when extract_content() in content.py calls "icc.addBinary(...)". This only happens when the indexed object provides a txng_get() hook (or I suppose if an adapter exists).
Exactly. That's the intended behavior.
That whole block (around lines 81 - 93) never gets hit with my PDFs or Word docs during indexing. When I index a large number of PDFs I will get a number of TypeErrors raised around line 110 when extract_content() notices that the data isn't a [unicode] string.
Likely because your implementation does not provide the txng_get() hook. I *strongly* recommend providing an adapter for IIndexableContent. The original behavior of TXNG 2.X, providing binary content through an attribute or a method (which is the default behavior of almost all index implementations), is no longer supported in 3.X because it just sucks. So either use txng_get() (which is deprecated for 3.X) or implement the IIndexableContent API. That's the way to go.
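To make the adapter idea concrete, here is a rough, self-contained sketch of the pattern. It does not import TextIndexNG: ContentCollector is only a stand-in for the collector whose addBinary() call is mentioned above, and PDFDocument, PDFIndexableContent, the indexableContent() method name and the "SearchableText" field name are all assumptions made for illustration. Check the interface definition shipped with your TXNG 3 version for the real signatures.

```python
class ContentCollector:
    """Stand-in for the real collector; NOT the TextIndexNG class."""

    def __init__(self):
        self.entries = {}

    def addBinary(self, field, data, mimetype):
        # Binary content must be passed together with its mimetype so
        # that a matching converter can be chosen later on.
        if not isinstance(data, bytes):
            raise TypeError("binary content must be a byte string")
        self.entries[field] = {"data": data, "mimetype": mimetype}


class PDFDocument:
    """Hypothetical content object holding raw PDF bytes."""

    def __init__(self, title, data):
        self.title = title
        self.data = data


class PDFIndexableContent:
    """Hypothetical adapter playing the role of IIndexableContent."""

    def __init__(self, context):
        self.context = context

    def indexableContent(self, fields):
        # Method name is an assumption; it returns a collector filled
        # with the content for the requested index fields.
        icc = ContentCollector()
        if "SearchableText" in fields:
            icc.addBinary("SearchableText", self.context.data,
                          "application/pdf")
        return icc


doc = PDFDocument("report", b"%PDF-1.4 dummy payload")
icc = PDFIndexableContent(doc).indexableContent(["SearchableText"])
print(icc.entries["SearchableText"]["mimetype"])
```

The point of the pattern is that the mimetype travels with the binary data, so the indexer never has to guess what kind of content it was handed. In a real setup the adapter would be registered for your content class and looked up by the index instead of being instantiated by hand.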
_______________________________________________ Zope maillist - Zope@zope.org http://mail.zope.org/mailman/listinfo/zope ** No cross posts or HTML encoding! ** (Related lists - http://mail.zope.org/mailman/listinfo/zope-announce http://mail.zope.org/mailman/listinfo/zope-dev )