Hi all,

I have an item in the portal_catalog of my Plone site whose stored
description is a plain (byte) string.  Meanwhile the code of the real
object has changed, so its description field now returns unicode.
When I recatalog that object, it throws an error:

  Module Products.ZCatalog.Catalog, line 359, in catalogObject
  Module Products.ZCatalog.Catalog, line 318, in updateMetadata
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 159: ordinal not in range(128)
> /home/maurits/buildout/projectdeploy/parts/zope2/lib/python/Products/ZCatalog/Catalog.py(318)updateMetadata()
-> if data.get(index, 0) != newDataRecord:

This happens when the current data in the catalog is compared to the
new data.  If there is a difference, the new data is stored.  But to
compare the old string with the new unicode, Python implicitly
converts the string to unicode using the ascii codec.  This fails
because the string has non-ASCII characters in it.  So basically what
happens is this error:

>>> unicode("ä", 'utf-8') == u"ä"
True
>>> "ä" == u"ä"
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)

Logical enough.  This can be fixed in ZCatalog:

[EMAIL PROTECTED]:~/svn/Zope-210/lib/python/Products/ZCatalog $ svn diff
Index: Catalog.py
===================================================================
--- Catalog.py  (revision 84388)
+++ Catalog.py  (working copy)
@@ -304,7 +304,15 @@
                 # meta_data is stored as a tuple for efficiency
                 data[index] = newDataRecord
         else:
-            if data.get(index, 0) != newDataRecord:
+            try:
+                changed = data.get(index, 0) != newDataRecord
+            except UnicodeDecodeError:
+                # Converting some string to unicode fails.  This
+                # conversion happens when a string and a unicode need
+                # to be compared.  Those two are not the same, so
+                # logically there has been a change, so:
+                changed = True
+            if changed:
                 data[index] = newDataRecord
         return index
         
Index: tests/testCatalog.py
===================================================================
--- tests/testCatalog.py        (revision 84388)
+++ tests/testCatalog.py        (working copy)
@@ -1,3 +1,4 @@
+# -*- coding: utf-8 -*-
 ##############################################################################
 #
 # Copyright (c) 2002 Zope Corporation and Contributors. All Rights Reserved.
@@ -177,6 +177,13 @@
     def __nonzero__(self):
         self.fail("__nonzero__() was called")
 
+class zdummyText(ExtensionClass.Base):
+    def __init__(self, text):
+        self.text = text
+
+    def title(self):
+        return self.text
+
 class FakeTraversalError(KeyError):
     """fake traversal exception for testing"""
 
@@ -261,6 +268,12 @@
         data = self._catalog.getMetadataForUID('1')
         self.assertEqual(data['title'], '1')
 
+        text = zdummyText('A string with an accent: \xc3\xa4.')
+        self._catalog.catalog_object(text, '1')
+        text.text = unicode("A simple unicode.")
+        self._catalog.catalog_object(text, '1')
+        
+
     def testReindexIndexDoesntDoMetadata(self):
         self.d['0'].num = 9999
         self._catalog.reindexIndex('title', {})
===================================================================

With that change it works: on the live site I can edit and save that
item without errors.
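
The idea behind the Catalog.py change can be sketched in isolation.
This is only an illustration: the function name metadata_changed is
made up here and is not part of ZCatalog.  It treats a failed
comparison as "the record changed", which is what the patch does.

```python
def metadata_changed(old_record, new_record):
    """Return True if the stored record should be replaced.

    Hypothetical helper, not part of ZCatalog.
    """
    try:
        return old_record != new_record
    except UnicodeDecodeError:
        # Comparing a non-ASCII byte string with a unicode string can
        # fail during the implicit ascii decode.  The two values are
        # of different types, so count this as a change.
        return True
```

A caller would then store the new record only when
metadata_changed(data.get(index, 0), newDataRecord) is true.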

Without the change to the code, the added test fails at precisely the
point where the change should be done.  But if I change the code the
test still fails because something similar goes wrong in the
KeywordIndex, with this traceback:

===================================================================
Error in test testUpdateMetadata (Products.ZCatalog.tests.testCatalog.TestZCatalog)
Traceback (most recent call last):
  File "unittest.py", line 260, in run
    testMethod()
  File "/home/maurits/svn/Zope-210/lib/python/Products/ZCatalog/tests/testCatalog.py", line 274, in testUpdateMetadata
    self._catalog.catalog_object(text, '1')
  File "/home/maurits/svn/Zope-210/lib/python/Products/ZCatalog/ZCatalog.py", line 536, in catalog_object
    update_metadata=update_metadata)
  File "/home/maurits/svn/Zope-210/lib/python/Products/ZCatalog/Catalog.py", line 368, in catalogObject
    blah = x.index_object(index, object, threshold)
  File "/home/maurits/svn/Zope-210/lib/python/Products/PluginIndexes/common/UnIndex.py", line 235, in index_object
    res += self._index_object(documentId, obj, threshold, attr)
  File "/home/maurits/svn/Zope-210/lib/python/Products/PluginIndexes/KeywordIndex/KeywordIndex.py", line 85, in _index_object
    fdiff = difference(oldKeywords, newKeywords)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 25: ordinal not in range(128)
===================================================================

This is a bit trickier to fix, as the variable fdiff that is
calculated here is needed later on.
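
One possible direction, as a rough sketch only: decode byte strings to
unicode before computing the difference, so both sides have the same
type and no implicit ascii decode happens.  The helper names and the
utf-8 default are assumptions for illustration, and plain Python sets
stand in for the keyword sets that KeywordIndex really uses.  The
sketch relies on the bytes name, so it needs Python 2.6 or later.

```python
def to_text(value, encoding='utf-8'):
    """Decode byte strings to unicode; leave other values untouched.

    Hypothetical helper; the encoding default is an assumption.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def keyword_difference(old_keywords, new_keywords, encoding='utf-8'):
    """Return the keywords in old_keywords that are not in new_keywords.

    Both inputs are normalized to unicode first, so a byte string and
    a unicode string with the same text count as the same keyword.
    """
    old = set(to_text(k, encoding) for k in old_keywords)
    new = set(to_text(k, encoding) for k in new_keywords)
    return old - new
```

Note that normalizing like this also changes which values would be
compared against the index, so it is not a drop-in replacement for
the existing difference call.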


But at this point I would like to ask: is this a solution direction I
want to explore?  Is the basic fix above sane?

Or should we change nothing here and should add-on developers just be
careful of what they let end up in the catalog?

The fix/workaround from a user's point of view is: clear and rebuild
the catalog as that gets rid of any old data so no comparison needs to
be done anymore.  That solves the problem for me.


For reference, the item from above is a PoiIssue.  It was created when
its class had this method (simplified):

    def Description(self):
        # return the contents of the details field
        return self.getRawDetails()

And currently the code is this:

    def Description(self):
        details = self.getRawDetails()
        if not isinstance(details, unicode):
            encoding = getSiteEncoding(self)
            details = unicode(details, encoding)
        return details

That change is meant to solve another occasional unicode error when
adding issues in Japanese: http://plone.org/products/poi/issues/135

I am the maintainer of Poi btw, and I am writing some migration code
now that triggers this error.  So writing some other migration to
first fix that recatalog issue specifically for the Poi content is
doable too.


-- 
Maurits van Rees | http://maurits.vanrees.org/
            Work | http://zestsoftware.nl/
"This is your day, don't let them take it away." [Barlow Girl]
