Re: [Zope-dev] ZCatalog: updateMetadata and comparing string and unicode

2008-03-06 Thread Dieter Maurer
Maurits van Rees wrote at 2008-3-5 23:57 +:
 ...
> I have an item in the portal_catalog of my Plone site that has some
> string as description.  The real object meanwhile has had a code
> change so the description field now returns unicode.  When I now
> recatalog that object it throws an error:
>
>   Module Products.ZCatalog.Catalog, line 359, in catalogObject
>   Module Products.ZCatalog.Catalog, line 318, in updateMetadata
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 159: ordinal not in range(128)
> > /home/maurits/buildout/projectdeploy/parts/zope2/lib/python/Products/ZCatalog/Catalog.py(318)updateMetadata()
> -> if data.get(index, 0) != newDataRecord:

You must not mix unicode and str keys in the same index.
If you do, errors like the one above are very likely.

You can try the following approaches:

  *  If you know the encoding used by your str objects,
     you can set Python's default encoding to this encoding.
     Whenever unicode and str come together, the str
     is converted to unicode using this encoding (which hopefully
     is the correct one in all such cases).

     sys.setdefaultencoding is only available during startup,
     so the default encoding must be set in a sitecustomize
     or site module.

  *  You completely switch to unicode for the given index
     and convert the BTrees used by the index.

     An index usually uses two BTrees: the so-called forward index
     (usually called _index), which maps the index terms to the sets
     of record ids indexed under each term, and the reverse index
     (usually called _unindex), which maps record ids to the values
     indexed for the corresponding objects.

     You need to convert the keys of the forward index
     and the values of the reverse index. For a FieldIndex,
     the value is the index term; for a KeywordIndex it is
     a sequence of index terms (all of which need to be converted).

     The forward index can be converted as follows
     (your_encoding stands for the encoding of the existing str keys):

       self._index = OOBTree(((s.decode(your_encoding), v) for (s, v) in self._index.items()))

     The reverse index uses an IOBTree and is converted similarly,
     but the details depend on the index type.
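
The conversion of both BTrees can be sketched roughly as follows. This is only an illustration, not ZCatalog code: plain dicts stand in for the OOBTree and IOBTree, the helper names and the utf-8 encoding are assumptions, and it is written for Python 3, where bytes plays the role of the Python 2 str and str the role of unicode. Unlike the one-liner above, it merges the record id sets when a term was stored both ways.

```python
def to_text(term, encoding="utf-8"):
    """Decode byte strings; leave text (and anything else) unchanged."""
    if isinstance(term, bytes):
        return term.decode(encoding)
    return term


def convert_forward_index(index, encoding="utf-8"):
    """Return a new forward index (term -> set of record ids) with text keys.

    If a term was stored both as bytes and as text, the record id sets
    of the two entries are merged.
    """
    converted = {}
    for term, record_ids in index.items():
        converted.setdefault(to_text(term, encoding), set()).update(record_ids)
    return converted


def convert_reverse_index(unindex, encoding="utf-8"):
    """Return a new reverse index (record id -> value) with text values.

    A FieldIndex stores one term per record id; a KeywordIndex stores a
    sequence of terms, so every element is converted.
    """
    converted = {}
    for record_id, value in unindex.items():
        if isinstance(value, (list, tuple)):
            converted[record_id] = type(value)(to_text(t, encoding) for t in value)
        else:
            converted[record_id] = to_text(value, encoding)
    return converted


# Hypothetical data: the same term stored once as UTF-8 bytes and once as text.
forward = {b"\xc3\xa4": {1}, "\xe4": {2}, b"plain": {3}}
reverse_field = {1: b"\xc3\xa4", 2: "\xe4", 3: b"plain"}
reverse_keyword = {1: [b"\xc3\xa4", b"plain"]}

print(convert_forward_index(forward))
print(convert_reverse_index(reverse_field))
print(convert_reverse_index(reverse_keyword))
```

On a real index the result would have to be written back into the BTrees, and the record id sets would be the index's own set types rather than Python sets.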



-- 
Dieter
___
Zope-Dev maillist  -  Zope-Dev@zope.org
http://mail.zope.org/mailman/listinfo/zope-dev
**  No cross posts or HTML encoding!  **
(Related lists - 
 http://mail.zope.org/mailman/listinfo/zope-announce
 http://mail.zope.org/mailman/listinfo/zope )


Re: [Zope-dev] ZCatalog: updateMetadata and comparing string and unicode

2008-03-06 Thread Benji York

Dieter Maurer wrote:

> sys.setdefaultencoding is only available at startup.
> Thus, setting defaultencoding must happen in a sitecustomize
> or site module.


Or if you're sufficiently devious, it's available any time (not that 
actually using it is a good idea, but...):


 >>> import sys
 >>> sys.setdefaultencoding
 Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
 AttributeError: 'module' object has no attribute 'setdefaultencoding'
 >>> del sys.modules['sys']
 >>> import sys
 >>> sys.setdefaultencoding
 <built-in function setdefaultencoding>

--
Benji York
Senior Software Engineer
Zope Corporation


[Zope-dev] ZCatalog: updateMetadata and comparing string and unicode

2008-03-05 Thread Maurits van Rees
Hi all,

I have an item in the portal_catalog of my Plone site that has some
string as description.  The real object meanwhile has had a code
change so the description field now returns unicode.  When I now
recatalog that object it throws an error:

  Module Products.ZCatalog.Catalog, line 359, in catalogObject
  Module Products.ZCatalog.Catalog, line 318, in updateMetadata
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 159: ordinal not in range(128)
> /home/maurits/buildout/projectdeploy/parts/zope2/lib/python/Products/ZCatalog/Catalog.py(318)updateMetadata()
-> if data.get(index, 0) != newDataRecord:

This happens when the current data in the catalog is compared to the
new data.  If there is a difference, the new data is stored.  But to
compare the old string with the new unicode, the string is first
converted to unicode.  This fails because the string contains
non-ASCII characters.  So basically what happens is this error:

 >>> unicode("ä", 'utf-8') == u"ä"
 True
 >>> "ä" == u"ä"
 Traceback (most recent call last):
   File "<stdin>", line 1, in ?
 UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)
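
The implicit conversion behind that comparison can be made explicit. The sketch below is an illustration only: it is written for Python 3, where the implicit conversion no longer exists and bytes stands in for the Python 2 str, and the helper name try_decode is hypothetical.

```python
def try_decode(raw, encoding):
    """Decode raw bytes, returning (text, None) or (None, error)."""
    try:
        return raw.decode(encoding), None
    except UnicodeDecodeError as exc:
        return None, exc


raw = b"\xc3\xa4"  # the UTF-8 bytes of "ä"

# Python 2 decoded the str with the default encoding (ascii) before comparing.
text, error = try_decode(raw, "ascii")
print(text, error)

# Decoded with the encoding the bytes were written in, the values compare equal.
text, error = try_decode(raw, "utf-8")
print(text == "\xe4")
```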

Logical enough.  This can be fixed in ZCatalog:

[EMAIL PROTECTED]:~/svn/Zope-210/lib/python/Products/ZCatalog $ svn diff
Index: Catalog.py
===
--- Catalog.py  (revision 84388)
+++ Catalog.py  (working copy)
@@ -304,7 +304,15 @@
             # meta_data is stored as a tuple for efficiency
             data[index] = newDataRecord
         else:
-            if data.get(index, 0) != newDataRecord:
+            try:
+                changed = data.get(index, 0) != newDataRecord
+            except UnicodeDecodeError:
+                # Converting some string to unicode fails.  This
+                # conversion happens when a string and a unicode need
+                # to be compared.  Those two are not the same, so
+                # logically there has been a change, so:
+                changed = True
+            if changed:
                 data[index] = newDataRecord
         return index
 
Index: tests/testCatalog.py
===
--- tests/testCatalog.py        (revision 84388)
+++ tests/testCatalog.py        (working copy)
@@ -1,3 +1,4 @@
+# -*- coding: utf-8 -*-
 ##
 #
 # Copyright (c) 2002 Zope Corporation and Contributors. All Rights Reserved.
@@ -177,6 +177,13 @@
     def __nonzero__(self):
         self.fail("__nonzero__() was called")
 
+class zdummyText(ExtensionClass.Base):
+    def __init__(self, text):
+        self.text = text
+
+    def title(self):
+        return self.text
+
 class FakeTraversalError(KeyError):
     """fake traversal exception for testing"""
 
@@ -261,6 +268,12 @@
         data = self._catalog.getMetadataForUID('1')
         self.assertEqual(data['title'], '1')
 
+        text = zdummyText('A string with an accent: \xc3\xa4.')
+        self._catalog.catalog_object(text, '1')
+        text.text = unicode("A simple unicode.")
+        self._catalog.catalog_object(text, '1')
+
+
     def testReindexIndexDoesntDoMetadata(self):
         self.d['0'].num = 
         self._catalog.reindexIndex('title', {})
===

With that change it works: on the live site I can edit and save that
item without errors.
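
Taken out of the catalog, the idea of the patch is roughly the following. This is a sketch, not the code in Catalog.py: the function name metadata_changed is made up, and because Python 3 never raises when comparing bytes with str, the class UndecodableValue is a hypothetical stand-in that imitates the failing Python 2 comparison.

```python
def metadata_changed(old, new):
    """Return True if the stored record should be rewritten."""
    try:
        return old != new
    except UnicodeDecodeError:
        # The values could not even be compared, so treat them as different.
        return True


class UndecodableValue(object):
    """Hypothetical stand-in for a Python 2 str holding non-ASCII bytes.

    Comparing it raises the same kind of error as the str/unicode
    comparison in the traceback above.
    """

    def __ne__(self, other):
        raise UnicodeDecodeError("ascii", b"\xc2", 0, 1, "ordinal not in range(128)")


print(metadata_changed("same", "same"))
print(metadata_changed("old", "new"))
print(metadata_changed(UndecodableValue(), "new"))
```

Treating an incomparable pair as changed means the new value simply overwrites the old one, which is what recataloging is meant to do anyway.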

Without the change to the code, the added test fails at precisely the
point where the change should be done.  But if I change the code the
test still fails because something similar goes wrong in the
KeywordIndex, with this traceback:

===
Error in test testUpdateMetadata (Products.ZCatalog.tests.testCatalog.TestZCatalog)
Traceback (most recent call last):
  File "unittest.py", line 260, in run
    testMethod()
  File "/home/maurits/svn/Zope-210/lib/python/Products/ZCatalog/tests/testCatalog.py", line 274, in testUpdateMetadata
    self._catalog.catalog_object(text, '1')
  File "/home/maurits/svn/Zope-210/lib/python/Products/ZCatalog/ZCatalog.py", line 536, in catalog_object
    update_metadata=update_metadata)
  File "/home/maurits/svn/Zope-210/lib/python/Products/ZCatalog/Catalog.py", line 368, in catalogObject
    blah = x.index_object(index, object, threshold)
  File "/home/maurits/svn/Zope-210/lib/python/Products/PluginIndexes/common/UnIndex.py", line 235, in index_object
    res += self._index_object(documentId, obj, threshold, attr)
  File "/home/maurits/svn/Zope-210/lib/python/Products/PluginIndexes/KeywordIndex/KeywordIndex.py", line 85, in _index_object
    fdiff = difference(oldKeywords, newKeywords)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 25: ordinal not in range(128)
===

This is a bit trickier to fix,