Re: [Zope3-dev] zope.tal.xmlparser.XMLParser() dislikes unicode

2007-01-14 Thread Andreas Jung



--On 14 January 2007 10:48:06 +0100 Bernd Dorn <[EMAIL PROTECTED]> wrote:



I am not sure if this behavior is intentional. Is the XMLParser supposed
to deal with unicode strings, or will it only accept a standard Python
string? A workaround inside parseString() would be to check for unicode
and convert the string on the fly to a Python string with utf-8 encoding.
This is possibly a limitation of the underlying Expat parser... any
recommendation on how to deal with this issue?


IMHO it should only accept strings, because the value should be an XML
string and therefore always has to be encoded in 'utf-8' or in the
encoding specified in the processing instruction.



I disagree with that. Since Zope 3 is supposed to use unicode internally
(at least that's the legend), it should also support unicode at the parser
level. Other languages like Java also store XML as unicode strings and
support parsing them.


Andreas



___
Zope3-dev mailing list
Zope3-dev@zope.org
Unsub: http://mail.zope.org/mailman/options/zope3-dev/archive%40mail-archive.com



Re: [Zope3-dev] zope.tal.xmlparser.XMLParser() dislikes unicode

2007-01-14 Thread Bernd Dorn


On 13.01.2007, at 18:49, Andreas Jung wrote:


Hi,

the XMLParser.parseString() method raises an exception

 File "/opt/python-2.4.4/lib/python2.4/unittest.py", line 260, in run
   testMethod()
 File "/Users/ajung_data/sandboxes/Zope/Zope/lib/python/zope/tal/tests/test_xmlparser.py", line 127, in test_xx
   self._run_check(xml, ())
 File "/Users/ajung_data/sandboxes/Zope/Zope/lib/python/zope/tal/tests/test_xmlparser.py", line 106, in _run_check
   parser.parseString(source)
 File "/Users/ajung_data/sandboxes/Zope/Zope/lib/python/zope/tal/xmlparser.py", line 77, in parseString
   self.parser.Parse(s, 1)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 43-48: ordinal not in range(128)


if the string to be parsed is a unicode string and contains some
non-ascii chars. The following snippet from a private unittest
(test_xmlparser.py) shows the error.

   def test_xx(self):
       xml = unicode('>üöä', 'iso-8859-15')
       self._run_check(xml, ())

I am not sure if this behavior is intentional. Is the XMLParser supposed
to deal with unicode strings, or will it only accept a standard Python
string? A workaround inside parseString() would be to check for unicode
and convert the string on the fly to a Python string with utf-8 encoding.
This is possibly a limitation of the underlying Expat parser... any
recommendation on how to deal with this issue?
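
The workaround described above could look roughly like the following. This
is only a sketch under stated assumptions: it uses Python 3's
xml.parsers.expat directly instead of zope.tal and the Python 2.4 setup from
the traceback, and parse_text is a hypothetical helper name, not part of any
library. It checks for text input, encodes it as UTF-8, and tells Expat to
treat the input as UTF-8 regardless of any declared encoding.

```python
from xml.parsers import expat


def parse_text(source):
    """Parse str or bytes with Expat; return (element names, character data).

    Hypothetical helper for illustration, not part of zope.tal.
    """
    if isinstance(source, str):
        # Text has no byte encoding of its own, so encode it as UTF-8 and
        # tell Expat to use UTF-8, overriding any declared encoding.
        data = source.encode("utf-8")
        parser = expat.ParserCreate("utf-8")
    else:
        # Bytes are passed through; Expat works out their encoding itself.
        data = source
        parser = expat.ParserCreate()

    names, chunks = [], []
    parser.buffer_text = True  # merge adjacent character data callbacks
    parser.StartElementHandler = lambda name, attrs: names.append(name)
    parser.CharacterDataHandler = chunks.append
    parser.Parse(data, True)
    return names, "".join(chunks)


names, text = parse_text("<doc>üöä</doc>")
```

Passing the encoding to ParserCreate() matters here: without the override, a
unicode string that carries a declaration for some other encoding would be
encoded as UTF-8 but decoded according to the declaration.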


IMHO it should only accept strings, because the value should be an XML
string and therefore always has to be encoded in 'utf-8' or in the
encoding specified in the processing instruction.
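
For comparison, here is a small sketch of the byte-string view taken in this
reply, again assuming Python 3's xml.parsers.expat rather than the original
Python 2.4 setup. parse_bytes is a hypothetical helper. The same characters
are parsed once from ISO-8859-1 bytes that name their encoding in the XML
declaration (what the message calls the processing instruction) and once
from UTF-8 bytes with no declaration.

```python
from xml.parsers import expat


def parse_bytes(data):
    """Parse an encoded XML byte string and return its character data.

    Hypothetical helper for illustration only.
    """
    chunks = []
    parser = expat.ParserCreate()  # no override: the document decides
    parser.buffer_text = True      # merge adjacent character data callbacks
    parser.CharacterDataHandler = chunks.append
    parser.Parse(data, True)
    return "".join(chunks)


# Bytes in ISO-8859-1, with the encoding named in the XML declaration.
latin1_doc = (
    '<?xml version="1.0" encoding="iso-8859-1"?><doc>üöä</doc>'
).encode("iso-8859-1")

# Bytes in UTF-8, the default when nothing is declared.
utf8_doc = "<doc>üöä</doc>".encode("utf-8")

latin1_text = parse_bytes(latin1_doc)
utf8_text = parse_bytes(utf8_doc)
```

In both cases the handler receives decoded text, so the encoding only
matters at the boundary where bytes enter the parser.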


Bernd



Andreas



