[issue7185] csv reader utf-8 BOM error

2009-10-22 Thread Walter Dörwald
Walter Dörwald added the comment: Then the solution should simply be to use "utf-8-sig" as the encoding, instead of "utf-8". -- ___ Python tracker <http://bu
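The suggestion above can be sketched as follows. This is a minimal illustration that assumes current Python 3, where the csv module reads text, not the 2.x module the issue was filed against. The helper name read_rows and the sample data are hypothetical.

```python
import csv
import io

# Sample CSV bytes; encoding with "utf-8-sig" prepends the 3-byte BOM.
raw = "name,value\r\na,1\r\n".encode("utf-8-sig")

def read_rows(data, encoding):
    # Hypothetical helper: decode the bytes with the given codec and parse as CSV.
    text = io.TextIOWrapper(io.BytesIO(data), encoding=encoding, newline="")
    return list(csv.reader(text))

rows_sig = read_rows(raw, "utf-8-sig")  # BOM is treated as a signature and dropped
rows_plain = read_rows(raw, "utf-8")    # BOM survives as U+FEFF in the first field
```

With "utf-8" the first header field comes back as "\ufeffname", which is the kind of surprise the issue title describes. With "utf-8-sig" it is plain "name".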

[issue7185] csv reader utf-8 BOM error

2009-10-22 Thread Walter Dörwald
Walter Dörwald added the comment: http://docs.python.org/library/csv.html#module-csv states: This version of the csv module doesn’t support Unicode input. Also, there are currently some issues regarding ASCII NUL characters. Accordingly, all input should be UTF-8 or printable ASCII to be safe

[issue7138] elementtree segfaults on invalid xml declaration

2009-10-15 Thread Walter Dörwald
Walter Dörwald added the comment: Here is a stacktrace of the crash with the system Python 2.6.1 on Mac OS X 10.6.1: Program received signal EXC_BAD_ACCESS, Could not access memory. Reason: KERN_INVALID_ADDRESS at address: 0x00010100 0x7fff810f96b8 in XML_SetEncoding () (gdb) bt #0

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2009-08-12 Thread Walter Dörwald
Changes by Walter Dörwald : -- nosy: -doerwalter ___ Python tracker <http://bugs.python.org/issue2636> ___ ___ Python-bugs-list mailing list Unsubscribe:

[issue6331] Add unicode script info to the unicode database

2009-07-01 Thread Walter Dörwald
Walter Dörwald added the comment: Here is a new version that includes a new function scriptl() that returns the script name in lowercase. -- Added file: http://bugs.python.org/file14418/unicode-script-3.diff ___ Python tracker <h

[issue6331] Add unicode script info to the unicode database

2009-06-25 Thread Walter Dörwald
Walter Dörwald added the comment: I was comparing apples and oranges: The 229 entries for the trunk were for a UCS2 build (the patched version was UCS4), with UCS4 there are 317 entries for the trunk. size unicodedata.o gives: __TEXT __DATA __OBJC others dec hex 13622 587057 0

[issue6331] Add unicode script info to the unicode database

2009-06-24 Thread Walter Dörwald
Changes by Walter Dörwald : Added file: http://bugs.python.org/file14356/unicode-script-2.diff ___ Python tracker <http://bugs.python.org/issue6331> ___ ___ Python-bug

[issue6331] Add unicode script info to the unicode database

2009-06-24 Thread Walter Dörwald
Walter Dörwald added the comment: Martin v. Löwis wrote: > Martin v. Löwis added the comment: > > I think the patch is incorrect: the default value for the script > property ought to be Unknown, not Common (despite UCD.html saying the > contrary; see UTR#24 and Scripts.txt).

[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

2009-06-23 Thread Walter Dörwald
Walter Dörwald added the comment: http://bugs.python.org/6331 is a patch that adds unicode script info to the unicode database. -- nosy: +doerwalter ___ Python tracker <http://bugs.python.org/issue2

[issue6331] Add unicode script info to the unicode database

2009-06-23 Thread Walter Dörwald
New submission from Walter Dörwald : This patch adds a function unicodedata.script() that returns information about the script of the Unicode character. -- components: Unicode files: unicode-script.diff keywords: patch messages: 89642 nosy: doerwalter severity: normal status: open title

[issue6213] Incremental encoder incompatibility between 2.x and py3k

2009-06-08 Thread Walter Dörwald
Walter Dörwald added the comment: AFAICR the difference is: 2.x may return any object in getstate(), but py3k must return a (buffered input, integer) tuple. Simply moving py3k's getstate/setstate implementation over to 2.x might do the trick

[issue6213] Incremental encoder incompatibility between 2.x and py3k

2009-06-08 Thread Walter Dörwald
Walter Dörwald added the comment: This was done because the codec state is part of the return value of tell(). To have a reasonable return value (i.e. one with just the position itself) in as many cases as possible, it makes sense to design the codec state in such a way that the most common

[issue2661] Mapping tests cannot be passed by user implementations

2009-05-26 Thread Walter Dörwald
Walter Dörwald added the comment: Any custom mapping class should have a repr test anyway, so IMHO it doesn't matter whether the base test has a repr test or not. The suggested fixes for TestMappingProtocol.test_fromkeys() and TestHashMappingProtocol.test_mutatingiteration() sound OK ho

[issue1866] const arg for PyInt_FromString

2009-05-26 Thread Walter Dörwald
Walter Dörwald added the comment: The patch no longer applies cleanly to the trunk. -- nosy: +doerwalter ___ Python tracker <http://bugs.python.org/issue1

[issue3739] unicode-internal encoder reports wrong length

2009-05-06 Thread Walter Dörwald
Walter Dörwald added the comment: Checked in: r72404,72406 (trunk) r72408 (py3k) As IMHO this is somewhat between a feature and a bugfix, I didn't check it into release26-maint and release30-maint. -- resolution: -> fixed status: open -

[issue5108] Invalid UTF-8 ("%s") length in PyUnicode_FromFormatV()

2009-05-03 Thread Walter Dörwald
Walter Dörwald added the comment: Checked in: r72260 (trunk) r72262 (release26-maint) r72265 (py3k) r72266 (release30-maint) -- resolution: -> fixed status: open -> closed ___ Python tracker <http://bugs.python.org/

[issue5849] Idle 3.01 - invalid syntec error

2009-04-26 Thread Walter Dörwald
Walter Dörwald added the comment: This is not a bug in Python. In Python 3.0 "print" is a function, so print buildConnectionString(myParams) should read print(buildConnectionString(myParams)) Closing as invalid. -- nosy: +doerwalter resolution: -> invalid

[issue5828] Invalid behavior of unicode.lower

2009-04-25 Thread Walter Dörwald
Changes by Walter Dörwald : -- assignee: doerwalter -> loewis ___ Python tracker <http://bugs.python.org/issue5828> ___ ___ Python-bugs-list mailing list Un

[issue5828] Invalid behavior of unicode.lower

2009-04-25 Thread Walter Dörwald
Walter Dörwald added the comment: BTW, are the steps to regenerate the Unicode database documented somewhere? What I did was: cp /Volumes/ftp.unicode.org/Public/5.1.0/ucd/UnicodeData.txt . cp /Volumes/ftp.unicode.org/Public/5.1.0/ucd/CompositionExclusions.txt . cp /Volumes/ftp.unicode.org

[issue5828] Invalid behavior of unicode.lower

2009-04-25 Thread Walter Dörwald
Walter Dörwald added the comment: Checked in: r71896 (py3k) r71897 (release30-maint) -- resolution: -> fixed status: open -> closed ___ Python tracker <http://bugs.python.org/

[issue5828] Invalid behavior of unicode.lower

2009-04-25 Thread Walter Dörwald
Walter Dörwald added the comment: Checked in: r71894 (trunk) r71895 (release26-maint) -- ___ Python tracker <http://bugs.python.org/issue5828> ___ ___ Python-bug

[issue5828] Invalid behavior of unicode.lower

2009-04-25 Thread Walter Dörwald
Walter Dörwald added the comment: I've merged your version of the patch with my changes to the test suite and regenerated the Unicode database. Attached is the resulting patch (diff4.txt) -- Added file: http://bugs.python.org/file13768/diff

[issue5837] support.EnvironmentVarGuard broken

2009-04-25 Thread Walter Dörwald
Walter Dörwald added the comment: Checked in: r71875 (trunk) r71876 (release26-maint) r71881 (py3k) r71885 (release30-maint) -- resolution: -> fixed status: open -> closed ___ Python tracker <http://bugs.python.org/

[issue5837] support.EnvironmentVarGuard broken

2009-04-25 Thread Walter Dörwald
Walter Dörwald added the comment: You're right, checking both in unset() and __exit__() fixes the importlib failures. I'll check in the fix.

[issue5837] support.EnvironmentVarGuard broken

2009-04-25 Thread Walter Dörwald
Walter Dörwald added the comment: This might have something to do with the _keymap hook. -- ___ Python tracker <http://bugs.python.org/issue5837> ___ ___ Pytho

[issue5837] support.EnvironmentVarGuard broken

2009-04-25 Thread Walter Dörwald
Walter Dörwald added the comment: OK, I'll remove the clear method (which is a new feature) and then check it in. -- ___ Python tracker <http://bugs.python.org/i

[issue5837] support.EnvironmentVarGuard broken

2009-04-25 Thread Walter Dörwald
Walter Dörwald added the comment: That's exactly what I was thinking too. Here's the patch. Running the test suite now. -- Added file: http://bugs.python.org/file13764/diff2.txt ___ Python tracker <http://bugs.python.

[issue5837] support.EnvironmentVarGuard broken

2009-04-25 Thread Walter Dörwald
Walter Dörwald added the comment: If you want to restore only those environment variables that have changed, you somehow have to record which *have* changed, i.e. you'd have to go through EnvironmentVarGuard again. I'm working on a patch that

[issue5837] support.EnvironmentVarGuard broken

2009-04-25 Thread Walter Dörwald
Walter Dörwald added the comment: Here's a patch that changes EnvironmentVarGuard to make a copy of os.environ at the start. The set and unset methods are useless now, but I left them in for backwards compatibility. Should they be removed? -- Added file: http://bugs.pytho
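The copy-at-start approach described in this comment can be sketched roughly as below. This is not the attached patch and not the real EnvironmentVarGuard. The context manager environ_snapshot and the variable DEMO_VAR are hypothetical names used only for illustration.

```python
import contextlib
import os

@contextlib.contextmanager
def environ_snapshot():
    # Hypothetical stand-in: copy os.environ on entry, restore the copy on exit.
    saved = dict(os.environ)
    try:
        yield os.environ
    finally:
        os.environ.clear()
        os.environ.update(saved)

os.environ["DEMO_VAR"] = "original"
with environ_snapshot() as env:
    del env["DEMO_VAR"]      # unset, then set again, as in the reported test case
    env["DEMO_VAR"] = "foo"
    inside = os.environ.get("DEMO_VAR")
restored = os.environ.get("DEMO_VAR")
```

Because the whole mapping is restored from the snapshot, no per-variable bookkeeping is needed, which matches the remark that explicit set and unset methods become unnecessary.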

[issue5837] support.EnvironmentVarGuard broken

2009-04-25 Thread Walter Dörwald
New submission from Walter Dörwald : support.EnvironmentVarGuard seems to be broken: import os from test import support print(os.environ.get("HOME")) with support.EnvironmentVarGuard() as env: env.unset("HOME") env.set("HOME", "foo") print(os.e

[issue4951] failure in test_httpservers

2009-04-25 Thread Walter Dörwald
Walter Dörwald added the comment: Hmm, EnvironmentVarGuard seems to be broken: import os from test import support with support.EnvironmentVarGuard() as env: env.unset("HOME") env.set("HOME", "bar") print(os.environ.get("HOME")) I would have

[issue4951] failure in test_httpservers

2009-04-25 Thread Walter Dörwald
Walter Dörwald added the comment: There's an EnvironmentVarGuard context manager in support.py that IMHO should be used for recording changes to the environment variables. Or a new context manager that does what your patch does could be put into support.py. There might be other tests

[issue5828] Invalid behavior of unicode.lower

2009-04-25 Thread Walter Dörwald
Walter Dörwald added the comment: Here is a third version of the patch. AFAICT the logic of the unicode database is as follows: * If the NODELTA_MASK is not set, delta is an offset. * If NODELTA_MASK is set and delta is != 0, delta is the upper/lower/title case character. * If NODELTA_MASK is

[issue5828] Invalid behavior of unicode.lower

2009-04-24 Thread Walter Dörwald
Walter Dörwald added the comment: Updated the patch (diff2.txt) as requested by Amaury. -- Added file: http://bugs.python.org/file13759/diff2.txt ___ Python tracker <http://bugs.python.org/issue5

[issue5828] Invalid behavior of unicode.lower

2009-04-24 Thread Walter Dörwald
Walter Dörwald added the comment: The following patch fixes the problem for me, however it breaks the test suite. The change seems to have been introduced in r66362. Assigning to Martin. -- assignee: -> loewis nosy: +loewis stage: -> patch review Added file: http://bugs.pyth

[issue5828] Invalid behavior of unicode.lower

2009-04-24 Thread Walter Dörwald
Walter Dörwald added the comment: It *does* return u'\u1d79' for me on Python 2.5.2: >>> u'\u1d79'.lower() u'\u1d79' >>> import sys >>> sys.version '2.5.2 (r252:60911, Apr 8 2008, 18:54:00) \n[GCC 3.3.5 (Debian 1:3.3.5-13)]

[issue5108] Invalid UTF-8 ("%s") length in PyUnicode_FromFormatV()

2009-04-09 Thread Walter Dörwald
Walter Dörwald added the comment: The problem with your patch is that it calls PyUnicode_DecodeUTF8() twice. It would be better if step 1 in the code would include the %s format specifiers and step 3 would then call PyUnicode_DecodeUTF8() and put the result into the callresults buffer. BTW, I

[issue5729] Allows tabs for indenting JSON output

2009-04-09 Thread Walter Dörwald
New submission from Walter Dörwald : This patch makes it possible to use tabs for indenting the output of json.dumps(). With this patch the indent argument can now be either an integer specifying the number of spaces per indent level or a string specifying the indent string directly
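For illustration, current Python 3 releases accept both forms of the indent argument described here. The sample data below is made up.

```python
import json

data = {"name": "example", "values": [1, 2]}

# An integer indent means that many spaces per nesting level.
with_spaces = json.dumps(data, indent=2)

# A string indent is used verbatim for each nesting level, here one tab.
with_tabs = json.dumps(data, indent="\t")
```

In with_tabs, nested list items are prefixed by two tab characters, one per level.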

[issue5723] Incomplete json tests

2009-04-08 Thread Walter Dörwald
Walter Dörwald added the comment: test_quopri has a decorator that calls a test using both the C and Python version of the tested function. This decorator looks like this: def withpythonimplementation(testfunc): def newtest(self): # Test default implementation testfunc(self

[issue5640] Wrong print() result when unicode error handler is not 'strict'

2009-04-03 Thread Walter Dörwald
Walter Dörwald added the comment: Indeed this patch does fix the bug. Go ahead and check it in. -- ___ Python tracker <http://bugs.python.org/issue5640> ___ ___

[issue5640] Wrong print() result when unicode error handler is not 'strict'

2009-04-01 Thread Walter Dörwald
Walter Dörwald added the comment: I can confirm this problem in the current version in the py3k branch. This seems to be a problem in the CJK codecs. Assigning to Hye Shik Chang. -- assignee: -> hyeshik.chang nosy: +doerwalter, hyeshik.chang stage: -> needs

[issue5094] datetime lacks concrete tzinfo impl. for UTC

2009-02-11 Thread Walter Dörwald
Walter Dörwald added the comment: The patch doesn't include any changes to the documentation. -- nosy: +doerwalter ___ Python tracker <http://bugs.python.org/i

[issue1076233] distutils.core.setup() with unicode arguments broken

2009-02-11 Thread Walter Dörwald
Walter Dörwald added the comment: It does indeed work with Python 2.6 (however not with 2.5). Closing. -- resolution: -> out of date status: open -> closed ___ Python tracker <http://bugs.python.org/iss

[issue5135] Expose simplegeneric function in functools module

2009-02-04 Thread Walter Dörwald
Walter Dörwald added the comment: The patch looks fine to me. Tests pass. I have no opinion about the name. Both "simplegeneric" and "generic" are OK to me. I wonder if being able to use register() directly instead of as a decorator should be dropped. Also IMHO the

[issue4178] codecs: Documentation Inconsistency

2008-10-23 Thread Walter Dörwald
Walter Dörwald <[EMAIL PROTECTED]> added the comment: Fixed in r67005 (trunk) and r67006 (py3k). -- resolution: -> fixed status: open -> closed

[issue4178] codecs: Documentation Inconsistency

2008-10-23 Thread Walter Dörwald
Walter Dörwald <[EMAIL PROTECTED]> added the comment: I agree that the documentation should be fixed to read "encode/decode" instead of "encoder/decoder". ___ Python tracker <[EMAIL PROTECTED]> &

[issue3739] unicode-internal encoder reports wrong length

2008-08-30 Thread Walter Dörwald
New submission from Walter Dörwald <[EMAIL PROTECTED]>: The encoder for the "unicode-internal" codec reports the wrong length: Python 3.0b3+ (py3k, Aug 30 2008, 11:55:21) [GCC 4.0.1 (Apple Inc. build 5484)] on darwin Type "help", "copyright", "c

[issue701743] Reloading pseudo modules

2008-06-23 Thread Walter Dörwald
Walter Dörwald <[EMAIL PROTECTED]> added the comment: AFAIK reload() is gone in 3.0 anyway, so I don't think this patch is relevant any longer. ___ Python tracker <[EMAIL PROTECTED]> <http://bugs.py

[issue1706460] access to unicodedata (via codepoints or 2-char surrogates)

2008-06-03 Thread Walter Dörwald
Walter Dörwald <[EMAIL PROTECTED]> added the comment: Fixed for 3.0 in r63918 ___ Python tracker <[EMAIL PROTECTED]> <http://bugs.python.org/issue1706460> ___ __

[issue1706460] access to unicodedata (via codepoints or 2-char surrogates)

2008-06-02 Thread Walter Dörwald
Walter Dörwald <[EMAIL PROTECTED]> added the comment: Fixed for 2.6 in r63899. -- nosy: +doerwalter resolution: -> fixed status: open -> closed ___ Python tracker <[EMAIL PROTECTED]> <http://bugs.py

[issue1328] Force BOM option in UTF output.

2008-03-22 Thread Walter Dörwald
Walter Dörwald <[EMAIL PROTECTED]> added the comment: Oops, that code was supposed to read: import codecs def search_function(name): if name == "myutf8": utf8 = codecs.lookup("utf-8") utf8_sig = codecs.lookup("utf-8-sig") retur
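Since the snippet is cut off, here is one way such a combined codec could be completed. It is a sketch under assumptions, not necessarily the code from the original message: it takes decoding from "utf-8-sig" and encoding from "utf-8" and wraps both in a codecs.CodecInfo.

```python
import codecs

def search_function(name):
    # Only answer for the custom codec name; return None for everything else.
    if name != "myutf8":
        return None
    utf8 = codecs.lookup("utf-8")
    utf8_sig = codecs.lookup("utf-8-sig")
    return codecs.CodecInfo(
        name="myutf8",
        encode=utf8.encode,                            # never writes a BOM
        decode=utf8_sig.decode,                        # skips a leading BOM if present
        incrementalencoder=utf8.incrementalencoder,
        incrementaldecoder=utf8_sig.incrementaldecoder,
        streamreader=utf8_sig.streamreader,
        streamwriter=utf8.streamwriter,
    )

codecs.register(search_function)

with_bom = b"\xef\xbb\xbfhello".decode("myutf8")
without_bom = b"hello".decode("myutf8")
encoded = "hello".encode("myutf8")
```

Input with or without the signature decodes to the same text, and output never carries a BOM.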

[issue1328] Force BOM option in UTF output.

2008-03-22 Thread Walter Dörwald
Walter Dörwald <[EMAIL PROTECTED]> added the comment: If you want to use UTF-8-sig for decoding and UTF-8 for encoding and have this available as one codec you can define your own codec for this: import codecs def search_function(name): if name == "myutf8": utf8

[issue1477] UnicodeDecodeError that cannot be caught in narrow unicode builds

2008-03-22 Thread Walter Dörwald
Walter Dörwald <[EMAIL PROTECTED]> added the comment: The patch looks good to me now. Go ahead and check it in. -- assignee: doerwalter -> amaury.forgeotdarc

[issue1477] UnicodeDecodeError that cannot be caught in narrow unicode builds

2008-03-20 Thread Walter Dörwald
Walter Dörwald <[EMAIL PROTECTED]> added the comment: For a wide build, the code if (x <= 0x) *p++ = (Py_UNICODE) x; else { *p++ = (Py_UNICODE) x; looks strange. Furthermore with the patch applied Python no longer complains about ill

[issue1328] Force BOM option in UTF output.

2008-03-20 Thread Walter Dörwald
Walter Dörwald <[EMAIL PROTECTED]> added the comment: I don't see exactly what James is proposing. > For my needs, I would like the decoding parts of the utf_8 module > to treat an initial BOM as an optional signature and skip it if > there is one (just like the utf_8_sig

[issue1399] XML codec

2008-03-17 Thread Walter Dörwald
Walter Dörwald <[EMAIL PROTECTED]> added the comment: There was resistance in python-dev against this patch (see the thread at http://mail.python.org/pipermail/python-dev/2007-November/075138.html), so this issue should probably be closed as rejected. However there was consensus,

[issue2018] TextCalendar.formatmonth is not influenced by setfirstweekday

2008-02-07 Thread Walter Dörwald
Walter Dörwald added the comment: You're supposed to use firstweekday as a property instead of using the getter method getfirstweekday(). Anyway this is fixed now in r60651 (trunk) and r60652 (release25-maint) -- resolution: accepted -> fixed status: open -

[issue2018] TextCalendar.formatmonth is not influenced by setfirstweekday

2008-02-07 Thread Walter Dörwald
Walter Dörwald added the comment: The documentation is here: http://docs.python.org/dev/library/calendar.html#calendar.TextCalendar.formatmonth (or in Doc/library/calendar.rst in the source). Anyway the first of those documentation bugs is fixed now in r60649 (trunk) and r60650 (release25-maint

[issue2018] TextCalendar.formatmonth is not influenced by setfirstweekday

2008-02-06 Thread Walter Dörwald
Walter Dörwald added the comment: setfirstweekday() isn't supposed to have any influence on calendar objects created explicitly. The function setfirstweekday() is only for the module level interface. The documentation is wrong here. However you *can* change the first weekday wit
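The distinction drawn in this comment can be illustrated with a short sketch against the calendar module in current Python 3. The variable names are arbitrary.

```python
import calendar

# The module-level setter only affects the module-level interface.
previous = calendar.firstweekday()
calendar.setfirstweekday(calendar.SUNDAY)
explicit = calendar.TextCalendar()        # created explicitly, defaults to Monday
unaffected = explicit.firstweekday
calendar.setfirstweekday(previous)        # restore the module-level setting

# An explicitly created calendar is changed through its firstweekday property.
explicit.firstweekday = calendar.SUNDAY
week_order = list(explicit.iterweekdays())
```

After the property assignment, explicit.formatmonth(...) lays out weeks starting on Sunday, while the module-level functions keep whatever setting they had before.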

[issue2017] Calendar.yeardatescalendar etc. do not take 'month' argument

2008-02-06 Thread Walter Dörwald
Walter Dörwald added the comment: Fixed in r60618 (trunk) and r60619 (release25-maint) -- nosy: +doerwalter resolution: -> fixed status: open -> closed __ Tracker <[EMAIL PROTECTED]> <http://bugs.pytho

[issue1521] string.decode() fails on long strings

2007-11-29 Thread Walter Dörwald
Walter Dörwald added the comment: Can you attach a (small) example that demonstrates the bug? -- nosy: +doerwalter __ Tracker <[EMAIL PROTECTED]> <http://bugs.python.org/

[issue1444] utf_8_sig streamreader bug, patch, and test

2007-11-19 Thread Walter Dörwald
Walter Dörwald added the comment: Checked in your change and the test as r59049 (trunk) and r59050 (2.5). Thanks for the patch. -- resolution: -> fixed status: open -> closed __ Tracker <[EMAIL PROTECTED]> <http://bugs.pytho

[issue1328] feature request: force BOM option

2007-11-15 Thread Walter Dörwald
Walter Dörwald added the comment: > For utf16, (arguably) a missing BOM should merely assume machine endianness. > For utf_16_le, utf_16_be input, both should accept & discard a BOM. > On output, I'm not sure; maybe all should write a BOM unless passed a flag > signifying no

[issue1328] feature request: force BOM option

2007-11-15 Thread Walter Dörwald
Walter Dörwald added the comment: jgsack wrote: > > If codec utf_8 or utf_8_sig were to accept input with or without the > 3-byte BOM, and write it as currently specified without/with the BOM > respectively, then _I_ can reread again with either utf_8 or utf_8_sig. That's

[issue1427] Error in standard module calendar

2007-11-12 Thread Walter Dörwald
Walter Dörwald added the comment: Fixed in r58942 (trunk) and r58943 (2.5). Closing the issue. -- resolution: -> fixed status: open -> closed __ Tracker <[EMAIL PROTECTED]> <http://bugs.pytho

[issue1399] XML codec

2007-11-08 Thread Walter Dörwald
Walter Dörwald added the comment: OK, I've changed the name of the codec to xml_auto_detect and added support for EBCDIC. Added file: http://bugs.python.org/file8717/diff2.txt __ Tracker <[EMAIL PROTECTED]> <http://bugs.python

[issue1399] XML codec

2007-11-07 Thread Walter Dörwald
Walter Dörwald added the comment: "xml-auto-detect" sounds OK to me, it even makes sense for the encoder, because it normally detects the encoding to use for writing from the XML declaration. We could put "xml-auto-detect" into the alias mapping and keep xml as the module na

[issue1125] bytes.split shold have same interface as str.split, or different name

2007-09-07 Thread Walter Dörwald
Walter Dörwald added the comment: Because it's not clear whether b'\xa0' *is* whitespace or not. Bytes have no meaning, characters do. -- nosy: +doerwalter __ Tracker <[EMAIL PROTECTED]> <http://b
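A small sketch of the point being made, assuming current Python 3 behaviour: as a character, U+00A0 (NO-BREAK SPACE) counts as whitespace, whereas the single byte 0xA0 has no such meaning until an encoding is chosen.

```python
text = "a\xa0b"    # str: U+00A0 is a whitespace character
data = b"a\xa0b"   # bytes: 0xA0 is just a byte outside ASCII

text_parts = text.split()   # str.split() splits on Unicode whitespace
data_parts = data.split()   # bytes.split() only splits on ASCII whitespace
```

The str is split into two parts and the bytes object is returned in one piece. In Latin-1 the byte 0xA0 would be a no-break space, but in UTF-8 it is only a continuation byte.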

[issue1046] HTMLCalendar.formatyearpage not behaving as documented

2007-08-28 Thread Walter Dörwald
Walter Dörwald added the comment: Fixed in r57620 -- nosy: +doerwalter resolution: -> fixed status: open -> closed __ Tracker <[EMAIL PROTECTED]> <http://bugs.pytho
