Walter Dörwald added the comment:
Then the solution should simply be to use "utf-8-sig" as the encoding,
instead of "utf-8".
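Here is a minimal Python 3 sketch of the difference. The byte strings are made-up sample data. "utf-8" keeps a leading BOM as the character U+FEFF. "utf-8-sig" removes a BOM if one is present and otherwise decodes exactly like "utf-8".

```python
# Bytes as they might be read from a file saved with a UTF-8 signature (BOM).
with_bom = b"\xef\xbb\xbfname,value\n"
without_bom = b"name,value\n"

# "utf-8" keeps the BOM: the decoded text starts with U+FEFF.
print(repr(with_bom.decode("utf-8")))

# "utf-8-sig" drops a leading BOM if there is one, otherwise it acts like "utf-8".
print(repr(with_bom.decode("utf-8-sig")))
print(repr(without_bom.decode("utf-8-sig")))
```

For files, the same choice goes into the encoding argument, e.g. open(path, encoding="utf-8-sig").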
--
___
Python tracker
<http://bu
Walter Dörwald added the comment:
http://docs.python.org/library/csv.html#module-csv states:
> This version of the csv module doesn’t support Unicode input. Also,
> there are currently some issues regarding ASCII NUL characters.
> Accordingly, all input should be UTF-8 or printable ASCII to be safe
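For comparison, here is a sketch for the Python 3 csv module. Unlike the 2.x module the documentation above describes, it works on str rather than bytes. The sketch decodes the UTF-8 bytes first with io.TextIOWrapper and passes newline="", as the csv documentation recommends. The sample data is made up.

```python
import csv
import io

# Simulated UTF-8 encoded CSV file contents, with non-ASCII text on purpose.
raw = "city,country\nKöln,Deutschland\nSão Paulo,Brasil\n".encode("utf-8")

# Decode the byte stream as UTF-8; newline="" leaves line endings to csv.
with io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8", newline="") as text:
    rows = list(csv.reader(text))

for row in rows:
    print(row)
```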
Walter Dörwald added the comment:
Here is a stacktrace of the crash with the system Python 2.6.1 on Mac OS
X 10.6.1:
Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_INVALID_ADDRESS at address: 0x00010100
0x7fff810f96b8 in XML_SetEncoding ()
(gdb) bt
#0
Changes by Walter Dörwald:
--
nosy: -doerwalter
Python tracker
<http://bugs.python.org/issue2636>
Walter Dörwald added the comment:
Here is a new version that includes a new function scriptl() that
returns the script name in lowercase.
--
Added file: http://bugs.python.org/file14418/unicode-script-3.diff
Walter Dörwald added the comment:
I was comparing apples and oranges: the 229 entries for the trunk were
for a UCS2 build (the patched version was UCS4); with UCS4 there are
317 entries for the trunk.
size unicodedata.o gives:
__TEXT __DATA __OBJC others dec hex
13622 587057 0
Changes by Walter Dörwald:
Added file: http://bugs.python.org/file14356/unicode-script-2.diff
Python tracker
<http://bugs.python.org/issue6331>
Walter Dörwald added the comment:
Martin v. Löwis wrote:
> Martin v. Löwis added the comment:
>
> I think the patch is incorrect: the default value for the script
> property ought to be Unknown, not Common (despite UCD.html saying the
> contrary; see UTR#24 and Scripts.txt).
Walter Dörwald added the comment:
http://bugs.python.org/6331 is a patch that adds Unicode script information to
the Unicode database.
--
nosy: +doerwalter
New submission from Walter Dörwald:
This patch adds a function unicodedata.script() that returns information
about the script of the Unicode character.
--
components: Unicode
files: unicode-script.diff
keywords: patch
messages: 89642
nosy: doerwalter
severity: normal
status: open
title
Walter Dörwald added the comment:
AFAICR the difference is: 2.x may return any object in getstate(), but
py3k must return a (buffered input, integer) tuple. Simply moving py3k's
getstate/setstate implementation over to 2.x might do the trick.
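The following small Python 3 sketch illustrates the py3k side using the stdlib UTF-8 incremental decoder. Its getstate() returns a (buffered input, integer) tuple, and setstate() lets a fresh decoder continue from that state. The input bytes are only an example.

```python
import codecs

decoder = codecs.getincrementaldecoder("utf-8")()

# Feed only the first byte of the two-byte UTF-8 sequence for "é" (b"\xc3\xa9").
partial = decoder.decode(b"\xc3")   # "" because nothing can be decoded yet
state = decoder.getstate()
print(state)                        # (b'\xc3', 0): pending byte plus an integer

# A fresh decoder can resume from the saved state.
other = codecs.getincrementaldecoder("utf-8")()
other.setstate(state)
text = other.decode(b"\xa9", final=True)
print(text)                         # é
```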
Walter Dörwald added the comment:
This was done because the codec state is part of the return value of
tell(). To have a reasonable return value (i.e. one with just the
position itself) in as many cases as possible, it makes sense to design
the codec state in such a way that the most common
Walter Dörwald added the comment:
Any custom mapping class should have a repr test anyway, so IMHO it
doesn't matter whether the base test has a repr test or not.
The suggested fixes for TestMappingProtocol.test_fromkeys() and
TestHashMappingProtocol.test_mutatingiteration() sound OK ho
Walter Dörwald added the comment:
The patch no longer applies cleanly to the trunk.
--
nosy: +doerwalter
Walter Dörwald added the comment:
Checked in:
r72404,72406 (trunk)
r72408 (py3k)
As IMHO this is somewhat between a feature and a bugfix, I didn't check
it into release26-maint and release30-maint.
--
resolution: -> fixed
status: open -
Walter Dörwald added the comment:
Checked in:
r72260 (trunk)
r72262 (release26-maint)
r72265 (py3k)
r72266 (release30-maint)
--
resolution: -> fixed
status: open -> closed
Walter Dörwald added the comment:
This is not a bug in Python.
In Python 3.0 "print" is a function, so
print buildConnectionString(myParams)
should read
print(buildConnectionString(myParams))
Closing as invalid.
--
nosy: +doerwalter
resolution: -> invalid
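For reference, here is the corrected call in runnable form. buildConnectionString() and myParams below are hypothetical stand-ins written for this sketch, since the reporter's actual helper isn't shown. Only the print() syntax matters here.

```python
def buildConnectionString(params):
    """Hypothetical stand-in: join a dict into "key=value;key=value"."""
    return ";".join("%s=%s" % (key, value) for key, value in params.items())

myParams = {"server": "localhost", "database": "master"}  # hypothetical sample data

# Python 2 statement syntax, a SyntaxError in Python 3:
#     print buildConnectionString(myParams)
# Python 3: print is a function, so the argument goes in parentheses.
print(buildConnectionString(myParams))   # server=localhost;database=master
```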
Changes by Walter Dörwald:
--
assignee: doerwalter -> loewis
Python tracker
<http://bugs.python.org/issue5828>
Walter Dörwald added the comment:
BTW, are the steps to regenerate the Unicode database documented
somewhere? What I did was:
cp /Volumes/ftp.unicode.org/Public/5.1.0/ucd/UnicodeData.txt .
cp /Volumes/ftp.unicode.org/Public/5.1.0/ucd/CompositionExclusions.txt .
cp /Volumes/ftp.unicode.org
Walter Dörwald added the comment:
Checked in:
r71896 (py3k)
r71897 (release30-maint)
--
resolution: -> fixed
status: open -> closed
Walter Dörwald added the comment:
Checked in:
r71894 (trunk)
r71895 (release26-maint)
--
___
Python tracker
<http://bugs.python.org/issue5828>
Walter Dörwald added the comment:
I've merged your version of the patch with my changes to the test suite
and regenerated the Unicode database. Attached is the resulting patch
(diff4.txt).
--
Added file: http://bugs.python.org/file13768/diff
Walter Dörwald added the comment:
Checked in:
r71875 (trunk)
r71876 (release26-maint)
r71881 (py3k)
r71885 (release30-maint)
--
resolution: -> fixed
status: open -> closed
Walter Dörwald added the comment:
You're right: checking in both unset() and __exit__() fixes the importlib
failures. I'll check in the fix.
Walter Dörwald added the comment:
This might have something to do with the _keymap hook.
Python tracker
<http://bugs.python.org/issue5837>
Walter Dörwald added the comment:
OK, I'll remove the clear method (which is a new feature) and then check
it in.
Walter Dörwald added the comment:
That's exactly what I was thinking too. Here's the patch. Running the
test suite now.
--
Added file: http://bugs.python.org/file13764/diff2.txt
Walter Dörwald added the comment:
If you want to restore only those environment variables that have changed,
you somehow have to record which ones *did* change, i.e. you'd have to
go through EnvironmentVarGuard again. I'm working on a patch that
Walter Dörwald added the comment:
Here's a patch that changes EnvironmentVarGuard to make a copy of
os.environ at the start. The set and unset methods are useless now, but
I left them in for backwards compatibility. Should they be removed?
--
Added file: http://bugs.pytho
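The idea of copying os.environ on entry can be sketched as a stand-alone context manager. environ_snapshot() is a hypothetical name and not the test.support API. On exit it restores variables that were changed or deleted and removes ones that were added. This also covers the unset()-then-set() sequence used in the examples in this thread.

```python
import os
from contextlib import contextmanager

@contextmanager
def environ_snapshot():
    """Hypothetical helper: restore os.environ to the copy taken on entry."""
    saved = dict(os.environ)          # full copy of the environment at the start
    try:
        yield os.environ
    finally:
        # Remove variables that were added inside the block ...
        for key in list(os.environ):
            if key not in saved:
                del os.environ[key]
        # ... then put back every original value (covers changed and deleted ones).
        os.environ.update(saved)

# Usage: unset and then set the same variable, plus add a new one.
os.environ["DEMO_HOME"] = "original"
with environ_snapshot() as env:
    del env["DEMO_HOME"]
    env["DEMO_HOME"] = "foo"
    env["DEMO_NEW_VAR"] = "x"

print(os.environ["DEMO_HOME"])         # original
print("DEMO_NEW_VAR" in os.environ)    # False
```

Restoring the whole snapshot removes the need for separate bookkeeping in set() and unset().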
New submission from Walter Dörwald:
support.EnvironmentVarGuard seems to be broken:
import os
from test import support
print(os.environ.get("HOME"))
with support.EnvironmentVarGuard() as env:
    env.unset("HOME")
    env.set("HOME", "foo")
print(os.e
Walter Dörwald added the comment:
Hmm, EnvironmentVarGuard seems to be broken:
import os
from test import support
with support.EnvironmentVarGuard() as env:
    env.unset("HOME")
    env.set("HOME", "bar")
print(os.environ.get("HOME"))
I would have
Walter Dörwald added the comment:
There's an EnvironmentVarGuard context manager in support.py that IMHO
should be used for recording changes to the environment variables. Or a
new context manager that does what your patch does could be put into
support.py. There might be other tests
Walter Dörwald added the comment:
Here is a third version of the patch. AFAICT the logic of the Unicode
database is as follows:
* If the NODELTA_MASK is not set, delta is an offset.
* If NODELTA_MASK is set and delta != 0, delta is the
upper/lower/title case character.
* If NODELTA_MASK is
Walter Dörwald added the comment:
Updated the patch (diff2.txt) as requested by Amaury.
--
Added file: http://bugs.python.org/file13759/diff2.txt
Walter Dörwald added the comment:
The following patch fixes the problem for me; however, it breaks the test
suite. The change seems to have been introduced in r66362.
Assigning to Martin.
--
assignee: -> loewis
nosy: +loewis
stage: -> patch review
Added file: http://bugs.pyth
Walter Dörwald added the comment:
It *does* return u'\u1d79' for me on Python 2.5.2:
>>> u'\u1d79'.lower()
u'\u1d79'
>>> import sys
>>> sys.version
'2.5.2 (r252:60911, Apr 8 2008, 18:54:00) \n[GCC 3.3.5 (Debian
1:3.3.5-13)]
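Here is a quick Python 3 check of the same character. In Python 3, str replaces unicode. The sketch assumes only the stdlib unicodedata module. The last line checks that upper() followed by lower() returns the original character.

```python
import unicodedata

ch = "\u1d79"
print(unicodedata.name(ch))        # LATIN SMALL LETTER INSULAR G
print(ch.lower() == ch)            # True: it is already lowercase
print(ch.upper().lower() == ch)    # True: upper() then lower() gets back to it
```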
Walter Dörwald added the comment:
The problem with your patch is that it calls PyUnicode_DecodeUTF8()
twice. It would be better if step 1 in the code would include the %s
format specifiers and step 3 would then call PyUnicode_DecodeUTF8() and
put the result into the callresults buffer.
BTW, I
New submission from Walter Dörwald:
This patch makes it possible to use tabs for indenting the output of
json.dumps(). With this patch the indent argument can now be either an
integer specifying the number of spaces per indent level or a string
specifying the indent string directly
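For what it's worth, the json module in Python 3.2 and later accepts a string for indent. The sketch below shows both forms. This thread doesn't show whether that behaviour came from this exact patch.

```python
import json

data = {"a": [1, 2]}

# An integer indent means that many spaces per nesting level ...
print(json.dumps(data, indent=2))

# ... while a string (here a tab) is used verbatim as the indent unit.
print(json.dumps(data, indent="\t"))
```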
Walter Dörwald added the comment:
test_quopri has a decorator that calls a test using both the C and
Python version of the tested function. This decorator looks like this:
def withpythonimplementation(testfunc):
    def newtest(self):
        # Test default implementation
        testfunc(self
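Here is a sketch of the same "run the test twice" idea, written against the current stdlib quopri module. quopri falls back to its pure-Python code when its module globals b2a_qp/a2b_qp (imported from binascii) are None. This illustrates the idea and does not reproduce the truncated decorator above. with_python_implementation and QuopriRoundTrip are hypothetical names.

```python
import functools
import quopri
import unittest

def with_python_implementation(testfunc):
    """Hypothetical decorator: run testfunc once with quopri's default
    (binascii-backed) helpers and once with its pure-Python fallback."""
    @functools.wraps(testfunc)
    def newtest(self):
        # 1. Default implementation.
        testfunc(self)
        # 2. Pure-Python implementation: quopri only uses binascii when
        #    these module globals are not None, so clear them temporarily.
        if quopri.b2a_qp is not None or quopri.a2b_qp is not None:
            saved = (quopri.b2a_qp, quopri.a2b_qp)
            try:
                quopri.b2a_qp = quopri.a2b_qp = None
                testfunc(self)
            finally:
                quopri.b2a_qp, quopri.a2b_qp = saved
    return newtest

class QuopriRoundTrip(unittest.TestCase):
    @with_python_implementation
    def test_roundtrip(self):
        data = b"caf\xe9 = ok"
        self.assertEqual(quopri.decodestring(quopri.encodestring(data)), data)
```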
Walter Dörwald added the comment:
Indeed this patch does fix the bug. Go ahead and check it in.
Python tracker
<http://bugs.python.org/issue5640>
Walter Dörwald added the comment:
I can confirm this problem in the current version in the py3k branch.
This seems to be a problem in the CJK codecs. Assigning to Hye Shik Chang.
--
assignee: -> hyeshik.chang
nosy: +doerwalter, hyeshik.chang
stage: -> needs
Walter Dörwald added the comment:
The patch doesn't include any changes to the documentation.
--
nosy: +doerwalter
Walter Dörwald added the comment:
It does indeed work with Python 2.6 (however not with 2.5). Closing.
--
resolution: -> out of date
status: open -> closed
Walter Dörwald added the comment:
The patch looks fine to me. Tests pass.
I have no opinion about the name. Both "simplegeneric" and "generic" are
OK to me.
I wonder if being able to use register() directly instead of as a
decorator should be dropped.
Also IMHO the
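For comparison, the generic-function support that later landed in the standard library, functools.singledispatch (Python 3.4, PEP 443), kept both spellings of register(). You can use it as a decorator or call it directly with a type and a function. This is a minimal sketch; describe() is a made-up example function.

```python
from functools import singledispatch

@singledispatch
def describe(obj):
    """Fallback used for types without a registered implementation."""
    return "object"

# register() as a decorator ...
@describe.register(int)
def _describe_int(obj):
    return "int"

# ... and register() called directly with a type and an existing function.
def describe_str(obj):
    return "str"

describe.register(str, describe_str)

print(describe(1), describe("x"), describe(1.5))   # int str object
```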
Walter Dörwald added the comment:
Fixed in r67005 (trunk) and r67006 (py3k).
--
resolution: -> fixed
status: open -> closed
Walter Dörwald added the comment:
I agree that the documentation should be fixed to read "encode/decode"
instead of "encoder/decoder".
New submission from Walter Dörwald:
The encoder for the "unicode-internal" codec reports the wrong length:
Python 3.0b3+ (py3k, Aug 30 2008, 11:55:21)
[GCC 4.0.1 (Apple Inc. build 5484)] on darwin
Type "help", "copyright", "c
Walter Dörwald added the comment:
AFAIK the builtin reload() is gone in 3.0 anyway, so I don't think this patch is
relevant any longer.
Walter Dörwald added the comment:
Fixed for 3.0 in r63918.
Python tracker
<http://bugs.python.org/issue1706460>
Walter Dörwald added the comment:
Fixed for 2.6 in r63899.
--
nosy: +doerwalter
resolution: -> fixed
status: open -> closed
Walter Dörwald added the comment:
Oops, that code was supposed to read:
import codecs
def search_function(name):
    if name == "myutf8":
        utf8 = codecs.lookup("utf-8")
        utf8_sig = codecs.lookup("utf-8-sig")
        retur
Walter Dörwald added the comment:
If you want to use UTF-8-sig for decoding and UTF-8 for encoding and
have this available as one codec, you can define your own codec for this:
import codecs
def search_function(name):
    if name == "myutf8":
        utf8
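Here is a complete, runnable Python 3 sketch of that idea. Decoding goes through utf-8-sig, so a leading BOM is skipped. Encoding goes through plain utf-8, so no BOM is written. The codec name "myutf8" comes from the comment above. The rest of the search function is an assumption and not the truncated original.

```python
import codecs

def search_function(name):
    # Python passes the normalized (lowercased) codec name to search functions.
    if name != "myutf8":
        return None
    utf8 = codecs.lookup("utf-8")
    utf8_sig = codecs.lookup("utf-8-sig")
    return codecs.CodecInfo(
        name="myutf8",
        # Encoding side: plain UTF-8, never writes a BOM.
        encode=utf8.encode,
        incrementalencoder=utf8.incrementalencoder,
        streamwriter=utf8.streamwriter,
        # Decoding side: UTF-8-sig, skips a leading BOM if there is one.
        decode=utf8_sig.decode,
        incrementaldecoder=utf8_sig.incrementaldecoder,
        streamreader=utf8_sig.streamreader,
    )

codecs.register(search_function)

print("abc".encode("myutf8"))                 # b'abc'
print(b"\xef\xbb\xbfabc".decode("myutf8"))    # abc
print(b"abc".decode("myutf8"))                # abc
```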
Walter Dörwald added the comment:
The patch looks good to me now. Go ahead and check it in.
--
assignee: doerwalter -> amaury.forgeotdarc
Walter Dörwald added the comment:
For a wide build, the code
if (x <= 0x)
*p++ = (Py_UNICODE) x;
else {
*p++ = (Py_UNIC0DE) x;
looks strange.
Furthermore with the patch applied Python no longer complains about
ill
Walter Dörwald added the comment:
I don't see exactly what James is proposing.
> For my needs, I would like the decoding parts of the utf_8 module
> to treat an initial BOM as an optional signature and skip it if
> there is one (just like the utf_8_sig
Walter Dörwald added the comment:
There was resistance in python-dev against this patch (see the thread at
http://mail.python.org/pipermail/python-dev/2007-November/075138.html),
so this issue should probably be closed as rejected.
However there was consensus,
Walter Dörwald added the comment:
You're supposed to use firstweekday as a property instead of using the
getter method getfirstweekday(). Anyway, this is fixed now in r60651
(trunk) and r60652 (release25-maint).
--
resolution: accepted -> fixed
status: open -
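Here is a minimal sketch of the property usage on a calendar object. It uses TextCalendar, and calendar.SUNDAY is the stdlib constant 6.

```python
import calendar

cal = calendar.TextCalendar()        # firstweekday defaults to 0 (Monday)
cal.firstweekday = calendar.SUNDAY   # set it through the property
print(cal.firstweekday)              # 6, read back through the property
print(cal.getfirstweekday())         # 6, the getter method agrees
```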
Walter Dörwald added the comment:
The documentation is here:
http://docs.python.org/dev/library/calendar.html#calendar.TextCalendar.formatmonth
(or in Doc/library/calendar.rst in the source).
Anyway the first of those documentation bugs is fixed now in r60649
(trunk) and r60650 (release25-maint
Walter Dörwald added the comment:
setfirstweekday() isn't supposed to have any influence on calendar
objects created explicitly. The function setfirstweekday() is only for
the module-level interface. The documentation is wrong here. However, you
*can* change the first weekday wit
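The sketch below shows the distinction, assuming current stdlib calendar behaviour. The module-level setfirstweekday() only changes the module-level setting. Explicitly created calendar objects take firstweekday from their constructor, which defaults to 0 (Monday). The sketch restores the module-level value afterwards.

```python
import calendar

saved = calendar.firstweekday()
try:
    # Module-level interface: affects calendar.month(), calendar.prmonth(), etc.
    calendar.setfirstweekday(calendar.SUNDAY)
    module_level = calendar.firstweekday()                  # 6

    # An explicitly created object is not affected; it uses its own default of 0.
    implicit = calendar.TextCalendar().getfirstweekday()    # 0

    # For Sunday-first weeks on an object, pass firstweekday (or set the property).
    explicit = calendar.TextCalendar(firstweekday=calendar.SUNDAY).getfirstweekday()
finally:
    calendar.setfirstweekday(saved)   # restore the module-level setting

print(module_level, implicit, explicit)   # 6 0 6
```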
Walter Dörwald added the comment:
Fixed in r60618 (trunk) and r60619 (release25-maint)
--
nosy: +doerwalter
resolution: -> fixed
status: open -> closed
Walter Dörwald added the comment:
Can you attach a (small) example that demonstrates the bug?
--
nosy: +doerwalter
Walter Dörwald added the comment:
Checked in your change and the test as r59049 (trunk) and r59050 (2.5).
Thanks for the patch.
--
resolution: -> fixed
status: open -> closed
Walter Dörwald added the comment:
> For utf16, (arguably) a missing BOM should merely assume machian
> endianess.
> For utf_16_le, utf_16_be input, both should accept & discard a BOM.
> On output, I'm not sure; maybe all should write a BOM unless passed a flag
> signifying no
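For reference, here is how the current Python 3 codecs behave on input and output. The sketch uses a made-up two-character string.

```python
import codecs

# "hi" in UTF-16-LE, preceded by a little-endian BOM: b'\xff\xfeh\x00i\x00'
data = codecs.BOM_UTF16_LE + "hi".encode("utf-16-le")

# utf-16 consumes the BOM and uses it to choose the byte order.
as_utf16 = data.decode("utf-16")        # 'hi'
# utf-16-le does not look for a BOM; it decodes those two bytes as U+FEFF.
as_utf16_le = data.decode("utf-16-le")  # '\ufeffhi'

# On output, utf-16 writes a BOM (in native byte order); utf-16-le/-be do not.
encoded = "hi".encode("utf-16")
has_bom = encoded[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)
encoded_le = "hi".encode("utf-16-le")   # b'h\x00i\x00'

print(repr(as_utf16), repr(as_utf16_le), has_bom, encoded_le)
```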
Walter Dörwald added the comment:
jgsack wrote:
>
> If codec utf_8 or utf_8_sig were to accept input with or without the
> 3-byte BOM, and write it as currently specified without/with the BOM
> respectively, then _I_ can reread again with either utf_8 or utf_8_sig.
That's
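The sketch below covers the write/reread combinations discussed here, using the current Python 3 codecs. Everything rereads cleanly with utf-8-sig. Text written with utf-8-sig and reread with plain utf-8 keeps a leading U+FEFF.

```python
from itertools import product

text = "hello"
results = {}
for write_enc, read_enc in product(["utf-8", "utf-8-sig"], repeat=2):
    # utf-8-sig always writes a BOM on encode; utf-8 never does.
    results[write_enc, read_enc] = text.encode(write_enc).decode(read_enc)

for (write_enc, read_enc), result in results.items():
    print(f"written as {write_enc:9} read as {read_enc:9} -> {result!r}")
```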
Walter Dörwald added the comment:
Fixed in r58942 (trunk) and r58943 (2.5). Closing the issue.
--
resolution: -> fixed
status: open -> closed
Walter Dörwald added the comment:
OK, I've changed the name of the codec to xml_auto_detect and added
support for EBCDIC.
Added file: http://bugs.python.org/file8717/diff2.txt
Walter Dörwald added the comment:
"xml-auto-detect" sounds OK to me, it even makes sense for the encoder,
because it normally detects the encoding to use for writing from the XML
declaration.
We could put "xml-auto-detect" into the alias mapping and keep xml as
the module na
Walter Dörwald added the comment:
Because it's not clear whether b'\xa0' *is* whitespace or not. Bytes
have no meaning, characters do.
--
nosy: +doerwalter
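Here is a small Python 3 sketch of the point. bytes methods only treat ASCII whitespace as whitespace. Whether 0xA0 is a space only becomes meaningful once you choose an encoding. Latin-1 and code page 437 are just two example encodings.

```python
raw = b"a\xa0b"

# bytes methods only know ASCII whitespace, so 0xA0 is not a separator here.
print(b"\xa0".isspace())               # False
print(raw.split())                     # [b'a\xa0b']

# After decoding, the answer depends on the encoding chosen.
latin1 = raw.decode("latin-1")         # 0xA0 is U+00A0 NO-BREAK SPACE in Latin-1
cp437 = raw.decode("cp437")            # 0xA0 is U+00E1 'á' in code page 437
print(latin1.split())                  # ['a', 'b']
print(cp437.split())                   # ['aáb']
```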
Walter Dörwald added the comment:
Fixed in r57620
--
nosy: +doerwalter
resolution: -> fixed
status: open -> closed
__
Tracker <[EMAIL PROTECTED]>
<http://bugs.pytho