[issue17404] ValueError: can't have unbuffered text I/O for io.open(1, 'wt', 0)

2013-03-13 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

> - it won't work for reading: TextIOWrapper calls the read1() method, which is
> only defined by BufferedIO objects.

Since 3.3 TextIOWrapper works with raw IO objects (issue12591).

> Yes. And I just noticed that the _io module (the C version) will also buffer
> encoded bytes, up to f._CHUNK_SIZE.

Use write_through=True to disable this.
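Here is a minimal sketch of both points, assuming Python 3.3 or later. An io.TextIOWrapper sits directly on an unbuffered io.FileIO. With write_through=True, each write() passes its encoded bytes straight to the raw object, so they are on disk before close(). The helper name write_through_demo is hypothetical.

```python
import io
import os
import tempfile

def write_through_demo(text):
    """Write text through a TextIOWrapper layered directly on a raw FileIO
    and return the bytes found on disk *before* the wrapper is closed."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        raw = io.FileIO(path, 'w')  # raw, unbuffered binary file
        # newline='\n' disables newline translation, so the output is the same on every OS.
        wrapper = io.TextIOWrapper(raw, encoding='utf-8', newline='\n',
                                   write_through=True)
        try:
            wrapper.write(text)
            # No flush() here: write_through already handed the bytes to raw.write().
            with open(path, 'rb') as f:
                return f.read()
        finally:
            wrapper.close()  # also closes the underlying FileIO
    finally:
        os.remove(path)
```

For example, write_through_demo('h\u00e9llo\n') returns the UTF-8 bytes b'h\xc3\xa9llo\n' even though the wrapper was never flushed.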

--
nosy: +serhiy.storchaka

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17404
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17299] Test cPickle with real files

2013-03-14 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

I polished the patch a little before committing. Thank you for the patch,
Aman Shah.

--
resolution:  -> fixed
stage: commit review -> committed/rejected
status: open -> closed

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17299
___



[issue1285086] urllib.quote is too slow

2013-03-14 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Sorry, Senthil, I may have missed your response. It is now committed and closed again.

--
resolution:  -> fixed
stage: patch review -> committed/rejected
status: open -> closed

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue1285086
___



[issue17016] _sre: avoid relying on pointer overflow

2013-03-14 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Of course it would be nice to have tests for as many cases as possible, but
I am afraid that will not be easy. The patch LGTM.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17016
___



[issue13056] test_multibytecodec.py:TestStreamWriter is skipped after PEP393

2013-03-14 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

I think these tests make no sense after PEP 393. They test that StreamWriter
works with non-BMP characters split inside a surrogate pair, i.e. that
c.write(s[:i]); c.write(s[i:]) is always the same as c.write(s), even if i splits s
inside a surrogate pair. This case is impossible after PEP 393.
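The following sketch shows why that case cannot arise any more. It assumes Python 3.3+, where PEP 393 applies: a non-BMP character is one code point, so no slice index can fall inside a surrogate pair. As a result, writing s[:i] and then s[i:] through a StreamWriter gives the same bytes as writing s, for every i. The helper write_in_two_parts is hypothetical. gb18030 is used only as an example of a CJK multibyte codec that can encode non-BMP characters.

```python
import codecs
import io

def write_in_two_parts(s, i, encoding='gb18030'):
    """Encode s with a StreamWriter using two write() calls split at index i."""
    buf = io.BytesIO()
    writer = codecs.getwriter(encoding)(buf)
    writer.write(s[:i])
    writer.write(s[i:])
    return buf.getvalue()

s = 'a\U00020000b'  # U+20000 lies outside the BMP but occupies one index
same_for_every_split = all(
    write_in_two_parts(s, i) == s.encode('gb18030') for i in range(len(s) + 1))
```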

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue13056
___



[issue1243730] Big speedup in email message parsing

2013-03-14 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

The test fails with a stack overflow:

======================================================================
ERROR: test_pushCR_LF (email.test.test_email.TestIterators)
FeedParser BufferedSubFile.push() assumed it received complete
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/serhiy/py/cpython2.7/Lib/email/test/test_email.py", line 2585, in test_pushCR_LF
    bsf.push(il)
  File "/home/serhiy/py/cpython2.7/Lib/email/feedparser.py", line 140, in push
    parts = _splitlines(data)
  File "/home/serhiy/py/cpython2.7/Lib/email/feedparser.py", line 170, in _splitlines
    lines.extend(_splitlines(part))
...
  File "/home/serhiy/py/cpython2.7/Lib/email/feedparser.py", line 170, in _splitlines
    lines.extend(_splitlines(part))
RuntimeError: maximum recursion depth exceeded
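In the traceback, _splitlines calls itself over and over, so a long enough input exhausts the recursion limit. For comparison, here is a minimal non-recursive splitter for the same three line endings (\r\n, \r, \n). split_lines_keepends is a hypothetical name and is not the function from the patch. The sketch also does not handle a \r\n pair split across two push() calls, which is the case test_pushCR_LF exercises.

```python
import re

_NEWLINE_RE = re.compile(r'(\r\n|\r|\n)')  # capturing group keeps the separators

def split_lines_keepends(data):
    """Split data into lines, keeping line endings, using a loop instead of recursion."""
    # re.split with a capturing group alternates text and separators:
    # [text, sep, text, sep, ..., text]
    parts = _NEWLINE_RE.split(data)
    lines = [parts[i] + parts[i + 1] for i in range(0, len(parts) - 1, 2)]
    if parts[-1]:
        lines.append(parts[-1])  # trailing partial line without a terminator
    return lines
```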

--
nosy: +serhiy.storchaka

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue1243730
___



[issue17440] Some IO related problems on x86 windows

2013-03-16 Thread Serhiy Storchaka

Changes by Serhiy Storchaka storch...@gmail.com:


--
components: +IO
nosy: +benjamin.peterson, hynek, pitrou, stutzbach

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17440
___



[issue1159051] Handle corrupted gzip files with unexpected EOF

2013-03-16 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

tuned_gzip does dangerous things: it overrides private methods of GzipFile.

From Bazaar 2.3 Release Notes:

* Stop using ``bzrlib.tuned_gzip.GzipFile``. It is incompatible with
  python-2.7 and was only used for Knit format repositories, which haven't
  been recommended since 2007. The file itself will be removed in the next
  release. (John Arbash Meinel)

The current version is 2.6b2, so bzrlib.tuned_gzip.GzipFile should have been
removed two releases ago.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue1159051
___



[issue17441] Do not cache re.compile

2013-03-16 Thread Serhiy Storchaka

New submission from Serhiy Storchaka:

Ezio proposed in issue16389 not to cache re.compile. Caching re.compile
makes no sense and only pollutes the cache.

--
components: Library (Lib), Regular Expressions
messages: 184354
nosy: ezio.melotti, mrabarnett, pitrou, serhiy.storchaka
priority: normal
severity: normal
stage: needs patch
status: open
title: Do not cache re.compile
type: enhancement
versions: Python 3.4

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17441
___



[issue17441] Do not cache re.compile

2013-03-16 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Here is a patch.

--
keywords: +patch
stage: needs patch -> patch review
Added file: http://bugs.python.org/file29429/re_compile_nocache.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17441
___



[issue17415] Clarify docs of os.path.normpath()

2013-03-17 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

os.path.normpath() works not only with strings but also with bytes objects.
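Here is a quick illustration. os.path is posixpath on POSIX systems, and using posixpath directly keeps the result the same on any platform. The same normalization applies to str and bytes arguments, and each returns its own type:

```python
import posixpath

# Collapse the doubled slash and resolve '..' for a text path and a bytes path.
text_result = posixpath.normpath('a//b/../c')
bytes_result = posixpath.normpath(b'a//b/../c')
```

Here text_result is 'a/c' and bytes_result is b'a/c'.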

--
nosy: +serhiy.storchaka

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17415
___



[issue17447] str.identifier shouldn't accept Python keywords

2013-03-17 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Hmm. I was going to use this method for re's named groups (see issue14462).
Some third-party code may also use it to check for general Unicode-aware
identifiers. The language specification says that keywords are a subset of
identifiers. However, in most places in the stdlib (collections.namedtuple,
unittest.mock, inspect.Parameter), is_usable_identifier() should be used
instead of isidentifier().
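The proposed helper could look like the following minimal sketch. is_usable_identifier does not exist in the stdlib; it is assumed here to mean "an identifier that is not a reserved keyword", and keyword.iskeyword() supplies the keyword check:

```python
import keyword

def is_usable_identifier(name):
    """Hypothetical helper: True if name can be used as a variable or
    attribute name, i.e. it is an identifier and not a reserved keyword."""
    return name.isidentifier() and not keyword.iskeyword(name)
```

With this helper, 'class'.isidentifier() is still True (keywords are identifiers), but is_usable_identifier('class') is False.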

--
nosy: +serhiy.storchaka

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17447
___



[issue17299] Test cPickle with real files

2013-03-17 Thread Serhiy Storchaka

Changes by Serhiy Storchaka storch...@gmail.com:


--
resolution: fixed -> 
status: closed -> open

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17299
___



[issue17299] Test cPickle with real files

2013-03-17 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

I'm not sure what is wrong and can't check on Windows, but this patch may fix
the tests. Please check it if you can.

--
Added file: http://bugs.python.org/file29433/test_cpickle_fileio.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17299
___



[issue17299] Test cPickle with real files

2013-03-18 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Oh, yes.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17299
___



[issue17299] Test cPickle with real files

2013-03-18 Thread Serhiy Storchaka

Changes by Serhiy Storchaka storch...@gmail.com:


Removed file: http://bugs.python.org/file29433/test_cpickle_fileio.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17299
___



[issue17299] Test cPickle with real files

2013-03-18 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Benjamin fixed this in changeset 6aab72424063.

--
resolution:  -> fixed
status: open -> closed

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17299
___



[issue17460] Remove the strict and related params completely removing the 0.9 support

2013-03-18 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Maybe an exception should be raised in 3.4? HTTPConnection('python.org', 80,
False) now silently returns a wrong result.

--
components: +Library (Lib)
nosy: +serhiy.storchaka
stage:  -> patch review
type:  -> enhancement
versions: +Python 3.4

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17460
___



[issue17397] ttk::themes missing from ttk.py

2013-03-18 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

This looks similar to issue16809 and requires a similar solution.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17397
___



[issue17433] stdlib generator-like iterators don't forward send/throw

2013-03-19 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

This was proposed before (see issue16150) and was rejected after discussion on
Python-ideas.

--
nosy: +serhiy.storchaka

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17433
___



[issue17433] stdlib generator-like iterators don't forward send/throw

2013-03-19 Thread Serhiy Storchaka

Changes by Serhiy Storchaka storch...@gmail.com:


--
nosy: +rhettinger
type:  -> enhancement

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17433
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17478] Tkinter's split() inconsistent for bytes and unicode strings

2013-03-19 Thread Serhiy Storchaka

New submission from Serhiy Storchaka:

Tkinter's split() splits bytes recursively, but not unicode strings.

>>> from tkinter import *
>>> t = Tcl()
>>> t.tk.split((b'a 2',))
(('a', '2'),)
>>> t.tk.split(('a 2',))
('a 2',)

--
components: Tkinter, Unicode
messages: 184622
nosy: ezio.melotti, gpolo, serhiy.storchaka
priority: normal
severity: normal
status: open
title: Tkinter's split() inconsistent for bytes and unicode strings
type: behavior
versions: Python 2.7, Python 3.2, Python 3.3, Python 3.4

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17478
___



[issue16809] Tk 8.6.0 introduces TypeError. (Tk 8.5.13 works)

2013-03-19 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Here is a patch which adds support for Tcl_Obj to tkinter's splitlist(). It not
only fixes an incompatibility with Tk 8.6, but may also fix some issues with older
Tk versions (see, for example, issue17397).

--
keywords: +patch
nosy: +gpolo
stage:  -> patch review
versions: +Python 3.2
Added file: http://bugs.python.org/file29477/tkinter_splitlist.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16809
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17460] Remove the strict and related params completely removing the 0.9 support

2013-03-19 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

I do not understand what is wrong with making the parameters that followed the
removed 'strict' keyword-only.
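This sketch shows what keyword-only parameters would change. It assumes the old constructor took strict as its third positional parameter. Once strict is removed, a positional False no longer lands silently in the next parameter; it raises TypeError instead. make_connection is a hypothetical stand-in, not http.client's code.

```python
def make_connection(host, port=None, *, timeout=None, source_address=None):
    """Hypothetical stand-in for a constructor whose parameters after the
    removed 'strict' are keyword-only."""
    return {'host': host, 'port': port, 'timeout': timeout,
            'source_address': source_address}

# An old call that passed strict=False positionally now fails loudly...
try:
    make_connection('python.org', 80, False)
    old_call_rejected = False
except TypeError:
    old_call_rejected = True

# ...while calls that use keywords keep working.
conn = make_connection('python.org', 80, timeout=10)
```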

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17460
___



[issue13477] tarfile module should have a command line

2013-03-19 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Note that the --create command should support the --directory option too.

> Modern tar programs don't need to be told the compression method--they infer
> it.  If they can do it in C, we can do it in Python.  So we should simply
> omit the -bz2 stuff.

An archive may have no extension or a nonstandard one. And
stdin/stdout have no name at all.
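This sketch works with tarfile's existing modes. For reading, mode 'r:*' makes tarfile detect compression from the data itself, so a nameless stream can still be read. For writing there is nothing to infer from, so the method has to be given explicitly. make_archive and list_members are hypothetical helper names.

```python
import io
import tarfile

def make_archive(compression):
    """Build an in-memory tar archive with one member ('' means uncompressed)."""
    buf = io.BytesIO()
    # Writing needs an explicit method ('w:gz', 'w:bz2', ...): a nameless
    # stream offers no extension to infer it from.
    with tarfile.open(fileobj=buf, mode='w:' + compression) as tf:
        data = b'hello'
        info = tarfile.TarInfo('hello.txt')
        info.size = len(data)
        tf.addfile(info, io.BytesIO(data))
    return buf.getvalue()

def list_members(raw):
    # 'r:*' detects the compression from the content, not from a file name.
    with tarfile.open(fileobj=io.BytesIO(raw), mode='r:*') as tf:
        return tf.getnames()
```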

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue13477
___



[issue14010] deeply nested filter segfaults

2013-03-19 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

I'm trying to solve this issue (it seemed easy), but the bug is worse than
expected. Python crashes even without any iteration at all:

it = 'abracadabra'
for _ in range(100):
    it = filter(bool, it)

del it

And fixing a recursive deallocator is harder than fixing an iterator.

What can we do if a deallocator raises RuntimeError because the maximum recursion
depth is exceeded?

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue14010
___



[issue14010] deeply nested filter segfaults

2013-03-19 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Thank you. Now I understand why this issue does not happen with containers.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue14010
___



[issue14010] deeply nested filter segfaults

2013-03-19 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Here is a patch which adds recursion limit checks to builtin and itertools 
recursive iterators.

--
components: +Extension Modules
keywords: +patch
nosy: +rhettinger
stage: needs patch -> patch review
Added file: http://bugs.python.org/file29483/iter_recursion.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue14010
___



[issue2518] smtpd.py to handle huge email

2013-03-19 Thread Serhiy Storchaka

Changes by Serhiy Storchaka storch...@gmail.com:


--
versions: +Python 3.4 -Python 3.2

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue2518
___



[issue1159051] Handle corrupted gzip files with unexpected EOF

2013-03-19 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

I will be offline for some time. Feel free to revert these changes in 2.7-3.3 if
necessary.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue1159051
___



[issue14313] zipfile should raise an exception for unsupported compression methods

2012-05-14 Thread Serhiy Storchaka

Serhiy Storchaka storch...@gmail.com added the comment:

A modified patch was adopted in 3.3 (changeset 596b0eaeece8), so the current
patch only applies to 3.2 and 2.7. If this is considered a new feature, the issue
can be closed.

--
nosy: +loewis, storchaka
versions:  -Python 3.3

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue14313
___



[issue14315] zipfile.ZipFile() unable to open zip File

2012-05-14 Thread Serhiy Storchaka

Serhiy Storchaka storch...@gmail.com added the comment:

> This is definitely *not* a padding issue.

This is definitely a padding issue. All uncompressed files are located
so that their data starts on a 4-byte boundary (1190+30+15+1=1236,
27486+30+17+3=27536, etc.). This probably allows the resources to be
accessed with mmap.
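The arithmetic above can be checked with a short sketch. It assumes each sum reads as: local-header offset + the fixed 30-byte local file header + file name length + extra-field length. The helper names are hypothetical.

```python
LOCAL_HEADER_SIZE = 30  # fixed-size part of a ZIP local file header

def data_offset(header_offset, name_len, extra_len):
    """Offset of a member's data: fixed header, then file name, then extra field."""
    return header_offset + LOCAL_HEADER_SIZE + name_len + extra_len

def padding_needed(header_offset, name_len, alignment=4):
    """Extra-field bytes needed so that the data starts on an aligned offset."""
    return -(header_offset + LOCAL_HEADER_SIZE + name_len) % alignment
```

With the numbers above, padding_needed(1190, 15) gives 1 and padding_needed(27486, 17) gives 3. Both match the extra-field lengths in the archive, and both are shorter than a 4-byte extra-field header.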

> As Martin pointed out, the standard says that things must be in
> multiples of 4-bytes.

More precisely, the extra field must be at least 4 bytes long to fit
a header. The standard does not define what should happen if the rest
of the field is shorter than 4 bytes (this is hidden behind an
ellipsis).

> So the record is non-portable.

De jure the record is non-portable. De facto it is portable
(many other tools support it). But even if it were not portable, we are
dealing with an extension of the zip format that is very easy to support
for reading.
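For reading, tolerating such a record takes only a few lines. This is a minimal sketch, assuming the usual layout of extra-field records: a little-endian 2-byte header ID and 2-byte data size, followed by the data. Any tail shorter than a 4-byte record header is treated as padding. parse_extra is a hypothetical helper, not zipfile's code.

```python
import struct

def parse_extra(extra):
    """Split a ZIP extra field into (header_id, data) records.

    Returns the records plus any trailing bytes too short to hold a 4-byte
    record header (treated here as padding rather than as an error).
    """
    records = []
    pos = 0
    while len(extra) - pos >= 4:
        header_id, size = struct.unpack_from('<HH', extra, pos)
        pos += 4
        # A size that runs past the end simply yields a shorter slice here.
        records.append((header_id, extra[pos:pos + size]))
        pos += size
    return records, extra[pos:]
```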

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue14315
___



[issue14624] Faster utf-16 decoder

2012-05-14 Thread Serhiy Storchaka

Serhiy Storchaka storch...@gmail.com added the comment:

The patch is updated with slightly clarified code and added comments.

--
Added file: http://bugs.python.org/file25590/decode_utf16_4.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue14624
___



[issue14315] zipfile.ZipFile() unable to open zip File

2012-05-14 Thread Serhiy Storchaka

Serhiy Storchaka storch...@gmail.com added the comment:

> That can't possibly be the reason. mmap requires 4k (4096) alignment (on
> x86; more than that on SPARC).

That may be a reason to mmap the entire file and then read aligned
binary data.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue14315
___



[issue14674] Add link to RFC 4627 from json documentation

2012-05-15 Thread Serhiy Storchaka

Serhiy Storchaka storch...@gmail.com added the comment:

> for key, value in pairs:
>     if key in pairs:

Should it be "if key in obj:"?
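In the quoted loop, key is a string and pairs is a list of (key, value) tuples, so "key in pairs" is never true and duplicates go unnoticed. The check has to run against the dict being built. Below is a minimal object_pairs_hook along those lines. reject_duplicates is a hypothetical name, and rejecting duplicates is only one possible policy; by default json keeps the last value.

```python
import json

def reject_duplicates(pairs):
    """object_pairs_hook that raises ValueError on a repeated key."""
    obj = {}
    for key, value in pairs:
        if key in obj:  # test against the dict being built, not the pairs list
            raise ValueError('duplicate key: %r' % (key,))
        obj[key] = value
    return obj
```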

--
title: Link to & explain deviations from RFC 4627 in json module docs -> Add 
link to RFC 4627 from json documentation

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue14674
___



[issue14674] Add link to RFC 4627 from json documentation

2012-05-15 Thread Serhiy Storchaka

Serhiy Storchaka storch...@gmail.com added the comment:

IMHO, it would be sufficient to have a simple bullet list of differences
and notes or warnings in places where Python can generate non-standard
JSON (top-level scalars, inf and nan, non-utf8 encoded strings).
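Two of the listed cases can be reproduced directly with the json module's documented allow_nan parameter. By default, out-of-range floats are emitted as the non-standard literals Infinity and NaN. A bare scalar is emitted as a top-level value, which RFC 4627 does not allow (it permits only an object or an array):

```python
import json

inf_text = json.dumps(float('inf'))  # 'Infinity' is not a JSON literal
nan_text = json.dumps(float('nan'))  # neither is 'NaN'
scalar_text = json.dumps(42)         # a top-level scalar, outside RFC 4627

# allow_nan=False turns the out-of-range floats into an error instead.
try:
    json.dumps(float('inf'), allow_nan=False)
    strict_rejects_inf = False
except ValueError:
    strict_rejects_inf = True
```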

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue14674
___



[issue14811] compile fails - UTF-8 character decoding

2012-05-15 Thread Serhiy Storchaka

Serhiy Storchaka storch...@gmail.com added the comment:

I can reproduce it on Linux. Minimal example:

$ ./python -c "open('longline.py', 'w').write('#' + repr('\u00A1' * 4096) + '\n')"
$ ./python longline.py
  File "longline.py", line 1
SyntaxError: Non-UTF-8 code starting with '\xc2' in file longline.py on line 1, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details

--
nosy: +storchaka

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue14811
___



[issue14811] compile fails - UTF-8 character decoding

2012-05-15 Thread Serhiy Storchaka

Serhiy Storchaka storch...@gmail.com added the comment:

And for Python 2.7 too.

--
versions: +Python 2.7

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue14811
___



[issue14811] compile fails - UTF-8 character decoding

2012-05-15 Thread Serhiy Storchaka

Serhiy Storchaka storch...@gmail.com added the comment:

The function decoding_fgets (Parser/tokenizer.c) reads a line into a fixed-size
buffer of 8192 bytes (so the line is truncated to 8191 bytes) and then fails
because the line is cut in the middle of a multibyte UTF-8 character.
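The failure can be reproduced in pure Python with the same line as the example above. The line is 8196 bytes long, and its first 8191 bytes end in the middle of a 2-byte sequence. The sketch also shows one general remedy: an incremental decoder keeps the incomplete sequence until more data arrives. That remedy is an illustration and not necessarily how the tokenizer was fixed. BUF_LIMIT is a hypothetical name for the 8191-byte limit.

```python
import codecs

BUF_LIMIT = 8191  # bytes kept per read, per the description above

line = ('#' + repr('\u00A1' * 4096) + '\n').encode('utf-8')
head, tail = line[:BUF_LIMIT], line[BUF_LIMIT:]

# Decoding the truncated chunk on its own fails: it ends inside a character.
try:
    head.decode('utf-8')
    split_mid_char = False
except UnicodeDecodeError:
    split_mid_char = True

# An incremental decoder buffers the dangling byte until the next chunk.
decoder = codecs.getincrementaldecoder('utf-8')()
text = decoder.decode(head) + decoder.decode(tail, final=True)
```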

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue14811
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue14811] Syntax error on long UTF-8 lines

2012-05-15 Thread Serhiy Storchaka

Changes by Serhiy Storchaka storch...@gmail.com:


--
title: compile fails - UTF-8 character decoding -> Syntax error on long UTF-8 
lines

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue14811
___



[issue14803] Add feature to allow code execution prior to __main__ invocation

2012-05-15 Thread Serhiy Storchaka

Serhiy Storchaka storch...@gmail.com added the comment:

For faulthandler and coverage, a more convenient option would be -M (run a
module with __name__='__premain__' (or something of the sort) and then
continue command line processing).

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue14803
___



[issue14777] Tkinter clipboard_get() decodes characters incorrectly

2012-05-15 Thread Serhiy Storchaka

Serhiy Storchaka storch...@gmail.com added the comment:

> ...And mere minutes after I said I hadn't heard anything, I've got the 
> confirmation email. :-)

Congratulations!

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue14777
___



[issue14624] Faster utf-16 decoder

2012-05-15 Thread Serhiy Storchaka

Serhiy Storchaka storch...@gmail.com added the comment:

Here are two new patches. The check for out-of-range characters has been moved,
making the code simpler. In theory this slightly slows down decoding of
short UCS1 strings (up to 1 and 3 chars on 32- and 64-bit platforms), but
in practice there is no difference. The second patch differs from the first
in that the masks are not calculated but specified explicitly.
I am not sure that this improves readability. The committer can choose.
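The fast path in the patch can be modeled in Python to see what the masks do. This sketch assumes a 64-bit little-endian machine and Latin-1 (UCS1) output. Python ints stand in for the C unsigned long, and decode_block and swab are hypothetical names. In the non-native UCS1 case the C code shifts with block >>= 8 instead of calling SWAB; once the mask check has passed, both give the same four values.

```python
# Constants for a 64-bit C 'long' (SIZEOF_LONG == 8).
UCS2_REPEAT_MASK = 0x0001000100010001
STRIPPED_MASK = UCS2_REPEAT_MASK * 0x00FF                 # 0x00FF00FF00FF00FF
MAX_CHAR = 0xFF                                           # Latin-1 target
FAST_CHAR_MASK = UCS2_REPEAT_MASK * (0xFFFF & ~MAX_CHAR)  # 0xFF00FF00FF00FF00

def swab(value):
    """Swap the two bytes inside each 16-bit lane (the SWAB macro)."""
    return ((value >> 8) & STRIPPED_MASK) | ((value & STRIPPED_MASK) << 8)

def decode_block(block_bytes, native_ordering=True):
    """Decode 8 bytes (four UTF-16 code units) as a little-endian CPU would.

    Returns the four code units, or None when one of them does not fit in
    Latin-1 and the caller must fall back to the slow path.
    """
    block = int.from_bytes(block_bytes, 'little')  # an aligned unsigned long load
    if native_ordering:
        if block & FAST_CHAR_MASK:
            return None
    else:
        if block & swab(FAST_CHAR_MASK):
            return None
        block = swab(block)
    return [(block >> shift) & 0xFFFF for shift in (0, 16, 32, 48)]
```

For example, decode_block('abcd'.encode('utf-16-le')) and decode_block('abcd'.encode('utf-16-be'), native_ordering=False) both yield [97, 98, 99, 100].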

--
Added file: http://bugs.python.org/file25601/decode_utf16_5.patch
Added file: http://bugs.python.org/file25602/decode_utf16_6.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue14624
___
diff -r 492e6c6a01bb Objects/stringlib/codecs.h
--- a/Objects/stringlib/codecs.h  Tue May 15 15:30:25 2012 +0200
+++ b/Objects/stringlib/codecs.h  Wed May 16 00:26:02 2012 +0300
@@ -215,7 +215,6 @@
     goto Return;
 }
 
-#undef LONG_PTR_MASK
 #undef ASCII_CHAR_MASK
 
 
@@ -415,4 +414,152 @@
 #undef MAX_SHORT_UNICHARS
 }
 
+/* The pattern for constructing UCS2-repeated masks. */
+#if SIZEOF_LONG == 8
+# define UCS2_REPEAT_MASK 0x0001000100010001ul
+#elif SIZEOF_LONG == 4
+# define UCS2_REPEAT_MASK 0x00010001ul
+#else
+# error C 'long' size should be either 4 or 8!
+#endif
+
+/* The mask for fast checking. */
+#if STRINGLIB_SIZEOF_CHAR == 1
+/* The mask for fast checking of whether a C 'long' contains a
+   non-ASCII or non-Latin1 UTF16-encoded characters. */
+# define FAST_CHAR_MASK (UCS2_REPEAT_MASK * (0xFFFFu & ~STRINGLIB_MAX_CHAR))
+#else
+/* The mask for fast checking of whether a C 'long' may contain
+   UTF16-encoded surrogate characters. This is an efficient heuristic,
+   assuming that non-surrogate characters with a code point >= 0x8000 are
+   rare in most input.
+*/
+# define FAST_CHAR_MASK (UCS2_REPEAT_MASK * 0x8000u)
+#endif
+/* The mask for fast byte-swapping. */
+#define STRIPPED_MASK   (UCS2_REPEAT_MASK * 0x00FFu)
+/* Swap bytes. */
+#define SWAB(value)     ((((value) >> 8) & STRIPPED_MASK) | \
+                         (((value) & STRIPPED_MASK) << 8))
+
+Py_LOCAL_INLINE(Py_UCS4)
+STRINGLIB(utf16_decode)(const unsigned char **inptr, const unsigned char *e,
+                        STRINGLIB_CHAR *dest, Py_ssize_t *outpos,
+                        int native_ordering)
+{
+    Py_UCS4 ch;
+    const unsigned char *aligned_end =
+            (const unsigned char *) ((size_t) e & ~LONG_PTR_MASK);
+    const unsigned char *q = *inptr;
+    STRINGLIB_CHAR *p = dest + *outpos;
+    /* Offsets from q for retrieving byte pairs in the right order. */
+#ifdef BYTEORDER_IS_LITTLE_ENDIAN
+    int ihi = !!native_ordering, ilo = !native_ordering;
+#else
+    int ihi = !native_ordering, ilo = !!native_ordering;
+#endif
+    --e;
+
+    while (q < e) {
+        Py_UCS4 ch2;
+        /* First check for possible aligned read of a C 'long'. Unaligned
+           reads are more expensive, better to defer to another iteration. */
+        if (!((size_t) q & LONG_PTR_MASK)) {
+            /* Fast path for runs of in-range non-surrogate chars. */
+            register const unsigned char *_q = q;
+            while (_q < aligned_end) {
+                unsigned long block = * (unsigned long *) _q;
+                if (native_ordering) {
+                    /* Can use buffer directly */
+                    if (block & FAST_CHAR_MASK)
+                        break;
+                }
+                else {
+                    /* Need to byte-swap */
+                    if (block & SWAB(FAST_CHAR_MASK))
+                        break;
+#if STRINGLIB_SIZEOF_CHAR == 1
+                    block >>= 8;
+#else
+                    block = SWAB(block);
+#endif
+                }
+#ifdef BYTEORDER_IS_LITTLE_ENDIAN
+# if SIZEOF_LONG == 4
+                p[0] = (STRINGLIB_CHAR)(block & 0xFFFFu);
+                p[1] = (STRINGLIB_CHAR)(block >> 16);
+# elif SIZEOF_LONG == 8
+                p[0] = (STRINGLIB_CHAR)(block & 0xFFFFu);
+                p[1] = (STRINGLIB_CHAR)((block >> 16) & 0xFFFFu);
+                p[2] = (STRINGLIB_CHAR)((block >> 32) & 0xFFFFu);
+                p[3] = (STRINGLIB_CHAR)(block >> 48);
+# endif
+#else
+# if SIZEOF_LONG == 4
+                p[0] = (STRINGLIB_CHAR)(block >> 16);
+                p[1] = (STRINGLIB_CHAR)(block & 0xFFFFu);
+# elif SIZEOF_LONG == 8
+                p[0] = (STRINGLIB_CHAR)(block >> 48);
+                p[1] = (STRINGLIB_CHAR)((block >> 32) & 0xFFFFu);
+                p[2] = (STRINGLIB_CHAR)((block >> 16) & 0xFFFFu);
+                p[3] = (STRINGLIB_CHAR)(block & 0xFFFFu);
+# endif
+#endif
+                _q += SIZEOF_LONG;
+                p += SIZEOF_LONG / 2;
+            }
+            q = _q;
+            if (q >= e)
+                break;
+        }
+
+        ch = (q[ihi] << 8) | q[ilo];
+        q += 2;
+        if (!Py_UNICODE_IS_SURROGATE(ch)) {
+#if STRINGLIB_SIZEOF_CHAR

[issue14692] json.loads parse_constant callback not working anymore

2012-05-16 Thread Serhiy Storchaka

Serhiy Storchaka storch...@gmail.com added the comment:

> I'm afraid I have to close this one as rejected. It works as documented and 
> it's unlikely we'll decide to change it back. I'm sorry.

It does not work as documented. The proposed patch fixes the
documentation.
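You can check how parse_constant behaves in current Python 3 releases directly. The hook is called only for the non-standard tokens '-Infinity', 'Infinity' and 'NaN', never for null, true or false. Here is a small sketch; record_constant is a hypothetical name.

```python
import json

seen = []

def record_constant(token):
    """parse_constant hook: remember which tokens reach it, keep them as strings."""
    seen.append(token)
    return token

result = json.loads('[NaN, -Infinity, null, true, false]',
                    parse_constant=record_constant)
```

Here result is ['NaN', '-Infinity', None, True, False], and only the two non-standard tokens pass through the hook.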

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue14692
___



[issue14313] zipfile should raise an exception for unsupported compression methods

2012-05-16 Thread Serhiy Storchaka

Serhiy Storchaka storch...@gmail.com added the comment:

> I still like NotImplementedError more than RuntimeError, though.

Well, here are patches for Python 3.2 and 2.7 (backporting changeset
596b0eaeece8 plus part of changeset fccdcd83708a).

--
Added file: 
http://bugs.python.org/file25618/zipfile_unsupported_compression-3.2.patch
Added file: 
http://bugs.python.org/file25619/zipfile_unsupported_compression-2.7.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue14313
___
diff -r 13900edf13be Lib/test/test_zipfile.py
--- a/Lib/test/test_zipfile.py  Wed May 16 15:01:40 2012 +0200
+++ b/Lib/test/test_zipfile.py  Wed May 16 23:00:01 2012 +0300
@@ -922,6 +922,17 @@
         caught."""
         self.assertRaises(RuntimeError, zipfile.ZipFile, TESTFN, "w", -1)
 
+    def test_unsupported_compression(self):
+        # data is declared as shrunk, but actually deflated
+        data = (b'PK\x03\x04.\x00\x00\x00\x01\x00\xe4C\xa1@\x00\x00\x00'
+                b'\x00\x02\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00x\x03\x00PK\x01'
+                b'\x02.\x03.\x00\x00\x00\x01\x00\xe4C\xa1@\x00\x00\x00\x00\x02\x00\x00'
+                b'\x00\x00\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
+                b'\x80\x01\x00\x00\x00\x00xPK\x05\x06\x00\x00\x00\x00\x01\x00\x01\x00'
+                b'/\x00\x00\x00!\x00\x00\x00\x00\x00')
+        with zipfile.ZipFile(io.BytesIO(data), 'r') as zipf:
+            self.assertRaises(NotImplementedError, zipf.open, 'x')
+
     def test_null_byte_in_filename(self):
         """Check that a filename containing a null byte is properly
         terminated."""
diff -r 13900edf13be Lib/zipfile.py
--- a/Lib/zipfile.py  Wed May 16 15:01:40 2012 +0200
+++ b/Lib/zipfile.py  Wed May 16 23:00:01 2012 +0300
@@ -461,6 +461,28 @@
         self._UpdateKeys(c)
         return c
 
+
+compressor_names = {
+    0: 'store',
+    1: 'shrink',
+    2: 'reduce',
+    3: 'reduce',
+    4: 'reduce',
+    5: 'reduce',
+    6: 'implode',
+    7: 'tokenize',
+    8: 'deflate',
+    9: 'deflate64',
+    10: 'implode',
+    12: 'bzip2',
+    14: 'lzma',
+    18: 'terse',
+    19: 'lz77',
+    97: 'wavpack',
+    98: 'ppmd',
+}
+
+
 class ZipExtFile(io.BufferedIOBase):
     """File-like object for reading an archive member.
        Is returned by ZipFile.open().
@@ -487,6 +509,12 @@
 
         if self._compress_type == ZIP_DEFLATED:
             self._decompressor = zlib.decompressobj(-15)
+        elif self._compress_type != ZIP_STORED:
+            descr = compressor_names.get(self._compress_type)
+            if descr:
+                raise NotImplementedError("compression type %d (%s)" % (self._compress_type, descr))
+            else:
+                raise NotImplementedError("compression type %d" % (self._compress_type,))
         self._unconsumed = b''
 
         self._readbuffer = b''
diff -r e957b93571a8 Lib/test/test_zipfile.py
--- a/Lib/test/test_zipfile.py  Wed May 16 15:01:40 2012 +0200
+++ b/Lib/test/test_zipfile.py  Wed May 16 23:03:30 2012 +0300
@@ -859,6 +859,17 @@
         caught."""
         self.assertRaises(RuntimeError, zipfile.ZipFile, TESTFN, "w", -1)
 
+    def test_unsupported_compression(self):
+        # data is declared as shrunk, but actually deflated
+        data = (b'PK\x03\x04.\x00\x00\x00\x01\x00\xe4C\xa1@\x00\x00\x00'
+        b'\x00\x02\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00x\x03\x00PK\x01'
+        b'\x02.\x03.\x00\x00\x00\x01\x00\xe4C\xa1@\x00\x00\x00\x00\x02\x00\x00'
+        b'\x00\x00\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
+        b'\x80\x01\x00\x00\x00\x00xPK\x05\x06\x00\x00\x00\x00\x01\x00\x01\x00'
+        b'/\x00\x00\x00!\x00\x00\x00\x00\x00')
+        with zipfile.ZipFile(io.BytesIO(data), 'r') as zipf:
+            self.assertRaises(NotImplementedError, zipf.open, 'x')
+
     def test_null_byte_in_filename(self):
         """Check that a filename containing a null byte is properly
         terminated."""
diff -r e957b93571a8 Lib/zipfile.py
--- a/Lib/zipfile.py  Wed May 16 15:01:40 2012 +0200
+++ b/Lib/zipfile.py  Wed May 16 23:03:30 2012 +0300
@@ -461,6 +461,28 @@
         self._UpdateKeys(c)
         return c
 
+
+compressor_names = {
+    0: 'store',
+    1: 'shrink',
+    2: 'reduce',
+    3: 'reduce',
+    4: 'reduce',
+    5: 'reduce',
+    6: 'implode',
+    7: 'tokenize',
+    8: 'deflate',
+    9: 'deflate64',
+    10: 'implode',
+    12: 'bzip2',
+    14: 'lzma',
+    18: 'terse',
+    19: 'lz77',
+    97: 'wavpack',
+    98: 'ppmd',
+}
+
+
 class ZipExtFile(io.BufferedIOBase):
     """File-like object for reading an archive member.
        Is returned by ZipFile.open().
@@ -485,6 +507,12 @@
 
         if self._compress_type == ZIP_DEFLATED:
             self._decompressor = zlib.decompressobj(-15)
+        elif self._compress_type != ZIP_STORED:
+            descr = compressor_names.get(self._compress_type)
+            if descr:
+                raise

[issue13031] small speed-up for tarfile.py when unzipping tarballs

2012-05-16 Thread Serhiy Storchaka

Serhiy Storchaka storch...@gmail.com added the comment:

Justin, the patch would perhaps attract more interest if you provided a 
microbenchmark.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue13031
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue3931] codecs.charmap_build is untested and undocumented

2012-05-17 Thread Serhiy Storchaka

Changes by Serhiy Storchaka storch...@gmail.com:


--
versions: +Python 3.3 -Python 2.7, Python 3.2

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue3931
___



[issue3931] codecs.charmap_build is untested and undocumented

2012-05-17 Thread Serhiy Storchaka

Changes by Serhiy Storchaka storch...@gmail.com:


--
versions: +Python 2.7, Python 3.2

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue3931
___



[issue8271] str.decode('utf8', 'replace') -- conformance with Unicode 5.2.0

2012-05-17 Thread Serhiy Storchaka

Serhiy Storchaka storch...@gmail.com added the comment:

It looks like issue14738 fixes this bug for Python 3.3.

>>> print(ascii(b"\xc2\x41\x42".decode('utf8', 'replace')))
'\ufffdAB'
>>> print(ascii(b"\xf1ABCD".decode('utf8', 'replace')))
'\ufffdABCD'

--
nosy: +storchaka

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue8271
___



[issue8271] str.decode('utf8', 'replace') -- conformance with Unicode 5.2.0

2012-05-17 Thread Serhiy Storchaka

Serhiy Storchaka storch...@gmail.com added the comment:

 The only issue left was about the number of U+FFFD generated with invalid 
 sequences in some cases.
 My last patch has extensive tests for this, so you could try to apply it (or 
 copy the tests) and see if they all pass.

The tests fail, but I'm not sure the tests are correct.

b'\xe0\x00' raises 'unexpected end of data' and not 'invalid
continuation byte'. This is a terminological issue.

b'\xe0\x80'.decode('utf-8', 'replace') returns one U+FFFD and not two. I
don't think that is right.
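To check how a particular interpreter treats these two inputs, one can look at the `reason` of the strict-mode UnicodeDecodeError and at the output of the 'replace' handler. A minimal sketch; `describe` is a hypothetical helper, and the results depend on the Python version, since this thread is about changing them.

```python
def describe(data):
    """Return (strict-mode error reason or None, 'replace' result) for data."""
    try:
        data.decode('utf-8')
        reason = None
    except UnicodeDecodeError as exc:
        reason = exc.reason
    return reason, data.decode('utf-8', 'replace')


for data in (b'\xe0\x00', b'\xe0\x80'):
    # ascii() keeps U+FFFD printable on any console encoding
    print(data, ascii(describe(data)))
```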

--
title: str.decode('utf8',   'replace') -- conformance with Unicode 5.2.0 ->
str.decode('utf8', 'replace') -- conformance with Unicode 5.2.0

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue8271
___



[issue8271] str.decode('utf8', 'replace') -- conformance with Unicode 5.2.0

2012-05-17 Thread Serhiy Storchaka

Serhiy Storchaka storch...@gmail.com added the comment:

 I think that one U+FFFD is correct.  The only error is a premature end of
 data.

I expressed myself poorly. I also think that there is only one decoding
error, not two. I think the test is wrong.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue8271
___



[issue8271] str.decode('utf8', 'replace') -- conformance with Unicode 5.2.0

2012-05-17 Thread Serhiy Storchaka

Serhiy Storchaka storch...@gmail.com added the comment:

 This might be just because it first checks if there two more bytes before 
 checking if they are valid, but 'invalid continuation byte' works too.

Yes, this is an implementation detail. It is much simpler and faster. Is
it necessary to change it?

 Why not?

Maybe I'm wrong. I looked at The Unicode Standard, Version
6.0 (http://www.unicode.org/versions/Unicode6.0.0/ch03.pdf), pp. 95-97.
The standard is not categorical about this, but it recommends that only a
maximal subpart be replaced by a U+FFFD. \xe0\x80 is not a maximal
subpart: after \xe0 the second byte must be in the range \xa0-\xbf, so
\xe0 alone is one maximal subpart and \x80 is another. Therefore, there
must be two U+FFFD. In that case, both the previous and the current
implementations fail to conform to the standard.
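The "maximal subpart" practice can be sketched in pure Python. This is an illustrative reimplementation written for this discussion, not CPython's decoder; `decode_replace` and `_second_byte_range` are hypothetical names, and the allowed byte ranges are taken from Table 3-7 (well-formed UTF-8 byte sequences) of the standard.

```python
def _second_byte_range(lead):
    """Allowed range for the byte after `lead`, per Table 3-7."""
    if lead == 0xE0:
        return 0xA0, 0xBF
    if lead == 0xED:
        return 0x80, 0x9F
    if lead == 0xF0:
        return 0x90, 0xBF
    if lead == 0xF4:
        return 0x80, 0x8F
    return 0x80, 0xBF


def decode_replace(data):
    """Decode UTF-8, replacing each maximal subpart with one U+FFFD."""
    out = []
    i, n = 0, len(data)
    while i < n:
        lead = data[i]
        if lead < 0x80:
            out.append(chr(lead))
            i += 1
            continue
        if 0xC2 <= lead <= 0xDF:
            need = 1
        elif 0xE0 <= lead <= 0xEF:
            need = 2
        elif 0xF0 <= lead <= 0xF4:
            need = 3
        else:
            # 0x80-0xC1 and 0xF5-0xFF can never start a well-formed sequence.
            out.append('\ufffd')
            i += 1
            continue
        j = i + 1
        lo, hi = _second_byte_range(lead)
        # Extend the candidate while the next byte is still allowed.
        while j < n and j - i <= need and lo <= data[j] <= hi:
            j += 1
            lo, hi = 0x80, 0xBF  # later bytes: any continuation byte
        if j - i == need + 1:
            out.append(data[i:j].decode('utf-8'))  # well-formed sequence
        else:
            out.append('\ufffd')  # data[i:j] is one maximal subpart
        i = j
    return ''.join(out)


samples = [b'\xc2\x41\x42', b'\xf1ABCD', b'\xe0\x80', b'\xed\xa0\x80']
for s in samples:
    print(s, ascii(decode_replace(s)), ascii(s.decode('utf-8', 'replace')))
```

Comparing the two printed columns on a given interpreter shows whether its decoder follows the same practice.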

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue8271
___



[issue8271] str.decode('utf8', 'replace') -- conformance with Unicode 5.2.0

2012-05-17 Thread Serhiy Storchaka

Serhiy Storchaka storch...@gmail.com added the comment:

 Changing from 'unexpected end of data' to 'invalid continuation byte' for 
 b'\xe0\x00' is fine with me, but this will be a (minor) deviation from 2.7, 
 3.1, 3.2, and pypy (it could still be changed on all these except 3.1 though).

I probably put it poorly. Both the past and the current implementations
raise 'unexpected end of data', not 'invalid continuation byte'. The test
expects 'invalid continuation byte'.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue8271
___



[issue8271] str.decode('utf8', 'replace') -- conformance with Unicode 5.2.0

2012-05-17 Thread Serhiy Storchaka

Serhiy Storchaka storch...@gmail.com added the comment:

 I don't remember all the details right now, but it that test was passing with 
 my patch there must be something wrong somewhere (either in the patch, in the 
 test, or in our understanding of the standard).

No, the test correctly expects two U+FFFD. The current implementation is
wrong. Now that I understand what the error is, I'll try to correct the
Python 3.3 implementation.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue8271
___



[issue1767933] Badly formed XML using etree and utf-16

2012-05-18 Thread Serhiy Storchaka

Serhiy Storchaka storch...@gmail.com added the comment:

Can anyone review the patch?

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue1767933
___



[issue14850] The inconsistency of codecs.charmap_decode

2012-05-18 Thread Serhiy Storchaka

New submission from Serhiy Storchaka storch...@gmail.com:

codecs.charmap_decode behaves differently when the decoding table is an exact 
str and when it is a user-defined str subclass.

>>> import codecs
>>> print(ascii(codecs.charmap_decode(b'\x00', 'replace', '\uFFFE')))
('\ufffd', 1)
>>> class S(str): pass
... 
>>> print(ascii(codecs.charmap_decode(b'\x00', 'replace', S('\uFFFE'))))
('\ufffe', 1)

This is because the charmap decoder (function PyUnicode_DecodeCharmap in 
Objects/unicodeobject.c) uses different algorithms for exact strings and for 
other objects.

Do we need to fix it? If so, what should `codecs.charmap_decode(b'\x00', 
'replace', {0:'\uFFFE'})` return? And what should `codecs.charmap_decode(b'\x00', 
'replace', {0:0xFFFE})` return?
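For reference, a decoding table passed to codecs.charmap_decode can be a str indexed by byte value, in which U+FFFE marks an undefined byte, or a mapping from byte values to an int code point, a str, or None (undefined). A minimal sketch of the exact-str and dict forms only; it deliberately avoids the str-subclass and U+FFFE-in-a-mapping cases that this report asks about.

```python
import codecs

# Exact str table: byte 0 -> 'a', byte 1 -> 'b', byte 2 is undefined (U+FFFE).
table = 'ab\ufffe'
print(ascii(codecs.charmap_decode(b'\x00\x01', 'strict', table)))
print(ascii(codecs.charmap_decode(b'\x02', 'replace', table)))

# Mapping table: values may be an int code point, a str, or None (undefined).
mapping = {0: 0x61, 1: 'b', 2: None}
print(ascii(codecs.charmap_decode(b'\x00\x01', 'strict', mapping)))
print(ascii(codecs.charmap_decode(b'\x02', 'replace', mapping)))
```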

--
components: Interpreter Core
messages: 161054
nosy: storchaka
priority: normal
severity: normal
status: open
title: The inconsistency of codecs.charmap_decode
type: behavior
versions: Python 2.7, Python 3.2, Python 3.3

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue14850
___



[issue14624] Faster utf-16 decoder

2012-05-19 Thread Serhiy Storchaka

Serhiy Storchaka storch...@gmail.com added the comment:

Thank you, Antoine. Now only issue14625 is waiting for review.

 changeset:   77012:3430d7329a3b
 +* UTF-8 and UTF-16 decoding is now 2x to 4x faster.

In fact, on my computers UTF-16 decoding is now at most 25% faster than in 
Python 3.2 (and sometimes still a little slower). The 2x to 4x speedup is 
relative to the earlier Python 3.3, which PEP 393 had slowed down.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue14624
___



[issue1767933] Badly formed XML using etree and utf-16

2012-05-20 Thread Serhiy Storchaka

Serhiy Storchaka storch...@gmail.com added the comment:

Here is an updated patch, with tests and support for objects that have only a 
'write' method.

--
Added file: http://bugs.python.org/file25652/etree_write_utf16_2.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue1767933
___



[issue14868] Allow log calls to return True for code optimization.

2012-05-21 Thread Serhiy Storchaka

Serhiy Storchaka storch...@gmail.com added the comment:

assert logging.debug("This is a test.") or True
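The suggestion relies on two facts: logging.debug() returns None, so `None or True` lets the assertion pass, and assert statements are removed entirely when Python runs with -O, so the call and the evaluation of its arguments disappear. A small sketch; `expensive_summary` is a hypothetical stand-in for a costly argument.

```python
import logging

calls = 0


def expensive_summary():
    """Hypothetical costly computation used only for a debug message."""
    global calls
    calls += 1
    return 'summary'


# Under "python -O" this whole statement is removed, so expensive_summary()
# is never called; otherwise logging.debug() returns None and `or True`
# keeps the assertion from failing.
assert logging.debug('state: %s', expensive_summary()) or True

print('expensive_summary() calls:', calls)  # 0 under -O, otherwise 1
```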

--
nosy: +storchaka

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue14868
___



[issue14469] Python 3 documentation links

2012-05-21 Thread Serhiy Storchaka

Serhiy Storchaka storch...@gmail.com added the comment:

http://permalink.gmane.org/gmane.comp.python.devel/132675

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue14469
___



[issue14874] Faster charmap decoding

2012-05-21 Thread Serhiy Storchaka

New submission from Serhiy Storchaka storch...@gmail.com:

Charmap decoders are not as important as UTF decoders, but they are still widely 
used. In Python 3.3, with PEP 393, they became 4x slower. The proposed patch 
restores the performance.

Only the most common case is optimized: a decoder specified by a UCS2 table 
of length >= 256. Map-based decoders are translated to table-based ones. UCS1 
tables are widened to UCS2 by appending a fake 257th character.
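The "map-based decoders translated to table-based" step can be illustrated at the Python level. `mapping_to_table` is a hypothetical helper written for this note, not part of the patch or of the codecs module; it assumes every defined mapping value is an int code point or a one-character str, and fills undefined slots with U+FFFE, which charmap_decode treats as undefined.

```python
import codecs


def mapping_to_table(mapping):
    """Build a 256-entry str decoding table from a byte -> value mapping.

    Values may be an int code point, a one-character str, or None
    (undefined); missing keys and None become U+FFFE ("undefined").
    """
    chars = []
    for byte in range(256):
        value = mapping.get(byte)
        if value is None:
            chars.append('\ufffe')
        elif isinstance(value, int):
            chars.append(chr(value))
        elif len(value) == 1:
            chars.append(value)
        else:
            raise ValueError('byte %d maps to %d characters' % (byte, len(value)))
    return ''.join(chars)


mapping = {0x41: 'a', 0x42: 0x62, 0x43: None}
table = mapping_to_table(mapping)
data = b'ABCD'
print(ascii(codecs.charmap_decode(data, 'replace', mapping)))
print(ascii(codecs.charmap_decode(data, 'replace', table)))
```

Both calls should decode the same way; the table form is the one a fast path can index directly.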

Benchmark results:

                    3.2           3.3(vanilla)  3.3(patched)

cp1251 'A'*1        111 (+10%)    31 (+294%)    122
cp1251 '\xa0'*1     111 (+8%)     29 (+314%)    120
cp1251 '\u0402'*1   111 (+6%)     25 (+372%)    118

--
components: Interpreter Core, Unicode
files: decode_charmap.patch
keywords: patch
messages: 161301
nosy: ezio.melotti, haypo, lemburg, pitrou, storchaka
priority: normal
severity: normal
status: open
title: Faster charmap decoding
type: performance
versions: Python 3.3
Added file: http://bugs.python.org/file25664/decode_charmap.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue14874
___



[issue14874] Faster charmap decoding

2012-05-21 Thread Serhiy Storchaka

Changes by Serhiy Storchaka storch...@gmail.com:


Added file: http://bugs.python.org/file25665/charmapdecodebench.py

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue14874
___



[issue14874] Faster charmap decoding

2012-05-21 Thread Serhiy Storchaka

Changes by Serhiy Storchaka storch...@gmail.com:


Added file: http://bugs.python.org/file25666/bench-diff.py

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue14874
___



[issue14744] Use _PyUnicodeWriter API in str.format() internals

2012-05-24 Thread Serhiy Storchaka

Serhiy Storchaka storch...@gmail.com added the comment:

 For Python 3.3, _PyUnicodeWriter API is faster than the Py_UCS4 buffer API 
 and PyAccu API in quite all cases, with a speedup between 30% and 100%. But 
 there are some cases where the _PyUnicodeWriter API is slower:

Perhaps most of these problems can be solved by replacing the boolean flag
(overallocate / don't overallocate) with a Py_ssize_t parameter that
indicates how much to overallocate (the length of the remaining suffix of
the format string).

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue14744
___



[issue14897] struct.pack raises unexpected error message

2012-05-24 Thread Serhiy Storchaka

Serhiy Storchaka storch...@gmail.com added the comment:

Funny. struct.pack(fmt, args...) is just an alias for 
struct.Struct(fmt).pack(args...). The error message should be changed to state 
explicitly that it refers to the data items to be packed, not to the arguments 
of the function. Alternatively, the mention of the number of arguments could be 
removed altogether (saying only that there are too many or too few).
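The alias relationship is easy to check, and the same call with too few values shows which error the user actually sees. A minimal sketch; the exact struct.error wording differs between Python versions, so only the fact that an error is raised is relied on.

```python
import struct

fmt = '<ih'  # little-endian int + short: two items to pack
packer = struct.Struct(fmt)

# struct.pack(fmt, *values) behaves like struct.Struct(fmt).pack(*values).
assert struct.pack(fmt, 1, 2) == packer.pack(1, 2)

try:
    struct.pack(fmt, 1)  # one value for a two-item format
except struct.error as exc:
    error_message = str(exc)
else:
    error_message = None
print(error_message)
```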

--
nosy: +storchaka

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue14897
___



[issue14897] struct.pack raises unexpected error message

2012-05-24 Thread Serhiy Storchaka

Serhiy Storchaka storch...@gmail.com added the comment:

 It might help if the error message also stated how many arguments were 
 actually received, like the TypeError message already does for bad function / 
 method calls.  E.g., 
 
 struct.error: pack expected 2 items for packing (got 1)

Yes, this would be useful, but it is seldom implemented:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: not enough arguments for format string
>>> '%s %s'%(123,456,789)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: not all arguments converted during string formatting

struct.pack is also inconsistent in its other error messages.

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
struct.error: argument for 's' must be a bytes object
>>> struct.pack('i', '123')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
struct.error: required argument is not an integer

For 's' the format character is mentioned, but for 'i' it is not. It would
also be helpful to mention the index of the item.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue14897
___



[issue8271] str.decode('utf8', 'replace') -- conformance with Unicode 5.2.0

2012-05-25 Thread Serhiy Storchaka

Serhiy Storchaka storch...@gmail.com added the comment:

Here is a patch for 3.3. All tests pass. Unfortunately, it is a little 
slower, but I tried to minimize the losses.

--
Added file: http://bugs.python.org/file25709/issue8271-3.3.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue8271
___



[issue14920] help(urllib.parse) fails when LANG=C

2012-05-25 Thread Serhiy Storchaka

Changes by Serhiy Storchaka storch...@gmail.com:


--
versions: +Python 3.3

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue14920
___



[issue8271] str.decode('utf8', 'replace') -- conformance with Unicode 5.2.0

2012-05-26 Thread Serhiy Storchaka

Serhiy Storchaka storch...@gmail.com added the comment:

Here are the benchmark results (numbers are speed, MB/s).

On 32-bit Linux, AMD Athlon 64 X2:

                          vanilla      patched

utf-8 'A'*1               2016 (+5%)   2111
utf-8 '\x80'*1            383 (+9%)    416
utf-8 '\x80'+'A'*         1283 (+1%)   1301
utf-8 '\u0100'*1          383 (-8%)    354
utf-8 '\u0100'+'A'*       1258 (-6%)   1184
utf-8 '\u0100'+'\x80'*    383 (-8%)    354
utf-8 '\u8000'*1          434 (-11%)   388
utf-8 '\u8000'+'A'*       1262 (-6%)   1180
utf-8 '\u8000'+'\x80'*    383 (-8%)    354
utf-8 '\u8000'+'\u0100'*  383 (-8%)    354
utf-8 '\U0001'*1          358 (+1%)    361
utf-8 '\U0001'+'A'*       1168 (-5%)   1104
utf-8 '\U0001'+'\x80'*    382 (-20%)   307
utf-8 '\U0001'+'\u0100'*  382 (-20%)   307
utf-8 '\U0001'+'\u8000'*  404 (-10%)   365

On 32-bit Linux, Intel Atom N570:

                          vanilla      patched

ascii 'A'*1               789 (+1%)    800

latin1 'A'*1              796 (-2%)    781
latin1 'A'*+'\x80'        779 (+1%)    789
latin1 '\x80'*1           1739 (-3%)   1690
latin1 '\x80'+'A'*        1747 (+1%)   1773

utf-8 'A'*1               623 (+1%)    631
utf-8 '\x80'*1            145 (+14%)   165
utf-8 '\x80'+'A'*         354 (+1%)    358
utf-8 '\u0100'*1          164 (-5%)    156
utf-8 '\u0100'+'A'*       343 (+2%)    350
utf-8 '\u0100'+'\x80'*    164 (-4%)    157
utf-8 '\u8000'*1          175 (-5%)    166
utf-8 '\u8000'+'A'*       349 (+2%)    356
utf-8 '\u8000'+'\x80'*    164 (-4%)    157
utf-8 '\u8000'+'\u0100'*  164 (-4%)    157
utf-8 '\U0001'*1          152 (+7%)    163
utf-8 '\U0001'+'A'*       313 (+6%)    332
utf-8 '\U0001'+'\x80'*    161 (-13%)   140
utf-8 '\U0001'+'\u0100'*  161 (-14%)   139
utf-8 '\U0001'+'\u8000'*  160 (-1%)    159
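The figures above are decoding speeds in MB/s from the attached benchmark script, which is not reproduced in this message. A hypothetical sketch of how such a figure can be measured with timeit; `decode_speed`, its parameters and the input size are assumptions for illustration, and absolute results depend on the machine and the Python build.

```python
import timeit


def decode_speed(text, encoding='utf-8', number=200):
    """Hypothetical helper: decoding throughput in MB/s (best of 3 runs)."""
    data = text.encode(encoding)
    best = min(timeit.repeat(lambda: data.decode(encoding),
                             repeat=3, number=number))
    return len(data) * number / best / 1e6


n = 100000  # arbitrary input length chosen for this sketch
for label, text in [("'A'*n", 'A' * n),
                    ("'\\x80'*n", '\x80' * n),
                    ("'\\u0100'*n", '\u0100' * n),
                    ("'\\u8000'*n", '\u8000' * n)]:
    print('utf-8 %-12s %8.0f MB/s' % (label, decode_speed(text)))
```

Taking the minimum of several repeats reduces the influence of unrelated system activity on the measurement.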

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue8271
___



[issue14923] Even faster UTF-8 decoding

2012-05-26 Thread Serhiy Storchaka

New submission from Serhiy Storchaka storch...@gmail.com:

Strange as it may seem, a simple trick makes UTF-8 decoding even 
faster.

Here are the benchmark results.

On 32-bit Linux, AMD Athlon 64 X2:

                          vanilla      patched

utf-8 'A'*1               2061 (+3%)   2115
utf-8 '\x80'*1            383 (-7%)    355
utf-8 '\x80'+'A'*         1273 (+1%)   1290
utf-8 '\u0100'*1          382 (+47%)   562
utf-8 '\u0100'+'A'*       1239 (+1%)   1253
utf-8 '\u0100'+'\x80'*    383 (+47%)   562
utf-8 '\u8000'*1          434 (-6%)    409
utf-8 '\u8000'+'A'*       1245 (+1%)   1256
utf-8 '\u8000'+'\x80'*    382 (+47%)   560
utf-8 '\u8000'+'\u0100'*  383 (+44%)   553
utf-8 '\U0001'*1          358 (+4%)    373
utf-8 '\U0001'+'A'*       1171 (+0%)   1176
utf-8 '\U0001'+'\x80'*    381 (+44%)   548
utf-8 '\U0001'+'\u0100'*  381 (+44%)   548
utf-8 '\U0001'+'\u8000'*  404 (+0%)    406

On 32-bit Linux, Intel Atom N570:

                          vanilla      patched

utf-8 'A'*1               623 (+0%)    626
utf-8 '\x80'*1            145 (+15%)   167
utf-8 '\x80'+'A'*         354 (+2%)    362
utf-8 '\u0100'*1          164 (+10%)   181
utf-8 '\u0100'+'A'*       343 (-0%)    342
utf-8 '\u0100'+'\x80'*    164 (+11%)   182
utf-8 '\u8000'*1          175 (+5%)    183
utf-8 '\u8000'+'A'*       349 (+0%)    349
utf-8 '\u8000'+'\x80'*    164 (+11%)   182
utf-8 '\u8000'+'\u0100'*  164 (+10%)   181
utf-8 '\U0001'*1          152 (+11%)   168
utf-8 '\U0001'+'A'*       313 (+0%)    313
utf-8 '\U0001'+'\x80'*    161 (+11%)   179
utf-8 '\U0001'+'\u0100'*  161 (+11%)   179
utf-8 '\U0001'+'\u8000'*  160 (+11%)   177

--
components: Interpreter Core, Unicode
files: decode_utf8_signed_byte.patch
keywords: patch
messages: 161652
nosy: Arfrever, ezio.melotti, haypo, janssen, jcea, loewis, mark.dickinson, 
ned.deily, pitrou, python-dev, ronaldoussoren, storchaka
priority: normal
severity: normal
status: open
title: Even faster UTF-8 decoding
type: performance
versions: Python 3.3
Added file: http://bugs.python.org/file25717/decode_utf8_signed_byte.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue14923
___



[issue14923] Even faster UTF-8 decoding

2012-05-26 Thread Serhiy Storchaka

Changes by Serhiy Storchaka storch...@gmail.com:


Added file: http://bugs.python.org/file25718/decodebench.py

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue14923
___



[issue14923] Even faster UTF-8 decoding

2012-05-26 Thread Serhiy Storchaka

Changes by Serhiy Storchaka storch...@gmail.com:


Added file: http://bugs.python.org/file25719/bench-diff.py

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue14923
___



[issue8271] str.decode('utf8', 'replace') -- conformance with Unicode 5.2.0

2012-05-26 Thread Serhiy Storchaka

Serhiy Storchaka storch...@gmail.com added the comment:

Fortunately, issue14923 (if accepted) will compensate for the slowdown.

On 32-bit Linux, AMD Athlon 64 X2:

                          vanilla      old patch    fast patch

utf-8 'A'*1               2016 (+3%)   2111 (-2%)   2072
utf-8 '\x80'*1            383 (+19%)   416 (+9%)    454
utf-8 '\x80'+'A'*         1283 (-7%)   1301 (-9%)   1190
utf-8 '\u0100'*1          383 (+46%)   354 (+58%)   560
utf-8 '\u0100'+'A'*       1258 (-1%)   1184 (+5%)   1244
utf-8 '\u0100'+'\x80'*    383 (+46%)   354 (+58%)   558
utf-8 '\u8000'*1          434 (+6%)    388 (+19%)   461
utf-8 '\u8000'+'A'*       1262 (-1%)   1180 (+5%)   1244
utf-8 '\u8000'+'\x80'*    383 (+46%)   354 (+58%)   559
utf-8 '\u8000'+'\u0100'*  383 (+45%)   354 (+57%)   555
utf-8 '\U0001'*1          358 (+5%)    361 (+4%)    375
utf-8 '\U0001'+'A'*       1168 (-1%)   1104 (+5%)   1159
utf-8 '\U0001'+'\x80'*    382 (+43%)   307 (+78%)   546
utf-8 '\U0001'+'\u0100'*  382 (+43%)   307 (+79%)   548
utf-8 '\U0001'+'\u8000'*  404 (+13%)   365 (+25%)   458

On 32-bit Linux, Intel Atom N570:

                          vanilla      old patch    fast patch

utf-8 'A'*1               623 (+1%)    631 (+0%)    631
utf-8 '\x80'*1            145 (+26%)   165 (+11%)   183
utf-8 '\x80'+'A'*         354 (-0%)    358 (-1%)    353
utf-8 '\u0100'*1          164 (+10%)   156 (+16%)   181
utf-8 '\u0100'+'A'*       343 (+1%)    350 (-1%)    348
utf-8 '\u0100'+'\x80'*    164 (+10%)   157 (+15%)   181
utf-8 '\u8000'*1          175 (-1%)    166 (+5%)    174
utf-8 '\u8000'+'A'*       349 (+0%)    356 (-2%)    349
utf-8 '\u8000'+'\x80'*    164 (+10%)   157 (+15%)   180
utf-8 '\u8000'+'\u0100'*  164 (+10%)   157 (+15%)   181
utf-8 '\U0001'*1          152 (+7%)    163 (+0%)    163
utf-8 '\U0001'+'A'*       313 (+4%)    332 (-2%)    327
utf-8 '\U0001'+'\x80'*    161 (+11%)   140 (+28%)   179
utf-8 '\U0001'+'\u0100'*  161 (+11%)   139 (+28%)   178
utf-8 '\U0001'+'\u8000'*  160 (+9%)    159 (+9%)    174

--
Added file: http://bugs.python.org/file25720/issue8271-3.3-fast.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue8271
___



[issue14923] Even faster UTF-8 decoding

2012-05-26 Thread Serhiy Storchaka

Serhiy Storchaka storch...@gmail.com added the comment:

 It seems the patch relies on a two's complement representation of integers. 
 Mark, do you think that's ok?

Yes, the patch depends on two facts: 8-bit bytes and two's complement
representation of integers. That's why I call it a trick. However,
CPython today does not work on other platforms anyway. Still, we could
wrap the macro definition in #if/#else/#endif and provide the traditional
form (but I don't remember how to test for two's complement representation
at compile time).
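The equivalence behind the trick can be checked exhaustively over all 256 byte values. The sketch below models the C expression `(signed char)ch < -0x40` in Python (an assumption about the form of the trick, since the patch itself is not quoted in this message) and compares it with the portable range check `((ch) >= 0x80 && (ch) < 0xC0)`. The reinterpretation as a signed value is exactly the step that relies on two's complement.

```python
def as_signed_char(byte):
    """Reinterpret an unsigned byte (0..255) as a two's complement signed char."""
    return byte - 256 if byte >= 0x80 else byte


def is_continuation_range(byte):
    # Portable form: continuation bytes are 0x80..0xBF.
    return 0x80 <= byte < 0xC0


def is_continuation_signed(byte):
    # Signed-byte form: 0x80..0xBF become -128..-65, i.e. below -0x40.
    return as_signed_char(byte) < -0x40


mismatches = [b for b in range(256)
              if is_continuation_range(b) != is_continuation_signed(b)]
print('mismatches:', mismatches)
```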

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue14923
___



[issue14923] Even faster UTF-8 decoding

2012-05-27 Thread Serhiy Storchaka

Serhiy Storchaka storch...@gmail.com added the comment:

Yes, this is implementation-defined behavior (and on the supported 
platforms it behaves in the same well-defined way).

However, if the continuation byte check is done the simplest way, 
((ch) >= 0x80 && (ch) < 0xC0), it has the same effect (a speedup of up to 45%) 
on AMD Athlon.

                          vanilla      patched

utf-8 'A'*1               2061 (-2%)   2018
utf-8 '\x80'*1            383 (+9%)    416
utf-8 '\x80'+'A'*         1273 (+3%)   1315
utf-8 '\u0100'*1          382 (+46%)   558
utf-8 '\u0100'+'A'*       1239 (+0%)   1245
utf-8 '\u0100'+'\x80'*    383 (+46%)   558
utf-8 '\u8000'*1          434 (-6%)    408
utf-8 '\u8000'+'A'*       1245 (+0%)   1245
utf-8 '\u8000'+'\x80'*    382 (+46%)   556
utf-8 '\u8000'+'\u0100'*  383 (+45%)   556
utf-8 '\U0001'*1          358 (+0%)    359
utf-8 '\U0001'+'A'*       1171 (-0%)   1170
utf-8 '\U0001'+'\x80'*    381 (+30%)   495
utf-8 '\U0001'+'\u0100'*  381 (+30%)   495
utf-8 '\U0001'+'\u8000'*  404 (-5%)    385

On Intel Atom the results are unchanged or slightly better.

                          vanilla      patched

utf-8 'A'*1               623 (+3%)    642
utf-8 '\x80'*1            145 (+9%)    158
utf-8 '\x80'+'A'*         354 (+4%)    367
utf-8 '\u0100'*1          164 (+0%)    164
utf-8 '\u0100'+'A'*       343 (+2%)    351
utf-8 '\u0100'+'\x80'*    164 (+1%)    165
utf-8 '\u8000'*1          175 (-2%)    171
utf-8 '\u8000'+'A'*       349 (+3%)    359
utf-8 '\u8000'+'\x80'*    164 (+0%)    164
utf-8 '\u8000'+'\u0100'*  164 (+0%)    164
utf-8 '\U0001'*1          152 (-1%)    150
utf-8 '\U0001'+'A'*       313 (+2%)    319
utf-8 '\U0001'+'\x80'*    161 (+1%)    162
utf-8 '\U0001'+'\u0100'*  161 (+1%)    162
utf-8 '\U0001'+'\u8000'*  160 (-2%)    156

--
Added file: http://bugs.python.org/file25733/decode_utf8_range_check.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue14923
___



[issue12716] Reorganize os docs for files/dirs/fds

2012-05-28 Thread Serhiy Storchaka

Changes by Serhiy Storchaka storch...@gmail.com:


--
nosy: +storchaka

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12716
___



[issue1470548] Bugfix for #1470540 (XMLGenerator cannot output UTF-16)

2012-05-28 Thread Serhiy Storchaka

Serhiy Storchaka storch...@gmail.com added the comment:

See also issue1767933.

It is better to use io.TextIOWrapper instead of codecs.StreamWriter, because 
the latter is slower and has numerous flaws.
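For example, text can be encoded to UTF-16 by wrapping a binary stream in io.TextIOWrapper. A minimal sketch, using an in-memory io.BytesIO in place of a real binary file.

```python
import io

raw = io.BytesIO()  # stands in for a binary file opened with 'wb'
writer = io.TextIOWrapper(raw, encoding='utf-16', newline='\n')
writer.write('<doc>\u20ac</doc>\n')
writer.flush()         # push the encoded bytes into `raw`
writer.detach()        # release `raw` without closing it
data = raw.getvalue()  # UTF-16 bytes, starting with a byte order mark
print(ascii(data.decode('utf-16')))
```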

--
nosy: +storchaka
versions: +Python 3.3

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue1470548
___



[issue2005] posixmodule expects sizeof(pid_t/gid_t/uid_t) <= sizeof(long)

2012-05-28 Thread Serhiy Storchaka

Changes by Serhiy Storchaka storch...@gmail.com:


--
nosy: +storchaka

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue2005
___



[issue2005] posixmodule expects sizeof(pid_t/gid_t/uid_t) <= sizeof(long)

2012-05-28 Thread Serhiy Storchaka

Changes by Serhiy Storchaka storch...@gmail.com:


--
versions: +Python 3.3 -Python 3.1

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue2005
___



[issue13518] configparser can’t read file objects from urlopen

2012-05-28 Thread Serhiy Storchaka

Serhiy Storchaka storch...@gmail.com added the comment:

Mickey, you can wrap the file-like object returned by urlopen in io.TextIOWrapper:

  import configparser, io
  from urllib.request import urlopen
  config = configparser.RawConfigParser()
  config.read_file(io.TextIOWrapper(urlopen(path_config), encoding='utf-8'))

Since there is no bug and no new feature is needed, I believe this issue can 
be closed.
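The wrapping suggested above can be tried without network access by letting an in-memory binary stream stand in for the urlopen() response (an assumption for illustration; a real response is likewise a binary file-like object).

```python
import configparser
import io

# Stand-in for urlopen(path_config): a binary stream with INI content.
raw = io.BytesIO(b'[server]\nhost = example.org\nport = 8080\n')

config = configparser.RawConfigParser()
config.read_file(io.TextIOWrapper(raw, encoding='utf-8'))
print(config.get('server', 'host'), config.getint('server', 'port'))
```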

--
nosy: +storchaka

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue13518
___



[issue4733] Add a decode to declared encoding version of urlopen to urllib

2012-05-28 Thread Serhiy Storchaka

Serhiy Storchaka storch...@gmail.com added the comment:

If you add an encoding parameter, you should also add at least errors and 
newline parameters. And why not just use io.TextIOWrapper?

page.decode_content() is bad because it forces all of the data to be read and 
decoded at once, while io.TextIOWrapper returns a file-like object that lets you 
read line by line or in other chunks.
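As a sketch of the incremental alternative: io.TextIOWrapper already accepts encoding, errors and newline, and can be iterated line by line. Here an io.BytesIO stands in for the HTTP response (an assumption for illustration).

```python
import io

body = 'first line\r\nsecond \u00e9\r\n'.encode('utf-8') + b'\xff\r\n'
response = io.BytesIO(body)  # stand-in for the object returned by urlopen()

# newline=None enables universal newlines ('\r\n' -> '\n');
# errors='replace' turns the invalid byte into U+FFFD instead of raising.
text = io.TextIOWrapper(response, encoding='utf-8', errors='replace',
                        newline=None)
lines = [line for line in text]
print([ascii(line) for line in lines])
```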

--
nosy: +storchaka

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue4733
___



[issue14744] Use _PyUnicodeWriter API in str.format() internals

2012-05-28 Thread Serhiy Storchaka

Serhiy Storchaka storch...@gmail.com added the comment:

 So, do you have any comment or complain? Or can I commit the patch?

I beg your pardon; I will do a review and additional benchmarks today.

So far I have to say that the stringlib approach is better than massive 
macros, which are harder to read and edit. However, I will run a benchmark to 
check whether we can achieve the same effect with fewer code changes.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue14744
___



[issue14744] Use _PyUnicodeWriter API in str.format() internals

2012-05-28 Thread Serhiy Storchaka

Serhiy Storchaka storch...@gmail.com added the comment:

I just sent you a patch which does not use any macros or stringlib.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue14744
___



[issue1470548] Bugfix for #1470540 (XMLGenerator cannot output UTF-16)

2012-05-30 Thread Serhiy Storchaka

Changes by Serhiy Storchaka storch...@gmail.com:


--
nosy: +loewis

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue1470548
___



[issue1470548] Bugfix for #1470540 (XMLGenerator cannot output UTF-16)

2012-05-30 Thread Serhiy Storchaka

Serhiy Storchaka storch...@gmail.com added the comment:

Oh, I see that XMLGenerator is completely outdated. It has not even been 
ported to Python 3. See the _write function:

    def _write(self, text):
        if isinstance(text, str):
            self._out.write(text)
        else:
            self._out.write(text.encode(self._encoding, _error_handling))

In Python 2 this chose between byte strings and unicode strings. In Python 
3, however, text is always a str, so the encoding branch is never taken.

XMLGenerator does not distinguish between binary and text streams.

Here is a patch that fixes XMLGenerator in Python 3. Unfortunately, some loss of 
backward compatibility is unavoidable. I tried to keep the most common cases 
working, but some code that worked before may break (I had to correct some 
tests, too).
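A minimal sketch of the text/binary distinction, assuming `out` is an `io`-module stream: text streams are written to directly, while binary streams are wrapped in `io.TextIOWrapper` so that encoding happens in exactly one place. The helper name `to_text_writer` is hypothetical and this is not the code of the attached patch; the `xmlcharrefreplace` error handler is an assumption, chosen so that unencodable characters become XML character references.

```python
import io
import sys


def to_text_writer(out, encoding):
    """Hypothetical helper: return a text stream that writes to `out`.

    Assumes `out` is None, an io.TextIOBase, or a binary io stream.
    """
    if out is None:
        return sys.stdout
    if isinstance(out, io.TextIOBase):
        # Already a text stream: write str objects to it unchanged.
        return out
    # Binary stream: let TextIOWrapper do the encoding; unencodable
    # characters are emitted as &#NNN; references.
    return io.TextIOWrapper(out, encoding=encoding,
                            errors='xmlcharrefreplace',
                            newline='\n', write_through=True)
```

If the caller owns the binary stream, it can call `detach()` on the returned wrapper when done, so that the underlying stream is not closed together with the wrapper.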

--
Added file: http://bugs.python.org/file25760/XMLGenerator.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue1470548
___



[issue10376] ZipFile unzip is unbuffered

2012-05-31 Thread Serhiy Storchaka

Serhiy Storchaka storch...@gmail.com added the comment:

The patch has been updated to reflect Martin's stylistic comments.

Sorry for the delay, Martin. I did not receive an email with your review of 
2012-05-13 and only discovered your comments in Rietveld by accident today. 
It seems to have been a bug in Rietveld.

--
Added file: http://bugs.python.org/file25769/zipfile_optimize_read_2.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue10376
___



[issue14973] restore python2 unicode literals in ur strings

2012-05-31 Thread Serhiy Storchaka

Serhiy Storchaka storch...@gmail.com added the comment:

See issue3665.

--
nosy: +storchaka
title: restore python2 unicode literals in ru strings - restore python2 unicode literals in ur strings

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue14973
___



[issue3665] Support \u and \U escapes in regexes

2012-06-01 Thread Serhiy Storchaka

Serhiy Storchaka storch...@gmail.com added the comment:

I don't think it is worth targeting this for 2.7 and 3.2 (it's a new feature, 
not a bugfix), but for 3.3 it will be very useful.

Since PEP 393, conversion to surrogate pairs is no longer relevant.
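A short illustration of the feature from Python code, assuming Python 3.4 or later (where `re` understands `\u` and `\U` escapes in patterns and `re.fullmatch` exists). In a raw-string pattern the escapes are interpreted by the regex parser rather than by the string literal. It also shows the PEP 393 point: a non-BMP character is a single code point, so neither a `\U` escape nor `.` has to deal with surrogate pairs. The names `is_latin_ext_a` and `EMOJI` are illustrative.

```python
import re

# Raw string: the regex parser, not the literal, handles the \u escapes.
latin_ext_a = re.compile(r'[\u0100-\u017F]+')  # Latin Extended-A block


def is_latin_ext_a(word):
    """Return True if every character of `word` is in U+0100..U+017F."""
    return latin_ext_a.fullmatch(word) is not None


# Since PEP 393 a non-BMP character is one code point on every build.
EMOJI = '\U0001F600'

if __name__ == '__main__':
    print(is_latin_ext_a('\u0100\u0101\u0102'))     # True
    print(is_latin_ext_a('abc'))                     # False
    print(len(EMOJI))                                # 1
    print(bool(re.fullmatch(r'\U0001F600', EMOJI)))  # True
    print(bool(re.fullmatch(r'.', EMOJI)))           # True
```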

--
components: +Regular Expressions, Unicode
nosy: +storchaka
type: behavior - enhancement
versions:  -Python 2.7, Python 3.2

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue3665
___



[issue3665] Support \u and \U escapes in regexes

2012-06-01 Thread Serhiy Storchaka

Changes by Serhiy Storchaka storch...@gmail.com:


Added file: http://bugs.python.org/file25781/re_unicode_escapes.diff

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue3665
___



[issue3665] Support \u and \U escapes in regexes

2012-06-01 Thread Serhiy Storchaka

Changes by Serhiy Storchaka storch...@gmail.com:


Added file: http://bugs.python.org/file25782/3665.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue3665
___



[issue3665] Support \u and \U escapes in regexes

2012-06-01 Thread Serhiy Storchaka

Changes by Serhiy Storchaka storch...@gmail.com:


Removed file: http://bugs.python.org/file25781/re_unicode_escapes.diff

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue3665
___



[issue3665] Support \u and \U escapes in regexes

2012-06-01 Thread Serhiy Storchaka

Changes by Serhiy Storchaka storch...@gmail.com:


Removed file: http://bugs.python.org/file25782/3665.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue3665
___



[issue3665] Support \u and \U escapes in regexes

2012-06-01 Thread Serhiy Storchaka

Changes by Serhiy Storchaka storch...@gmail.com:


Added file: http://bugs.python.org/file25783/re_unicode_escapes.diff

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue3665
___



[issue3665] Support \u and \U escapes in regexes

2012-06-01 Thread Serhiy Storchaka

Changes by Serhiy Storchaka storch...@gmail.com:


Added file: http://bugs.python.org/file25784/3665.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue3665
___



[issue14993] GCC error when using unicodeobject.h

2012-06-04 Thread Serhiy Storchaka

Changes by Serhiy Storchaka storch...@gmail.com:


--
nosy: +storchaka

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue14993
___



[issue14626] os module: use keyword-only arguments for dir_fd and nofollow to reduce function count

2012-06-04 Thread Serhiy Storchaka

Serhiy Storchaka storch...@gmail.com added the comment:

Well, I'm going to ignore the long lines and the documentation. The patch is 
really big and impressive.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue14626
___



[issue15026] Faster UTF-16 encoding

2012-06-07 Thread Serhiy Storchaka

New submission from Serhiy Storchaka storch...@gmail.com:

As a companion to issue14624, here is a patch that speeds up UTF-16 encoding 
several times. In addition, it fixes an unsafe integer overflow check.
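For reference, a pure-Python sketch of the computation a UTF-16 encoder performs (illustrative only, not the C code in encode-utf16.patch). A BMP character becomes one 16-bit code unit; an astral character (U+10000 and above) becomes a surrogate pair, which is extra work per character. The function name `utf16_encode` is hypothetical, and rejecting lone surrogates mirrors recent CPython versions in strict mode.

```python
import struct


def utf16_encode(text, byteorder='<'):
    """Reference for 'utf-16-le' ('<') or 'utf-16-be' ('>'), strict errors."""
    unit = struct.Struct(byteorder + 'H')  # one 16-bit code unit
    out = bytearray()
    for i, ch in enumerate(text):
        cp = ord(ch)
        if 0xD800 <= cp <= 0xDFFF:
            # Lone surrogates cannot be encoded in strict mode.
            raise UnicodeEncodeError('utf-16', text, i, i + 1,
                                     'surrogates not allowed')
        if cp < 0x10000:
            out += unit.pack(cp)  # BMP: a single code unit
        else:
            cp -= 0x10000  # astral: split into high and low surrogates
            out += unit.pack(0xD800 | (cp >> 10))
            out += unit.pack(0xDC00 | (cp & 0x3FF))
    return bytes(out)
```

The sketch can be checked against the built-in codec, e.g. `utf16_encode('\U00010000', '<') == '\U00010000'.encode('utf-16-le')`.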

Here are the benchmark results. The benchmark tools are in the 
https://bitbucket.org/storchaka/cpython-stuff repository.

On 32-bit Linux, AMD Athlon 64 X2 4600+ @ 2.4GHz:

Py2.7         Py3.2         Py3.3         patched

457 (+575%)   458 (+573%)   1077 (+186%)  3083     encode  utf-16le  'A'*1
457 (+579%)   493 (+529%)   1084 (+186%)  3102     encode  utf-16le  '\x80'*1
489 (+534%)   458 (+577%)   1081 (+187%)  3102     encode  utf-16le  '\x80'+'A'*
457 (+1261%)  493 (+1161%)  1116 (+457%)  6219     encode  utf-16le  '\u0100'*1
489 (+1266%)  458 (+1358%)  1126 (+493%)  6678     encode  utf-16le  '\u0100'+'A'*
489 (+1263%)  458 (+1355%)  1129 (+490%)           encode  utf-16le  '\u0100'+'\x80'*
457 (+1240%)  493 (+1142%)  1118 (+448%)  6125     encode  utf-16le  '\u8000'*1
489 (+1271%)  458 (+1363%)  1127 (+495%)  6702     encode  utf-16le  '\u8000'+'A'*
489 (+1271%)  458 (+1364%)  1129 (+494%)  6705     encode  utf-16le  '\u8000'+'\x80'*
489 (+1135%)  458 (+1218%)  1136 (+432%)  6038     encode  utf-16le  '\u8000'+'\u0100'*
498 (+128%)   505 (+125%)   630 (+80%)    1137     encode  utf-16le  '\U0001'*1
489 (+35%)    458 (+44%)    360 (+83%)    659      encode  utf-16le  '\U0001'+'A'*
489 (+35%)    458 (+44%)    359 (+84%)    660      encode  utf-16le  '\U0001'+'\x80'*
489 (+36%)    458 (+45%)    361 (+84%)    663      encode  utf-16le  '\U0001'+'\u0100'*
489 (+36%)    458 (+45%)    361 (+84%)    663      encode  utf-16le  '\U0001'+'\u8000'*

447 (+507%)   493 (+450%)   1086 (+150%)  2712     encode  utf-16be  'A'*1
447 (+513%)   493 (+456%)   1080 (+154%)  2739     encode  utf-16be  '\x80'*1
489 (+458%)   458 (+496%)   1079 (+153%)  2729     encode  utf-16be  '\x80'+'A'*
447 (+498%)   494 (+441%)   1118 (+139%)  2672     encode  utf-16be  '\u0100'*1
489 (+464%)   458 (+502%)   1128 (+144%)  2756     encode  utf-16be  '\u0100'+'A'*
489 (+463%)   458 (+502%)   1131 (+144%)  2755     encode  utf-16be  '\u0100'+'\x80'*
447 (+500%)   493 (+444%)   1119 (+139%)  2680     encode  utf-16be  '\u8000'*1
489 (+463%)   458 (+502%)   1126 (+145%)  2755     encode  utf-16be  '\u8000'+'A'*
489 (+464%)   458 (+502%)   1129 (+144%)  2757     encode  utf-16be  '\u8000'+'\x80'*
489 (+479%)   458 (+518%)   1137 (+149%)  2829     encode  utf-16be  '\u8000'+'\u0100'*
499 (+102%)   506 (+99%)    630 (+60%)    1009     encode  utf-16be  '\U0001'*1
489 (+6%)     458 (+13%)    360 (+44%)    519      encode  utf-16be  '\U0001'+'A'*
489 (+6%)     458 (+13%)    359 (+44%)    518      encode  utf-16be  '\U0001'+'\x80'*
489 (+6%)     458 (+13%)    361 (+44%)    519      encode  utf-16be  '\U0001'+'\u0100'*
489 (+6%)     458 (+13%)    361 (+44%)    519      encode  utf-16be  '\U0001'+'\u8000'*
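The percentages appear to be the speedup of the patched build relative to each column, computed as (patched / old - 1) * 100: for the first utf-16le row, 3083 / 457 - 1 is about 5.75, i.e. +575%. A rough way to take similar measurements with `timeit` is sketched below. This is not the cpython-stuff benchmark tool; the function names, the unit (million characters per second) and the input size are arbitrary choices, so the numbers will not be comparable with the table.

```python
import timeit


def speedup_percent(old, new):
    """Percentage by which `new` exceeds `old`, rounded to an integer."""
    return round((new / old - 1) * 100)


def encode_rate(text, encoding, number=100):
    """Best-of-3 encoding rate in million characters per second."""
    best = min(timeit.repeat(lambda: text.encode(encoding),
                             number=number, repeat=3))
    return len(text) * number / best / 1e6


if __name__ == '__main__':
    sample = '\u0100' * 100000  # arbitrary illustrative input
    for enc in ('utf-16-le', 'utf-16-be'):
        print(enc, round(encode_rate(sample, enc)))
    print(speedup_percent(457, 3083))  # 575, as in the first utf-16le row
```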

--
components: Interpreter Core, Unicode
files: encode-utf16.patch
keywords: patch
messages: 162473
nosy: Arfrever, asvetlov, ezio.melotti, haypo, pitrou, storchaka
priority: normal
severity: normal
status: open
title: Faster UTF-16 encoding
type: performance
versions: Python 3.3
Added file: http://bugs.python.org/file25856/encode-utf16.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15026
___



[issue15027] Faster UTF-32 encoding

2012-06-07 Thread Serhiy Storchaka

New submission from Serhiy Storchaka storch...@gmail.com:

As a companion to issue14625, here is a patch that speeds up UTF-32 encoding 
several times. In addition, it fixes an unsafe integer overflow check.
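UTF-32 is simpler than UTF-16: every code point, astral or not, becomes a single 32-bit code unit, so no surrogate pairs are involved. A pure-Python sketch of the computation (illustrative only, not the C code in encode-utf32.patch); the function name `utf32_encode` is hypothetical, and rejecting lone surrogates mirrors recent CPython versions in strict mode.

```python
import struct


def utf32_encode(text, byteorder='<'):
    """Reference for 'utf-32-le' ('<') or 'utf-32-be' ('>'), strict errors."""
    unit = struct.Struct(byteorder + 'I')  # one 32-bit code unit
    out = bytearray()
    for i, ch in enumerate(text):
        cp = ord(ch)
        if 0xD800 <= cp <= 0xDFFF:
            # Lone surrogates are not valid code points to encode.
            raise UnicodeEncodeError('utf-32', text, i, i + 1,
                                     'surrogates not allowed')
        out += unit.pack(cp)  # every code point: exactly 4 bytes
    return bytes(out)
```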

Here are the benchmark results. The benchmark tools are in the 
https://bitbucket.org/storchaka/cpython-stuff repository.

On 32-bit Linux, AMD Athlon 64 X2 4600+ @ 2.4GHz:

Py2.7         Py3.2         Py3.3         patched

541 (+1032%)  541 (+1032%)  844 (+626%)   6125     encode  utf-32le  'A'*1
543 (+1056%)  541 (+1060%)  844 (+643%)   6275     encode  utf-32le  '\x80'*1
544 (+1010%)  542 (+1014%)  843 (+616%)   6037     encode  utf-32le  '\x80'+'A'*
541 (+799%)   542 (+797%)   764 (+537%)   4864     encode  utf-32le  '\u0100'*1
544 (+781%)   542 (+784%)   767 (+525%)   4793     encode  utf-32le  '\u0100'+'A'*
544 (+789%)   542 (+792%)   766 (+531%)   4834     encode  utf-32le  '\u0100'+'\x80'*
542 (+799%)   541 (+801%)   764 (+538%)   4874     encode  utf-32le  '\u8000'*1
544 (+779%)   542 (+782%)   767 (+523%)   4780     encode  utf-32le  '\u8000'+'A'*
544 (+793%)   542 (+796%)   766 (+534%)   4859     encode  utf-32le  '\u8000'+'\x80'*
544 (+819%)   542 (+823%)   766 (+553%)   5001     encode  utf-32le  '\u8000'+'\u0100'*
430 (+867%)   427 (+874%)   860 (+383%)   4157     encode  utf-32le  '\U0001'*1
543 (+655%)   543 (+655%)   861 (+376%)   4101     encode  utf-32le  '\U0001'+'A'*
543 (+658%)   543 (+658%)   861 (+378%)   4116     encode  utf-32le  '\U0001'+'\x80'*
543 (+670%)   543 (+670%)   859 (+387%)   4180     encode  utf-32le  '\U0001'+'\u0100'*
543 (+666%)   543 (+666%)   860 (+383%)   4158     encode  utf-32le  '\U0001'+'\u8000'*

541 (+880%)   543 (+876%)   844 (+528%)   5300     encode  utf-32be  'A'*1
541 (+872%)   542 (+870%)   844 (+523%)   5256     encode  utf-32be  '\x80'*1
544 (+843%)   542 (+846%)   843 (+509%)   5130     encode  utf-32be  '\x80'+'A'*
541 (+363%)   542 (+362%)   764 (+228%)   2505     encode  utf-32be  '\u0100'*1
544 (+366%)   542 (+368%)   766 (+231%)   2534     encode  utf-32be  '\u0100'+'A'*
544 (+363%)   542 (+365%)   766 (+229%)   2519     encode  utf-32be  '\u0100'+'\x80'*
542 (+363%)   541 (+364%)   764 (+228%)   2509     encode  utf-32be  '\u8000'*1
544 (+366%)   542 (+368%)   766 (+231%)   2534     encode  utf-32be  '\u8000'+'A'*
544 (+363%)   542 (+364%)   766 (+229%)   2517     encode  utf-32be  '\u8000'+'\x80'*
544 (+372%)   542 (+374%)   766 (+235%)   2568     encode  utf-32be  '\u8000'+'\u0100'*
430 (+428%)   427 (+432%)   860 (+164%)   2270     encode  utf-32be  '\U0001'*1
543 (+317%)   541 (+318%)   861 (+163%)   2262     encode  utf-32be  '\U0001'+'A'*
543 (+320%)   541 (+321%)   861 (+165%)   2279     encode  utf-32be  '\U0001'+'\x80'*
543 (+322%)   541 (+323%)   859 (+167%)   2290     encode  utf-32be  '\U0001'+'\u0100'*
543 (+322%)   541 (+324%)   860 (+167%)   2292     encode  utf-32be  '\U0001'+'\u8000'*

--
components: Interpreter Core, Unicode
files: encode-utf32.patch
keywords: patch
messages: 162474
nosy: Arfrever, asvetlov, ezio.melotti, haypo, pitrou, storchaka
priority: normal
severity: normal
status: open
title: Faster UTF-32 encoding
type: performance
versions: Python 3.3
Added file: http://bugs.python.org/file25857/encode-utf32.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15027
___



[issue14850] The inconsistency of codecs.charmap_decode

2012-06-10 Thread Serhiy Storchaka

Serhiy Storchaka storch...@gmail.com added the comment:

 What is the use case for passing a string subclass to charmap_decode?  Or in 
 other words, how did you stumble upon the bug?

I stumbled upon it while rewriting the charmap decoder (issue14874). Currently 
the charmap decoder handles two cases: a faster path for a string table and a 
slower path for a general mapping. I proposed an even more optimized path for 
a 256-character UCS2 string (which covers all standard charmap encodings). If 
the handling of general strings and mappings were consistent, these cases could 
be merged. A string subclass is just an example that illustrates the 
inconsistency.
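To illustrate the two kinds of mapping being discussed, here is a small example using `codecs.charmap_decode(data, errors, mapping)`, the low-level function that generated charmap codecs such as `encodings.cp1252` call. A string table maps byte i to `table[i]`, with U+FFFE or an out-of-range byte meaning "undefined"; a dict may map a byte to a str, an int code point, or `None` for "undefined". The tables and the `decode` helper are made up for illustration, and the example deliberately avoids U+FFFE as a dict value, since the treatment of that case is the inconsistency under discussion.

```python
import codecs

# String table: byte i decodes to TABLE[i]; U+FFFE marks an undefined byte.
TABLE = 'abc\ufffe'

# Dict mapping: values may be str, int (a code point) or None (undefined).
MAPPING = {0: 'a', 1: 98, 2: None}


def decode(data, mapping, errors='strict'):
    """Decode `data` with the charmap codec and return only the text."""
    text, consumed = codecs.charmap_decode(data, errors, mapping)
    return text


if __name__ == '__main__':
    print(decode(b'\x00\x01\x02', TABLE))             # abc
    print(decode(b'\x00\x03', TABLE, 'replace'))      # 'a' + U+FFFD
    print(decode(b'\x00\x01', MAPPING))               # ab
    print(repr(decode(b'\x02', MAPPING, 'replace')))  # '\ufffd'
```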

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue14850
___



[issue14850] The inconsistency of codecs.charmap_decode

2012-06-10 Thread Serhiy Storchaka

Serhiy Storchaka storch...@gmail.com added the comment:

 U+FFFE is documented as representing an undefined mapping,

Yes, using U+FFFE to represent an undefined mapping in strings is normal; the 
question was about string subclasses. And if we fix it for string subclasses, 
how much further do we go? What about general mappings?

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue14850
___


