[issue17404] ValueError: can't have unbuffered text I/O for io.open(1, 'wt', 0)
Serhiy Storchaka added the comment:

> - it won't work for reading: TextIOWrapper calls the read1() method, which is only defined by BufferedIO objects.

Since 3.3 TextIOWrapper works with raw IO objects (issue12591).

> Yes. And I just noticed that the _io module (the C version) will also buffer encoded bytes, up to f._CHUNK_SIZE.

Use write_through=True to disable this.

----------
nosy: +serhiy.storchaka

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17404
___
___
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
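To illustrate the two points in this comment, here is a minimal sketch. It assumes Python 3.3 or later; the `io.BytesIO` sink is only a stand-in for a raw or unbuffered binary stream, and the variable names are made up for the example.

```python
import io
import os
import tempfile

# Text mode with buffering=0 is rejected outright (the error in the issue title).
fd, path = tempfile.mkstemp()
os.close(fd)
try:
    try:
        open(path, "wt", buffering=0)
    except ValueError as exc:
        unbuffered_error = str(exc)
    else:
        unbuffered_error = None
finally:
    os.remove(path)

# Closest alternative: wrap a binary stream and ask for write-through.
sink = io.BytesIO()  # stand-in for a raw or unbuffered binary stream
text = io.TextIOWrapper(sink, encoding="utf-8", write_through=True)
text.write("h\u00e9llo")
# No flush() call here: the encoded bytes have already been passed on.
written_without_flush = sink.getvalue()

print(unbuffered_error)
print(written_without_flush)
```

Without write_through=True, the C implementation may keep the encoded bytes pending inside the TextIOWrapper until a flush.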
[issue17299] Test cPickle with real files
Serhiy Storchaka added the comment:

I polished the patch a little before committing. Thank you for the patch, Aman Shah.

----------
resolution:  -> fixed
stage: commit review -> committed/rejected
status: open -> closed

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17299
___
[issue1285086] urllib.quote is too slow
Serhiy Storchaka added the comment:

Sorry, perhaps I missed your response, Senthil. Now committed and closed again.

----------
resolution:  -> fixed
stage: patch review -> committed/rejected
status: open -> closed

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue1285086
___
[issue17016] _sre: avoid relying on pointer overflow
Serhiy Storchaka added the comment:

Of course it would be nice to have tests for as many cases as possible, but I am afraid that it will not be easy. The patch LGTM.

----------

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17016
___
[issue13056] test_multibytecodec.py:TestStreamWriter is skipped after PEP393
Serhiy Storchaka added the comment:

I think these tests make no sense after PEP393. They test that StreamWriter works with non-BMP characters split inside a surrogate pair, i.e. that c.write(s[:i]); c.write(s[i:]) is always the same as c.write(s), even if i breaks s inside a surrogate pair. This case is impossible after PEP393.

----------

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue13056
___
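A small sketch of why the case cannot occur any more, assuming Python 3.3 or later. The UTF-8 codec and the helper name `write_in_two_parts` are choices made for this illustration, not taken from the test suite.

```python
import codecs
import io


def write_in_two_parts(s, i, encoding="utf-8"):
    """Write s[:i] and s[i:] separately through a StreamWriter."""
    raw = io.BytesIO()
    writer = codecs.getwriter(encoding)(raw)
    writer.write(s[:i])
    writer.write(s[i:])
    return raw.getvalue()


s = "a\U00010000b"
# After PEP 393 a non-BMP character is a single code point, so no index
# can fall inside a surrogate pair.
length = len(s)
split_results = {write_in_two_parts(s, i) for i in range(len(s) + 1)}
print(length, split_results == {s.encode("utf-8")})
```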
[issue1243730] Big speedup in email message parsing
Serhiy Storchaka added the comment:

Test fails with stack overflow:

======================================================================
ERROR: test_pushCR_LF (email.test.test_email.TestIterators)
FeedParser BufferedSubFile.push() assumed it received complete
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/serhiy/py/cpython2.7/Lib/email/test/test_email.py", line 2585, in test_pushCR_LF
    bsf.push(il)
  File "/home/serhiy/py/cpython2.7/Lib/email/feedparser.py", line 140, in push
    parts = _splitlines(data)
  File "/home/serhiy/py/cpython2.7/Lib/email/feedparser.py", line 170, in _splitlines
    lines.extend(_splitlines(part))
...
  File "/home/serhiy/py/cpython2.7/Lib/email/feedparser.py", line 170, in _splitlines
    lines.extend(_splitlines(part))
RuntimeError: maximum recursion depth exceeded

----------
nosy: +serhiy.storchaka

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue1243730
___
[issue17440] Some IO related problems on x86 windows
Changes by Serhiy Storchaka storch...@gmail.com:

----------
components: +IO
nosy: +benjamin.peterson, hynek, pitrou, stutzbach

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17440
___
[issue1159051] Handle corrupted gzip files with unexpected EOF
Serhiy Storchaka added the comment:

tuned_gzip does dangerous things: it overrides private methods of GzipFile.

From the Bazaar 2.3 Release Notes:

    * Stop using ``bzrlib.tuned_gzip.GzipFile``. It is incompatible with
      python-2.7 and was only used for Knit format repositories, which haven't
      been recommended since 2007. The file itself will be removed in the next
      release. (John Arbash Meinel)

Current version is 2.6b2. bzrlib.tuned_gzip.GzipFile should have been removed two releases ago.

----------

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue1159051
___
[issue17441] Do not cache re.compile
New submission from Serhiy Storchaka:

Ezio proposed in issue16389 not to cache re.compile. Caching re.compile makes no sense and only pollutes the cache.

----------
components: Library (Lib), Regular Expressions
messages: 184354
nosy: ezio.melotti, mrabarnett, pitrou, serhiy.storchaka
priority: normal
severity: normal
stage: needs patch
status: open
title: Do not cache re.compile
type: enhancement
versions: Python 3.4

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17441
___
[issue17441] Do not cache re.compile
Serhiy Storchaka added the comment:

Here is a patch.

----------
keywords: +patch
stage: needs patch -> patch review
Added file: http://bugs.python.org/file29429/re_compile_nocache.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17441
___
[issue17415] Clarify docs of os.path.normpath()
Serhiy Storchaka added the comment:

os.path.normpath() works not only with strings but also with bytes objects.

----------
nosy: +serhiy.storchaka

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17415
___
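A quick illustration of the point, assuming Python 3. It uses posixpath directly (the POSIX flavour of os.path) so that the result does not depend on the platform; the sample path is made up.

```python
import posixpath

# The same normalization is applied to str and to bytes input,
# and the result has the same type as the argument.
text_result = posixpath.normpath("a//b/../c")
bytes_result = posixpath.normpath(b"a//b/../c")

print(text_result)
print(bytes_result)
```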
[issue17447] str.identifier shouldn't accept Python keywords
Serhiy Storchaka added the comment:

Hmm. I was going to use this method for re's named groups (see issue14462).

It is possible that some third-party code uses it to check for general Unicode-aware identifiers. The language specification says that keywords are a subset of identifiers.

However, in most places in the stdlib (collections.namedtuple, unittest.mock, inspect.Parameter) is_usable_identifier() should be used instead of isidentifier().

----------
nosy: +serhiy.storchaka

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17447
___
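The distinction can be sketched as follows. The name is_usable_identifier() comes from the comment, but no such function exists in the stdlib; the implementation below is only one possible reading of the proposal.

```python
import keyword


def is_usable_identifier(name):
    """Hypothetical helper: a valid identifier that is not a keyword."""
    return name.isidentifier() and not keyword.iskeyword(name)


# str.isidentifier() accepts keywords, since they are lexically identifiers.
class_is_identifier = "class".isidentifier()
class_is_usable = is_usable_identifier("class")
spam_is_usable = is_usable_identifier("spam")

print(class_is_identifier, class_is_usable, spam_is_usable)
```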
[issue17299] Test cPickle with real files
Changes by Serhiy Storchaka storch...@gmail.com:

----------
resolution: fixed -> 
status: closed -> open

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17299
___
[issue17299] Test cPickle with real files
Serhiy Storchaka added the comment:

I'm not sure what is wrong and can't check on Windows, but it is possible that this patch fixes the tests. Please check it if you can.

----------
Added file: http://bugs.python.org/file29433/test_cpickle_fileio.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17299
___
[issue17299] Test cPickle with real files
Serhiy Storchaka added the comment:

Oh, yes.

----------

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17299
___
[issue17299] Test cPickle with real files
Changes by Serhiy Storchaka storch...@gmail.com:

Removed file: http://bugs.python.org/file29433/test_cpickle_fileio.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17299
___
[issue17299] Test cPickle with real files
Serhiy Storchaka added the comment:

Benjamin has fixed this in changeset 6aab72424063.

----------
resolution:  -> fixed
status: open -> closed

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17299
___
[issue17460] Remove the strict and related params completely removing the 0.9 support
Serhiy Storchaka added the comment:

Maybe in 3.4 an exception should be raised? HTTPConnection('python.org', 80, False) now silently returns a wrong result.

----------
components: +Library (Lib)
nosy: +serhiy.storchaka
stage:  -> patch review
type:  -> enhancement
versions: +Python 3.4

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17460
___
[issue17397] ttk::themes missing from ttk.py
Serhiy Storchaka added the comment:

This looks similar to issue16809 and requires a similar solution.

----------

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17397
___
[issue17433] stdlib generator-like iterators don't forward send/throw
Serhiy Storchaka added the comment:

This was proposed before (see issue16150) and was rejected after discussion on Python-ideas.

----------
nosy: +serhiy.storchaka

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17433
___
[issue17433] stdlib generator-like iterators don't forward send/throw
Changes by Serhiy Storchaka storch...@gmail.com:

----------
nosy: +rhettinger
type:  -> enhancement

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17433
___
[issue17478] Tkinter's split() inconsistent for bytes and unicode strings
New submission from Serhiy Storchaka:

Tkinter's split() recursively splits bytes but not unicode strings.

>>> from tkinter import *
>>> t = Tcl()
>>> t.tk.split((b'a 2',))
(('a', '2'),)
>>> t.tk.split(('a 2',))
('a 2',)

----------
components: Tkinter, Unicode
messages: 184622
nosy: ezio.melotti, gpolo, serhiy.storchaka
priority: normal
severity: normal
status: open
title: Tkinter's split() inconsistent for bytes and unicode strings
type: behavior
versions: Python 2.7, Python 3.2, Python 3.3, Python 3.4

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17478
___
[issue16809] Tk 8.6.0 introduces TypeError. (Tk 8.5.13 works)
Serhiy Storchaka added the comment:

Here is a patch which adds support for Tcl_Obj to tkinter's splitlist(). This not only fixes some incompatibilities with Tk 8.6, but can also fix some issues with older Tk versions (see for example issue17397).

----------
keywords: +patch
nosy: +gpolo
stage:  -> patch review
versions: +Python 3.2
Added file: http://bugs.python.org/file29477/tkinter_splitlist.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16809
___
[issue17460] Remove the strict and related params completely removing the 0.9 support
Serhiy Storchaka added the comment:

I do not understand what is bad about making the parameters that follow the removed 'strict' keyword-only.

----------

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17460
___
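The effect of such a conversion can be sketched with a stand-in function. make_connection() and its parameters are hypothetical and are not the real http.client signature; the point is only that a stray third positional argument, such as an old 'strict' value, fails loudly instead of being silently taken for something else.

```python
def make_connection(host, port=None, *, timeout=None, source_address=None):
    """Hypothetical constructor: everything after 'port' is keyword-only."""
    return {"host": host, "port": port,
            "timeout": timeout, "source_address": source_address}


# Old-style call that still passes a value in the former 'strict' position.
try:
    make_connection("python.org", 80, False)
except TypeError:
    stray_positional_rejected = True
else:
    stray_positional_rejected = False

# Keyword use keeps working.
conn = make_connection("python.org", 80, timeout=10)
print(stray_positional_rejected, conn["timeout"])
```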
[issue13477] tarfile module should have a command line
Serhiy Storchaka added the comment:

Note that --create command should support --directory option too.

> Modern tar programs don't need to be told the compression method--they infer it. If they can do it in C, we can do it in Python. So we should simply omit the -bz2 stuff.

An archive may have no extension or have a nonstandard extension. And stdin/stdout does not have a name.

----------

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue13477
___
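For reading, tarfile can already detect the compression from the data rather than from a file name, which covers nameless streams and odd extensions. A minimal sketch, using an in-memory archive with a made-up member name:

```python
import io
import tarfile

payload = b"hello"

# Build a gzip-compressed archive in memory; there is no file name at all.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w:gz") as tar:
    info = tarfile.TarInfo("hello.txt")
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))

# Mode "r" (same as "r:*") detects the compression from the data itself.
buf.seek(0)
with tarfile.open(fileobj=buf, mode="r") as tar:
    names = tar.getnames()
    data = tar.extractfile("hello.txt").read()

print(names, data)
```

When creating an archive there is nothing to inspect, so the method has to be given explicitly or guessed from the name, which is the limitation raised in the comment.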
[issue14010] deeply nested filter segfaults
Serhiy Storchaka added the comment:

I tried to solve this issue (it seemed easy), but the bug is worse than expected. Python crashes even without any iteration at all.

it = 'abracadabra'
for _ in range(100):
    it = filter(bool, it)

del it

And fixing a recursive deallocator is harder than fixing an iterator. What can we do if a deallocator raises RuntimeError because the maximum recursion depth is exceeded?

----------

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue14010
___
[issue14010] deeply nested filter segfaults
Serhiy Storchaka added the comment:

Thank you. Now I understand why this issue does not happen with containers.

----------

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue14010
___
[issue14010] deeply nested filter segfaults
Serhiy Storchaka added the comment:

Here is a patch which adds recursion limit checks to builtin and itertools recursive iterators.

----------
components: +Extension Modules
keywords: +patch
nosy: +rhettinger
stage: needs patch -> patch review
Added file: http://bugs.python.org/file29483/iter_recursion.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue14010
___
[issue2518] smtpd.py to handle huge email
Changes by Serhiy Storchaka storch...@gmail.com:

----------
versions: +Python 3.4 -Python 3.2

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue2518
___
[issue1159051] Handle corrupted gzip files with unexpected EOF
Serhiy Storchaka added the comment:

I will be offline for some time. Feel free to revert these changes in 2.7-3.3 if necessary.

----------

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue1159051
___
[issue14313] zipfile should raise an exception for unsupported compression methods
Serhiy Storchaka storch...@gmail.com added the comment:

A modified patch was adopted in 3.3 (changeset 596b0eaeece8), therefore the current patch only applies to 3.2 and 2.7. If this is a new feature, the issue can be closed.

----------
nosy: +loewis, storchaka
versions: -Python 3.3

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue14313
___
[issue14315] zipfile.ZipFile() unable to open zip File
Serhiy Storchaka storch...@gmail.com added the comment:

> This is definitely *not* a padding issue.

This is definitely a padding issue. All uncompressed files are located so that the data starts on a 4-byte boundary (1190+30+15+1=1236, 27486+30+17+3=27536, etc). This probably allows the use of mmap for the resources.

> As Martin pointed out, the standard says that things must be in multiples of 4-bytes.

More precisely, the extra field must be at least 4 bytes long to fit a header. The standard is insufficiently defined in terms of what should happen if the rest of the field is less than 4 bytes (this is hidden behind an ellipsis).

> So the record is non-portable.

De jure the record is non-portable. De facto the record is portable (many other tools support it). But even if it is not portable, we are dealing with an extension of the zip format which is very easy to support for reading.

----------

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue14315
___
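The arithmetic in the comment can be checked with a short sketch. It assumes that in each sum the first term is the offset of the local file header, 30 is the fixed size of that header, and the last two terms are the file name length and the extra field length; that reading is an interpretation of the sums, not something stated in the message.

```python
LOCAL_HEADER_SIZE = 30  # fixed part of a zip local file header


def data_offset(header_offset, name_len, extra_len):
    """Offset at which the member's data starts."""
    return header_offset + LOCAL_HEADER_SIZE + name_len + extra_len


# (header offset, name length, extra field length), as read from the comment
examples = [(1190, 15, 1), (27486, 17, 3)]
offsets = [data_offset(*example) for example in examples]
aligned = [offset % 4 == 0 for offset in offsets]

print(offsets, aligned)
```

Under this reading the short extra fields of 1 and 3 bytes are exactly what rounds the data start up to a multiple of 4.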
[issue14624] Faster utf-16 decoder
Serhiy Storchaka storch...@gmail.com added the comment:

The patch is updated with slightly clarified code and added comments.

----------
Added file: http://bugs.python.org/file25590/decode_utf16_4.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue14624
___
[issue14315] zipfile.ZipFile() unable to open zip File
Serhiy Storchaka storch...@gmail.com added the comment:

> That can't possibly be the reason. mmap requires 4k (4096) alignment (on x86; more than that on SPARC).

This may be the reason to mmap the entire file and then read aligned binary data.

----------

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue14315
___
[issue14674] Add link to RFC 4627 from json documentation
Serhiy Storchaka storch...@gmail.com added the comment:

> for key, value in pairs:
>     if key in pairs:

if key in obj:?

----------
title: Link to explain deviations from RFC 4627 in json module docs -> Add link to RFC 4627 from json documentation

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue14674
___
[issue14674] Add link to RFC 4627 from json documentation
Serhiy Storchaka storch...@gmail.com added the comment:

IMHO, it would be sufficient to have a simple bullet list of differences and notes or warnings in places where Python can generate non-standard JSON (top-level scalars, inf and nan, non-utf8 encoded strings).

----------

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue14674
___
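Two of the deviations mentioned here are easy to show with the json module itself. A minimal sketch; the variable names are made up:

```python
import json

# RFC 4627 only allows an object or an array at the top level.
top_level_scalar = json.dumps(1)

# Non-finite floats are written as JavaScript-style constants by default.
non_finite = json.dumps([float("inf"), float("-inf")])
not_a_number = json.dumps(float("nan"))

# allow_nan=False turns that into an error instead.
try:
    json.dumps(float("nan"), allow_nan=False)
except ValueError:
    strict_mode_rejects_nan = True
else:
    strict_mode_rejects_nan = False

print(top_level_scalar, non_finite, not_a_number, strict_mode_rejects_nan)
```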
[issue14811] compile fails - UTF-8 character decoding
Serhiy Storchaka storch...@gmail.com added the comment:

I can reproduce it on Linux. Minimal example:

$ ./python -c "open('longline.py', 'w').write('#' + repr('\u00A1' * 4096) + '\n')"
$ ./python longline.py
  File "longline.py", line 1
SyntaxError: Non-UTF-8 code starting with '\xc2' in file longline.py on line 1, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details

----------
nosy: +storchaka

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue14811
___
[issue14811] compile fails - UTF-8 character decoding
Serhiy Storchaka storch...@gmail.com added the comment:

And for Python 2.7 too.

----------
versions: +Python 2.7

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue14811
___
[issue14811] compile fails - UTF-8 character decoding
Serhiy Storchaka storch...@gmail.com added the comment:

The function decoding_fgets (Parser/tokenizer.c) reads a line into a fixed-size buffer of 8192 bytes (the line is truncated to 8191 bytes) and then fails because the line is cut in the middle of a multibyte UTF-8 character.

----------

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue14811
___
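The effect of the truncation can be reproduced in pure Python, without the tokenizer. This sketch assumes Python 3 and a UTF-8 encoded source file; it builds the same line as the minimal example from this issue and cuts it at 8191 bytes.

```python
# Same content as the minimal example: '#', a quote, 4096 two-byte
# characters, a closing quote and a newline.
source_line = ("#" + repr("\u00a1" * 4096) + "\n").encode("utf-8")
line_length = len(source_line)

# What fits into the 8192-byte buffer (8191 bytes plus the terminator).
truncated = source_line[:8191]

try:
    truncated.decode("utf-8")
except UnicodeDecodeError as exc:
    error_start = exc.start
    dangling_byte = truncated[exc.start:]
else:
    error_start = None
    dangling_byte = b""

print(line_length, error_start, dangling_byte)
```

The cut falls right after the lead byte 0xC2 of a two-byte sequence, which matches the '\xc2' reported in the SyntaxError.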
[issue14811] Syntax error on long UTF-8 lines
Changes by Serhiy Storchaka storch...@gmail.com:

----------
title: compile fails - UTF-8 character decoding -> Syntax error on long UTF-8 lines

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue14811
___
[issue14803] Add feature to allow code execution prior to __main__ invocation
Serhiy Storchaka storch...@gmail.com added the comment:

For faulthandler and coverage, an option -M would be more convenient (run a module with __name__='__premain__' (or something of the sort) and continue command line processing).

----------

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue14803
___
[issue14777] Tkinter clipboard_get() decodes characters incorrectly
Serhiy Storchaka storch...@gmail.com added the comment:

> ...And mere minutes after I said I hadn't heard anything, I've got the confirmation email. :-)

Congratulations!

----------

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue14777
___
[issue14624] Faster utf-16 decoder
Serhiy Storchaka storch...@gmail.com added the comment:

Here are two new patches. The check for out-of-range characters has been moved, making the code simpler. Theoretically this slows down decoding of short UCS1 strings a bit (up to 1 and 3 chars on 32- and 64-bit), but practically there is no difference.

The second patch differs from the first in that the masks are not calculated but specified explicitly. I am not sure that it improves readability.

The committer has the choice.

----------
Added file: http://bugs.python.org/file25601/decode_utf16_5.patch
Added file: http://bugs.python.org/file25602/decode_utf16_6.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue14624
___
diff -r 492e6c6a01bb Objects/stringlib/codecs.h
--- a/Objects/stringlib/codecs.h	Tue May 15 15:30:25 2012 +0200
+++ b/Objects/stringlib/codecs.h	Wed May 16 00:26:02 2012 +0300
@@ -215,7 +215,6 @@
     goto Return;
 }
 
-#undef LONG_PTR_MASK
 #undef ASCII_CHAR_MASK
 
 
@@ -415,4 +414,152 @@
 #undef MAX_SHORT_UNICHARS
 }
 
+/* The pattern for constructing UCS2-repeated masks. */
+#if SIZEOF_LONG == 8
+# define UCS2_REPEAT_MASK 0x0001000100010001ul
+#elif SIZEOF_LONG == 4
+# define UCS2_REPEAT_MASK 0x00010001ul
+#else
+# error C 'long' size should be either 4 or 8!
+#endif
+
+/* The mask for fast checking. */
+#if STRINGLIB_SIZEOF_CHAR == 1
+/* The mask for fast checking of whether a C 'long' contains a
+   non-ASCII or non-Latin1 UTF16-encoded characters. */
+# define FAST_CHAR_MASK         (UCS2_REPEAT_MASK * (0xFFFFu & ~STRINGLIB_MAX_CHAR))
+#else
+/* The mask for fast checking of whether a C 'long' may contain
+   UTF16-encoded surrogate characters. This is an efficient heuristic,
+   assuming that non-surrogate characters with a code point >= 0x8000 are
+   rare in most input.
+*/
+# define FAST_CHAR_MASK         (UCS2_REPEAT_MASK * 0x8000u)
+#endif
+/* The mask for fast byte-swapping. */
+#define STRIPPED_MASK           (UCS2_REPEAT_MASK * 0x00FFu)
+/* Swap bytes. */
+#define SWAB(value)             ((((value) >> 8) & STRIPPED_MASK) | \
+                                 (((value) & STRIPPED_MASK) << 8))
+
+Py_LOCAL_INLINE(Py_UCS4)
+STRINGLIB(utf16_decode)(const unsigned char **inptr, const unsigned char *e,
+                        STRINGLIB_CHAR *dest, Py_ssize_t *outpos,
+                        int native_ordering)
+{
+    Py_UCS4 ch;
+    const unsigned char *aligned_end =
+            (const unsigned char *) ((size_t) e & ~LONG_PTR_MASK);
+    const unsigned char *q = *inptr;
+    STRINGLIB_CHAR *p = dest + *outpos;
+    /* Offsets from q for retrieving byte pairs in the right order. */
+#ifdef BYTEORDER_IS_LITTLE_ENDIAN
+    int ihi = !!native_ordering, ilo = !native_ordering;
+#else
+    int ihi = !native_ordering, ilo = !!native_ordering;
+#endif
+    --e;
+
+    while (q < e) {
+        Py_UCS4 ch2;
+        /* First check for possible aligned read of a C 'long'. Unaligned
+           reads are more expensive, better to defer to another iteration. */
+        if (!((size_t) q & LONG_PTR_MASK)) {
+            /* Fast path for runs of in-range non-surrogate chars. */
+            register const unsigned char *_q = q;
+            while (_q < aligned_end) {
+                unsigned long block = * (unsigned long *) _q;
+                if (native_ordering) {
+                    /* Can use buffer directly */
+                    if (block & FAST_CHAR_MASK)
+                        break;
+                }
+                else {
+                    /* Need to byte-swap */
+                    if (block & SWAB(FAST_CHAR_MASK))
+                        break;
+#if STRINGLIB_SIZEOF_CHAR == 1
+                    block >>= 8;
+#else
+                    block = SWAB(block);
+#endif
+                }
+#ifdef BYTEORDER_IS_LITTLE_ENDIAN
+# if SIZEOF_LONG == 4
+                p[0] = (STRINGLIB_CHAR)(block & 0xFFFFu);
+                p[1] = (STRINGLIB_CHAR)(block >> 16);
+# elif SIZEOF_LONG == 8
+                p[0] = (STRINGLIB_CHAR)(block & 0xFFFFu);
+                p[1] = (STRINGLIB_CHAR)((block >> 16) & 0xFFFFu);
+                p[2] = (STRINGLIB_CHAR)((block >> 32) & 0xFFFFu);
+                p[3] = (STRINGLIB_CHAR)(block >> 48);
+# endif
+#else
+# if SIZEOF_LONG == 4
+                p[0] = (STRINGLIB_CHAR)(block >> 16);
+                p[1] = (STRINGLIB_CHAR)(block & 0xFFFFu);
+# elif SIZEOF_LONG == 8
+                p[0] = (STRINGLIB_CHAR)(block >> 48);
+                p[1] = (STRINGLIB_CHAR)((block >> 32) & 0xFFFFu);
+                p[2] = (STRINGLIB_CHAR)((block >> 16) & 0xFFFFu);
+                p[3] = (STRINGLIB_CHAR)(block & 0xFFFFu);
+# endif
+#endif
+                _q += SIZEOF_LONG;
+                p += SIZEOF_LONG / 2;
+            }
+            q = _q;
+            if (q >= e)
+                break;
+        }
+
+        ch = (q[ihi] << 8) | q[ilo];
+        q += 2;
+        if (!Py_UNICODE_IS_SURROGATE(ch)) {
+#if STRINGLIB_SIZEOF_CHAR
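The two mask tricks used by the fast path of this patch (swapping the bytes of every 16-bit unit in a machine word at once, and testing a whole word for possible surrogates) can be tried out in Python. This is only a sketch of the idea for a 64-bit word; the function names and the pack() helper are made up for the illustration and are not part of the patch.

```python
UCS2_REPEAT_MASK = 0x0001000100010001            # one bit per 16-bit lane
STRIPPED_MASK = UCS2_REPEAT_MASK * 0x00FF        # low byte of every lane
SURROGATE_HINT_MASK = UCS2_REPEAT_MASK * 0x8000  # top bit of every lane


def swab(value):
    """Swap the two bytes inside each 16-bit lane of a 64-bit word."""
    return ((value >> 8) & STRIPPED_MASK) | ((value & STRIPPED_MASK) << 8)


def may_contain_surrogates(block):
    """Cheap heuristic: True if any lane holds a code unit >= 0x8000."""
    return bool(block & SURROGATE_HINT_MASK)


def pack(units):
    """Pack four 16-bit code units into one word, lowest lane first."""
    block = 0
    for lane, unit in enumerate(units):
        block |= unit << (16 * lane)
    return block


swapped = swab(0x1122334455667788)
plain_block = pack([0x41, 0x42, 0x43, 0x44])
surrogate_block = pack([0x41, 0xD800, 0x43, 0x44])

print(hex(swapped))
print(may_contain_surrogates(plain_block), may_contain_surrogates(surrogate_block))
```

The heuristic can give false positives for non-surrogate code units at or above 0x8000; in that case the decoder simply falls back to the slower per-character path.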
[issue14692] json.loads parse_constant callback not working anymore
Serhiy Storchaka storch...@gmail.com added the comment:

> I'm afraid I have to close this one as rejected. It works as documented and it's unlikely we'll decide to change it back. I'm sorry.

It does not work as documented. The proposed patch fixes the documentation.

----------

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue14692
___
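The current behaviour of the callback can be observed with a short sketch, assuming Python 3; the hook name and the sample document are made up. parse_constant is called only for 'NaN', 'Infinity' and '-Infinity', and not for true, false or null.

```python
import json

seen = []


def record_constant(name):
    """Remember which constants the decoder hands to the callback."""
    seen.append(name)
    return None


result = json.loads(
    "[NaN, Infinity, -Infinity, true, false, null]",
    parse_constant=record_constant,
)

print(seen)
print(result)
```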
[issue14313] zipfile should raise an exception for unsupported compression methods
Serhiy Storchaka storch...@gmail.com added the comment:

> I still like NotImplementedError more than RuntimeError, though.

Well, here are patches for Python 3.2 and 2.7 (backported changeset 596b0eaeece8 + part of changeset fccdcd83708a).

----------
Added file: http://bugs.python.org/file25618/zipfile_unsupported_compression-3.2.patch
Added file: http://bugs.python.org/file25619/zipfile_unsupported_compression-2.7.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue14313
___
diff -r 13900edf13be Lib/test/test_zipfile.py
--- a/Lib/test/test_zipfile.py	Wed May 16 15:01:40 2012 +0200
+++ b/Lib/test/test_zipfile.py	Wed May 16 23:00:01 2012 +0300
@@ -922,6 +922,17 @@
         caught."""
         self.assertRaises(RuntimeError, zipfile.ZipFile, TESTFN, "w", -1)
 
+    def test_unsupported_compression(self):
+        # data is declared as shrunk, but actually deflated
+        data = (b'PK\x03\x04.\x00\x00\x00\x01\x00\xe4C\xa1@\x00\x00\x00'
+        b'\x00\x02\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00x\x03\x00PK\x01'
+        b'\x02.\x03.\x00\x00\x00\x01\x00\xe4C\xa1@\x00\x00\x00\x00\x02\x00\x00'
+        b'\x00\x00\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
+        b'\x80\x01\x00\x00\x00\x00xPK\x05\x06\x00\x00\x00\x00\x01\x00\x01\x00'
+        b'/\x00\x00\x00!\x00\x00\x00\x00\x00')
+        with zipfile.ZipFile(io.BytesIO(data), 'r') as zipf:
+            self.assertRaises(NotImplementedError, zipf.open, 'x')
+
     def test_null_byte_in_filename(self):
         """Check that a filename containing a null byte is properly
         terminated."""
diff -r 13900edf13be Lib/zipfile.py
--- a/Lib/zipfile.py	Wed May 16 15:01:40 2012 +0200
+++ b/Lib/zipfile.py	Wed May 16 23:00:01 2012 +0300
@@ -461,6 +461,28 @@
         self._UpdateKeys(c)
         return c
 
+
+compressor_names = {
+    0: 'store',
+    1: 'shrink',
+    2: 'reduce',
+    3: 'reduce',
+    4: 'reduce',
+    5: 'reduce',
+    6: 'implode',
+    7: 'tokenize',
+    8: 'deflate',
+    9: 'deflate64',
+    10: 'implode',
+    12: 'bzip2',
+    14: 'lzma',
+    18: 'terse',
+    19: 'lz77',
+    97: 'wavpack',
+    98: 'ppmd',
+}
+
+
 class ZipExtFile(io.BufferedIOBase):
     """File-like object for reading an archive member.
        Is returned by ZipFile.open().
@@ -487,6 +509,12 @@
         if self._compress_type == ZIP_DEFLATED:
             self._decompressor = zlib.decompressobj(-15)
+        elif self._compress_type != ZIP_STORED:
+            descr = compressor_names.get(self._compress_type)
+            if descr:
+                raise NotImplementedError("compression type %d (%s)" % (self._compress_type, descr))
+            else:
+                raise NotImplementedError("compression type %d" % (self._compress_type,))
         self._unconsumed = b''
         self._readbuffer = b''
diff -r e957b93571a8 Lib/test/test_zipfile.py
--- a/Lib/test/test_zipfile.py	Wed May 16 15:01:40 2012 +0200
+++ b/Lib/test/test_zipfile.py	Wed May 16 23:03:30 2012 +0300
@@ -859,6 +859,17 @@
         caught."""
         self.assertRaises(RuntimeError, zipfile.ZipFile, TESTFN, "w", -1)
 
+    def test_unsupported_compression(self):
+        # data is declared as shrunk, but actually deflated
+        data = (b'PK\x03\x04.\x00\x00\x00\x01\x00\xe4C\xa1@\x00\x00\x00'
+        b'\x00\x02\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00x\x03\x00PK\x01'
+        b'\x02.\x03.\x00\x00\x00\x01\x00\xe4C\xa1@\x00\x00\x00\x00\x02\x00\x00'
+        b'\x00\x00\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
+        b'\x80\x01\x00\x00\x00\x00xPK\x05\x06\x00\x00\x00\x00\x01\x00\x01\x00'
+        b'/\x00\x00\x00!\x00\x00\x00\x00\x00')
+        with zipfile.ZipFile(io.BytesIO(data), 'r') as zipf:
+            self.assertRaises(NotImplementedError, zipf.open, 'x')
+
     def test_null_byte_in_filename(self):
         """Check that a filename containing a null byte is properly
         terminated."""
diff -r e957b93571a8 Lib/zipfile.py
--- a/Lib/zipfile.py	Wed May 16 15:01:40 2012 +0200
+++ b/Lib/zipfile.py	Wed May 16 23:03:30 2012 +0300
@@ -461,6 +461,28 @@
         self._UpdateKeys(c)
         return c
 
+
+compressor_names = {
+    0: 'store',
+    1: 'shrink',
+    2: 'reduce',
+    3: 'reduce',
+    4: 'reduce',
+    5: 'reduce',
+    6: 'implode',
+    7: 'tokenize',
+    8: 'deflate',
+    9: 'deflate64',
+    10: 'implode',
+    12: 'bzip2',
+    14: 'lzma',
+    18: 'terse',
+    19: 'lz77',
+    97: 'wavpack',
+    98: 'ppmd',
+}
+
+
 class ZipExtFile(io.BufferedIOBase):
     """File-like object for reading an archive member.
        Is returned by ZipFile.open().
@@ -485,6 +507,12 @@
         if self._compress_type == ZIP_DEFLATED:
             self._decompressor = zlib.decompressobj(-15)
+        elif self._compress_type != ZIP_STORED:
+            descr = compressor_names.get(self._compress_type)
+            if descr:
+                raise
[issue13031] small speed-up for tarfile.py when unzipping tarballs
Serhiy Storchaka storch...@gmail.com added the comment:

Justin, the patch would probably attract more interest if you provided a microbenchmark.

----------

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue13031
___
[issue3931] codecs.charmap_build is untested and undocumented
Changes by Serhiy Storchaka storch...@gmail.com:

----------
versions: +Python 3.3 -Python 2.7, Python 3.2

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue3931
___
[issue3931] codecs.charmap_build is untested and undocumented
Changes by Serhiy Storchaka storch...@gmail.com:

----------
versions: +Python 2.7, Python 3.2

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue3931
___
[issue8271] str.decode('utf8', 'replace') -- conformance with Unicode 5.2.0
Serhiy Storchaka storch...@gmail.com added the comment: Looks like issue14738 fixes this bug for Python 3.3.
>>> print(ascii(b'\xc2\x41\x42'.decode('utf8', 'replace')))
'\ufffdAB'
>>> print(ascii(b'\xf1ABCD'.decode('utf8', 'replace')))
'\ufffdABCD'
-- nosy: +storchaka ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue8271 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue8271] str.decode('utf8', 'replace') -- conformance with Unicode 5.2.0
Serhiy Storchaka storch...@gmail.com added the comment: The only issue left was about the number of U+FFFD generated with invalid sequences in some cases. My last patch has extensive tests for this, so you could try to apply it (or copy the tests) and see if they all pass. The tests fail, but I'm not sure that the tests are correct. b'\xe0\x00' raises 'unexpected end of data' and not 'invalid continuation byte'. This is a terminological issue. b'\xe0\x80'.decode('utf-8', 'replace') returns one U+FFFD and not two. I don't think that is right. -- title: str.decode('utf8', 'replace') -- conformance with Unicode 5.2.0 - str.decode('utf8', 'replace') -- conformance with Unicode 5.2.0 ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue8271 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue8271] str.decode('utf8', 'replace') -- conformance with Unicode 5.2.0
Serhiy Storchaka storch...@gmail.com added the comment: I think that one U+FFFD is correct. The only error is a premature end of data. I expressed myself poorly. I also think that there is only one decoding error, and not two. I think the test is wrong. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue8271 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue8271] str.decode('utf8', 'replace') -- conformance with Unicode 5.2.0
Serhiy Storchaka storch...@gmail.com added the comment: This might be just because it first checks if there are two more bytes before checking if they are valid, but 'invalid continuation byte' works too. Yes, this is an implementation detail. It is much easier and faster. Is it necessary to change it? Why not? Maybe I'm wrong. I looked in The Unicode Standard, Version 6.0 (http://www.unicode.org/versions/Unicode6.0.0/ch03.pdf), pp. 95-97. The standard is not categorical about this, but recommends that only a maximal subpart be replaced by U+FFFD. \xe0\x80 is not a maximal subpart. Therefore, there must be two U+FFFD. In this case, neither the previous nor the current implementation conforms to the standard. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue8271 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue8271] str.decode('utf8', 'replace') -- conformance with Unicode 5.2.0
Serhiy Storchaka storch...@gmail.com added the comment: Changing from 'unexpected end of data' to 'invalid continuation byte' for b'\xe0\x00' is fine with me, but this will be a (minor) deviation from 2.7, 3.1, 3.2, and pypy (it could still be changed on all these except 3.1 though). I probably expressed that poorly. Past and current implementations raise 'unexpected end of data' and not 'invalid continuation byte'. The test expects 'invalid continuation byte'. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue8271 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue8271] str.decode('utf8', 'replace') -- conformance with Unicode 5.2.0
Serhiy Storchaka storch...@gmail.com added the comment: I don't remember all the details right now, but if that test was passing with my patch there must be something wrong somewhere (either in the patch, in the test, or in our understanding of the standard). No, the test correctly expects two U+FFFD. The current implementation is wrong. Now that I understand what the error is, I'll try to correct the Python 3.3 implementation. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue8271 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
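The cases argued over in this thread can be checked with a short script. This is only a sketch: the helper names `decode_replace` and `error_reason` are made up for illustration, and the results noted in the comments are what CPython 3.3 and later are expected to give (older versions, as the thread says, behave differently).

```python
# Inputs discussed in the thread, plus one truncated-but-valid prefix.
samples = [
    b'\xc2\x41\x42',  # lead byte followed by an ASCII byte
    b'\xf1ABCD',      # 4-byte lead followed by ASCII bytes
    b'\xe0\x80',      # \x80 is not a valid second byte after \xe0
    b'\xe0\xa0',      # valid prefix cut off by the end of the data
]

def decode_replace(data):
    """Decode with the 'replace' handler, as in the thread."""
    return data.decode('utf-8', 'replace')

def error_reason(data):
    """Return the reason reported by strict decoding, or None on success."""
    try:
        data.decode('utf-8')
    except UnicodeDecodeError as exc:
        return exc.reason
    return None

for raw in samples:
    print(ascii(raw), '->', ascii(decode_replace(raw)), '|', error_reason(raw))
```

On such an interpreter b'\xe0\x80' should come out as two U+FFFD (neither byte is part of a maximal subpart longer than one byte), while b'\xe0\xa0' should give a single U+FFFD.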
[issue1767933] Badly formed XML using etree and utf-16
Serhiy Storchaka storch...@gmail.com added the comment: Can anyone review the patch? -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue1767933 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue14850] The inconsistency of codecs.charmap_decode
New submission from Serhiy Storchaka storch...@gmail.com: codecs.charmap_decode behaves differently depending on whether the decode table is an exact str or an instance of a str subclass.
>>> import codecs
>>> print(ascii(codecs.charmap_decode(b'\x00', 'replace', '\uFFFE')))
('\ufffd', 1)
>>> class S(str): pass
...
>>> print(ascii(codecs.charmap_decode(b'\x00', 'replace', S('\uFFFE'))))
('\ufffe', 1)
This is because the charmap decoder (function PyUnicode_DecodeCharmap in Objects/unicodeobject.c) uses different algorithms for exact strings and for other objects. Do we need to fix it? If yes, what should `codecs.charmap_decode(b'\x00', 'replace', {0:'\uFFFE'})` return? What should `codecs.charmap_decode(b'\x00', 'replace', {0:0xFFFE})` return? -- components: Interpreter Core messages: 161054 nosy: storchaka priority: normal severity: normal status: open title: The inconsistency of codecs.charmap_decode type: behavior versions: Python 2.7, Python 3.2, Python 3.3 ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue14850 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
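The comparison in this report can be rerun as a script. A minimal sketch, assuming only that `codecs.charmap_decode` accepts any mapping as its third argument; the variable names are illustrative, and no particular result is claimed for the subclass and dict cases, since those are exactly what the report asks about.

```python
import codecs

class S(str):
    """A trivial str subclass, as in the report."""

table = '\ufffe'  # U+FFFE marks byte 0x00 as undefined in a string table

exact = codecs.charmap_decode(b'\x00', 'replace', table)
subclass = codecs.charmap_decode(b'\x00', 'replace', S(table))
as_dict = codecs.charmap_decode(b'\x00', 'replace', {0: '\ufffe'})

print('exact str:   ', ascii(exact))
print('str subclass:', ascii(subclass))
print('dict:        ', ascii(as_dict))
print('consistent:  ', exact == subclass == as_dict)
```

If the last line prints False, the interpreter in use still shows the inconsistency described above.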
[issue14624] Faster utf-16 decoder
Serhiy Storchaka storch...@gmail.com added the comment: Thank you, Antoine. Now only issue14625 waits for review. changeset: 77012:3430d7329a3b +* UTF-8 and UTF-16 decoding is now 2x to 4x faster. In fact, UTF-16 decoding is now at most 25% faster than in Python 3.2 on my computers (and sometimes still a little slower). It is 2x to 4x faster only compared to the earlier Python 3.3, which had been slowed down (because of PEP 393). -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue14624 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue1767933] Badly formed XML using etree and utf-16
Serhiy Storchaka storch...@gmail.com added the comment: Here is an updated patch, with tests and support for objects that have only a 'write' method. -- Added file: http://bugs.python.org/file25652/etree_write_utf16_2.patch ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue1767933 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue14868] Allow log calls to return True for code optimization.
Serhiy Storchaka storch...@gmail.com added the comment: assert logging.debug("This is a test.") or True -- nosy: +storchaka ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue14868 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
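The one-liner works because logging.debug() returns None, so `None or True` evaluates to True and the assert never fails; when Python runs with -O, assert statements are dropped entirely, and the logging call goes with them. A small sketch (the `process` function is hypothetical, used only to give the assert a context):

```python
import logging

logging.basicConfig(level=logging.DEBUG)

def process(value):
    # logging.debug() returns None, so "or True" keeps the assert passing.
    # Under "python -O" this whole line is removed, logging call included.
    assert logging.debug("processing %r", value) or True
    return value * 2

print(process(21))
```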
[issue14469] Python 3 documentation links
Serhiy Storchaka storch...@gmail.com added the comment: http://permalink.gmane.org/gmane.comp.python.devel/132675 -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue14469 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue14874] Faster charmap decoding
New submission from Serhiy Storchaka storch...@gmail.com: Charmap decoders are not as important as UTF decoders, but are still widely used. In Python 3.3 with PEP 393 they slowed down 4x. The proposed patch restores the performance. Only the most common case is optimized, when the decoder is specified by a UCS2 table with length = 256. Map-based decoders are translated to table-based ones. UCS1 tables are widened to UCS2 by adding a fake 257th character. Benchmark results:
                      3.2           3.3(vanilla)   3.3(patched)
cp1251 'A'*1          111 (+10%)    31 (+294%)     122
cp1251 '\xa0'*1       111 (+8%)     29 (+314%)     120
cp1251 '\u0402'*1     111 (+6%)     25 (+372%)     118
-- components: Interpreter Core, Unicode files: decode_charmap.patch keywords: patch messages: 161301 nosy: ezio.melotti, haypo, lemburg, pitrou, storchaka priority: normal severity: normal status: open title: Faster charmap decoding type: performance versions: Python 3.3 Added file: http://bugs.python.org/file25664/decode_charmap.patch ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue14874 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
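A measurement of this kind can be approximated with timeit. The following is a rough sketch, not the charmapdecodebench.py script attached to the issue: the `decode_speed` helper, the sample length of 10000 characters, and the MB/s unit are all assumptions made for illustration.

```python
import timeit

def decode_speed(encoding, text, number=1000):
    """Return an approximate decoding speed in MB/s (hypothetical helper)."""
    data = text.encode(encoding)
    # Sanity check: the round trip must reproduce the text.
    assert data.decode(encoding) == text
    # Best of three runs, each decoding the data `number` times.
    seconds = min(timeit.repeat(lambda: data.decode(encoding),
                                number=number, repeat=3))
    return len(data) * number / seconds / 1e6

# Sample length is an assumption; the labels in the report are truncated.
for sample in ('A' * 10000, '\xa0' * 10000, '\u0402' * 10000):
    print('cp1251 %-10s %8.0f MB/s'
          % (ascii(sample[0]), decode_speed('cp1251', sample)))
```

Absolute numbers depend heavily on the machine and interpreter build, so only the ratio between two builds measured on the same machine is meaningful.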
[issue14874] Faster charmap decoding
Changes by Serhiy Storchaka storch...@gmail.com: Added file: http://bugs.python.org/file25665/charmapdecodebench.py ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue14874 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue14874] Faster charmap decoding
Changes by Serhiy Storchaka storch...@gmail.com: Added file: http://bugs.python.org/file25666/bench-diff.py ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue14874 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue14744] Use _PyUnicodeWriter API in str.format() internals
Serhiy Storchaka storch...@gmail.com added the comment: For Python 3.3, _PyUnicodeWriter API is faster than the Py_UCS4 buffer API and PyAccu API in nearly all cases, with a speedup between 30% and 100%. But there are some cases where the _PyUnicodeWriter API is slower: Perhaps most of these problems can be solved if, instead of the boolean flag (overallocate/no overallocate), a Py_ssize_t parameter is used that indicates by how much to overallocate (it is the length of the suffix in the format). -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue14744 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue14897] struct.pack raises unexpected error message
Serhiy Storchaka storch...@gmail.com added the comment: Funny. struct.pack(fmt, args...) is just an alias for struct.Struct(fmt).pack(args...). The error message should be changed to state explicitly that it is about the data to be packed, and not about the arguments of the function. Or the mention of the number of arguments should be removed altogether (leaving only too much or too little). -- nosy: +storchaka ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue14897 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue14897] struct.pack raises unexpected error message
Serhiy Storchaka storch...@gmail.com added the comment: It might help if the error message also stated how many arguments were actually received, like the TypeError message already does for bad function / method calls. E.g., struct.error: pack expected 2 items for packing (got 1)
Yes, this would be useful. But it is seldom implemented.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: not enough arguments for format string
>>> '%s %s'%(123,456,789)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: not all arguments converted during string formatting
struct.pack is also inconsistent in other error messages.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
struct.error: argument for 's' must be a bytes object
>>> struct.pack('i', '123')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
struct.error: required argument is not an integer
For 's' the format is mentioned, but for 'i' it is not. It would also be helpful to mention the index of the item. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue14897 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
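The messages under discussion are easy to collect side by side. A minimal sketch; the `pack_error` helper is hypothetical, and the exact wording printed depends on the Python version, so none is asserted here.

```python
import struct

# struct.pack(fmt, ...) and struct.Struct(fmt).pack(...) produce the same bytes.
assert struct.pack('ii', 1, 2) == struct.Struct('ii').pack(1, 2)

def pack_error(fmt, *values):
    """Return the struct.error message for a failing pack, or None."""
    try:
        struct.pack(fmt, *values)
    except struct.error as exc:
        return str(exc)
    return None

cases = [
    ('ii', (1,)),      # too few items for the format
    ('2s', ('ab',)),   # str where bytes is required
    ('i', ('123',)),   # str where an integer is required
]
for fmt, values in cases:
    print(repr(fmt), values, '->', pack_error(fmt, *values))
```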
[issue8271] str.decode('utf8', 'replace') -- conformance with Unicode 5.2.0
Serhiy Storchaka storch...@gmail.com added the comment: Here is a patch for 3.3. All of the tests pass successfully. Unfortunately, it is a little slower, but I tried to minimize the losses. -- Added file: http://bugs.python.org/file25709/issue8271-3.3.patch ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue8271 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue14920] help(urllib.parse) fails when LANG=C
Changes by Serhiy Storchaka storch...@gmail.com: -- versions: +Python 3.3 ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue14920 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue8271] str.decode('utf8', 'replace') -- conformance with Unicode 5.2.0
Serhiy Storchaka storch...@gmail.com added the comment: Here are the benchmark results (numbers are speed, MB/s).
On 32-bit Linux, AMD Athlon 64 X2:
                                  vanilla        patched
utf-8   'A'*1                     2016 (+5%)     2111
utf-8   '\x80'*1                  383 (+9%)      416
utf-8   '\x80'+'A'*               1283 (+1%)     1301
utf-8   '\u0100'*1                383 (-8%)      354
utf-8   '\u0100'+'A'*             1258 (-6%)     1184
utf-8   '\u0100'+'\x80'*          383 (-8%)      354
utf-8   '\u8000'*1                434 (-11%)     388
utf-8   '\u8000'+'A'*             1262 (-6%)     1180
utf-8   '\u8000'+'\x80'*          383 (-8%)      354
utf-8   '\u8000'+'\u0100'*        383 (-8%)      354
utf-8   '\U0001'*1                358 (+1%)      361
utf-8   '\U0001'+'A'*             1168 (-5%)     1104
utf-8   '\U0001'+'\x80'*          382 (-20%)     307
utf-8   '\U0001'+'\u0100'*        382 (-20%)     307
utf-8   '\U0001'+'\u8000'*        404 (-10%)     365
On 32-bit Linux, Intel Atom N570:
                                  vanilla        patched
ascii   'A'*1                     789 (+1%)      800
latin1  'A'*1                     796 (-2%)      781
latin1  'A'*+'\x80'               779 (+1%)      789
latin1  '\x80'*1                  1739 (-3%)     1690
latin1  '\x80'+'A'*               1747 (+1%)     1773
utf-8   'A'*1                     623 (+1%)      631
utf-8   '\x80'*1                  145 (+14%)     165
utf-8   '\x80'+'A'*               354 (+1%)      358
utf-8   '\u0100'*1                164 (-5%)      156
utf-8   '\u0100'+'A'*             343 (+2%)      350
utf-8   '\u0100'+'\x80'*          164 (-4%)      157
utf-8   '\u8000'*1                175 (-5%)      166
utf-8   '\u8000'+'A'*             349 (+2%)      356
utf-8   '\u8000'+'\x80'*          164 (-4%)      157
utf-8   '\u8000'+'\u0100'*        164 (-4%)      157
utf-8   '\U0001'*1                152 (+7%)      163
utf-8   '\U0001'+'A'*             313 (+6%)      332
utf-8   '\U0001'+'\x80'*          161 (-13%)     140
utf-8   '\U0001'+'\u0100'*        161 (-14%)     139
utf-8   '\U0001'+'\u8000'*        160 (-1%)      159
-- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue8271 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue14923] Even faster UTF-8 decoding
New submission from Serhiy Storchaka storch...@gmail.com: Strange as it may seem, a simple trick makes UTF-8 decoding even faster. Here are the benchmark results.
On 32-bit Linux, AMD Athlon 64 X2:
                                  vanilla        patched
utf-8   'A'*1                     2061 (+3%)     2115
utf-8   '\x80'*1                  383 (-7%)      355
utf-8   '\x80'+'A'*               1273 (+1%)     1290
utf-8   '\u0100'*1                382 (+47%)     562
utf-8   '\u0100'+'A'*             1239 (+1%)     1253
utf-8   '\u0100'+'\x80'*          383 (+47%)     562
utf-8   '\u8000'*1                434 (-6%)      409
utf-8   '\u8000'+'A'*             1245 (+1%)     1256
utf-8   '\u8000'+'\x80'*          382 (+47%)     560
utf-8   '\u8000'+'\u0100'*        383 (+44%)     553
utf-8   '\U0001'*1                358 (+4%)      373
utf-8   '\U0001'+'A'*             1171 (+0%)     1176
utf-8   '\U0001'+'\x80'*          381 (+44%)     548
utf-8   '\U0001'+'\u0100'*        381 (+44%)     548
utf-8   '\U0001'+'\u8000'*        404 (+0%)      406
On 32-bit Linux, Intel Atom N570:
                                  vanilla        patched
utf-8   'A'*1                     623 (+0%)      626
utf-8   '\x80'*1                  145 (+15%)     167
utf-8   '\x80'+'A'*               354 (+2%)      362
utf-8   '\u0100'*1                164 (+10%)     181
utf-8   '\u0100'+'A'*             343 (-0%)      342
utf-8   '\u0100'+'\x80'*          164 (+11%)     182
utf-8   '\u8000'*1                175 (+5%)      183
utf-8   '\u8000'+'A'*             349 (+0%)      349
utf-8   '\u8000'+'\x80'*          164 (+11%)     182
utf-8   '\u8000'+'\u0100'*        164 (+10%)     181
utf-8   '\U0001'*1                152 (+11%)     168
utf-8   '\U0001'+'A'*             313 (+0%)      313
utf-8   '\U0001'+'\x80'*          161 (+11%)     179
utf-8   '\U0001'+'\u0100'*        161 (+11%)     179
utf-8   '\U0001'+'\u8000'*        160 (+11%)     177
-- components: Interpreter Core, Unicode files: decode_utf8_signed_byte.patch keywords: patch messages: 161652 nosy: Arfrever, ezio.melotti, haypo, janssen, jcea, loewis, mark.dickinson, ned.deily, pitrou, python-dev, ronaldoussoren, storchaka priority: normal severity: normal status: open title: Even faster UTF-8 decoding type: performance versions: Python 3.3 Added file: http://bugs.python.org/file25717/decode_utf8_signed_byte.patch ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue14923 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue14923] Even faster UTF-8 decoding
Changes by Serhiy Storchaka storch...@gmail.com: Added file: http://bugs.python.org/file25718/decodebench.py ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue14923 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue14923] Even faster UTF-8 decoding
Changes by Serhiy Storchaka storch...@gmail.com: Added file: http://bugs.python.org/file25719/bench-diff.py ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue14923 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue8271] str.decode('utf8', 'replace') -- conformance with Unicode 5.2.0
Serhiy Storchaka storch...@gmail.com added the comment: Fortunately, issue14923 (if accepted) will compensate for the slowdown.
On 32-bit Linux, AMD Athlon 64 X2:
                                  vanilla        old patch      fast patch
utf-8   'A'*1                     2016 (+3%)     2111 (-2%)     2072
utf-8   '\x80'*1                  383 (+19%)     416 (+9%)      454
utf-8   '\x80'+'A'*               1283 (-7%)     1301 (-9%)     1190
utf-8   '\u0100'*1                383 (+46%)     354 (+58%)     560
utf-8   '\u0100'+'A'*             1258 (-1%)     1184 (+5%)     1244
utf-8   '\u0100'+'\x80'*          383 (+46%)     354 (+58%)     558
utf-8   '\u8000'*1                434 (+6%)      388 (+19%)     461
utf-8   '\u8000'+'A'*             1262 (-1%)     1180 (+5%)     1244
utf-8   '\u8000'+'\x80'*          383 (+46%)     354 (+58%)     559
utf-8   '\u8000'+'\u0100'*        383 (+45%)     354 (+57%)     555
utf-8   '\U0001'*1                358 (+5%)      361 (+4%)      375
utf-8   '\U0001'+'A'*             1168 (-1%)     1104 (+5%)     1159
utf-8   '\U0001'+'\x80'*          382 (+43%)     307 (+78%)     546
utf-8   '\U0001'+'\u0100'*        382 (+43%)     307 (+79%)     548
utf-8   '\U0001'+'\u8000'*        404 (+13%)     365 (+25%)     458
On 32-bit Linux, Intel Atom N570:
                                  vanilla        old patch      fast patch
utf-8   'A'*1                     623 (+1%)      631 (+0%)      631
utf-8   '\x80'*1                  145 (+26%)     165 (+11%)     183
utf-8   '\x80'+'A'*               354 (-0%)      358 (-1%)      353
utf-8   '\u0100'*1                164 (+10%)     156 (+16%)     181
utf-8   '\u0100'+'A'*             343 (+1%)      350 (-1%)      348
utf-8   '\u0100'+'\x80'*          164 (+10%)     157 (+15%)     181
utf-8   '\u8000'*1                175 (-1%)      166 (+5%)      174
utf-8   '\u8000'+'A'*             349 (+0%)      356 (-2%)      349
utf-8   '\u8000'+'\x80'*          164 (+10%)     157 (+15%)     180
utf-8   '\u8000'+'\u0100'*        164 (+10%)     157 (+15%)     181
utf-8   '\U0001'*1                152 (+7%)      163 (+0%)      163
utf-8   '\U0001'+'A'*             313 (+4%)      332 (-2%)      327
utf-8   '\U0001'+'\x80'*          161 (+11%)     140 (+28%)     179
utf-8   '\U0001'+'\u0100'*        161 (+11%)     139 (+28%)     178
utf-8   '\U0001'+'\u8000'*        160 (+9%)      159 (+9%)      174
-- Added file: http://bugs.python.org/file25720/issue8271-3.3-fast.patch ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue8271 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue14923] Even faster UTF-8 decoding
Serhiy Storchaka storch...@gmail.com added the comment: It seems the patch relies on a two's complement representation of integers. Mark, do you think that's ok? Yes, the patch depends on two facts: 8-bit bytes and a two's complement representation of integers. That's why I call it a trick. However, CPython today does not work on platforms where these do not hold anyway. Still, we can wrap the macro definition in #if/#else/#endif and provide the traditional form as a fallback (but I don't remember how to test for a two's complement representation at compile time). -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue14923 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue14923] Even faster UTF-8 decoding
Serhiy Storchaka storch...@gmail.com added the comment: Yes, this is implementation-dependent behavior (and on the supported platforms it is implemented in a well-defined way). However, if the continuation byte check is done in the simplest way, ((ch) >= 0x80 && (ch) < 0xC0), it has the same effect (a speedup of up to +45%) on AMD Athlon.
                                  vanilla        patched
utf-8   'A'*1                     2061 (-2%)     2018
utf-8   '\x80'*1                  383 (+9%)      416
utf-8   '\x80'+'A'*               1273 (+3%)     1315
utf-8   '\u0100'*1                382 (+46%)     558
utf-8   '\u0100'+'A'*             1239 (+0%)     1245
utf-8   '\u0100'+'\x80'*          383 (+46%)     558
utf-8   '\u8000'*1                434 (-6%)      408
utf-8   '\u8000'+'A'*             1245 (+0%)     1245
utf-8   '\u8000'+'\x80'*          382 (+46%)     556
utf-8   '\u8000'+'\u0100'*        383 (+45%)     556
utf-8   '\U0001'*1                358 (+0%)      359
utf-8   '\U0001'+'A'*             1171 (-0%)     1170
utf-8   '\U0001'+'\x80'*          381 (+30%)     495
utf-8   '\U0001'+'\u0100'*        381 (+30%)     495
utf-8   '\U0001'+'\u8000'*        404 (-5%)      385
On Intel Atom the results did not change or became a little better.
                                  vanilla        patched
utf-8   'A'*1                     623 (+3%)      642
utf-8   '\x80'*1                  145 (+9%)      158
utf-8   '\x80'+'A'*               354 (+4%)      367
utf-8   '\u0100'*1                164 (+0%)      164
utf-8   '\u0100'+'A'*             343 (+2%)      351
utf-8   '\u0100'+'\x80'*          164 (+1%)      165
utf-8   '\u8000'*1                175 (-2%)      171
utf-8   '\u8000'+'A'*             349 (+3%)      359
utf-8   '\u8000'+'\x80'*          164 (+0%)      164
utf-8   '\u8000'+'\u0100'*        164 (+0%)      164
utf-8   '\U0001'*1                152 (-1%)      150
utf-8   '\U0001'+'A'*             313 (+2%)      319
utf-8   '\U0001'+'\x80'*          161 (+1%)      162
utf-8   '\U0001'+'\u0100'*        161 (+1%)      162
utf-8   '\U0001'+'\u8000'*        160 (-2%)      156
-- Added file: http://bugs.python.org/file25733/decode_utf8_range_check.patch ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue14923 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
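The equivalence of the different continuation-byte checks can be verified exhaustively over all 256 byte values. This is an illustration in Python rather than the C macros from the patches: the function names are made up, and the signed-byte form is only an assumption about what the "trick" looks like, based on the patch name and the two's complement discussion above.

```python
def is_continuation_mask(ch):
    """Traditional form: the top two bits are 10."""
    return (ch & 0xC0) == 0x80

def is_continuation_range(ch):
    """The 'simplest way' range check quoted in the message above."""
    return 0x80 <= ch < 0xC0

def is_continuation_signed(ch):
    """Assumed signed-byte trick: reinterpret the byte as an 8-bit
    two's complement value and do a single comparison."""
    signed = ch - 256 if ch >= 128 else ch
    return signed < -0x40

mismatches = [ch for ch in range(256)
              if not (is_continuation_mask(ch)
                      == is_continuation_range(ch)
                      == is_continuation_signed(ch))]
print('mismatching byte values:', mismatches)
```

All three forms accept exactly the bytes 0x80 through 0xBF; the signed form needs only one comparison, which is presumably where the speedup comes from, at the cost of depending on 8-bit bytes and two's complement.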
[issue12716] Reorganize os docs for files/dirs/fds
Changes by Serhiy Storchaka storch...@gmail.com: -- nosy: +storchaka ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue12716 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue1470548] Bugfix for #1470540 (XMLGenerator cannot output UTF-16)
Serhiy Storchaka storch...@gmail.com added the comment: See also issue1767933. Instead of codecs.StreamWriter it is better to use io.TextIOWrapper, because the former is slower and has numerous flaws. -- nosy: +storchaka versions: +Python 3.3 ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue1470548 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue2005] posixmodule expects sizeof(pid_t/gid_t/uid_t) = sizeof(long)
Changes by Serhiy Storchaka storch...@gmail.com: -- nosy: +storchaka ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue2005 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue2005] posixmodule expects sizeof(pid_t/gid_t/uid_t) = sizeof(long)
Changes by Serhiy Storchaka storch...@gmail.com: -- versions: +Python 3.3 -Python 3.1 ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue2005 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue13518] configparser can’t read file objects from urlopen
Serhiy Storchaka storch...@gmail.com added the comment: Mickey, you can wrap the file-like object returned by urlopen with io.TextIOWrapper.
config = configparser.RawConfigParser()
config.read_file(io.TextIOWrapper(urlopen(path_config), encoding='utf-8'))
Because there is no bug and a new feature is not needed, I believe that this issue can be closed. -- nosy: +storchaka ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue13518 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
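The same wrapping can be tried without network access. In this sketch an io.BytesIO object stands in for the binary response that urlopen would return; the section and option names are invented for the example.

```python
import configparser
import io

# Stand-in for the binary file-like object returned by urlopen().
raw = io.BytesIO("[server]\nhost = example.org\nname = caf\u00e9\n".encode('utf-8'))

config = configparser.RawConfigParser()
# TextIOWrapper decodes the byte stream lazily, line by line.
config.read_file(io.TextIOWrapper(raw, encoding='utf-8'))

print(config.get('server', 'host'))
print(ascii(config.get('server', 'name')))
```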
[issue4733] Add a decode to declared encoding version of urlopen to urllib
Serhiy Storchaka storch...@gmail.com added the comment: If you add the encoding parameter, you should also add at least the errors and newline parameters. And why not just use io.TextIOWrapper? page.decode_content() is bad in that it forces you to read and decode all of the data at once, while io.TextIOWrapper returns a file-like object and allows you to read line by line or in other pieces. -- nosy: +storchaka ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue4733 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue14744] Use _PyUnicodeWriter API in str.format() internals
Serhiy Storchaka storch...@gmail.com added the comment: So, do you have any comment or complaint? Or can I commit the patch? I beg your pardon, I will do a review and additional benchmarks today. So far I have to say that it is better to use the stringlib approach than massive macros, which are more difficult to read and edit. However, I will do a benchmark to check whether we can achieve the same effect with fewer code changes. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue14744 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue14744] Use _PyUnicodeWriter API in str.format() internals
Serhiy Storchaka storch...@gmail.com added the comment: I just sent you a patch which does not use any macros or stringlib. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue14744 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue1470548] Bugfix for #1470540 (XMLGenerator cannot output UTF-16)
Changes by Serhiy Storchaka storch...@gmail.com: -- nosy: +loewis ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue1470548 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue1470548] Bugfix for #1470540 (XMLGenerator cannot output UTF-16)
Serhiy Storchaka storch...@gmail.com added the comment: Oh, I see that XMLGenerator is completely outdated. It has not even been ported to Python 3. See the function _write:
    def _write(self, text):
        if isinstance(text, str):
            self._out.write(text)
        else:
            self._out.write(text.encode(self._encoding, _error_handling))
In Python 2 there was a choice between bytes and unicode strings. But in Python 3 the encoding never happens. XMLGenerator does not distinguish between binary and text streams. Here is a patch that fixes XMLGenerator in Python 3. Unfortunately, it is impossible to avoid a loss of backward compatibility. I tried to keep the code working for the most common cases, but some code which worked before may break (I even had to correct some tests). -- Added file: http://bugs.python.org/file25760/XMLGenerator.patch ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue1470548 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
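For comparison, this is how UTF-16 output through XMLGenerator can be exercised on a current Python 3, where the class accepts a binary stream together with an encoding. It is a sketch of present-day behaviour, not of the patch attached here; the element name and text are arbitrary.

```python
import io
from xml.sax.saxutils import XMLGenerator

out = io.BytesIO()                      # binary stream: XMLGenerator encodes
gen = XMLGenerator(out, encoding='utf-16')
gen.startDocument()
gen.startElement('doc', {})
gen.characters('hello \u20ac')          # non-ASCII text, encodable in UTF-16
gen.endElement('doc')
gen.endDocument()

xml_text = out.getvalue().decode('utf-16')
print(ascii(xml_text))
```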
[issue10376] ZipFile unzip is unbuffered
Serhiy Storchaka storch...@gmail.com added the comment: The patch is updated to reflect Martin's stylistic comments. Sorry for the delay, Martin. I did not receive an email with your review from 2012-05-13, and only today accidentally discovered your comments in Rietveld. It seems to have been some bug in Rietveld. -- Added file: http://bugs.python.org/file25769/zipfile_optimize_read_2.patch ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10376 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue14973] restore python2 unicode literals in ur strings
Serhiy Storchaka storch...@gmail.com added the comment: See issue3665. -- nosy: +storchaka title: restore python2 unicode literals in ru strings - restore python2 unicode literals in ur strings ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue14973 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue3665] Support \u and \U escapes in regexes
Serhiy Storchaka storch...@gmail.com added the comment: I don't think it is worth targeting 2.7 and 3.2 (it's a new feature, not a bugfix), but for 3.3 it will be very useful. Since PEP 393, conversion to surrogate pairs is no longer relevant. -- components: +Regular Expressions, Unicode nosy: +storchaka type: behavior - enhancement versions: -Python 2.7, Python 3.2 ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue3665 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
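What the feature looks like in use, on Python 3.3 or later where the re module itself understands \uXXXX and \UXXXXXXXX in str patterns. A short sketch; the sample strings are arbitrary.

```python
import re

# Raw string: the backslash escapes reach the re module unprocessed,
# and re interprets \u0400 and \u04ff itself.
cyrillic = re.compile(r'[\u0400-\u04ff]+')
match = cyrillic.search('abc \u043f\u0440\u0438\u0432\u0435\u0442 xyz')
print(ascii(match.group()))

# \U with eight hex digits addresses characters outside the BMP.
astral = re.fullmatch(r'\U0001F600', '\U0001F600')
print(astral is not None)
```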
[issue3665] Support \u and \U escapes in regexes
Changes by Serhiy Storchaka storch...@gmail.com: Added file: http://bugs.python.org/file25781/re_unicode_escapes.diff ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue3665 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue3665] Support \u and \U escapes in regexes
Changes by Serhiy Storchaka storch...@gmail.com: Added file: http://bugs.python.org/file25782/3665.patch ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue3665 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue3665] Support \u and \U escapes in regexes
Changes by Serhiy Storchaka storch...@gmail.com: Removed file: http://bugs.python.org/file25781/re_unicode_escapes.diff ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue3665 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue3665] Support \u and \U escapes in regexes
Changes by Serhiy Storchaka storch...@gmail.com: Removed file: http://bugs.python.org/file25782/3665.patch ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue3665 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue3665] Support \u and \U escapes in regexes
Changes by Serhiy Storchaka storch...@gmail.com: Added file: http://bugs.python.org/file25783/re_unicode_escapes.diff ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue3665 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue3665] Support \u and \U escapes in regexes
Changes by Serhiy Storchaka storch...@gmail.com: Added file: http://bugs.python.org/file25784/3665.patch ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue3665 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue14993] GCC error when using unicodeobject.h
Changes by Serhiy Storchaka storch...@gmail.com: -- nosy: +storchaka ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue14993 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue14626] os module: use keyword-only arguments for dir_fd and nofollow to reduce function count
Serhiy Storchaka storch...@gmail.com added the comment: Well, I'm going to ignore the long lines and documentation. The patch is really big and impressive. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue14626
[issue15026] Faster UTF-16 encoding
New submission from Serhiy Storchaka storch...@gmail.com:

As a companion to issue14624, here is a patch that speeds up UTF-16 encoding several times over. In addition, it fixes an unsafe integer overflow check.

Here are the benchmark results. The benchmark tools are in the https://bitbucket.org/storchaka/cpython-stuff repository.

On 32-bit Linux, AMD Athlon 64 X2 4600+ @ 2.4GHz:

Py2.7          Py3.2          Py3.3          patched

457 (+575%)    458 (+573%)    1077 (+186%)   3083     encode utf-16le 'A'*1
457 (+579%)    493 (+529%)    1084 (+186%)   3102     encode utf-16le '\x80'*1
489 (+534%)    458 (+577%)    1081 (+187%)   3102     encode utf-16le '\x80'+'A'*
457 (+1261%)   493 (+1161%)   1116 (+457%)   6219     encode utf-16le '\u0100'*1
489 (+1266%)   458 (+1358%)   1126 (+493%)   6678     encode utf-16le '\u0100'+'A'*
489 (+1263%)   458 (+1355%)   1129 (+490%)            encode utf-16le '\u0100'+'\x80'*
457 (+1240%)   493 (+1142%)   1118 (+448%)   6125     encode utf-16le '\u8000'*1
489 (+1271%)   458 (+1363%)   1127 (+495%)   6702     encode utf-16le '\u8000'+'A'*
489 (+1271%)   458 (+1364%)   1129 (+494%)   6705     encode utf-16le '\u8000'+'\x80'*
489 (+1135%)   458 (+1218%)   1136 (+432%)   6038     encode utf-16le '\u8000'+'\u0100'*
498 (+128%)    505 (+125%)    630 (+80%)     1137     encode utf-16le '\U0001'*1
489 (+35%)     458 (+44%)     360 (+83%)     659      encode utf-16le '\U0001'+'A'*
489 (+35%)     458 (+44%)     359 (+84%)     660      encode utf-16le '\U0001'+'\x80'*
489 (+36%)     458 (+45%)     361 (+84%)     663      encode utf-16le '\U0001'+'\u0100'*
489 (+36%)     458 (+45%)     361 (+84%)     663      encode utf-16le '\U0001'+'\u8000'*

447 (+507%)    493 (+450%)    1086 (+150%)   2712     encode utf-16be 'A'*1
447 (+513%)    493 (+456%)    1080 (+154%)   2739     encode utf-16be '\x80'*1
489 (+458%)    458 (+496%)    1079 (+153%)   2729     encode utf-16be '\x80'+'A'*
447 (+498%)    494 (+441%)    1118 (+139%)   2672     encode utf-16be '\u0100'*1
489 (+464%)    458 (+502%)    1128 (+144%)   2756     encode utf-16be '\u0100'+'A'*
489 (+463%)    458 (+502%)    1131 (+144%)   2755     encode utf-16be '\u0100'+'\x80'*
447 (+500%)    493 (+444%)    1119 (+139%)   2680     encode utf-16be '\u8000'*1
489 (+463%)    458 (+502%)    1126 (+145%)   2755     encode utf-16be '\u8000'+'A'*
489 (+464%)    458 (+502%)    1129 (+144%)   2757     encode utf-16be '\u8000'+'\x80'*
489 (+479%)    458 (+518%)    1137 (+149%)   2829     encode utf-16be '\u8000'+'\u0100'*
499 (+102%)    506 (+99%)     630 (+60%)     1009     encode utf-16be '\U0001'*1
489 (+6%)      458 (+13%)     360 (+44%)     519      encode utf-16be '\U0001'+'A'*
489 (+6%)      458 (+13%)     359 (+44%)     518      encode utf-16be '\U0001'+'\x80'*
489 (+6%)      458 (+13%)     361 (+44%)     519      encode utf-16be '\U0001'+'\u0100'*
489 (+6%)      458 (+13%)     361 (+44%)     519      encode utf-16be '\U0001'+'\u8000'*

--
components: Interpreter Core, Unicode
files: encode-utf16.patch
keywords: patch
messages: 162473
nosy: Arfrever, asvetlov, ezio.melotti, haypo, pitrou, storchaka
priority: normal
severity: normal
status: open
title: Faster UTF-16 encoding
type: performance
versions: Python 3.3
Added file: http://bugs.python.org/file25856/encode-utf16.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15026
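For readers who want to try a comparable measurement, here is a minimal sketch using the standard timeit module. It is not the benchmark tool from the repository mentioned in the report. The helper name time_encode, the SAMPLES dictionary and the string length of 10000 are all assumptions made for illustration, and the output is in seconds per call, which may not be the unit used in the table above.

```python
import timeit


def time_encode(text, encoding, number=200, repeat=3):
    """Return the best observed time, in seconds, of one text.encode(encoding) call."""
    timer = timeit.Timer(lambda: text.encode(encoding))
    # Timer.repeat() returns the total time of `number` calls for each repetition.
    return min(timer.repeat(repeat=repeat, number=number)) / number


# Hypothetical inputs: the length 10000 is an assumption, not taken from the report.
SAMPLES = {
    "ascii": "A" * 10000,
    "bmp": "\u0100" * 10000,
    "non-bmp": "\U00010000" * 10000,  # each character needs a surrogate pair in UTF-16
}

if __name__ == "__main__":
    for name, text in SAMPLES.items():
        for encoding in ("utf-16le", "utf-16be"):
            seconds = time_encode(text, encoding)
            print(f"{name:8} {encoding}: {seconds * 1e6:.1f} us per call")
```

Taking the minimum over several repetitions is the usual way to reduce noise from other processes; absolute numbers will differ between machines and Python versions.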
[issue15027] Faster UTF-32 encoding
New submission from Serhiy Storchaka storch...@gmail.com:

As a companion to issue14625, here is a patch that speeds up UTF-32 encoding several times over. In addition, it fixes an unsafe integer overflow check.

Here are the benchmark results. The benchmark tools are in the https://bitbucket.org/storchaka/cpython-stuff repository.

On 32-bit Linux, AMD Athlon 64 X2 4600+ @ 2.4GHz:

Py2.7          Py3.2          Py3.3          patched

541 (+1032%)   541 (+1032%)   844 (+626%)    6125     encode utf-32le 'A'*1
543 (+1056%)   541 (+1060%)   844 (+643%)    6275     encode utf-32le '\x80'*1
544 (+1010%)   542 (+1014%)   843 (+616%)    6037     encode utf-32le '\x80'+'A'*
541 (+799%)    542 (+797%)    764 (+537%)    4864     encode utf-32le '\u0100'*1
544 (+781%)    542 (+784%)    767 (+525%)    4793     encode utf-32le '\u0100'+'A'*
544 (+789%)    542 (+792%)    766 (+531%)    4834     encode utf-32le '\u0100'+'\x80'*
542 (+799%)    541 (+801%)    764 (+538%)    4874     encode utf-32le '\u8000'*1
544 (+779%)    542 (+782%)    767 (+523%)    4780     encode utf-32le '\u8000'+'A'*
544 (+793%)    542 (+796%)    766 (+534%)    4859     encode utf-32le '\u8000'+'\x80'*
544 (+819%)    542 (+823%)    766 (+553%)    5001     encode utf-32le '\u8000'+'\u0100'*
430 (+867%)    427 (+874%)    860 (+383%)    4157     encode utf-32le '\U0001'*1
543 (+655%)    543 (+655%)    861 (+376%)    4101     encode utf-32le '\U0001'+'A'*
543 (+658%)    543 (+658%)    861 (+378%)    4116     encode utf-32le '\U0001'+'\x80'*
543 (+670%)    543 (+670%)    859 (+387%)    4180     encode utf-32le '\U0001'+'\u0100'*
543 (+666%)    543 (+666%)    860 (+383%)    4158     encode utf-32le '\U0001'+'\u8000'*

541 (+880%)    543 (+876%)    844 (+528%)    5300     encode utf-32be 'A'*1
541 (+872%)    542 (+870%)    844 (+523%)    5256     encode utf-32be '\x80'*1
544 (+843%)    542 (+846%)    843 (+509%)    5130     encode utf-32be '\x80'+'A'*
541 (+363%)    542 (+362%)    764 (+228%)    2505     encode utf-32be '\u0100'*1
544 (+366%)    542 (+368%)    766 (+231%)    2534     encode utf-32be '\u0100'+'A'*
544 (+363%)    542 (+365%)    766 (+229%)    2519     encode utf-32be '\u0100'+'\x80'*
542 (+363%)    541 (+364%)    764 (+228%)    2509     encode utf-32be '\u8000'*1
544 (+366%)    542 (+368%)    766 (+231%)    2534     encode utf-32be '\u8000'+'A'*
544 (+363%)    542 (+364%)    766 (+229%)    2517     encode utf-32be '\u8000'+'\x80'*
544 (+372%)    542 (+374%)    766 (+235%)    2568     encode utf-32be '\u8000'+'\u0100'*
430 (+428%)    427 (+432%)    860 (+164%)    2270     encode utf-32be '\U0001'*1
543 (+317%)    541 (+318%)    861 (+163%)    2262     encode utf-32be '\U0001'+'A'*
543 (+320%)    541 (+321%)    861 (+165%)    2279     encode utf-32be '\U0001'+'\x80'*
543 (+322%)    541 (+323%)    859 (+167%)    2290     encode utf-32be '\U0001'+'\u0100'*
543 (+322%)    541 (+324%)    860 (+167%)    2292     encode utf-32be '\U0001'+'\u8000'*

--
components: Interpreter Core, Unicode
files: encode-utf32.patch
keywords: patch
messages: 162474
nosy: Arfrever, asvetlov, ezio.melotti, haypo, pitrou, storchaka
priority: normal
severity: normal
status: open
title: Faster UTF-32 encoding
type: performance
versions: Python 3.3
Added file: http://bugs.python.org/file25857/encode-utf32.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15027
[issue14850] The inconsistency of codecs.charmap_decode
Serhiy Storchaka storch...@gmail.com added the comment:

> What is the use case for passing a string subclass to charmap_decode? Or in other words, how did you stumble upon the bug?

I stumbled upon it while rewriting the charmap decoder (issue14874). The charmap decoder currently handles two cases: a more efficient case for a string table and a slower general case for an arbitrary mapping. I proposed a further optimized case for a 256-character UCS2 string (which covers all standard charmap encodings). If general strings and mappings were processed consistently, these cases could be merged. A string subclass is just an example that illustrates the inconsistency.

--
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue14850
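The two cases mentioned in this comment can be seen from Python through codecs.charmap_decode, which accepts either a string table or a general mapping. The following is only a small sketch: the helper decode_with and the three-entry tables are made up for illustration, and string subclasses (the subject of this issue) are deliberately left out because their handling is exactly what was under discussion.

```python
import codecs


def decode_with(mapping, data, errors="strict"):
    """Decode bytes with codecs.charmap_decode and return only the decoded text."""
    text, _consumed = codecs.charmap_decode(data, errors, mapping)
    return text


data = b"\x00\x01\x02"

# Case 1: a string table, where byte value N selects the character at index N.
string_table = "abc"
print(decode_with(string_table, data))

# Case 2: a general mapping from byte value to a string or to an integer code point.
general_mapping = {0: "a", 1: 98, 2: "c"}
print(decode_with(general_mapping, data))

# In both cases a byte with no entry is an undefined mapping under errors="strict".
for mapping in (string_table, general_mapping):
    try:
        decode_with(mapping, b"\x03")
    except UnicodeDecodeError as exc:
        print(type(mapping).__name__, "->", exc.reason)
```

Both calls decode the same bytes to the same text; the difference discussed in the issue is in how the lookup is performed and which special values are recognized.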
[issue14850] The inconsistency of codecs.charmap_decode
Serhiy Storchaka storch...@gmail.com added the comment:

> U+FFFE is documented as representing an undefined mapping,

Yes, using U+FFFE to represent an undefined mapping in strings is normal; the question was about string subclasses. And if we correct it for string subclasses, how much further do we go? What about a general mapping?

--
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue14850
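As an illustration of the U+FFFE convention for plain string tables, here is a short sketch. The table TABLE and the helper decode are hypothetical and cover only the str case; what subclasses and general mappings should do is the open question in this thread, so the sketch makes no claim about them.

```python
import codecs

# Hypothetical 3-entry table: byte 0x01 is marked as undefined with U+FFFE.
TABLE = "A\ufffeC"


def decode(data, errors):
    """Decode bytes through TABLE with the given error handler; return the text."""
    return codecs.charmap_decode(data, errors, TABLE)[0]


# Defined entries decode normally.
print(decode(b"\x00\x02", "strict"))

# The U+FFFE entry is treated as "no mapping", so the error handler decides.
print(ascii(decode(b"\x00\x01\x02", "replace")))  # U+FFFD is substituted
print(decode(b"\x00\x01\x02", "ignore"))          # the byte is dropped

try:
    decode(b"\x01", "strict")
except UnicodeDecodeError as exc:
    print("strict:", exc.reason)
```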