Serhiy Storchaka added the comment:
- it won't work for reading: TextIOWrapper calls the read1() method, which is
only defined by BufferedIO objects.
Since 3.3 TextIOWrapper works with raw IO objects (issue12591).
Yes. And I just noticed that the _io module (the C version) will also buffer
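The point about raw IO objects can be checked with a short sketch. It assumes Python 3.3 or later; the temporary file and the sample text are made up for illustration and are not taken from the issue.

```python
import io
import os
import tempfile

# Write a small UTF-8 file so there is something to read back.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write("caf\u00e9\n".encode("utf-8"))

# io.FileIO is a raw (unbuffered) stream; it does not define read1().
raw = io.FileIO(path, "r")
has_read1 = hasattr(raw, "read1")

# Wrapping the raw stream directly, without a BufferedReader in between.
with io.TextIOWrapper(raw, encoding="utf-8") as text:
    content = text.read()

os.remove(path)
```

Here the wrapper falls back to the raw object's read() method, so no buffered layer is needed for reading.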
Serhiy Storchaka added the comment:
I polished the patch a little before committing. Thank you for the patch,
Aman Shah.
--
resolution: - fixed
stage: commit review - committed/rejected
status: open - closed
___
Python tracker rep
Serhiy Storchaka added the comment:
Sorry, perhaps I missed your response, Senthil. Now committed and closed again.
--
resolution: - fixed
stage: patch review - committed/rejected
status: open - closed
___
Python tracker rep...@bugs.python.org
http
Serhiy Storchaka added the comment:
Of course it would be nice to have tests for as many cases as possible, but
I am afraid that it will not be easy. The patch LGTM.
--
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17016
Serhiy Storchaka added the comment:
I think these tests make no sense after PEP 393. They test that StreamWriter
works with non-BMP characters broken inside a surrogate pair. I.e.
c.write(s[:i]); c.write(s[i:]) always is same as c.write(s), even if i breaks s
inside a surrogate pair. This case
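A minimal sketch of the property being discussed, assuming Python 3.3 or later (PEP 393 strings). The helper name encode_in_two_writes and the sample string are hypothetical and not part of the test suite.

```python
import codecs
import io


def encode_in_two_writes(s, i, encoding="utf-16-le"):
    """Write s[:i] and then s[i:] through a StreamWriter, return the bytes."""
    buf = io.BytesIO()
    writer = codecs.getwriter(encoding)(buf)
    writer.write(s[:i])
    writer.write(s[i:])
    return buf.getvalue()


s = "a\U0001F600b"          # contains one non-BMP character
length = len(s)             # one code point per character under PEP 393
whole = s.encode("utf-16-le")

# Every split point falls between code points, never inside a surrogate pair.
results = [encode_in_two_writes(s, i) == whole for i in range(len(s) + 1)]
```

Since a non-BMP character is a single element of the string, slicing cannot cut it in half, which is why the split-write case no longer exercises anything special.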
Serhiy Storchaka added the comment:
Test fails with stack overflow:
==
ERROR: test_pushCR_LF (email.test.test_email.TestIterators)
FeedParser BufferedSubFile.push() assumed it received complete
Changes by Serhiy Storchaka storch...@gmail.com:
--
components: +IO
nosy: +benjamin.peterson, hynek, pitrou, stutzbach
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17440
Serhiy Storchaka added the comment:
tuned_gzip does dangerous things, it overloads private methods of GzipFile.
From Bazaar 2.3 Release Notes:
* Stop using ``bzrlib.tuned_gzip.GzipFile``. It is incompatible with
python-2.7 and was only used for Knit format repositories, which haven't
been
New submission from Serhiy Storchaka:
Ezio proposed in issue16389 to not cache re.compile. Caching of re.compile
makes no sense and only pollutes the cache.
--
components: Library (Lib), Regular Expressions
messages: 184354
nosy: ezio.melotti, mrabarnett, pitrou, serhiy.storchaka
Serhiy Storchaka added the comment:
Here is a patch.
--
keywords: +patch
stage: needs patch - patch review
Added file: http://bugs.python.org/file29429/re_compile_nocache.patch
___
Python tracker rep...@bugs.python.org
http://bugs.python.org
Serhiy Storchaka added the comment:
os.path.normpath() works not only with strings but with bytes objects too.
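A short illustration, using posixpath directly so that the result does not depend on the platform (os.path is posixpath on POSIX systems). The sample path is made up.

```python
import posixpath

# The same normalization applies to str and to bytes paths;
# the result has the same type as the argument.
str_result = posixpath.normpath("a//b/../c")
bytes_result = posixpath.normpath(b"a//b/../c")
```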
--
nosy: +serhiy.storchaka
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17415
Serhiy Storchaka added the comment:
Hmm. I was going to use this method for re's named groups (see issue14462).
There is a possibility that some third-party code uses it for checking
general Unicode-aware identifiers. The language specification says that
keywords are a subset
Changes by Serhiy Storchaka storch...@gmail.com:
--
resolution: fixed -
status: closed - open
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17299
Serhiy Storchaka added the comment:
I'm not sure what is wrong and can't check on Windows, but it is possible that
this patch fixes the tests. Please check it if you can.
--
Added file: http://bugs.python.org/file29433/test_cpickle_fileio.patch
Serhiy Storchaka added the comment:
Oh, yes.
--
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17299
___
___
Python-bugs-list mailing list
Changes by Serhiy Storchaka storch...@gmail.com:
Removed file: http://bugs.python.org/file29433/test_cpickle_fileio.patch
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17299
Serhiy Storchaka added the comment:
Benjamin has fixed this in the changeset 6aab72424063.
--
resolution: - fixed
status: open - closed
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17299
Serhiy Storchaka added the comment:
Maybe in 3.4 an exception should be raised? HTTPConnection('python.org', 80,
False) now silently returns a wrong result.
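A sketch of what "silently wrong" means here, assuming Python 3.4 or later, where the 'strict' parameter has been removed and the third positional parameter is 'timeout'. No network connection is made by the constructor.

```python
import http.client

# The caller meant strict=False, but the value now lands in 'timeout'.
conn = http.client.HTTPConnection("python.org", 80, False)
timeout_value = conn.timeout
```

The old-style call is accepted without complaint, and False ends up as the socket timeout instead of being rejected.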
--
components: +Library (Lib)
nosy: +serhiy.storchaka
stage: - patch review
type: - enhancement
versions: +Python 3.4
Serhiy Storchaka added the comment:
This looks similar to issue16809 and requires a similar solution.
--
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17397
Serhiy Storchaka added the comment:
This was proposed before (see issue16150) and was rejected after discussion on
Python-ideas.
--
nosy: +serhiy.storchaka
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17433
Changes by Serhiy Storchaka storch...@gmail.com:
--
nosy: +rhettinger
type: - enhancement
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17433
New submission from Serhiy Storchaka:
Tkinter's split() recursively splits bytes but not unicode strings.
>>> from tkinter import *
>>> t = Tcl()
>>> t.tk.split((b'a 2',))
(('a', '2'),)
>>> t.tk.split(('a 2',))
('a 2',)
--
components: Tkinter, Unicode
messages: 184622
nosy: ezio.melotti, gpolo
Serhiy Storchaka added the comment:
Here is a patch which adds support for Tcl_Obj to tkinter's splitlist(). This not
only fixes some incompatibility with Tk 8.6, but can fix some issues with older
Tk versions (see for example issue17397).
--
keywords: +patch
nosy: +gpolo
stage
Serhiy Storchaka added the comment:
I do not understand what is wrong with making the parameters after the removed
'strict' keyword-only.
--
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17460
Serhiy Storchaka added the comment:
Note that the --create command should support the --directory option too.
Modern tar programs don't need to be told the compression method--they infer
it. If they can do it in C, we can do it in Python. So we should simply
omit the -bz2 stuff.
An archive may
Serhiy Storchaka added the comment:
I'm trying to solve this issue (it seemed easy), but the bug is worse than
expected. Python crashes even without any iteration at all.
it = 'abracadabra'
for _ in range(100):
    it = filter(bool, it)
del it
And fixing a recursive deallocator is more
Serhiy Storchaka added the comment:
Thank you. Now I understand why this issue did not happen with containers.
--
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue14010
Serhiy Storchaka added the comment:
Here is a patch which adds recursion limit checks to builtin and itertools
recursive iterators.
--
components: +Extension Modules
keywords: +patch
nosy: +rhettinger
stage: needs patch - patch review
Added file: http://bugs.python.org/file29483
Changes by Serhiy Storchaka storch...@gmail.com:
--
versions: +Python 3.4 -Python 3.2
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue2518
Serhiy Storchaka added the comment:
I will be offline for some time. Feel free to revert these changes in 2.7-3.3 if it
is necessary.
--
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue1159051
Serhiy Storchaka storch...@gmail.com added the comment:
A modified patch was adopted in 3.3 (changeset 596b0eaeece8), so the current
patch only applies to 3.2 and 2.7. If this is a new feature, the issue can be
closed.
--
nosy: +loewis, storchaka
versions: -Python 3.3
Serhiy Storchaka storch...@gmail.com added the comment:
This is definitely *not* a padding issue.
This is definitely a padding issue. All uncompressed files are located
so that the data starts on a 4-byte boundary (1190+30+15+1=1236, 27486
+30+17+3=27536, etc). This probably allows
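The arithmetic above can be reproduced with a small sketch. The value 30 is the size of the fixed part of a ZIP local file header; reading 15 and 17 as file name lengths is an assumption made here for illustration, and the helper name padding_for_alignment is hypothetical.

```python
LOCAL_HEADER_SIZE = 30  # fixed part of a ZIP local file header, in bytes


def padding_for_alignment(header_offset, name_length, alignment=4):
    """Bytes of padding needed so the file data starts on an aligned offset."""
    data_start = header_offset + LOCAL_HEADER_SIZE + name_length
    return -data_start % alignment


# The two entries quoted in the comment (name lengths are assumed).
pad_first = padding_for_alignment(1190, 15)
pad_second = padding_for_alignment(27486, 17)

start_first = 1190 + LOCAL_HEADER_SIZE + 15 + pad_first
start_second = 27486 + LOCAL_HEADER_SIZE + 17 + pad_second
```

Both computed paddings match the extra terms in the sums (1 and 3), and both data offsets are multiples of 4.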
Serhiy Storchaka storch...@gmail.com added the comment:
The patch is updated with slightly clarified code and added comments.
--
Added file: http://bugs.python.org/file25590/decode_utf16_4.patch
___
Python tracker rep...@bugs.python.org
http
Serhiy Storchaka storch...@gmail.com added the comment:
That can't possibly be the reason. mmap requires 4k (4096) alignment (on
x86; more than that on SPARC).
This may be the reason to mmap the entire file and then read aligned
binary data
Serhiy Storchaka storch...@gmail.com added the comment:
for key, value in pairs:
if key in pairs:
if key in obj:?
--
title: Link to explain deviations from RFC 4627 in json module docs - Add
link to RFC 4627 from json documentation
Serhiy Storchaka storch...@gmail.com added the comment:
IMHO, it would be sufficient to have a simple bullet list of differences
and notes or warnings in places where Python can generate non-standard
JSON (top-level scalars, inf and nan, non-utf8 encoded strings
Serhiy Storchaka storch...@gmail.com added the comment:
I can reproduce it on Linux. Minimal example:
$ ./python -c "open('longline.py', 'w').write('#' + repr('\u00A1' * 4096) +
'\n')"
$ ./python longline.py
File "longline.py", line 1
SyntaxError: Non-UTF-8 code starting with '\xc2' in file
Serhiy Storchaka storch...@gmail.com added the comment:
And for Python 2.7 too.
--
versions: +Python 2.7
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue14811
Serhiy Storchaka storch...@gmail.com added the comment:
The function decoding_fgets (Parser/tokenizer.c) reads a line into a buffer of fixed size
8192 (the line is truncated to size 8191) and then fails because the line is cut in the
middle of a multibyte UTF-8 character
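The failure mode can be imitated in pure Python. This is only a sketch of the effect, not of the tokenizer code: it builds the same line as the reproduction above, cuts it at 8191 bytes, and tries to decode the piece on its own. The incremental decoder at the end is shown as a contrast and is not taken from any patch.

```python
import codecs

# Same line as in the reproduction: '#', a quote, then 4096 two-byte characters.
line = "#" + repr("\u00a1" * 4096) + "\n"
data = line.encode("utf-8")

# A fixed-size read keeps only the first 8191 bytes, which ends after the
# first byte of a two-byte sequence.
chunk = data[:8191]
last_byte = chunk[-1]

try:
    chunk.decode("utf-8")
    failed = False
except UnicodeDecodeError:
    failed = True

# An incremental decoder keeps the dangling byte until the rest arrives.
decoder = codecs.getincrementaldecoder("utf-8")()
text = decoder.decode(chunk) + decoder.decode(data[8191:], final=True)
```

Decoding the truncated chunk in isolation fails, while feeding the same bytes through an incremental decoder reproduces the original line.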
Changes by Serhiy Storchaka storch...@gmail.com:
--
title: compile fails - UTF-8 character decoding - Syntax error on long UTF-8
lines
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue14811
Serhiy Storchaka storch...@gmail.com added the comment:
For faulthandler and coverage, a more convenient option would be -M (run a
module with __name__='__premain__' (or something of the sort) and
continue command line processing).
--
___
Python tracker
Serhiy Storchaka storch...@gmail.com added the comment:
...And mere minutes after I said I hadn't heard anything, I've got the
confirmation email. :-)
Congratulations!
--
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue14777
Serhiy Storchaka storch...@gmail.com added the comment:
Here are two new patches. The check for out-of-range characters has been moved,
making the code simpler. Theoretically this slows down decoding of
short UCS1 strings a bit (up to 1 and 3 chars on 32- and 64-bit), but
practically there is no difference
Serhiy Storchaka storch...@gmail.com added the comment:
I'm afraid I have to close this one as rejected. It works as documented and
it's unlikely we'll decide to change it back. I'm sorry.
It does not work as documented. The proposed patch fixes the
documentation
Serhiy Storchaka storch...@gmail.com added the comment:
I still like NotImplementedError more than RuntimeError, though.
Well, here are patches for Python 3.2 and 2.7 (backported changeset
596b0eaeece8 + part of changeset fccdcd83708a).
--
Added file:
http://bugs.python.org/file25618
Serhiy Storchaka storch...@gmail.com added the comment:
Justin, interest in the patch would perhaps be greater if you provided a
microbenchmark.
--
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue13031
Changes by Serhiy Storchaka storch...@gmail.com:
--
versions: +Python 3.3 -Python 2.7, Python 3.2
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue3931
Changes by Serhiy Storchaka storch...@gmail.com:
--
versions: +Python 2.7, Python 3.2
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue3931
Serhiy Storchaka storch...@gmail.com added the comment:
Looks like issue14738 fixes this bug for Python 3.3.
>>> print(ascii(b"\xc2\x41\x42".decode('utf8', 'replace')))
'\ufffdAB'
>>> print(ascii(b"\xf1ABCD".decode('utf8', 'replace')))
'\ufffdABCD'
--
nosy: +storchaka
Serhiy Storchaka storch...@gmail.com added the comment:
The only issue left was about the number of U+FFFD generated with invalid
sequences in some cases.
My last patch has extensive tests for this, so you could try to apply it (or
copy the tests) and see if they all pass.
Tests fails
Serhiy Storchaka storch...@gmail.com added the comment:
I think that one U+FFFD is correct. The only error is a premature end of
data.
I expressed myself poorly. I also think that there is only one decoding error,
and not two. I think the test is wrong
Serhiy Storchaka storch...@gmail.com added the comment:
This might be just because it first checks if there are two more bytes before
checking if they are valid, but 'invalid continuation byte' works too.
Yes, this is an implementation detail. It is much easier and faster. Whether
it is necessary
Serhiy Storchaka storch...@gmail.com added the comment:
Changing from 'unexpected end of data' to 'invalid continuation byte' for
b'\xe0\x00' is fine with me, but this will be a (minor) deviation from 2.7,
3.1, 3.2, and pypy (it could still be changed on all these except 3.1 though).
I
Serhiy Storchaka storch...@gmail.com added the comment:
I don't remember all the details right now, but if that test was passing with
my patch there must be something wrong somewhere (either in the patch, in the
test, or in our understanding of the standard).
No, the test correctly expects two
Serhiy Storchaka storch...@gmail.com added the comment:
Can anyone review the patch?
--
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue1767933
New submission from Serhiy Storchaka storch...@gmail.com:
codecs.charmap_decode behaves differently with a native string and a user-defined
string subclass as the decode table.
>>> import codecs
>>> print(ascii(codecs.charmap_decode(b'\x00', 'replace', '\uFFFE')))
('\ufffd', 1)
>>> class S(str): pass
...
>>> print(ascii
Serhiy Storchaka storch...@gmail.com added the comment:
Thank you, Antoine. Now only issue14625 waits for review.
changeset: 77012:3430d7329a3b
+* UTF-8 and UTF-16 decoding is now 2x to 4x faster.
In fact, UTF-16 decoding is now at most 25% faster than in Python 3.2
on my
Serhiy Storchaka storch...@gmail.com added the comment:
Here is an updated patch, with tests and support for objects with only a 'write'
method.
--
Added file: http://bugs.python.org/file25652/etree_write_utf16_2.patch
___
Python tracker rep
Serhiy Storchaka storch...@gmail.com added the comment:
assert logging.debug("This is a test.") or True
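A brief sketch of why this one-liner works: logging.debug() returns None, so the "or True" keeps the assertion from ever failing, and since assert statements are dropped when Python runs with -O, the logging call disappears together with them. The variable name result is only for illustration.

```python
import logging

# logging.debug() has no meaningful return value.
result = logging.debug("This is a test.")

# None or True evaluates to True, so this never raises; under "python -O"
# the whole statement, including the logging call, is not executed at all.
assert logging.debug("This is a test.") or True
```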
--
nosy: +storchaka
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue14868
Serhiy Storchaka storch...@gmail.com added the comment:
http://permalink.gmane.org/gmane.comp.python.devel/132675
--
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue14469
New submission from Serhiy Storchaka storch...@gmail.com:
Charmap decoders are not as important as UTF decoders, but are still widely
used. In Python 3.3 with PEP 393 they slowed down 4x. The proposed patch
restores the performance.
Optimized only the most common case, when the decoder
Changes by Serhiy Storchaka storch...@gmail.com:
Added file: http://bugs.python.org/file25665/charmapdecodebench.py
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue14874
Changes by Serhiy Storchaka storch...@gmail.com:
Added file: http://bugs.python.org/file25666/bench-diff.py
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue14874
Serhiy Storchaka storch...@gmail.com added the comment:
For Python 3.3, _PyUnicodeWriter API is faster than the Py_UCS4 buffer API
and PyAccu API in quite all cases, with a speedup between 30% and 100%. But
there are some cases where the _PyUnicodeWriter API is slower:
Perhaps most
Serhiy Storchaka storch...@gmail.com added the comment:
Funny. struct.pack(fmt, args...) is just an alias to
struct.Struct(fmt).pack(args...). The error message should be changed to
explicitly state that we are talking about the data for packing, and not about
the arguments of function
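A minimal sketch of the equivalence mentioned above. The format string and values are made up; the exact wording of the error message differs between Python versions, so it is only captured here, not compared.

```python
import struct

fmt = "<hi"  # little-endian: one 2-byte short, one 4-byte int

# The module-level function and the Struct method produce the same bytes.
via_function = struct.pack(fmt, 1, 2)
via_object = struct.Struct(fmt).pack(1, 2)

# Passing too few items raises struct.error in either form.
try:
    struct.pack(fmt, 1)
    message = None
except struct.error as exc:
    message = str(exc)
```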
Serhiy Storchaka storch...@gmail.com added the comment:
It might help if the error message also stated how many arguments were
actually received, like the TypeError message already does for bad function /
method calls. E.g.,
struct.error: pack expected 2 items for packing (got 1)
Yes
Serhiy Storchaka storch...@gmail.com added the comment:
Here is a patch for 3.3. All of the tests pass successfully. Unfortunately, it
is a little slow, but I tried to minimize the losses.
--
Added file: http://bugs.python.org/file25709/issue8271-3.3.patch
Changes by Serhiy Storchaka storch...@gmail.com:
--
versions: +Python 3.3
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue14920
Serhiy Storchaka storch...@gmail.com added the comment:
Here are the benchmark results (numbers are speed, MB/s).
On 32-bit Linux, AMD Athlon 64 X2:
vanilla patched
utf-8 'A'*1 2016 (+5%) 2111
utf-8 '\x80
New submission from Serhiy Storchaka storch...@gmail.com:
Strange as it may seem, a simple trick makes UTF-8 decoding
even faster.
Here are the benchmark results.
On 32-bit Linux, AMD Athlon 64 X2:
vanilla patched
utf-8
Changes by Serhiy Storchaka storch...@gmail.com:
Added file: http://bugs.python.org/file25718/decodebench.py
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue14923
Changes by Serhiy Storchaka storch...@gmail.com:
Added file: http://bugs.python.org/file25719/bench-diff.py
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue14923
Serhiy Storchaka storch...@gmail.com added the comment:
Fortunately, issue14923 (if accepted) will compensate for the slowdown.
On 32-bit Linux, AMD Athlon 64 X2:
vanilla old patch fast patch
utf-8 'A'*1 2016 (+3
Serhiy Storchaka storch...@gmail.com added the comment:
It seems the patch relies on a two's complement representation of integers.
Mark, do you think that's ok?
Yes, the patch depends on two facts -- 8-bit bytes and a two's
complement representation of integers. That's why I call it a trick
Serhiy Storchaka storch...@gmail.com added the comment:
Yes, this is an implementation-dependent behavior (and on the supported
platforms it is implemented well in a certain way).
However, if the continuation byte check is done the simplest way ((ch) >= 0x80 &&
(ch) < 0xC0), this has the same
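For comparison, here is a sketch of three ways to test for a UTF-8 continuation byte (0x80 to 0xBF). The range check is the one quoted above; the mask check and the signed-byte check are common alternatives shown only as an illustration, and it is an assumption that the trick in the patch resembles the signed variant. The function names are hypothetical.

```python
def is_continuation_simple(ch):
    """Plain range check on an unsigned byte value."""
    return 0x80 <= ch < 0xC0


def is_continuation_mask(ch):
    """Top two bits must be 10."""
    return (ch & 0xC0) == 0x80


def is_continuation_signed(ch):
    """Reinterpret the 8-bit value as two's complement, then one comparison."""
    signed = ch - 256 if ch >= 0x80 else ch
    return signed < -0x40


# All three checks agree for every possible byte value.
agree = all(
    is_continuation_simple(ch)
    == is_continuation_mask(ch)
    == is_continuation_signed(ch)
    for ch in range(256)
)
count = sum(is_continuation_simple(ch) for ch in range(256))
```

The signed form needs a single comparison, but it is only valid with 8-bit bytes and a two's complement representation, which is the dependency discussed in the comments above.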
Changes by Serhiy Storchaka storch...@gmail.com:
--
nosy: +storchaka
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12716
Serhiy Storchaka storch...@gmail.com added the comment:
See also issue1767933.
It is better to use io.TextIOWrapper instead of codecs.StreamWriter, because the
latter is slower and has numerous flaws.
--
nosy: +storchaka
versions: +Python 3.3
___
Python
Changes by Serhiy Storchaka storch...@gmail.com:
--
nosy: +storchaka
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue2005
Changes by Serhiy Storchaka storch...@gmail.com:
--
versions: +Python 3.3 -Python 3.1
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue2005
Serhiy Storchaka storch...@gmail.com added the comment:
Mickey, you can wrap the file-like object returned by urlopen with io.TextIOWrapper.
config = configparser.RawConfigParser()
config.read_file(io.TextIOWrapper(urlopen(path_config), encoding='utf-8'))
Because there is no bug and new
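The two lines above can be tried without network access. In this sketch an io.BytesIO object stands in for the response returned by urlopen, and the section and option names are made up.

```python
import configparser
import io

# Stand-in for the binary file-like object returned by urlopen().
response = io.BytesIO("[server]\nname = caf\u00e9\n".encode("utf-8"))

config = configparser.RawConfigParser()
# TextIOWrapper decodes the byte stream lazily, line by line.
config.read_file(io.TextIOWrapper(response, encoding="utf-8"))

name = config.get("server", "name")
```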
Serhiy Storchaka storch...@gmail.com added the comment:
If you add the encoding parameter, you should also add at least errors and
newline parameters. And why not just use io.TextIOWrapper?
page.decode_content() is bad in that it forces you to read and decode all of the
data at once, while
Serhiy Storchaka storch...@gmail.com added the comment:
So, do you have any comments or complaints? Or can I commit the patch?
I beg your pardon, I will do a review and additional benchmarks today.
So far I have to say that it is better to use the stringlib approach than the
massive macros, which
Serhiy Storchaka storch...@gmail.com added the comment:
I just sent you a patch which does not use any macros or stringlib.
--
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue14744
Changes by Serhiy Storchaka storch...@gmail.com:
--
nosy: +loewis
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue1470548
Serhiy Storchaka storch...@gmail.com added the comment:
Oh, I see XMLGenerator is completely outdated. It has not even been ported to
Python 3. See the function _write:
def _write(self, text):
    if isinstance(text, str):
        self._out.write(text)
    else:
        self
Serhiy Storchaka storch...@gmail.com added the comment:
The patch updated to reflect Martin's stylistic comments.
Sorry for the delay, Martin. I have not received an email with your review from
2012-05-13, and only today accidentally discovered your comments in Rietveld.
It seems to have been
Serhiy Storchaka storch...@gmail.com added the comment:
See issue3665.
--
nosy: +storchaka
title: restore python2 unicode literals in ru strings - restore python2
unicode literals in ur strings
___
Python tracker rep...@bugs.python.org
http
Serhiy Storchaka storch...@gmail.com added the comment:
I don't think it is worth targeting 2.7 and 3.2 (it's a new feature, not a
bugfix), but for 3.3 it will be very useful.
Since PEP 393, conversion to surrogate pairs is no longer relevant.
--
components: +Regular Expressions
Changes by Serhiy Storchaka storch...@gmail.com:
Added file: http://bugs.python.org/file25781/re_unicode_escapes.diff
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue3665
Changes by Serhiy Storchaka storch...@gmail.com:
Added file: http://bugs.python.org/file25782/3665.patch
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue3665
Changes by Serhiy Storchaka storch...@gmail.com:
Removed file: http://bugs.python.org/file25781/re_unicode_escapes.diff
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue3665
Changes by Serhiy Storchaka storch...@gmail.com:
Removed file: http://bugs.python.org/file25782/3665.patch
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue3665
Changes by Serhiy Storchaka storch...@gmail.com:
Added file: http://bugs.python.org/file25783/re_unicode_escapes.diff
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue3665
Changes by Serhiy Storchaka storch...@gmail.com:
Added file: http://bugs.python.org/file25784/3665.patch
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue3665
Changes by Serhiy Storchaka storch...@gmail.com:
--
nosy: +storchaka
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue14993
Serhiy Storchaka storch...@gmail.com added the comment:
Well, I'm going to ignore the long lines and documentation. The patch is
really big and impressive.
--
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue14626
New submission from Serhiy Storchaka storch...@gmail.com:
As a companion to issue14624, here is a patch that speeds up UTF-16 encoding
several times over. In addition, it fixes an unsafe check for integer overflow.
Here are the results of benchmarking. See benchmark tools in
https://bitbucket.org
New submission from Serhiy Storchaka storch...@gmail.com:
As a companion to issue14625, here is a patch that speeds up UTF-32 encoding
several times over. In addition, it fixes an unsafe check for integer overflow.
Here are the results of benchmarking. See benchmark tools in
https://bitbucket.org
Serhiy Storchaka storch...@gmail.com added the comment:
What is the use case for passing a string subclass to charmap_decode? Or in
other words, how did you stumble upon the bug?
I stumbled upon it while rewriting the charmap decoder (issue14874). Now the
charmap decoder processes the two cases
Serhiy Storchaka storch...@gmail.com added the comment:
U+FFFE is documented as representing an undefined mapping,
Yes, using U+FFFE for representing an undefined mapping in strings is
normal, the question was about string subclasses. And if we correct
it for string subclasses, how far we
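A small sketch of the documented convention for plain strings, where U+FFFE in the decoding table marks an undefined mapping. It covers only the native str case; the behaviour with string subclasses is the open question in this thread and is deliberately not asserted here. The table contents are made up.

```python
import codecs

table = "A\uFFFE"  # byte 0 maps to 'A', byte 1 is marked as undefined

# A defined entry decodes normally.
defined = codecs.charmap_decode(b"\x00", "strict", table)

# An undefined entry is substituted under the 'replace' error handler...
replaced = codecs.charmap_decode(b"\x01", "replace", table)

# ...and rejected under 'strict'.
try:
    codecs.charmap_decode(b"\x01", "strict", table)
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True
```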