[issue46572] Unicode identifiers not necessarily unique
Diego Argueta added the comment: I did read PEP-3131 before posting this but I still thought the behavior was counterintuitive. -- ___ Python tracker <https://bugs.python.org/issue46572> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue46572] Unicode identifiers not necessarily unique
New submission from Diego Argueta:

The way Python 3 handles identifiers containing mathematical characters appears to be broken. I didn't test the entire range of U+1D400 through U+1D59F but I spot-checked them and the bug manifests itself there:

    Python 3.9.7 (default, Sep 10 2021, 14:59:43)
    [GCC 11.2.0] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> foo = 1234567890
    >>> bar = 1234567890
    >>> foo is bar
    False
    >>> 햇햆햗 = 1234567890
    >>> foo is 햇햆햗
    False
    >>> bar is 햇햆햗
    True
    >>> 햇햆햗 = 0
    >>> bar
    0

This differs from the behavior with other non-ASCII characters. For example, ASCII 'a' and Cyrillic 'a' are properly treated as different identifiers:

    >>> а = 987654321  # Cyrillic lowercase 'a', U+0430
    >>> a = 123456789  # ASCII 'a'
    >>> а  # Cyrillic
    987654321
    >>> a  # ASCII
    123456789

While a bit of a pathological case, it is a nasty surprise. It's possible this is a symptom of a larger bug in the way identifiers are resolved.

This is similar but not identical to https://bugs.python.org/issue46555

Note: I did not find this myself; I give credit to Cooper Stimson (https://github.com/6C1) for finding this bug. I merely reported it.

--
components: Parser, Unicode
messages: 412084
nosy: da, ezio.melotti, lys.nikolaou, pablogsal, vstinner
priority: normal
severity: normal
status: open
title: Unicode identifiers not necessarily unique
type: behavior
versions: Python 3.7, Python 3.8, Python 3.9

Python tracker <https://bugs.python.org/issue46572>
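For readers following the thread: PEP 3131 specifies that identifiers are converted to NFKC while parsing, which would account for the behavior reported above. The sketch below only illustrates that normalization step with `unicodedata`; the helper name `normalized_identifier` is made up for this example and is not part of the parser.

```python
import unicodedata

# The three characters from the report (U+1D587, U+1D586, U+1D597),
# written as escapes so the example survives any font or encoding issues.
fraktur_bar = "\U0001D587\U0001D586\U0001D597"
cyrillic_a = "\u0430"  # Cyrillic lowercase 'a'


def normalized_identifier(name):
    """Hypothetical helper: apply the NFKC normalization PEP 3131 describes."""
    return unicodedata.normalize("NFKC", name)


# The mathematical letters have compatibility decompositions to ASCII,
# so they normalize to the same identifier as plain "bar".
print(normalized_identifier(fraktur_bar) == "bar")

# Cyrillic 'a' has no such decomposition, so it stays a distinct identifier.
print(normalized_identifier(cyrillic_a) == "a")
```

Under that reading, the two spellings of `bar` are one name after normalization, while the Cyrillic and ASCII letters remain different names.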
[issue33361] readline() + seek() on codecs.EncodedFile breaks next readline()
Diego Argueta added the comment:

> though #32110 ("Make codecs.StreamReader.read() more compatible with read()
> of other files") may have fixed more (all?) of it.

Still seeing this in 3.7.3 so I don't think so?

--
Python tracker <https://bugs.python.org/issue33361>
[issue33361] readline() + seek() on codecs.EncodedFile breaks next readline()
Diego Argueta added the comment:

Bug still present in 3.7.0, now seeing it in 3.8.0a0 as well.

--
versions: +Python 3.8

Python tracker <https://bugs.python.org/issue33361>
[issue33593] Support heapq on typed arrays?
Diego Argueta <diego.argu...@gmail.com> added the comment:

However I do see your point about the speed.

--
Python tracker <rep...@bugs.python.org> <https://bugs.python.org/issue33593>
[issue33593] Support heapq on typed arrays?
Diego Argueta <diego.argu...@gmail.com> added the comment:

I was referring to the C arrays in the Python standard library: https://docs.python.org/3/library/array.html

--
Python tracker <rep...@bugs.python.org> <https://bugs.python.org/issue33593>
[issue33593] Support heapq on typed arrays?
New submission from Diego Argueta <diego.argu...@gmail.com>:

It'd be really great if we could have support for using the `heapq` module on typed arrays from `array`. For example:

```
import array
import heapq
import random

a = array.array('I', (random.randrange(10) for _ in range(10)))
heapq.heapify(a)
```

Right now this code throws a TypeError:

    TypeError: heap argument must be a list

I suppose I could use `bisect` to insert items one by one but I imagine a single call to heapify() would be more efficient, especially if I'm loading the array from a byte string.

From what I can tell the problem lies in the C implementation, since removing the _heapq imports at the end of the heapq module (in 3.6) makes it work.

--
components: Library (Lib)
messages: 317250
nosy: da
priority: normal
severity: normal
status: open
title: Support heapq on typed arrays?
type: enhancement
versions: Python 2.7, Python 3.6

Python tracker <rep...@bugs.python.org> <https://bugs.python.org/issue33593>
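Until something like this is supported, a pure-Python heapify that only relies on indexing should work on an `array.array`. The following is a minimal sketch, not the `heapq` implementation; `heapify_sequence` and `_sift_down` are names invented for this example, and it will be slower than the C-accelerated list version.

```python
import array


def _sift_down(seq, pos, end):
    """Move seq[pos] down until both of its children are >= it."""
    item = seq[pos]
    child = 2 * pos + 1
    while child < end:
        right = child + 1
        # Pick the smaller of the two children.
        if right < end and seq[right] < seq[child]:
            child = right
        if seq[child] < item:
            seq[pos] = seq[child]
            pos = child
            child = 2 * pos + 1
        else:
            break
    seq[pos] = item


def heapify_sequence(seq):
    """Turn any mutable, indexable sequence into a min-heap, in place."""
    n = len(seq)
    # Leaves are already heaps, so start from the last parent node.
    for start in reversed(range(n // 2)):
        _sift_down(seq, start, n)


a = array.array('I', [5, 3, 8, 1, 9, 2, 7])
heapify_sequence(a)
print(a[0])  # smallest element is now at the front
```

Because it touches the array only through `len()` and item access, the same helper works for any typecode, including arrays loaded from a byte string.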
[issue33361] readline() + seek() on io.EncodedFile breaks next readline()
Diego Argueta <diego.argu...@gmail.com> added the comment:

Update: Tested this on Python 3.5.4, 3.4.8, and 3.7.0b3 on OSX 10.13.4. They also exhibit the bug. Updating the ticket accordingly.

--
versions: +Python 3.4, Python 3.5, Python 3.7

Python tracker <rep...@bugs.python.org> <https://bugs.python.org/issue33361>
[issue33038] GzipFile doesn't always ignore None as filename
Diego Argueta <diego.argu...@gmail.com> added the comment: Did this make it into 2.7.15? There aren't any release notes for it on the download page like usual. -- ___ Python tracker <rep...@bugs.python.org> <https://bugs.python.org/issue33038> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue33361] readline() + seek() on io.EncodedFile breaks next readline()
Diego Argueta <diego.argu...@gmail.com> added the comment:

Update: If I run your exact code it still breaks for me:

```
Got header: 'abc\n'
Skipping the header. 'def\n'
Line 2: 'ghi\n'
Line 3: 'abc\n'
Line 4: 'def\n'
Line 5: 'ghi\n'
```

I'm running Python 2.7.14 and 3.6.5 on OSX 10.13.4. Startup banners:

    Python 2.7.14 (default, Feb 7 2018, 14:15:12)
    [GCC 4.2.1 Compatible Apple LLVM 9.0.0 (clang-900.0.39.2)] on darwin

    Python 3.6.5 (default, Apr 2 2018, 14:03:12)
    [GCC 4.2.1 Compatible Apple LLVM 9.1.0 (clang-902.0.39.1)] on darwin

--
Python tracker <rep...@bugs.python.org> <https://bugs.python.org/issue33361>
[issue33361] readline() + seek() on io.EncodedFile breaks next readline()
Diego Argueta <diego.argu...@gmail.com> added the comment:

That's because the stream isn't transcoding, since UTF-8 is ASCII-compatible. Try using something not ASCII-compatible as the codec e.g. 'ibm500' and it'll give incorrect results.

```
b = io.BytesIO(u'a,b\r\n"asdf","jkl;"\r\n'.encode('ibm500'))
s = codecs.EncodedFile(b, 'ibm500')
```

```
Got header: '\x81k\x82\r%'
Skipping the header. '\x7f\x81\xa2\x84\x86\x7fk\x7f\x91\x92\x93^\x7f\r%'
Line 2: '\x81k\x82\r%'
Line 3: '\x7f\x81\xa2\x84\x86\x7fk\x7f\x91\x92\x93^\x7f\r%'
```

--
Python tracker <rep...@bugs.python.org> <https://bugs.python.org/issue33361>
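As a small illustration of what "ASCII-compatible" means in the comment above, the sketch below encodes the header row with both codecs. The variable names are made up for the example; only `str.encode` and `bytes.decode` are used.

```python
header = u'a,b\r\n'

ascii_bytes = header.encode('ascii')
utf8_bytes = header.encode('utf-8')
ebcdic_bytes = header.encode('ibm500')

# UTF-8 encodes ASCII text to the very same bytes, so a stream that fails
# to transcode is indistinguishable from one that works.
print(utf8_bytes == ascii_bytes)

# IBM500 (an EBCDIC code page) uses different byte values for the same text,
# so untranscoded data is immediately visible.
print(repr(ebcdic_bytes))

# Round-tripping through the codec recovers the original text.
print(ebcdic_bytes.decode('ibm500') == header)
```

The IBM500 bytes printed here correspond to the `'\x81k\x82\r%'` header shown in the output above.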
[issue33361] readline() + seek() on io.EncodedFile breaks next readline()
New submission from Diego Argueta <diego.argu...@gmail.com>:

It appears that calling readline() on a codecs.EncodedFile stream breaks seeking and causes subsequent attempts to iterate over the lines or call readline() to backtrack and return already consumed lines.

A minimal example:

```
from __future__ import print_function

import codecs
import io

def run(stream):
    offset = stream.tell()
    try:
        stream.seek(0)
        header_row = stream.readline()
    finally:
        stream.seek(offset)

    print('Got header: %r' % header_row)

    if stream.tell() == 0:
        print('Skipping the header: %r' % stream.readline())

    for index, line in enumerate(stream, start=2):
        print('Line %d: %r' % (index, line))

b = io.BytesIO(u'a,b\r\n"asdf","jkl;"\r\n'.encode('utf-16-le'))
s = codecs.EncodedFile(b, 'utf-8', 'utf-16-le')

run(s)
```

Output:

```
Got header: 'a,b\r\n'
Skipping the header: '"asdf","jkl;"\r\n'    <-- this is line 2
Line 2: 'a,b\r\n'                           <-- this is line 1
Line 3: '"asdf","jkl;"\r\n'                 <-- now we're back to line 2
```

As you can see, the line being skipped is actually the second line, and when we try reading from the stream again, the iterator starts from the beginning of the file.

Even weirder, adding a second call to readline() to skip the second line shows it's going **backwards**:

```
Got header: 'a,b\r\n'
Skipping the header: '"asdf","jkl;"\r\n'    <-- this is actually line 2
Skipping the second line: 'a,b\r\n'         <-- this is line 1
Line 2: '"asdf","jkl;"\r\n'                 <-- this is now correct
```

The expected output shows that we got a header, skipped it, and then read one data line.

```
Got header: 'a,b'
Skipping the header: 'a,b\r\n'
Line 2: '"asdf","jkl;"\r\n'
```

I'm sure this is related to the implementation of readline() because if we change this:

```
header_row = stream.readline()
```

to this:

```
header_row = stream.read().splitlines()[0]
```

then we get the expected output. If on the other hand we comment out the seek() in the finally clause, we also get the expected output (minus the "skipping the header" code).

--
components: IO, Library (Lib)
messages: 315768
nosy: da
priority: normal
severity: normal
status: open
title: readline() + seek() on io.EncodedFile breaks next readline()
type: behavior
versions: Python 2.7, Python 3.6

Python tracker <rep...@bugs.python.org> <https://bugs.python.org/issue33361>
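A possible workaround, sketched under the assumption that the backtracking comes from data left in the reader's internal buffers after `seek()`: call `reset()` on the stream after every seek so the buffered lines are discarded. `seek_and_reset` and `peek_header` are hypothetical helper names for this example; `reset()` itself is an existing method on the `codecs.StreamRecoder` object that `codecs.EncodedFile` returns.

```python
import codecs
import io


def seek_and_reset(stream, offset, whence=0):
    """Hypothetical helper: seek, then drop any buffered, already-decoded data."""
    stream.seek(offset, whence)
    # reset() clears the buffers held by the recoder's reader and writer.
    stream.reset()


def peek_header(stream):
    """Read the first line, then restore the caller's position."""
    offset = stream.tell()
    try:
        seek_and_reset(stream, 0)
        return stream.readline()
    finally:
        seek_and_reset(stream, offset)


raw = io.BytesIO(u'a,b\r\n"asdf","jkl;"\r\n'.encode('utf-16-le'))
recoded = codecs.EncodedFile(raw, 'utf-8', 'utf-16-le')

header = peek_header(recoded)       # look at the header without consuming it
first_line = recoded.readline()     # the header again, read from position 0
remaining = list(recoded)           # everything after the header

print(header, first_line, remaining)
```

With the buffers cleared, the stream position and the decoded data should stay in step, so the header is read once by the peek and once more by the caller, followed by the single data line.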
[issue33038] GzipFile doesn't always ignore None as filename
Diego Argueta <diego.argu...@gmail.com> added the comment:

Yeah that's fine. Thanks!

--
Python tracker <rep...@bugs.python.org> <https://bugs.python.org/issue33038>
[issue33038] GzipFile doesn't always ignore None as filename
New submission from Diego Argueta <diego.argu...@gmail.com>:

The Python documentation states that if the GzipFile can't determine a filename from `fileobj`, it'll use an empty string, which won't be included in the header. Unfortunately, this doesn't work for SpooledTemporaryFile, which has a `name` attribute but doesn't set it initially. The result is a crash.

To reproduce:

```
import gzip
import tempfile

with tempfile.SpooledTemporaryFile() as fd:
    with gzip.GzipFile(mode='wb', fileobj=fd) as gz:
        gz.write(b'asdf')
```

Result:

```
Traceback (most recent call last):
  File "", line 2, in
  File "/Users/diegoargueta/.pyenv/versions/2.7.14/lib/python2.7/gzip.py", line 136, in __init__
    self._write_gzip_header()
  File "/Users/diegoargueta/.pyenv/versions/2.7.14/lib/python2.7/gzip.py", line 170, in _write_gzip_header
    fname = os.path.basename(self.name)
  File "/Users/diegoargueta/.pyenv/versions/gds27/lib/python2.7/posixpath.py", line 114, in basename
    i = p.rfind('/') + 1
AttributeError: 'NoneType' object has no attribute 'rfind'
```

This doesn't happen on Python 3.6, where the null filename is handled properly. I've attached a patch file that fixed the issue for me.

--
components: Library (Lib)
files: gzip_filename_fix.patch
keywords: patch
messages: 313512
nosy: da
priority: normal
severity: normal
status: open
title: GzipFile doesn't always ignore None as filename
type: crash
versions: Python 2.7
Added file: https://bugs.python.org/file47473/gzip_filename_fix.patch

Python tracker <rep...@bugs.python.org> <https://bugs.python.org/issue33038>
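For anyone hitting this on an unpatched interpreter, one way to sidestep the crash is to pass an explicit empty `filename`, on the assumption that GzipFile only consults `fileobj.name` when `filename` is left as None. This is a sketch of that workaround, not the attached patch; `gzip_to_spooled` is a name invented for the example.

```python
import gzip
import tempfile


def gzip_to_spooled(payload):
    """Hypothetical helper: compress payload into a SpooledTemporaryFile."""
    fd = tempfile.SpooledTemporaryFile()
    # filename='' is passed explicitly so GzipFile never looks at fd.name,
    # which is None until the spooled file rolls over to disk.
    with gzip.GzipFile(filename='', mode='wb', fileobj=fd) as gz:
        gz.write(payload)
    fd.seek(0)  # rewind so the caller can read the compressed bytes
    return fd


compressed = gzip_to_spooled(b'asdf')
with gzip.GzipFile(mode='rb', fileobj=compressed) as gz:
    print(gz.read())
compressed.close()
```

Closing the GzipFile does not close a file object that was passed in through `fileobj`, which is why the spooled file can still be rewound and read afterwards.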