[issue46572] Unicode identifiers not necessarily unique

2022-01-29 Thread Diego Argueta


Diego Argueta added the comment:

I did read PEP 3131 before posting this, but I still found the behavior 
counterintuitive.

--

___
Python tracker 
<https://bugs.python.org/issue46572>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue46572] Unicode identifiers not necessarily unique

2022-01-29 Thread Diego Argueta

New submission from Diego Argueta:

The way Python 3 handles identifiers containing mathematical characters appears 
to be broken. I didn't test the entire range of U+1D400 through U+1D59F but I 
spot-checked them and the bug manifests itself there:

Python 3.9.7 (default, Sep 10 2021, 14:59:43) 
[GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.

>>> foo = 1234567890
>>> bar = 1234567890
>>> foo is bar
False
>>> 𝖇𝖆𝖗 = 1234567890

>>> foo is 𝖇𝖆𝖗
False
>>> bar is 𝖇𝖆𝖗
True

>>> 𝖇𝖆𝖗 = 0
>>> bar
0


This differs from the behavior with other non-ASCII characters. For example, 
ASCII 'a' and Cyrillic 'a' are properly treated as different identifiers:

>>> а = 987654321  # Cyrillic lowercase 'a', U+0430
>>> a = 123456789  # ASCII 'a'
>>> а  # Cyrillic
987654321
>>> a  # ASCII
123456789


While a bit of a pathological case, it is a nasty surprise. It's possible this 
is a symptom of a larger bug in the way identifiers are resolved.
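
For context, PEP 3131 specifies that the parser converts every identifier to 
NFKC normal form before comparing it. A minimal sketch of that rule, assuming 
Python 3 and using only `unicodedata` from the standard library (the variable 
names are illustrative):

```python
import unicodedata

# Mathematical bold Fraktur small letters b, a, r (U+1D587, U+1D586, U+1D597).
fraktur_bar = "\U0001D587\U0001D586\U0001D597"
# Cyrillic small letter a (U+0430).
cyrillic_a = "\u0430"

# NFKC folds the Fraktur letters to ASCII but leaves the Cyrillic letter alone.
normalized_fraktur = unicodedata.normalize("NFKC", fraktur_bar)
normalized_cyrillic = unicodedata.normalize("NFKC", cyrillic_a)

# Assigning through the Fraktur spelling binds the ASCII name in the namespace.
namespace = {}
exec(fraktur_bar + " = 0", namespace)

print(normalized_fraktur, "bar" in namespace)
```

Under this rule the Fraktur spelling and ASCII `bar` name the same variable, 
while the Cyrillic letter stays distinct because it has no compatibility 
mapping to ASCII 'a'.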

This is similar but not identical to https://bugs.python.org/issue46555

Note: I did not find this myself; I give credit to Cooper Stimson 
(https://github.com/6C1) for finding this bug. I merely reported it.

--
components: Parser, Unicode
messages: 412084
nosy: da, ezio.melotti, lys.nikolaou, pablogsal, vstinner
priority: normal
severity: normal
status: open
title: Unicode identifiers not necessarily unique
type: behavior
versions: Python 3.7, Python 3.8, Python 3.9

___
Python tracker 
<https://bugs.python.org/issue46572>
___



[issue33361] readline() + seek() on codecs.EncodedFile breaks next readline()

2019-05-26 Thread Diego Argueta


Diego Argueta added the comment:

> though #32110 ("Make codecs.StreamReader.read() more compatible with read() 
> of other files") may have fixed more (all?) of it.

Still seeing this in 3.7.3 so I don't think so?

--

___
Python tracker 
<https://bugs.python.org/issue33361>
___



[issue33361] readline() + seek() on codecs.EncodedFile breaks next readline()

2018-07-13 Thread Diego Argueta


Diego Argueta added the comment:

Bug still present in 3.7.0, now seeing it in 3.8.0a0 as well.

--
versions: +Python 3.8

___
Python tracker 
<https://bugs.python.org/issue33361>
___



[issue33593] Support heapq on typed arrays?

2018-05-22 Thread Diego Argueta

Diego Argueta <diego.argu...@gmail.com> added the comment:

However I do see your point about the speed.

--

___
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue33593>
___



[issue33593] Support heapq on typed arrays?

2018-05-22 Thread Diego Argueta

Diego Argueta <diego.argu...@gmail.com> added the comment:

I was referring to the C arrays in the Python standard library: 
https://docs.python.org/3/library/array.html

--

___
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue33593>
___



[issue33593] Support heapq on typed arrays?

2018-05-21 Thread Diego Argueta

New submission from Diego Argueta <diego.argu...@gmail.com>:

It'd be really great if we could have support for using the `heapq` module on 
typed arrays from `array`. For example:


```
import array
import heapq
import random

a = array.array('I', (random.randrange(10) for _ in range(10)))
heapq.heapify(a)
```

Right now this code raises a TypeError:

TypeError: heap argument must be a list


I suppose I could use `bisect` to insert items one by one but I imagine a 
single call to heapify() would be more efficient, especially if I'm loading the 
array from a byte string.

From what I can tell the problem lies in the C implementation, since removing 
the _heapq imports at the end of the heapq module (in 3.6) makes it work.
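
Until the C implementation accepts other sequence types, a pure-Python 
sift-down can heapify an `array.array` in place. A minimal sketch, not the 
`heapq` implementation itself; `heapify_sequence` and `_sift_down` are 
hypothetical helper names:

```python
import array
import random


def _sift_down(seq, pos, end):
    """Move seq[pos] down until neither child is smaller than it."""
    item = seq[pos]
    child = 2 * pos + 1
    while child < end:
        right = child + 1
        # Pick the smaller of the two children.
        if right < end and seq[right] < seq[child]:
            child = right
        if item <= seq[child]:
            break
        seq[pos] = seq[child]
        pos = child
        child = 2 * pos + 1
    seq[pos] = item


def heapify_sequence(seq):
    """Turn any mutable sequence into a min-heap, in place."""
    n = len(seq)
    # Leaves are already heaps, so start from the last parent node.
    for start in reversed(range(n // 2)):
        _sift_down(seq, start, n)


values = [random.randrange(10) for _ in range(10)]
a = array.array('I', values)
heapify_sequence(a)
```

This only relies on indexing and `len()`, so it keeps the array typed, though 
it will be slower than a C-level heapify.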

--
components: Library (Lib)
messages: 317250
nosy: da
priority: normal
severity: normal
status: open
title: Support heapq on typed arrays?
type: enhancement
versions: Python 2.7, Python 3.6

___
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue33593>
___



[issue33361] readline() + seek() on io.EncodedFile breaks next readline()

2018-05-21 Thread Diego Argueta

Diego Argueta <diego.argu...@gmail.com> added the comment:

Update: Tested this on Python 3.5.4, 3.4.8, and 3.7.0b3 on OSX 10.13.4. They 
also exhibit the bug. Updating the ticket accordingly.

--
versions: +Python 3.4, Python 3.5, Python 3.7

___
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue33361>
___



[issue33038] GzipFile doesn't always ignore None as filename

2018-05-01 Thread Diego Argueta

Diego Argueta <diego.argu...@gmail.com> added the comment:

Did this make it into 2.7.15? There aren't any release notes for it on the 
download page like usual.

--

___
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue33038>
___



[issue33361] readline() + seek() on io.EncodedFile breaks next readline()

2018-04-26 Thread Diego Argueta

Diego Argueta <diego.argu...@gmail.com> added the comment:

Update: If I run your exact code it still breaks for me:

```
Got header: 'abc\n'
Skipping the header. 'def\n'
Line 2: 'ghi\n'
Line 3: 'abc\n'
Line 4: 'def\n'
Line 5: 'ghi\n'
```

I'm running Python 2.7.14 and 3.6.5 on OSX 10.13.4. Startup banners:

Python 2.7.14 (default, Feb  7 2018, 14:15:12) 
[GCC 4.2.1 Compatible Apple LLVM 9.0.0 (clang-900.0.39.2)] on darwin

Python 3.6.5 (default, Apr  2 2018, 14:03:12) 
[GCC 4.2.1 Compatible Apple LLVM 9.1.0 (clang-902.0.39.1)] on darwin

--

___
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue33361>
___



[issue33361] readline() + seek() on io.EncodedFile breaks next readline()

2018-04-26 Thread Diego Argueta

Diego Argueta <diego.argu...@gmail.com> added the comment:

That's because the stream isn't transcoding, since UTF-8 is ASCII-compatible. 
Try a codec that isn't ASCII-compatible, e.g. 'ibm500', and it'll give 
incorrect results.

```
b = io.BytesIO(u'a,b\r\n"asdf","jkl;"\r\n'.encode('ibm500'))
s = codecs.EncodedFile(b, 'ibm500')
```

```
Got header: '\x81k\x82\r%'
Skipping the header. '\x7f\x81\xa2\x84\x86\x7fk\x7f\x91\x92\x93^\x7f\r%'
Line 2: '\x81k\x82\r%'
Line 3: '\x7f\x81\xa2\x84\x86\x7fk\x7f\x91\x92\x93^\x7f\r%'
```
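
To make the ASCII-compatibility point concrete, here is a small sketch that 
compares the encoded bytes of the header row under each codec. The variable 
names are illustrative:

```python
header = u'a,b\r\n'

ascii_bytes = header.encode('ascii')
utf8_bytes = header.encode('utf-8')
ebcdic_bytes = header.encode('ibm500')

# ASCII-only text encodes to the same bytes in UTF-8, so nothing changes.
same_as_ascii = utf8_bytes == ascii_bytes

# EBCDIC (ibm500) maps the same characters to different byte values.
differs_from_ascii = ebcdic_bytes != ascii_bytes

print(repr(utf8_bytes), repr(ebcdic_bytes))
```

The ibm500 bytes match the 'Got header' value in the output above, which is 
why the problem only becomes visible with a codec of this kind.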

--

___
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue33361>
___



[issue33361] readline() + seek() on io.EncodedFile breaks next readline()

2018-04-25 Thread Diego Argueta

New submission from Diego Argueta <diego.argu...@gmail.com>:

It appears that calling readline() on a codecs.EncodedFile stream breaks 
seeking and causes subsequent attempts to iterate over the lines or call 
readline() to backtrack and return already consumed lines.

A minimal example:

```
from __future__ import print_function

import codecs
import io


def run(stream):
    offset = stream.tell()
    try:
        stream.seek(0)
        header_row = stream.readline()
    finally:
        stream.seek(offset)

    print('Got header: %r' % header_row)

    if stream.tell() == 0:
        print('Skipping the header: %r' % stream.readline())

    for index, line in enumerate(stream, start=2):
        print('Line %d: %r' % (index, line))


b = io.BytesIO(u'a,b\r\n"asdf","jkl;"\r\n'.encode('utf-16-le'))
s = codecs.EncodedFile(b, 'utf-8', 'utf-16-le')

run(s)
```

Output:

```
Got header: 'a,b\r\n'
Skipping the header: '"asdf","jkl;"\r\n'  <-- this is line 2
Line 2: 'a,b\r\n'                         <-- this is line 1
Line 3: '"asdf","jkl;"\r\n'               <-- now we're back to line 2
```

As you can see, the line being skipped is actually the second line, and when we 
try reading from the stream again, the iterator starts from the beginning of 
the file.

Even weirder, adding a second call to readline() to skip the second line shows 
it's going **backwards**:

```
Got header: 'a,b\r\n'
Skipping the header: '"asdf","jkl;"\r\n'  <-- this is actually line 2
Skipping the second line: 'a,b\r\n'       <-- this is line 1
Line 2: '"asdf","jkl;"\r\n'               <-- this is now correct
```

The expected output shows that we got a header, skipped it, and then read one 
data line.

```
Got header: 'a,b\r\n'
Skipping the header: 'a,b\r\n'
Line 2: '"asdf","jkl;"\r\n'
```

I'm sure this is related to the implementation of readline() because if we 
change this:

```
header_row = stream.readline()
```

to this:

```
header_row = stream.read().splitlines()[0]
```

then we get the expected output. If, on the other hand, we comment out the 
seek() in the finally clause, we also get the expected output (minus the 
"Skipping the header" line).
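
For comparison, a sketch of the same peek-then-rewind pattern using 
`io.TextIOWrapper` instead of `codecs.EncodedFile`. It is only a substitute 
when decoded text is acceptable, since it yields text rather than transcoded 
bytes; the variable names are illustrative:

```python
import io

raw = io.BytesIO(u'a,b\r\n"asdf","jkl;"\r\n'.encode('utf-16-le'))
# newline='' keeps the original '\r\n' line endings untranslated.
text = io.TextIOWrapper(raw, encoding='utf-16-le', newline='')

# Peek at the header, then return to where we started.
offset = text.tell()
text.seek(0)
header = text.readline()
text.seek(offset)

# Reading resumes from the restored position: the header, then the data row.
first = text.readline()
second = text.readline()

print(repr(header), repr(first), repr(second))
```

Here the rewind behaves as the expected output describes: the header is read 
again first, followed by the data line.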

--
components: IO, Library (Lib)
messages: 315768
nosy: da
priority: normal
severity: normal
status: open
title: readline() + seek() on io.EncodedFile breaks next readline()
type: behavior
versions: Python 2.7, Python 3.6

___
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue33361>
___



[issue33038] GzipFile doesn't always ignore None as filename

2018-03-11 Thread Diego Argueta

Diego Argueta <diego.argu...@gmail.com> added the comment:

Yeah that's fine. Thanks!

--

___
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue33038>
___



[issue33038] GzipFile doesn't always ignore None as filename

2018-03-09 Thread Diego Argueta

New submission from Diego Argueta <diego.argu...@gmail.com>:

The Python documentation states that if GzipFile can't determine a filename 
from `fileobj`, it'll use an empty string, which won't be included in the 
header. Unfortunately, this doesn't work for SpooledTemporaryFile, which has a 
`name` attribute but doesn't set it initially. The result is a crash.

To reproduce:

```
import gzip
import tempfile

with tempfile.SpooledTemporaryFile() as fd:
    with gzip.GzipFile(mode='wb', fileobj=fd) as gz:
        gz.write(b'asdf')
```

Result:
```
Traceback (most recent call last):
  File "", line 2, in 
  File "/Users/diegoargueta/.pyenv/versions/2.7.14/lib/python2.7/gzip.py", line 136, in __init__
    self._write_gzip_header()
  File "/Users/diegoargueta/.pyenv/versions/2.7.14/lib/python2.7/gzip.py", line 170, in _write_gzip_header
    fname = os.path.basename(self.name)
  File "/Users/diegoargueta/.pyenv/versions/gds27/lib/python2.7/posixpath.py", line 114, in basename
    i = p.rfind('/') + 1
AttributeError: 'NoneType' object has no attribute 'rfind'
```

This doesn't happen on Python 3.6, where the null filename is handled properly. 
I've attached a patch file that fixed the issue for me.
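
A possible workaround that doesn't require patching, sketched under the 
assumption that the crash comes only from GzipFile reading the `None` value of 
`fileobj.name`: pass `filename=''` explicitly so the attribute is never 
consulted. The variable names are illustrative:

```python
import gzip
import io
import tempfile

with tempfile.SpooledTemporaryFile() as fd:
    # An explicit empty filename means GzipFile does not look at fd.name.
    with gzip.GzipFile(filename='', mode='wb', fileobj=fd) as gz:
        gz.write(b'asdf')
    fd.seek(0)
    compressed = fd.read()

# Decompress again to confirm the stream is a valid gzip file.
roundtrip = gzip.GzipFile(fileobj=io.BytesIO(compressed)).read()
print(roundtrip)
```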

--
components: Library (Lib)
files: gzip_filename_fix.patch
keywords: patch
messages: 313512
nosy: da
priority: normal
severity: normal
status: open
title: GzipFile doesn't always ignore None as filename
type: crash
versions: Python 2.7
Added file: https://bugs.python.org/file47473/gzip_filename_fix.patch

___
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue33038>
___