[issue42614] Pathlib does not support a Cyrillic character 'й'

2020-12-14 Thread Ronald Oussoren


Ronald Oussoren  added the comment:

I'm closing this as "not a bug" because this is likely caused by the strings
using different Unicode normalisations.

--
resolution:  -> not a bug
stage:  -> resolved
status: open -> closed
type: crash -> behavior

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue42614] Pathlib does not support a Cyrillic character 'й'

2020-12-14 Thread STINNER Victor


Change by STINNER Victor :


--
nosy:  -vstinner




[issue42614] Pathlib does not support a Cyrillic character 'й'

2020-12-12 Thread Ronald Oussoren


Change by Ronald Oussoren :


--
components: +macOS
nosy: +ned.deily




[issue42614] Pathlib does not support a Cyrillic character 'й'

2020-12-12 Thread Ronald Oussoren

Ronald Oussoren  added the comment:

What filesystem is used on macOS? If it is HFS+, you're likely running into
Unicode normalisation in the filesystem.

That is, 'й' can be represented as a single Unicode code point (and likely is in
your script), but in the NFD normalisation used by HFS+ the same character is
represented using two code points (one of which is a combining character).
Python string comparison compares code points and is not normalisation-aware.
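A minimal sketch of the code-point difference described above, using only the standard `unicodedata` module (the variable names are illustrative):

```python
import unicodedata

composed = "\u0439"  # 'й' as one code point: CYRILLIC SMALL LETTER SHORT I
decomposed = unicodedata.normalize("NFD", composed)  # the NFD form used by HFS+

# NFD splits the character into a base letter plus a combining mark.
print(len(composed), len(decomposed))  # 1 2
print([unicodedata.name(c) for c in decomposed])
# ['CYRILLIC SMALL LETTER I', 'COMBINING BREVE']

# Plain == compares code points, so the two spellings are not equal...
print(composed == decomposed)  # False
# ...until both are brought to the same normal form.
print(unicodedata.normalize("NFC", decomposed) == composed)  # True
```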

For APFS (used by default in recent macOS versions) the situation is more
complicated, according to what I've found on Google. However, APFS doesn't seem
to normalise names (I've created a file named 'й' and os.listdir() returns a
name with a single code point).
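Since the form a name comes back in depends on the filesystem, one possible workaround is to compare normalised names when searching a directory. A minimal sketch, assuming NFC as the common form; `find_entry` is a hypothetical helper, not part of pathlib or os:

```python
import os
import unicodedata
from pathlib import Path
from typing import Optional


def find_entry(directory: Path, wanted: str) -> Optional[Path]:
    """Return the entry in `directory` whose name matches `wanted` once both
    are NFC-normalised, or None if nothing matches.

    Hypothetical helper: it does not care which normal form the filesystem
    used to store the name. If two entries differ only by normalisation
    (possible on filesystems that store names as given), the first one
    returned by the directory scan wins.
    """
    target = unicodedata.normalize("NFC", wanted)
    with os.scandir(directory) as entries:
        for entry in entries:
            if unicodedata.normalize("NFC", entry.name) == target:
                return Path(entry.path)
    return None
```

For example, `find_entry(Path("data/encoding"), "Файл на български.ldr")` (an illustrative directory) would locate the file whether its name on disk is composed or decomposed, at the cost of one pass over the directory per lookup.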

--




[issue42614] Pathlib does not support a Cyrillic character 'й'

2020-12-10 Thread Mihail Kirilov

Mihail Kirilov  added the comment:

I am uploading an Archive with

1 - mac.png
Using a Mac, I cannot generate the other 'й', but I can load the file: it
exists, but .name is wrong.

2 - linux.png
On Linux, running the exact same code reports that the file does not exist.

3 - The file itself.

It is very tricky to reproduce the problem on the Mac; I can hop on a call with
you to show you exactly what I do.

--
Added file: https://bugs.python.org/file49663/Archive.zip




[issue42614] Pathlib does not support a Cyrillic character 'й'

2020-12-10 Thread Steven D'Aprano

Steven D'Aprano  added the comment:

In addition, you are probably hitting normalization issues. There are two ways
to get the Cyrillic character 'й' in your string: one is a single code point,
the other is two code points:

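>>> import unicodedata
>>> a = '\u0439'        # 'й' as a single code point
>>> b = '\u0438\u0306'  # 'и' followed by a combining breve
>>> len(a), unicodedata.name(a)
(1, 'CYRILLIC SMALL LETTER SHORT I')
>>> len(b), unicodedata.name(b[0]), unicodedata.name(b[1])
(2, 'CYRILLIC SMALL LETTER I', 'COMBINING BREVE')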
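A minimal sketch of a normalization-aware comparison for the two spellings above; `same_text` is a hypothetical helper name, and NFC is chosen as the common form by assumption (NFD would work equally well as long as both sides use it):

```python
import unicodedata


def same_text(x: str, y: str) -> bool:
    """Compare two strings after bringing both to NFC.

    Plain == compares code points, so a composed 'й' (U+0439) and a
    decomposed 'и' + COMBINING BREVE (U+0438 U+0306) compare unequal.
    """
    return unicodedata.normalize("NFC", x) == unicodedata.normalize("NFC", y)


a = "\u0439"          # single code point
b = "\u0438\u0306"    # base letter plus combining breve
print(a == b)           # False
print(same_text(a, b))  # True
```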

--




[issue42614] Pathlib does not support a Cyrillic character 'й'

2020-12-10 Thread Steven D'Aprano

Steven D'Aprano  added the comment:

You are comparing the name with the file extension against the name without the 
file extension:

>>> "Файл на български.ldr" == "Файл на български"
False
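If the intent is to compare the file name without its extension, `PurePath.stem` gives exactly that. A minimal sketch that also normalises both sides to NFC, in case the name came from a filesystem that stores decomposed characters; the path is illustrative, and `PurePosixPath` is used so no filesystem access happens:

```python
import unicodedata
from pathlib import PurePosixPath

# Illustrative path; PurePosixPath only manipulates the string.
p = PurePosixPath("/tmp/data/encoding/Файл на български.ldr")

print(p.name)    # Файл на български.ldr  (includes the extension)
print(p.suffix)  # .ldr
print(p.stem)    # Файл на български  (extension stripped)

# Compare the stem, normalising both sides so that a decomposed 'й'
# read back from the filesystem still matches a composed literal.
expected = "Файл на български"
print(unicodedata.normalize("NFC", p.stem) == unicodedata.normalize("NFC", expected))  # True
```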

--
nosy: +steven.daprano




[issue42614] Pathlib does not support a Cyrillic character 'й'

2020-12-10 Thread Ronald Oussoren


Ronald Oussoren  added the comment:

What platform are you using?

--
nosy: +ronaldoussoren




[issue42614] Pathlib does not support a Cyrillic character 'й'

2020-12-10 Thread Mihail Kirilov

New submission from Mihail Kirilov :

I have a file with a Cyrillic name, "Файл на български", which behaves
differently when I load it with pathlib.Path and call .name on it:

```
(Pdb) pathlib.Path("/tmp/pytest-of-root/pytest-15/test_bulgarian_name0/data/encoding/Файл на български.ldr").name
'Файл на български.ldr'
(Pdb) pathlib.Path("/tmp/pytest-of-root/pytest-15/test_bulgarian_name0/data/encoding/Файл на български.ldr").name[2]
'и'
(Pdb) pathlib.Path("/tmp/pytest-of-root/pytest-15/test_bulgarian_name0/data/encoding/Файл на български.ldr").name == "Файл на български"
False
```
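When a name looks right on screen but compares unequal, listing its code points shows which form it is actually in; for instance, `.name[2]` returning 'и' above is consistent with the name being stored in decomposed form. A minimal diagnostic sketch; `describe` is a hypothetical helper name:

```python
import unicodedata
from typing import List, Tuple


def describe(text: str) -> List[Tuple[str, str]]:
    """Return a (code point, Unicode name) pair for every character in text."""
    return [(f"U+{ord(c):04X}", unicodedata.name(c, "<unnamed>")) for c in text]


# Simulate a file name read back in decomposed (NFD) form.
name = unicodedata.normalize("NFD", "Файл")
for code, char_name in describe(name):
    print(code, char_name)

# A name is already NFC if normalising it changes nothing.
print(name == unicodedata.normalize("NFC", name))  # False
```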

--
components: Unicode
messages: 382823
nosy: ezio.melotti, hidr0.frbg, vstinner
priority: normal
severity: normal
status: open
title: Pathlib does not support a Cyrillic character 'й'
type: crash
versions: Python 3.8
