Serhiy Storchaka added the comment:
Thanks, Anatoly. I see an actual bug. FindFirstFile and FindNextFile return
a broken name if the file's Unicode name can't be represented in the current
code page. I don't know what the perfect solution for this issue is.
On 2.7 we can decode listdir() argument to unicode
R. David Murray added the comment:
That's what surrogateescape is for, on linux. I thought Victor dealt with this
a different way in Windows. Maybe by deprecating the bytes interface :)
--
nosy: +haypo
___
Python tracker rep...@bugs.python.org
Serhiy Storchaka added the comment:
Surrogateescape is for non-decodable names. Here we have a problem with
non-encodable names.
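
The distinction above can be sketched in Python 3 as follows. This is only an illustration under assumptions: the byte string `raw` and the use of cp437 as the target code page are examples chosen here, not taken from the report's test files.

```python
# Non-decodable: bytes that are not valid UTF-8 still survive a round trip
# through str when decoded with the surrogateescape error handler.
raw = b"caf\xe9"  # hypothetical Latin-1 bytes, invalid as UTF-8
name = raw.decode("utf-8", "surrogateescape")
round_tripped = name.encode("utf-8", "surrogateescape")

# Non-encodable: a perfectly valid Unicode name that the target code page
# has no characters for. surrogateescape does not help in this direction.
russian = "Русское имя"
try:
    russian.encode("cp437")
    encodable = True
except UnicodeEncodeError:
    encodable = False

# errors="replace" succeeds, but the name is lost: every Cyrillic letter
# becomes "?", so the result no longer identifies the file.
lossy = russian.encode("cp437", "replace")

print(round_tripped == raw, encodable, lossy)
```

The lossy result is the kind of broken name described earlier in the thread: once the characters are replaced, the original name cannot be recovered.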
I know that the naive approach of using only the Unicode API internally does
not work, because Windows uses complex logic for filename encoding (for example dropping
R. David Murray added the comment:
Ah, I misunderstood your comment.
So, listdir is returning the correct filename, it's just that we can't
encode it to the console encoding. So, it is working as expected within the
current windows console limitations, if not in a particularly useful
Serhiy Storchaka added the comment:
> Ah, I misunderstood your comment.
Ah, you misunderstood my comment right now.
> So, listdir is returning the correct filename, it's just that we can't
> encode it to the console encoding.
listdir() returns an already irremediably broken filename (all
Éric Araujo added the comment:
Anatoly
b'Русское имя' is not a valid syntax construct in Python 3 even though I have
correct 'coding: utf-8' header and expect characters to be utf-8 bytes.
David
The byte string vs the coding cookie is an interesting observation, but is a
separate issue
anatoly techtonik added the comment:
There is one more problem - when I redirect the output with:
py test_unicode_fname.py > test.log 2>&1
In Python 2.7 the traceback is at the end of file, in Python 3.3 it is at the
beginning. Therefore I just copied data from the screen, where it appears in
Changes by anatoly techtonik techto...@gmail.com:
Added file: http://bugs.python.org/file28305/py27fname.log
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16656
___
Changes by anatoly techtonik techto...@gmail.com:
Added file: http://bugs.python.org/file28306/py33fname.log
Amaury Forgeot d'Arc added the comment:
Anatoly, please file another issue for the 2>&1 mess.
anatoly techtonik added the comment:
> - Do you have a full traceback of the failing os.walk() in Python3.3?
Traceback (most recent call last):
  File "test.py", line 9, in <module>
    print(dirs)
  File "C:\Python33\lib\encodings\cp437.py", line 19, in encode
    return
anatoly techtonik added the comment:
I attach tests.py file used to run the tests. Results are in python2.out.txt
and python3.out.txt also attached.
> What are the results of os.stat(b'Русское имя') and os.stat(u'Русское имя')
> on Python 2.7 and Python 3.3+?
b'Русское имя' is not a valid
Changes by anatoly techtonik techto...@gmail.com:
Added file: http://bugs.python.org/file28288/tests.py
Changes by anatoly techtonik techto...@gmail.com:
Added file: http://bugs.python.org/file28289/python2.out.txt
Changes by anatoly techtonik techto...@gmail.com:
Added file: http://bugs.python.org/file28290/python3.out.txt
Serhiy Storchaka added the comment:
Thank you, Anatoly, for the report. I'll try to investigate this issue.
Amaury Forgeot d'Arc added the comment:
So, it seems that os.walk() and os.listdir() work correctly with Python3.3, but
print(u'Русское имя') fails because the terminal encoding is cp437.
See issue1602 for the print issue.
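
A minimal Python 3 sketch of that failure mode, assuming a console that only accepts cp437. The helper `make_console` is hypothetical and merely imitates such a console with an in-memory stream, so the behaviour can be seen without a Windows terminal.

```python
import io


def make_console(errors):
    """Return (byte buffer, text stream) imitating a cp437-only stdout."""
    buf = io.BytesIO()
    stream = io.TextIOWrapper(buf, encoding="cp437", errors=errors)
    return buf, stream


# With the default strict handler, printing the name raises, which is the
# UnicodeEncodeError from encodings/cp437.py seen in this thread.
_, strict_console = make_console("strict")
try:
    print("Русское имя", file=strict_console)
    strict_console.flush()
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# With backslashreplace the print succeeds, showing escapes instead of
# the Cyrillic letters the code page cannot represent.
buf, lenient_console = make_console("backslashreplace")
print("Русское имя", file=lenient_console)
lenient_console.flush()
escaped = buf.getvalue()

print(strict_failed, escaped)
```

In both cases the string itself is intact; only its conversion to the console encoding fails or degrades.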
As a quick workaround, try to set
Serhiy Storchaka added the comment:
Anatoly, can you please run the attached test?
--
Added file: http://bugs.python.org/file28291/test_unicode_fname.py
R. David Murray added the comment:
Based on the pasted results I'm pretty sure there's nothing wrong with walk and
listdir. But it sounds like Serhiy will check to make sure, so we'll wait for
his report.
The byte string vs the coding cookie is an interesting observation, but is a
separate
Changes by Serhiy Storchaka storch...@gmail.com:
Added file: http://bugs.python.org/file28293/test_unicode_fname.py
Changes by Serhiy Storchaka storch...@gmail.com:
Removed file: http://bugs.python.org/file28291/test_unicode_fname.py
anatoly techtonik added the comment:
In Python 3 it fails with UnicodeEncodeError in
C:\Python33\lib\encodings\cp437.py, while Vista's 'dir' command shows
everything correctly in the same console, so somebody definitely overlooked
that aspect.
This bug is clearly an issue for developers who
Amaury Forgeot d'Arc added the comment:
- Do you have a full traceback of the failing os.walk() in Python3.3?
- What's the result of os.listdir(u'.') ?
--
nosy: +amaury.forgeotdarc
status: pending -> open
Serhiy Storchaka added the comment:
What are the results of os.listdir(b'.') and os.listdir(u'.') on Python 2.7 and
Python 3.3+?
What are the results of os.stat(b'Русское имя') and os.stat(u'Русское имя') on
Python 2.7 and Python 3.3+?
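
A small Python 3 sketch for collecting this kind of diagnostic information in one go. The function name `encoding_report` is hypothetical, and since the question in this message is cut off, the choice of calls beyond sys.getdefaultencoding() is an assumption about what is useful to report.

```python
import locale
import sys


def encoding_report():
    """Collect the encodings that usually matter for filename problems."""
    return {
        "default": sys.getdefaultencoding(),
        "filesystem": sys.getfilesystemencoding(),
        # stdout may be replaced by an object without an encoding attribute
        "stdout": getattr(sys.stdout, "encoding", None),
        "preferred": locale.getpreferredencoding(False),
    }


for key, value in encoding_report().items():
    print(key, value)
```

Comparing the "stdout" entry with the others tends to show quickly whether a failure comes from the filesystem layer or only from printing to the console.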
What are the results of sys.getdefaultencoding(),
R. David Murray added the comment:
My guess is that your unicode issue is issue 1602, which is non-trivial to
solve.
Serhiy Storchaka added the comment:
> My guess is that your unicode issue is issue 1602, which is non-trivial to
> solve.
In that case the output would be something like:
['English name', '']
[]
[]
New submission from anatoly techtonik:
This critical bug is one of the reasons that non-English-speaking communities
don't adopt Python as broadly as the English-speaking world does, compared to
other technologies (PHP etc.).
# -*- coding: utf-8 -*-
import os
os.mkdir(u'Русское имя')
Serhiy Storchaka added the comment:
It is reproduced on 3.x?
--
nosy: +serhiy.storchaka
type: -> behavior
versions: -Python 3.1
Changes by Serhiy Storchaka storch...@gmail.com:
--
Removed message: http://bugs.python.org/msg177278
Serhiy Storchaka added the comment:
Is it reproduced on 3.x?
R. David Murray added the comment:
No.
--
nosy: +r.david.murray
resolution: -> out of date
status: open -> closed
R. David Murray added the comment:
Oops, clicked submit too soon.
It isn't likely to get fixed in 2.7, because 2.7's unicode support problems are
the major reason python3 was developed.
--
stage: -> committed/rejected
R. David Murray added the comment:
For that matter, it isn't reproduced in python2.7, either:
>>> for r, dirs, files in os.walk(u'.'):
...     print dirs
...
[u'\u0420\u0443\u0441\u0441\u043a\u043e\u0435 \u0438\u043c\u044f']
[]
--
resolution: out of date -> invalid
Jeremy Kloth added the comment:
The problem exhibited is not coming from the os.walk() implementation, but from
the use of a byte-string as the argument to it.
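
The effect of the argument type can be sketched in Python 3, where os.listdir() returns str names for a str path and bytes names for a bytes path. This is an illustration only: it uses a temporary directory rather than the reporter's tests.py, and it assumes a filesystem and locale that can store the Cyrillic name.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Create the directory with a text (Unicode) name, as in the report.
    os.mkdir(os.path.join(root, "Русское имя"))

    # str argument in, str names out
    text_names = os.listdir(root)

    # bytes argument in, bytes names out (filesystem encoding applied)
    byte_names = os.listdir(os.fsencode(root))

    # os.fsdecode() reverses the filesystem encoding
    decoded = [os.fsdecode(name) for name in byte_names]

print(text_names, byte_names, decoded)
```

os.walk() follows the same rule, so passing a text path is the way to get text names back.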
The directories are created with unicode literals and therefore the argument
must also be a unicode literal (u'.') for them to be
R. David Murray added the comment:
Works for me without the u'.', too, though less usefully:
>>> for r, dirs, files in os.walk('.'):
...     print dirs
...
['\xd0\xa0\xd1\x83\xd1\x81\xd1\x81\xd0\xba\xd0\xbe\xd0\xb5 \xd0\xb8\xd0\xbc\xd1\x8f']
Maybe that doesn't work on Windows, though. I am, of