[issue16656] os.walk ignores international dirs on Windows

2012-12-14 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Thanks, Anatoly. I see an actual bug. FindFirstFile and FindNextFile return a 
broken name if the file's Unicode name can't be represented in the current code page.
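
The effect can be imitated in pure Python 3. This is only a sketch: Python's own cp437 codec with errors="replace" stands in for the lossy conversion done by the ANSI variants of FindFirstFile/FindNextFile, which is an assumption (the real Windows conversion may also apply best-fit mappings).

```python
# Stand-in for the lossy Unicode -> code page conversion.
# Assumption: cp437, the code page named in the tracebacks in this thread.
name = "Русское имя"
lossy = name.encode("cp437", errors="replace")
print(lossy)  # every Cyrillic letter is replaced by b'?'

# The original name cannot be recovered from the converted one.
round_tripped = lossy.decode("cp437")
print(round_tripped == name)
```

Once the name has gone through such a conversion, no later decoding step can restore it, which is why the result is unusable for os.stat() or os.walk().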

I don't know what the perfect solution for this issue is.

On 2.7 we could decode the listdir() argument to unicode and then encode the 
resulting names to str with sys.getfilesystemencoding() only when that is 
possible. listdir() with a str argument would therefore return unicode for 
non-encodable names. This should not add many new problems to those 2.7 
already has with Unicode.
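
A minimal sketch of that fallback, written in Python 3 terms (bytes where 2.7 has str, str where 2.7 has unicode). The helper name encode_if_possible is hypothetical and cp437 is only an example encoding; this is not the actual patch.

```python
def encode_if_possible(name, encoding):
    """Return name encoded to bytes, or the text itself if it is not encodable."""
    try:
        return name.encode(encoding)
    except UnicodeEncodeError:
        # Keep the lossless text form instead of a mangled byte string.
        return name

names = ["English name", "Русское имя"]
result = [encode_if_possible(n, "cp437") for n in names]
print(result)
```

The returned list mixes types, which is the price of never handing back a name that cannot be opened.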

But on 3.x listdir() with a bytes argument can return only bytes objects. I 
don't know what to do with non-encodable names in that case. Perhaps an 
exception should be raised. Fortunately, listdir() with a bytes argument is 
rarely used on 3.x.

--
components: +Extension Modules, Unicode, Windows -Library (Lib)
nosy: +ezio.melotti, larry, loewis

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16656
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16656] os.walk ignores international dirs on Windows

2012-12-14 Thread R. David Murray

R. David Murray added the comment:

That's what surrogateescape is for, on Linux.  I thought Victor dealt with this 
a different way on Windows.  Maybe by deprecating the bytes interface :)

--
nosy: +haypo




[issue16656] os.walk ignores international dirs on Windows

2012-12-14 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Surrogateescape is for non-decodable names. Here we have a problem with 
non-encodable names.

I know that the naive approach of using only the Unicode API internally does 
not work, because Windows uses complex logic for filename encoding (for 
example, dropping diacritics). Perhaps Martin has more to say.
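
The difference between the two cases can be shown with a short Python 3 sketch. The sample byte string and the cp437 target are assumptions chosen for illustration.

```python
# Non-decodable bytes: surrogateescape carries the bad byte through str.
raw = b"caf\xe9"  # Latin-1 bytes, not valid UTF-8
text = raw.decode("utf-8", errors="surrogateescape")
restored = text.encode("utf-8", errors="surrogateescape")
print(restored == raw)  # lossless round trip

# Non-encodable text: real characters that the target encoding lacks.
# surrogateescape only handles escaped bytes, so it does not help here.
encode_failed = False
try:
    "Русское имя".encode("cp437", errors="surrogateescape")
except UnicodeEncodeError:
    encode_failed = True
print(encode_failed)
```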

--




[issue16656] os.walk ignores international dirs on Windows

2012-12-14 Thread R. David Murray

R. David Murray added the comment:

Ah, I misunderstood your comment.

So, listdir is returning the correct filename, it's just that we can't 
encode it to the console encoding.  So, it is working as expected within the 
current Windows console limitations, if not in a particularly useful fashion.

(That is, listdir/os.walk are *not* ignoring the international dirs.)

--




[issue16656] os.walk ignores international dirs on Windows

2012-12-14 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

> Ah, I misunderstood your comment.

Ah, you misunderstood my comment right now.

> So, listdir is returning the correct filename, it's just that we can't 
> encode it to the console encoding.

listdir() returns an already irremediably broken filename (all Cyrillic
letters replaced with '?'). My test script outputs only ASCII data, so you
see literally what you get; there are no output encoding issues.

--




[issue16656] os.walk ignores international dirs on Windows

2012-12-13 Thread Éric Araujo

Éric Araujo added the comment:

Anatoly
> b'Русское имя' is not a valid syntax construct in Python 3 even though I have
> correct 'coding: utf-8' header and expect characters to be utf-8 bytes.

David
> The byte string vs the coding cookie is an interesting observation, but is a
> separate issue and should probably be raised on python-ideas, since I'm
> guessing the current behavior was a conscious design choice.

Yes, it works as designed: the coding cookie is used to decode bytes to 
characters in unicode literals (e.g. if I have u'Éric' in my source file, not a 
\u escape); bytes literals are independent of the coding cookie and should 
always contain only bytes, not characters (including \u escapes), e.g. 
'\xc3\x89ric' for UTF-8 bytes.
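
A short Python 3 sketch of the same point. The source text is compiled from a string so that the rejected literal does not break the example itself; the file name "<example>" is arbitrary.

```python
# Text literal: characters, which are encoded explicitly when bytes are needed.
text = "Éric"
data = text.encode("utf-8")
print(data == b"\xc3\x89ric")

# Bytes literal containing a non-ASCII character: rejected at compile time,
# whatever the coding cookie says.
rejected = False
try:
    compile("b'Éric'", "<example>", "eval")
except SyntaxError:
    rejected = True
print(rejected)
```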

--
nosy: +eric.araujo




[issue16656] os.walk ignores international dirs on Windows

2012-12-13 Thread anatoly techtonik

anatoly techtonik added the comment:

There is one more problem - when I redirect the output with:

 py test_unicode_fname.py > test.log 2>&1

In Python 2.7 the traceback is at the end of file, in Python 3.3 it is at the 
beginning. Therefore I just copied data from the screen, where it appears in 
correct order.

(current mood: Python debugging is a mess)

--




[issue16656] os.walk ignores international dirs on Windows

2012-12-13 Thread anatoly techtonik

Changes by anatoly techtonik techto...@gmail.com:


Added file: http://bugs.python.org/file28305/py27fname.log




[issue16656] os.walk ignores international dirs on Windows

2012-12-13 Thread anatoly techtonik

Changes by anatoly techtonik techto...@gmail.com:


Added file: http://bugs.python.org/file28306/py33fname.log




[issue16656] os.walk ignores international dirs on Windows

2012-12-13 Thread Amaury Forgeot d'Arc

Amaury Forgeot d'Arc added the comment:

Anatoly, please file another issue for the 2>&1 mess.

--




[issue16656] os.walk ignores international dirs on Windows

2012-12-12 Thread anatoly techtonik

anatoly techtonik added the comment:


> - Do you have a full traceback of the failing os.walk() in Python3.3?


Traceback (most recent call last):
  File "test.py", line 9, in <module>
    print(dirs)
  File "C:\Python33\lib\encodings\cp437.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 18-24: character maps to <undefined>

> - What's the result of os.listdir(u'.') ?


python3 -c "import os; print(os.listdir(u'.'))"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Python33\lib\encodings\cp437.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 41-47: character maps to <undefined>

python2 -c "import os; print(os.listdir(u'.'))"
[u'English name', u'test.py', u'test2.py',
u'\u0420\u0443\u0441\u0441\u043a\u043e\u0435 \u0438\u043c\u044f']

python2 -c "import os; print(os.listdir('.'))"
['English name', 'test.py', 'test2.py', '??? ???']

--




[issue16656] os.walk ignores international dirs on Windows

2012-12-12 Thread anatoly techtonik

anatoly techtonik added the comment:

I attach the tests.py file used to run the tests. The results are in 
python2.out.txt and python3.out.txt, also attached.

> What are the results of os.stat(b'Русское имя') and os.stat(u'Русское имя') 
> on Python 2.7 and Python 3.3+?

b'Русское имя' is not a valid syntax construct in Python 3 even though I have 
correct 'coding: utf-8' header and expect characters to be utf-8 bytes. 
Therefore I skipped this test for Python 3.
 python test.py
  File "tests.py", line 23
    print(os.stat(b'\u0420\u0443\u0441\u0441\u043a\u043e\u0435 \u0438\u043c\u044f'))
       ^
SyntaxError: bytes can only contain ASCII literal characters.

--




[issue16656] os.walk ignores international dirs on Windows

2012-12-12 Thread anatoly techtonik

Changes by anatoly techtonik techto...@gmail.com:


Added file: http://bugs.python.org/file28288/tests.py




[issue16656] os.walk ignores international dirs on Windows

2012-12-12 Thread anatoly techtonik

Changes by anatoly techtonik techto...@gmail.com:


Added file: http://bugs.python.org/file28289/python2.out.txt




[issue16656] os.walk ignores international dirs on Windows

2012-12-12 Thread anatoly techtonik

Changes by anatoly techtonik techto...@gmail.com:


Added file: http://bugs.python.org/file28290/python3.out.txt




[issue16656] os.walk ignores international dirs on Windows

2012-12-12 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Thank you, Anatoly, for the report. I'll try to investigate this issue.

--




[issue16656] os.walk ignores international dirs on Windows

2012-12-12 Thread Amaury Forgeot d'Arc

Amaury Forgeot d'Arc added the comment:

So, it seems that os.walk() and os.listdir() work correctly with Python3.3, but 
print(u'Русское имя') fails because the terminal encoding is cp437.

See issue1602 for the print issue.
As a quick workaround, try to set PYTHONIOENCODING=cp437:backslashreplace as 
suggested in http://wiki.python.org/moin/PrintFails

If nothing is wrong with os.walk() and os.listdir(), this issue should be 
closed.
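
The backslashreplace part of that workaround can also be applied by hand in Python 3. This is a minimal sketch; the helper name printable is hypothetical, and cp437 is assumed only because it is the console encoding reported in this thread.

```python
def printable(text, encoding="cp437"):
    """Return text with characters missing from encoding shown as \\uXXXX escapes."""
    return text.encode(encoding, errors="backslashreplace").decode(encoding)

# Names that fit the console encoding are unchanged; others are escaped
# instead of raising UnicodeEncodeError inside print().
for name in ["English name", "Русское имя"]:
    print(printable(name))
```

Setting PYTHONIOENCODING=cp437:backslashreplace has the same effect for every print() call without touching the script.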

--




[issue16656] os.walk ignores international dirs on Windows

2012-12-12 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Anatoly, can you please run the attached test?

--
Added file: http://bugs.python.org/file28291/test_unicode_fname.py




[issue16656] os.walk ignores international dirs on Windows

2012-12-12 Thread R. David Murray

R. David Murray added the comment:

Based on the pasted results I'm pretty sure there's nothing wrong with walk and 
listdir.  But it sounds like Serhiy will check to make sure, so we'll wait for 
his report.

The byte string vs the coding cookie is an interesting observation, but is a 
separate issue and should probably be raised on python-ideas, since I'm 
guessing the current behavior was a conscious design choice.

--




[issue16656] os.walk ignores international dirs on Windows

2012-12-12 Thread Serhiy Storchaka

Changes by Serhiy Storchaka storch...@gmail.com:


Added file: http://bugs.python.org/file28293/test_unicode_fname.py




[issue16656] os.walk ignores international dirs on Windows

2012-12-12 Thread Serhiy Storchaka

Changes by Serhiy Storchaka storch...@gmail.com:


Removed file: http://bugs.python.org/file28291/test_unicode_fname.py




[issue16656] os.walk ignores international dirs on Windows

2012-12-11 Thread anatoly techtonik

anatoly techtonik added the comment:

In Python 3 it fails with UnicodeEncodeError in 
C:\Python33\lib\encodings\cp437.py, while Vista's 'dir' command shows 
everything correctly in the same console, so somebody definitely overlooked 
that aspect.

This bug is clearly an issue for developers who write products for 
international markets. It is neither out of date nor invalid. A note in red 
in the documentation is a must-have, and a warning should also be issued in 
warning mode when os.walk() ignores international dirs. I doubt there are many 
people who are aware of this racist behavior and want it to be the default.

--
resolution: invalid -> 
status: closed -> pending




[issue16656] os.walk ignores international dirs on Windows

2012-12-11 Thread Amaury Forgeot d'Arc

Amaury Forgeot d'Arc added the comment:

- Do you have a full traceback of the failing os.walk() in Python3.3?
- What's the result of os.listdir(u'.') ?

--
nosy: +amaury.forgeotdarc
status: pending -> open




[issue16656] os.walk ignores international dirs on Windows

2012-12-11 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

What are the results of os.listdir(b'.') and os.listdir(u'.') on Python 2.7 and 
Python 3.3+?

What are the results of os.stat(b'Русское имя') and os.stat(u'Русское имя') on 
Python 2.7 and Python 3.3+?

What are the results of sys.getdefaultencoding(), sys.getfilesystemencoding(), 
locale.getpreferredencoding(False) and locale.getpreferredencoding(True) on 
Python 2.7 and Python 3.3+?

If any of those calls fail, please provide a full traceback.
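
A small Python 3 script that collects the requested encoding values in one go might look like this. The function name encoding_report is hypothetical, and sys.stdout.encoding is an extra value that was not asked for above but is relevant to the print() failures.

```python
import locale
import sys

def encoding_report():
    """Collect the encoding settings that affect file names and console output."""
    return {
        "sys.getdefaultencoding()": sys.getdefaultencoding(),
        "sys.getfilesystemencoding()": sys.getfilesystemencoding(),
        "locale.getpreferredencoding(False)": locale.getpreferredencoding(False),
        "locale.getpreferredencoding(True)": locale.getpreferredencoding(True),
        # Extra (not requested above): the encoding print() uses, if any.
        "sys.stdout.encoding": getattr(sys.stdout, "encoding", None),
    }

report = encoding_report()
for label, value in report.items():
    # ascii() keeps the output printable on any console.
    print(label, "=", ascii(value))
```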

--
stage: committed/rejected -> test needed




[issue16656] os.walk ignores international dirs on Windows

2012-12-11 Thread R. David Murray

R. David Murray added the comment:

My guess is that your unicode issue is issue 1602, which is non-trivial to 
solve.

--




[issue16656] os.walk ignores international dirs on Windows

2012-12-11 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

> My guess is that your unicode issue is issue 1602, which is non-trivial to 
> solve.

In that case the output would be something like:

['English name', '']
[]
[]

--




[issue16656] os.walk ignores international dirs on Windows

2012-12-10 Thread anatoly techtonik

New submission from anatoly techtonik:

This critical bug is one of the reasons that non-English-speaking communities 
don't adopt Python as broadly as the English-speaking world does, compared to 
other technologies (PHP etc.). 


# -*- coding: utf-8 -*-

import os

os.mkdir(u'Русское имя')
os.mkdir(u'English name')

for r, dirs, files in os.walk('.'):
  print dirs


This gives:
['English name']
[]


Windows Vista.
dir /b
English name
test.py
Русское имя

--
components: Library (Lib)
messages: 177276
nosy: techtonik
priority: normal
severity: normal
status: open
title: os.walk ignores international dirs on Windows
versions: Python 2.7, Python 3.1, Python 3.2, Python 3.3, Python 3.4




[issue16656] os.walk ignores international dirs on Windows

2012-12-10 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

It is reproduced on 3.x?

--
nosy: +serhiy.storchaka
type:  -> behavior
versions:  -Python 3.1




[issue16656] os.walk ignores international dirs on Windows

2012-12-10 Thread Serhiy Storchaka

Changes by Serhiy Storchaka storch...@gmail.com:


--
Removed message: http://bugs.python.org/msg177278




[issue16656] os.walk ignores international dirs on Windows

2012-12-10 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Is it reproduced on 3.x?

--




[issue16656] os.walk ignores international dirs on Windows

2012-12-10 Thread R. David Murray

R. David Murray added the comment:

No.

--
nosy: +r.david.murray
resolution:  -> out of date
status: open -> closed

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16656
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16656] os.walk ignores international dirs on Windows

2012-12-10 Thread R. David Murray

R. David Murray added the comment:

Oops, clicked submit too soon.

It isn't likely to get fixed in 2.7, because 2.7's unicode support problems are 
the major reason python3 was developed.

--
stage:  -> committed/rejected




[issue16656] os.walk ignores international dirs on Windows

2012-12-10 Thread R. David Murray

R. David Murray added the comment:

For that matter, it isn't reproduced in python2.7, either:

>>> for r, dirs, files in os.walk(u'.'):
...   print dirs
... 
[u'\u0420\u0443\u0441\u0441\u043a\u043e\u0435 \u0438\u043c\u044f']
[]

--
resolution: out of date -> invalid




[issue16656] os.walk ignores international dirs on Windows

2012-12-10 Thread Jeremy Kloth

Jeremy Kloth added the comment:

The problem exhibited is not coming from the os.walk() implementation, but from 
the use of a byte-string as the argument to it.

The directories are created with unicode literals and therefore the argument 
must also be a unicode literal (u'.') for them to be shown.  See the note in 
the listdir() documentation.

As it stands, I suggest that this be closed as invalid, or at minimum treated 
as a documentation bug, since walk() lacks a note like the one for 
listdir().
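
For comparison, here is a Python 3 sketch of how the argument type decides the type of the returned names. The helper walk_dirnames is hypothetical, and the example assumes a file system encoding that can represent the Cyrillic name (UTF-8 on most current systems).

```python
import os
import tempfile

def walk_dirnames(top):
    """Return the sorted names of the immediate subdirectories seen by os.walk()."""
    for _root, dirs, _files in os.walk(top):
        return sorted(dirs)  # only the first (top-level) entry is needed
    return []

with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "English name"))
    os.mkdir(os.path.join(tmp, "Русское имя"))
    str_names = walk_dirnames(tmp)                 # str in, str out
    bytes_names = walk_dirnames(os.fsencode(tmp))  # bytes in, bytes out

print([ascii(n) for n in str_names])
print(bytes_names)
```

Passing a text path keeps the names in a form that can represent any directory, which is the behavior the listdir() note describes.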

--
nosy: +jkloth




[issue16656] os.walk ignores international dirs on Windows

2012-12-10 Thread R. David Murray

R. David Murray added the comment:

Works for me without the u'.', too, though less usefully:

>>> for r, dirs, files in os.walk('.'):
...   print dirs
... 
['\xd0\xa0\xd1\x83\xd1\x81\xd1\x81\xd0\xba\xd0\xbe\xd0\xb5 \xd0\xb8\xd0\xbc\xd1\x8f']

Maybe that doesn't work on Windows, though.  I am, of course, assuming that 
python3 does the right thing on Windows, but I can't imagine Victor would have 
overlooked that.

--
