[issue13247] os.path.abspath returns unicode paths as question marks

2011-10-26 Thread Yuval Greenfield

Yuval Greenfield ubershme...@gmail.com added the comment:

I use Python a lot with Hebrew, and many websites have internationalization 
which may involve Unicode paths. I agree that saying Unicode paths are rare 
is inaccurate. 

If the current situation isn't fixed, though, you just can't use the resulting 
path for almost anything. Do you have a use case, Ishimoto?

Windows XP and up implement paths as Unicode, which means that a bytes API 
doesn't even make sense unless Python does some encoding and decoding for you. 
E.g. Python could use the Unicode APIs internally and return UTF-8 encoded 
bytes, but you couldn't use these paths outside of Python. The fact is you 
shouldn't be doing os.path.abspath(b'.') on Windows to begin with.
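
A minimal sketch of the str-only approach argued for here, assuming Python 3 and a file system that can store the name. The directory name and the temporary location are illustrative and do not come from the report:

```python
import os
import tempfile

# Hypothetical directory name made of Hebrew letters.
name = "\u05e9\u05dc\u05d5\u05dd"

with tempfile.TemporaryDirectory() as base:
    target = os.path.join(base, name)
    os.mkdir(target)

    # Passing str keeps the whole round trip in Unicode, so no code page
    # conversion can replace characters in the name.
    full = os.path.abspath(target)
    listed = os.listdir(base)
    is_dir = os.path.isdir(full)
```

With str arguments, the name returned by os.listdir() should be the same string that was used to create the directory.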

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue13247
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue13247] os.path.abspath returns unicode paths as question marks

2011-10-26 Thread Yuval Greenfield

Yuval Greenfield ubershme...@gmail.com added the comment:

Another option btw is to use utf-16, which will work but it's a bit ugly as 
well:

>>> os.listdir(os.path.abspath(u'.').encode('utf-16'))
[]
>>> os.path.abspath(u'.')
u'C:\\Users\\alon\\Desktop\\\u05e9\u05dc\u05d5\u05dd'
>>> os.path.abspath(u'.').encode('utf-16')
'\xff\xfeC\x00:\x00\\\x00U\x00s\x00e\x00r\x00s\x00\\\x00a\x00l\x00o\x00n\x00\\\x00D\x00e\x00s\x00k\x00t\x00o\x00p\x00\\\x00\xe9\x05\xdc\x05\xd5\x05\xdd\x05'
>>> os.listdir(os.path.abspath(u'.').encode('utf-16'))
[]

Tested on python 2.7, but you know what I mean.




[issue13247] os.path.abspath returns unicode paths as question marks

2011-10-26 Thread Atsuo Ishimoto

Atsuo Ishimoto ishim...@gembook.org added the comment:

On Wed, Oct 26, 2011 at 3:36 PM, Yuval Greenfield
rep...@bugs.python.org wrote:

> If the current situation isn't fixed, though, you just can't use the
> resulting path for almost anything. Do you have a use case, Ishimoto?

I don't have a use case. But does raising UnicodeEncodeError fix any
problems? It could break existing code, and I don't see much
difference from the WindowsError caused by the broken file names.

> The fact is you shouldn't be doing os.path.abspath(b'.') on Windows to begin
> with.

Agreed. So I think adding a Windows-specific check to the bytes API does not
improve the situation, but only increases the complexity of the std lib.




[issue13247] os.path.abspath returns unicode paths as question marks

2011-10-26 Thread Yuval Greenfield

Yuval Greenfield ubershme...@gmail.com added the comment:

It won't break existing code. Ignoring this problem here only moves the 
exception to whenever the data returned is first used.

Any code this fix breaks is already broken.




[issue13247] os.path.abspath returns unicode paths as question marks

2011-10-26 Thread STINNER Victor

STINNER Victor victor.stin...@haypocalc.com added the comment:

> Yuval Greenfield ubershme...@gmail.com added the comment:
> Another option btw is to use utf-16

UTF-8, UTF-16, or any encoding other than the ANSI code page is not an 
option. The Windows bytes API expects filenames encoded in the ANSI code page. 
With any other encoding, os.listdir() would raise an error (unknown directory) 
or return an empty list instead of the contents of the directory.
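
To make the difference concrete, here is a small sketch comparing the two byte forms of the path from the earlier session. It assumes the machine's ANSI code page is the Hebrew one (cp1255); that choice is an assumption for illustration, not something stated in the report:

```python
import codecs

# Path from the session quoted in this thread; the last component is Hebrew.
path = "C:\\Users\\alon\\Desktop\\\u05e9\u05dc\u05d5\u05dd"

utf16 = path.encode("utf-16")   # BOM, then two bytes per character
ansi = path.encode("cp1255")    # one byte per character in this code page

has_bom = utf16.startswith(codecs.BOM_UTF16)
has_nul = b"\x00" in utf16      # a NUL-terminated bytes API stops reading here
```

The UTF-16 form starts with a byte order mark and contains NUL bytes, so a bytes API that expects code page text cannot interpret it as the intended path.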




[issue13247] os.path.abspath returns unicode paths as question marks

2011-10-26 Thread Terry J. Reedy

Terry J. Reedy tjre...@udel.edu added the comment:

Yuval, you are assuming that *no one* who uses the os byte APIs on Windows is 
either checking for '?' in returned paths or catching later exceptions. With 
Google code search, I did find one instance where someone tests paths for '?' 
after encoding with the file system encoding. It was not an instance of os.xxx 
output, but it is the same idea.

In any case,
1. Our experience is that any change will affect someone. I was the victim of a 
'harmless' micro change introduced in 3.1.2 (an intentional violation of the 
bugfix-only rule in bugfix releases -- and the last that I know of ;-).
2. The change will introduce an incompatibility between 3.2- and 3.3+.

The justification that mitigates the above is that there is little reason to 
request os bytes returns. By the same reasoning, the change is hardly worth 
bothering with as there should be little to no benefit in real code. So I am 
+-0 on the change.




[issue13247] os.path.abspath returns unicode paths as question marks

2011-10-26 Thread Roundup Robot

Roundup Robot devn...@psf.upfronthosting.co.za added the comment:

New changeset 2cad20e2e588 by Victor Stinner in branch 'default':
Close #13247: Add cp65001 codec, the Windows UTF-8 (CP_UTF8)
http://hg.python.org/cpython/rev/2cad20e2e588

--
nosy: +python-dev
resolution:  -> fixed
stage:  -> committed/rejected
status: open -> closed




[issue13247] os.path.abspath returns unicode paths as question marks

2011-10-26 Thread STINNER Victor

STINNER Victor victor.stin...@haypocalc.com added the comment:

Oops, I specified the wrong issue number in my changeset 2cad20e2e588; it 
belongs to issue #13216.

--
resolution: fixed ->
status: closed -> open




[issue13247] os.path.abspath returns unicode paths as question marks

2011-10-25 Thread STINNER Victor

STINNER Victor victor.stin...@haypocalc.com added the comment:

os.getcwdb() (GetCurrentDirectoryA) and os.listdir(bytes) (FindNextFileA & co) 
encode filenames using WideCharToMultiByte() in default mode (flags=0): 
unencodable characters are replaced by question marks. Such filenames cannot be 
used; open() fails with OSError(22, invalid argument: '?'), for example.

The attached patch changes os.getcwdb() and os.listdir(bytes) to use the Windows 
native API (wide character API) with the Python MBCS codec in strict mode (error 
handler strict), so that the user is notified directly that the filename cannot 
be encoded.

The patch only changes the behaviour for filenames that are not encodable to 
the ANSI code page; such filenames are rare.
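
The two behaviours can be imitated in pure Python. This is only an analogy: it uses Python's own "replace" and "strict" error handlers rather than WideCharToMultiByte(), and cp1252 stands in for an ANSI code page that cannot represent the (hypothetical) file name:

```python
# Hypothetical file name with Hebrew letters.
name = "\u05d0\u05d1\u05d2.txt"

# Lossy conversion: each unencodable character becomes b"?".
lossy = name.encode("cp1252", errors="replace")

# Strict conversion: the failure is reported at encoding time.
try:
    name.encode("cp1252", errors="strict")
    strict_error = None
except UnicodeEncodeError as exc:
    strict_error = exc
```

In the lossy case the caller receives a plausible-looking name that no longer identifies the file; in the strict case the caller gets an exception at the point where the information would have been lost.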

--
keywords: +patch
Added file: http://bugs.python.org/file23521/os_mbcs.patch




[issue13247] os.path.abspath returns unicode paths as question marks

2011-10-25 Thread STINNER Victor

STINNER Victor victor.stin...@haypocalc.com added the comment:

os_mbcs.patch adds _Py_EncodeCodePage() to encode wchar_t* filenames directly, 
without having to create a temporary Unicode object.

The patch removes HAVE_MBCS because the MBCS codec is now always needed by 
posixmodule.c. Anyway, I don't see why MultiByteToWideChar() and 
WideCharToMultiByte() would not be available on Windows.




[issue13247] os.path.abspath returns unicode paths as question marks

2011-10-25 Thread Atsuo Ishimoto

Atsuo Ishimoto ishim...@gembook.org added the comment:

-1 from me.

- I hate to see Unicode exceptions here. It would be another source of 
mysterious Unicode exceptions. Programmers and users would be confused by the 
error message. If you make such characters an error, Python should raise an 
OSError or similar.

- File names with '?' are fine for displaying information to users. Not every 
file name needs to be usable for opening a file.

- I don't think filenames that cannot be decoded in the ANSI code page are rare 
enough to be ignored. I use the Japanese edition of Windows, but I sometimes 
receive files with Chinese or German names. 

Or, in some cases, I have to change the code page with the 'chcp 437' command 
to run a console application made for the American environment. I seldom run 
such applications these days, though.

--
nosy: +ishimoto




[issue13247] os.path.abspath returns unicode paths as question marks

2011-10-25 Thread STINNER Victor

STINNER Victor victor.stin...@haypocalc.com added the comment:

On 26/10/2011 01:32, Atsuo Ishimoto wrote:
> - I don't think filenames that cannot be decoded in the ANSI code page are
> rare enough to be ignored.

The issue is about being able to be notified of encoding errors. 
Currently, unencodable characters are silently replaced and you don't 
know if the filename is valid or not. If a UnicodeEncodeError is raised, 
you will be notified, and so you have to fix the problem.

Anyway, you must use the Unicode API on Windows. If you use the Unicode 
API, filenames are no longer encoded and code pages are no longer used, so 
bye bye Unicode errors!

The Windows bytes API is just kept for backward compatibility. More 
details in my email to python-dev:
http://mail.python.org/pipermail/python-dev/2011-October/114203.html




[issue13247] os.path.abspath returns unicode paths as question marks

2011-10-25 Thread Terry J. Reedy

Terry J. Reedy tjre...@udel.edu added the comment:

The doc says "All functions accepting path or file names accept both bytes and 
string objects, and result in an object of the same type, if a path or file 
name is returned." It does that now (the encoding assumed or produced for bytes 
is not specified). It says nothing about raising exceptions in certain 
situations. So this is a feature change request, one that would likely break 
existing code.

Users can test for invalid returned paths with "'?' in returned_path", though I 
admit that the use of '?' as a glob, regex, and URL special char makes it a bad 
choice of error char.
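
A minimal sketch of such a check for bytes paths. The helper name looks_lossy is hypothetical, and skipping the extended-length prefix is an added precaution that is not part of the suggestion above:

```python
def looks_lossy(returned_path):
    """Return True if a bytes path contains b'?', hinting at replaced characters."""
    # '?' is not allowed in Windows file names, but extended-length paths
    # start with the prefix \\?\, so drop that prefix before checking.
    prefix = b"\\\\?\\"
    if returned_path.startswith(prefix):
        returned_path = returned_path[len(prefix):]
    return b"?" in returned_path
```

For example, looks_lossy(b'C:\\Users\\yuv\\Desktop\\YuvDesktop\\??') is true, while a path made only of encodable characters is not flagged.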

--
nosy: +terry.reedy
type: behavior -> feature request
versions:  -Python 2.7, Python 3.2




[issue13247] os.path.abspath returns unicode paths as question marks

2011-10-25 Thread Atsuo Ishimoto

Atsuo Ishimoto ishim...@gembook.org added the comment:

On Wed, Oct 26, 2011 at 9:12 AM, STINNER Victor rep...@bugs.python.org wrote:

> STINNER Victor victor.stin...@haypocalc.com added the comment:
>
> On 26/10/2011 01:32, Atsuo Ishimoto wrote:
>> - I don't think filenames that cannot be decoded in the ANSI code page are
>> rare enough to be ignored.
>
> The issue is about being able to be notified of encoding errors.

This patch solves nothing; it just raises an exception. It can break
existing code. Also, I don't think it is worth adding weird behavior to
the Python std lib. I would be surprised if a *bytes* API raised a
UnicodeEncodeError.

> Anyway, you must use the Unicode API on Windows. If you use the Unicode
> API, filenames are no longer encoded and code pages are no longer used, so
> bye bye Unicode errors!

Agreed. So I would like to suggest not adding unnecessary
complexity to the bytes API.




[issue13247] os.path.abspath returns unicode paths as question marks

2011-10-24 Thread Yuval Greenfield

Yuval Greenfield ubershme...@gmail.com added the comment:

An example error with abspath and bytes input:

>>> os.path.abspath('.')
'C:\\Users\\yuv\\Desktop\\YuvDesktop\\\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5'
>>> os.path.abspath(b'.')
b'C:\\Users\\yuv\\Desktop\\YuvDesktop\\??'
>>> os.listdir(os.path.abspath(b'.'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
WindowsError: [Error 123] The filename, directory name, or volume label syntax is incorrect: 'C:\\Users\\yuv\\Desktop\\YuvDesktop\\??/*.*'



I couldn't follow the implementation: I got stuck trying to locate the 
definition of os.getcwdb, so I couldn't join you for that part. Here's another 
possible solution:

>>> win32api.GetFullPathName('.')
'C:\\Users\\yuv\\Desktop\\YuvDesktop\\\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5'
>>> win32api.GetShortPathName(win32api.GetFullPathName('.'))
'C:\\Users\\yuv\\Desktop\\YUVDES~1\\5F30~1'

The short path is ASCII, but the problem is that not all Windows file systems 
have 8.3 filenames [1]. So I think your suggestion is the best solution.

[1] http://msdn.microsoft.com/en-us/library/windows/desktop/aa365247(v=vs.85).aspx#short_vs._long_names




[issue13247] os.path.abspath returns unicode paths as question marks

2011-10-23 Thread STINNER Victor

STINNER Victor victor.stin...@haypocalc.com added the comment:

abspath() is implemented using nt._getfullpathname(), which calls 
GetFullPathNameA().

> The returned path with question marks is completely useless.

Can you open the file using such a filename? If not, I agree that the result is 
useless.

> It's better that python throw an error than return the question marks.

Python is currently a thin wrapper on the Windows API. Windows doesn't treat 
a filename with question marks as an error.

http://msdn.microsoft.com/en-us/library/windows/desktop/aa364963%28v=vs.85%29.aspx

Python could maybe use GetFullPathNameW() and manually encode the filename using 
its strict MBCS codec. The MBCS codec has been strict since Python 3.2: it 
raises a UnicodeEncodeError if the string cannot be encoded.
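
In Python terms, the idea could be sketched roughly as follows. The helper abspath_bytes_strict and its explicit encoding parameter are hypothetical, chosen so the sketch runs on any platform; on Windows the suggestion would correspond to encoding="mbcs", a codec that only exists there. This is not the code of any actual patch:

```python
import os

def abspath_bytes_strict(path, encoding):
    """Resolve a str path in Unicode, then encode the result strictly."""
    # str in, str out: the path is resolved without any lossy conversion.
    full = os.path.abspath(path)
    # The strict handler raises UnicodeEncodeError instead of emitting b"?".
    return full.encode(encoding, errors="strict")
```

A caller who really needs bytes either gets a path that round-trips or an exception naming the character that could not be encoded.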




[issue13247] os.path.abspath returns unicode paths as question marks

2011-10-22 Thread Yuval Greenfield

New submission from Yuval Greenfield ubershme...@gmail.com:

For Python 2:

Python 2.7.1 (r271:86832, Nov 27 2010, 18:30:46) [MSC v.1500 32 bit (Intel)] on win32
>>> os.path.abspath('.')
'C:\\Users\\yuv\\Desktop\\YuvDesktop\\??'
>>> os.path.abspath(u'.')
u'C:\\Users\\yuv\\Desktop\\YuvDesktop\\\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5'

For Python 3:

Python 3.2 (r32:88445, Feb 20 2011, 21:29:02) [MSC v.1500 32 bit (Intel)] on win32
>>> os.path.abspath('.')
'C:\\Users\\yuv\\Desktop\\YuvDesktop\\\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5'
>>> os.path.abspath(b'.')
b'C:\\Users\\yuv\\Desktop\\YuvDesktop\\??'


The returned path with question marks is completely useless. It's better that 
python throw an error than return the question marks. Another option is to try 
and get the ascii version of the path, I believe windows has one.

--
components: Library (Lib)
messages: 146204
nosy: ubershmekel
priority: normal
severity: normal
status: open
title: os.path.abspath returns unicode paths as question marks
type: behavior
versions: Python 2.7, Python 3.2, Python 3.3, Python 3.4




[issue13247] os.path.abspath returns unicode paths as question marks

2011-10-22 Thread Éric Araujo

Changes by Éric Araujo mer...@netwok.org:


--
nosy: +haypo
versions:  -Python 3.4
