[issue4352] imp.find_module() fails with a UnicodeDecodeError when called with non-ASCII search paths

2011-12-09 Thread Serg Asminog

Serg Asminog akudov...@gmail.com added the comment:

dirname = 'A-Za-z\xc4\xd6\xdc\xe4\xf6\xfc\xdf'

Traceback (most recent call last):
  File "D:\temp\python bug\test.py", line 19, in <module>
    file_object, file_path, description = imp.find_module(basename, [dirname])
UnicodeEncodeError: 'mbcs' codec can't encode characters in position 0--1: invalid character
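
For anyone trying to reproduce this on a current interpreter: the imp module was removed in Python 3.12, so the sketch below uses importlib.machinery.PathFinder instead of imp.find_module(). It is only an approximation of the attached test.py, which is not shown in this thread; the helper name find_in_non_ascii_dir and the temporary directory layout are assumptions made for illustration.

```python
import importlib
import importlib.machinery
import os
import tempfile


def find_in_non_ascii_dir(module_name="mymodule"):
    """Create a module inside a non-ASCII directory and try to locate it.

    Hypothetical helper: returns the file name of the module that was
    found, or None if the finder could not locate it.
    """
    with tempfile.TemporaryDirectory() as root:
        # Same directory name as in the report above.
        dirname = os.path.join(root, "A-Za-z\xc4\xd6\xdc\xe4\xf6\xfc\xdf")
        os.mkdir(dirname)
        with open(os.path.join(dirname, module_name + ".py"), "w") as f:
            f.write("VALUE = 42\n")

        importlib.invalidate_caches()
        # Search only the given directory, like imp.find_module(name, [dirname]).
        spec = importlib.machinery.PathFinder.find_spec(module_name, [dirname])
        return None if spec is None else os.path.basename(spec.origin)


print(find_in_non_ascii_dir())
```

Whether the directory can be created at all still depends on the filesystem encoding of the machine running the script.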

--
nosy: +Serg.Asminog
Added file: http://bugs.python.org/file23891/test.py

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue4352
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com




2011-12-09 Thread STINNER Victor

STINNER Victor victor.stin...@haypocalc.com added the comment:

@Serg Asminog: What is your Python version? What is your locale encoding 
(print(sys.getfilesystemencoding()))? What is your Windows version?

--


2011-12-09 Thread Serg Asminog

Serg Asminog akudov...@gmail.com added the comment:

print(sys.getfilesystemencoding())
print(os.name)
print(sys.version)
print(sys.version_info)
print(sys.platform)

-
mbcs
nt
3.2.2 (default, Sep  4 2011, 09:07:29) [MSC v.1500 64 bit (AMD64)]
sys.version_info(major=3, minor=2, micro=2, releaselevel='final', serial=0)
win32

---
Windows 7 64bit

--


2011-12-09 Thread Serg Asminog

Serg Asminog akudov...@gmail.com added the comment:

Also 

Traceback (most recent call last):
  File "D:\temp\python bug\test.py", line 20, in <module>
    file_object, file_path, description = imp.find_module(basename, [dirname])
ImportError: No module named mymodule

with Python 2.6.6 (r266:84297, Aug 24 2010, 18:13:38) [MSC v.1500 64 bit (AMD64)]

--


2011-12-09 Thread STINNER Victor

STINNER Victor victor.stin...@haypocalc.com added the comment:

Oops, it's not sys.getfilesystemencoding(), but locale.getpreferredencoding() 
which is interesting. Can you give me your locale encoding?
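
The two values being compared here can be printed side by side. This is a minimal sketch; the helper name report_encodings is made up for illustration, and the output depends entirely on the machine it runs on.

```python
import locale
import sys


def report_encodings():
    """Return the filesystem encoding and the locale's preferred encoding."""
    return {
        "filesystem": sys.getfilesystemencoding(),
        "preferred": locale.getpreferredencoding(),
    }


for name, value in report_encodings().items():
    print(name, value)
```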

--


2011-12-09 Thread Serg Asminog

Serg Asminog akudov...@gmail.com added the comment:

cp1251

--


2010-10-17 Thread STINNER Victor

STINNER Victor victor.stin...@haypocalc.com added the comment:

Good news: this issue is now fixed in py3k (Python 3.2). I cannot give a commit 
number, because there are too many commits related to this problem (see #8611 
and #9425), but it works ;-)

--
resolution:  -> fixed
status: open -> closed


2010-07-29 Thread STINNER Victor

STINNER Victor victor.stin...@haypocalc.com added the comment:

I wrote a patch to fix this issue, see #9425.

--


2010-06-18 Thread STINNER Victor

STINNER Victor victor.stin...@haypocalc.com added the comment:

I closed issue #850997, mbcs is now really strict by default:

>>> 'h\u00e4kkinen'.encode('mbcs')
UnicodeEncodeError: ...
>>> 'h\u00e4kkinen'.encode('mbcs', 'replace')
b'hakkinen'

PyUnicode_EncodeFSDefault(), PyUnicode_DecodeFSDefault() and os.fsencode() use 
mbcs with the strict error handler on Windows. On other OSes, these functions 
use the surrogateescape error handler, but mbcs only supports strict and 
replace when encoding, and strict and ignore when decoding.
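
As an illustration of what surrogateescape provides on the other platforms, here is a minimal sketch. It uses the utf-8 codec rather than mbcs so that it runs anywhere, and the helper name roundtrip is hypothetical.

```python
raw = b"h\xe4kkinen"  # Latin-1 encoded bytes, not valid UTF-8


def roundtrip(data, encoding="utf-8"):
    """Decode undecodable bytes to lone surrogates, then encode them back."""
    text = data.decode(encoding, "surrogateescape")
    return text, text.encode(encoding, "surrogateescape")


text, back = roundtrip(raw)
print(ascii(text))   # the byte 0xE4 is carried as the lone surrogate U+DCE4
print(back == raw)   # the original bytes are recovered unchanged
```

With the strict handler, the same decode raises UnicodeDecodeError, which is the behaviour mbcs is limited to on Windows.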

--


2010-06-14 Thread STINNER Victor

STINNER Victor victor.stin...@haypocalc.com added the comment:

About the mbcs encoding: issue #850997 proposes to make it more strict.

--


2010-05-19 Thread STINNER Victor

STINNER Victor victor.stin...@haypocalc.com added the comment:

See also #8611.

--


2009-03-30 Thread Guido van Rossum

Guido van Rossum gu...@python.org added the comment:

At the sprint, Andrew Svetlov, Martin von Loewis and I looked into this
a bit, and discovered that Andrew's Vista copy uses a Russian locale for
the filesystem encoding (despite using English as the language).  In
this locale, a-umlaut cannot be represented in the ANSI code page (which
has only 256 values), because the Russian locale uses those byte values
to represent Cyrillic.
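
The effect can be sketched without a Windows machine. The example below assumes that Python's cp1251 codec is a fair stand-in for the ANSI code page of a Russian locale; the helper name encodable is made up for illustration.

```python
def encodable(text, encoding="cp1251"):
    """Return True if every character of text exists in the given code page."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


print(encodable("h\u00e4kkinen"))                         # a-umlaut: False
print(encodable("\u043f\u0440\u0438\u0432\u0435\u0442"))  # Cyrillic: True
print("h\u00e4kkinen".encode("cp1251", "replace"))        # b'h?kkinen'
```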

As long as the import code (written in C) uses bytes in the filesystem
encoding to represent paths, this problem will remain.

Two possible solutions would be to switch to Brett's importlib, or to
change the import code to use wide characters everywhere (like
posixmodule.c).  Both are extremely risky and a lot of work, and I don't
expect we'll get to this for 3.1.

(In 2.x the same problem exists, but is perhaps less real because module
names are limited to ASCII.)

We also discovered another problem, which I'll report separately: the
*module* name is decoded to UTF8, while the *path* name uses the
filesystem encoding...

--
nosy: +gvanrossum


2009-03-29 Thread Andrew Svetlov

Andrew Svetlov andrew.svet...@gmail.com added the comment:

I can reproduce this problem on Windows Vista with fresh py3k sources.
It looks like the bug occurs only with Latin-1 characters.
At least Cyrillic works OK.

--
nosy: +asvetlov


2009-03-29 Thread Andrew Svetlov

Andrew Svetlov andrew.svet...@gmail.com added the comment:

From my understanding (after tracing/debugging), the problem lies in import.c:
find_module tries to convert the path from unicode to a bytestring using 
Py_FileSystemDefaultEncoding (line 1397). On Windows it is 'mbcs'.

The conversion is done with decode_mbcs (unicodeobject.c:4244), which uses 
MultiByteToWideChar with codepage CP_ACP. The problem is that, when converting 
composite characters ('\u00e4' is 'a' + 'two dots over the letter', I don't 
know the true name for this sign), this function returns only 'a'.

>>> repr('h\u00e4kkinen'.encode('mbcs'))
b'hakkinen'
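
On the question of what the sign is called: the unicodedata module can name it. This is only an illustration of the base letter plus mark decomposition; it is an assumption that NFD normalization is a useful analogy here, since the Windows best-fit mapping is a separate mechanism.

```python
import unicodedata

precomposed = "\u00e4"
# NFD splits the precomposed letter into a base letter and a combining mark.
decomposed = unicodedata.normalize("NFD", precomposed)

print(unicodedata.name(precomposed))
print([unicodedata.name(ch) for ch in decomposed])
```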

MSDN says (http://msdn.microsoft.com/en-us/library/dd374130(VS.85).aspx):
For strings that require validation, such as file, resource, and user 
names, the application should always use the WC_NO_BEST_FIT_CHARS flag 
with WideCharToMultiByte. This flag prevents the function from mapping 
characters to characters that appear similar but have very different 
semantics. In some cases, the semantic change can be extreme. For 
example, the symbol for ∞ (infinity) maps to 8 (eight) in some code 
pages.

Writing an encoding function as the counterpart of PyUnicode_DecodeFSDefault 
that sets this flag would not help either: the problematic character would 
just be replaced with the default character ('?' if not specified).
Special-casing the 'latin-1' encoding sounds ugly.

Changing all filenames to unicode in import.c (possibly using fileio instead 
of direct calls to open/fdopen) looks good to me, but it would take a long 
time and require many changes.

--
components: +Interpreter Core
versions: +Python 3.1


2009-03-19 Thread STINNER Victor

STINNER Victor victor.stin...@haypocalc.com added the comment:

Oh, I found sys.setfilesystemencoding("latin-1")! But even with that, 
your example find_module.py works correctly with py3k trunk. Maybe the 
problem is gone?

--


2009-03-19 Thread Benjamin Peterson

Benjamin Peterson benja...@python.org added the comment:

Well, latin-1 can decode any arbitrary array of bytes, so of course it
won't fail. :)
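
A quick way to see this, as a minimal sketch: every possible byte value decodes under latin-1 and round-trips, whereas utf-8 rejects the same input.

```python
every_byte = bytes(range(256))

# latin-1 maps byte N to code point N, so decoding can never fail.
text = every_byte.decode("latin-1")
print(len(text), text.encode("latin-1") == every_byte)

try:
    every_byte.decode("utf-8")
except UnicodeDecodeError as exc:
    print("utf-8 rejects it:", exc.reason)
```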

--
nosy: +benjamin.peterson


2008-11-19 Thread STINNER Victor

STINNER Victor [EMAIL PROTECTED] added the comment:

The example works correctly on Linux (py3k trunk). Maybe the problem is 
specific to Windows?

--
nosy: +haypo


2008-11-19 Thread Amaury Forgeot d'Arc

Amaury Forgeot d'Arc [EMAIL PROTECTED] added the comment:

Indeed. It happens when the filesystem encoding is not utf-8.

I have several changes in my local workspace about this, which also deal
with zipimport and other places that import modules.
I suggest letting 3.0 go out and correcting all this in 3.1.

--
nosy: +amaury.forgeotdarc


2008-11-18 Thread Jukka Aho

Changes by Jukka Aho [EMAIL PROTECTED]:


--
title: imp.find_module() causes UnicodeDecodeError with non-ASCII search paths 
-> imp.find_module() fails with a UnicodeDecodeError when called with non-ASCII 
search paths
