Robert Collins added the comment:
Given two (or more) parameters where one is unicode and one is not, upcasting
will occur multiple times in path.join on Windows:
- '\\' is str and will cast up safely in all codecs
- the other str (or bytes) parameter will be upcast using the default encoding
(sys.getdefaultencoding()), which is usually ASCII on Windows
This will then fail when the str parameter is not valid ASCII.
From this we can conclude that this is a failure to use path.join correctly:
if all the parameters passed in were unicode, no error would occur as only
'\\' would be getting coerced to unicode.
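To make the coercion concrete, here is a minimal sketch that emulates Python 2's
implicit str-to-unicode step, written so that it also runs on Python 3. The
helpers coerce and join_mixed are hypothetical names, and the ASCII default is an
assumption about the typical 2.7 setup; the real path.join does more than this.

```python
from __future__ import print_function

# Assumed value of sys.getdefaultencoding() on a typical Python 2.7 install.
DEFAULT_ENCODING = "ascii"


def coerce(part):
    """Decode a byte string the way Python 2 does when it meets unicode."""
    if isinstance(part, bytes):
        # Raises UnicodeDecodeError if the bytes are not valid ASCII.
        return part.decode(DEFAULT_ENCODING)
    return part


def join_mixed(*parts):
    """Join components with a backslash, coercing every byte string to text."""
    return u"\\".join(coerce(p) for p in parts)


# An ASCII-only byte string coerces cleanly next to a unicode component.
print(repr(join_mixed(b"C:\\Users\\fred", u"caf\xe9")))

# A byte-string home directory with a non-ASCII byte fails during coercion.
try:
    join_mixed(b"C:\\Users\\fr\xe9d", u"docs")
except UnicodeDecodeError as exc:
    print("join failed:", exc)
```

If every component is already unicode, coerce never decodes anything and the
failure cannot occur.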
The interesting question is why there was a str parameter that wasn't valid
ASCII; and that lies with path.expanduser() which is returning a str for the
non-ascii home directory.
Changing that to return unicode rather than a no-encoding specified str when
HOME or HOMEPATH etc etc contain non-ascii characters is a change that would
worry me - specifically that we'd encounter code that assumes it is always str,
e.g. by calling path.join(expanduser('~fred'), '\xe1\xbd\x84D') which will then
blow up.
Worth noting too is that
expanduser(u'~user/\u14ffd')
will also blow up in the same way in the same situation - as it ends up
decoding the user home path when it concatenates userhome and path[i:].
So, what to do:
- It might be worth testing a patch that changes expanduser to decode the
environment variables - I'm not sure whether we'd want the filesystemencoding
or the defaultencoding for handling these environment variables. Steve Dower
probably knows :).
- Or we say 'sorry, too hard in 2.7' and move on: join *itself* is fine here,
given the limits of 2.7.
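The first option could look roughly like the following sketch. decode_home is a
hypothetical helper, not part of ntpath, and using the filesystem encoding
rather than the default encoding is an assumption; that choice is exactly the
open question above. The cp1252 value in the usage line is only an example.

```python
from __future__ import print_function

import sys


def decode_home(raw, encoding=None):
    """Return the home directory as text, decoding byte strings explicitly."""
    if isinstance(raw, bytes):
        # Assumption: fall back to the filesystem encoding when none is given.
        return raw.decode(encoding or sys.getfilesystemencoding())
    return raw


# With the home directory decoded up front, joining it with unicode
# components no longer relies on the implicit ASCII coercion.
home = decode_home(b"C:\\Users\\fr\xe9d", encoding="cp1252")
print(repr(home + u"\\" + u"docs"))
```

The compatibility worry raised earlier still applies: callers that assume
expanduser always returns str would now receive unicode.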
----------
nosy: +rbcollins, steve.dower
_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue20140>
_______________________________________