Re: Blocker for 1.5 - representation of filesystem paths in Django
My preference is for option 2: convert file system paths to unicode and use unicode internally as much as possible. This is consistent with what we have been doing/recommending for years, even if it is at odds with Python's default for 2.X. See for example: https://code.djangoproject.com/ticket/9579 fixed by: https://github.com/django/django/commit/dfa90aec1b which converts the file system paths returned by Python to unicode for future use. #9579 was essentially the same problem we are again facing in #19357, only the unicode literals change has made it way more widespread. I think we should approach #19357 in a consistent fashion with recommendations we have made ever since adding unicode support to Django: convert to/from unicode at the edges, use unicode internally. True, this is somewhat at odds with Python's 2.X behavior/default, but we found Python's behavior in 2.X to be unworkable for practical unicode support so I'm not particularly concerned with "going against" the way Python 2.X does things...Python 2.X is broken here, in my opinion. It is true this approach may introduce regressions for people who do not have their environment locale properly configured such that os. getfilesystemencoding() returns a value that can be used to decode their file system paths. But these systems are already broken and it's just accident that they have not run afoul of the problem yet. We have been noting the need for proper system configuration for years: https://code.djangoproject.com/ticket/11030#comment:5 https://code.djangoproject.com/ticket/9696#comment:10 https://code.djangoproject.com/ticket/13550#comment:3 The current documentation on this need, however, is still buried (though at least is is no longer in the mod_python doc only) and makes it sound like this is only necessary in some rare cases: https://docs.djangoproject.com/en/dev/howto/deployment/wsgi/modwsgi/#if-you-get-a-unicodeencodeerror That note should be moved to somewhere more prominent ( https://docs.djangoproject.com/en/dev/howto/deployment/ ?) and re-written since the problem will now be more widespread and not only crop up during file uploads of files with non-ascii characters in their names. Likely the release notes should have an item noting this, assuming we go with option 2 to fix it. Karen -- You received this message because you are subscribed to the Google Groups "Django developers" group. To post to this group, send email to django-developers@googlegroups.com. To unsubscribe from this group, send email to django-developers+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/django-developers?hl=en.
Re: Blocker for 1.5 - representation of filesystem paths in Django
Hi Russell, > I've had a look at the patches for (1) and (2), and to me, the look like > mirror images of the same patch -- it's just a matter of whether we convert > everything to bytes or unicode when we have the opportunity. > My immediate reaction is that (2) -- keeping everything in unicode until it > doesn't need to be -- looks like the better long term solution, but I'll also > admit that this is based purely upon history and an inspection of the patch. Yes, they're similar, because they touch the same locations. (1) is restoring the behavior of Django 1.4 by switching some string literals back to bytestrings in modules that use unicode_literals. (2) is modifying the tests to adjust for the changes in the code. I agree that (2) looks like the better solution in the long term. Django is already doing a lot of filesystem-related operations in unicode internally. Keeping a mix of bytes and unicode will cause more trouble in the future. Sorry if I sound pessimistic, but no matter what we do, I expect that some apps will die with UnicodeDecodeError after upgrading to 1.5, because Django will attempt to convert some byetstrings created in user code to unicode. The risk is a bit higher with option (2). Option (1) doesn't eliminate it either: see https://code.djangoproject.com/ticket/19357#comment:13 At this point, sticking with our guns and insisting that developers use unicode everywhere (even where bytestrings used to work) is a decent plan. At least that's easy to explain. > In particular, I'm not completely up to speed with the Python3 implications. Python 3 is a non-issue here: - It provides an unicode abstraction to the filesystem and handles decoding / encoding automatically. - We don't have any backwards compatibility to begin with. > In the notes for approach 2, you say that this approach would be "deviating > from Python's" behaviour -- can you summarise what the expected Python > behaviour here is (especially for Python 3, but summarising Python 2 wouldn't > hurt either)? (explanation stolen from https://code.djangoproject.com/ticket/19398) By default, filesystem paths are represented with native strings (ie. str objects) in Python 2 and Python 3. % python2 >>> import os >>> type(os.listdir('.')[0]) % python3 >>> import os >>> type(os.listdir('.')[0]) In other words, they were switched from bytestrings in Python 2 to unicode in Python 3. For the sake of completeness: - In Python 2, it's possible to use unicode for filesystem paths, when os.path.supports_unicode_filenames = True, but that's not the default mode of operation. - In Python 3, it's possible to use bytestrings for filesystem paths, because not all supported platforms sport unicode-aware filesystems. However, the intent of Python's developers is that str objets should be used in all cases. Best regards, -- Aymeric. -- You received this message because you are subscribed to the Google Groups "Django developers" group. To post to this group, send email to django-developers@googlegroups.com. To unsubscribe from this group, send email to django-developers+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/django-developers?hl=en.
Re: Blocker for 1.5 - representation of filesystem paths in Django
On Sun, Dec 2, 2012 at 2:43 AM, Aymeric Augustin < aymeric.augus...@polytechnique.org> wrote: > Hello, > > Django 1.5 beta 1 contains a regression for users who install Django or > their projects under non-ASCII paths: > https://code.djangoproject.com/ticket/19357 Unfortunately, the patch > isn't going to be trivial. I'd like to have some feedback before making > changes. > > > In order to add compatibility with Python 3, the first step was to remove > all uses of the `u"..."` syntax, add `from __future__ import > unicode_literals` in many modules, and use the `b"…"` syntax in the rare > cases where a bytestring is really needed. > > Unfortunately, after enabling unicode_literals, under Python 2, Django > attempts to concatenate bytestrings and unicode, for instance: > > # django/utils/translation/trans_real.py, line 154 > apppath = os.path.join(os.path.dirname(app.__file__), 'locale') > > This pattern occurs in several areas of Django: fixtures, static files, > templates and translations, etc. It's also very common in tests. > > In the example above, when the first argument only contains ASCII > characters, it's silently converted to unicode. This explains why the > problem wasn't detected earlier. > > > Fundamentally, the unicode_literals patch had the side effect of switching > the internal representation of filesystem paths from str to unicode under > Python 2 in many modules. (Under Python 3, everything is working fine!) > > Since rolling back that patch isn't possible, I see three options. > > 1) Restore Django 1.4's behavior and switch filesystem paths handling back > to str. That means using native strings (str objects) under Python 2 and 3, > like Python itself does. The example above would become: > > apppath = os.path.join(os.path.dirname(app.__file__), str('locale')) > > That's what I've started doing on the ticket. This is (probably) the most > backwards-compatible solution. The patch is large — but not that large > compared to the unicode_literals path itself… When we added support for > Python 3 with a single codebase, we knew we'd have to use this pattern > wherever we needed a native string. > > 2) Keep filesystem paths handling in unicode. In general, it's a good > practice to work in unicode and convert at the edges ("unicode sandwich"). > But in this case, it also means deviating from Python's behavior. This > would be a major change in Django's APIs, and one whose consequences > haven't been well anticipated. I haven't explored this solution. > > 3) Document this as a known limitation of Django 1.5, and postpone the fix > to Django 1.6. > > (3) sounds like a non-started to me. I've had a look at the patches for (1) and (2), and to me, the look like mirror images of the same patch -- it's just a matter of whether we convert everything to bytes or unicode when we have the opportunity. My immediate reaction is that (2) -- keeping everything in unicode until it doesn't need to be -- looks like the better long term solution, but I'll also admit that this is based purely upon history and an inspection of the patch. In particular, I'm not completely up to speed with the Python3 implications. In the notes for approach 2, you say that this approach would be "deviating from Python's" behaviour -- can you summarise what the expected Python behaviour here is (especially for Python 3, but summarising Python 2 wouldn't hurt either)? Russ %-) -- You received this message because you are subscribed to the Google Groups "Django developers" group. To post to this group, send email to django-developers@googlegroups.com. To unsubscribe from this group, send email to django-developers+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/django-developers?hl=en.
Re: Blocker for 1.5 - representation of filesystem paths in Django
Le samedi 1 décembre 2012 19:43:12 UTC+1, Aymeric Augustin a écrit : > > Hello, > > Django 1.5 beta 1 contains a regression for users who install Django or > their projects under non-ASCII paths: > https://code.djangoproject.com/ticket/19357 Unfortunately, the patch > isn't going to be trivial. I'd like to have some feedback before making > changes. > (...) > > 2) Keep filesystem paths handling in unicode. In general, it's a good > practice to work in unicode and convert at the edges ("unicode sandwich"). > But in this case, it also means deviating from Python's behavior. This > would be a major change in Django's APIs, and one whose consequences > haven't been well anticipated. I haven't explored this solution. > I've just uploaded a patch on the ticket that implements that approach. I'm rather confident about it, but of course, it has to be thoroughly tested. Claude -- You received this message because you are subscribed to the Google Groups "Django developers" group. To view this discussion on the web visit https://groups.google.com/d/msg/django-developers/-/_Q-QNYKxNWMJ. To post to this group, send email to django-developers@googlegroups.com. To unsubscribe from this group, send email to django-developers+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/django-developers?hl=en.
Blocker for 1.5 - representation of filesystem paths in Django
Hello, Django 1.5 beta 1 contains a regression for users who install Django or their projects under non-ASCII paths: https://code.djangoproject.com/ticket/19357 Unfortunately, the patch isn't going to be trivial. I'd like to have some feedback before making changes. In order to add compatibility with Python 3, the first step was to remove all uses of the `u"..."` syntax, add `from __future__ import unicode_literals` in many modules, and use the `b"…"` syntax in the rare cases where a bytestring is really needed. Unfortunately, after enabling unicode_literals, under Python 2, Django attempts to concatenate bytestrings and unicode, for instance: # django/utils/translation/trans_real.py, line 154 apppath = os.path.join(os.path.dirname(app.__file__), 'locale') This pattern occurs in several areas of Django: fixtures, static files, templates and translations, etc. It's also very common in tests. In the example above, when the first argument only contains ASCII characters, it's silently converted to unicode. This explains why the problem wasn't detected earlier. Fundamentally, the unicode_literals patch had the side effect of switching the internal representation of filesystem paths from str to unicode under Python 2 in many modules. (Under Python 3, everything is working fine!) Since rolling back that patch isn't possible, I see three options. 1) Restore Django 1.4's behavior and switch filesystem paths handling back to str. That means using native strings (str objects) under Python 2 and 3, like Python itself does. The example above would become: apppath = os.path.join(os.path.dirname(app.__file__), str('locale')) That's what I've started doing on the ticket. This is (probably) the most backwards-compatible solution. The patch is large — but not that large compared to the unicode_literals path itself… When we added support for Python 3 with a single codebase, we knew we'd have to use this pattern wherever we needed a native string. 2) Keep filesystem paths handling in unicode. In general, it's a good practice to work in unicode and convert at the edges ("unicode sandwich"). But in this case, it also means deviating from Python's behavior. This would be a major change in Django's APIs, and one whose consequences haven't been well anticipated. I haven't explored this solution. 3) Document this as a known limitation of Django 1.5, and postpone the fix to Django 1.6. How do you think we should move forward? Best regards, -- Aymeric. -- You received this message because you are subscribed to the Google Groups "Django developers" group. To post to this group, send email to django-developers@googlegroups.com. To unsubscribe from this group, send email to django-developers+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/django-developers?hl=en.