Re: Blocker for 1.5 - representation of filesystem paths in Django

2012-12-02 Thread Karen Tracey
My preference is for option 2: convert file system paths to unicode and use
unicode internally as much as possible. This is consistent with what we
have been doing/recommending for years, even if it is at odds with Python's
default for 2.X. See for example:

https://code.djangoproject.com/ticket/9579

fixed by:

https://github.com/django/django/commit/dfa90aec1b

which converts the file system paths returned by Python to unicode for
future use.

#9579 was essentially the same problem we are again facing in #19357, only
the unicode literals change has made it way more widespread. I think we
should approach #19357 in a consistent fashion with recommendations we have
made ever since adding unicode support to Django: convert to/from unicode
at the edges, use unicode internally. True, this is somewhat at odds with
Python's 2.X behavior/default, but we found Python's behavior in 2.X to be
unworkable for practical unicode support so I'm not particularly concerned
with "going against" the way Python 2.X does things...Python 2.X is broken
here, in my opinion.

It is true this approach may introduce regressions for people who do not
have their environment locale properly configured such that os.
getfilesystemencoding() returns a value that can be used to decode their
file system paths. But these systems are already broken and it's just
accident that they have not run afoul of the problem yet. We have been
noting the need for proper system configuration for years:

https://code.djangoproject.com/ticket/11030#comment:5
https://code.djangoproject.com/ticket/9696#comment:10
https://code.djangoproject.com/ticket/13550#comment:3

The current documentation on this need, however, is still buried (though at
least is is no longer in the mod_python doc only) and makes it sound like
this is only necessary in some rare cases:

https://docs.djangoproject.com/en/dev/howto/deployment/wsgi/modwsgi/#if-you-get-a-unicodeencodeerror

That note should be moved to somewhere more prominent (
https://docs.djangoproject.com/en/dev/howto/deployment/ ?) and re-written
since the problem will now be more widespread and not only crop up during
file uploads of files with non-ascii characters in their names. Likely the
release notes should have an item noting this, assuming we go with option 2
to fix it.

Karen

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To post to this group, send email to django-developers@googlegroups.com.
To unsubscribe from this group, send email to 
django-developers+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/django-developers?hl=en.



Re: Blocker for 1.5 - representation of filesystem paths in Django

2012-12-02 Thread Aymeric Augustin
Hi Russell,

> I've had a look at the patches for (1) and (2), and to me, the look like 
> mirror images of the same patch -- it's just a matter of whether we convert 
> everything to bytes or unicode when we have the opportunity. 
> My immediate reaction is that (2) -- keeping everything in unicode until it 
> doesn't need to be -- looks like the better long term solution, but I'll also 
> admit that this is based purely upon history and an inspection of the patch.

Yes, they're similar, because they touch the same locations. (1) is restoring 
the behavior of Django 1.4 by switching some string literals back to 
bytestrings in modules that use unicode_literals. (2) is modifying the tests to 
adjust for the changes in the code.

I agree that (2) looks like the better solution in the long term. Django is 
already doing a lot of filesystem-related operations in unicode internally. 
Keeping a mix of bytes and unicode will cause more trouble in the future.

Sorry if I sound pessimistic, but no matter what we do, I expect that some apps 
will die with UnicodeDecodeError after upgrading to 1.5, because Django will 
attempt to convert some byetstrings created in user code to unicode. The risk 
is a bit higher with option (2). Option (1) doesn't eliminate it either: see 
https://code.djangoproject.com/ticket/19357#comment:13

At this point, sticking with our guns and insisting that developers use unicode 
everywhere (even where bytestrings used to work) is a decent plan. At least 
that's easy to explain.

> In particular, I'm not completely up to speed with the Python3 implications.

Python 3 is a non-issue here:
  - It provides an unicode abstraction to the filesystem and handles decoding / 
encoding automatically.
  - We don't have any backwards compatibility to begin with.

> In the notes for approach 2, you say that this approach would be "deviating 
> from Python's" behaviour -- can you summarise what the expected Python 
> behaviour here is (especially for Python 3, but summarising Python 2 wouldn't 
> hurt either)?


(explanation stolen from https://code.djangoproject.com/ticket/19398)

By default, filesystem paths are represented with native strings (ie. str 
objects) in Python 2 and Python 3.

% python2
>>> import os
>>> type(os.listdir('.')[0])


% python3
>>> import os
>>> type(os.listdir('.')[0])


In other words, they were switched from bytestrings in Python 2 to unicode in 
Python 3.

For the sake of completeness:
- In Python 2, it's possible to use unicode for filesystem paths, when 
os.path.supports_unicode_filenames = True, but that's not the default mode of 
operation.
- In Python 3, it's possible to use bytestrings for filesystem paths, because 
not all supported platforms sport unicode-aware filesystems.

However, the intent of Python's developers is that str objets should be used in 
all cases.

Best regards,
-- 
Aymeric.



-- 
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To post to this group, send email to django-developers@googlegroups.com.
To unsubscribe from this group, send email to 
django-developers+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/django-developers?hl=en.



Re: Blocker for 1.5 - representation of filesystem paths in Django

2012-12-01 Thread Russell Keith-Magee
On Sun, Dec 2, 2012 at 2:43 AM, Aymeric Augustin <
aymeric.augus...@polytechnique.org> wrote:

> Hello,
>
> Django 1.5 beta 1 contains a regression for users who install Django or
> their projects under non-ASCII paths:
> https://code.djangoproject.com/ticket/19357 Unfortunately, the patch
> isn't going to be trivial. I'd like to have some feedback before making
> changes.
>
>
> In order to add compatibility with Python 3, the first step was to remove
> all uses of the `u"..."` syntax, add `from __future__ import
> unicode_literals` in many modules, and use the `b"…"` syntax in the rare
> cases where a bytestring is really needed.
>
> Unfortunately, after enabling unicode_literals, under Python 2, Django
> attempts to concatenate bytestrings and unicode, for instance:
>
> # django/utils/translation/trans_real.py, line 154
> apppath = os.path.join(os.path.dirname(app.__file__), 'locale')
>
> This pattern occurs in several areas of Django: fixtures, static files,
> templates and translations, etc. It's also very common in tests.
>
> In the example above, when the first argument only contains ASCII
> characters, it's silently converted to unicode. This explains why the
> problem wasn't detected earlier.
>
>
> Fundamentally, the unicode_literals patch had the side effect of switching
> the internal representation of filesystem paths from str to unicode under
> Python 2 in many modules. (Under Python 3, everything is working fine!)
>
> Since rolling back that patch isn't possible, I see three options.
>
> 1) Restore Django 1.4's behavior and switch filesystem paths handling back
> to str. That means using native strings (str objects) under Python 2 and 3,
> like Python itself does. The example above would become:
>
> apppath = os.path.join(os.path.dirname(app.__file__), str('locale'))
>
> That's what I've started doing on the ticket. This is (probably) the most
> backwards-compatible solution. The patch is large — but not that large
> compared to the unicode_literals path itself… When we added support for
> Python 3 with a single codebase, we knew we'd have to use this pattern
> wherever we needed a native string.
>
> 2) Keep filesystem paths handling in unicode. In general, it's a good
> practice to work in unicode and convert at the edges ("unicode sandwich").
> But in this case, it also means deviating from Python's behavior. This
> would be a major change in Django's APIs, and one whose consequences
> haven't been well anticipated. I haven't explored this solution.
>
> 3) Document this as a known limitation of Django 1.5, and postpone the fix
> to Django 1.6.
>
> (3) sounds like a non-started to me.

I've had a look at the patches for (1) and (2), and to me, the look like
mirror images of the same patch -- it's just a matter of whether we convert
everything to bytes or unicode when we have the opportunity.

My immediate reaction is that (2) -- keeping everything in unicode until it
doesn't need to be -- looks like the better long term solution, but I'll
also admit that this is based purely upon history and an inspection of the
patch. In particular, I'm not completely up to speed with the Python3
implications. In the notes for approach 2, you say that this approach would
be "deviating from Python's" behaviour -- can you summarise what the
expected Python behaviour here is (especially for Python 3, but summarising
Python 2 wouldn't hurt either)?

Russ %-)

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To post to this group, send email to django-developers@googlegroups.com.
To unsubscribe from this group, send email to 
django-developers+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/django-developers?hl=en.



Re: Blocker for 1.5 - representation of filesystem paths in Django

2012-12-01 Thread Claude Paroz
Le samedi 1 décembre 2012 19:43:12 UTC+1, Aymeric Augustin a écrit :
>
> Hello,
>
> Django 1.5 beta 1 contains a regression for users who install Django or 
> their projects under non-ASCII paths: 
> https://code.djangoproject.com/ticket/19357 Unfortunately, the patch 
> isn't going to be trivial. I'd like to have some feedback before making 
> changes.
> (...)
>
> 2) Keep filesystem paths handling in unicode. In general, it's a good 
> practice to work in unicode and convert at the edges ("unicode sandwich"). 
> But in this case, it also means deviating from Python's behavior. This 
> would be a major change in Django's APIs, and one whose consequences 
> haven't been well anticipated. I haven't explored this solution.
>

I've just uploaded a patch on the ticket that implements that approach. I'm 
rather confident about it, but of course, it has to be thoroughly tested.

Claude 

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To view this discussion on the web visit 
https://groups.google.com/d/msg/django-developers/-/_Q-QNYKxNWMJ.
To post to this group, send email to django-developers@googlegroups.com.
To unsubscribe from this group, send email to 
django-developers+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/django-developers?hl=en.



Blocker for 1.5 - representation of filesystem paths in Django

2012-12-01 Thread Aymeric Augustin
Hello,

Django 1.5 beta 1 contains a regression for users who install Django or their 
projects under non-ASCII paths: https://code.djangoproject.com/ticket/19357 
Unfortunately, the patch isn't going to be trivial. I'd like to have some 
feedback before making changes.


In order to add compatibility with Python 3, the first step was to remove all 
uses of the `u"..."` syntax, add `from __future__ import unicode_literals` in 
many modules, and use the `b"…"` syntax in the rare cases where a bytestring is 
really needed.

Unfortunately, after enabling unicode_literals, under Python 2, Django attempts 
to concatenate bytestrings and unicode, for instance:

# django/utils/translation/trans_real.py, line 154
apppath = os.path.join(os.path.dirname(app.__file__), 'locale')

This pattern occurs in several areas of Django: fixtures, static files, 
templates and translations, etc. It's also very common in tests.

In the example above, when the first argument only contains ASCII characters, 
it's silently converted to unicode. This explains why the problem wasn't 
detected earlier.


Fundamentally, the unicode_literals patch had the side effect of switching the 
internal representation of filesystem paths from str to unicode under Python 2 
in many modules. (Under Python 3, everything is working fine!)

Since rolling back that patch isn't possible, I see three options.

1) Restore Django 1.4's behavior and switch filesystem paths handling back to 
str. That means using native strings (str objects) under Python 2 and 3, like 
Python itself does. The example above would become:

apppath = os.path.join(os.path.dirname(app.__file__), str('locale'))

That's what I've started doing on the ticket. This is (probably) the most 
backwards-compatible solution. The patch is large — but not that large compared 
to the unicode_literals path itself… When we added support for Python 3 with a 
single codebase, we knew we'd have to use this pattern wherever we needed a 
native string.

2) Keep filesystem paths handling in unicode. In general, it's a good practice 
to work in unicode and convert at the edges ("unicode sandwich"). But in this 
case, it also means deviating from Python's behavior. This would be a major 
change in Django's APIs, and one whose consequences haven't been well 
anticipated. I haven't explored this solution.

3) Document this as a known limitation of Django 1.5, and postpone the fix to 
Django 1.6.


How do you think we should move forward?

Best regards,

-- 
Aymeric.


-- 
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To post to this group, send email to django-developers@googlegroups.com.
To unsubscribe from this group, send email to 
django-developers+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/django-developers?hl=en.