On Tue, Jan 5, 2016 at 3:17 PM Aymeric Augustin <aymeric.augustin.2...@polytechnique.org> wrote:

> Hello Benoît,
>
> On Tuesday, January 5, 2016 at 14:13:48 UTC+1, Benoit Chesneau wrote:
>>
>> Header formats, which are btw US-ASCII in the HTTP spec now, could
>> already be solved if only the frameworks complied with the spec instead
>> of trying to impose their own rules.
>>
>
> That's just a detail, but either I misunderstood you or you blamed the
> wrong side here.
>
> Non-ASCII data in request headers isn't a problem created by frameworks;
> it's a problem created by (possibly non-compliant) user-agents.
>

I had in mind this ticket: https://github.com/benoitc/gunicorn/issues/1151

As of today, because some applications still send responses in a
non-compliant way, we try to recode the headers on the server side so that
we can send them. Today, like Apache 2 (and, I think, nginx), we simply
ignore headers that can't be encoded in US-ASCII. If all applications and
frameworks gave us the headers as Latin-1 it wouldn't be a major problem,
but that's not the case.

>
> If future-WSGI guaranteed that HTTP header values provided in environ only
> contain ASCII, frameworks would be happy. Servers would likely have to
> respond 400 to requests containing non-ASCII headers, which would likely
> be considered a problematic backwards incompatibility. It would go against
> the IETF principle of being tolerant in what a system accepts.
>

We should also update the spec to reflect the latest changes in the HTTP
specs and require applications to send US-ASCII headers to the gateway.

>
> If future-WSGI provided header values as bytes, frameworks would be happy
> as well. That would be my preference, because the application is in the
> best position to pick a charset for decoding the values (that would be
> UTF-8 in general).
>
> If future-WSGI insists on decoding header values with an arbitrary
> encoding, I believe it should do so with UTF-8 rather than ISO-8859-1.
> "The server is decoding with ISO-8859-1 so the application can re-encode
> to get the raw bytes" never sounded like a compelling argument to me. It
> will still be wrong in theory, but it will generally give the right
> results in practice.
>

Hmm, but actually the HTTP spec insists that headers are neither UTF-8 nor
Latin-1 (ISO-8859-1), but US-ASCII:
https://github.com/benoitc/gunicorn/issues/1151#issuecomment-158884740

So native strings or bytes are both fine for me, as long as we make sure
that we are sending and receiving US-ASCII.

> Best regards,
>
> --
> Aymeric.
>
> PS: if you find Django trying to impose its own rules, I'll do my best to
> correct that. As far as I can speak for the Django team, this isn't our
> intent. Please flag such cases so we can make sure there's no
> misunderstanding. Thanks!
>

Thanks! I will if needed :)

- benoît
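
To make the ISO-8859-1 round trip discussed in the thread concrete, here is a minimal Python sketch. It assumes a PEP 3333-style server that decodes raw header bytes as ISO-8859-1 before putting them in environ; server_decode and app_recover are hypothetical names used only for illustration, not part of any server or framework API.

```python
def server_decode(raw: bytes) -> str:
    """Hypothetical server step: decode raw header bytes as ISO-8859-1,
    which maps each byte 0x00-0xFF to exactly one code point and never fails."""
    return raw.decode("iso-8859-1")


def app_recover(native: str, charset: str = "utf-8") -> str:
    """Hypothetical application step: re-encode with ISO-8859-1 to get the
    original bytes back, then decode them with the charset the app chooses."""
    return native.encode("iso-8859-1").decode(charset)


raw = "café".encode("utf-8")  # b'caf\xc3\xa9', as a user-agent might send it
native = server_decode(raw)
print(repr(native))           # 'cafÃ©' (mojibake, but lossless)
print(app_recover(native))    # café
```

The round trip is lossless, which is the usual argument for ISO-8859-1; the objection in the thread is that every application still has to undo it. A server decoding with UTF-8 directly would instead raise UnicodeDecodeError on bytes that are not valid UTF-8 unless it picks an error handler.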
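
The "ignore headers that can't be encoded in US-ASCII" policy described for gunicorn and Apache 2 could look roughly like this on the response side. This is a hypothetical helper, not gunicorn's actual code: encode_response_headers is an invented name, and it silently drops any header whose name or value is not pure ASCII rather than guessing a charset.

```python
from typing import Iterable, List, Tuple


def encode_response_headers(headers: Iterable[Tuple[str, str]]) -> List[bytes]:
    """Hypothetical server helper: serialize (name, value) response headers,
    skipping any header that cannot be encoded as US-ASCII."""
    lines: List[bytes] = []
    for name, value in headers:
        try:
            line = f"{name}: {value}\r\n".encode("ascii")
        except UnicodeEncodeError:
            # Drop the header instead of re-encoding it with a guessed charset.
            continue
        lines.append(line)
    return lines


headers = [
    ("Content-Type", "text/plain"),
    ("X-Greeting", "h\u00e9llo"),  # non-ASCII value: dropped
]
print(encode_response_headers(headers))  # [b'Content-Type: text/plain\r\n']
```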
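
The stricter option raised in the thread, answering 400 to requests that carry non-ASCII headers, can be sketched as WSGI middleware. It assumes environ values are native strings decoded as ISO-8859-1, so any byte above 0x7F shows up as a non-ASCII character; require_ascii_headers is a hypothetical name, and, as noted above, a real server adopting this would be making a backwards-incompatible choice.

```python
def require_ascii_headers(app):
    """Hypothetical WSGI middleware: reply 400 if any request header value
    (the HTTP_* keys in environ) contains a non-ASCII character."""
    def middleware(environ, start_response):
        for key, value in environ.items():
            # Under PEP 3333, bytes >= 0x80 arrive as code points >= U+0080.
            if key.startswith("HTTP_") and isinstance(value, str) and not value.isascii():
                body = b"Bad Request: non-ASCII header value\n"
                start_response("400 Bad Request", [
                    ("Content-Type", "text/plain"),
                    ("Content-Length", str(len(body))),
                ])
                return [body]
        return app(environ, start_response)
    return middleware


# Usage with a trivial application and a start_response that records the status.
def hello_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok"]

statuses = []
def record_status(status, headers):
    statuses.append(status)

app = require_ascii_headers(hello_app)
print(app({"HTTP_X_NAME": "plain"}, record_status))        # [b'ok']
print(app({"HTTP_X_NAME": "caf\xc3\xa9"}, record_status))  # [b'Bad Request: non-ASCII header value\n']
print(statuses)                                            # ['200 OK', '400 Bad Request']
```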
_______________________________________________
Web-SIG mailing list
Web-SIG@python.org
Web SIG: http://www.python.org/sigs/web-sig