Re: [Zope-dev] redirect burps on unicode URLs
Hi,

On 03/01/2010 05:04 PM, Adam GROSZER wrote:
> Hello Christian,
>
> Isn't it the case that anything below chr(128) converts to UTF-8 as the same character? That would mean that slash and ampersand stay as they are.

No. The spec says that if you want to use a reserved character as data (which characters are reserved depends on the scheme), you need to quote it.

> OTOH, encoding is done only on non-ASCII characters, supposing that the encoding is UTF-8, which is what's hardwired into absoluteURL.

But then again, it's not UTF-8 for all of the URL. No spec ever says "encode path elements as UTF-8".

Christian

-- 
Christian Theune · c...@gocept.com
gocept gmbh & co. kg · forsterstraße 29 · 06112 halle (saale) · germany
http://gocept.com · tel +49 345 1229889 0 · fax +49 345 1229889 1
Zope and Plone consulting and development

___
Zope-Dev maillist - Zope-Dev@zope.org
https://mail.zope.org/mailman/listinfo/zope-dev
** No cross posts or HTML encoding! **
(Related lists - https://mail.zope.org/mailman/listinfo/zope-announce https://mail.zope.org/mailman/listinfo/zope )
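Christian's distinction between encoding and quoting can be checked with a short Python 3 sketch (the thread's own code is Python 2, where `urllib.quote` plays the role of `urllib.parse.quote`). The segment value is an invented example: UTF-8 encoding leaves ASCII bytes such as `/` and `&` untouched, so only percent-quoting keeps them from being read as URL structure.

```python
from urllib.parse import quote

# An invented path segment whose *data* contains reserved characters.
segment = "R&D/2010 \u00fc"

# UTF-8 encoding leaves every ASCII byte unchanged, including "/" and "&".
encoded = segment.encode("utf-8")
print(encoded)  # b'R&D/2010 \xc3\xbc'

# Percent-quoting with an empty safe set also escapes the reserved
# characters, so the value can no longer be mistaken for two segments.
quoted = quote(segment, safe="")
print(quoted)  # R%26D%2F2010%20%C3%BC
```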
Re: [Zope-dev] redirect burps on unicode URLs
Hello Christian,

Isn't it the case that anything below chr(128) converts to UTF-8 as the same character? That would mean that slash and ampersand stay as they are.

OTOH, encoding is done only on non-ASCII characters, supposing that the encoding is UTF-8, which is what's hardwired into absoluteURL.

Monday, March 1, 2010, 4:40:30 PM, you wrote:

CT> On 03/01/2010 03:34 PM, Wichert Akkerman wrote:
>> On 3/1/10 15:09 , Christian Theune wrote:
>>> Hi,
>>>
>>> On 03/01/2010 02:28 PM, Martin Aspeli wrote:
>>>> I'm with Wichert here.
>>>>
>>>> In most places, we tend to carry around unicode strings internally, and only encode on the boundaries, e.g. when the URL is "rendered". I don't see why redirect() can't have a sensible and predictable policy for unicode strings, making life easier for everyone.
>>>>
>>>> If we think that non-ASCII URLs are illegal, then maybe we should validate for that and throw an error. However, I don't think that's the case (anymore?). In that case, passing a unicode object to the function seems entirely consistent with other places, e.g. when we pass unicode to the page template engine or return unicode from a view, which the publisher then encodes before it's pushed down to the client.
>>>
>>> I opened a question in another part of the thread, but haven't gotten an answer yet. In my understanding, a Unicode string is not able to represent the structural properties of a URL in the http scheme properly, so encoding back to ASCII is not possible.
>>>
>>> Can someone confirm or disprove this?
>>
>> I am not sure what you mean. On the wire you get a path component in an HTTP GET request which is UTF-8 encoded and escaped. For example
>> http://ja.wikipedia.org/wiki/%E3%83%A1%E3%82%A4%E3%83%B3%E3%83%9A%E3%83%BC%E3%82%B8
>> , which is a Japanese string if you decode it back to unicode. That encoding works fine in both directions, and all other properties used in the http scheme such as query strings and fragments work normally. Can you provide an example of something that might not work?

CT> The problem is that a URI has internal structure which looks to me like it can't be reconstructed properly once it has been decoded into a "regular" unicode string.

CT> E.g. reserved characters are probably decoded into their regular symbols (e.g. a slash embedded in a path component or ampersands used in query arguments), so escaping needs to be done (manually) before encoding.

CT> Also, some parts of a URI use other ways to encode symbols. Hostnames are supposed to be encoded as punycode, whereas the URI spec doesn't even say what character set unicode characters should be encoded in. That would be up to the application (e.g. our publisher, so that's manageable).

CT> I have the feeling that a fully correct roundtrip URI -> unicode string -> URI won't be possible, which may make it susceptible to interference from the outside.

CT> I still hope we can do better than doing nothing about it. I just think it's more complex than calling encode('something'). ;)

CT> Christian

-- 
Best regards,
 Adam GROSZER                            mailto:agros...@gmail.com
-- 
Quote of the day:
Reflect upon your present blessings - of which every man has many - not on your past misfortunes, of which all men have some.
 - Charles Dickens
Re: [Zope-dev] redirect burps on unicode URLs
Hello,

Thinking about the problem itself, the following comes to mind.

I guess most of our redirects are done with absoluteURL(). And what does absoluteURL do? It converts unicode object names to a URL, seemingly in a simple way. We then feed this URL to redirect().

The edge case that happened with loginform is when the URL does not come from absoluteURL.

My assumption is that doing in redirect what absoluteURL does should be OK (unless Tres finds this out of line with the RFC).

Excerpts from the source:

    class AbsoluteURL(BrowserView):
        implements(IAbsoluteURL)

        def __unicode__(self):
            return urllib.unquote(self.__str__()).decode('utf-8')

        ...

        def __str__(self):
            ...
            name = getattr(context, '__name__', None)
            ...
            if name:
                url += '/' + urllib.quote(name.encode('utf-8'), _safe)

            return url

Monday, March 1, 2010, 3:09:33 PM, you wrote:

CT> Hi,

CT> On 03/01/2010 02:28 PM, Martin Aspeli wrote:
>>
>> I'm with Wichert here.
>>
>> In most places, we tend to carry around unicode strings internally, and only encode on the boundaries, e.g. when the URL is "rendered". I don't see why redirect() can't have a sensible and predictable policy for unicode strings, making life easier for everyone.
>>
>> If we think that non-ASCII URLs are illegal, then maybe we should validate for that and throw an error. However, I don't think that's the case (anymore?). In that case, passing a unicode object to the function seems entirely consistent with other places, e.g. when we pass unicode to the page template engine or return unicode from a view, which the publisher then encodes before it's pushed down to the client.

CT> I opened a question in another part of the thread, but haven't gotten an answer yet. In my understanding, a Unicode string is not able to represent the structural properties of a URL in the http scheme properly, so encoding back to ASCII is not possible.

CT> Can someone confirm or disprove this?

CT> Christian

-- 
Best regards,
 Adam GROSZER                            mailto:agros...@gmail.com
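Adam's excerpt builds the path one name at a time, which is what lets it quote a `/` inside a name. A rough Python 3 rendering of that `__str__` logic follows; the function name `absolute_url` is hypothetical, and the `_SAFE = "@+"` value is an assumption, since the excerpt does not show what `_safe` contains.

```python
from urllib.parse import quote

# Assumption: characters left unquoted in each name. The real `_safe`
# used by AbsoluteURL is not shown in the excerpt.
_SAFE = "@+"

def absolute_url(base, names):
    """Hypothetical helper mimicking AbsoluteURL.__str__ for a list of names."""
    url = base
    for name in names:
        if name:  # like the excerpt, skip empty or missing names
            # Each name is UTF-8 encoded and quoted on its own, so a "/"
            # inside a name becomes %2F instead of a segment separator.
            url += "/" + quote(name.encode("utf-8"), _SAFE)
    return url

print(absolute_url("http://localhost", ["folder", "caf\u00e9", "a/b"]))
# http://localhost/folder/caf%C3%A9/a%2Fb
```

A redirect() that receives an already-assembled unicode URL presumably no longer knows where the name boundaries were, so it could only apply the non-ASCII half of this treatment.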
Re: [Zope-dev] redirect burps on unicode URLs
On 03/01/2010 03:34 PM, Wichert Akkerman wrote:
> On 3/1/10 15:09 , Christian Theune wrote:
>> Hi,
>>
>> On 03/01/2010 02:28 PM, Martin Aspeli wrote:
>>>
>>> I'm with Wichert here.
>>>
>>> In most places, we tend to carry around unicode strings internally, and only encode on the boundaries, e.g. when the URL is "rendered". I don't see why redirect() can't have a sensible and predictable policy for unicode strings, making life easier for everyone.
>>>
>>> If we think that non-ASCII URLs are illegal, then maybe we should validate for that and throw an error. However, I don't think that's the case (anymore?). In that case, passing a unicode object to the function seems entirely consistent with other places, e.g. when we pass unicode to the page template engine or return unicode from a view, which the publisher then encodes before it's pushed down to the client.
>>
>> I opened a question in another part of the thread, but haven't gotten an answer yet. In my understanding, a Unicode string is not able to represent the structural properties of a URL in the http scheme properly, so encoding back to ASCII is not possible.
>>
>> Can someone confirm or disprove this?
>
> I am not sure what you mean. On the wire you get a path component in an HTTP GET request which is UTF-8 encoded and escaped. For example
> http://ja.wikipedia.org/wiki/%E3%83%A1%E3%82%A4%E3%83%B3%E3%83%9A%E3%83%BC%E3%82%B8
> , which is a Japanese string if you decode it back to unicode. That encoding works fine in both directions, and all other properties used in the http scheme such as query strings and fragments work normally. Can you provide an example of something that might not work?

The problem is that a URI has internal structure which looks to me like it can't be reconstructed properly once it has been decoded into a "regular" unicode string.

E.g. reserved characters are probably decoded into their regular symbols (e.g. a slash embedded in a path component or ampersands used in query arguments), so escaping needs to be done (manually) before encoding.

Also, some parts of a URI use other ways to encode symbols. Hostnames are supposed to be encoded as punycode, whereas the URI spec doesn't even say what character set unicode characters should be encoded in. That would be up to the application (e.g. our publisher, so that's manageable).

I have the feeling that a fully correct roundtrip URI -> unicode string -> URI won't be possible, which may make it susceptible to interference from the outside.

I still hope we can do better than doing nothing about it. I just think it's more complex than calling encode('something'). ;)

Christian
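The roundtrip problem Christian describes can be reproduced with Python 3's `urllib.parse`; the paths and the `bücher.example` hostname are invented examples. Two distinct URIs decode to the same unicode string, and hostnames use IDNA (punycode) rather than percent-encoding.

```python
from urllib.parse import quote, unquote

# In the first URI "a/b" is ONE path segment whose data contains a slash;
# in the second URI there are two segments, "a" and "b".
one_segment = "/docs/a%2Fb"
two_segments = "/docs/a/b"

# Decoded into a "regular" string, both look identical ...
decoded = unquote(one_segment)
print(decoded == unquote(two_segments))  # True

# ... so re-encoding cannot tell which structure was meant.
print(quote(decoded))  # /docs/a/b

# The same happens with an escaped ampersand in a query string.
print(unquote("q=a%26b") == unquote("q=a&b"))  # True

# Hostnames are converted with IDNA/punycode, not percent-encoding.
print("b\u00fccher.example".encode("idna"))  # b'xn--bcher-kva.example'
```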
Re: [Zope-dev] redirect burps on unicode URLs
On 3/1/10 15:09 , Christian Theune wrote:
> Hi,
>
> On 03/01/2010 02:28 PM, Martin Aspeli wrote:
>>
>> I'm with Wichert here.
>>
>> In most places, we tend to carry around unicode strings internally, and only encode on the boundaries, e.g. when the URL is "rendered". I don't see why redirect() can't have a sensible and predictable policy for unicode strings, making life easier for everyone.
>>
>> If we think that non-ASCII URLs are illegal, then maybe we should validate for that and throw an error. However, I don't think that's the case (anymore?). In that case, passing a unicode object to the function seems entirely consistent with other places, e.g. when we pass unicode to the page template engine or return unicode from a view, which the publisher then encodes before it's pushed down to the client.
>
> I opened a question in another part of the thread, but haven't gotten an answer yet. In my understanding, a Unicode string is not able to represent the structural properties of a URL in the http scheme properly, so encoding back to ASCII is not possible.
>
> Can someone confirm or disprove this?

I am not sure what you mean. On the wire you get a path component in an HTTP GET request which is UTF-8 encoded and escaped. For example

http://ja.wikipedia.org/wiki/%E3%83%A1%E3%82%A4%E3%83%B3%E3%83%9A%E3%83%BC%E3%82%B8

, which is a Japanese string if you decode it back to unicode. That encoding works fine in both directions, and all other properties used in the http scheme such as query strings and fragments work normally. Can you provide an example of something that might not work?

Wichert.
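Wichert's example does roundtrip cleanly, as this Python 3 sketch with `urllib.parse` shows; the escaped path decodes to メインページ ("Main Page"). It works here because the title contains no reserved characters, which is exactly the case Christian's reply is concerned with.

```python
from urllib.parse import quote, unquote, urlsplit

url = ("http://ja.wikipedia.org/wiki/"
       "%E3%83%A1%E3%82%A4%E3%83%B3%E3%83%9A%E3%83%BC%E3%82%B8")

path = urlsplit(url).path
# Decode the escaped UTF-8 bytes after "/wiki/" back into a unicode title.
title = unquote(path[len("/wiki/"):], encoding="utf-8")
print(title)  # メインページ

# Quoting the title again (UTF-8, uppercase hex) reproduces the path exactly.
print("/wiki/" + quote(title) == path)  # True
```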
Re: [Zope-dev] redirect burps on unicode URLs
Hi,

On 03/01/2010 02:28 PM, Martin Aspeli wrote:
>
> I'm with Wichert here.
>
> In most places, we tend to carry around unicode strings internally, and only encode on the boundaries, e.g. when the URL is "rendered". I don't see why redirect() can't have a sensible and predictable policy for unicode strings, making life easier for everyone.
>
> If we think that non-ASCII URLs are illegal, then maybe we should validate for that and throw an error. However, I don't think that's the case (anymore?). In that case, passing a unicode object to the function seems entirely consistent with other places, e.g. when we pass unicode to the page template engine or return unicode from a view, which the publisher then encodes before it's pushed down to the client.

I opened a question in another part of the thread, but haven't gotten an answer yet. In my understanding, a Unicode string is not able to represent the structural properties of a URL in the http scheme properly, so encoding back to ASCII is not possible.

Can someone confirm or disprove this?

Christian
Re: [Zope-dev] redirect burps on unicode URLs
Wichert Akkerman wrote:
> On 3/1/10 13:41 , Tres Seaver wrote:
>> Marius Gedminas wrote:
>>> On Sun, Feb 28, 2010 at 05:05:51PM +0100, Wichert Akkerman wrote:
>>>> On 2010-2-26 18:25, Tres Seaver wrote:
>>>>> Wichert Akkerman wrote:
>>>>>> I see this as naming confusion. In this day and age every URL is effectively an IRI, and every modern browser treats them that way. If you look at http://jp.wikipedia.org/ you can see how well that works. I do not see why zope.publisher should not be able to support that transparently. Other systems such as Routes and repoze.bfg do.
>>>>> Browsers *display* what looks like unicode to the user, but they *pass* URL-encoded ASCII bytes to the server.
>>>> But why can't zope.publisher do that conversion? I don't see the point in requiring all the thousands of routines that call those functions to do that conversion when zope.publisher can easily do so itself.
>>> +1
>>>
>>> Just like zope.publisher converts Unicode strings returned by views into UTF-8 (or whatever encoding is negotiated via Accept-Charset), response.redirect() ought to Do The Right Thing with Unicode URLs or IRLs or whatever they're called.
>> - -1.
>
> --1 is the same as +1, but I suspect that is not what you meant.
>
>> Where is this "unicode URL" coming from? URLs generated from code should already be "correct".
>
> The only change is moving the point at which 'correct' changes from a unicode string to an escaped UTF-8 encoded string. That change can be made without breaking any backwards compatibility.

I'm with Wichert here.

In most places, we tend to carry around unicode strings internally, and only encode on the boundaries, e.g. when the URL is "rendered". I don't see why redirect() can't have a sensible and predictable policy for unicode strings, making life easier for everyone.

If we think that non-ASCII URLs are illegal, then maybe we should validate for that and throw an error. However, I don't think that's the case (anymore?). In that case, passing a unicode object to the function seems entirely consistent with other places, e.g. when we pass unicode to the page template engine or return unicode from a view, which the publisher then encodes before it's pushed down to the client.

Martin

-- 
Author of `Professional Plone Development`, a book for developers who want to work with Plone. See http://martinaspeli.net/plone-book
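Martin's fallback, validating and rejecting non-ASCII URLs instead of converting them, could look roughly like this Python 3 sketch. The function name `ensure_ascii_url` and the choice of `ValueError` are hypothetical; this is not an existing zope.publisher API.

```python
def ensure_ascii_url(url):
    """Hypothetical guard: accept only URLs that are already plain ASCII."""
    try:
        url.encode("ascii")
    except UnicodeEncodeError:
        # Refuse instead of guessing an encoding on the caller's behalf.
        raise ValueError(
            "URL must be ASCII (percent-encode it first): %r" % (url,)
        ) from None
    return url

print(ensure_ascii_url("http://example.com/caf%C3%A9"))  # http://example.com/caf%C3%A9
try:
    ensure_ascii_url("http://example.com/caf\u00e9")
except ValueError as exc:
    print("rejected:", exc)
```

Such a check keeps the current "URLs must already be correct" contract explicit, at the cost of pushing the conversion back onto every caller.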
Re: [Zope-dev] redirect burps on unicode URLs
On 3/1/10 13:41 , Tres Seaver wrote:
> Marius Gedminas wrote:
>> On Sun, Feb 28, 2010 at 05:05:51PM +0100, Wichert Akkerman wrote:
>>> On 2010-2-26 18:25, Tres Seaver wrote:
>>>> Wichert Akkerman wrote:
>>>>> I see this as naming confusion. In this day and age every URL is effectively an IRI, and every modern browser treats them that way. If you look at http://jp.wikipedia.org/ you can see how well that works. I do not see why zope.publisher should not be able to support that transparently. Other systems such as Routes and repoze.bfg do.
>>>> Browsers *display* what looks like unicode to the user, but they *pass* URL-encoded ASCII bytes to the server.
>>> But why can't zope.publisher do that conversion? I don't see the point in requiring all the thousands of routines that call those functions to do that conversion when zope.publisher can easily do so itself.
>>
>> +1
>>
>> Just like zope.publisher converts Unicode strings returned by views into UTF-8 (or whatever encoding is negotiated via Accept-Charset), response.redirect() ought to Do The Right Thing with Unicode URLs or IRLs or whatever they're called.
>
> - -1.

--1 is the same as +1, but I suspect that is not what you meant.

> Where is this "unicode URL" coming from? URLs generated from code
> should already be "correct".

The only change is moving the point at which 'correct' changes from a unicode string to an escaped UTF-8 encoded string. That change can be made without breaking any backwards compatibility.

Wichert.
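Wichert's proposal amounts to an IRI-to-URI mapping at the boundary, in the spirit of RFC 3987 section 3.1: percent-encode the UTF-8 bytes of non-ASCII characters and leave ASCII alone. The Python 3 sketch below is hypothetical (`iri_to_uri` is not a zope.publisher function) and assumes no `user:password@` part and no IPv6 literal in the host. Because ASCII characters, including `%` and the reserved delimiters, are left untouched, an already-escaped URL passes through unchanged, which is the backwards-compatibility property Wichert mentions.

```python
from urllib.parse import quote, urlsplit, urlunsplit

# ASCII characters that must survive untouched: RFC 3986 reserved characters
# used as delimiters, plus "%" so that existing escapes are not re-escaped.
_KEEP = "!$&'()*+,/:;=?@%~"

def _encode_host(netloc):
    # Assumption: no "user:password@" and no IPv6 literal in the netloc.
    host, sep, port = netloc.partition(":")
    return host.encode("idna").decode("ascii") + sep + port

def iri_to_uri(iri):
    """Hypothetical helper: map a unicode IRI to an ASCII URI."""
    parts = urlsplit(iri)
    return urlunsplit((
        parts.scheme,
        _encode_host(parts.netloc),
        quote(parts.path, safe=_KEEP),      # non-ASCII -> %XX of UTF-8 bytes
        quote(parts.query, safe=_KEEP),
        quote(parts.fragment, safe=_KEEP),
    ))

print(iri_to_uri("http://b\u00fccher.example/caf\u00e9?q=\u00fc#s\u00fcd"))
# http://xn--bcher-kva.example/caf%C3%A9?q=%C3%BC#s%C3%BCd
print(iri_to_uri("http://example.com/a%20b?x=1"))
# http://example.com/a%20b?x=1
```

This does not address Christian's concern: a literal `/` or `&` that was meant as data is still treated as a delimiter, and a stray `%` not followed by two hex digits is passed through as-is.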
Re: [Zope-dev] redirect burps on unicode URLs
Marius Gedminas wrote:
> On Sun, Feb 28, 2010 at 05:05:51PM +0100, Wichert Akkerman wrote:
>> On 2010-2-26 18:25, Tres Seaver wrote:
>>> Wichert Akkerman wrote:
>>>> I see this as naming confusion. In this day and age every URL is effectively an IRI, and every modern browser treats them that way. If you look at http://jp.wikipedia.org/ you can see how well that works. I do not see why zope.publisher should not be able to support that transparently. Other systems such as Routes and repoze.bfg do.
>>> Browsers *display* what looks like unicode to the user, but they *pass* URL-encoded ASCII bytes to the server.
>> But why can't zope.publisher do that conversion? I don't see the point in requiring all the thousands of routines that call those functions to do that conversion when zope.publisher can easily do so itself.
>
> +1
>
> Just like zope.publisher converts Unicode strings returned by views into UTF-8 (or whatever encoding is negotiated via Accept-Charset), response.redirect() ought to Do The Right Thing with Unicode URLs or IRLs or whatever they're called.

- -1.

Where is this "unicode URL" coming from? URLs generated from code should already be "correct".

Tres.
-- 
===
Tres Seaver          +1 540-429-0999          tsea...@palladion.com
Palladion Software   "Excellence by Design"    http://palladion.com
[Zope-dev] Zope Tests: 6 OK
Summary of messages to the zope-tests list.
Period Sun Feb 28 12:00:00 2010 UTC to Mon Mar 1 12:00:00 2010 UTC.
There were 6 messages: 6 from Zope Tests.

Tests passed OK
---

Subject: OK : Zope-2.10 Python-2.4.6 : Linux
From: Zope Tests
Date: Sun Feb 28 20:38:20 EST 2010
URL: http://mail.zope.org/pipermail/zope-tests/2010-February/013653.html

Subject: OK : Zope-2.11 Python-2.4.6 : Linux
From: Zope Tests
Date: Sun Feb 28 20:40:20 EST 2010
URL: http://mail.zope.org/pipermail/zope-tests/2010-February/013654.html

Subject: OK : Zope-2.12 Python-2.6.4 : Linux
From: Zope Tests
Date: Sun Feb 28 20:42:20 EST 2010
URL: http://mail.zope.org/pipermail/zope-tests/2010-February/013655.html

Subject: OK : Zope-2.12-alltests Python-2.6.4 : Linux
From: Zope Tests
Date: Sun Feb 28 20:44:20 EST 2010
URL: http://mail.zope.org/pipermail/zope-tests/2010-February/013656.html

Subject: OK : Zope-trunk Python-2.6.4 : Linux
From: Zope Tests
Date: Sun Feb 28 20:46:20 EST 2010
URL: http://mail.zope.org/pipermail/zope-tests/2010-February/013657.html

Subject: OK : Zope-trunk-alltests Python-2.6.4 : Linux
From: Zope Tests
Date: Sun Feb 28 20:48:20 EST 2010
URL: http://mail.zope.org/pipermail/zope-tests/2010-February/013658.html
Re: [Zope-dev] redirect burps on unicode URLs
On Sun, Feb 28, 2010 at 05:05:51PM +0100, Wichert Akkerman wrote:
> On 2010-2-26 18:25, Tres Seaver wrote:
> > Wichert Akkerman wrote:
> >> I see this as naming confusion. In this day and age every URL is effectively an IRI, and every modern browser treats them that way. If you look at http://jp.wikipedia.org/ you can see how well that works. I do not see why zope.publisher should not be able to support that transparently. Other systems such as Routes and repoze.bfg do.
> >
> > Browsers *display* what looks like unicode to the user, but they *pass* URL-encoded ASCII bytes to the server.
>
> But why can't zope.publisher do that conversion? I don't see the point in requiring all the thousands of routines that call those functions to do that conversion when zope.publisher can easily do so itself.

+1

Just like zope.publisher converts Unicode strings returned by views into UTF-8 (or whatever encoding is negotiated via Accept-Charset), response.redirect() ought to Do The Right Thing with Unicode URLs or IRLs or whatever they're called.

Marius Gedminas
-- 
http://pov.lt/ -- Zope 3 consulting and development