Re: [Zope-CMF] Charsets
Charlie Clark writes:
> On 25.01.2009 at 08:35, Dieter Maurer wrote:
>> Of course, "iso-8859-1" may not be appropriate for form delivery --
>> and may result in funny special characters in non-western countries.
>
> As Daniel noted, UTF-8 should be the default. I had a quick look at
> the source of the appropriate module and couldn't see where the broken
> "magic" was happening. It's probably a bit beyond me, but is the first
> step to write a test that we know currently breaks?

Apparently, a fix for the problem I was talking about was attempted in
zope.publisher, but not propagated to Zope 2, as per Tres' comment from
2008-06-16 on this (duplicate?) bug report. There are also some working
patches for Zope 2 here:

  https://bugs.launchpad.net/zope2/+bug/143873

I have no clue about the problem with character encodings used to serve
pages, nor am I sure if it's related at all.

Daniel

--
http://danielnouri.org
___
Zope-CMF maillist  -  Zope-CMF@lists.zope.org
http://mail.zope.org/mailman/listinfo/zope-cmf

See https://bugs.launchpad.net/zope-cmf/ for bug reports and feature requests
Re: [Zope-CMF] Charsets
On 25.01.2009 at 08:35, Dieter Maurer wrote:
> Wow. Some magic in "formlib" deviating from the Zope2 standard
> behaviour

Maybe. But then formlib is not really a standard Zope 2 approach. The
one thing I do find weird is that getPreferredCharsets() is called for
each field in a form.

> But, if this is true, we do not understand Charlie's observations:
>
> If I understood him correctly, he is using formlib and he is observing
> problems with the charsets.
>
> He found out that this has to do with IE browsers sending an
> empty "Accept-Charset" header, which is turned by Zope's
> "preferredCharset" into "iso-8859-1".
>
> But when the same charset is used on both form delivery and
> on form processing, he should not see a problem with mismatched
> encodings.

Actually, the problems occur as soon as you use different browsers with
non-ASCII text.

> Of course, "iso-8859-1" may not be appropriate for form delivery --
> and may result in funny special characters in non-western countries.

As Daniel noted, UTF-8 should be the default. I had a quick look at the
source of the appropriate module and couldn't see where the broken
"magic" was happening. It's probably a bit beyond me, but is the first
step to write a test that we know currently breaks?

Charlie
--
Charlie Clark
Helmholtzstr. 20
Düsseldorf
D-40215
Tel: +49-211-938-5360
GSM: +49-178-782-6226
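A first regression test could look roughly like the sketch below. It is a sketch under stated assumptions only: `negotiate_charset` is a hypothetical stand-in for whatever chooses the form-decoding charset (in Zope 2 that is the IUserPreferredCharsets adapter used via Products.Five.browser.decode, which is not called here), and it already contains the desired behaviour of treating an empty Accept-Charset header as utf-8. Pointing the first test at the real code path instead is what would make it fail today.

```python
import unittest

# Two Chinese characters, as in the IE 6 reproduction described in the thread.
SAMPLE = '\u4e2d\u6587'


def negotiate_charset(accept_charset):
    """Hypothetical stand-in for the charset negotiation under test.

    An absent (None) or empty Accept-Charset header, as sent by IE 6/7
    and Safari, should yield utf-8 rather than iso-8859-1.
    """
    if not accept_charset or not accept_charset.strip():
        return 'utf-8'
    # Otherwise take the first listed charset, ignoring any ;q= parameter.
    first = accept_charset.split(',')[0].split(';')[0].strip().lower()
    return first or 'utf-8'


class EmptyAcceptCharsetTest(unittest.TestCase):

    def test_empty_header_round_trips_utf8_form_data(self):
        # The browser submitted UTF-8 but sent an empty Accept-Charset.
        submitted = SAMPLE.encode('utf-8')
        charset = negotiate_charset('')
        self.assertEqual(submitted.decode(charset), SAMPLE)

    def test_iso_8859_1_garbles_utf8_form_data(self):
        # Documents the reported failure mode: decoding UTF-8 bytes as
        # ISO-8859-1 does not raise, it silently produces the wrong text.
        submitted = SAMPLE.encode('utf-8')
        self.assertNotEqual(submitted.decode('iso-8859-1'), SAMPLE)
```

Saved as a module, it can be run with `python -m unittest`.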
Re: [Zope-CMF] Charsets
yuppie wrote at 2009-1-19 22:54 +0100:
> ...
>Products.Five.browser.decode.setPageEncoding sets the response content
>type charset based on zope.publisher.http.HTTPCharsets. And
>setPageEncoding is called by the update method of formlib forms in
>Zope 2. So in this case the response encoding has something to do with
>the "Accept-Charset" request header.

Wow. Some magic in "formlib" deviating from the Zope2 standard
behaviour.

But, if this is true, we do not understand Charlie's observations:

If I understood him correctly, he is using formlib and he is observing
problems with the charsets.

He found out that this has to do with IE browsers sending an
empty "Accept-Charset" header, which is turned by Zope's
"preferredCharset" into "iso-8859-1".

But when the same charset is used on both form delivery and
on form processing, he should not see a problem with mismatched
encodings.

Of course, "iso-8859-1" may not be appropriate for form delivery --
and may result in funny special characters in non-western countries.

--
Dieter
Re: [Zope-CMF] Charsets
On 21.01.2009 at 00:11, Daniel Nouri wrote:
> Is this thread by any chance related to this bug:
> https://bugs.launchpad.net/zope2/+bug/160968
>
> The IUserPreferredCharsets implementation of Zope 3 found in
> zope.publisher.http.HTTPCharsets has the following condition in it
> to check if the HTTP_ACCEPT_CHARSET header is available:
>
>     header_present = 'HTTP_ACCEPT_CHARSET' in self.request
>
> However, Zope 2's request will return '' (the empty string) for
> any header that starts with 'HTTP_'; see
> ZPublisher.HTTPRequest.HTTPRequest.get.
>
> Ultimately, this causes HTTPCharsets.getPreferredCharsets to
> return ['iso-8859-1'], where it should really return 'UTF-8'.
>
> To understand this problem better, look at
> Products.Five.browser.decode.processInputs, which uses the
> negotiator to find out which charset to use to convert form
> variables. For browsers that do not send the 'HTTP_ACCEPT_CHARSET'
> header, this will result in wrongly encoded form values. To
> reproduce this, enter Chinese characters into any Five formlib form
> with Internet Explorer 6.0. Since Firefox sends HTTP_ACCEPT_CHARSET,
> it's not a problem there.

Yes Daniel, this is exactly the problem we're facing. So we need to fix
Zope >= 2.10, and then we shouldn't have to worry about doing anything
to the CMF.

Charlie
Re: [Zope-CMF] Charsets
Is this thread by any chance related to this bug:

  https://bugs.launchpad.net/zope2/+bug/160968

The IUserPreferredCharsets implementation of Zope 3 found in
zope.publisher.http.HTTPCharsets has the following condition in it to
check if the HTTP_ACCEPT_CHARSET header is available:

    header_present = 'HTTP_ACCEPT_CHARSET' in self.request

However, Zope 2's request will return '' (the empty string) for any
header that starts with 'HTTP_'; see
ZPublisher.HTTPRequest.HTTPRequest.get.

Ultimately, this causes HTTPCharsets.getPreferredCharsets to return
['iso-8859-1'], where it should really return 'UTF-8'.

To understand this problem better, look at
Products.Five.browser.decode.processInputs, which uses the negotiator
to find out which charset to use to convert form variables. For
browsers that do not send the 'HTTP_ACCEPT_CHARSET' header, this will
result in wrongly encoded form values. To reproduce this, enter Chinese
characters into any Five formlib form with Internet Explorer 6.0. Since
Firefox sends HTTP_ACCEPT_CHARSET, it's not a problem there.

--
http://danielnouri.org
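The difference between "is the key present?" and "does the header have a value?" can be shown with a small stand-in. `Zope2StyleRequest` below is hypothetical: it only imitates the behaviour described above (missing `HTTP_*` keys come back as ''), it is not ZPublisher's class, and it assumes membership tests are answered via `get()`, as the description implies. `header_present_robust` is likewise a hypothetical helper, not zope.publisher code.

```python
class Zope2StyleRequest(object):
    """Hypothetical stand-in, not ZPublisher's HTTPRequest.

    Imitates the described behaviour: get() returns '' for any missing
    key starting with 'HTTP_', and membership is answered via get().
    """

    def __init__(self, environ):
        self.environ = dict(environ)

    def get(self, key, default=None):
        if key in self.environ:
            return self.environ[key]
        if key.startswith('HTTP_'):
            return ''
        return default

    def __contains__(self, key):
        return self.get(key) is not None


def header_present_naive(request):
    # The condition quoted from zope.publisher.http.HTTPCharsets.
    return 'HTTP_ACCEPT_CHARSET' in request


def header_present_robust(request):
    # Treat "not sent" and "sent but empty" (IE 6/7, Safari) the same way.
    return bool(request.get('HTTP_ACCEPT_CHARSET', '').strip())


ie6 = Zope2StyleRequest({})  # no Accept-Charset header, as Daniel describes
firefox = Zope2StyleRequest({'HTTP_ACCEPT_CHARSET': 'iso-8859-1, utf-8, *;q=0.1'})

print(header_present_naive(ie6))       # True, although nothing was sent
print(header_present_robust(ie6))      # False
print(header_present_robust(firefox))  # True
```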
Re: [Zope-CMF] Charsets
Dieter Maurer wrote:
> yuppie wrote at 2009-1-19 11:32 +0100:
>> Charlie Clark wrote:
>>> On 18.01.2009 at 23:00, yuppie wrote:
>>> I agree that it shouldn't be implemented in a different way than
>>> for Zope 3. And if we can solve the problems by fixing form encoding
>>> I'm happy. Although I'd like to see UTF-8 always the first charset
>>> returned if the quality is the same.
>> zope.publisher.http.HTTPCharsets explicitly prefers utf-8. Are you sure
>> getPreferredCharsets()[0] is iso-8859-1 with your browser?
>
> This might be true for the Zope 3 publisher; however, the Zope 2
> "HTTPResponse" uses "default_encoding" (configured in "zope.conf")
> unless an encoding is prescribed by the response content type -- and
> this has nothing to do with the "Accept-Charset" request header.

Products.Five.browser.decode.setPageEncoding sets the response content
type charset based on zope.publisher.http.HTTPCharsets. And
setPageEncoding is called by the update method of formlib forms in
Zope 2. So in this case the response encoding has something to do with
the "Accept-Charset" request header.

Cheers,

  Yuppie
Re: [Zope-CMF] Charsets
Charlie Clark wrote at 2009-1-18 22:30 +0100:
>On 18.01.2009 at 20:36, Dieter Maurer wrote:
> ...
> From the current HTML specification:
>
>"accept-charset = charset list [CI]
>This attribute specifies the list of character encodings for input
>data that is accepted by the server processing this form. The value is
>a space- and/or comma-delimited list of charset values. The client
>must interpret this list as an exclusive-or list, i.e., the server is
>able to accept any single character encoding per entity received."
>
>i.e. exactly as you have suggested: it is possible to force a client to
>encode data in a particular charset before sending it to the server.
>All references I have come across suggest that this, together with the
>meta tag content-type, can and should be used to coerce browsers to use
>UTF-8.

I fear that the "accept-charset" form attribute can only be used
reliably with "method=post enctype=multipart/form-data", as only then
does the browser have a chance to specify how it has encoded the value.
I am not sure whether Zope handles the "charset" information correctly
in this case.

As the "Accept-Charset" request header has (almost) nothing to do with
the "accept-charset" form attribute, it must of course not be used to
interpret form data, even when this data was created based on an
"accept-charset".

If the server chooses its output encoding based on the "Accept-Charset"
request header (and Yuppie indicated that the Zope 3 publisher does
this), then the same algorithm can be used for "normal" form data
(where "normal" means you do not explicitly specify an "accept-charset"
form attribute). That's one sensible mode of operation.

Another one is choosing a fixed encoding and using it as both the input
and the output encoding.

--
Dieter
Re: [Zope-CMF] Charsets
yuppie wrote at 2009-1-19 11:32 +0100:
>Charlie Clark wrote:
>> On 18.01.2009 at 23:00, yuppie wrote:
>> I agree that it shouldn't be implemented in a different way than
>> for Zope 3. And if we can solve the problems by fixing form encoding
>> I'm happy. Although I'd like to see UTF-8 always the first charset
>> returned if the quality is the same.
>
>zope.publisher.http.HTTPCharsets explicitly prefers utf-8. Are you sure
>getPreferredCharsets()[0] is iso-8859-1 with your browser?

This might be true for the Zope 3 publisher; however, the Zope 2
"HTTPResponse" uses "default_encoding" (configured in "zope.conf")
unless an encoding is prescribed by the response content type -- and
this has nothing to do with the "Accept-Charset" request header.

--
Dieter
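The rule described here (an explicit charset on the response Content-Type wins, otherwise the configured default applies) can be sketched in a few lines. This is not Zope's HTTPResponse code: `response_charset` is a hypothetical helper, `default_encoding` merely stands in for the zope.conf setting, and the stdlib `email.message.Message` is used only to parse the header parameters. When setPageEncoding() has put a charset on the Content-Type, as yuppie describes, that charset is what this rule picks.

```python
from email.message import Message


def response_charset(content_type, default_encoding='utf-8'):
    """Pick the charset used to encode a response body (hypothetical helper).

    An explicit charset parameter on the Content-Type wins; otherwise fall
    back to default_encoding, standing in for the zope.conf setting.
    """
    msg = Message()
    if content_type:
        msg['Content-Type'] = content_type
    # get_content_charset() lower-cases the value and returns failobj when
    # there is no charset parameter (or no Content-Type header at all).
    return msg.get_content_charset(failobj=default_encoding)


print(response_charset('text/html; charset=ISO-8859-1'))  # iso-8859-1
print(response_charset('text/html'))                      # utf-8
```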
Re: [Zope-CMF] Charsets
On 19.01.2009 at 11:32, yuppie wrote:
> zope.publisher.http.HTTPCharsets explicitly prefers utf-8. Are you sure
> getPreferredCharsets()[0] is iso-8859-1 with your browser? Or do you
> override the Content-Type header set by setPageEncoding() somewhere?
> AFAICS CMFDefault works exactly the way you expect it to.

No, I don't override anything. I'll run some tests and post the results.

Charlie
Re: [Zope-CMF] Charsets
Charlie Clark wrote:
> On 18.01.2009 at 23:00, yuppie wrote:
> I agree that it shouldn't be implemented in a different way than
> for Zope 3. And if we can solve the problems by fixing form encoding
> I'm happy. Although I'd like to see UTF-8 always the first charset
> returned if the quality is the same.

zope.publisher.http.HTTPCharsets explicitly prefers utf-8. Are you sure
getPreferredCharsets()[0] is iso-8859-1 with your browser? Or do you
override the Content-Type header set by setPageEncoding() somewhere?
AFAICS CMFDefault works exactly the way you expect it to.

Cheers,

  Yuppie
Re: [Zope-CMF] Charsets
On 18.01.2009 at 23:00, yuppie wrote:
> Hi Charlie!

Hiya Yuppie,

> Charlie Clark wrote:
>> On 29.12.2008 at 15:01, Charlie Clark wrote:
>>
>> CMFDefault.utils
>>
>> def getBrowserCharset(request):
>>     """ Get charset preferred by the browser.
>>     """
>>     envadapter = IUserPreferredCharsets(request)
>>     charsets = envadapter.getPreferredCharsets() or ['utf-8']
>>     return charsets[0]
>>
>> This will always be iso-8859-1 for Opera and Firefox because all
>> charsets have the same quality, again even if UTF-8 encoding is
>> specified.
>
> getBrowserCharset does almost the same as
> zope.publisher.http.getCharsetUsingRequest. And it is only used for
> encoding and decoding 'portal_status_message'. It is not relevant for
> the issue you noticed.

Okay.

>> I haven't been able to track where the decoding of form
>> data occurs for Zope 2 stuff but I can identify the problem in
>> zpublisher.browser.BrowserRequest
>
> You mean zope.publisher.browser.BrowserRequest. The Zope 2 version is
> in Products.Five.browser.decode.

Thanks, I thought it must have been in Five but didn't know where to
look.

> AFAICS the fallback to other charsets is usually not required in
> Zope 3. If the publisher encodes responses using
> zope.publisher.http.getCharsetUsingRequest, the first charset will be
> the right one.

That seems reasonable.

>> I would suggest that we work towards enforcing UTF-8 where possible,
>> but at the very least add the accept-charset attribute to forms and
>> use the portal's default_charset for this.
>>
>> I'd very much appreciate your comments on this.
>
> I can't see a need to implement this in a different way than Zope 3.
> So I propose to fix the encoding of forms sent to the browser.

I agree that it shouldn't be implemented in a different way than for
Zope 3. And if we can solve the problems by fixing form encoding, I'm
happy. Although I'd like to see UTF-8 always be the first charset
returned if the quality is the same.

One thing that did strike me when working on this is quite how often
getPreferredCharsets() is called on a single request.

Charlie
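On how often getPreferredCharsets() runs per request: the BrowserRequest._decode code quoted in this thread already avoids repeating the negotiation by storing the result in self.charsets. The same idea as a standalone helper is sketched below, as an illustration only; `cached_preferred_charsets`, the attribute name and `DummyRequest` are all hypothetical, and real code would more likely keep the value in the request's own storage.

```python
# Hypothetical attribute name used to store the result on the request.
_CACHE_ATTR = '_cached_preferred_charsets'


def cached_preferred_charsets(request, negotiate):
    """Run the charset negotiation at most once per request object.

    negotiate is any callable taking the request and returning a list of
    charsets, e.g. a wrapper around IUserPreferredCharsets(request).
    """
    cached = getattr(request, _CACHE_ATTR, None)
    if cached is None:
        cached = list(negotiate(request))
        setattr(request, _CACHE_ATTR, cached)
    return cached


class DummyRequest(object):
    """Minimal stand-in for a request object."""


calls = []


def negotiate(request):
    # Records each call so the caching effect is visible.
    calls.append(request)
    return ['utf-8', 'iso-8859-1']


request = DummyRequest()
for field in ('title', 'description', 'text'):
    cached_preferred_charsets(request, negotiate)
print(len(calls))  # 1: negotiated once for three fields
```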
Re: [Zope-CMF] Charsets
Hi Charlie!

Charlie Clark wrote:
> On 29.12.2008 at 15:01, Charlie Clark wrote:
>
> CMFDefault.utils
>
> def getBrowserCharset(request):
>     """ Get charset preferred by the browser.
>     """
>     envadapter = IUserPreferredCharsets(request)
>     charsets = envadapter.getPreferredCharsets() or ['utf-8']
>     return charsets[0]
>
> This will always be iso-8859-1 for Opera and Firefox because all
> charsets have the same quality, again even if UTF-8 encoding is
> specified.

getBrowserCharset does almost the same as
zope.publisher.http.getCharsetUsingRequest. And it is only used for
encoding and decoding 'portal_status_message'. It is not relevant for
the issue you noticed.

> I haven't been able to track where the decoding of form
> data occurs for Zope 2 stuff but I can identify the problem in
> zpublisher.browser.BrowserRequest

You mean zope.publisher.browser.BrowserRequest. The Zope 2 version is in
Products.Five.browser.decode.

> def _decode(self, text):
>     """Try to decode the text using one of the available charsets."""
>     if self.charsets is None:
>         envadapter = IUserPreferredCharsets(self)
>         self.charsets = envadapter.getPreferredCharsets() or ['utf-8']
>     for charset in self.charsets:
>         try:
>             text = unicode(text, charset)
>             break
>         except UnicodeError:
>             pass
>     return text
>
> Here the naive assumption is that if we can decode from a charset
> without an error, then we have the correct charset. Sometimes this goes
> unnoticed, but with characters like u2013 and u2014 (en dash and em
> dash) it will raise errors, as those code points are not in the Latin-1
> charset, although cp1252 has its own equivalents.

AFAICS the fallback to other charsets is usually not required in Zope 3.
If the publisher encodes responses using
zope.publisher.http.getCharsetUsingRequest, the first charset will be
the right one.

> I would suggest that we work towards enforcing UTF-8 where possible,
> but at the very least add the accept-charset attribute to forms and
> use the portal's default_charset for this.
>
> I'd very much appreciate your comments on this.

I can't see a need to implement this in a different way than Zope 3. So
I propose to fix the encoding of forms sent to the browser.

Cheers,

  Yuppie
Re: [Zope-CMF] Charsets
On 18.01.2009 at 20:36, Dieter Maurer wrote:
> The "Accept-Charset" request header should *never* be used
> to guess a charset at the server side:
>
> "Accept-Charset" is a user preference which does not know
> anything about charsets used by the server.
>
> If "utf-8" were not treated with preference in the
> current code, the code base would see massive problems.
>
> Only the server knows which charsets it is using -- and it should
> use a single one (with very few exceptions).
> There should be a configuration option that tells this charset
> and this should be used to decode form data.

Dieter, I very much appreciate that your knowledge of both the
specifications and, more particularly, Zope internals is greater than
mine. I am, however, not suggesting that accept-charset be used more
than it already is by Zope, for precisely the reasons you give.

From the current HTML specification:

"accept-charset = charset list [CI]
This attribute specifies the list of character encodings for input
data that is accepted by the server processing this form. The value is
a space- and/or comma-delimited list of charset values. The client
must interpret this list as an exclusive-or list, i.e., the server is
able to accept any single character encoding per entity received."

i.e. exactly as you have suggested: it is possible to force a client to
encode data in a particular charset before sending it to the server.
All references I have come across suggest that this, together with the
meta tag content-type, can and should be used to coerce browsers to use
UTF-8.

On the other hand, whenever CMFDefault.utils.decode is called, the
extremely unreliable getBrowserCharset() is used, which will usually
return iso-8859-1. It is probably down to the way I have set my site
up, but I currently have problems as a result of this when using
different browsers unless I override the default adapter.

Regarding my current configuration:

  default-zpublisher-encoding = utf-8
  default-charset = utf-8

All content objects are edited through formlib-derived forms and data
is stored as unicode. With a default CMF install I have not been able
to work with non-ASCII strings across OS and browser boundaries. If
possible, I will try to create test cases that demonstrate the
problems.

Charlie
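Charlie's proposal of declaring the portal's charset on every form can be sketched as below. `render_form_page` and its markup are hypothetical; this is not how CMFDefault's formlib templates are built. The point is only that the Content-Type header, the meta tag and the form's accept-charset attribute all name one and the same charset. As Dieter points out in his reply, accept-charset on its own is not enough, so the server should still decode submissions with that same fixed charset.

```python
from html import escape


def render_form_page(action, site_charset='utf-8'):
    """Return (headers, body) for a page whose form declares the site charset.

    The same charset appears in the Content-Type header, the meta tag and
    the form's accept-charset attribute, so browsers get one consistent hint.
    """
    charset = escape(site_charset, quote=True)
    headers = {'Content-Type': 'text/html; charset=%s' % site_charset}
    body = (
        '<html><head>'
        '<meta http-equiv="Content-Type" content="text/html; charset=%s">'
        '</head><body>'
        '<form action="%s" method="post" accept-charset="%s">'
        '<input type="text" name="title">'
        '<input type="submit" value="Save">'
        '</form>'
        '</body></html>'
    ) % (charset, escape(action, quote=True), charset)
    return headers, body


headers, body = render_form_page('/edit')
print(headers['Content-Type'])  # text/html; charset=utf-8
```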
Re: [Zope-CMF] Charsets
Charlie Clark wrote at 2009-1-18 15:49 +0100:
> ...
>I would suggest that we work towards enforcing UTF-8 where possible,
>but at the very least add the accept-charset attribute to forms and
>use the portal's default_charset for this.
>
>I'd very much appreciate your comments on this.

The "Accept-Charset" request header should *never* be used
to guess a charset at the server side:

"Accept-Charset" is a user preference which does not know
anything about charsets used by the server.

If "utf-8" were not treated with preference in the
current code, the code base would see massive problems.

Only the server knows which charsets it is using -- and it should
use a single one (with very few exceptions).
There should be a configuration option that tells this charset
and this should be used to decode form data.

--
Dieter
Re: [Zope-CMF] Charsets
On 29.12.2008 at 15:01, Charlie Clark wrote:
>> The site should deliver all pages containing forms (if possible even
>> all pages) with a single charset, let's call it the "site charset".
>> Then it uses this same charset to interpret form data.
>
> While I understand this, I'm a bit at a loss as to why this is
> happening. I'm using forms based on CMFDefault's formlib
> implementation. Charsets are set for the site and zpublisher but
> something else is probably missing.

Delving deeper into this, I think I understand things a little better.

The accept-charset attribute on a form tag requires the browser to
encode any form data in the specified encoding. Ideally this would make
additional negotiation unnecessary, but this value isn't passed to the
server as HTTP_ACCEPT_CHARSET, which is where the fun starts.

As has been noted previously
(http://mail.zope.org/pipermail/zope3-dev/2004-June/011483.html),
browsers don't all behave themselves when setting this header: IE 6 + 7
and Safari send an empty header, whereas Opera and Firefox usually send
something like "iso-8859-1, utf-8, utf-16, *;q=0.1".

getPreferredCharsets() will return 'iso-8859-1' where
HTTP_ACCEPT_CHARSET is empty. But this will cause problems if the
browser is actually using UTF-8.

But the way the CMF uses getPreferredCharsets() isn't right either:

CMFDefault.utils

def getBrowserCharset(request):
    """ Get charset preferred by the browser.
    """
    envadapter = IUserPreferredCharsets(request)
    charsets = envadapter.getPreferredCharsets() or ['utf-8']
    return charsets[0]

This will always be iso-8859-1 for Opera and Firefox because all
charsets have the same quality, again even if UTF-8 encoding is
specified.

I haven't been able to track where the decoding of form data occurs
for Zope 2 stuff, but I can identify the problem in
zpublisher.browser.BrowserRequest:

def _decode(self, text):
    """Try to decode the text using one of the available charsets."""
    if self.charsets is None:
        envadapter = IUserPreferredCharsets(self)
        self.charsets = envadapter.getPreferredCharsets() or ['utf-8']
    for charset in self.charsets:
        try:
            text = unicode(text, charset)
            break
        except UnicodeError:
            pass
    return text

Here the naive assumption is that if we can decode from a charset
without an error, then we have the correct charset. Sometimes this goes
unnoticed, but with characters like u2013 and u2014 (en dash and em
dash) it will raise errors, as those code points are not in the Latin-1
charset, although cp1252 has its own equivalents.

I would suggest that we work towards enforcing UTF-8 where possible,
but at the very least add the accept-charset attribute to forms and
use the portal's default_charset for this.

I'd very much appreciate your comments on this.

Charlie
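The weakness of the `_decode` loop can be seen without Zope at all. `naive_decode` below is a hypothetical Python 3 transcription of that loop (bytes in, str out), not zope.publisher code. ISO-8859-1 assigns a character to every byte value, so decoding with it never raises: whenever it comes first in the list, the loop stops there, and cp1252 or UTF-8 input silently turns into the wrong characters. Any error only surfaces later, for example when the text is encoded again.

```python
def naive_decode(data, charsets):
    """Python 3 transcription of the quoted loop: the first charset that
    decodes without raising wins, even if it is the wrong one."""
    for charset in charsets:
        try:
            return data.decode(charset)
        except UnicodeError:
            pass
    return data  # like the original, give up and return the input unchanged


EN_DASH = '\u2013'

# cp1252 encodes the en dash as the single byte 0x96 ...
cp1252_bytes = EN_DASH.encode('cp1252')
# ... which ISO-8859-1 happily decodes to the C1 control character U+0096.
print(repr(naive_decode(cp1252_bytes, ['iso-8859-1', 'utf-8'])))  # '\x96'

# UTF-8 input is mis-decoded the same way, as three Latin-1 characters.
utf8_bytes = EN_DASH.encode('utf-8')
print(repr(naive_decode(utf8_bytes, ['iso-8859-1', 'utf-8'])))  # 'â\x80\x93'

# Only the charset that was actually used gives back the original text.
print(naive_decode(utf8_bytes, ['utf-8', 'iso-8859-1']) == EN_DASH)  # True
```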
Re: [Zope-CMF] Charsets
On 19.12.2008 at 19:20, Dieter Maurer wrote:
>> Right. So I must be doing something wrong if all Zope has to go on
>> for decoding the form is the Accept-Charset? How can I set an
>> encoding for the form?
>
> The site should deliver all pages containing forms (if possible even
> all pages) with a single charset, let's call it the "site charset".
> Then it uses this same charset to interpret form data.

While I understand this, I'm a bit at a loss as to why this is
happening. I'm using forms based on CMFDefault's formlib
implementation. Charsets are set for the site and zpublisher but
something else is probably missing.

Charlie
Re: [Zope-CMF] Charsets
Charlie Clark wrote at 2008-12-19 11:35 +0100:
> ...
>> Usually, the client will use the charset it has found in the
>> page containing the form. Thus, unless this charset has been
>> determined automatically from the "Accept-Charset" header,
>> it is merely accidental when the client preferences ("Accept-Charset")
>> allow the charset to be guessed correctly.
>
>Right. So I must be doing something wrong if all Zope has to go on for
>decoding the form is the Accept-Charset? How can I set an encoding for
>the form?

The site should deliver all pages containing forms (if possible even
all pages) with a single charset, let's call it the "site charset".
Then it uses this same charset to interpret form data.

--
Dieter
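The "site charset" approach can be sketched with the standard library alone. This is an illustration under stated assumptions, not Zope code: `SITE_CHARSET`, `page_content_type` and `decode_form` are hypothetical, and the form body is assumed to be application/x-www-form-urlencoded. The same charset is used to serve the page and to decode the submission, and `errors='strict'` makes data in an unexpected encoding fail loudly instead of being silently mis-decoded.

```python
from urllib.parse import parse_qs, urlencode

# Hypothetical single "site charset", used for every page and every form.
SITE_CHARSET = 'utf-8'


def page_content_type(charset=SITE_CHARSET):
    """Content-Type header value for pages that contain forms."""
    return 'text/html; charset=%s' % charset


def decode_form(body, charset=SITE_CHARSET):
    """Decode an application/x-www-form-urlencoded body with the site charset.

    errors='strict' raises UnicodeDecodeError for data that was not
    encoded in the site charset, rather than guessing another one.
    """
    return parse_qs(body, encoding=charset, errors='strict')


# What a browser would submit after receiving a page served as UTF-8.
body = urlencode({'title': '\u4e2d\u6587 \u2013 test'}, encoding=SITE_CHARSET)
form = decode_form(body)
print(form['title'][0] == '\u4e2d\u6587 \u2013 test')  # True
```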
Re: [Zope-CMF] Charsets
On 15.12.2008 at 21:01, Dieter Maurer wrote:
> It is usually insane to use client preferences to guess the encoding
> used in form data.

I have to agree with you there.

> Usually, the client will use the charset it has found in the
> page containing the form. Thus, unless this charset has been
> determined automatically from the "Accept-Charset" header,
> it is merely accidental when the client preferences ("Accept-Charset")
> allow the charset to be guessed correctly.

Right. So I must be doing something wrong if all Zope has to go on for
decoding the form is the Accept-Charset? How can I set an encoding for
the form?

Charlie
Re: [Zope-CMF] Charsets
Charlie Clark wrote at 2008-12-14 12:32 +0100:
>On 13.12.2008 at 18:40, Charlie Clark wrote:
>
>> Hi,
>>
>> I'm struggling with the way formlib forms handle decoding from forms.
>> It looks like this gets set in BrowserView using an
>> IUserPreferredCharsets adapter. The default adapter seems to be in
>> zope.publisher.http and it looks like latin-1 will be set if there is
>> no other charset, and I'm having problems with the en-dash and em-dash
>> (u'\u2013' and u'\u2014') characters automatically being converted
>> from latin-1 when they are being entered as cp1252. For content that
>> doesn't go through this decoding I have no problems if
>> default-zpublisher-encoding is set to cp1252 and the default_charset
>> is set to cp1252 as well: decoding with CMFDefault.utils.decode()
>> works just fine.
>>
>> I suspect I'm missing something basic in the way charsets are handled,
>> but as it's a Windows-only IE6 environment, is the easiest solution
>> writing an adapter that defaults to cp1252 if there is no
>> HTTP_ACCEPT_CHARSET in the request header?
>
>Overriding the adapter works fine. I'm still a bit confused by the
>original code:

It is usually insane to use client preferences to guess the encoding
used in form data.

Usually, the client will use the charset it has found in the
page containing the form. Thus, unless this charset has been
determined automatically from the "Accept-Charset" header,
it is merely accidental when the client preferences ("Accept-Charset")
allow the charset to be guessed correctly.

--
Dieter
Re: [Zope-CMF] Charsets
On 13.12.2008 at 18:40, Charlie Clark wrote:
> Hi,
>
> I'm struggling with the way formlib forms handle decoding from forms.
> It looks like this gets set in BrowserView using an
> IUserPreferredCharsets adapter. The default adapter seems to be in
> zope.publisher.http and it looks like latin-1 will be set if there is
> no other charset, and I'm having problems with the en-dash and em-dash
> (u'\u2013' and u'\u2014') characters automatically being converted
> from latin-1 when they are being entered as cp1252. For content that
> doesn't go through this decoding I have no problems if
> default-zpublisher-encoding is set to cp1252 and the default_charset
> is set to cp1252 as well: decoding with CMFDefault.utils.decode()
> works just fine.
>
> I suspect I'm missing something basic in the way charsets are handled,
> but as it's a Windows-only IE6 environment, is the easiest solution
> writing an adapter that defaults to cp1252 if there is no
> HTTP_ACCEPT_CHARSET in the request header?

Overriding the adapter works fine. I'm still a bit confused by the
original code:

# Quoting RFC 2616, $14.2: If no "*" is present in an Accept-Charset
# field, then all character sets not explicitly mentioned get a
# quality value of 0, except for ISO-8859-1, which gets a quality
# value of 1 if not explicitly mentioned.

# And quoting RFC 2616, $14.2: "If no Accept-Charset header is
# present, the default is that any character set is acceptable."
if not sawstar and not sawiso88591 and header_present:
    charsets.append((1.0, 'iso-8859-1'))

So if Accept-Charset is '', then Zope will set the browser charset to
Latin-1. This seems a little strange to me, as the default is UTF-8 if
the header is missing. And if the RFC does say the default is "any
character set is acceptable", then I would have thought UTF-8 would be
okay. Is this a possible bug?

Charlie
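For comparison, here is a standalone sketch of the negotiation being discussed. It follows the RFC 2616 section 14.2 rule quoted in the code comments, but, unlike that code, it treats an empty header exactly like a missing one, and on equal quality it puts utf-8 first, which is what Charlie asks for elsewhere in the thread. `preferred_charsets` is a hypothetical name; this is not zope.publisher's implementation.

```python
def preferred_charsets(header, default='utf-8'):
    """Order the charsets in an Accept-Charset value by quality.

    header is the raw header value, or None when the header was not sent.
    An empty header is treated exactly like a missing one.
    """
    if header is None or not header.strip():
        # RFC 2616: no header means any charset is acceptable, so use ours.
        return [default]

    charsets = []
    sawstar = sawiso88591 = False
    for item in header.split(','):
        name, _, params = item.partition(';')
        name = name.strip().lower()
        if not name:
            continue
        quality = 1.0
        for param in params.split(';'):
            key, _, value = param.partition('=')
            if key.strip().lower() == 'q':
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # treat an unparseable quality as "not acceptable"
        if name == '*':
            sawstar = True
        elif name == 'iso-8859-1':
            sawiso88591 = True
        charsets.append((quality, name))

    # RFC 2616, section 14.2: without "*", ISO-8859-1 gets q=1 unless listed.
    if not sawstar and not sawiso88591:
        charsets.append((1.0, 'iso-8859-1'))

    # Highest quality first; among equal qualities put utf-8 first. sort()
    # is stable, so other ties keep the order they had in the header.
    charsets.sort(key=lambda entry: (-entry[0], entry[1] != 'utf-8'))
    return [name for quality, name in charsets if quality > 0 and name != '*']


print(preferred_charsets(''))  # ['utf-8']
print(preferred_charsets('iso-8859-1, utf-8, utf-16, *;q=0.1'))
# ['utf-8', 'iso-8859-1', 'utf-16']
```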