Re: [Zope-CMF] Charsets

2009-01-25 Thread Charlie Clark

On 25.01.2009 at 08:35, Dieter Maurer wrote:

 Wow. Some magic in formlib deviating from the Zope2 standard  
 behaviour

Maybe. But then formlib is not really a standard Zope 2 approach. The  
one thing I do find weird is that PreferredCharsets() is called for  
each field in a form.

 But, if this is true, we do not understand Charlie's observations:

  If I understood him right, he is using formlib and he is observing
  problems with the charsets.

  He found out that this has to do with IE browsers sending an
  empty Accept-Charset header which is turned by Zope's
  preferredCharset into iso-8859-1.

  But when the same charset is used on both form delivery and
  on form processing he should not see a problem with mismatched
  encodings.

Actually the problems occur as soon as you use different browsers with  
non-ASCII text.

 Of course, iso-8859-1 may not be appropriate for form delivery --
 and may result in funny special characters in non-western countries.


As Daniel noted, UTF-8 should be the default. I had a quick look at the
source of the appropriate module and couldn't see where the broken
magic was happening. It's probably a bit beyond me, but is the first
step to write a test that we know currently fails?

Charlie
--
Charlie Clark
Helmholtzstr. 20
Düsseldorf
D- 40215
Tel: +49-211-938-5360
GSM: +49-178-782-6226



___
Zope-CMF maillist  -  Zope-CMF@lists.zope.org
http://mail.zope.org/mailman/listinfo/zope-cmf

See https://bugs.launchpad.net/zope-cmf/ for bug reports and feature requests


Re: [Zope-CMF] Charsets

2009-01-25 Thread Daniel Nouri
Charlie Clark writes:

 On 25.01.2009 at 08:35, Dieter Maurer wrote:

 Of course, iso-8859-1 may not be appropriate for form delivery --
 and may result in funny special characters in non-western countries.


 As Daniel noted, UTF-8 should be the default. I had a quick look at the
 source of the appropriate module and couldn't see where the broken
 magic was happening. It's probably a bit beyond me, but is the first
 step to write a test that we know currently fails?

Apparently, a fix for the problem I was talking about was attempted in
zope.publisher, but not propagated to Zope 2, as per Tres's comment from
2008-06-16 on this (duplicate?) bug report.  There are also some working
patches for Zope 2 here:

  https://bugs.launchpad.net/zope2/+bug/143873

I have no clue about the problem with character encodings used to serve
pages, nor am I sure if it's related at all.


Daniel
-- 
http://danielnouri.org



Re: [Zope-CMF] Charsets

2009-01-24 Thread Dieter Maurer
yuppie wrote at 2009-1-19 22:54 +0100:
 ...
Products.Five.browser.decode.setPageEncoding sets the response content 
type charset based on zope.publisher.http.HTTPCharsets. And 
setPageEncoding is called by the update method of formlib forms in Zope 
2. So in this case the response encoding has something to do with the 
Accept-Charset request header.

Wow. Some magic in formlib deviating from the Zope2 standard behaviour

But, if this is true, we do not understand Charlie's observations:

  If I understood him right, he is using formlib and he is observing
  problems with the charsets.

  He found out that this has to do with IE browsers sending an
  empty Accept-Charset header which is turned by Zope's
  preferredCharset into iso-8859-1.

  But when the same charset is used on both form delivery and
  on form processing he should not see a problem with mismatched
  encodings.

Of course, iso-8859-1 may not be appropriate for form delivery --
and may result in funny special characters in non-western countries.



-- 
Dieter


Re: [Zope-CMF] Charsets

2009-01-21 Thread Charlie Clark

On 21.01.2009 at 00:11, Daniel Nouri wrote:

 Is this thread by any chance related to this bug:
 https://bugs.launchpad.net/zope2/+bug/160968

  The IUserPreferredCharsets implementation of Zope 3 found in
  zope.publisher.http.HTTPCharsets has the following condition in it
  to check if the HTTP_ACCEPT_CHARSET header is available:

  header_present = 'HTTP_ACCEPT_CHARSET' in self.request

  However, Zope 2's request will return '' (the empty string) for
  any header that starts with 'HTTP_', see
  ZPublisher.HTTPRequest.HTTPRequest.get.

  Ultimately, this results in HTTPCharsets.getPreferredCharsets
  returning ['iso-8859-1'], where it should really return 'UTF-8'.

  To understand this problem better, look at
  Products.Five.browser.decode.processInputs, which uses the
  negotiator to find out which charset to use to convert form
  variables. For browsers that do not send the 'HTTP_ACCEPT_CHARSET'
  header, this will result in wrongly encoded form values. To
  reproduce this, fill in Chinese characters to any Five formlib form
  with Internet Explorer 6.0. Since Firefox sends HTTP_ACCEPT_CHARSET,
  it's not a problem there.


Yes Daniel,

this is exactly the problem we're facing. So we need to fix Zope =  
2.10 and then we shouldn't have to worry about doing anything to the  
CMF.

Charlie


Re: [Zope-CMF] Charsets

2009-01-20 Thread Daniel Nouri
Is this thread by any chance related to this bug:
https://bugs.launchpad.net/zope2/+bug/160968

  The IUserPreferredCharsets implementation of Zope 3 found in
  zope.publisher.http.HTTPCharsets has the following condition in it
  to check if the HTTP_ACCEPT_CHARSET header is available:

  header_present = 'HTTP_ACCEPT_CHARSET' in self.request

  However, Zope 2's request will return '' (the empty string) for
  any header that starts with 'HTTP_', see
  ZPublisher.HTTPRequest.HTTPRequest.get.

  Ultimately, this results in HTTPCharsets.getPreferredCharsets
  returning ['iso-8859-1'], where it should really return 'UTF-8'.

  To understand this problem better, look at
  Products.Five.browser.decode.processInputs, which uses the
  negotiator to find out which charset to use to convert form
  variables. For browsers that do not send the 'HTTP_ACCEPT_CHARSET'
  header, this will result in wrongly encoded form values. To
  reproduce this, fill in Chinese characters to any Five formlib form
  with Internet Explorer 6.0. Since Firefox sends HTTP_ACCEPT_CHARSET,
  it's not a problem there.
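
A minimal sketch of the mismatch described in the bug report, in plain
Python 3 and without Zope. FakeZope2Request is a hypothetical stand-in
that only imitates the behaviour quoted above (an empty string for
absent HTTP_ keys); it is not the real ZPublisher.HTTPRequest, and the
way membership is answered here is an assumption made for illustration.

```python
_marker = object()


class FakeZope2Request:
    """Hypothetical stand-in: absent HTTP_* keys yield '' instead of the default."""

    def __init__(self, environ):
        self.environ = dict(environ)

    def get(self, key, default=None):
        if key in self.environ:
            return self.environ[key]
        if key.startswith('HTTP_'):
            return ''  # behaviour described in the bug report
        return default

    def __contains__(self, key):
        # Assumption: membership is answered through get(), so it
        # inherits the quirk and reports absent headers as present.
        return self.get(key, _marker) is not _marker


def header_present_naive(request):
    # The membership test quoted in the bug report.
    return 'HTTP_ACCEPT_CHARSET' in request


def header_present_strict(request):
    # Treat an empty value the same as a missing header.
    return bool(request.get('HTTP_ACCEPT_CHARSET', ''))


ie6_like = FakeZope2Request({})  # browser sent no Accept-Charset
firefox_like = FakeZope2Request(
    {'HTTP_ACCEPT_CHARSET': 'ISO-8859-1,utf-8;q=0.7,*;q=0.7'})

print(header_present_naive(ie6_like), header_present_strict(ie6_like))
```

With the naive check the header looks present even though the browser
never sent it; checking for a non-empty value avoids that.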

-- 
http://danielnouri.org



Re: [Zope-CMF] Charsets

2009-01-19 Thread Charlie Clark

On 18.01.2009 at 23:00, yuppie wrote:

 Hi Charlie!

Hiya Yuppie,

 Charlie Clark wrote:
 On 29.12.2008 at 15:01, Charlie Clark wrote:

 CMFDefault.utils

 def getBrowserCharset(request):
     """ Get charset preferred by the browser.
     """
     envadapter = IUserPreferredCharsets(request)
     charsets = envadapter.getPreferredCharsets() or ['utf-8']
     return charsets[0]

 This will always be iso-8859-1 for Opera and Firefox because all
 charsets have the same quality, again even if UTF-8 encoding is
 specified.

 getBrowserCharset does almost the same as
 zope.publisher.http.getCharsetUsingRequest. And it is only used for
 encoding and decoding 'portal_status_message'. It is not relevant for
 the issue you noticed.

Okay.

 I haven't been able to track where the decoding of form
 data occurs for Zope 2 stuff but I can identify the problem in
 zpublisher.browser.BrowserRequest

 You mean zope.publisher.browser.BrowserRequest. The Zope 2 version  
 is in
 Products.Five.browser.decode.

Thanks - I thought it must have been in Five but didn't know where to  
look.

 AFAICS the fallback to other charsets is usually not required in  
 Zope 3.
 If the publisher encodes responses using
 zope.publisher.http.getCharsetUsingRequest, the first charset will be
 the right one.

That seems reasonable.

 I would suggest that we work towards enforcing UTF-8 where possible
 but at the very least add the accept-charset attribute to forms and
 use the portal's default_charset for this.

 I'd very much appreciate your comments on this.

 I can't see a need to implement this in a different way than Zope 3.  
 So
 I propose to fix the encoding of forms sent to the browser.


I agree that this shouldn't be implemented in a different way than
for Zope 3. And if we can solve the problems by fixing form encoding
I'm happy. Although I'd like to see UTF-8 always the first charset
returned if * the quality is the same.

One thing that did strike me when working on this is just how often
getPreferredCharsets() is called on a single request.

Charlie


Re: [Zope-CMF] Charsets

2009-01-19 Thread yuppie
Charlie Clark wrote:
 On 18.01.2009 at 23:00, yuppie wrote:
 I agree that this shouldn't be implemented in a different way than
 for Zope 3. And if we can solve the problems by fixing form encoding
 I'm happy. Although I'd like to see UTF-8 always the first charset
 returned if * the quality is the same.

zope.publisher.http.HTTPCharsets explicitly prefers utf-8. Are you sure 
getPreferredCharsets()[0] is iso-8859-1 with your browser? Or do you 
override somewhere the Content-Type header set by setPageEncoding()? 
AFAICS CMFDefault works exactly the way you expect it to.

Cheers, Yuppie



Re: [Zope-CMF] Charsets

2009-01-19 Thread Charlie Clark

On 19.01.2009 at 11:32, yuppie wrote:

 zope.publisher.http.HTTPCharsets explicitly prefers utf-8. Are you  
 sure
 getPreferredCharsets()[0] is iso-8859-1 with your browser? Or do you
 override somewhere the Content-Type header set by setPageEncoding()?
 AFAICS CMFDefault works exactly the way you expect it to.


No, I don't override anything. I'll run some tests and post the results.

Charlie


Re: [Zope-CMF] Charsets

2009-01-19 Thread Dieter Maurer
yuppie wrote at 2009-1-19 11:32 +0100:
Charlie Clark wrote:
 On 18.01.2009 at 23:00, yuppie wrote:
 I agree that this shouldn't be implemented in a different way than
 for Zope 3. And if we can solve the problems by fixing form encoding
 I'm happy. Although I'd like to see UTF-8 always the first charset
 returned if * the quality is the same.

zope.publisher.http.HTTPCharsets explicitly prefers utf-8. Are you sure 
getPreferredCharsets()[0] is iso-8859-1 with your browser?

This might be true for the Zope 3 publisher. However, Zope 2's
HTTPResponse uses default_encoding (configured in zope.conf) unless
an encoding is prescribed by the response content type -- and this
has nothing to do with the Accept-Charset request header.
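
The rule just described (a charset in the response content type wins,
otherwise the configured default applies) can be sketched as follows.
This is an illustration in plain Python 3, not Zope's HTTPResponse
code; response_charset and its default_encoding parameter are
hypothetical names.

```python
from email.message import Message


def response_charset(content_type, default_encoding='utf-8'):
    """Return the charset named in a Content-Type value, else the default."""
    msg = Message()
    msg['Content-Type'] = content_type
    # get_content_charset() returns None when no charset parameter is given.
    return msg.get_content_charset() or default_encoding


print(response_charset('text/html; charset=ISO-8859-1'))
print(response_charset('text/html', default_encoding='cp1252'))
```

Note that the request's Accept-Charset header plays no part in this
decision.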



-- 
Dieter


Re: [Zope-CMF] Charsets

2009-01-19 Thread Dieter Maurer
Charlie Clark wrote at 2009-1-18 22:30 +0100:
On 18.01.2009 at 20:36, Dieter Maurer wrote:
 ...
 From the current HTML specification:

accept-charset = charset list [CI]
This attribute specifies the list of character encodings for input  
data that is accepted by the server processing this form. The value is  
a space- and/or comma-delimited list of charset values. The client  
must interpret this list as an exclusive-or list, i.e., the server is  
able to accept any single character encoding per entity received.

i.e. exactly as you have suggested: it is possible to force a client to
encode data in a particular charset before sending it to the server.  
All references I have come across suggest that this, together with the  
meta tag content-type can and should be used to coerce browsers to use  
UTF-8.

I fear that the accept-charset form control attribute
can only easily be used for method=post content-type=multipart/form-data,
as only then does the browser have a chance to specify how it has
encoded the value.

I am not sure whether Zope handles the charset information
in this case correctly.


As the Accept-Charset request header has (almost) nothing to do
with the accept-charset form control attribute, it must of course
not be used to interpret form data even when this was created
based on an accept-charset.


If the server chooses its output encoding based on the Accept-Charset
request header (and Yuppie indicated that the Zope 3 publisher does this),
then the same algorithm can be used for normal form data
(where normal means you do not explicitly specify an accept-charset
form control attribute).
That's one sensible mode of operation.
Another one is choosing a fixed encoding and using it as both input and
output encoding.



-- 
Dieter


Re: [Zope-CMF] Charsets

2009-01-19 Thread yuppie
Dieter Maurer wrote:
 yuppie wrote at 2009-1-19 11:32 +0100:
 Charlie Clark wrote:
 On 18.01.2009 at 23:00, yuppie wrote:
 I agree that this shouldn't be implemented in a different way than
 for Zope 3. And if we can solve the problems by fixing form encoding
 I'm happy. Although I'd like to see UTF-8 always the first charset
 returned if * the quality is the same.
 zope.publisher.http.HTTPCharsets explicitly prefers utf-8. Are you sure 
 getPreferredCharsets()[0] is iso-8859-1 with your browser?
 
 This might be true for the Zope 3 publisher. However, Zope 2's
 HTTPResponse uses default_encoding (configured in zope.conf) unless
 an encoding is prescribed by the response content type -- and this
 has nothing to do with the Accept-Charset request header.

Products.Five.browser.decode.setPageEncoding sets the response content 
type charset based on zope.publisher.http.HTTPCharsets. And 
setPageEncoding is called by the update method of formlib forms in Zope 
2. So in this case the response encoding has something to do with the 
Accept-Charset request header.

Cheers, Yuppie



Re: [Zope-CMF] Charsets

2009-01-18 Thread Charlie Clark

On 29.12.2008 at 15:01, Charlie Clark wrote:

 The site should deliver all pages containing forms (if possible even
 all pages) with a single charset, let's call it the site charset.
 Then it uses this same charset to interpret form data.


 While I understand this, I'm a bit at a loss as to why this is
 happening. I'm using forms based on CMFDefault's formlib
 implementation. Charsets are set for the site and zpublisher but
 something else is probably missing.


Delving deeper into this I think I understand things a little better.

The accept-charset attribute on a form tag requires the browser to
encode any form data in the specified encoding. Ideally this would make
additional negotiation unnecessary, but this value isn't passed to the
server as HTTP_ACCEPT_CHARSET, which is where the fun starts. As
has been noted previously
(http://mail.zope.org/pipermail/zope3-dev/2004-June/011483.html),
browsers don't all behave themselves when setting this header: IE 6
and 7 and Safari set an empty header, whereas Opera and Firefox usually
set something like iso-8859-1, utf-8, utf-16, *;q=0.1

getPreferredCharsets() will return 'iso-8859-1' where
HTTP_ACCEPT_CHARSET is empty. But this will cause problems if the
browser is actually using UTF-8. But the way the CMF uses
getPreferredCharsets() isn't right either:

CMFDefault.utils

def getBrowserCharset(request):
    """ Get charset preferred by the browser.
    """
    envadapter = IUserPreferredCharsets(request)
    charsets = envadapter.getPreferredCharsets() or ['utf-8']
    return charsets[0]

This will always be iso-8859-1 for Opera and Firefox because all  
charsets have the same quality, again even if UTF-8 encoding is  
specified. I haven't been able to track where the decoding of form  
data occurs for Zope 2 stuff but I can identify the problem in  
zpublisher.browser.BrowserRequest

    def _decode(self, text):
        """Try to decode the text using one of the available charsets."""
        if self.charsets is None:
            envadapter = IUserPreferredCharsets(self)
            self.charsets = envadapter.getPreferredCharsets() or ['utf-8']
        for charset in self.charsets:
            try:
                text = unicode(text, charset)
                break
            except UnicodeError:
                pass
        return text

Here the naive assumption is that if we can decode from a charset
without an error then we have the correct charset. Sometimes this goes
unnoticed but with characters like u'\u2013' and u'\u2014' (en-dash and
em-dash) it will raise errors as those codepoints are not in the
Latin-1 charset but it has its own equivalents.
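
To illustrate why "decoded without an error" proves little, here is a
small Python 3 sketch. decode_first_that_works is a hypothetical
helper that mirrors the loop above; it is not the Zope code. Because
iso-8859-1 maps every byte to some code point, it never raises, so it
always "wins" when it is tried first.

```python
def decode_first_that_works(data, charsets):
    """Return (text, charset) for the first charset that decodes without error."""
    for charset in charsets:
        try:
            return data.decode(charset), charset
        except UnicodeError:
            continue
    return data, None


utf8_bytes = '\u2013'.encode('utf-8')      # en-dash as a UTF-8 page would submit it
cp1252_bytes = '\u2013'.encode('cp1252')   # the same character from a cp1252 page

# Latin-1 first: no exception, but the result is three wrong characters.
text, used = decode_first_that_works(utf8_bytes, ['iso-8859-1', 'utf-8'])
print(repr(text), used)

# UTF-8 first: the en-dash survives.
print(decode_first_that_works(utf8_bytes, ['utf-8', 'iso-8859-1']))

# cp1252 input read as Latin-1 yields a C1 control character, not a dash.
print(repr(cp1252_bytes.decode('iso-8859-1')), repr(cp1252_bytes.decode('cp1252')))
```

The order of the charset list therefore decides the outcome, which is
why the server needs to know the encoding rather than guess it.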

I would suggest that we work towards enforcing UTF-8 where possible
but at the very least add the accept-charset attribute to forms and
use the portal's default_charset for this.

I'd very much appreciate your comments on this.

Charlie


Re: [Zope-CMF] Charsets

2009-01-18 Thread Dieter Maurer
Charlie Clark wrote at 2009-1-18 15:49 +0100:
 ...
I would suggest that we work towards enforcing UTF-8 where possible
but at the very least add the accept-charset attribute to forms and
use the portal's default_charset for this.

I'd very much appreciate your comments on this.

The Accept-Charset request header should *never* be used
to guess a charset at the server side:

  Accept-Charset is a user preference which does not know
  anything about charsets used by the server.

If utf-8 were not given preference in the
current code, the code base would see massive problems.

Only the server knows which charsets it is using -- and it should
use a single one (with very few exceptions).
There should be a configuration option that tells this charset
and this should be used to decode form data.
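
A rough sketch of that idea in plain Python 3: one configured charset
is used both to deliver the form page and to decode what comes back.
SITE_CHARSET, render_form_page and decode_form_data are hypothetical
names for illustration, not existing Zope or CMF settings or APIs.

```python
from urllib.parse import parse_qs, urlencode

SITE_CHARSET = 'utf-8'  # hypothetical configuration value


def render_form_page(body_html):
    """Encode the page and announce the same charset in the header."""
    headers = {'Content-Type': 'text/html; charset=%s' % SITE_CHARSET}
    return headers, body_html.encode(SITE_CHARSET)


def decode_form_data(raw_query):
    """Decode percent-encoded form data with the site charset only."""
    return parse_qs(raw_query, encoding=SITE_CHARSET, errors='strict')


headers, body = render_form_page('<form><input name="title"></form>')

# Simulate a browser that submits the form in the page's charset.
submitted = urlencode({'title': 'A \u2013 B'}, encoding=SITE_CHARSET)
fields = decode_form_data(submitted)
print(headers['Content-Type'], fields)
```

Accept-Charset is never consulted here; the server relies only on what
it knows about its own output.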



-- 
Dieter


Re: [Zope-CMF] Charsets

2009-01-18 Thread yuppie
Hi Charlie!


Charlie Clark wrote:
 On 29.12.2008 at 15:01, Charlie Clark wrote:
 
 CMFDefault.utils
 
 def getBrowserCharset(request):
     """ Get charset preferred by the browser.
     """
     envadapter = IUserPreferredCharsets(request)
     charsets = envadapter.getPreferredCharsets() or ['utf-8']
     return charsets[0]
 
 This will always be iso-8859-1 for Opera and Firefox because all  
 charsets have the same quality, again even if UTF-8 encoding is  
 specified.

getBrowserCharset does almost the same as 
zope.publisher.http.getCharsetUsingRequest. And it is only used for 
encoding and decoding 'portal_status_message'. It is not relevant for 
the issue you noticed.

 I haven't been able to track where the decoding of form  
 data occurs for Zope 2 stuff but I can identify the problem in  
 zpublisher.browser.BrowserRequest

You mean zope.publisher.browser.BrowserRequest. The Zope 2 version is in 
Products.Five.browser.decode.

     def _decode(self, text):
         """Try to decode the text using one of the available charsets."""
         if self.charsets is None:
             envadapter = IUserPreferredCharsets(self)
             self.charsets = envadapter.getPreferredCharsets() or ['utf-8']
         for charset in self.charsets:
             try:
                 text = unicode(text, charset)
                 break
             except UnicodeError:
                 pass
         return text
 
 Here the naive assumption is that if we can decode from a charset
 without an error then we have the correct charset. Sometimes this goes
 unnoticed but with characters like u'\u2013' and u'\u2014' (en-dash and
 em-dash) it will raise errors as those codepoints are not in the
 Latin-1 charset but it has its own equivalents.

AFAICS the fallback to other charsets is usually not required in Zope 3. 
If the publisher encodes responses using 
zope.publisher.http.getCharsetUsingRequest, the first charset will be 
the right one.

 I would suggest that we work towards enforcing UTF-8 where possible
 but at the very least add the accept-charset attribute to forms and
 use the portal's default_charset for this.
 
 I'd very much appreciate your comments on this.

I can't see a need to implement this in a different way than Zope 3. So 
I propose to fix the encoding of forms sent to the browser.


Cheers,

Yuppie




Re: [Zope-CMF] Charsets

2008-12-19 Thread Charlie Clark

On 15.12.2008 at 21:01, Dieter Maurer wrote:

 It is usually insane to use client preferences to guess the encoding
 used in form data.

Have to agree with you there.

 Usually, the client will use the charset it has found in the
 page containing the form. Thus, unless this charset has been
 determined automatically from the Accept-Charset header,
 it is merely accidental when the client preferences (Accept-Charset)
 is able to guess the charset correctly.


Right. So I must be doing something wrong if all Zope has to go on for  
decoding the form is the Accept-Charset? How can I set an encoding for  
the form?

Charlie


Re: [Zope-CMF] Charsets

2008-12-19 Thread Dieter Maurer
Charlie Clark wrote at 2008-12-19 11:35 +0100:
 ...
 Usually, the client will use the charset it has found in the
 page containing the form. Thus, unless this charset has been
 determined automatically from the Accept-Charset header,
 it is merely accidental when the client preferences (Accept-Charset)
 is able to guess the charset correctly.


Right. So I must be doing something wrong if all Zope has to go on for  
decoding the form is the Accept-Charset? How can I set an encoding for  
the form?

The site should deliver all pages containing forms (if possible even
all pages) with a single charset, let's call it the site charset.
Then it uses this same charset to interpret form data.



-- 
Dieter


Re: [Zope-CMF] Charsets

2008-12-15 Thread Dieter Maurer
Charlie Clark wrote at 2008-12-14 12:32 +0100:
On 13.12.2008 at 18:40, Charlie Clark wrote:

 Hi,

 I'm struggling with the way formlib forms handle decoding from forms.
 It looks like this gets set in BrowserView using an
 IUserPreferredCharsets adapter. The default adapter seems to be in
 zope.publisher.http and it looks like latin-1 will be set if there is
 no other charset, and I'm having problems with the em-dash and en-dash
 (u'\u2014' and u'\u2013') characters automatically being converted
 from latin-1 when they are being entered as cp1252. For content that
 doesn't go through this decoding I have no problems if
 zpublisher-default-encoding is set to cp1252 and the default_charset
 is set to cp1252 as well: decoding with CMFDefault.utils.decode()
 works just fine.

 I suspect I'm missing something basic in the way charsets are handled
 but as it's a Windows-only IE6 environment, is the easiest solution
 writing an adapter that defaults to cp1252 if there is no
 HTTP_ACCEPT_CHARSET in the request header?


Overriding the adapter works fine. I'm still a bit confused by the  
original code:

It is usually insane to use client preferences to guess the encoding
used in form data.

Usually, the client will use the charset it has found in the
page containing the form. Thus, unless this charset has been
determined automatically from the Accept-Charset header,
it is merely accidental when the client preferences (Accept-Charset)
is able to guess the charset correctly.



-- 
Dieter


Re: [Zope-CMF] Charsets

2008-12-14 Thread Charlie Clark

On 13.12.2008 at 18:40, Charlie Clark wrote:

 Hi,

 I'm struggling with the way formlib forms handle decoding from forms.
 It looks like this gets set in BrowserView using an
 IUserPreferredCharsets adapter. The default adapter seems to be in
 zope.publisher.http and it looks like latin-1 will be set if there is
 no other charset, and I'm having problems with the em-dash and en-dash
 (u'\u2014' and u'\u2013') characters automatically being converted
 from latin-1 when they are being entered as cp1252. For content that
 doesn't go through this decoding I have no problems if
 zpublisher-default-encoding is set to cp1252 and the default_charset
 is set to cp1252 as well: decoding with CMFDefault.utils.decode()
 works just fine.

 I suspect I'm missing something basic in the way charsets are handled
 but as it's a Windows-only IE6 environment, is the easiest solution
 writing an adapter that defaults to cp1252 if there is no
 HTTP_ACCEPT_CHARSET in the request header?


Overriding the adapter works fine. I'm still a bit confused by the  
original code:

    # Quoting RFC 2616, $14.2: If no * is present in an Accept-Charset
    # field, then all character sets not explicitly mentioned get a
    # quality value of 0, except for ISO-8859-1, which gets a quality
    # value of 1 if not explicitly mentioned.
    # And quoting RFC 2616, $14.2: If no Accept-Charset header is
    # present, the default is that any character set is acceptable.
    if not sawstar and not sawiso88591 and header_present:
        charsets.append((1.0, 'iso-8859-1'))

So if Accept-Charset is '' then Zope will set the browser charset to
Latin-1. This seems a little strange to me, as the default is UTF-8 if
the header is missing. And if the RFC says that, by default, any
character set is acceptable, then I would have thought UTF-8 would be
okay. Is this a possible bug?
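
For reference, a sketch in plain Python 3 of how the RFC 2616 rules
quoted above could be applied. This is not the zope.publisher
implementation; preferred_charsets is a hypothetical function, and
three choices in it are my own assumptions: an empty header is treated
like a missing one, utf-8 is picked when any charset is acceptable,
and utf-8 is put first among charsets of equal quality.

```python
def preferred_charsets(header):
    """Order charsets from an Accept-Charset value, best first.

    header is None when the header is missing.
    """
    if header is None or not header.strip():
        return ['utf-8']  # assumption: empty counts as missing
    charsets = []
    seen = set()
    star_quality = None
    for item in header.split(','):
        name, _, param = item.partition(';')
        name = name.strip().lower()
        if not name:
            continue
        quality = 1.0
        param = param.strip().replace(' ', '')
        if param.startswith('q='):
            try:
                quality = float(param[2:])
            except ValueError:
                continue  # skip entries with a malformed quality
        if name == '*':
            star_quality = quality
            continue
        seen.add(name)
        if quality > 0:
            charsets.append((quality, name))
    if star_quality is None and 'iso-8859-1' not in seen:
        # RFC 2616 14.2: without "*", unmentioned ISO-8859-1 gets quality 1.
        charsets.append((1.0, 'iso-8859-1'))
    elif star_quality and 'utf-8' not in seen:
        # Assumption: "*" covers utf-8 at the wildcard's quality.
        charsets.append((star_quality, 'utf-8'))
    # Highest quality first; utf-8 wins ties; otherwise keep header order.
    charsets.sort(key=lambda qc: (-qc[0], qc[1] != 'utf-8'))
    return [name for _, name in charsets]


print(preferred_charsets('iso-8859-1, utf-8, utf-16, *;q=0.1'))
print(preferred_charsets(''))
print(preferred_charsets('utf-16;q=0.5'))
```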

Charlie


[Zope-CMF] Charsets

2008-12-13 Thread Charlie Clark
Hi,

I'm struggling with the way formlib forms handle decoding from forms.
It looks like this gets set in BrowserView using an
IUserPreferredCharsets adapter. The default adapter seems to be in
zope.publisher.http and it looks like latin-1 will be set if there is
no other charset, and I'm having problems with the em-dash and en-dash
(u'\u2014' and u'\u2013') characters automatically being converted
from latin-1 when they are being entered as cp1252. For content that
doesn't go through this decoding I have no problems if
zpublisher-default-encoding is set to cp1252 and the default_charset
is set to cp1252 as well: decoding with CMFDefault.utils.decode()
works just fine.

I suspect I'm missing something basic in the way charsets are handled
but as it's a Windows-only IE6 environment, is the easiest solution
writing an adapter that defaults to cp1252 if there is no
HTTP_ACCEPT_CHARSET in the request header?
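
Roughly what I have in mind, as a plain Python 3 sketch without any
Zope imports. Cp1252FallbackCharsets is a hypothetical name; a real
adapter would additionally have to be declared and registered for
IUserPreferredCharsets, which is left out here. The sketch ignores
quality values and simply keeps the order of the header.

```python
class Cp1252FallbackCharsets:
    """Hypothetical adapter-like class: cp1252 when no charset was sent."""

    def __init__(self, request):
        self.request = request

    def getPreferredCharsets(self):
        header = self.request.get('HTTP_ACCEPT_CHARSET', '')
        if not header.strip():
            return ['cp1252']  # missing or empty header
        names = [part.split(';')[0].strip().lower()
                 for part in header.split(',')]
        # Drop the wildcard and blanks; fall back if nothing is left.
        return [n for n in names if n and n != '*'] or ['cp1252']


# A plain dict stands in for the request in this illustration.
print(Cp1252FallbackCharsets({}).getPreferredCharsets())
print(Cp1252FallbackCharsets(
    {'HTTP_ACCEPT_CHARSET': 'utf-8, iso-8859-1;q=0.5'}).getPreferredCharsets())
```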

Thanks for any pointers.

Charlie