On Jun 26, 2014, at 7:02 PM, Philip Prindeville 
<philipp_s...@redfish-solutions.com> wrote:

> 
> On Jun 25, 2014, at 5:29 PM, RW <rwmailli...@googlemail.com> wrote:
> 
>> On Wed, 25 Jun 2014 14:21:33 -0600
>> Philip Prindeville wrote:
>> 
>> 
>>> Here’s the other thing I don’t get.
>>> 
>>> The message claims to be 7-bit and text/plain, yet it uses encoded
>>> characters which exceed 7-bit widths, and this doesn’t seem to be
>>> firing any rules either.
>>> 
>>> &#x042C would seem to be at least an 11-bit wide character.
>> 
>> You are mixing up different levels of encoding. The characters
>> &, #, x, 0, 4, 2 and C are all 7-bit ASCII, and so are consistent with
>> Content-Transfer-Encoding: 7bit.
> 
> You’re correct… That is consistent with the CTE.
> 
> But the Content-Type omitted a ;charset=“XXX” attribute, which means it 
> defaults to “US-ASCII”.
> 
> Quoting RFC-2046:
> 
> 4.1.2.  Charset Parameter
> 
>   A critical parameter that may be specified in the Content-Type field
>   for "text/plain" data is the character set.  This is specified with a
>   "charset" parameter, as in:
> 
>     Content-type: text/plain; charset=iso-8859-1
> 
>   Unlike some other parameter values, the values of the charset
>   parameter are NOT case sensitive.  The default character set, which
>   must be assumed in the absence of a charset parameter, is US-ASCII.
> 
> 
> Since &#x042C is outside the US-ASCII character set, this would be an 
> encoding violation.
> 
> -Philip
> 


Can anyone point me to how to write a test that confirms that the actual 
encoded text fits into the named (or implicit) charset?

I.e. what’s a good template or example to go by?
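
To make the question concrete, here is a minimal sketch of the kind of check
being asked about, written in Python rather than in any particular filter's
rule language. The function name charset_violations is hypothetical, and the
sketch assumes a single-part message. It also assumes that a numeric character
reference such as &#x042C should count against the charset, which is the
premise argued above and not something RFC 2046 states.

```python
import re
from email import message_from_bytes

# Numeric character references such as &#x042C; or &#1068; (trailing ';' optional).
NUMERIC_REF = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));?")


def charset_violations(raw_message: bytes) -> list:
    """Return a list of problems; an empty list means nothing was found.

    Hypothetical helper for illustration only.
    """
    msg = message_from_bytes(raw_message)

    # RFC 2046 4.1.2: US-ASCII must be assumed when no charset is given.
    charset = msg.get_content_charset() or "us-ascii"

    # Undo the Content-Transfer-Encoding; this is None for multipart messages.
    payload = msg.get_payload(decode=True)
    if payload is None:
        return ["multipart message: each part must be checked separately"]

    # Level 1: do the raw body bytes decode in the declared charset?
    try:
        text = payload.decode(charset)
    except (UnicodeDecodeError, LookupError) as exc:
        return [f"body does not decode as {charset}: {exc}"]

    # Level 2 (assumption): does each referenced code point exist in that charset?
    problems = []
    for match in NUMERIC_REF.finditer(text):
        hex_digits, dec_digits = match.groups()
        code = int(hex_digits, 16) if hex_digits else int(dec_digits)
        try:
            chr(code).encode(charset)
        except (UnicodeEncodeError, ValueError, OverflowError):
            problems.append(
                f"U+{code:04X} ({match.group(0)}) is not representable in {charset}"
            )
    return problems
```

With this sketch, a text/plain body with no charset parameter that contains
&#x042C yields one reported problem, while the same body declared as
charset=utf-8 yields none.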

Thanks.
