On Thu, January 3, 2008 11:39 am, tedd wrote:
> At 4:24 PM +0100 1/3/08, Nisse Engström wrote:
>>On Wed, 2 Jan 2008 19:36:56 -0500, tedd wrote:
>>
>>>  To find out, I did put the operation through FireFox and reversed
>>> the
>>>  POST/GET operations to get a look at the string -- it is:
>>>
>>>  %C2%A0%C2%A0%C2%A0Z%C2%A0%C2%A0%C2%A0  < where Z is the value
>>> passed.
>>>
>>>  Now, C2 (HEX) is a linefeed (194 DEC)
>>>
>>>  And, A0 (HEX) is a non-breaking space (160 DEC;) which is a &nbsp;
>>
>>Not quite. <A0> is non-breaking space in *some* character
>>encodings, such as the ISO-8859-... encodings. It may
>>be different in other encodings. In UTF-8, it is <C2 A0>,
>>which is exactly what you're seeing.
>
> Well considering that UTF-8 encompasses/includes all of the code
> points found in ISO-8859, then I think that both encodings would
> reference the same character. After all, if they didn't then what's
> the point of Unicode?
>
> Now, one can argue how many bytes are needed to represent a character
> in what encoding, but that doesn't change the character. In the end,
> I believe that <A0> is the same regardless of what charset or
> encoding you're using.
>
> I just don't understand where C2 comes from or why it's there. I
> would think that <00 A0> would be more appropriate.
>
>>  > Therefore, if I simply use:
>>>
>>>  $submit = str_replace( chr(194), '', $submit );
>>>  $submit = str_replace( chr(160), '', $submit );
>>>
>>>  This is the solution.
>>
>>Hardly.
>
> If you mean my solution doesn't work, then you are mistaken -- it
> works for me.
>
>
>>  > Now, why does a POST operation add in C2's?  I'll leave that for
>>>  another post. :-)
>>
>>I haven't had time to look at the code, but perhaps you
>>need to specify a character encoding for the page.
>
>
> That's a valid point. Not only the encoding that's declared for the
> page via its HTML DOCTYPE, but also what encoding was used to
> actually save that file on the server.
>
> This entire encoding process is more involved than it looks, or so it
> appears to me.

Perhaps you should be taking a whitelist approach to filtering input?...

:-)

In other words, only allow the specific character combinations you
expect to see, and ignore any other goofy characters, such as the
bytes that were encoded from &nbsp;.
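
A rough sketch of what I mean, and only a sketch: whitelist_value() is
a made-up helper name, and I'm assuming the value you expect is a short
run of plain ASCII letters and digits. Adjust the pattern to whatever
you really expect to receive.

```php
<?php
// Hypothetical helper (not from the original code): returns the cleaned
// value if it matches the whitelist, or null if it does not.
// Assumption: the expected value is one or more ASCII letters/digits.
function whitelist_value($raw)
{
    // Trim UTF-8 no-break spaces (bytes C2 A0) and ordinary whitespace
    // from both ends of the submitted string.
    $value = (string) preg_replace('/^(?:\xC2\xA0|\s)+|(?:\xC2\xA0|\s)+$/', '', $raw);

    // Accept only what we expect to see; reject everything else.
    return preg_match('/\A[A-Za-z0-9]+\z/', $value) === 1 ? $value : null;
}
```

With the string from the thread, whitelist_value(rawurldecode(
'%C2%A0%C2%A0%C2%A0Z%C2%A0%C2%A0%C2%A0')) comes back as 'Z', and
anything carrying characters outside the pattern comes back as null.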

Or, possibly, try using just spaces and not &nbsp; for the value -- I
suspect that the browsers will NOT collapse the spaces in the VALUE
since it's data, not HTML content...
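
If you do stick with stripping, note that the two str_replace() calls
quoted above remove every C2 byte and every A0 byte on their own. In
UTF-8 the byte A0 also turns up inside other characters (for instance
"à" is C3 A0), so those can get mangled. A small sketch, assuming the
input really is UTF-8 as the %C2%A0 sequences suggest; the sample
string is invented for illustration:

```php
<?php
// Assumption: $submit holds UTF-8 bytes. Sample value: no-break spaces
// around the two characters "d" and "à" (à is C3 A0 in UTF-8).
$submit = "\xC2\xA0\xC2\xA0\xC2\xA0d\xC3\xA0\xC2\xA0";

// Remove the complete two-byte sequence for U+00A0, not the single bytes.
$clean = str_replace("\xC2\xA0", '', $submit);

// For comparison: removing the bytes one at a time also eats the A0
// that belongs to "à", leaving a dangling C3.
$broken = str_replace(array(chr(194), chr(160)), '', $submit);

echo bin2hex($clean), "\n";   // 64c3a0
echo bin2hex($broken), "\n";  // 64c3
```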

-- 
Some people have a "gift" link here.
Know what I want?
I want you to buy a CD from some indie artist.
http://cdbaby.com/from/lynch
Yeah, I get a buck. So?
