Friday, November 30, 2007, 11:11:47 PM, H. Fox wrote:

> I think it may be very important to send the charset for security reasons.

I quote from CERT http://www.cert.org/tech_tips/malicious_code_mitigation.html

Explicitly Setting the Character Encoding

Many web pages leave the character encoding ("charset" parameter in HTTP) undefined. In earlier versions of HTML and HTTP, the character encoding was supposed to default to ISO-8859-1 if it wasn't defined. In fact, many browsers had a different default, so it was not possible to rely on the default being ISO-8859-1. HTML version 4 legitimizes this - if the character encoding isn't specified, any character encoding can be used.

If the web server doesn't specify which character encoding is in use, it can't tell which characters are special. Web pages with unspecified character encoding work most of the time because most character sets assign the same characters to byte values below 128. But which of the values above 128 are special? Some 16-bit character-encoding schemes have additional multi-byte representations for special characters such as "<". Some browsers recognize this alternative encoding and act on it. This is "correct" behavior, but it makes attacks using malicious scripts much harder to prevent. The server simply doesn't know which byte sequences represent the special characters.

For example, UTF-7 provides alternative encoding for "<" and ">", and several popular browsers recognize these as the start and end of a tag. This is not a bug in those browsers. If the character encoding really is UTF-7, then this is correct behavior. The problem is that it is possible to get into a situation in which the browser and the server disagree on the encoding.

Web servers should set the character set, then make sure that the data they insert is free from byte sequences that are special in the specified encoding.

For example:

    <HTML>
    <HEAD>
    <META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
    <TITLE>HTML SAMPLE</TITLE>
    </HEAD>
    <BODY>
    <P>This is a sample HTML page
    </BODY>
    </HTML>

The META tag in the HEAD section of this sample HTML forces the page to use the ISO-8859-1 character set encoding.

=============== end quote =============

The PmWiki default skin does not set charset=ISO-8859-1, although the documentation at PmWiki/Internationalizations says that ISO-8859-1 is PmWiki's default.

When a skin template sets this with a meta tag above the title tag, any subsequent character encoding setting, for instance from including scripts/xlpage-utf-8.php, will appear later in the HTML header and, I assume, override the earlier setting. Please correct me if I am wrong. I am just trying to find a good standard for skins.

I recommend reading the page I quoted from, http://www.cert.org/tech_tips/malicious_code_mitigation.html, not just for the charset advice but also for much more on input character validation.

~Hans

_______________________________________________
pmwiki-users mailing list
http://www.pmichaud.com/mailman/listinfo/pmwiki-users
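
To make the UTF-7 point from the CERT quote concrete, here is a minimal Python sketch. It is not taken from the CERT page or from PmWiki; the payload string and the helper name escape_as_latin1 are made up for illustration. It assumes a server that escapes HTML special characters while treating its data as ISO-8859-1, and a client that interprets the same bytes as UTF-7 because no charset was declared.

```python
import html

# Hypothetical attacker input: it contains no literal "<" or ">" bytes.
payload = b"+ADw-script+AD4-alert(1)+ADw-/script+AD4-"


def escape_as_latin1(data: bytes) -> bytes:
    """Escape HTML special characters, assuming the bytes are ISO-8859-1."""
    text = data.decode("iso-8859-1")
    return html.escape(text).encode("iso-8859-1")


escaped = escape_as_latin1(payload)

# The server-side escaping finds nothing to escape, so the bytes pass through.
unchanged = escaped == payload

# A client that reads the same bytes as UTF-7 sees markup instead.
as_utf7 = escaped.decode("utf-7")

print(unchanged)
print(as_utf7)
```

The escaping step is correct for ISO-8859-1, yet the output is only safe if the client also reads it as ISO-8859-1. This is the disagreement between browser and server that the quote describes, and an explicit charset declaration is meant to rule it out.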
