Re: [PHP] htmlentities is incomplete: does not cover rsquo etc
On Sat, Mar 14, 2009 at 12:18 AM, Lester Caine wrote:
> This is probably one of the reasons some of us think that getting a stable PHP6
> based on unicode out of the door would probably be a lot more use to people
> than PHP5.3 ;)

+1

I cannot wait for full Unicode. mbstring, iconv, all this wacky stuff, no thanks. Having to feed 'utf-8' to functions all over too... Everything should be UTF-8 now, period.

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP] htmlentities is incomplete: does not cover rsquo etc
Heddon's Gate Hotel wrote:
> Thanks Jan, it's much clearer now. My knowledge about character encodings
> has multiplied 100-fold in the last 24 hours' research.
>
> Would it be a good idea for the PHP Manual to address some of these issues,
> by explaining good practice in encoding arbitrary user input in forms (for
> example), for the benefit of those, like me, for whom character sets are a
> bit of a black art?
>
> Also I still cannot persuade get_html_translation_table to list those
> non-Latin1 entities. This is not an important issue, since it appears to be
> only an information function, but it would be nice if it were consistent
> with htmlentities and html_entity_decode.

This is probably one of the reasons some of us think that getting a stable PHP6 based on unicode out of the door would probably be a lot more use to people than PHP5.3 ;)

Eliminate character sets and the black art goes away?

--
Lester Caine - G8HFL
Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk//
Firebird - http://www.firebirdsql.org/index.php
Re: [PHP] htmlentities is incomplete: does not cover rsquo etc
Thanks Jan, it's much clearer now. My knowledge about character encodings has multiplied 100-fold in the last 24 hours' research.

Would it be a good idea for the PHP Manual to address some of these issues, by explaining good practice in encoding arbitrary user input in forms (for example), for the benefit of those, like me, for whom character sets are a bit of a black art?

Also I still cannot persuade get_html_translation_table to list those non-Latin1 entities. This is not an important issue, since it appears to be only an information function, but it would be nice if it were consistent with htmlentities and html_entity_decode.

Eddie

From Jan G.B. 13/03/2009 17:27:
> 2009/3/13 Heddon's Gate Hotel :
>> The string function htmlentities seems to have very incomplete coverage of
>> the HTML entities listed in the HTML 4 spec. For example, it does not know
>> about rsquo, lsquo, rdquo, ldquo, etc. This is confirmed by looking at the
>> output of get_html_translation_table, which does not list these entities.
>>
>> My impression is that it covers those HTML entities that are in ISO-8859-1,
>> but not the others. Is this deliberate? If so, the Manual is misleading
>> because it suggests that all HTML entities are covered. Otherwise, is this a
>> bug?
>
> Well, If you specify the input charset you'll have no problem at all. ;)
>
>   >µ»n“¢µ€jæ', ENT_QUOTES, 'UTF-8'); ?>
>
> Latin1 AKA ISO-8859-1 doesn't have ldquo nor bdquo nor ndash and alike.
>
> Regards,
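On the get_html_translation_table point in this message, a minimal sketch follows. It assumes a PHP release newer than the ones discussed in this thread, in which get_html_translation_table accepts a third encoding argument; on older versions that argument is not available and the table stays limited to the Latin-1 entities. The variable names are illustrative only.

```php
<?php
// ASSUMPTION: the running PHP version supports the third ($encoding)
// argument of get_html_translation_table(); older releases do not.

// Right single quotation mark (U+2019) as raw UTF-8 bytes, so the
// encoding of this source file does not matter.
$rsquo = "\xE2\x80\x99";

// Table built for UTF-8 input: keys are UTF-8 characters.
$utf8Table = get_html_translation_table(HTML_ENTITIES, ENT_QUOTES, 'UTF-8');

// Table built for ISO-8859-1 input: keys are single bytes, so characters
// outside Latin-1 (such as U+2019) cannot appear in it.
$latin1Table = get_html_translation_table(HTML_ENTITIES, ENT_QUOTES, 'ISO-8859-1');

var_dump(isset($utf8Table[$rsquo]));               // is rsquo listed for UTF-8?
var_dump(in_array('&rsquo;', $latin1Table, true)); // is rsquo listed for Latin-1?
```

Comparing the two tables this way shows whether the listing is consistent with what htmlentities does for the same charset.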
Re: [PHP] htmlentities is incomplete: does not cover rsquo etc
2009/3/13 Heddon's Gate Hotel :
> The string function htmlentities seems to have very incomplete coverage of
> the HTML entities listed in the HTML 4 spec. For example, it does not know
> about rsquo, lsquo, rdquo, ldquo, etc. This is confirmed by looking at the
> output of get_html_translation_table, which does not list these entities.
>
> My impression is that it covers those HTML entities that are in ISO-8859-1,
> but not the others. Is this deliberate? If so, the Manual is misleading
> because it suggests that all HTML entities are covered. Otherwise, is this a
> bug?
>

Well, if you specify the input charset you'll have no problem at all. ;)

  >µ»n“¢µ€jæ', ENT_QUOTES, 'UTF-8'); ?>

Latin1 AKA ISO-8859-1 doesn't have ldquo nor bdquo nor ndash and the like.

Regards,
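A minimal sketch of the point made in this reply: the same bytes turn into named entities such as rsquo only when htmlentities is told the input is UTF-8. The sample string and variable names are assumptions for illustration; this is not the string from the original message, which did not survive intact.

```php
<?php
// Illustrative input (an assumption): "It's" with a right single quote
// (U+2019) and the word "quoted" in left/right double quotes
// (U+201C, U+201D), written as raw UTF-8 bytes.
$text = "It\xE2\x80\x99s \xE2\x80\x9Cquoted\xE2\x80\x9D";

// Charset given explicitly: the typographic quotes become named entities.
$asUtf8 = htmlentities($text, ENT_QUOTES, 'UTF-8');

// Same bytes treated as ISO-8859-1: every byte is read as its own
// character, so rsquo, ldquo and rdquo can never be produced.
$asLatin1 = htmlentities($text, ENT_QUOTES, 'ISO-8859-1');

echo $asUtf8, "\n";
var_dump(strpos($asLatin1, '&rsquo;') !== false);
```

The charset argument has to match the encoding the data is really in; passing 'UTF-8' for Latin-1 bytes would be just as wrong as the reverse.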