I just found another potential gotcha when using Unicode throughout
your application: byte order marks in uploaded text files.
http://en.wikipedia.org/wiki/Byte_Order_Mark

Turns out Word puts a byte order mark (BOM) at the beginning of all
Unicode files. Unicode-friendly tools ignore it. PHP's fgets()
doesn't: it hands the BOM back as the first bytes of the first line.

Detecting and stripping the BOM is an interesting exercise, because
strlen() on the BOM pasted in as a quoted literal returns 6, but the
UTF-8 BOM is really only 3 bytes long (EF BB BF)... not sure if
this is a bug or what, but it's certainly an annoyance.
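
For what it's worth, here is a minimal sketch of one way to detect and
strip the BOM. It assumes the upload is UTF-8, and the names UTF8_BOM,
strip_utf8_bom() and read_first_line() are made up for illustration.
Writing the BOM as byte escapes instead of pasting the character into
the source should sidestep the length confusion, since the editor's
encoding can no longer change what ends up between the quotes.

```php
<?php
// UTF-8 byte order mark written as byte escapes, so the encoding of
// this source file cannot change its length.
define('UTF8_BOM', "\xEF\xBB\xBF");

// Hypothetical helper: remove a leading UTF-8 BOM from a string, if present.
function strip_utf8_bom($text)
{
    if (strncmp($text, UTF8_BOM, 3) === 0) {
        // Cast guards against substr() returning false on older PHP
        // versions when nothing follows the BOM.
        return (string) substr($text, 3);
    }
    return $text;
}

// Hypothetical helper: read the first line from an open file handle,
// minus any BOM. Returns false at end of file, like fgets().
function read_first_line($handle)
{
    $line = fgets($handle);
    return ($line === false) ? false : strip_utf8_bom($line);
}
```

With this, strlen(UTF8_BOM) is 3, and only the first line read from
each file needs the check; later lines can go through plain fgets().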


-- 
Chris Snyder
http://chxo.com/
_______________________________________________
New York PHP Community Talk Mailing List
http://lists.nyphp.org/mailman/listinfo/talk
