i am designing an application written in php which takes xml files as input and parses them via pear::xml_tree (based on pear::xml_parser, which is itself based on sax). the values taken from the xml are later passed around among a variety of php objects and finally rendered via a template engine (e.g. smarty).
my basic problem is that i am not familiar with how the different encoding schemes are 'compatible' with php and its functions.
i read that php internally uses the (dynamically extendable) utf-8 encoding. on the other hand, applications like xmlspy and newer windows apps usually use the (double-byte encoded) utf-16 format.
my question as a developer is now which format to use for my xml file. i want to store data in latin (european), japanese, chinese/taiwanese and russian, and would love to deal with just one encoding covering all languages instead of needing a variety of different encoding schemes. i would also need some more info on datatypes. for example: is a taiwanese word also treated as a string in php, and how do we deal with non-arabic numeral systems?
furthermore, i would love to know if there are any problems with regular expressions, because i guess the alphabet of php's ereg engine is mostly ansi based, isn't it? or maybe i am wrong and ereg recognizes the alphabet in use automatically?
basically, my research via google and nec's research index didn't turn up any good papers on the subject which don't lose themselves in the vastness of details on different encodings, byte orders and so on.
are there any good articles for newbies to look at?
any pointers to useful and applicable knowledge sources would be really appreciated.
yours, matthias
__ http://www.parkstudios.net
-- PHP Internationalization Mailing List (http://www.php.net/)