Sergei Dolmatov wrote:

>Hmm... May be I didn't understand you clean, but why you wan't use
>'user defined formatters' (UDF)?
>  
>
Excellent question. A bit of info on my mischievous intentions is due.

Midgard does not formally support Unicode, because MySQL doesn't. At 
CWA we have a lot of experience with Unicode support, and we know that:

 - MySQL 3.22.32 storage of strings is binary-clean (or at least good 
enough) to support storing and retrieving UTF-8 chars reliably. You 
cannot get too smart with SQL regexps, but that is the only limitation. 
UTF-8 strings are preserved safely -- we've been running systems with 
extensive use of UTF-8 on MySQL for 3 years now with no glitches.

 - PHP (and Perl) MySQL libraries are also binary-clean (or close enough).

 - Asgard and Old Admin do not support editing of UTF-8 correctly, simply 
because they don't explicitly set the HTTP headers. We have successfully 
fixed that.

 - Asgard and Old Admin display content incorrectly. Why? Because they 
use Midgard's own content filters.
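
For the header fix in the third point, a minimal sketch of what 
"explicitly setting the HTTP headers" could look like in PHP. This assumes 
the header in question is Content-Type with a charset parameter; the helper 
name contentTypeHeader() is made up for illustration and is not part of 
Asgard or Old Admin.

```php
<?php
// Hypothetical helper: builds the Content-Type header line for a charset.
function contentTypeHeader(string $charset = 'UTF-8'): string
{
    return 'Content-Type: text/html; charset=' . $charset;
}

// Headers can only be set before any output has been sent.
if (!headers_sent()) {
    header(contentTypeHeader());
}
```

With the charset declared, the browser has no reason to guess an encoding 
when it displays the page or submits an edit form.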

In testing, we've discovered that calling Midgard's own content filters 
has exactly the same (undesired) effect as calling htmlentities() with 
the default encoding of ISO-8859-1 (see 
http://www.php.net/manual/en/function.htmlentities.php): each byte of a 
multi-byte UTF-8 character is treated as a separate Latin-1 character. 
Calling htmlentities($article->title, ENT_QUOTES, 'UTF-8') instead 
gives the correct result every time. As you can imagine, I do not want 
to search/replace over Asgard/Old Admin, brute-forcing them into Unicode 
friendliness...
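
A small sketch of the difference, using a made-up title rather than a real 
Midgard article. The ISO-8859-1 charset is passed explicitly here because 
the default depends on the PHP version, so this reproduces the behaviour 
described above rather than relying on it.

```php
<?php
// Hypothetical sample title; "é" is the two bytes 0xC3 0xA9 in UTF-8.
$title = "Caf\xC3\xA9 \"menu\"";

// Bytes interpreted as ISO-8859-1: each byte of "é" becomes its own entity.
$latin1 = htmlentities($title, ENT_QUOTES, 'ISO-8859-1');

// Bytes interpreted as UTF-8: "é" is recognised as a single character.
$utf8 = htmlentities($title, ENT_QUOTES, 'UTF-8');

echo $latin1, "\n"; // Caf&Atilde;&copy; &quot;menu&quot;
echo $utf8, "\n";   // Caf&eacute; &quot;menu&quot;
```

The first result is what shows up as garbage in the browser; the second is 
the one we want the filters to produce.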

So if we can "fix" the HTML encoding, we will have a Unicode-compliant 
Midgard. The only thing I wouldn't guarantee 100% is retrieving 
objects by name where the name contains UTF-8 high chars -- but testing 
would answer that question rather fast.

regards,

m

