--- "Patrick R. Michaud" <[EMAIL PROTECTED]> wrote:

> On Tue, Feb 13, 2007 at 08:08:09AM -0800, Seth Cherney wrote:
> > It just does not work.
> >
> > The page, even if declared, is still *not truly* encoded in UTF-8.
> > Saxon will still report an error (browsers couldn't care less; as far
> > as I can tell, they don't work at the same low level of processing).
> >
> > Unless I have a header such as:
> >
> > <?xml version="1.0" encoding="iso-8859-1"?>
> > <!DOCTYPE html
> >  PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
> >  "xhtml1-transitional.dtd">
> > ...
> >
> > it is full of byte errors (i.e. the server is outputting ISO bytes,
> > not UTF-8).
>
> Note that neither the webserver nor PmWiki does any form of
> automatic character encoding conversion. So, if the text was
> originally entered and encoded as iso-8859-1, then that's
> what PmWiki will output, even if the header says otherwise.
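Pm's point means an encoding declaration only helps if the stored bytes really are UTF-8. One quick check on a saved page file is to try a strict UTF-8 decode. The sketch below is a language-neutral illustration in Python, not part of PmWiki. The function names are hypothetical, and it assumes that anything which fails to decode as UTF-8 is ISO-8859-1.

```python
# Sketch: classify raw page bytes as valid UTF-8 or not.
# Assumption (hypothetical helpers): anything that fails strict UTF-8
# decoding is treated as ISO-8859-1 (Latin-1), which accepts every byte.

def detect_encoding(data: bytes) -> str:
    """Return 'utf-8' if data decodes strictly as UTF-8, else 'iso-8859-1'."""
    try:
        data.decode("utf-8", errors="strict")
        return "utf-8"
    except UnicodeDecodeError:
        return "iso-8859-1"


def to_utf8(data: bytes) -> bytes:
    """Re-encode bytes as UTF-8, assuming non-UTF-8 input is ISO-8859-1."""
    return data.decode(detect_encoding(data)).encode("utf-8")


if __name__ == "__main__":
    greek = "\u03bb\u03cc\u03b3\u03bf\u03c2".encode("utf-8")  # "λόγος"
    latin = "caf\u00e9".encode("iso-8859-1")  # b'caf\xe9': not valid UTF-8
    print(detect_encoding(greek))   # utf-8
    print(detect_encoding(latin))   # iso-8859-1
    print(to_utf8(latin))           # b'caf\xc3\xa9'
```

Pure ASCII counts as valid UTF-8 here, which is correct because ASCII bytes mean the same thing in both encodings.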
There is something quirky on a Windows box: the text is automatically
converted from native UTF-8 to &#nnn; format. I posted an example at
http://www.pmwiki.org/wiki/UTF8/GreekDiacritics. The example there also
holds for a native UTF-8 *text* file pasted into PmWiki on my box, so I
think the behavior is fairly certain. It did not always do this: when I
first installed 2.1.something, it saved pages in native UTF-8...

It does exactly the same thing on my playground at
http://www.xrisma.org/coop, which is on an OpenBSD box at aplus.net
(uploaded from my Windows box). Even if I create a new page there, the
result is the same. PmWiki creates pages there with the same
characteristics, whether the text comes from pmwiki.org or from a local
native UTF-8 .txt file. The install there is still 2.1.x.

My LOCAL config file has:

    include_once('scripts/urlapprove.php');
    ...
    include_once("$FarmD/scripts/xlpage-utf-8.php");  # needs double quotes: PHP does not expand $FarmD inside single quotes
    include_once("$FarmD/cookbook/spellchecker.php"); # my Java one, which I can't explain yet (config weirdness with Apache and Tomcat)
    include_once("$FarmD/cookbook/StaticPages.php");
    include_once("scripts/trails.php");
    include_once("$FarmD/cookbook/zap.php");
    include_once("$FarmD/cookbook/zapplus.php");

> >
> > PS: any tips on converting between true UTF-8 and the &#nnn;
> > sequence in ROSPatterns and/or in markups? I will write a
> > verbose one if necessary, unless someone knows any type of
> > command/shorthand.
>
> I'm not certain which way you're wanting the translation to
> go. When saving a page, do you want UTF-8 characters to be
> converted into the &#nnn; counterparts, or vice-versa?

The only drawback to this &#nnn; format is that Greek is hard to edit
once it has been entered, because the edit screen shows it in this
format. I can live with it for now. It would be better to have
everything saved in native UTF-8, so that it can be read on edit, with
a markup script that converts it to &#nnn; on display so that it can be
indexed properly.
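Both directions of the translation Pm asks about are small string operations. This is a Python sketch for illustration only. In PmWiki the equivalent would have to be written in PHP as a $ROSPatterns entry or a Markup() rule, which is not shown here. The helpers to_ncr and from_ncr are hypothetical. to_ncr produces the display-time &#nnn; form. from_ncr turns existing &#nnn; pages back into native characters for storage.

```python
import re

# Matches decimal (&#940;) and hexadecimal (&#x3AC;) numeric character references.
_NCR = re.compile(r"&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));")


def to_ncr(text: str) -> str:
    """Replace every non-ASCII character with a decimal &#nnn; reference."""
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


def from_ncr(text: str) -> str:
    """Turn numeric character references back into native characters."""
    def repl(m):
        hex_digits, dec_digits = m.group(1), m.group(2)
        code = int(hex_digits, 16) if hex_digits else int(dec_digits)
        if code > 0x10FFFF:          # not a valid code point: leave it as-is
            return m.group(0)
        return chr(code)
    return _NCR.sub(repl, text)


if __name__ == "__main__":
    stored = "\u03bb\u03cc\u03b3\u03bf\u03c2"  # "λόγος", as a native UTF-8 page would hold it
    shown = to_ncr(stored)                     # what a display-time markup would emit
    print(shown)                               # &#955;&#972;&#947;&#959;&#962;
    print(from_ncr(shown) == stored)           # True
```

One caveat on the design: from_ncr also decodes references a user typed on purpose, such as a literal &#955; meant to be shown as text. Running it over existing pages is therefore not perfectly lossless. Test it on a copy before converting a large page store.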
Also, this makes me quite uneasy. I am posting 500,000 pages of text
and don't want things to flip out on me once people start editing. If
that happened, I most likely could not recover. I will be adding 9,000
Greek books within the next two years, so I guess this is critical :).

For safety's sake and future compatibility, it seems I would need a
ROSPattern that keeps the text in native UTF-8, and also a markup that
converts it back to &#nnn; on display.

Thanks for your time as always,
Seth

> Pm

_______________________________________________
pmwiki-users mailing list
[email protected]
http://www.pmichaud.com/mailman/listinfo/pmwiki-users
