On 14/11/11 11:43 AM, ABClf wrote:
Yes, I guess part (or all?) of the problem is related to the copy-pasted
text; my German friend used Microsoft Word to write his text before
pasting it into PmWiki (though he tells me he doesn't do this all the
time).
From Wikipedia:
It is very common to mislabel Windows-1252 text with the charset label
ISO-8859-1. A common result was that all the quotes and apostrophes
(produced by "smart quotes" in Microsoft software) were replaced with
question marks or boxes on non-Windows operating systems, making text
difficult to read. Most modern web browsers and e-mail clients treat the
MIME charset ISO-8859-1 as Windows-1252 in order to accommodate such
mislabeling. This is now standard behavior in the draft HTML 5
specification, which requires that documents advertised as ISO-8859-1
actually be parsed with the Windows-1252 encoding.
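To make the mislabeling concrete, here is a minimal sketch (assuming PHP with the mbstring extension loaded; the variable names are only for illustration) that decodes the same byte both ways. Byte 0x92 is the Word "smart" apostrophe in Windows-1252, but an invisible control character in ISO-8859-1:

```php
<?php
// Byte 0x92 is the "smart" apostrophe in Windows-1252.
$byte = "\x92";

// Decoded as Windows-1252, it becomes U+2019 (right single quotation mark).
$asCp1252 = mb_convert_encoding($byte, 'UTF-8', 'Windows-1252');

// Decoded as ISO-8859-1, it becomes U+0092, a C1 control character,
// which is why it tends to show up as a question mark or a box.
$asLatin1 = mb_convert_encoding($byte, 'UTF-8', 'ISO-8859-1');

echo bin2hex($asCp1252), "\n"; // e28099
echo bin2hex($asLatin1), "\n"; // c292
```

The hex dumps are the UTF-8 encodings of U+2019 and U+0092 respectively.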
I use the following code to turn these characters into the corresponding
HTML entities:
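# Map Windows-1252 byte values (130-159) to HTML entity names.
# Only suitable for wikis using a single-byte charset: on a UTF-8 wiki
# these byte values also occur inside multi-byte characters.
foreach (array(130 => 'sbquo',
               131 => 'fnof',
               132 => 'bdquo',
               133 => 'hellip',
               134 => 'dagger',
               135 => 'Dagger',
               137 => 'permil',
               138 => 'Scaron',
               139 => 'lsaquo',
               140 => 'OElig',
               145 => 'lsquo',
               146 => 'rsquo',
               147 => 'ldquo',
               148 => 'rdquo',
               149 => '#8226',
               150 => 'ndash',
               151 => 'mdash',
               152 => 'tilde',
               153 => 'trade',
               154 => 'scaron',
               155 => 'rsaquo',
               156 => 'oelig',
               159 => 'Yuml') as $k => $v)
  # One markup rule per byte: replace the raw byte with its entity.
  Markup("chr$k", 'inline', '/'.chr($k).'/', "&$v;");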
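For trying the mapping outside PmWiki, the same idea can be sketched as a plain function. This is only an illustration: `cp1252_to_entities` is a hypothetical helper, not a PmWiki function, it carries just a subset of the table above, and it assumes the input is single-byte Windows-1252 text rather than UTF-8.

```php
<?php
// Hypothetical helper, not part of PmWiki: applies the same byte-to-entity
// idea as the Markup() rules to a plain string.
// Assumes $text is single-byte Windows-1252 text, not UTF-8.
function cp1252_to_entities(string $text): string
{
    // Subset of the table above, for brevity.
    $names = [
        130 => 'sbquo', 133 => 'hellip', 145 => 'lsquo', 146 => 'rsquo',
        147 => 'ldquo', 148 => 'rdquo',  150 => 'ndash', 151 => 'mdash',
    ];

    // Build a byte => entity map and replace everything in one pass.
    $map = [];
    foreach ($names as $code => $name) {
        $map[chr($code)] = "&$name;";
    }
    return strtr($text, $map);
}

echo cp1252_to_entities("It\x92s \x93quoted\x94 \x96 done\x85"), "\n";
// It&rsquo;s &ldquo;quoted&rdquo; &ndash; done&hellip;
```

Bytes that are not in the map, including plain ASCII, pass through unchanged.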
JR
--
John Rankin
_______________________________________________
pmwiki-users mailing list
http://www.pmichaud.com/mailman/listinfo/pmwiki-users