On 14/11/11 11:43 AM, ABClf wrote:
> Yes, I guess part (all?) of the problem is related to the copy-pasted
> text; my German friend used Microsoft Word to write his text before
> copying it into PmWiki (though he tells me he doesn't do this all the
> time).


From Wikipedia:
  It is very common to mislabel Windows-1252 text with the charset label
  ISO-8859-1. A common result was that all the quotes and apostrophes
  (produced by "smart quotes" in Microsoft software) were replaced with
  question marks or boxes on non-Windows operating systems, making text
  difficult to read. Most modern web browsers and e-mail clients treat the
  MIME charset ISO-8859-1 as Windows-1252 in order to accommodate such
  mislabeling. This is now standard behavior in the draft HTML 5
  specification, which requires that documents advertised as ISO-8859-1
  actually be parsed with the Windows-1252 encoding.
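To make the mislabeling concrete, here is a minimal PHP sketch that reads the same raw bytes either as ISO-8859-1 or as Windows-1252 and returns UTF-8. The function names are hypothetical, it assumes PHP 7 or later, and it uses no extensions. Read as ISO-8859-1, Word's curly-quote bytes 0x93/0x94 become invisible C1 control codes (the "boxes" mentioned above). Read as Windows-1252, they become the intended quotation marks.

```php
<?php
// Minimal sketch: read single-byte text as ISO-8859-1 or as Windows-1252
// and return UTF-8. Function names are hypothetical. Requires PHP 7+.

// Encode one Unicode code point (below U+10000) as UTF-8.
function codepoint_to_utf8(int $cp): string {
    if ($cp < 0x80) {
        return chr($cp);
    }
    if ($cp < 0x800) {
        return chr(0xC0 | ($cp >> 6)) . chr(0x80 | ($cp & 0x3F));
    }
    return chr(0xE0 | ($cp >> 12))
         . chr(0x80 | (($cp >> 6) & 0x3F))
         . chr(0x80 | ($cp & 0x3F));
}

function decode_single_byte(string $bytes, bool $asWindows1252): string {
    // Windows-1252 differs from ISO-8859-1 only in 0x80-0x9F, where
    // ISO-8859-1 has invisible C1 control codes. Bytes 0x81, 0x8D, 0x8F,
    // 0x90 and 0x9D are unassigned in Windows-1252 and keep their
    // ISO-8859-1 reading here.
    static $high = [
        0x80 => 0x20AC, 0x82 => 0x201A, 0x83 => 0x0192, 0x84 => 0x201E,
        0x85 => 0x2026, 0x86 => 0x2020, 0x87 => 0x2021, 0x88 => 0x02C6,
        0x89 => 0x2030, 0x8A => 0x0160, 0x8B => 0x2039, 0x8C => 0x0152,
        0x8E => 0x017D, 0x91 => 0x2018, 0x92 => 0x2019, 0x93 => 0x201C,
        0x94 => 0x201D, 0x95 => 0x2022, 0x96 => 0x2013, 0x97 => 0x2014,
        0x98 => 0x02DC, 0x99 => 0x2122, 0x9A => 0x0161, 0x9B => 0x203A,
        0x9C => 0x0153, 0x9E => 0x017E, 0x9F => 0x0178,
    ];
    $out = '';
    for ($i = 0, $n = strlen($bytes); $i < $n; $i++) {
        $b = ord($bytes[$i]);
        // In ISO-8859-1 every byte value is also its Unicode code point.
        $cp = ($asWindows1252 && isset($high[$b])) ? $high[$b] : $b;
        $out .= codepoint_to_utf8($cp);
    }
    return $out;
}

// A Word "smart quote" pair around a word, as raw Windows-1252 bytes.
$raw = "\x93Hello\x94";
echo decode_single_byte($raw, true), "\n";            // curly quotes around Hello
echo bin2hex(decode_single_byte($raw, false)), "\n";  // c29348656c6c6fc294 (U+0093 ... U+0094)
```

Browsers that follow the HTML 5 rule quoted above effectively take the `true` branch whenever a page is labelled ISO-8859-1.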

I use the following code to turn these characters into the corresponding HTML entities:

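// Assumes pages are stored as single-byte ISO-8859-1/Windows-1252 text.
// In UTF-8 text, bytes 128-159 occur inside multibyte characters and
// must not be replaced one by one.
// Keys are Windows-1252 byte values; values are HTML entity names or
// numeric references (HTML 4 has no named entity for Zcaron/zcaron).
foreach(array(128 => 'euro',
              130 => 'sbquo',
              131 => 'fnof',
              132 => 'bdquo',
              133 => 'hellip',
              134 => 'dagger',
              135 => 'Dagger',
              136 => 'circ',
              137 => 'permil',
              138 => 'Scaron',
              139 => 'lsaquo',
              140 => 'OElig',
              142 => '#381',
              145 => 'lsquo',
              146 => 'rsquo',
              147 => 'ldquo',
              148 => 'rdquo',
              149 => '#8226',
              150 => 'ndash',
              151 => 'mdash',
              152 => 'tilde',
              153 => 'trade',
              154 => 'scaron',
              155 => 'rsaquo',
              156 => 'oelig',
              158 => '#382',
              159 => 'Yuml') as $k => $v)
    // One inline markup rule per byte: replace the raw byte with "&name;".
    Markup("chr$k", 'inline', '/'.chr($k).'/', "&$v;");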
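Because Markup() rules only take effect inside PmWiki, the byte-to-entity idea can also be checked on its own with a plain PHP function. This is a sketch under the same assumption (single-byte Windows-1252 input, not UTF-8); the function name is hypothetical and is not part of PmWiki's API. It covers every byte in 0x80-0x9F that Windows-1252 assigns, using numeric references for Zcaron/zcaron, and applies the whole table with strtr() in one pass.

```php
<?php
// Hypothetical helper (not a PmWiki function): replace raw Windows-1252
// bytes 0x80-0x9F with HTML entities. Input must be single-byte text.
function cp1252_bytes_to_entities(string $text): string {
    static $pairs = null;
    if ($pairs === null) {
        // Byte value => entity name or numeric reference.
        $names = [128 => 'euro',   130 => 'sbquo',  131 => 'fnof',   132 => 'bdquo',
                  133 => 'hellip', 134 => 'dagger', 135 => 'Dagger', 136 => 'circ',
                  137 => 'permil', 138 => 'Scaron', 139 => 'lsaquo', 140 => 'OElig',
                  142 => '#381',   145 => 'lsquo',  146 => 'rsquo',  147 => 'ldquo',
                  148 => 'rdquo',  149 => 'bull',   150 => 'ndash',  151 => 'mdash',
                  152 => 'tilde',  153 => 'trade',  154 => 'scaron', 155 => 'rsaquo',
                  156 => 'oelig',  158 => '#382',   159 => 'Yuml'];
        $pairs = [];
        foreach ($names as $byte => $name) {
            $pairs[chr($byte)] = "&$name;";
        }
    }
    // strtr() with an array replaces all keys in a single pass.
    return strtr($text, $pairs);
}

echo cp1252_bytes_to_entities("\x93quoted\x94 \x96 dash"), "\n";
// &ldquo;quoted&rdquo; &ndash; dash
```

strtr() never re-scans text it has already substituted, so a single call replaces the per-byte regular expressions; where such a function would be hooked into PmWiki's processing is left open here.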

JR

--
John Rankin



_______________________________________________
pmwiki-users mailing list
[email protected]
http://www.pmichaud.com/mailman/listinfo/pmwiki-users
