From:             php at richardneill dot org
Operating system: 
PHP version:      4.3.6
PHP Bug Type:     Feature/Change Request
Bug description:  RFE: function to fix Microsoft "smart quotes" and other wrong characters

Description:
------------
Feature request: str_demoronise()

On my website, users often paste content that was written in Microsoft Word and that contains undisplayable "ASCII" characters where there should be single or double quotes. Anyone viewing the result on a non-MS platform sees rectangles instead of quotes.

The problem has been solved in Perl here:
http://www.fourmilab.ch/webtools/demoroniser/

I quote:
============
Microsoft use their own "extension" to Latin-1, in which a variety of characters which do not appear in Latin-1 are inserted in the range 0x82 through 0x95--this having the merit of being incompatible with both Latin-1 and Unicode, which reserve this region for additional control characters.
=============

I'd like to suggest adding a str_demoronise() function that finds these wrong characters and replaces them with the correct ASCII.

Reproduce code:
---------------
From the source of demoroniser, here are the substitutions made. The MS column is the byte Microsoft uses (in hex); the FIX column is the replacement:

MS     FIX
0x82   ,
0x83   <em>f</em>
0x84   ,,
0x85   ...
0x88   ^
0x89   ' °/°°'   <-- note the leading whitespace; the '' quotes are not part of the replacement
0x8B   <
0x8C   Oe
0x91   `
0x92   '
0x93   "
0x94   "
0x95   *
0x96   -
0x97   --
0x98   <sup>~</sup>
0x99   <sup>TM</sup>
0x9B   >
0x9C   oe

-- 
Edit bug report at http://bugs.php.net/?id=28646&edit=1
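
For illustration, the substitution table in this report could be applied in userland PHP with strtr(). This is only a minimal sketch, not the requested built-in: the name str_demoronise_sketch() is hypothetical, and it assumes the input is a single-byte string (Windows-1252 or Latin-1), not UTF-8, where these byte values occur inside multi-byte sequences and must be left alone. The degree signs in the 0x89 replacement are written as the Latin-1 byte 0xB0 for the same reason.

```php
<?php
// Hypothetical userland sketch of the requested str_demoronise().
// Assumption: $text is a single-byte (Windows-1252 / Latin-1) string, NOT UTF-8.
function str_demoronise_sketch($text)
{
    // Byte => replacement pairs copied from the table in this report.
    static $map = array(
        "\x82" => ',',
        "\x83" => '<em>f</em>',
        "\x84" => ',,',
        "\x85" => '...',
        "\x88" => '^',
        "\x89" => " \xB0/\xB0\xB0", // leading space; 0xB0 is the Latin-1 degree sign
        "\x8B" => '<',
        "\x8C" => 'Oe',
        "\x91" => '`',
        "\x92" => "'",
        "\x93" => '"',
        "\x94" => '"',
        "\x95" => '*',
        "\x96" => '-',
        "\x97" => '--',
        "\x98" => '<sup>~</sup>',
        "\x99" => '<sup>TM</sup>',
        "\x9B" => '>',
        "\x9C" => 'oe',
    );

    // With an array argument, strtr() replaces every key in one pass and
    // never rescans text it has already substituted.
    return strtr($text, $map);
}

// Usage: Word-style quotes, dash, apostrophe and ellipsis become plain ASCII.
echo str_demoronise_sketch("\x93Hello\x94 \x96 it\x92s fine\x85"), "\n";
// prints: "Hello" - it's fine...
```

strtr() is used here rather than a chain of str_replace() calls so that replacement text such as '<' or '--' is not touched by a later substitution.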