From:             php at richardneill dot org
Operating system: 
PHP version:      4.3.6
PHP Bug Type:     Feature/Change Request
Bug description:  RFE: function to fix Microsoft "smart quotes" and other wrong 
characters

Description:
------------
Feature request: str_demoronise()

On my website, users often paste content that was written in Microsoft
Word. It contains undisplayable bytes (not ASCII, and not valid Latin-1)
where there should be single or double quotes. Anyone viewing the result
on a non-MS platform sees rectangles instead of quotes.

The problem has been solved in Perl here:
http://www.fourmilab.ch/webtools/demoroniser/
I quote: 
============
Microsoft use their own "extension" to Latin-1, in which a variety of
characters which do not appear in Latin-1 are inserted in the range 0x82
through 0x95--this having the merit of being incompatible with both
Latin-1 and Unicode, which reserve this region for additional control
characters.
=============
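
As a rough illustration of the range described in the quote, pasted text
could be checked for such bytes before it is stored. This is only a
sketch: the function name has_ms_control_bytes() is hypothetical, and it
assumes the input is a single-byte (Latin-1 / Windows-1252) string, not
UTF-8, where bytes 0x80-0x9F legitimately occur inside multi-byte
sequences.

```php
<?php
// Hypothetical helper: returns true if $text contains any byte in the
// 0x80-0x9F region, which Latin-1 reserves for control characters.
// Assumes single-byte input; do NOT use this on UTF-8 strings.
function has_ms_control_bytes($text)
{
    // No /u modifier, so PCRE matches byte by byte.
    return preg_match('/[\x80-\x9F]/', $text) === 1;
}
```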

I'd like to suggest the addition of a str_demoronise() function which
fixes these wrong characters, replacing them with the correct ASCII
equivalents.




Reproduce code:
---------------
From the source of demoroniser, here are the substitutions made. The MS
column is the byte Microsoft uses (in hex); the FIX column is the replacement:

MS      FIX

0x82    ,
0x83    <em>f</em>
0x84    ,,
0x85    ...
0x88    ^
0x89    ' °/°°'            <-- includes the leading whitespace; the '' quotes are not part of it
0x8B    <
0x8C    Oe
0x91    `
0x92    '
0x93    "
0x94    "
0x95    *
0x96    -
0x97    --
0x98    <sup>~</sup>
0x99    <sup>TM</sup>
0x9B    >
0x9C    oe
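
For illustration, the table above could be written in userland PHP
roughly as follows. This is a sketch of the requested behaviour, not an
existing PHP function: str_demoronise() is the name proposed in this
report, and the code assumes the input is a single-byte (Latin-1 /
Windows-1252) string. Running it on UTF-8 input would corrupt multi-byte
characters. The degree sign in the 0x89 replacement is written as the
Latin-1 byte 0xB0.

```php
<?php
// Sketch of the proposed str_demoronise(); the name and behaviour are
// taken from this feature request, not from any shipped PHP extension.
// Assumes $text is single-byte Latin-1 / Windows-1252, NOT UTF-8.
function str_demoronise($text)
{
    // Byte => replacement, copied from the MS/FIX table above.
    $map = array(
        "\x82" => ",",
        "\x83" => "<em>f</em>",
        "\x84" => ",,",
        "\x85" => "...",
        "\x88" => "^",
        "\x89" => " \xB0/\xB0\xB0",   // leading space, degree signs as 0xB0
        "\x8B" => "<",
        "\x8C" => "Oe",
        "\x91" => "`",
        "\x92" => "'",
        "\x93" => "\"",
        "\x94" => "\"",
        "\x95" => "*",
        "\x96" => "-",
        "\x97" => "--",
        "\x98" => "<sup>~</sup>",
        "\x99" => "<sup>TM</sup>",
        "\x9B" => ">",
        "\x9C" => "oe",
    );

    // strtr() with an array replaces every key in a single pass and
    // never re-scans text it has already substituted.
    return strtr($text, $map);
}

// prints: "smart quotes" - it's done...
echo str_demoronise("\x93smart quotes\x94 \x96 it\x92s done\x85"), "\n";
```

strtr() is used rather than chained str_replace() calls so that the
output of one substitution can never be picked up by a later one.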


-- 
Edit bug report at http://bugs.php.net/?id=28646&edit=1
