Hi,

Thursday, January 15, 2004, 10:41:57 AM, you wrote:
TR> Hi,

TR> Thursday, January 15, 2004, 3:07:02 AM, you wrote:
RS>> Hello,

RS>> This question may border on OT...

RS>> I have a web form where visitors must enter large amounts of text at one
RS>> time (text area).  Once submitted, the large amount of text is stored as
RS>> a CLOB in an Oracle database.

RS>> Some of my visitors create their text in MS-Word and then cut and paste
RS>> it into the text area and then submit the form.

RS>> When I retrieve it from the database, I do a stripslashes, htmlentities
RS>> and nl2br in that order to preserve the format of the submitted text.
RS>> When I view this text, single or double quotes show up as little white
RS>> square blocks.  I've tested this out with MS-Word on a windows machine
RS>> and a mac machine.  Same thing happens with either OS.  This only
RS>> happens when they cut and paste from MS-Word into the text area.  If
RS>> they type text into the text area directly, everything is fine...

RS>> I know I can search through their submitted text and swap out the
RS>> unrecognized character and insert the proper one.  I just don't know
RS>> what to look for as being the unrecognized character.

RS>> I've googled all over looking at ASCII charts and keyboard maps.
RS>> Nothing mentions MS-Word specific information though.

RS>> Anyone out there dealt with this before?

RS>> Thanks,
RS>> R


TR> The quotes are actually a sequence of three bytes with values like

TR> 226 128 156
TR> 226 128 157

TR> for the 2 quotes

TR> here is a bit of code to fix them and a few others, I would be
TR> interested if anyone knew the complete set of these weirdos :)

TR> $crap = array(
TR>     chr(226).chr(128).chr(147),
TR>     chr(226).chr(128).chr(156),
TR>     chr(226).chr(128).chr(157),
TR>     chr(226).chr(128).chr(153)
TR> );
TR> $clean = array('-','"','"',"'");
TR> $content = str_replace($crap,$clean,$text);

TR> -- 
TR> regards,
TR> Tom

Sorry, I was probably misleading you.
It seems Scintilla is what creates the three-byte sequences for me when I
paste from MS Word. Here is a function to clean the Word characters to entities:

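function clean_ms_word($text){
        // Windows-1252 bytes 0x82-0x9f that MS Word uses for quotes, dashes, etc.
        // str_replace() needs strings, so each byte is written as a "\x.." escape.
        // This only suits single-byte (Windows-1252) input: in UTF-8 text the same
        // byte values occur inside multi-byte sequences.
        $crap = array(
                "\x82","\x83","\x84","\x85","\x86","\x87","\x88","\x89",
                "\x8a","\x8b","\x8c","\x91","\x92","\x93","\x94","\x95",
                "\x96","\x97","\x98","\x99","\x9a","\x9b","\x9c","\x9f"
        );
        // Matching HTML entities, in the same order; 0x88 is simply removed.
        $clean = array(
                '&sbquo;','&fnof;','&bdquo;','&hellip;','&dagger;','&Dagger;','','&permil;',
                '&Scaron;','&lsaquo;','&OElig;','&lsquo;','&rsquo;','&ldquo;','&rdquo;','&bull;',
                '&ndash;','&mdash;','&tilde;','&trade;','&scaron;','&rsaquo;','&oelig;','&Yuml;'
        );
        $content = str_replace($crap,$clean,$text);
        return $content;
}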
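
Since the same quotes can apparently arrive either as three-byte sequences
or as single bytes, here is a rough sketch that tries to cope with both. It
is only an illustration under some assumptions: the function name
word_text_to_entities() is made up, it needs the mbstring extension and
PHP 7 or later, and it guesses that anything which is not valid UTF-8 is
Windows-1252. That guess can be wrong for other single-byte encodings.

```php
<?php
// Hypothetical helper (not from this thread): turn Word "smart" characters
// into HTML entities whether the input is UTF-8 or Windows-1252.
function word_text_to_entities(string $text): string
{
    // Assumption: input that is not valid UTF-8 is Windows-1252.
    if (!mb_check_encoding($text, 'UTF-8')) {
        $text = mb_convert_encoding($text, 'UTF-8', 'Windows-1252');
    }
    // With the charset set to UTF-8, htmlentities() emits named entities
    // such as &ldquo; and &rdquo; for the curly quotes.
    return htmlentities($text, ENT_QUOTES, 'UTF-8');
}

// The three-byte sequences 226 128 156 and 226 128 157 are the UTF-8
// encodings of the left and right curly double quotes.
$utf8 = chr(226) . chr(128) . chr(156) . 'hi' . chr(226) . chr(128) . chr(157);

// The same text as single Windows-1252 bytes (0x93 and 0x94).
$cp1252 = "\x93hi\x94";

echo word_text_to_entities($utf8), "\n";   // prints &ldquo;hi&rdquo;
echo word_text_to_entities($cp1252), "\n"; // prints &ldquo;hi&rdquo;
```

If you go this way, the result is already entity-encoded, so you would
call nl2br on it directly and skip the separate htmlentities step.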

-- 
regards,
Tom

-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
