Hi,

Thursday, January 15, 2004, 10:41:57 AM, you wrote:

TR> Hi,

TR> Thursday, January 15, 2004, 3:07:02 AM, you wrote:

RS>> Hello,

RS>> This question may border on OT...

RS>> I have a web form where visitors must enter large amounts of text at one
RS>> time (text area). Once submitted, the large amount of text is stored as
RS>> a CLOB in an Oracle database.

RS>> Some of my visitors create their text in MS-Word and then cut and paste
RS>> it into the text area and then submit the form.

RS>> When I retrieve it from the database, I do a stripslashes, htmlentities
RS>> and nl2br in that order to preserve the format of the submitted text.

RS>> When I view this text, single or double quotes show up as little white
RS>> square blocks. I've tested this out with MS-Word on a Windows machine
RS>> and a Mac machine. Same thing happens with either OS. This only
RS>> happens when they cut and paste from MS-Word into the text area. If
RS>> they type text into the text area directly, everything is fine...

RS>> I know I can search through their submitted text and swap out the
RS>> unrecognized character and insert the proper one. I just don't know
RS>> what to look for as being the unrecognized character.

RS>> I've googled all over looking at ASCII charts and keyboard maps.
RS>> Nothing mentions MS-Word specific information though.

RS>> Anyone out there dealt with this before?

RS>> Thanks,
RS>> R

TR> The quotes are actually a sequence of three bytes with values like
TR> 226 128 156
TR> 226 128 157
TR> for the 2 quotes

TR> here is a bit of code to fix them and a few others, I would be
TR> interested if anyone knew the complete set of these weirdos :)

TR> $crap =
TR> array(chr(226).chr(128).chr(147),chr(226).chr(128).chr(156),chr(226).chr(128).chr(157),chr(226).chr(128).chr(153));
TR> $clean = array('-','"','"',"'");
TR> $content = str_replace($crap,$clean,$text);

TR> --
TR> regards,
TR> Tom

I am probably misleading you ... sorry. It seems Scintilla is the one creating
the 3-byte sequence for me from an MS Word paste.
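
For what it is worth, the three-byte sequences quoted above match the UTF-8
encodings of Unicode punctuation: 226 128 156 is E2 80 9C, i.e. U+201C (left
double quote), and 226 128 157 is U+201D. If the text reaches PHP as UTF-8,
something along these lines may cover the common cases. This is only a sketch:
the function name utf8_smart_to_ascii() is made up for illustration, and the
list of characters is an assumption about what typically gets pasted, not a
complete set.

```php
<?php
// Sketch only: map the UTF-8 byte sequences of common "smart" punctuation
// to plain ASCII. utf8_smart_to_ascii() is a hypothetical helper name.
function utf8_smart_to_ascii($text)
{
    $map = array(
        "\xE2\x80\x93" => '-',   // U+2013 en dash            (226 128 147)
        "\xE2\x80\x94" => '-',   // U+2014 em dash            (226 128 148)
        "\xE2\x80\x98" => "'",   // U+2018 left single quote  (226 128 152)
        "\xE2\x80\x99" => "'",   // U+2019 right single quote (226 128 153)
        "\xE2\x80\x9C" => '"',   // U+201C left double quote  (226 128 156)
        "\xE2\x80\x9D" => '"',   // U+201D right double quote (226 128 157)
        "\xE2\x80\xA6" => '...', // U+2026 ellipsis           (226 128 166)
    );
    // strtr() with an array tries the longest keys first and never
    // rescans text it has already replaced.
    return strtr($text, $map);
}
```

Calling it with the two quote sequences wrapped around Hi should give back Hi
in plain double quotes. Whether the bytes arrive as UTF-8 or as single
Windows-1252 bytes depends on the charset of the page holding the form, so
check which one you actually receive before picking a table.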
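
Here is a function to clean it to entities:

function clean_ms_word($text){
    // Windows-1252 byte values that MS Word pastes in. chr() turns each
    // number into the one-byte string that str_replace() has to search for;
    // a bare integer would be matched as its decimal digits instead.
    $crap = array_map('chr', array(
        0x82,0x83,0x84,0x85,0x86,0x87,0x88,0x89,
        0x8a,0x8b,0x8c,0x91,0x92,0x93,0x94,0x95,
        0x96,0x97,0x98,0x99,0x9a,0x9b,0x9c,0x9f
    ));
    // Replacement entities, in the same order as $crap.
    // 0x88 maps to an empty string, so that byte is simply removed.
    $clean = array(
        '&sbquo;','&fnof;','&bdquo;','&hellip;','&dagger;','&Dagger;','','&permil;',
        '&Scaron;','&lsaquo;','&OElig;','&lsquo;','&rsquo;','&ldquo;','&rdquo;','&bull;',
        '&ndash;','&mdash;','&tilde;','&trade;','&scaron;','&rsaquo;','&oelig;','&Yuml;'
    );
    $content = str_replace($crap,$clean,$text);
    return $content;
}

--
regards,
Tom

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php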