Does anybody really understand character encodings?

2006-09-15 Thread Block, Jon
I have some users who enter data into my web application in one of three ways:
- copy/paste from Microsoft Word
- XML export from InDesign (UTF-16)
- XML export from Quark
In all 3 of the cases I've described above, the originating software is putting through characters that do not display

Re: Does anybody really understand character encodings?

2006-09-15 Thread Brent Nicholas
I'm no expert here, but I'd try to use a regular expression that leaves all the 'known good characters' and removes the unknown. Though you'd really have to look into what is 'known good' if you think it may be more than A-Z, a-z, 0-9, and punctuation. 2 cents.
BN
> I have some users who enter
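A minimal sketch of the whitelist idea above, written in Python rather than CFML. The choice of "known good" characters (printable ASCII plus tab and line breaks) and the name `strip_unknown` are assumptions made for illustration, not something the thread specifies.

```python
import re

# Assumed whitelist: printable ASCII (space through tilde) plus tab, LF and CR.
# Anything NOT in this set is treated as "unknown" and removed.
_NOT_ALLOWED = re.compile(r"[^\x20-\x7E\t\n\r]")


def strip_unknown(text: str) -> str:
    """Return text with every character outside the whitelist removed."""
    return _NOT_ALLOWED.sub("", text)


if __name__ == "__main__":
    # The curly quotes and the accented letter are dropped, not converted.
    print(strip_unknown("caf\u00e9 \u201cquoted\u201d"))
```

Note that this approach silently deletes accented letters and typographic quotes; it does not substitute a plain equivalent.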

RE: Does anybody really understand character encodings?

2006-09-15 Thread Brad Wood
Sent: Friday, September 15, 2006 10:25 AM
To: CF-Talk
Subject: Re: Does anybody really understand character encodings?
> I'm no expert here, but I'd try to use a regular expression that leaves all the 'known good characters' and removes the unknown. Though you'd really have to look into what

RE: Does anybody really understand character encodings?

2006-09-15 Thread Block, Jon
15, 2006 11:49 AM
To: CF-Talk
Subject: RE: Does anybody really understand character encodings?
> Well the trick is he would not only want to remove bad characters but replace them with the correct ASCII equivalent. For instance an MS Word smart left quote would become a regular double quote, or a MS

Re: Does anybody really understand character encodings?

2006-09-15 Thread Paul Hastings
Block, Jon wrote:
> - copy/paste from microsoft word
probably windows-1252, a superset of latin-1, which often confuses people.
> - XML export from InDesign UTF-16
to simplify things it would be best to try to get utf-8 out of this thing, or at least find out which endianness it is.
> - XML export from Quark
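The two points above can be illustrated with a short Python sketch (not CFML). It shows why bytes from Word read differently as windows-1252 and as latin-1, and how a byte order mark (BOM) reveals the endianness of a UTF-16 export. The function names are hypothetical, and treating input without a BOM as an error is an assumption of this sketch, not a rule from the thread.

```python
import codecs


def utf16_endianness(data: bytes) -> str:
    """Report the byte order announced by a UTF-16 BOM, if there is one."""
    if data.startswith(codecs.BOM_UTF16_LE):  # bytes FF FE
        return "little"
    if data.startswith(codecs.BOM_UTF16_BE):  # bytes FE FF
        return "big"
    return "unknown"


def utf16_to_utf8(data: bytes) -> bytes:
    """Re-encode BOM-prefixed UTF-16 bytes as UTF-8 (without a BOM)."""
    if utf16_endianness(data) == "unknown":
        # Assumption: without a BOM the byte order is ambiguous, so refuse.
        raise ValueError("no UTF-16 byte order mark found")
    # The "utf-16" codec reads the BOM, picks the byte order, and drops the BOM.
    return data.decode("utf-16").encode("utf-8")


# Bytes 0x93 and 0x94 are curly double quotes in windows-1252,
# but invisible C1 control characters in latin-1 (ISO-8859-1).
word_bytes = b"\x93hi\x94"
as_cp1252 = word_bytes.decode("cp1252")
as_latin1 = word_bytes.decode("latin-1")
```

Decoding with the wrong one of these two single-byte encodings never raises an error for these bytes, which is one reason the mix-up tends to go unnoticed until the text is displayed.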

Re: Does anybody really understand character encodings?

2006-09-15 Thread Jon Gunnip
We only support ASCII in our database, so we handle the paste-from-Word issue by identifying the most common non-ASCII Word characters and replacing them with ASCII equivalents:
Local.Text = Replace(Local.Text, chr(8211), "-", "all"); /* short dash from MS Word */
Local.Text = Replace(Local.Text, chr(8212), "--", "all"); /*

Re: Does anybody really understand character encodings?

2006-09-15 Thread Sixten Otto
Local.Text = Replace(Local.Text, chr(8211), "-", "all"); /* short dash from MS Word */
Local.Text = Replace(Local.Text, chr(8212), "--", "all"); /* long dash from MS Word */
Local.Text = Replace(Local.Text, chr(8216), "'", "all"); /* left single quote from MS Word */
Local.Text = Replace(Local.Text, chr(8217), "'", "all"); /* right single quote from MS Word */
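The same substitutions can be expressed as a single lookup table. This is a sketch in Python rather than CFML. The first four code points come from the replacements quoted above; 8220 and 8221 (left and right double quotes) are additions made here for illustration, and the names `WORD_TO_ASCII` and `asciify_word_punctuation` are hypothetical.

```python
# Map Unicode code points to their ASCII stand-ins.
WORD_TO_ASCII = {
    8211: "-",   # en dash ("short dash")
    8212: "--",  # em dash ("long dash")
    8216: "'",   # left single quote
    8217: "'",   # right single quote
    8220: '"',   # left double quote (added in this sketch)
    8221: '"',   # right double quote (added in this sketch)
}


def asciify_word_punctuation(text: str) -> str:
    """Replace the mapped Word punctuation in one pass; leave the rest as-is."""
    return text.translate(WORD_TO_ASCII)


if __name__ == "__main__":
    print(asciify_word_punctuation("\u201cit\u2019s\u201d \u2013 ok \u2014 done"))
```

A table keeps the list of handled characters in one place. Characters that are not in the table pass through unchanged, so the result is not guaranteed to be pure ASCII.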

Re: Does anybody really understand character encodings?

2006-09-15 Thread Jochem van Dieten
Brad Wood wrote:
> Well the trick is he would not only want to remove bad characters but replace them with the correct ASCII equivalent.
So what to do with characters that are not present in ASCII?
Jochem
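Two possible answers to that question, sketched in Python rather than CFML. Neither is proposed in the thread; both are assumptions made for illustration, and the function names are hypothetical. The first throws information away, the second keeps it as numeric character references that HTML and XML can display.

```python
import unicodedata


def to_ascii_lossy(text: str) -> str:
    """Strip accents via NFKD decomposition, then drop whatever is still non-ASCII."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


def to_ascii_with_references(text: str) -> str:
    """Keep every character by writing non-ASCII ones as &#NNN; references."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


if __name__ == "__main__":
    sample = "caf\u00e9 \u65e5\u672c"
    print(to_ascii_lossy(sample))            # accents removed, CJK dropped
    print(to_ascii_with_references(sample))  # everything preserved as references
```

The lossy version only works for characters that decompose into an ASCII base letter plus combining marks. Scripts with no ASCII counterpart simply vanish, which is the problem the question points at.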

Re: Does anybody really understand character encodings?

2006-09-15 Thread Jochem van Dieten
Block, Jon wrote:
> I have some users who enter data into my web application in one of three ways:
> - copy/paste from Microsoft Word
> - XML export from InDesign (UTF-16)
> - XML export from Quark
> How can I accept text from each of the above-mentioned sources, perhaps others, and somehow

Re: Does anybody really understand character encodings?

2006-09-15 Thread Denny Valliant
On 9/15/06, Paul Hastings wrote:
>> - XML export from Quark
> from what i remember, these yahoos refused to support unicode. i guess they might have changed since then (4-5 years ago).
I would like to take a second to bitch about how Quark has claimed to have XML capabilities since

RE: Does anybody really understand character encodings?

2006-09-15 Thread Dan G. Switzer, II
Jon,
> I have some users who enter data into my web application in one of three ways:
> - copy/paste from Microsoft Word
> - XML export from InDesign (UTF-16)
> - XML export from Quark
> In all 3 of the cases I've described above, the originating software is putting through characters that do not display