I have some users who enter data into my web application through one of
three ways:
- copy/paste from Microsoft Word
- XML export from InDesign (UTF-16)
- XML export from Quark
In all 3 of the cases I've described above, the origin software is
putting through characters that do not display
I'm no expert here, but I'd try to use a regular expression that leaves all the
'known good characters' and removes the unknown. Though you'd really have to
look into what is 'known good' if you think it may be more than A-Z, a-z, 0-9,
and punctuation.
2 cents.
BN
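A minimal sketch of the whitelist idea, in Python rather than CFML. It assumes "known good" means printable ASCII plus tab, carriage return, and newline; the name `strip_unknown` and the exact character class are illustrative choices, not anything from the thread.

```python
import re

# Assumption: "known good" = printable ASCII (0x20-0x7E) plus tab, CR, LF.
# Widen the character class if more characters must survive.
NOT_KNOWN_GOOD = re.compile(r"[^\x20-\x7E\t\r\n]")


def strip_unknown(text: str) -> str:
    """Remove every character that falls outside the known-good set."""
    return NOT_KNOWN_GOOD.sub("", text)


# The accented letter and both curly quotes are dropped outright.
print(strip_unknown("caf\u00e9 \u201cquoted\u201d text"))  # caf quoted text
```

Note that this silently deletes characters rather than converting them, so curly quotes and accented letters simply disappear.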
I have some users who enter
Sent: Friday, September 15, 2006 10:25 AM
To: CF-Talk
Subject: Re: Does anybody really understand character encodings?
I'm no expert here, but I'd try to use a regular expression that leaves
all the 'known good characters' and removes the unknown. Though you'd
really have to look into what
15, 2006 11:49 AM
To: CF-Talk
Subject: RE: Does anybody really understand character encodings?
Well, the trick is that he would want not only to remove bad characters but
to replace them with the correct ASCII equivalents. For instance, an MS Word
smart left quote would become a regular double quote, or a MS
Block, Jon wrote:
- copy/paste from Microsoft Word
Probably windows-1252, a superset of Latin-1, which often confuses people.
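To illustrate why the two encodings get confused, here is a small Python sketch. The sample bytes are hypothetical; they stand for a curly-quoted word followed by an en dash as windows-1252 would encode them.

```python
# Hypothetical sample: curly-quoted word and an en dash, as windows-1252 bytes.
raw = b"\x93smart\x94 \x96 dash"

as_cp1252 = raw.decode("cp1252")    # 0x93/0x94/0x96 become real punctuation
as_latin1 = raw.decode("latin-1")   # the same bytes become C1 control codes

print([hex(ord(c)) for c in as_cp1252 if ord(c) > 0x7F])
# ['0x201c', '0x201d', '0x2013']
print([hex(ord(c)) for c in as_latin1 if ord(c) > 0x7F])
# ['0x93', '0x94', '0x96']
```

Decoding as Latin-1 never raises an error, which is part of the confusion: the text looks fine until one of those invisible control characters has to be displayed.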
- XML export from InDesign (UTF-16)
To simplify things, it would be best to try to get UTF-8 out of this
thing, or at least to find out which endianness it uses.
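If the export cannot be switched to UTF-8, the byte order can usually be read from the byte order mark at the start of the file. A rough Python sketch follows; the function name `utf16_to_utf8` and the little-endian fallback are assumptions for illustration only.

```python
import codecs


def utf16_to_utf8(data: bytes, default: str = "utf-16-le") -> bytes:
    """Transcode UTF-16 bytes to UTF-8, using the byte order mark if present."""
    if data.startswith(codecs.BOM_UTF16_LE):
        text = data[2:].decode("utf-16-le")
    elif data.startswith(codecs.BOM_UTF16_BE):
        text = data[2:].decode("utf-16-be")
    else:
        # No BOM: the byte order is only a guess (assumption: `default`).
        text = data.decode(default)
    return text.encode("utf-8")


sample = codecs.BOM_UTF16_BE + "<p>\u2014</p>".encode("utf-16-be")
print(utf16_to_utf8(sample))  # b'<p>\xe2\x80\x94</p>'
```

For real XML files, any `encoding="UTF-16"` attribute in the XML declaration would also need to be updated after transcoding; this sketch does not do that.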
- XML export from Quark
We only support ASCII in our database, so we handle the paste-from-Word
issue by identifying the most common non-ASCII Word characters and
replacing them with ASCII equivalents:
Local.Text = Replace(Local.Text, chr(8211), "-", "all");  /* short dash from MS Word */
Local.Text = Replace(Local.Text, chr(8212), "--", "all"); /* long dash from MS Word */
Local.Text = Replace(Local.Text, chr(8216), "'", "all");  /* left single quote from MS Word */
Local.Text = Replace(Local.Text, chr(8217), "'", "all");  /* right single quote from MS Word */
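The same replacements can be kept in a single lookup table, sketched here in Python. The first four entries mirror the replacements listed above; the two double-quote entries and the names `WORD_TO_ASCII` and `word_to_ascii` are additions made for illustration.

```python
# Word punctuation mapped to ASCII stand-ins, keyed by code point.
WORD_TO_ASCII = {
    0x2013: "-",   # short (en) dash, chr(8211)
    0x2014: "--",  # long (em) dash, chr(8212)
    0x2018: "'",   # left single quote, chr(8216)
    0x2019: "'",   # right single quote, chr(8217)
    0x201C: '"',   # left double quote, chr(8220) (assumed addition)
    0x201D: '"',   # right double quote, chr(8221) (assumed addition)
}


def word_to_ascii(text: str) -> str:
    """Replace the mapped Word characters; everything else passes through."""
    return text.translate(WORD_TO_ASCII)


print(word_to_ascii("\u201cIt\u2019s 9\u20135\u201d \u2014 ok"))  # "It's 9-5" -- ok
```

Characters that are not in the table are left untouched, so a separate decision is still needed for anything without an obvious ASCII stand-in.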
Brad Wood wrote:
Well, the trick is that he would want not only to remove bad characters but
to replace them with the correct ASCII equivalents.
So what do you do with characters that are not present in ASCII?
Jochem
Block, Jon wrote:
I have some users who enter data into my web application through one of
three ways:
- copy/paste from Microsoft Word
- XML export from InDesign (UTF-16)
- XML export from Quark
How can I accept text from each of the above-mentioned sources, perhaps
others, and somehow
On 9/15/06, Paul Hastings [EMAIL PROTECTED] wrote:
- XML export from Quark
From what I remember, these yahoos refused to support Unicode. I guess they
might have changed since then (4-5 years ago).
I would like to take a second to bitch about how Quark has claimed to have
XML capabilities since
Jon,
I have some users who enter data into my web application through one of
three ways:
- copy/paste from Microsoft Word
- XML export from InDesign (UTF-16)
- XML export from Quark
In all 3 of the cases I've described above, the origin software is
putting through characters that do not display