That does help, a lot... I was kind of coming to that conclusion after doing my homework with the Rev docs, reading all the Unicode entries and testing a number of the built-in Rev Unicode functions. But it still didn't get me from A to B.

Here's the challenge.

If the clipboard from InDesign contains a two-byte character, and when I paste it into Rev (or BBEdit for that matter) it appears in Osaka as a Japanese character, I think we know we have a two-byte character. Why it looks one way in InDesign and another way in Rev... I don't know.

How do I "downsize" that two-byte character to a suitable equivalent string in the 0-127 range? (In this context, which is lang:English alpha:Roman, I want ALL text to be super dumb and pass painlessly through any and all future user agents in any hardware/software context.)

E.g. our editors use some odd glyph in InDesign, our web guy repurposes the text for the web, pastes it into my little web pager Rev app, and sees weird characters... In theory, if I knew what the two values were, I would do what I usually do and clean the text first, in the background:

put numToChar(26) into tStringToReplace
replace tStringToReplace with quote in tIncomingText

so he never sees anything but 1-127 from the start.

So the challenge is: find any way to, programmatically, identify
a) that an incoming character *is* two-byte, and
b) if it is, what it is, so it can be replaced with a low-ASCII equivalent.
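For part (a), here is a minimal sketch of the scan, written in Python rather than Transcript so it can be run and checked on its own. It assumes the pasted text has already been decoded into characters; the function name find_non_ascii and the sample string are made up for illustration.

```python
def find_non_ascii(text):
    """Return (index, character, code point) for every character above 127."""
    return [(i, ch, ord(ch)) for i, ch in enumerate(text) if ord(ch) > 127]


# Hypothetical sample: two accented letters hidden in otherwise plain text.
sample = "na\u00efve caf\u00e9"
hits = find_non_ascii(sample)
for index, ch, code in hits:
    print(index, repr(ch), code)
```

Anything the scan reports is a candidate for the replacement table; plain 0-127 text comes back as an empty list, so clean input costs only one pass.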


If it could be translated, would it look like char(204,218) or what? Then, do you concatenate the two?

put numToChar(204) & numToChar(218) into tStringToReplace
replace tStringToReplace with "Y" in tIncomingText
## where this could be some two-byte character "Y" with marks above it of some kind
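The same idea at the byte level, sketched in Python rather than Transcript so it can be tested: join the two byte values into one search pattern and replace every occurrence. The values 204 and 218 are just the placeholders from above, not a known real character, and replace_byte_pair is a hypothetical helper name.

```python
def replace_byte_pair(data, first, second, replacement):
    """Replace each occurrence of the two-byte sequence with an ASCII string."""
    pair = bytes([first, second])  # the two byte values, concatenated
    return data.replace(pair, replacement.encode("ascii"))


# Hypothetical incoming bytes with the pair 204, 218 embedded in plain text.
incoming = b"abc" + bytes([204, 218]) + b"def"
cleaned = replace_byte_pair(incoming, 204, 218, "Y")
print(cleaned)
```

One caveat with this approach: a raw byte search cannot tell a genuine two-byte character from two unrelated bytes that happen to sit next to each other, so it is only safe once the encoding of the incoming text is known.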


I know that if I actually paste some weird string into the script editor, assuming I know for sure what its equivalent is... this does work:

replace "[paste 2-byte char here]" with "sh" in tIncomingText

But I won't always know what the incoming weird character is... Also, since examining every single incoming char might slow operations down considerably, I might just let the user fix these manually. So at the least I need the user to be able to select the two-byte character in a Rev field, and then have a script that examines the selected chunk and does the necessary replacement. This could work for small articles in our magazine, but I'm about to embark on repurposing 1000-page books from InDesign to the web, so I'd like to get a better handle on this from inside Rev.

I already have a matrix for HTML entities that looks like this:

�       A
�       A
�       Ch
�       E
�       N

etc. (with every possible >127 character in the fonts in use)

So, if I could identify the two-byte characters I would just extend this...
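Extending the matrix could look something like this sketch, again in Python rather than Transcript so it can be tested. The table entries are hypothetical stand-ins (the real keys would come from the matrix above), and to_low_ascii and the "?" fallback are assumptions, not anything built into Rev.

```python
# Hypothetical entries; the real table would hold every >127 character in use.
REPLACEMENTS = {
    "\u00c0": "A",   # A with grave accent
    "\u00c7": "Ch",  # C with cedilla
    "\u00d1": "N",   # N with tilde
}


def to_low_ascii(text, table, fallback="?"):
    """Pass 0-127 characters through; map the rest via the table or fallback."""
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)
        else:
            out.append(table.get(ch, fallback))
    return "".join(out)


print(to_low_ascii("\u00d1andi and \u00c7andra", REPLACEMENTS))
```

A character missing from the table comes out as the fallback, which makes the gaps easy to spot and add to the matrix later.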

Sannyasin Sivakatirswami
Himalayan Academy Publications
at Kauai's Hindu Monastery
[EMAIL PROTECTED]

www.HimalayanAcademy.com,
www.HinduismToday.com
www.Gurudeva.org
www.Hindu.org

On Apr 21, 2002, at 1:18 PM, Brian Yennie wrote:

Sannyasin,

I don't know if this is something you already have a handle on, but the first thing to know about Unicode is that each character is _two_ bytes instead of one, so some of this weird pasting behavior happens because the receiving application treats the two bytes as two consecutive characters.

The most likely reason you think you are getting a valid ASCII number but not seeing a valid ASCII character is that you are actually testing the charToNum() of a two-character string, and charToNum() only considers the first character.

For example, charToNum("apple") is the same as charToNum("a"), even though they are obviously different strings to the human eye.

HTH!

_______________________________________________ use-revolution mailing list [EMAIL PROTECTED] http://lists.runrev.com/mailman/listinfo/use-revolution
