Re: unidecode broken on Intel Macs?

Mark Schonewille Wed, 01 Aug 2007 07:36:56 -0700

Hi Klaus,

You are right, but only with regard to languages that can be expressed in single-byte characters. If you are working with double-byte languages, you can't simply unidecode into single-byte characters. Apparently, Rev 1.0 couldn't handle Chinese and Arabic (but I never tried that with version 1.0).
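
To illustrate the single-byte point, here is a small sketch in Python (not Transcript, and not how Revolution implements unidecode). It assumes the text is UTF-16 and checks whether every 2-byte pair has a zero high byte, which is the only case where dropping one byte per pair loses nothing. The function name is made up for this example.

```python
def fits_in_single_bytes(text: str) -> bool:
    """Return True if every UTF-16 code unit of `text` has a zero high byte."""
    data = text.encode("utf-16-le")   # low byte first, high byte second
    high_bytes = data[1::2]           # every second byte is a high byte
    return all(b == 0 for b in high_bytes)


print(fits_in_single_bytes("Klaus"))  # True
print(fits_in_single_bytes("中文"))    # False
```

Latin text passes the check, while Chinese text does not, so cutting a byte from each pair would destroy the Chinese characters.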

I am not sure that the addition of the capability to handle double-byte characters automatically implies that unidecode no longer cuts off the second byte of each pair regardless of platform. I did have a Mac file recently, which I had to convert to little endian before unidecoding it on Windows.


Best regards,

Mark Schonewille

--

Economy-x-Talk Consulting and Software Engineering
http://economy-x-talk.com
http://www.salery.com

Quickly extract data from your HyperCard stacks with DIFfersifier. http://differsifier.economy-x-talk.com



On 1 Aug 2007, at 16:20, Klaus Major wrote:

Hi Mark,
A big-endian (Motorola) Unicode character will be in the form msb lsb, so if the character falls within the ASCII range, say "A", then it will be <numToChar(0) numToChar(65)>.
If it's in little-endian (Intel) format, the same char will be <numToChar(65) numToChar(0)>.
Unidecode simply removes the most significant byte of each Unicode char/pair, so on Intel that's the second byte, and on Motorola that's the first byte.
Yep, that's what I read in the docs.
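
The byte layouts described above can be checked with a short Python sketch. It is only a model of the behaviour described in this thread, not Revolution's actual implementation, and drop_byte is a hypothetical helper written for this example.

```python
def drop_byte(data: bytes, position: int) -> bytes:
    """Keep one byte of every 2-byte pair, dropping the byte at `position` (0 or 1)."""
    data = data[:len(data) - len(data) % 2]  # ignore a trailing odd byte
    keep = 1 - position
    return data[keep::2]


big = "A".encode("utf-16-be")      # most significant byte first
little = "A".encode("utf-16-le")   # least significant byte first

print(list(big))                   # [0, 65]
print(list(little))                # [65, 0]

# Dropping the most significant byte works on both layouts ...
print(drop_byte(big, 0))           # b'A'
print(drop_byte(little, 1))        # b'A'

# ... but always dropping the second byte ruins big-endian data.
print(drop_byte(big, 1))           # b'\x00'
```

The last line shows why data in the "wrong" byte order comes out as a string of null characters.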

But the docs also read:
"The ability to handle double-byte characters on "little-endian" processors was added in version 2.0. In previous versions, the uniDecode function always removed the second byte of each pair of bytes, regardless of platform."
This gives me the impression that the function itself will take care of the differences between the processors -> "...regardless of platform"!
Maybe I am wrong?
So the upshot is that if your data is big-endian (Motorola), then to work with unidecode on Intel, you'll need to swap each pair of bytes.
function swapBytes pString
  -- Swap the two bytes of every pair; a trailing odd byte is dropped.
  local swappedString
  repeat with n = 1 to length(pString) - 1 step 2
    put char (n + 1) of pString & char n of pString after swappedString
  end repeat
  return swappedString
end swapBytes
Thanks a lot, will try this (well maybe... ;-)
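
For readers without Revolution at hand, the same pair-swapping idea can be sketched in Python. This is an illustrative equivalent of the swapBytes function quoted above, not part of Revolution; it assumes the input is raw UTF-16 data without a byte order mark.

```python
def swap_bytes(data: bytes) -> bytes:
    """Swap the two bytes of every pair; a trailing odd byte is dropped."""
    swapped = bytearray()
    for n in range(0, len(data) - 1, 2):
        swapped += bytes((data[n + 1], data[n]))
    return bytes(swapped)


big_endian = "Klaus".encode("utf-16-be")
little_endian = swap_bytes(big_endian)

# After the swap the data decodes correctly as little-endian UTF-16.
print(little_endian.decode("utf-16-le"))   # Klaus
```

Swapping twice returns the original bytes, so the same function converts in either direction.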
I'm hoping that we'll get a complete revamp of Rev's Unicode handling one of these days, but we're stuck with this sort of thing for now. :(


_______________________________________________
use-revolution mailing list
[email protected]
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-revolution
