Phil Davis wrote:
Don't know if this will help, but Klaus posted a response to Ken Ray in "Re: Detecting UTF-8 Encoded Files" on 7 Aug. It contains helpful hints about detecting what Unicode file format you're dealing with - I don't know if the tips work universally, but maybe that's a starting place.

That was just what I needed. Well, mostly anyway. Thanks to Mark Waddingham, Klaus, and Mark Smith (for his swapBytes function), I've made some progress here.

The code posted below is as far as I've gotten. It displays every test file on my drive almost perfectly, including UTF8 and UTF16 in both big- and little-endian.

Two challenges remain:

First, while the glyphs look right, the line spacing is way off. Looking at the same files in TextEdit shows a lot of blank lines, but in the Rev field the lines are all bunched up together.

And second, I've found no way to get the formattedText in any form that looks usable. :(

Any tips on those would be much appreciated. Thanks again for the code examples that got me this far.

--
 Richard Gaskin
 Fourth World
 Revolution training and consulting: http://www.fourthworld.com
 Webzine for Rev developers: http://www.revjournal.com

----------------------------------------------------------

on mouseUp
  answer file "Select a file:"
  if it is empty then exit to top
  put url ("binfile:"&it) into tData
  set the unicodeText of fld 1 to RawDataToUTF16(tData)
end mouseUp


function RawDataToUTF16 pData
  -- Examine the data to determine encoding:
  switch
  case charToNum(byte 1 of pData) = 0
    put "UTF16BE" into tTextEncoding
    break
  case charToNum(byte 1 of pData) = 0xFE and charToNum(byte 2 of pData) = 0xFF
    delete byte 1 to 2 of pData
    put "UTF16BE" into tTextEncoding
    break
  case charToNum(byte 1 of pData) = 0xFF and charToNum(byte 2 of pData) = 0xFE
    delete byte 1 to 2 of pData
    put "UTF16LE" into tTextEncoding
    break
  default
    put "UTF8" into tTextEncoding
    break
  end switch
  --
  if tTextEncoding begins with "UTF16" then
    -- Check byte order, swapping if needed:
    if the processor is "x86" then
      put "LE" into tHostByteOrder
    else
      put "BE" into tHostByteOrder
    end if
    if byte -2 to -1 of tTextEncoding <> tHostByteOrder then
      put swapBytes(pData) into pData
    end if
    -- Already utf16, so nothing more needs to be done:
    put pData into tFieldData
  else
    -- Convert from utf8 to Rev's native utf16:
    put uniEncode(pData, "UTF8") into tFieldData
  end if
  --
  return tFieldData
end RawDataToUTF16


function swapBytes pString
  -- Swap each pair of bytes to flip UTF-16 byte order.
  -- A trailing odd byte, if any, is dropped.
  repeat with n = 1 to length(pString) - 1 step 2
    put byte n+1 of pString & byte n of pString after swappedString
  end repeat
  return swappedString
end swapBytes