Re: formattedText and Unicode

Richard Gaskin Sun, 09 Aug 2009 19:08:47 -0700

Phil Davis wrote:

Don't know if this will help, but Klaus posted a response to Ken Ray in "Re: Detecting UTF-8 Encoded Files" on 7 Aug. It contains helpful hints about detecting what Unicode file format you're dealing with - I don't know if the tips work universally, but maybe that's a starting place.

That was just what I needed. Well, mostly anyway. Thanks to Mark Waddingham, Klaus, and Mark Smith for his swapBytes function, now I have some progress here.

The code posted below is as far as I've gotten. It displays every test file on my drive almost perfectly, including UTF8 and UTF16 in both big- and little-endian.


Two challenges remain:

While the glyphs appear to be good, the line spacing is way off. Looking at the same files in TextEdit shows a lot of blank lines, but in the Rev field they're all bunched up together.

And second, I've found no way to get the formattedText in any form that looks usable. :(

Any tips on those would be much appreciated. Thanks again for the code examples that got me this far.


--
 Richard Gaskin
 Fourth World
 Revolution training and consulting: http://www.fourthworld.com
 Webzine for Rev developers: http://www.revjournal.com

----------------------------------------------------------

on mouseUp
  answer file "Select a file:"
  if it is empty then exit to top
  put url ("binfile:"&it) into tData
  set the unicodeText of fld 1 to RawDataToUTF16(tData)
end mouseUp


function RawDataToUTF16 pData
  -- Examine the data to determine encoding:
  switch
  case charToNum(byte 1 of pData) = 0
    -- No BOM, but a leading zero byte suggests big-endian UTF-16:
    put "UTF16BE" into tTextEncoding
    break

  case charToNum(byte 1 of pData) = 0xFE and charToNum(byte 2 of pData) = 0xFF
    -- Big-endian BOM (FE FF): strip it before display
    delete byte 1 to 2 of pData
    put "UTF16BE" into tTextEncoding
    break

  case charToNum(byte 1 of pData) = 0xFF and charToNum(byte 2 of pData) = 0xFE
    -- Little-endian BOM (FF FE): strip it before display
    delete byte 1 to 2 of pData
    put "UTF16LE" into tTextEncoding
    break
  default
    put "UTF8" into tTextEncoding
    break
  end switch
  --
  if tTextEncoding begins with "UTF16" then
    -- Check byte order, swapping if needed:
    if the processor is "x86" then
      put "LE" into tHostByteOrder
    else
      put "BE" into tHostByteOrder
    end if
    if byte -2 to -1 of tTextEncoding <> tHostByteOrder then
      put swapbytes(pData) into pData
    end if
    -- Already utf16, so nothing more needs to be done:
    put pData into tFieldData
  else
    -- Convert from utf8 to Rev's native utf16:
    put uniEncode(pData, "UTF8") into tFieldData
  end if
  --
  return tFieldData
end RawDataToUTF16


function swapBytes pString
  -- Exchange each pair of bytes to flip the UTF-16 byte order:
  repeat with n = 1 to length(pString) - 1 step 2
    put byte n+1 of pString & byte n of pString after swappedString
  end repeat
  return swappedString
end swapBytes
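
One case the switch in RawDataToUTF16 does not test for is a UTF-8 byte order mark (the bytes 0xEF 0xBB 0xBF), which some editors write at the start of UTF-8 files. A minimal sketch of stripping it before conversion follows. stripUTF8BOM is a hypothetical helper name, not part of the code above, and the sketch makes no assumption either way about whether uniEncode already discards the mark on its own.

```livecode
function stripUTF8BOM pData
  -- Hypothetical helper (not in the original post): remove a leading
  -- UTF-8 byte order mark (EF BB BF) if one is present.
  if length(pData) >= 3 \
      and charToNum(byte 1 of pData) = 0xEF \
      and charToNum(byte 2 of pData) = 0xBB \
      and charToNum(byte 3 of pData) = 0xBF then
    delete byte 1 to 3 of pData
  end if
  return pData
end stripUTF8BOM
```

If it turns out to be needed, it could be applied in the UTF8 branch, for example: put uniEncode(stripUTF8BOM(pData), "UTF8") into tFieldData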