Dar,

I got your code to work by making a few simple changes to the sortCodeFromRussianChar function:

function sortCodeFromRussianChar utf16Char
  set the useUnicode to true
  put charToNum(utf16Char) into unicodePoint

## Devin's changes - it turns out leaving the code points in decimal works perfectly,
##  and I only had to make a couple of adjustments.
  if unicodePoint > 1039 and unicodePoint < 1072 then -- uppercase (1040-1071): shift to lowercase to ignore case
    add 32 to unicodePoint
  else if unicodePoint = 1105 then -- sort 'yo' with 'ye'
    put 1077 into unicodePoint
  end if
##
  --   switch unicodePoint
  --   case 0x0020 -- space
  --     get 1
  --     break
  --   ...
  --   default
  --     get 255
  --   end switch
  return unicodePoint --it
end sortCodeFromRussianChar
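
To spell out what the two adjustments do, here is a rough sketch of the same mapping in Python rather than Revolution. It is only an illustration: the names sort_code_from_russian_char and russian_lex, and the sample word list, are made up for this example, and it skips the UTF-8/UTF-16 conversion steps entirely.

```python
def sort_code_from_russian_char(ch):
    """Return a sort code for one character, using the two adjustments above."""
    code_point = ord(ch)
    if 1039 < code_point < 1072:    # uppercase Cyrillic, 1040..1071
        code_point += 32            # shift to the lowercase letter, 1072..1103
    elif code_point == 1105:        # lowercase 'yo'
        code_point = 1077           # sort it together with 'ye'
    return code_point


def russian_lex(line):
    """Build a comparison key: one sort code per character of the line."""
    return [sort_code_from_russian_char(ch) for ch in line]


# Hypothetical sample data, chosen to exercise both adjustments.
words = ["ёлка", "Яблоко", "арбуз", "Борис", "ель"]
print(sorted(words, key=russian_lex))
# ['арбуз', 'Борис', 'ёлка', 'ель', 'Яблоко']
```

One thing the sketch makes visible: uppercase 'yo' is code point 1025, which neither test catches, so it would need its own case if it can appear in the data.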


On May 27, 2006, at 2:05 PM, Dar Scott wrote:

Try something roughly like this (not tested; typed in raw):

function sortRussian utf16RussianList
   -- use utf8 to get rid of NULs and extra line ends
   put uniDecode(utf16RussianList, "UTF8") into utf8RussianList
   sort lines of utf8RussianList text by russianLex(each)
   return utf8RussianList
end sortRussian

-- returns string suitable for lexical comparison (Rev sort text)
-- of a utf8 string made up of Russian subset of Cyrillic plus some ASCII
function russianLex utf8RussianLine
   -- Add adjustments for special words here
   put uniEncode(utf8RussianLine, "UTF8") into utf16RussianLine
   put empty into lex
   repeat with i = 1 to length(utf16RussianLine)-1 step 2 -- uniCode char loop
      put char i to i+1 of utf16RussianLine into utf16RussianChar
      -- Add char dropping tests here
      put sortCodeFromRussianChar( utf16RussianChar) into sortNumber
      put numTochar( sortNumber ) after lex -- use 1-byte chars for sorting
   end repeat
   return lex
end russianLex

-- returns number in range 1 to 255 indicating sort position of
-- allowed characters
function sortCodeFromRussianChar utf16Char
   set the useUnicode to true
   put charToNum(utf16Char) into unicodePoint
   switch unicodePoint
   case 0x0020 -- space
     get 1
     break
   ...
   default
     get 255
   end switch
   return it
end sortCodeFromRussianChar

This will take some debugging.

Only a little. ;-)

This is a huge help! Thanks a million.

Devin

Devin Asay
Humanities Technology and Research Support Center
Brigham Young University
