I was trying to convert UTF-8 content into a series of numeric character entities like &#x525B; (which displays as 剛), so that the characters show up correctly whatever the page encoding is.
So I used something like this:

    <%
      begin
        t = ''
        s = Iconv.conv("UTF-32", "UTF-8", some_utf8_string)
        s.scan(/(.)(.)(.)(.)/) do |b1, b2, b3, b4|
          t += ("&#x" + "%02X" % b3.ord) + ("%02X" % b4.ord) + ";"
        end
      rescue => details
        t = "exception " + details.message
      end
    %>
    <%= t %>

But some characters get converted and some don't. Is it true that (.)(.)(.)(.) will not necessarily match 4 bytes at a time?

At first I was going to use

    s = Iconv.conv("UTF-16", "UTF-8", some_utf8_string)

but then I found that UTF-16 is also variable-length, so I used UTF-32 instead, which is fixed-length. The UTF-8 string I have contains only characters from the Basic Multilingual Plane, so every code point should be in the 0x0000 to 0xFFFF range.
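For comparison, here is a minimal sketch of the same conversion that does not match raw bytes with a regex at all. It assumes the input really is valid UTF-8; the helper name `to_hex_entities` is hypothetical and not part of the code above. It relies on `String#unpack("U*")`, which decodes a UTF-8 string into an array of integer code points. This avoids two things that can make `(.)(.)(.)(.)` skip data: `.` does not match a newline byte (0x0A) unless the `/m` flag is given, and in a multibyte-aware regex mode (for example `$KCODE = 'u'` on Ruby 1.8) `.` matches a whole character rather than a single byte.

```ruby
# Hypothetical helper (not from the original post): turn every non-ASCII
# character of a UTF-8 string into a hexadecimal numeric entity.
def to_hex_entities(utf8_string)
  # unpack("U*") decodes the UTF-8 bytes into integer code points,
  # so no byte-level regex and no Iconv round trip is needed.
  utf8_string.unpack("U*").map do |cp|
    # Plain ASCII is passed through unchanged; everything else becomes &#xHHHH;
    cp < 0x80 ? cp.chr : format("&#x%04X;", cp)
  end.join
end

sample = "abc " + [0x525B].pack("U")  # "abc " followed by U+525B
puts to_hex_entities(sample)          # prints "abc &#x525B;"
```

Note that this sketch leaves ASCII characters such as `&` and `<` untouched, so HTML escaping would still have to be applied separately if the text is not already escaped.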