Re: Unicode and Chunks

Dar Scott Mon, 29 Sep 2003 09:54:21 -0700

Welcome, Dean!

On Monday, September 29, 2003, at 06:59 AM, Dean Snyder wrote:

I've been enjoying using Unicode in Revolution 2.1 for the most part. The only problem I've encountered so far is that chunk evaluation doesn't seem to work correctly with Unicode characters. For example, if you have set the itemDelimiter to tab (ASCII 09), any double-byte Unicode character that contains a 09 byte will increment the item count in chunk evaluation, even though the character is not a tab.

At this time, it seems that Revolution values are still byte sequences, and as long as you are using one-byte characters, those bytes are also the characters. Unicode text in Revolution is UTF-16: 16-bit code units, with surrogate pairs (two units) for characters beyond U+FFFF. Those double-byte units are flattened into a byte sequence in host byte order (ick). So, for now, chunk expressions are working with bytes, not characters.
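A minimal sketch of the symptom, in Python standing in for Revolution (`byte_item_count` is a hypothetical helper, not a Revolution function). It assumes the chunk code splits on the single byte 09, as described above, and uses ĉ (U+0109), whose UTF-16 encoding contains a 09 byte in either byte order.

```python
def byte_item_count(text: str, encoding: str) -> int:
    """Count tab-delimited items the byte-oriented way: encode the text,
    then split on the single byte 0x09 (hypothetical helper)."""
    return len(text.encode(encoding).split(b"\t"))


# "a", "ĉ" (U+0109), "b", TAB, "c": one real tab, so two items.
text = "a\u0109b\tc"

# U+0109 encodes as bytes 09 01 (little-endian) or 01 09 (big-endian),
# so a byte-level split finds a spurious "tab" inside it.
for encoding in ("utf-16-le", "utf-16-be"):
    print(encoding, byte_item_count(text, encoding))  # prints 3 for both

print("characters:", len(text.split("\t")))  # prints 2
```

Whichever byte order the host uses, the byte-level count comes out one too high.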

Here are some ideas:

1 Convert to UTF-8. Each character is one to four bytes (as of Unicode 4.0). This has the cool property that tab, comma, or even the Revolution line end can never show up inside a multibyte character: every byte of a multibyte UTF-8 sequence is 0x80 or above, while those delimiters are ASCII bytes below 0x80. This should work with split and combine, too.

2 Highly experimental: maybe there is an undocumented feature of useUnicode that will allow this to work. You might have to create a Unicode tab character, U+0009 (the two bytes 00 09 or 09 00, depending on host byte order).
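The two ideas can be sketched in Python (not Transcript), assuming the text is available as UTF-16 bytes in a known byte order. `split_utf8_on_tab` and `split_utf16_on_tab` are hypothetical helpers, not Revolution functions. The first follows idea 1 directly. The second only approximates idea 2: it matches the tab as a whole, aligned 16-bit unit U+0009, which is what a Unicode-aware chunk expression would have to do; whether useUnicode actually behaves this way is untested.

```python
def split_utf8_on_tab(text: str) -> list[str]:
    """Idea 1: encode as UTF-8 and split on the byte 0x09.
    Safe because every byte of a multibyte UTF-8 sequence is >= 0x80,
    so 0x09 can only be a real tab."""
    return [item.decode("utf-8") for item in text.encode("utf-8").split(b"\t")]


def split_utf16_on_tab(raw: bytes, byteorder: str) -> list[bytes]:
    """Idea 2 (approximation): treat the tab as the 16-bit unit U+0009 and
    compare only whole units at even offsets, so a 09 byte inside another
    character is never matched. byteorder is "little" or "big"."""
    if len(raw) % 2:
        raise ValueError("UTF-16 data must have an even number of bytes")
    items, start = [], 0
    for i in range(0, len(raw), 2):
        if int.from_bytes(raw[i:i + 2], byteorder) == 0x0009:
            items.append(raw[start:i])
            start = i + 2
    items.append(raw[start:])
    return items


text = "a\u0109b\tc"  # "aĉb" TAB "c"
print(split_utf8_on_tab(text))  # ['aĉb', 'c']
raw_le = text.encode("utf-16-le")
print([item.decode("utf-16-le") for item in split_utf16_on_tab(raw_le, "little")])  # ['aĉb', 'c']
```

Matching whole units is enough in UTF-16 because surrogate halves always fall in D800 to DFFF, so the unit 0009 cannot appear inside a character beyond U+FFFF either.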

Dar Scott
unicode sophomore

_______________________________________________
use-revolution mailing list
[EMAIL PROTECTED]
http://lists.runrev.com/mailman/listinfo/use-revolution
