Re: Truncating UTF-8 Strings

Klaus Berkling Mon, 08 Jun 2009 16:49:36 -0700

On Jun 8, 2009, at 2:40 PM, Andrew Lindesay wrote:

> If you render the original string, I presume that it does not contain the corrupted UTF-8 sequence and renders the glyphs correctly?

Right. If I change the truncation length I get different results: truncating to 12 bytes yields two Japanese characters, and truncating to 6 bytes yields one.

> returnValue = new String(textBlock.toString().getBytes("UTF-8"), 0, lengthTruncated, "UTF-8");
>
> ^^^ I know you tried it using sub-strings, but this above would definitely cause trouble as it could break inside multi-byte sequences.
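A minimal sketch of one way to truncate to a byte budget without breaking a multi-byte sequence: walk the string by code point, add up each code point's UTF-8 length, and stop before the total would exceed the limit. The class and method names (`Utf8Truncate`, `truncateToUtf8Bytes`) are hypothetical, the limit plays the role of `lengthTruncated` above, and `StandardCharsets` assumes Java 7 or later (on older JVMs, pass "UTF-8" as a string instead).

```java
import java.nio.charset.StandardCharsets;

public class Utf8Truncate {

    /**
     * Hypothetical helper: returns the longest prefix of s whose UTF-8
     * encoding is at most maxBytes long, never cutting a character in half.
     */
    static String truncateToUtf8Bytes(String s, int maxBytes) {
        int bytes = 0; // UTF-8 bytes consumed so far
        int i = 0;     // index (in UTF-16 chars) of the next code point
        while (i < s.length()) {
            int cp = s.codePointAt(i);
            // Number of bytes UTF-8 needs for this code point.
            int len = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
            if (bytes + len > maxBytes) {
                break; // the next character would not fit in full
            }
            bytes += len;
            i += Character.charCount(cp); // 2 for surrogate pairs, else 1
        }
        return s.substring(0, i);
    }

    public static void main(String[] args) {
        // Three kanji (U+65E5 U+672C U+8A9E), 3 UTF-8 bytes each = 9 bytes.
        String text = "\u65E5\u672C\u8A9E";
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);

        // Naive byte slicing: 7 bytes ends one byte into the third kanji,
        // so the decoder substitutes a replacement character (U+FFFD).
        String naive = new String(utf8, 0, 7, StandardCharsets.UTF_8);

        // Code-point-aware truncation keeps only whole characters.
        String safe = truncateToUtf8Bytes(text, 7);

        System.out.println(naive.endsWith("\uFFFD"));
        System.out.println(safe.getBytes(StandardCharsets.UTF_8).length);
    }
}
```

Because the loop works on code points rather than on `char`s, a surrogate pair (for example an emoji) is either kept whole or dropped whole, which a plain `substring` on a `char` index would not guarantee.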

I still get 'fractional' multi-byte characters but the results are different:

Previously:

Note that the length is different, so it does make an attempt to count the glyphs. This could mean that the text is in a different encoding than I expect, so my data is corrupted, or at least not what I think it is.
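One possible explanation for 6 bytes per Japanese character, offered only as a hypothesis: most Japanese characters take 3 bytes in UTF-8, so 6 bytes per character is what you would see if the UTF-8 bytes had been decoded once as ISO-8859-1 and the result re-encoded as UTF-8 (so-called double encoding). The sketch below simulates that corruption and a one-step repair. The class and method names are hypothetical, and the repair only applies if the data was mis-decoded exactly once with ISO-8859-1; windows-1252, for instance, maps some bytes in 0x80 to 0x9F differently.

```java
import java.nio.charset.StandardCharsets;

public class DoubleEncodingDemo {

    /** Hypothetical: simulates UTF-8 bytes being mis-read as ISO-8859-1. */
    static String doubleEncode(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        // Each byte becomes one char U+0000..U+00FF; chars >= U+0080
        // will later need 2 bytes each when encoded as UTF-8.
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    /** Hypothetical: undoes exactly one round of doubleEncode. */
    static String repair(String s) {
        // Recover the original bytes, then decode them correctly.
        return new String(s.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String kanji = "\u65E5"; // one kanji, encoded as E6 97 A5 in UTF-8
        String broken = doubleEncode(kanji);

        System.out.println(kanji.getBytes(StandardCharsets.UTF_8).length);  // 3
        System.out.println(broken.getBytes(StandardCharsets.UTF_8).length); // 6
        System.out.println(repair(broken).equals(kanji));                   // true
    }
}
```

If the byte counts in the real data match this pattern, it would point at an earlier decode step (a database connection, request, or file read) using the wrong charset, rather than at the truncation code itself.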

Thanks

kib

"Success is not final, failure is not fatal: it is the courage to continue that counts."

Winston Churchill

Klaus Berkling

Systems Administrator

DynEd International, Inc.

www.dyned.com | www.eskimo.com/~kiberkli


Webobjects-dev mailing list (Webobjects-dev@lists.apple.com)
