Re: [DUG] Upgrading to XE - Unicode strings questions

2010-11-23 Thread John Bird
Iterating over a string is for the purpose of doing something with each individual character, whether it is an 'A' or an 'A' with a ^ (caret) on top of it. When I said the number of bytes in a character varies I was not meaning the number of bytes in a Char - I was meaning the total

Re: [DUG] Upgrading to XE - Unicode strings questions

2010-11-23 Thread Stefan Mueller
Iterating over a string is for the purpose of doing something with each individual character, whether it is an 'A' or an 'A' with a ^ (caret) on top of it. When I

Re: [DUG] Upgrading to XE - Unicode strings questions

2010-11-23 Thread Jolyon Smith
I think you are confusing Canonical Normalized versions of the same Unicode string (in the example s1 is canonical, s2 is normalized) and the effect of local codepage conversion. Yep, and for the record I think this is a big problem with the way Embarcadero implemented Unicode. By

Re: [DUG] Upgrading to XE - Unicode strings questions

2010-11-23 Thread Jolyon Smith
John, the problem is that in Unicode the term 'single character' is meaningless unless you have performed some pre-processing to GIVE that term some meaning. There are some standard forms for such processing, called Normalisations. The problem is that a single character to your eyes, e.g. an accented a,

Re: [DUG] Upgrading to XE - Unicode strings questions

2010-11-23 Thread Todd
Hi John, You can find out whether a Unicode string is inside the BMP by converting it to UTF-32 and checking that the new string is twice the length in bytes of the original (UTF-16) string. A user could specifically choose to enter that character in either form - this is unlikely, yes. Or, two users

Re: [DUG] Upgrading to XE - Unicode strings questions

2010-11-23 Thread John Bird
I read in one of the references that UTF-32 was a more common standard on Unix systems - which means I guess they have chosen the simplest format at the trade-off of using more space? I think Linux/Windows/MacOS use UTF-16 more commonly... Anyway for the time being, as long as the data in

Re: [DUG] Upgrading to XE - Unicode strings questions

2010-11-23 Thread Jolyon Smith
Anyway for the time being, as long as the data in strings is Unicode, but is still Latin 8859 (i.e. ASCII characters) I can without worrying too much iterate over a string one character at a time...using length. Yep. But you are building an app that now supports Unicode. If your users are

Re: [DUG] Upgrading to XE - Unicode strings questions

2010-11-23 Thread Ross Levis
Hi John, You can find out whether a Unicode string is inside the BMP by converting it to UTF-32 and checking that the new string is twice the length in bytes of the original (UTF-16) string. A user could specifically choose

Re: [DUG] Upgrading to XE - Unicode strings questions

2010-11-23 Thread Jolyon Smith
Hi John, You can find out whether a Unicode string is inside the BMP by converting it to UTF-32 and checking that the new string is twice the length in bytes of the original (UTF-16) string. A user could specifically

Re: [DUG] Upgrading to XE - Unicode strings questions

2010-11-22 Thread John Bird
Thanks for the references, so I can answer most of the questions now. Here is what I understand so far, if anyone has anything to add this will be useful! Extra question: It looks like code like for i:=1 to length(string1) do begin DoSomethingWithOneChar(string1[i]);

Re: [DUG] Upgrading to XE - Unicode strings questions

2010-11-22 Thread David Brennan
Thanks for the references, so I can answer most of the questions now. Here is what I understand so far, if anyone has anything to add this will be useful! Extra

Re: [DUG] Upgrading to XE - Unicode strings questions

2010-11-22 Thread Jolyon Smith
Thanks for the references, so I can answer most of the questions now. Here

Re: [DUG] Upgrading to XE - Unicode strings questions

2010-11-22 Thread Ross Levis
Just thought I would chime

Re: [DUG] Upgrading to XE - Unicode strings questions

2010-11-22 Thread Alister Christie
You should get a copy of Marco Cantu's Delphi 2009 Handbook - it has about 90 pages on Unicode in Delphi. I think Bob Swart has a similar (less detailed) book. There are also some videos from one of the CodeRage events (probably CodeRage 3 or 4).

Re: [DUG] Upgrading to XE - Unicode strings questions

2010-11-22 Thread Colin Johnsun
I won't answer everything but just on this one question: On 23 November 2010 11:04, John Bird johnkb...@paradise.net.nz wrote: Extra question: It looks like code like for i:=1 to length(string1) do begin DoSomethingWithOneChar(string1[i]); end; cannot be used

Re: [DUG] Upgrading to XE - Unicode strings questions

2010-11-22 Thread Stefan Mueller
I'm guessing my response

Re: [DUG] Upgrading to XE - Unicode strings questions

2010-11-22 Thread Jolyon Smith
I won't answer everything but just on this one question: On 23

Re: [DUG] Upgrading to XE - Unicode strings questions

2010-11-22 Thread Colin Johnsun
I won't answer everything

Re: [DUG] Upgrading to XE - Unicode strings questions

2010-11-22 Thread John Bird
My main remaining question is the best way to handle code that up to now looked like: for i:=1 to length(string1) do begin DoSomethingWithOneChar(string1[i]); end; If I got the gist correctly, string1[i] is one unicode character, but length(string1) is the number of

Re: [DUG] Upgrading to XE - Unicode strings questions

2010-11-22 Thread Jolyon Smith
Doh! Thanks Jolyon for clearing up that misunderstanding on my part. I was aware of the surrogate pair issue but I wrongly assumed that this might have been

Re: [DUG] Upgrading to XE - Unicode strings questions

2010-11-22 Thread John Bird
I won't answer everything but just on this one question: On 23 November 2010 11:04, John Bird

Re: [DUG] Upgrading to XE - Unicode strings questions

2010-11-22 Thread Jolyon Smith
My main remaining question is the best way to handle code that up to now looked like: for i:=1

Re: [DUG] Upgrading to XE - Unicode strings questions

2010-11-22 Thread Todd
Hi John Extra question: It looks like code like for i:=1 to length(string1) do begin DoSomethingWithOneChar(string1[i]); end; cannot be used reliably. I think the solution here is not to concentrate so much on unicode, but rather on what

Re: [DUG] Upgrading to XE - Unicode strings questions

2010-11-22 Thread Jolyon Smith
As I understand it iterating over a string with Chars does get around the problem of surrogate pairs. It depends what you mean by 'get around the problem'. for c in s do WorkWith( c ); Will iterate once for each c (WIDECHAR) in s. Some of those c's may be in surrogate pairs, but you
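To make the surrogate pair point concrete, here is a minimal sketch, assuming Delphi 2009 or later where string is a UTF-16 UnicodeString. The function name CodePointCount and the test string are made up for illustration and are not from the thread. It only shows that Length counts WideChars (UTF-16 code units), while a loop that steps over surrogate pairs counts code points. It does nothing about combining characters or normalisation.

```delphi
program SurrogateSketch;

{$APPTYPE CONSOLE}

// Hypothetical helper: counts Unicode code points in a UTF-16 string
// by treating a high surrogate followed by a low surrogate as one unit.
function CodePointCount(const S: string): Integer;
var
  I: Integer;
begin
  Result := 0;
  I := 1;
  while I <= Length(S) do
  begin
    // High surrogates are $D800..$DBFF, low surrogates are $DC00..$DFFF
    if (Ord(S[I]) >= $D800) and (Ord(S[I]) <= $DBFF) and
       (I < Length(S)) and
       (Ord(S[I + 1]) >= $DC00) and (Ord(S[I + 1]) <= $DFFF) then
      Inc(I, 2)   // step over both halves of the pair
    else
      Inc(I);     // ordinary BMP character (or an unpaired surrogate)
    Inc(Result);
  end;
end;

var
  S: string;
  C: Char;
  Units: Integer;
begin
  // 'A' followed by U+1D538, which UTF-16 stores as the pair $D835 $DD38
  S := 'A' + #$D835#$DD38;

  // for-in visits every WideChar, including each half of the pair
  Units := 0;
  for C in S do
    Inc(Units);

  Writeln(Length(S));          // 3 WideChars
  Writeln(Units);              // 3 iterations of the for-in loop
  Writeln(CodePointCount(S));  // 2 code points
end.
```

So both the indexed loop and the for-in loop hand the surrogate halves to the loop body one at a time. Whether that matters depends on what the body does with each Char.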

Re: [DUG] Upgrading to XE - Unicode strings questions

2010-11-22 Thread Todd
Hi John Extra question: It looks like code like for i:=1 to length(string1) do begin DoSomethingWithOneChar(string1[i]); end; cannot be used reliably. I think the solution here is not to concentrate on unicode vs widechar vs ansichar, but rather on what

Re: [DUG] Upgrading to XE - Unicode strings questions

2010-11-22 Thread Ross Levis
Length( s ) will always yield the number of chars in s. So how does one obtain the number of bytes in a string if one wants to use AnsiChar to check every character? Does s[0] work? Ross.
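One possible answer to the byte-count question, as a sketch only, assuming Delphi 2009 or later. s[0] is the length byte of a ShortString and is not available on the default string type, so it will not help here. The in-memory size of a UnicodeString is the Char count times SizeOf(Char). To walk the data byte by byte with AnsiChar, one option is to convert to UTF8String first and iterate over that. The test string is made up for illustration.

```delphi
program ByteCountSketch;

{$APPTYPE CONSOLE}

var
  S: string;
  U: UTF8String;
  I, NonAscii: Integer;
begin
  // 'A' followed by U+00E9 (e with acute accent)
  S := 'A' + #$00E9;

  Writeln(Length(S));                 // 2 Chars
  Writeln(Length(S) * SizeOf(Char));  // 4 bytes in memory as UTF-16

  // Explicit cast converts UTF-16 to UTF-8; Length now counts bytes
  U := UTF8String(S);
  Writeln(Length(U));                 // 3 bytes: 1 for 'A', 2 for U+00E9

  // Each U[I] is an AnsiChar, so this loop inspects one byte at a time
  NonAscii := 0;
  for I := 1 to Length(U) do
    if Ord(U[I]) > 127 then
      Inc(NonAscii);
  Writeln(NonAscii);                  // 2 bytes fall outside ASCII
end.
```

Note that the UTF-8 byte count and the UTF-16 byte count differ for the same text, so "number of bytes" only has a meaning once the encoding is fixed.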

[DUG] Upgrading to XE - Unicode strings questions

2010-11-17 Thread John Bird
Planning upgrading from D2007 to XE, but want to read up on issues I will need to consider first to do with strings becoming Unicode by default. I recall the release of D2009 came with good white papers explaining ramifications, however I haven’t seen these as I haven’t upgraded. Asked