Iterating over a string is for the purpose of doing something with each
individual character, whether it is an 'A' or an 'A' with a ^ (caret) on
top of it. When I said the number of bytes in a character varies I did not
mean the number of bytes in a Char - I meant the total
23, 2010 7:33 PM
To: NZ Borland Developers Group - Delphi List
Subject: Re: [DUG] Upgrading to XE - Unicode strings questions
I think you are confusing Canonical Normalized versions
of the same Unicode string (in the example s1 is canonical,
s2 is normalized) and the effect of local codepage conversion.
Yep, and for the record I think this is a big problem with the way Embarcadero
implemented Unicode.
By
John, the problem is that in Unicode a single character is meaningless unless
you have performed some pre-processing to GIVE that term some meaning. There
are some standard forms for such processing, called Normalisations.
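To make the accented-a case concrete, here is a minimal sketch, assuming
Delphi 2009 or later (where string is UnicodeString). The program and
variable names are made up for illustration. It builds the same visible
character two ways, precomposed and as a base letter plus a combining mark,
and shows that Length and = see them as different strings:

```pascal
program ComposedVsDecomposed;
{$APPTYPE CONSOLE}

var
  Precomposed, Decomposed: string;
begin
  // U+00E2 LATIN SMALL LETTER A WITH CIRCUMFLEX, stored as one Char.
  Precomposed := Char($00E2);
  // 'a' followed by U+0302 COMBINING CIRCUMFLEX ACCENT, stored as two Chars.
  Decomposed := 'a' + Char($0302);

  Writeln(Length(Precomposed));       // 1
  Writeln(Length(Decomposed));        // 2
  // The = operator compares Chars one by one, so the two forms differ
  // even though they render as the same accented letter.
  Writeln(Precomposed = Decomposed);  // not equal
end.
```

Until both strings have been brought to the same normalisation form, neither
Length nor a plain comparison will treat them as the same text.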
The problem is that a single character to your eyes, e.g. an accented a,
Hi John
You can find out whether a Unicode string is entirely inside the BMP by
converting it to UTF-32 and checking that the new string is exactly twice the
size in bytes of the original (UTF-16) string.
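A rough sketch of that check, assuming Delphi 2009 or later and well-formed
UTF-16 input. IsAllBmp is a hypothetical helper name, not an RTL function.
Instead of comparing byte sizes it compares element counts, which amounts to
the same thing: a BMP-only string has exactly one UTF-32 element per UTF-16
Char, so its UTF-32 form is twice as many bytes.

```pascal
program BmpCheck;
{$APPTYPE CONSOLE}

// True when every code point in S lies in the BMP (no surrogate pairs).
function IsAllBmp(const S: string): Boolean;
var
  Utf32: UCS4String;
begin
  Utf32 := UnicodeStringToUCS4String(S);
  // UCS4String carries a trailing zero terminator, hence the -1.
  Result := (Length(Utf32) - 1) = Length(S);
end;

var
  Clef: string;
begin
  // U+1D11E MUSICAL SYMBOL G CLEF, encoded as one surrogate pair.
  Clef := Char($D834) + Char($DD1E);

  Writeln(IsAllBmp('Hello'));  // all BMP
  Writeln(IsAllBmp(Clef));     // contains a surrogate pair, so not all BMP
end.
```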
A user could specifically choose to enter that character in either form -
this is unlikely, yes. Or, two users
I read in one of the references that UTF-32 was a more common standard on
Unix systems - which means I guess they have chosen the simplest format at
the trade off of using more space?
I think linux/Windows/MacOS use UTF-16 more commonly...
Anyway for the time being, as long as the data in
strings is Unicode but still within Latin 8859 (i.e. ASCII
characters), I can iterate over a string one character at a time
using Length without worrying too much.
Yep. But you are building an app that now supports Unicode.
If your users are
Thanks for the references, so I can answer most of the questions now.
Here is what I understand so far, if anyone has anything to add this will be
useful!
Extra question:
It looks like code like
for i := 1 to Length(string1) do
begin
  DoSomethingWithOneChar(string1[i]);
...@delphi.org.nz [mailto:delphi-boun...@delphi.org.nz] On
Behalf Of John Bird
Sent: Tuesday, 23 November 2010 13:04
-Original Message-
From: delphi-boun...@delphi.org.nz [mailto:delphi-boun...@delphi.org.nz] On
Behalf Of David Brennan
Sent: Tuesday, 23 November 2010 1:27 PM
Just thought I would chime
You should get a copy of Marco Cantu's Delphi 2009 Handbook - it has
about 90 pages on Unicode in Delphi. I think Bob Swart has a similar
(less detailed) book. There are also some videos from one of the
CodeRage events (probably CodeRage 3 or 4).
Alister Christie
Computers for People
Ph: 04
I won't answer everything but just on this one question:
On 23 November 2010 11:04, John Bird johnkb...@paradise.net.nz wrote:
Extra question:
It looks like code like
for i := 1 to Length(string1) do
begin
  DoSomethingWithOneChar(string1[i]);
end;
cannot be used
-toolbox.com/
From: delphi-boun...@delphi.org.nz [mailto:delphi-boun...@delphi.org.nz] On
Behalf Of Jolyon Smith
Sent: Tuesday, November 23, 2010 9:40 AM
I'm guessing my response
...@delphi.org.nz [mailto:delphi-boun...@delphi.org.nz] On
Behalf Of Colin Johnsun
Sent: Tuesday, 23 November 2010 14:31
My main remaining question is the best way to handle code that up to now
looked like:
for i := 1 to Length(string1) do
begin
  DoSomethingWithOneChar(string1[i]);
end;
If I got the gist correctly, string1[i] is one unicode character, but
length(string1) is the number of
: Tuesday, 23 November 2010 15:22
Doh! Thanks Jolyon for clearing that misunderstanding on my part. I was
aware of the surrogate pair issue but I wrongly assumed that this might have
been
[mailto:delphi-boun...@delphi.org.nz] On
Behalf Of John Bird
Sent: Tuesday, 23 November 2010 15:36
Hi John
Extra question:
It looks like code like
for i := 1 to Length(string1) do
begin
  DoSomethingWithOneChar(string1[i]);
end;
cannot be used reliably.
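For what it is worth, one way the loop could be made surrogate-aware is
sketched below, assuming Delphi 2009 or later. ForEachCodePoint and
DoSomethingWithOneCodePoint are hypothetical names (the latter a stand-in for
DoSomethingWithOneChar). It walks the string by UTF-16 code unit and joins
each high/low surrogate pair into a single code point. It does not try to
group combining marks with their base letter, so it only addresses the
surrogate half of the problem.

```pascal
program IterateCodePoints;
{$APPTYPE CONSOLE}

// Hypothetical stand-in for the thread's DoSomethingWithOneChar.
procedure DoSomethingWithOneCodePoint(CodePoint: Cardinal);
begin
  Writeln(CodePoint);
end;

procedure ForEachCodePoint(const S: string);
var
  I: Integer;
  HighUnit, LowUnit: Cardinal;
begin
  I := 1;
  while I <= Length(S) do
  begin
    HighUnit := Ord(S[I]);
    // A high surrogate followed by a low surrogate encodes one code point.
    if (HighUnit >= $D800) and (HighUnit <= $DBFF) and (I < Length(S)) then
    begin
      LowUnit := Ord(S[I + 1]);
      if (LowUnit >= $DC00) and (LowUnit <= $DFFF) then
      begin
        DoSomethingWithOneCodePoint(
          $10000 + ((HighUnit - $D800) shl 10) + (LowUnit - $DC00));
        Inc(I, 2);
        Continue;
      end;
    end;
    // BMP character (or an unpaired surrogate, passed through unchanged).
    DoSomethingWithOneCodePoint(HighUnit);
    Inc(I);
  end;
end;

begin
  // 'A' followed by U+1D11E (one surrogate pair): two code points, three Chars.
  ForEachCodePoint('A' + Char($D834) + Char($DD1E));  // prints 65, then 119070
end.
```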
I think the solution here is not to concentrate so much on Unicode,
but rather on what
As I understand it iterating over a string with Chars does get around the
problem of surrogate pairs
It depends what you mean by get around the problem.
for c in s do WorkWith( c );
will iterate once for each c (WideChar) in s. Some of those c's may be in
surrogate pairs, but you
I think the solution here is not to concentrate on unicode vs widechar
vs ansichar, but rather on what
Length( s ) will always yield the number of chars in s.
So how does one obtain the number of bytes in a string if one wants to use
AnsiChar to check every character?
Does s[0] work?
Ross.
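A small sketch of the difference between the Char count and the byte count,
assuming Delphi 2009 or later where SizeOf(Char) is 2. As far as I know s[0]
only applies to ShortString, so for a Unicode string the byte count has to be
worked out from Length. The program and variable names are illustrative only.

```pascal
program LengthVsBytes;
{$APPTYPE CONSOLE}

var
  S: string;
  A: AnsiString;
begin
  S := 'Hello';
  Writeln(Length(S));                 // 5 Chars (UTF-16 code units)
  Writeln(Length(S) * SizeOf(Char));  // 10 bytes of character data

  // Converting to AnsiString gives one-byte AnsiChars for this ASCII text.
  // Characters outside the active code page would be lost in the conversion.
  A := AnsiString(S);
  Writeln(Length(A));                 // 5 AnsiChars
end.
```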
I am planning to upgrade from D2007 to XE, but want to read up first on the
issues I will need to consider to do with strings becoming Unicode by default. I
recall the release of D2009 came with good white papers explaining
ramifications, however I haven’t seen these as I haven’t upgraded. Asked