They didn't.
Perhaps he means C#.
But here, AFAIK, string support is very different from what Pascal
programmers are accustomed to. Strings are always constant and can't be
modified; they are created once and discarded completely later.
Something like s[3] := 'x'; is not portable to C#.
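(Illustrative sketch, not part of the original mail: a minimal Free Pascal
program showing the in-place indexing being discussed. In C# the string
would be immutable and a new one would have to be built instead.)

program MutateDemo;
{$mode objfpc}{$H+}
var
  s: string;      // AnsiString with {$H+}
begin
  s := 'cat';
  s[3] := 'r';    // in-place write into the string's character cells
  WriteLn(s);     // prints "car"
end.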
-M
Op Wed, 12 Nov 2008, schreef Bernd Mueller:
Unicode support in FPC?
Sorry for jumping in. I am not so much interested in Unicode because I mostly
use shortstrings and ansistrings for performance reasons. But maybe it is
worth looking at how the gcc people have solved these issues.
They didn't.
Unicode support in FPC?
Sorry for jumping in. I am not so much interested in Unicode because I
mostly use shortstrings and ansistrings for performance reasons. But maybe
it is worth looking at how the gcc people have solved these issues.
Regards, Bernd.
- widestrings had some bugs at the time and were not reference counted
(Martin did a great job over the years to improve that)
Of course this is a serious argument. I feel that any decent string type
(but pchar :) ) needs to be reference counted. In my "suggestion" mail
10 minutes ago in t
Lazarus assumes that an ansistring always contains utf-8. This is not
generally true.
While this might be true, I think it's a consequence of a shortcoming of
FPC, which simply identifies the types ANSIString and UTF8String. IMHO
(in a future version) it should take care of the encoding of str
Op Tue, 11 Nov 2008, schreef Luiz Americo Pereira Camara:
Jonas Maebe escreveu:
If people want to rely on what they are used to in non-unicode
environments, then they cannot directly use unicode strings. They'll first
have to assign it or typecast it to a non-unicode string and then operate
Jonas Maebe escreveu:
If people want to rely on what they are used to in non-unicode
environments, then they cannot directly use unicode strings. They'll
first have to assign it or typecast it to a non-unicode string and
then operate on that string. At least if there's any data loss in that
ca
On Tue, 11 Nov 2008 17:09:37 +0100
Michael Schnell <[EMAIL PROTECTED]> wrote:
>
> > AFAIK no one measured a noticeable speed difference between UTF8/16
> > when handling GUI.
> >
> So I don't understand why the LCL designers for the unicode upgrade
> decided to use an UTF8 API instead of a WideString API
On Tue, Nov 11, 2008 at 7:10 PM, Martin Schreiber <[EMAIL PROTECTED]> wrote:
>>
> ???
> The last widestring manager bug I remember was in January 2007 in FPC 2.0.5.
>
I can't remember the exact details, but I read it a few months back on
the MSEgui newsgroup. I remember the bug was widestring related.
On Tuesday 11 November 2008 17.34:36 Graeme Geldenhuys wrote:
>
> All I do know is that only recently did the WideString manager become
> usable in FPC. Martin had until recently some issues with bugs in the
> WideString manager.
>
???
The last widestring manager bug I remember was in January 2007 in FPC 2.0.5.
On 11 Nov 2008, at 17:34, Graeme Geldenhuys wrote:
All I do know is that only recently did the WideString manager become
usable in FPC. Martin had until recently some issues with bugs in the
WideString manager.
No functional changes have been made to the unix widestring manager
since 2.2.2
Graeme Geldenhuys schrieb:
> On Tue, Nov 11, 2008 at 6:21 PM, Florian Klaempfl
> <[EMAIL PROTECTED]> wrote:
>>> Some conversions are correct or seem to be correct in that case.
>> It has already been pointed out several times that lazarus abuses the
>> ansistring type to store utf-8 and this breaks several things.
On Tue, Nov 11, 2008 at 6:21 PM, Florian Klaempfl
<[EMAIL PROTECTED]> wrote:
>
>> Some conversions are correct or seem to be correct in that case.
>
> It has already been pointed out several times that lazarus abuses the
> ansistring type to store utf-8 and this breaks several things.
I must have mis
On Tue, Nov 11, 2008 at 6:09 PM, Michael Schnell <[EMAIL PROTECTED]> wrote:
>
> So I don't understand why the LCL designers for the unicode upgrade decided
> to use an UTF8 API instead of a WideString API (like MSEGUI does seemingly
> successfully).
I can't speak for the Lazarus team, but I can sp
It has already been pointed out several times that lazarus abuses the
ansistring type to store utf-8 and this breaks several things.
Of course we do know this.
But as the compiler does not tell ANSIString from UTF8String anyway (to
do automatic conversions), what exactly does this mean?
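(An illustrative sketch of the point, not from the original mail: the thread
states that FPC simply identifies ANSIString and UTF8String, so a utf-8-typed
string carries no encoding information of its own. The TUtf8Text alias below
is a hypothetical local stand-in, declared only to keep the snippet
self-contained.)

program AliasDemo;
{$mode objfpc}{$H+}
type
  TUtf8Text = type AnsiString;  // hypothetical alias; no attached encoding
var
  a: AnsiString;
  u: TUtf8Text;
begin
  u := 'hallo äöü';                // the literal's bytes are stored verbatim
  a := u;                          // compiles; no code-page conversion happens
  WriteLn(Length(a) = Length(u));  // TRUE: identical byte content
end.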
-M
Michael Schnell schrieb:
>
>> I set no special options in FPC
> Lazarus does.
>> and I don't use WideString at all.
>> UTF-8 fits perfectly in the standard String type.
>>
> Some conversions are correct or seem to be correct in that case.
It has already been pointed out several times that lazarus abuses the
ansistring type to store utf-8 and this breaks several things.
I set no special options in FPC
Lazarus does.
and I don't use WideString at all.
UTF-8 fits perfectly in the standard String type.
Some conversions are correct or seem to be correct in that case.
-Michael
On Tue, Nov 11, 2008 at 6:05 PM, Michael Schnell <[EMAIL PROTECTED]> wrote:
>
> So I suppose it does not set the FPC option to use UTF8String instead of
> WideString for non-ASCII string constants. MSEGUI works here, too, because
> of this.
I set no special options in FPC and I don't use WideString at all.
AFAIK no one measured a noticeable speed difference between UTF8/16 when
handling GUI.
So I don't understand why the LCL designers for the unicode upgrade
decided to use an UTF8 API instead of a WideString API (like MSEGUI does
seemingly successfully).
-Michael
Graeme Geldenhuys wrote:
On Tue, Nov 11, 2008 at 4:29 PM, Michael Schnell <[EMAIL PROTECTED]> wrote:
With Lazarus even:
I don't know about Lazarus, but in fpGUI Toolkit the following works just fine.
So I suppose it does not set the FPC option to use UTF8String instead of
WideString for non-ASCII string constants. MSEGUI works here, too, because
of this.
On Tue, Nov 11, 2008 at 5:00 PM, Mattias Gaertner
<[EMAIL PROTECTED]> wrote:
>> OTOH, regarding an improved Lazarus, IMHO he should be enabled to
>> choose or compile an LCL version with a UTF16 or UCS2 WideStrings
>> API, for improved speed with GUI handling.
>
> AFAIK no one measured a noticeable speed difference between UTF8/16 when
> handling GUI.
On Tue, Nov 11, 2008 at 4:29 PM, Michael Schnell <[EMAIL PROTECTED]> wrote:
>
> With Lazarus even:
I don't know about Lazarus, but in fpGUI Toolkit the following works just fine.
var
  s1: string;
  s2: TfpgString; // simply an alias to string
begin
  s1 := 'äüö';
  s2 := 'äüö';
  Button.Text :=
2008/11/11 Michael Schnell <[EMAIL PROTECTED]>:
>
>> a) "ü": "LATIN SMALL LETTER U WITH DIAERESIS", encoded as $C3 $BC
>> b) "ü": "LATIN SMALL LETTER U", encoded as $75, followed by "COMBINING
>> DIAERESIS", which is encoded as $CC $88
>
> I see, but I fail to see the sense of providing two different UTF8 code
> variants for the same unicode character.
On Tue, 11 Nov 2008 15:43:24 +0100
Michael Schnell <[EMAIL PROTECTED]> wrote:
>
> > We were talking of a world where strings consist of widechars, not
> > about the current Lazarus, weren't we?
> I'm not sure. Of course WideStrings and WideChars are easier to
> use, as in Europe and America
Your example shows just how carefully Widestring should be interpreted:
it only shows that it is a sequence of two or more bytes representing a
single character. It doesn't say anything about the content or the use
of a specific (meta) encoding like UCS2 or Unicode16.
Of course I did mean that in
Jonas Maebe schreef:
On 11 Nov 2008, at 15:26, Vincent Snijders wrote:
Jonas Maebe schreef:
It seems much more advisable to me to save the file with an UTF-8
BOM, or even better to add {$encoding utf-8} (and/or to pass -Fcutf-8
to the compiler) and then just use
Edit1.Caption := UTF8Encode('hallo äöü');
We were talking of a world where strings consist of widechars, not
about the current Lazarus, weren't we?
I'm not sure. Of course WideStrings and WideChars are easier to use,
as in Europe and America problems with surrogate pairs will seldom
arise, but IMHO the user should be enabled to de
On 11 Nov 2008, at 15:26, Vincent Snijders wrote:
Jonas Maebe schreef:
It seems much more advisable to me to save the file with an UTF-8
BOM, or even better to add {$encoding utf-8} (and/or to pass -
Fcutf-8 to the compiler) and then just use
Edit1.Caption := UTF8Encode('hallo äöü');
As a
Your example shows just how carefully Widestring should be interpreted:
it only shows that it is a sequence of two or more bytes representing a
single character. It doesn't say anything about the content or the use
of a specific (meta) encoding like UCS2 or Unicode16.
If the discussion is about tho
That is because FPC has no unicode string type yet (as must have been
repeated about 20 times by now).
We are not discussing what it has, but what it should have and how this
can be done in a way that provides decent performance on all platforms,
is easy to use, compatible with D2009 and optim
Jonas Maebe schreef:
On 10 Nov 2008, at 17:00, Vincent Snijders wrote:
procedure TForm1.Button1Click(Sender: TObject);
var
  w: widestring;
  i: integer;
begin
  w := UTF8Decode('hallo äöü');
  Edit1.Caption := UTF8Encode(w);
Note that if the file has been saved using an UTF-8 BOM, then the
com
Op Tue, 11 Nov 2008, schreef Michael Schnell:
IMO widestrings with precomposed characters, just like ansistrings, can
fulfill the needs of a newcomer. That there exist decomposed characters,
surrogates, and more, does not need to be explained in chapter 1 of a
programming for beginners book.
Also, remember that Unicode is a specification written specifically with
computer languages in mind: you may assume that a lot of thought has gone
into making it usable with programming languages. That was the design
goal. The specification is, alas, rather complex, but it contains every
bit of information to b
On 11 Nov 2008, at 15:20, Michael Schnell wrote:
IMO widestrings with precomposed characters, just like ansistrings,
can fulfill the needs of a newcomer. That there exist decomposed
characters, surrogates, and more, does not need to be explained in
chapter 1 of a programming for beginners book.
IMO widestrings with precomposed characters, just like ansistrings,
can fulfill the needs of a newcomer. That there exist decomposed
characters, surrogates, and more, does not need to be explained in
chapter 1 of a programming for beginners book.
Yep, but with s: WideString the example doe
From your writing I understood that the issue is a UTF8 ->
21-bit-unicode decoding issue and has nothing to do with ISO/ANSI (which
would render the problem thoroughly unsolvable, not only for the
compiler builder but also for the application programmer who wants to
write a unicode-aware program).
In response and to support Daniël:
Also, remember that Unicode is a specification written specifically with
computer languages in mind: you may assume that a lot of thought has gone
into making it usable with programming languages. That was the design
goal. The specification is, alas, rather complex, but it cont
Op Tue, 11 Nov 2008, schreef Michael Schnell:
Remember that an individual code point does not necessarily represent what
a user would consider a character. ...
Again, there is no compatible handling of this with good old ANSIStrings,
anyway, so there is no "friendly old school" way that a compiler would be
able to offer.
On 11 Nov 2008, at 13:56, Michael Schnell wrote:
If these really are two codes for the same unicode character, the
"friendly old school" handling function should normalize it. If
someone really needs to take the differences into account (like with
the case you described), he ought to write the appropriate code
(handling subcodes).
OK,
If these really are two codes for the same unicode character, the
"friendly old school" handling function should normalize it. If someone
really needs to take the differences into account (like with the case
you described), he ought to write the appropriate code (handling subcodes).
-Michael
On 11 Nov 2008, at 13:39, Michael Schnell wrote:
a) "ü": "LATIN SMALL LETTER U WITH DIAERESIS", encoded as $C3 $BC
b) "ü": "LATIN SMALL LETTER U", encoded as $75, followed by
"COMBINING DIAERESIS", which is encoded as $CC $88
I see, but I fail to see the sense of providing two different UTF8 code
variants for the same unicode character.
a) "ü": "LATIN SMALL LETTER U WITH DIAERESIS", encoded as $C3 $BC
b) "ü": "LATIN SMALL LETTER U", encoded as $75, followed by "COMBINING
DIAERESIS", which is encoded as $CC $88
I see, but I fail to see the sense of providing two different UTF8 code
variants for the same unicode character.
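(Illustrative sketch, not from the original mails: the two byte sequences
mentioned above, spelled out as Pascal character codes. Both render as "ü",
yet byte-wise, and therefore for comparison, Length, Pos, file names and so
on, they are different strings.)

program NormDemo;
{$mode objfpc}{$H+}
const
  Precomposed: AnsiString = #$C3#$BC;      // U+00FC LATIN SMALL LETTER U WITH DIAERESIS
  Decomposed : AnsiString = #$75#$CC#$88;  // 'u' (U+0075) + COMBINING DIAERESIS (U+0308)
begin
  WriteLn(Precomposed = Decomposed);                         // FALSE
  WriteLn(Length(Precomposed), ' <> ', Length(Decomposed));  // 2 <> 3
end.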
-M
Remember that an individual code point does not necessarily represent
what a user would consider a character. ...
Again, there is no compatible handling of this with good old
ANSIStrings, anyway, so there is no "friendly old school" way that a
compiler would be able to offer. In these specia
On 11 Nov 2008, at 13:15, Michael Schnell wrote:
OTOH, in this special case, I don't see why the compiler should
"normalize" "u¨" to "ü". If the software is supposed to be handling
unicode, the unicode string "u¨" should be considered a perfectly
legal two-code-point information consisting
Because e.g. on the ext3 file system, you can have two files with the
name "ü" in the same directory. One named using the single character
"ü" and one named using as the string "u¨" (both in utf-8). If you
make the compiler automatically normalise everything, you lose
information (and get the
Michael Schnell wrote:
It will at best be "friendly old school behaviour which works most of
the time, but which fails as soon as the strings are not completely
normalised because then you can have decomposed characters and
whatnot" (which in turn easily leads to security holes due to
inco
On 11 Nov 2008, at 12:33, Michael Schnell wrote:
It will at best be "friendly old school behaviour which works most
of the time, but which fails as soon as the strings are not
completely normalised because then you can have decomposed
characters and whatnot" (which in turn easily leads to
It will at best be "friendly old school behaviour which works most of
the time, but which fails as soon as the strings are not completely
normalised because then you can have decomposed characters and
whatnot" (which in turn easily leads to security holes due to
incomplete checks, hard to r
However, the "platform" part in it, depends on the string type used
that all libraries have been compiled with. I.e. regardless of your
setting, "assign" would accept a ansistring or unicodestring depending
on the platform, and this will be mostly dependand on wether the
platform has an actua
On 11 Nov 2008, at 10:48, Michael Schnell wrote:
Moreover, IMHO, it should be configurable whether, in "D2009 compatible
mode", with s[i], length(s), pos(), copy(), delete(), ..., Strings are
counted in subcodes (fast behavior) or, in "whatever mode", they are
counted in characters ("friendly old sc
Op Tue, 11 Nov 2008, schreef Michael Schnell:
There will be full compatibility with old code. It is quite likely that FPC
will have a Win32 platform where string=ansistring and a WinNT platform where
string=unicodestring. Other platforms will be decided on a case by case
basis, i.e. there is little point in having string=unicodestring on Dos.
There will be full compatibility with old code. It is quite likely that FPC
will have a Win32 platform where string=ansistring and a WinNT
platform where string=unicodestring. Other platforms will be decided
on a case by case basis, i.e. there is little point in having
string=unicodestring on Dos.
Having unicode support in any way is not free. You have to rewrite your
code somehow.
Right, but this should only be necessary for the code that is explicitly
intended to benefit from unicode features. "Old school" code - using
String (= ANSIString in locale-dependent coding) - should just work.
On 11 Nov 2008, at 09:30, Michael Schnell wrote:
Edit1.Caption := UTF8Encode('hallo äöü');
Grrr, how ugly !
No "old school" Delphi user will understand/accept that you can't
just do "Edit1.Caption := 'hallo äöü';"
You are mixing two things here:
a) you said that
"Seemingly if [FPC] de
Op Tue, 11 Nov 2008, schreef Michael Schnell:
Surely this is allowed and works correctly under D2009, otherwise I
really misunderstood Unicode support in D2009.
In D2009, "String" is WideString, and the VCL API is done with this
(Wide)String. So this of course works. With Lazarus things ar
Of course Lazarus' LCL would need to be recompiled according to the
way the user wants to handle "String", as calling for a conversion on
every LCL in and out transfer is not a good idea. Maybe the LCL could
define a type "LCLString" that can be set when compiling it.
Internally I suppose t
Object Pascal is a beautiful
language because it does type handling for you - like your example of
'+' for Integer or String types. In the same way I would hope that
FPC can handle the String type seamlessly for UTF-16 or UTF-8 -
whichever encoding the FPC developers decide the String type should be.
Surely this is allowed and works correctly under D2009, otherwise I
really misunderstood Unicode support in D2009.
In D2009, "String" is WideString, and the VCL API is done with this
(Wide)String. So this of course works. With Lazarus things are more
complex, as they need to support a lot o
Michael Schnell schrieb:
>
>> Lazarus has a set of utf-8 ready routines, using utf-8 inside of an
>> ansistring.
>>
> I see. But it's really ugly that you need to use those instead of just
> writing clean old school code and having the compiler take care of the
> nasty details.
Having unicode support in any way is not free. You have to rewrite your
code somehow.
On Tue, Nov 11, 2008 at 10:27 AM, Michael Schnell <[EMAIL PROTECTED]> wrote:
> needs to do something other than just "length()" to find out the count of
> characters in a string.
From memory, the Delphi and FPC documentation says that Length()
returns the number of bytes, NOT the number of characters.
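(A minimal sketch of that difference, not part of the original mail; it
assumes the source file is saved as UTF-8 and that the Lazarus LCLProc unit
with its UTF8Length helper is available on the unit path.)

program LenDemo;
{$mode objfpc}{$H+}
uses
  LCLProc;  // Lazarus unit assumed to provide UTF8Length
var
  s: AnsiString;
begin
  s := 'äüö';              // 3 characters, stored as 6 UTF-8 bytes
  WriteLn(Length(s));      // 6 - Length counts bytes of an AnsiString
  WriteLn(UTF8Length(s));  // 3 - the (slower) helper counts UTF-8 code points
end.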
2008/11/11 Michael Schnell <[EMAIL PROTECTED]>:
>
>> Edit1.Caption := UTF8Encode('hallo äöü');
>
> Grrr, how ugly !
>
> No "old school" Delphi user will understand/accept that you can't just do
> "Edit1.Caption := 'hallo äöü';"
I agree... When I think Unicode support I think the following shou
Lazarus has a set of utf-8 ready routines, using utf-8 inside of an ansistring.
I see. But it's really ugly that you need to use those instead of just
writing clean old school code and having the compiler take care of the
nasty details.
-Michael
Edit1.Caption := UTF8Encode('hallo äöü');
Grrr, how ugly !
No "old school" Delphi user will understand/accept that you can't just
do "Edit1.Caption := 'hallo äöü';"
-Michael
IMHO, this is working fine.
(Thanks for pointing out that there in fact _are_ (slow) functions that
can be called manually to do this with UTF8Strings.)
I don't doubt that it is working fine, but the usual Pascal programmer
does not "expect" that (s)he manually needs to call a special functi
Which option?
I don't remember right now. It's the default option in Lazarus ;) (This
has already been discussed in another thread.)
Lazarus seems to need to set this option because the LCL API is strictly
UTF8, and the UTF8String (= ANSIString) type used otherwise would not
get correct
On Mon, 10 Nov 2008 15:04:01 -0200
"Felipe Monteiro de Carvalho" <[EMAIL PROTECTED]> wrote:
> On Mon, Nov 10, 2008 at 1:48 PM, Michael Schnell <[EMAIL PROTECTED]>
> wrote:
> >, ... There are no _slow_ functions that do the "expected" versions
> > of s[i], pos(s), copy(), delete(), ... (I've yet t
On Mon, Nov 10, 2008 at 1:48 PM, Michael Schnell <[EMAIL PROTECTED]> wrote:
>, ... There are no _slow_ functions that do the "expected" versions
> of s[i], pos(s), copy(), delete(), ... (I've yet to find out how I can print
> just the first character of an UTF8String :)
Lazarus has a set of utf-8 ready routines, using utf-8 inside of an ansistring.
Michael Schnell schreef:
I found that the current FPC does have Unicode support, but there are
some problems.
I am going to give it another try, maybe it helps somebody.
- by design (for speed's sake), UTF8String (and WideString when surrogate
codes are used) count in subcodes and not in Unicode characters.
On 10 Nov 2008, at 16:48, Michael Schnell wrote:
- there are different options on how the compiler expects the coding
of the source file. Seemingly if it detects it to be UTF8 coded
The compiler only sets the encoding of the source to UTF-8 if the file
identifies itself as "I am UTF-8 encoded"
On 10 Nov 2008, at 17:22, Jonas Maebe wrote:
It seems much more advisable to me to save the file with an UTF-8
BOM, or even better to add {$encoding utf-8}
Well, {$codepage utf-8}
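(A minimal sketch of that approach, not taken from the thread; it assumes an
FPC release that accepts the codepage directive spelled this way and that the
source file really is saved as UTF-8.)

program CodepageDemo;
{$codepage utf8}
{$mode objfpc}{$H+}
var
  w: WideString;
begin
  // with the source codepage known (directive, -Fcutf8 or a UTF-8 BOM) the
  // compiler converts the literal at compile time, so no explicit
  // UTF8Decode call is needed here
  w := 'hallo äöü';
  WriteLn(Length(w));  // 9 - counted in UTF-16 code units, no surrogates here
end.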
Jonas
On 10 Nov 2008, at 17:00, Vincent Snijders wrote:
procedure TForm1.Button1Click(Sender: TObject);
var
  w: widestring;
  i: integer;
begin
  w := UTF8Decode('hallo äöü');
  Edit1.Caption := UTF8Encode(w);
Note that if the file has been saved using an UTF-8 BOM, then the
compiler will at compile
I found that the current FPC does have Unicode support, but there are
some problems.
- WideStrings work fine with Unicode UCS-2 but they (of course) have
similar issues to UTF8-Strings when surrogate codes are used (which is
rarely necessary in Europe and America).
- FPC does not have a dedi
Hi everybody,
I know we had so many discussions on how to implement Unicode support
in FPC in the past. From what I remember, lots was based on "let's see
what CodeGear does with D2009".
So now that D2009 is out, is there any further work being done on
Unicode support in FPC? Is anybody working on it?