On Tue, 11 Nov 2008, Luiz Americo Pereira Camara wrote:
Jonas Maebe wrote:
If people want to rely on what they are used to in non-unicode
environments, then they cannot directly use unicode strings. They'll first
have to assign it or typecast it to a non-unicode string and then operate
Jonas Maebe wrote:
If people want to rely on what they are used to in non-unicode
environments, then they cannot directly use unicode strings. They'll
first have to assign it or typecast it to a non-unicode string and
then operate on that string. At least if there's any data loss in that
ca
In our previous episode, Jonas Maebe said:
>
> > So could somebody from the core FPC team summarize or give some
> > roadmap as to what is happening or planned for FPC + Unicode support?
>
> If anyone can, it's Florian, since he has done all the work in this
> area until now. There is no roadmap
On 11 Nov 2008, at 19:07, Graeme Geldenhuys wrote:
So could somebody from the core FPC team summarize or give some
roadmap as to what is happening or planned for FPC + Unicode support?
If anyone can, it's Florian, since he has done all the work in this
area until now. There is no roadmap docu
So could somebody from the core FPC team summarize or give some
roadmap as to what is happening or planned for FPC + Unicode support?
These Unicode discussions seem to go round and round and never seem
to reach a conclusion. :-(
So some or other roadmap or feature list for FPC on this matter wou
On Tue, 11 Nov 2008 17:09:37 +0100
Michael Schnell <[EMAIL PROTECTED]> wrote:
>
> > AFAIK no one measured a noticeable speed difference between UTF8/16
> > when handling GUI.
> >
> So I don't understand why the LCL designers for the unicode upgrade
> decided to use a UTF8 API instead of a Wi
On Tue, Nov 11, 2008 at 7:10 PM, Martin Schreiber <[EMAIL PROTECTED]> wrote:
>>
> ???
> The last widestring manager bug I remember was in January 2007 in FPC 2.0.5.
>
I can't remember the exact details, but I read it a few months back on
the MSEgui newsgroup. I remember the bug was widestring rela
On Tuesday 11 November 2008 17.34:36 Graeme Geldenhuys wrote:
>
> All I do know is that only recently did the WideString manager become
> usable in FPC. Martin had until recently some issues with bugs in the
> WideString manager.
>
???
The last widestring manager bug I remember was in January 2007
On 11 Nov 2008, at 17:34, Graeme Geldenhuys wrote:
All I do know is that only recently did the WideString manager become
usable in FPC. Martin had until recently some issues with bugs in the
WideString manager.
No functional changes have been made to the unix widestring manager
since 2.2.2
Graeme Geldenhuys wrote:
> On Tue, Nov 11, 2008 at 6:21 PM, Florian Klaempfl
> <[EMAIL PROTECTED]> wrote:
>>> Some conversions are correct or seem to be correct in that case.
>> It has already been pointed out several times that lazarus abuses the
>> ansistring type to store utf-8 and this breaks s
On Tue, Nov 11, 2008 at 6:21 PM, Florian Klaempfl
<[EMAIL PROTECTED]> wrote:
>
>> Some conversions are correct or seem to be correct in that case.
>
> It has already been pointed out several times that lazarus abuses the
> ansistring type to store utf-8 and this breaks several things.
I must have mis
On Tue, Nov 11, 2008 at 6:09 PM, Michael Schnell <[EMAIL PROTECTED]> wrote:
>
> So I don't understand why the LCL designers for the unicode upgrade decided
> to use a UTF8 API instead of a WideString API (like MSEGUI does seemingly
> successfully).
I can't speak for the Lazarus team, but I can sp
It has already been pointed out several times that lazarus abuses the
ansistring type to store utf-8 and this breaks several things.
Of course we do know this.
But as the compiler does not tell ANSIString from UTF8String anyway (to
do automatic conversions), what exactly does this mean?
-M
Michael Schnell wrote:
>
>> I set no special options in FPC
> Lazarus does.
>> and I don't use WideString at all.
>> UTF-8 fits perfectly in the standard String type.
>>
> Some conversions are correct or seem to be correct in that case.
It has already been pointed out several times that laz
I set no special options in FPC
Lazarus does.
and I don't use WideString at all.
UTF-8 fits perfectly in the standard String type.
Some conversions are correct or seem to be correct in that case.
-Michael
___
fpc-devel maillist - fpc-devel@lis
On Tue, Nov 11, 2008 at 6:05 PM, Michael Schnell <[EMAIL PROTECTED]> wrote:
>
> So I suppose it does not set the FPC option to use UTF8String instead of
> WideString for non-ASCII string constants. MSEGUI works here, too, because
> of this.
I set no special options in FPC and I don't use WideStrin
AFAIK no one measured a noticeable speed difference between UTF8/16 when
handling GUI.
So I don't understand why the LCL designers for the unicode upgrade
decided to use a UTF8 API instead of a WideString API (like MSEGUI does
seemingly successfully).
-Michael
Graeme Geldenhuys wrote:
On Tue, Nov 11, 2008 at 4:29 PM, Michael Schnell <[EMAIL PROTECTED]> wrote:
With Lazarus even:
I don't know about Lazarus, but in fpGUI Toolkit the following works just fine.
So I suppose it does not set the FPC option to use UTF8String instead of
WideStri
On Tue, Nov 11, 2008 at 5:00 PM, Mattias Gaertner
<[EMAIL PROTECTED]> wrote:
>> OTOH, regarding an improved Lazarus, IMHO he should be enabled to
>> choose or compile an LCL version with a UTF16 or UCS2 WideStrings
>> API, for improved speed with GUI handling.
>
> AFAIK no one measured a noticeable
On Tue, Nov 11, 2008 at 4:29 PM, Michael Schnell <[EMAIL PROTECTED]> wrote:
>
> With Lazarus even:
I don't know about Lazarus, but in fpGUI Toolkit the following works just fine.
var
  s1: string;
  s2: TfpgString; // simply an alias to string
begin
  s1 := 'äüö';
  s2 := 'äüö';
  Button.Text :=
2008/11/11 Michael Schnell <[EMAIL PROTECTED]>:
>
>> a) "ü": "LATIN SMALL LETTER U WITH DIAERESIS", encoded as $C3 $BC
>> b) "ü": "LATIN SMALL LETTER U", encoded as $75, followed by "COMBINING
>> DIAERESIS", which is encoded as $CC $88
>
> I see, but I fail to see the sense of providing two differe
On Tue, 11 Nov 2008 15:43:24 +0100
Michael Schnell <[EMAIL PROTECTED]> wrote:
>
> > We were talking of a world where strings consist of widechars, not
> > about the current Lazarus, weren't we?
> I'm not sure. Of course WideStrings and WideChars are easier to
> use, as in Europe and America
Your example shows just how accurately WideString should be interpreted:
it just shows that a single character can be a sequence of two or more
bytes. It doesn't say anything about the content or the use
of a specific (meta) encoding like UCS2 or Unicode16.
Of course I did mean that in
Jonas Maebe wrote:
On 11 Nov 2008, at 15:26, Vincent Snijders wrote:
Jonas Maebe wrote:
It seems much more advisable to me to save the file with an UTF-8
BOM, or even better to add {$encoding utf-8} (and/or to pass -Fcutf-8
to the compiler) and then just use
Edit1.Caption := UTF8Encode(
We were talking of a world where strings consist of widechars, not
about the current Lazarus, weren't we?
I'm not sure. Of course WideStrings and WideChars are easier to use,
as in Europe and America problems with surrogate pairs will seldom
arise, but IMHO the user should be enabled to de
On 11 Nov 2008, at 15:26, Vincent Snijders wrote:
Jonas Maebe wrote:
It seems much more advisable to me to save the file with an UTF-8
BOM, or even better to add {$encoding utf-8} (and/or to pass
-Fcutf-8 to the compiler) and then just use
Edit1.Caption := UTF8Encode('hallo äöü');
As a
Your example shows just how accurately WideString should be interpreted:
it just shows that a single character can be a sequence of two or more
bytes. It doesn't say anything about the content or the use
of a specific (meta) encoding like UCS2 or Unicode16.
If the discussion is about tho
That is because FPC has no unicode string type yet (as must have been
repeated about 20 times by now).
We are not discussing what it has, but what it should have and how this
can be done in a way that provides decent performance on all platforms,
is easy to use, compatible with D2009 and optim
Jonas Maebe wrote:
On 10 Nov 2008, at 17:00, Vincent Snijders wrote:
procedure TForm1.Button1Click(Sender: TObject);
var
  w: widestring;
  i: integer;
begin
  w := UTF8Decode('hallo äöü');
  Edit1.Caption := UTF8Encode(w);
end;
Note that if the file has been saved using an UTF-8 BOM, then the
com
On Tue, 11 Nov 2008, Michael Schnell wrote:
IMO widestrings with precomposed characters, just like ansistrings, can
fulfill the needs of a newcomer. That there exist decomposed characters,
surrogates, and more, does not need to be explained in chapter 1 of a
programming for beginners
Also, remember Unicode is/are a computer-language-specific
specification(s): you may assume that a lot of thought has gone into
it to be able to use it with programming languages. That was the
design goal. The specification is, alas, rather complex but it
contains every bit of information to b
On 11 Nov 2008, at 15:20, Michael Schnell wrote:
IMO widestrings with precomposed characters, just like ansistrings,
can fulfill the needs of a newcomer. That there exist decomposed
characters, surrogates, and more, does not need to be explained in
chapter 1 of a programming for beginner
IMO widestrings with precomposed characters, just like ansistrings,
can fulfill the needs of a newcomer. That there exist decomposed
characters, surrogates, and more, does not need to be explained in
chapter 1 of a programming for beginners book.
Yep, but with s: WideString the example doe
From your writing I understood that the issue is a UTF8 ->
21-bit-unicode decoding issue and has nothing to do with ISO/ANSI (which
would render the problem thoroughly unsolvable, not only for the
compiler builder but also for the application programmer, who wants to
write a Unicode-aware program).
In response and to support Daniël:
Also, remember Unicode is/are a computer-language-specific
specification(s): you may assume that a lot of thought has gone into it
to be able to use it with programming languages. That was the design
goal. The specification is, alas, rather complex but it cont
On Tue, 11 Nov 2008, Michael Schnell wrote:
Remember that an individual code point does not necessarily represent what
a user would consider a character. ...
Again, there is no compatible handling of this with good old ANSIStrings,
anyway, so there is no "friendly old school" way that a
On 11 Nov 2008, at 13:56, Michael Schnell wrote:
If this really is two codes for the same unicode character, the
"friendly old school" handling function should normalize it. If
someone really needs to take the differences into account (like with
the case you described), he ought to do the
OK,
If this really is two codes for the same unicode character, the
"friendly old school" handling function should normalize it. If someone
really needs to take the differences into account (like with the case
you described), he ought to do the appropriate code (handling subcodes).
-Michael
On 11 Nov 2008, at 13:39, Michael Schnell wrote:
a) "ü": "LATIN SMALL LETTER U WITH DIAERESIS", encoded as $C3 $BC
b) "ü": "LATIN SMALL LETTER U", encoded as $75, followed by
"COMBINING DIAERESIS", which is encoded as $CC $88
I see, but I fail to see the sense of providing two different UTF8
a) "ü": "LATIN SMALL LETTER U WITH DIAERESIS", encoded as $C3 $BC
b) "ü": "LATIN SMALL LETTER U", encoded as $75, followed by "COMBINING
DIAERESIS", which is encoded as $CC $88
I see, but I fail to see the sense of providing two different UTF8 code
variants for the same unicode character.
-M
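The two byte sequences quoted above can be compared directly. A minimal Free Pascal sketch, assuming FPC 3.x in {$mode objfpc}{$H+}; the literals are written as byte escapes so the result does not depend on the encoding the source file is saved in, and the variable names are illustrative only:

```pascal
program TwoUmlauts;
{$mode objfpc}{$H+}

uses
  SysUtils;

var
  Precomposed, Decomposed: string;
  i: Integer;
begin
  // a) U+00FC LATIN SMALL LETTER U WITH DIAERESIS: UTF-8 bytes $C3 $BC
  Precomposed := #$C3#$BC;
  // b) U+0075 LATIN SMALL LETTER U + U+0308 COMBINING DIAERESIS: $75 $CC $88
  Decomposed := #$75#$CC#$88;

  // Length() counts bytes, so the same visible "ü" gives 2 and 3
  WriteLn('precomposed: ', Length(Precomposed), ' bytes');
  WriteLn('decomposed:  ', Length(Decomposed), ' bytes');

  // "=" compares bytes, so the two spellings are different strings
  WriteLn('equal: ', Precomposed = Decomposed);

  // Dump the decomposed form byte by byte
  for i := 1 to Length(Decomposed) do
    Write(IntToHex(Ord(Decomposed[i]), 2), ' ');
  WriteLn;
end.
```

Both strings render as "ü", yet they differ in length and compare as unequal, so any "friendly" handling has to decide explicitly whether to fold them together.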
Remember that an individual code point does not necessarily represent
what a user would consider a character. ...
Again, there is no compatible handling of this with good old
ANSIStrings, anyway, so there is no "friendly old school" way that a
compiler would be able to offer. In these specia
On 11 Nov 2008, at 13:15, Michael Schnell wrote:
OTOH, in this special case, I don't see why the compiler should
"normalize" "u¨" to "ü". If the software is supposed to be handling
unicode, the unicode string "u¨" should be considered a perfectly
legal two-code-point information consisting
Because e.g. on the ext3 file system, you can have two files with the
name "ü" in the same directory. One named using the single character
"ü" and one named using the string "u¨" (both in utf-8). If you
make the compiler automatically normalise everything, you lose
information (and get the
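The ext3 argument can be made concrete with a deliberately tiny "normaliser". This is NOT Unicode normalisation: the hypothetical ToyCompose below only folds the single sequence "u" + U+0308 into the precomposed "ü" (real NFC composition needs the Unicode character database). It is enough to show that two names which are distinct on disk become indistinguishable once folded. Assumes FPC 3.x, {$H+}, and StringReplace from SysUtils:

```pascal
program NormaliseLosesInfo;
{$mode objfpc}{$H+}

uses
  SysUtils;

// Hypothetical toy, not a real normaliser: replaces "u" + COMBINING
// DIAERESIS ($75 $CC $88) by the precomposed UTF-8 "ü" ($C3 $BC).
function ToyCompose(const S: string): string;
begin
  Result := StringReplace(S, #$75#$CC#$88, #$C3#$BC, [rfReplaceAll]);
end;

var
  NameA, NameB: string;
begin
  NameA := #$C3#$BC;      // first file: precomposed "ü"
  NameB := #$75#$CC#$88;  // second file in the same directory: "u" + "¨"

  // As stored, the two names are different byte strings
  WriteLn('distinct as stored:  ', NameA <> NameB);
  // After folding they collide, so the second file can no longer be
  // addressed by its real name
  WriteLn('distinct after fold: ', ToyCompose(NameA) <> ToyCompose(NameB));
end.
```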
Yes, exactly! ;)
Thank you for your support! :)
On Tue, Nov 11, 2008 at 12:55 PM, Jonas Maebe <[EMAIL PROTECTED]> wrote:
>
> On 11 Nov 2008, at 12:50, Fabio Dell'Aria wrote:
>
>> My last error is:
>>
>> C:/Programmi/Lazarus/fpc/2.2.2/bin/i386-win32/ppc386.exe -Ur -Xs -O2
>> -n -Fi../inc -Fi../i38
OK, I found the error (wrong switches). ;)
On Tue, Nov 11, 2008 at 12:43 PM, Jonas Maebe <[EMAIL PROTECTED]> wrote:
>
> On 11 Nov 2008, at 12:34, Fabio Dell'Aria wrote:
>
>> Hi Jonas,
>>
>> On Tue, Nov 11, 2008 at 12:15 PM, Jonas Maebe <[EMAIL PROTECTED]>
>> wrote:
>>>
>>> Execute the following in
On 11 Nov 2008, at 12:50, Fabio Dell'Aria wrote:
My last error is:
C:/Programmi/Lazarus/fpc/2.2.2/bin/i386-win32/ppc386.exe -Ur -Xs -O2
-n -Fi../inc -Fi../i386 -Fi../win -FE.
-FUC:/fpcbuild-2.2.2/fpcsrcrtl/units/i386-win32 -CX -XX -U3 -Ur -di386
-dRELEASE -Us -Sg system.pp -Fi../win
Error: Ill
Michael Schnell wrote:
It will at best be "friendly old school behaviour which works most of
the time, but which fails as soon as the strings are not completely
normalised because then you can have decomposed characters and
whatnot" (which in turn easily leads to security holes due to
inco
Hi,
On Tue, Nov 11, 2008 at 12:43 PM, Jonas Maebe <[EMAIL PROTECTED]> wrote:
>
> On 11 Nov 2008, at 12:34, Fabio Dell'Aria wrote:
>
>> Hi Jonas,
>>
>> On Tue, Nov 11, 2008 at 12:15 PM, Jonas Maebe <[EMAIL PROTECTED]>
>> wrote:
>>>
>>> Execute the following in the top FPC source directory:
>>>
>>>
On 11 Nov 2008, at 12:34, Fabio Dell'Aria wrote:
Hi Jonas,
On Tue, Nov 11, 2008 at 12:15 PM, Jonas Maebe <[EMAIL PROTECTED]
> wrote:
Execute the following in the top FPC source directory:
make clean all OPT="-CX -XX -U3 -Ur"
After some time I receive the following error message:
make.
On 11 Nov 2008, at 12:33, Michael Schnell wrote:
It will at best be "friendly old school behaviour which works most
of the time, but which fails as soon as the strings are not
completely normalised because then you can have decomposed
characters and whatnot" (which in turn easily leads to
Hi Jonas,
On Tue, Nov 11, 2008 at 12:15 PM, Jonas Maebe <[EMAIL PROTECTED]> wrote:
>
> On 11 Nov 2008, at 11:03, Fabio Dell'Aria wrote:
>
>> how I can rebuild the FPC and RTL with custom switches?
>>
> >> I want to use -CX -XX -U3 -Ur
>
> Execute the following in the top FPC source directory:
>
> m
It will at best be "friendly old school behaviour which works most of
the time, but which fails as soon as the strings are not completely
normalised because then you can have decomposed characters and
whatnot" (which in turn easily leads to security holes due to
incomplete checks, hard to r
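The "incomplete checks" risk can be sketched with a hypothetical name-uniqueness check that compares bytes, as "=" on an AnsiString does. IsTaken and the stored name "müller" are invented for illustration; the point is only that a decomposed spelling, which looks identical on screen, passes the check:

```pascal
program IncompleteCheck;
{$mode objfpc}{$H+}

// Hypothetical check: reject a user name that is already taken.
// It compares raw bytes, exactly like "=" on an AnsiString.
function IsTaken(const Candidate: string): Boolean;
const
  // "müller" with a precomposed ü ($C3 $BC)
  Existing = 'm'#$C3#$BC'ller';
begin
  Result := Candidate = Existing;
end;

var
  Precomposed, Decomposed: string;
begin
  Precomposed := 'm'#$C3#$BC'ller';
  // Same text on screen: u + COMBINING DIAERESIS ($75 $CC $88)
  Decomposed := 'mu'#$CC#$88'ller';

  WriteLn('precomposed rejected: ', IsTaken(Precomposed));
  // The look-alike slips through the check
  WriteLn('decomposed rejected:  ', IsTaken(Decomposed));
end.
```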
However, the "platform" part of it depends on the string type that
all libraries have been compiled with. I.e. regardless of your
setting, "assign" would accept an ansistring or a unicodestring depending
on the platform, and this will mostly depend on whether the
platform has an actua
On 11 Nov 2008, at 11:03, Fabio Dell'Aria wrote:
how I can rebuild the FPC and RTL with custom switches?
I want to use -CX -XX -U3 -Ur
Execute the following in the top FPC source directory:
make clean all OPT="-CX -XX -U3 -Ur"
Jonas
On 11 Nov 2008, at 10:48, Michael Schnell wrote:
Moreover, IMHO, it should be configurable if in "D2009 compatible
mode", with s[i], length(s), pos(), copy, delete(), ..., Strings are
counted in subcodes (fast behavior) or in "whatever mode" they are
counted in characters ("friendly old sc
On Tue, 11 Nov 2008, Michael Schnell wrote:
There will be full compatibility with old code. It is quite likely that FPC will
have a Win32 platform where string=ansistring and a WinNT platform where
string=unicodestring. Other platforms will be decided on a case by case
basis, i.e. there is lit
There will be full compatibility with old code. It is quite likely that FPC
will have a Win32 platform where string=ansistring and a WinNT
platform where string=unicodestring. Other platforms will be decided
on a case by case basis, i.e. there is little point in having
string=unicodestring on Dos.
Are strings not zero terminated?
They are not: a #0 is perfectly allowable character in a (long) string.
That is why you can use strings for storing any kind of byte stream.
But they are: a #0 is automatically added at s[length(s)+1];
But accessing the terminating #0 via string functions is erro
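Both answers above are right, and a short program shows how they fit together. A sketch assuming FPC with {$H+} (so string = AnsiString), using StrLen from SysUtils:

```pascal
program EmbeddedZero;
{$mode objfpc}{$H+}

uses
  SysUtils;

var
  s: string;
begin
  s := 'ab'#0'cd';
  // The length is stored with the string, so #0 is an ordinary character
  WriteLn('Length(s)        = ', Length(s));         // 5
  // PChar-based routines stop at the first #0
  WriteLn('StrLen(PChar(s)) = ', StrLen(PChar(s)));  // 2
  // The RTL also keeps a #0 after the last character, which is what makes
  // PChar(s) safe to pass to C APIs. Read it through the PChar (0-based);
  // s[Length(s) + 1] is outside the string's valid index range.
  WriteLn('terminator is #0 = ', PChar(s)[Length(s)] = #0);  // TRUE
end.
```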
Hi to all,
how I can rebuild the FPC and RTL with custom switches?
I want to use -CX -XX -U3 -Ur
--
Best regards...
Fabio Dell'Aria.
We can implement a D2009 like solution and break a lot of old code :)
My impression always was the FPC is supposed to be better than Delphi :)
:) :) .
-Michael
Having unicode support in any way is not free. You have to rewrite your
code somehow.
Right, but this should only be necessary with the code that explicitly is
intended to benefit from unicode features. "Old school" code - using
String (= ANSIString in locale-dependent coding) - should just work.
On 11 Nov 2008, at 09:30, Michael Schnell wrote:
Edit1.Caption := UTF8Encode('hallo äöü');
Grrr, how ugly !
No "old school" Delphi user will understand/accept that you can't
just do "Edit1.Caption := 'hallo äöü';"
You are mixing two things here:
a) you said that
"Seemingly if [FPC] de
On Tue, 11 Nov 2008, Michael Schnell wrote:
Surely this is allowed and works correctly under D2009, otherwise I
really misunderstood Unicode support in D2009.
In D2009, "String" is WideString, and the VCL API is done with this
(Wide)String. So this of course works. With Lazarus things ar
Of course Lazarus' LCL would need to be recompiled according to the
way the user wants to handle "String", as converting on every transfer
into and out of the LCL is not a good idea. Maybe the LCL could
define a type "LCLString" that can be set when compiling it.
Internally I suppose t
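The "LCLString" idea amounts to a compile-time type switch. A minimal sketch of what that could look like; the define LCL_UNICODESTRING and the LCLChar alias are hypothetical and are not part of the real LCL:

```pascal
program LCLStringSwitch;
{$mode objfpc}{$H+}

// Hypothetical: compile with -dLCL_UNICODESTRING to switch the toolkit's
// string type. Neither the define nor these aliases exist in the real LCL.
type
{$ifdef LCL_UNICODESTRING}
  LCLString = UnicodeString;  // UTF-16 code units
  LCLChar = WideChar;
{$else}
  LCLString = AnsiString;     // UTF-8 bytes, as the LCL uses today
  LCLChar = AnsiChar;
{$endif}

// Code written against LCLString compiles unchanged in both variants
function Shout(const S: LCLString): LCLString;
begin
  Result := S + '!';
end;

begin
  WriteLn('SizeOf(LCLChar) = ', SizeOf(LCLChar));
  WriteLn(Shout('hallo'));
end.
```

Built without the define this prints 1 and "hallo!"; with -dLCL_UNICODESTRING the character size becomes 2 while Shout is untouched. As noted elsewhere in the thread, every library linked against such an LCL would have to be compiled with the same choice.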
Object Pascal is a beautiful
language because it does type handling for you - like your example of
'+' for Integer or String types. In the same way I would hope that
FPC can handle the String type seamlessly for UTF-16 or UTF-8 -
whichever encoding the FPC developers decide String type should b
Surely this is allowed and works correctly under D2009, otherwise I
really misunderstood Unicode support in D2009.
In D2009, "String" is WideString, and the VCL API is done with this
(Wide)String. So this of course works. With Lazarus things are more
complex, as they need to support a lot o
Michael Schnell wrote:
>
>> Lazarus has a set of utf-8 ready routines, using utf-8 inside of an
>> ansistring.
>>
> I see. But it's really ugly that you need to use those instead of just
> writing clean old school code and have the compiler care for the nasty
> details.
Having unicode support
Michael Schnell wrote:
>
>> OK, so here goes again yet another discussion... :-)
>>
> No wonder, as the current state is working, but rather disappointing :).
> (No idea if D2009 is different / better: this seems to be the cause of
> the new thread.)
We can implement a D2009 like solution an
Yes. In D2009 String is UTF16String and Char is WideChar, sizeof(Char)=2.
I personally do not like this solution.
Same here,
IMHO FPC could do this in a "Delphi 2009 string compatibility" mode.
But it should be configurable to use other ways (e.g. (a) String =
WideString but have it c
See (including comments) http://www.jacobthurman.com/?p=30
IMHO it's a bad decision to have the standard unicode string (be it
WideString or UTF8String) functionality redefined to "Code units"
(subcodes) instead of "code points" (characters). I feel it would have
been better to have the old
On Tue, Nov 11, 2008 at 10:11:10AM +0100, Michael Schnell wrote:
>
>> See (including comments) http://www.jacobthurman.com/?p=30
>
> So it seems that the Type "String" in D2009 in fact is "WideString" and
> same _does_ use surrogate pairs. This asks for even more unexpected
> behavior than with F
See (including comments) http://www.jacobthurman.com/?p=30
So it seems that the Type "String" in D2009 in fact is "WideString" and
same _does_ use surrogate pairs. This asks for even more unexpected
behavior than with FPC, with String seemingly still being ANSIString ;).
-Michael
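What "String uses surrogate pairs" means in practice: a character outside the Basic Multilingual Plane occupies two UTF-16 code units, and Length() counts code units. A sketch using FPC's UnicodeString as the closest FPC analogue of the D2009 String; CodePointCount is a hypothetical helper that assumes well-formed UTF-16:

```pascal
program SurrogatePairs;
{$mode objfpc}{$H+}

// Counts code points by skipping low (trailing) surrogates $DC00..$DFFF.
// Assumes well-formed UTF-16 (every low surrogate follows a high one).
function CodePointCount(const S: UnicodeString): SizeInt;
var
  i: SizeInt;
begin
  Result := 0;
  for i := 1 to Length(S) do
    if (Ord(S[i]) < $DC00) or (Ord(S[i]) > $DFFF) then
      Inc(Result);
end;

var
  w: UnicodeString;
begin
  // 'a' followed by U+1D11E MUSICAL SYMBOL G CLEF, which UTF-16 stores
  // as the surrogate pair $D834 $DD1E
  SetLength(w, 3);
  w[1] := 'a';
  w[2] := WideChar($D834);
  w[3] := WideChar($DD1E);

  WriteLn('code units  (Length): ', Length(w));          // 3
  WriteLn('code points (counted): ', CodePointCount(w)); // 2
end.
```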
On Tue, Nov 11, 2008 at 10:27 AM, Michael Schnell <[EMAIL PROTECTED]> wrote:
> needs to do something other than just "length()" to find out the count of
> characters in a string.
From memory, the Delphi and FPC documentation says that Length()
returns the number of bytes, NOT the number of charac
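For UTF-8 kept in an AnsiString, as Lazarus does, the gap between Length() and the number of code points can be computed by skipping continuation bytes (bit pattern 10xxxxxx). A sketch; Utf8CodePointCount is a hypothetical helper that assumes valid UTF-8:

```pascal
program Utf8Count;
{$mode objfpc}{$H+}

// Every byte except a continuation byte (10xxxxxx) starts a new code point.
// Assumes valid UTF-8 input.
function Utf8CodePointCount(const S: string): SizeInt;
var
  i: SizeInt;
begin
  Result := 0;
  for i := 1 to Length(S) do
    if (Ord(S[i]) and $C0) <> $80 then
      Inc(Result);
end;

var
  s: string;
begin
  // 'hallo äöü' as UTF-8 bytes; each umlaut takes two bytes
  s := 'hallo '#$C3#$A4#$C3#$B6#$C3#$BC;
  WriteLn('Length(s)   = ', Length(s));              // 12
  WriteLn('code points = ', Utf8CodePointCount(s));  // 9
end.
```

This counts code points, not what a user perceives as characters: the decomposed "ü" discussed in the thread would count as 2.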
2008/11/11 Michael Schnell <[EMAIL PROTECTED]>:
>
>> Edit1.Caption := UTF8Encode('hallo äöü');
>
> Grrr, how ugly !
>
> No "old school" Delphi user will understand/accept that you can't just do
> "Edit1.Caption := 'hallo äöü';"
I agree... When I think Unicode support I think the following shou
OK, so here goes again yet another discussion... :-)
No wonder, as the current state is working, but rather disappointing :).
(No idea if D2009 is different / better: this seems to be the cause of
the new thread.)
-Michael
Lazarus has a set of utf-8 ready routines, using utf-8 inside of an ansistring.
I see. But it's really ugly that you need to use those instead of just
writing clean old school code and have the compiler care for the nasty
details.
-Michael
Edit1.Caption := UTF8Encode('hallo äöü');
Grrr, how ugly !
No "old school" Delphi user will understand/accept that you can't just
do "Edit1.Caption := 'hallo äöü';"
-Michael
IMHO, this is working fine.
(Thanks for pointing out that there in fact _are_ (slow) functions that
can be called manually to do this with UTF8Strings.)
I don't doubt that it is working fine, but the usual Pascal programmer
does not "expect" that (s)he manually needs to call a special functi
Which option?
I don't remember right now. It's the default option in Lazarus ;) (This
has already been discussed in another thread.).
Lazarus seems to need to set this option because the LCL API is strictly
UTF8 (and the UTF8String [ =ANSIString] ) type used otherwise would not
get correct