Re: [fpc-devel] FPC 2.3.1 seems a mixed mess with Unicode support

2009-09-17 Thread Luiz Americo Pereira Camara

Graeme Geldenhuys wrote:

> 2009/9/17 Luiz Americo Pereira Camara :
>> RTLString is not meant to be used in client applications. It would be
>> useful only in functions that interact with system calls, like the RTL
>> ones, with two benefits: avoiding extra encoding conversions and avoiding
>> the need for a duplicated RTL (UTF-16 and UTF-8).
>
> Has work been started on the RTLString? If so, where could I look at the
> code? Or is there a plan of action on how and where it must be
> implemented?

See the unicodertl branch.

Luiz
___
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-devel


Re: [fpc-devel] FPC 2.3.1 seems a mixed mess with Unicode support

2009-09-17 Thread Graeme Geldenhuys
2009/9/17 Luiz Americo Pereira Camara :
>
> RTLString is not meant to be used in client applications. It would be
> useful only in functions that interact with system calls, like the RTL
> ones, with two benefits: avoiding extra encoding conversions and avoiding
> the need for a duplicated RTL (UTF-16 and UTF-8).

Has work been started on the RTLString? If so, where could I look at the
code? Or is there a plan of action on how and where it must be
implemented?

What is the difference between RTLString and the "cpstr" branch? I
haven't looked at the "cpstr" branch yet, so I don't know if it's the
same thing.

Regards,
  - Graeme -


___
fpGUI - a cross-platform Free Pascal GUI toolkit
http://opensoft.homeip.net/fpgui/


Re: [fpc-devel] FPC 2.3.1 seems a mixed mess with Unicode support

2009-09-17 Thread Luiz Americo Pereira Camara

Graeme Geldenhuys wrote:
> Luiz Americo Pereira Camara wrote:
>> Yes. RTLString would be just an alias to UnicodeString in win32 and
>> UTF8String in unixes
>
> Bad news for Michael. :-) We would have to have serious documentation on
> all the string types supported by FPC - I'm losing count!  We would
> also need a nice big table showing all the "alias" strings and what they
> really mean on each platform.

RTLString is not meant to be used in client applications. It would be
useful only in functions that interact with system calls, like the RTL
ones, with two benefits: avoiding extra encoding conversions and avoiding
the need for a duplicated RTL (UTF-16 and UTF-8).


Luiz


Re: [fpc-devel] FPC 2.3.1 seems a mixed mess with Unicode support

2009-09-17 Thread Michael Schnell
I'll try that ASAP :)

-Michael


Re: [fpc-devel] FPC 2.3.1 seems a mixed mess with Unicode support

2009-09-17 Thread Vincent Snijders

Felipe Monteiro de Carvalho wrote:
> On Thu, Sep 17, 2009 at 9:42 AM, Jonas Maebe  wrote:
>> It isn't. It is a string type whereby the string encoding is part of the
>> string information (just like the reference count and length already are
>> currently).
>
> Ah, that string type. Previous discussions between Lazarus developers
> concluded that this new string type isn't a good solution.

I reserve the right to change my mind once it is available in an FPC release.

Vincent


Re: [fpc-devel] FPC 2.3.1 seems a mixed mess with Unicode support

2009-09-17 Thread Felipe Monteiro de Carvalho
On Thu, Sep 17, 2009 at 9:42 AM, Jonas Maebe  wrote:
> It isn't. It is a string type whereby the string encoding is part of the
> string information (just like the reference count and length already are
> currently).

Ah, that string type. Previous discussions between Lazarus developers
concluded that this new string type isn't a good solution.

-- 
Felipe Monteiro de Carvalho


Re: [fpc-devel] FPC 2.3.1 seems a mixed mess with Unicode support

2009-09-17 Thread Marco van de Voort
In our previous episode, Graeme Geldenhuys said:
> >>
> > http://www.stack.nl/~marcov/unicode.jpg
> 
> The electronic version is also freely available from the unicode.org
> website. I can't remember the direct link, but I do have Unicode v5
> chapters 1-6 here in pdf format, downloaded from unicode.org

http://www.unicode.org/versions/Unicode5.1.0/



Re: [fpc-devel] FPC 2.3.1 seems a mixed mess with Unicode support

2009-09-17 Thread Graeme Geldenhuys
Vincent Snijders wrote:
>>
> http://www.stack.nl/~marcov/unicode.jpg


The electronic version is also freely available from the unicode.org
website. I can't remember the direct link, but I do have Unicode v5
chapters 1-6 here in pdf format, downloaded from unicode.org



Regards,
  - Graeme -

-- 
fpGUI Toolkit - a cross-platform GUI toolkit using Free Pascal
http://opensoft.homeip.net/fpgui/



Re: [fpc-devel] FPC 2.3.1 seems a mixed mess with Unicode support

2009-09-17 Thread Jonas Maebe


On 17 Sep 2009, at 13:34, Felipe Monteiro de Carvalho wrote:

> On Thu, Sep 17, 2009 at 8:11 AM, Jonas Maebe wrote:
>> Please, not again this discussion. I think we all know what everyone thinks
>> about that, what the pitfalls are and this will eventually be solved by
>> D2009-style unicode string support.
>
> AFAIK D2009-style solution is just a reference counted widestring type.

It isn't. It is a string type whereby the string encoding is part of
the string information (just like the reference count and length
already are currently).

You can find more information here once CodeGear's server starts
responding again: http://dn.codegear.com/article/images/38980/Delphi_and_Unicode.pdf



Jonas
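To make the idea above concrete: in such a string type every string carries its encoding next to the reference count and length, so the RTL can decide on its own when a conversion is needed. The program below is a minimal sketch of that idea only; the record name, its field layout and the MakeHeader/NeedsConversion helpers are hypothetical illustrations, not the actual Delphi 2009 or FPC string header.

```pascal
program EncodedStringHeaderSketch;
{$mode objfpc}{$H+}

type
  // Hypothetical header for an encoding-aware string: the encoding sits next to
  // the reference count and length. Names and layout are illustrative only.
  TEncodedStrHeader = record
    CodePage: Word;     // e.g. 65001 = UTF-8, 1252 = Windows Latin-1
    ElementSize: Word;  // bytes per code unit: 1 for 8-bit encodings, 2 for UTF-16
    RefCount: LongInt;  // reference count, as AnsiStrings already have
    Length: LongInt;    // length in code units, as AnsiStrings already have
  end;

function MakeHeader(ACodePage, AElementSize: Word; ALength: LongInt): TEncodedStrHeader;
begin
  Result.CodePage := ACodePage;
  Result.ElementSize := AElementSize;
  Result.RefCount := 1;
  Result.Length := ALength;
end;

// Because each string knows its own encoding, the RTL can decide by itself
// whether an operation such as concatenation needs a conversion first.
function NeedsConversion(const A, B: TEncodedStrHeader): Boolean;
begin
  Result := A.CodePage <> B.CodePage;
end;

var
  utf8A, utf8B, latin1: TEncodedStrHeader;
begin
  utf8A := MakeHeader(65001, 1, 5);
  utf8B := MakeHeader(65001, 1, 3);
  latin1 := MakeHeader(1252, 1, 5);
  WriteLn('UTF-8 + Latin-1 needs conversion: ', NeedsConversion(utf8A, latin1));
  WriteLn('UTF-8 + UTF-8 needs conversion:   ', NeedsConversion(utf8A, utf8B));
end.
```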


Re: [fpc-devel] FPC 2.3.1 seems a mixed mess with Unicode support

2009-09-17 Thread theo
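You can also test this using the TCharacter class
http://wiki.lazarus.freepascal.org/Theodp


procedure TForm1.Button1Click(Sender: TObject);
var
  a, b, c: String;
begin
  // Decomposed Å: LATIN CAPITAL LETTER A (U+0041) + COMBINING RING ABOVE (U+030A)
  a := UTF8Encode(WideChar($41) + WideChar($030A));
  // Å as ANGSTROM SIGN (U+212B), which normalizes to U+00C5
  b := UTF8Encode(WideChar($212B));
  // Precomposed Å: LATIN CAPITAL LETTER A WITH RING ABOVE (U+00C5)
  c := UTF8Encode(WideChar($C5));
  // All three normalize to U+00C5, so a+b and b+c compare equal after NFKC
  if TCharacter.Normalize_NFKC(a + b) = TCharacter.Normalize_NFKC(b + c) then
    Caption := 'equal ' + a + b + ' ' + b + c
  else
    Caption := 'not equal ' + a + b + ' ' + b + c;
end;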



Re: [fpc-devel] FPC 2.3.1 seems a mixed mess with Unicode support

2009-09-17 Thread Graeme Geldenhuys
Michael Schnell wrote:
> 
> How am I supposed to input o + " in Open Office in a way that the
> program combines them to ö ?

Not that this is the right list for OpenOffice support - but I'll make
it quick. ;-)

I use the Dvorak keyboard layout, which makes such character input a breeze.

Alternatively, if you use Linux + Gnome desktop (or any GTK2 app), then
Ctrl+Shift+U will change the input cursor (displaying an underlined u
character), allowing you to enter the Unicode code point as a hex value.
Pressing Enter will then change the input cursor back to normal and display
the Unicode character for the hex value you entered. This works in any
Gnome (GTK2) application.


Regards,
  - Graeme -
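What the application receives after Ctrl+Shift+U is a code point; an application that stores text as UTF-8 then has to encode it into one to four bytes. The sketch below shows only that encoding step; CodePointToUtf8 and BytesAsHex are hypothetical helpers, GTK's own input-method code is not shown, and the sketch does not reject surrogate code points or values above U+10FFFF.

```pascal
program CodePointToUtf8Sketch;
{$mode objfpc}{$H+}

uses
  SysUtils;

// Encodes one Unicode code point as UTF-8 (1 to 4 bytes).
// Minimal sketch: does not reject surrogates or values above $10FFFF.
function CodePointToUtf8(cp: LongWord): AnsiString;
begin
  if cp < $80 then
    Result := Chr(Byte(cp))
  else if cp < $800 then
    Result := Chr(Byte($C0 or (cp shr 6))) + Chr(Byte($80 or (cp and $3F)))
  else if cp < $10000 then
    Result := Chr(Byte($E0 or (cp shr 12))) +
              Chr(Byte($80 or ((cp shr 6) and $3F))) +
              Chr(Byte($80 or (cp and $3F)))
  else
    Result := Chr(Byte($F0 or (cp shr 18))) +
              Chr(Byte($80 or ((cp shr 12) and $3F))) +
              Chr(Byte($80 or ((cp shr 6) and $3F))) +
              Chr(Byte($80 or (cp and $3F)));
end;

// Formats each byte as two hex digits, separated by spaces.
function BytesAsHex(const S: AnsiString): string;
var
  i: Integer;
begin
  Result := '';
  for i := 1 to Length(S) do
  begin
    if i > 1 then
      Result := Result + ' ';
    Result := Result + IntToHex(Ord(S[i]), 2);
  end;
end;

begin
  WriteLn('U+0041 -> ', BytesAsHex(CodePointToUtf8($0041)));
  WriteLn('U+00F6 -> ', BytesAsHex(CodePointToUtf8($00F6)));
  WriteLn('U+20AC -> ', BytesAsHex(CodePointToUtf8($20AC)));
end.
```

Typing 00f6 after Ctrl+Shift+U gives ö, which this encoder turns into the two bytes C3 B6.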



Re: [fpc-devel] FPC 2.3.1 seems a mixed mess with Unicode support

2009-09-17 Thread Michael Schnell
Graeme Geldenhuys wrote:
> What I meant was, use OpenOffice to open a document with Unicode text in
> it. Make sure that some "visual" characters are based on normalized text
> and some are based on non-normalized text (two or more characters
> forming one visual character, e.g.: o + ¨ = ö).

How am I supposed to input o + " in OpenOffice in a way that the
program combines them to ö?

Anyway, if possible, I do suppose that OO will internally store the
result as a single Unicode character.

I even suppose that OO internally (in memory and with its native file
formats) does not use multi-code-point sequences (a base letter followed
by a combining mark).

So the problem only arises with file import.

Somebody might want to try and hack an OO document file, replace an ö
with an o + combining diaeresis sequence, and see what happens.

-Michael


Re: [fpc-devel] FPC 2.3.1 seems a mixed mess with Unicode support

2009-09-17 Thread Felipe Monteiro de Carvalho
On Thu, Sep 17, 2009 at 8:11 AM, Jonas Maebe  wrote:
> Please, not again this discussion. I think we all know what everyone thinks
> about that, what the pitfalls are and this will eventually be solved by
> D2009-style unicode string support.

AFAIK D2009-style solution is just a reference counted widestring type.

If that's what you mean then no, it doesn't solve anything for Lazarus.

On the other hand, a UTF8 string type and a compiler directive to set
string=UTF8String could be a solution.



Re: [fpc-devel] FPC 2.3.1 seems a mixed mess with Unicode support

2009-09-17 Thread Graeme Geldenhuys
Michael Schnell wrote:
> 
> Unfortunately I don't have a language independent version of Open
> Office, that would be able to do this for -say- ancient Egypt.

What I meant was, use OpenOffice to open a document with Unicode text in
it. Make sure that some "visual" characters are based on normalized text
and some are based on non-normalized text (two or more characters
forming one visual character, e.g.: o + ¨ = ö).

Then do a Find in the document using a normalized character. See what it
finds. Then do the same Find, but with a non-normalized character. See how
long it takes to find the text.

As far as I know OpenOffice's Unicode implementation is pretty good and
quite complete.


Regards,
  - Graeme -



Re: [fpc-devel] FPC 2.3.1 seems a mixed mess with Unicode support

2009-09-17 Thread Jonas Maebe


On 17 Sep 2009, at 13:11, Graeme Geldenhuys wrote:

> fpGUI Toolkit is in the same boat. Using UTF-8 inside AnsiStrings. And
> like Lazarus, fpGUI also has various UTF-8 friendly RTL String
> functions. It works relatively well, as long as the developer knows that
> he/she must rather use the Unicode friend string functions and not the
> standard RTL ones.

Please, not again this discussion. I think we all know what everyone
thinks about that, what the pitfalls are and this will eventually be
solved by D2009-style unicode string support.



Jonas


Re: [fpc-devel] FPC 2.3.1 seems a mixed mess with Unicode support

2009-09-17 Thread Graeme Geldenhuys
Michael Schnell wrote:
> Marco van de Voort wrote:
> 
>> like Lazarus UTF-8 in
>> ansistring.
> 
> That already produced a huge confusion and obviously is a way that can't
> decently be followed in the future.

fpGUI Toolkit is in the same boat: it uses UTF-8 inside AnsiStrings. And,
like Lazarus, fpGUI also has various UTF-8 friendly RTL string
functions. It works relatively well, as long as the developer knows that
he/she must use the Unicode-friendly string functions rather than the
standard RTL ones.


Regards,
  - Graeme -
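The pitfall with UTF-8 inside AnsiStrings is that the standard RTL routines see bytes, not characters: Length() returns the byte count. A minimal sketch, assuming valid UTF-8 input; CountUtf8CodePoints is a hypothetical helper written for illustration, not the actual fpGUI or Lazarus routine.

```pascal
program Utf8LengthSketch;
{$mode objfpc}{$H+}

// Counts code points in a UTF-8 byte string by skipping continuation bytes
// (bit pattern 10xxxxxx). Assumes the input is valid UTF-8.
function CountUtf8CodePoints(const S: AnsiString): SizeInt;
var
  i: SizeInt;
begin
  Result := 0;
  for i := 1 to Length(S) do
    if (Ord(S[i]) and $C0) <> $80 then
      Inc(Result);
end;

var
  s: AnsiString;
begin
  // "Grön": 'G', 'r', U+00F6 (UTF-8 bytes C3 B6), 'n'. Built from byte values
  // so the result does not depend on the source file encoding.
  s := 'Gr' + #$C3#$B6 + 'n';
  WriteLn('Length (bytes): ', Length(s));
  WriteLn('Code points:    ', CountUtf8CodePoints(s));
end.
```

Counting code points is still not the same as counting user-visible characters: an o followed by a combining diaeresis counts as two.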



Re: [fpc-devel] FPC 2.3.1 seems a mixed mess with Unicode support

2009-09-17 Thread Marco van de Voort
In our previous episode, Michael Schnell said:
> Marco van de Voort wrote:
> > 
> > http://www.stack/nl/~marcov/unicode.jpg
> > 
> 
> Obviously a huge Volume for a huge encoding scheme that really imposes a
> huge number of huge problems ;).

No, the description is about one to two hundred pages; the rest are tables
that show the standardized code points.

However that is still enough to realize that Unicode is not just about a few
chars more. It is also about fixing some other simplifications in the
computer-charset model. (like languages that compose characters out of
habit, and can't be easily fixed by adding the accented character like most
European languages)

These mean it never will be as simple as it was, unless you introduce
similar simplifications again. (but run into problems with several
languages). On the other hand, considering all this in all business code is
not workable either.

So it is wisest to keep general purpose libraries and tools (like FPC and
Lazarus) as compatible with the standard as reasonably possible, but
application builders that know their target audience can cut corners.




Re: [fpc-devel] FPC 2.3.1 seems a mixed mess with Unicode support

2009-09-17 Thread Michael Schnell
Marco van de Voort wrote:

> like Lazarus UTF-8 in
> ansistring.

That already produced a huge confusion and obviously is a way that can't
decently be followed in the future.

-Michael


Re: [fpc-devel] FPC 2.3.1 seems a mixed mess with Unicode support

2009-09-17 Thread Michael Schnell
Graeme Geldenhuys wrote:
> 
> Open a document with OpenOffice and search for text. 

Unicode is all about language independent coding.

Unfortunately I don't have a language independent version of Open
Office, that would be able to do this for -say- ancient Egypt.

-Michael


Re: [fpc-devel] FPC 2.3.1 seems a mixed mess with Unicode support

2009-09-17 Thread Michael Schnell
Marco van de Voort wrote:
> 
> http://www.stack/nl/~marcov/unicode.jpg
> 

Obviously a huge Volume for a huge encoding scheme that really imposes a
huge number of huge problems ;).

-Michael


Re: [fpc-devel] FPC 2.3.1 seems a mixed mess with Unicode support

2009-09-17 Thread Marco van de Voort
In our previous episode, Graeme Geldenhuys said:
> > A good place to start:
> > 
> > http://www.stack.nl/~marcov/delphistringtypes.txt
> 
> Good god! 16 different types already! It's worse (and a lot more) than I
> thought. :-)

Not counting aliases and untyped library level usage like Lazarus UTF-8 in
ansistring.
___
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-devel


Re: [fpc-devel] FPC 2.3.1 seems a mixed mess with Unicode support

2009-09-17 Thread Graeme Geldenhuys
Marco van de Voort wrote:
> 
> A good place to start:
> 
> http://www.stack.nl/~marcov/delphistringtypes.txt

Good god! 16 different types already! It's worse (and a lot more) than I
thought. :-)


Regards,
  - Graeme -



Re: [fpc-devel] FPC 2.3.1 seems a mixed mess with Unicode support

2009-09-17 Thread Marco van de Voort
In our previous episode, Marco van de Voort said:
> Java is not portable. The VM hides the platform differences.

Maybe that came out too strong. I don't want endless discussions about this.

Point is, I meant more "Java has a different philosophy wrt portability".


Re: [fpc-devel] FPC 2.3.1 seems a mixed mess with Unicode support

2009-09-17 Thread Marco van de Voort
In our previous episode, Graeme Geldenhuys said:
> > 
> > Yes. RTLString would be just an alias to UnicodeString in win32 and 
> > UTF8String in unixes
> 
> Bad news for Michael. :-) We would have to have serious documentation on
> all the string types supported by FPC - I'm loosing count!  We would
> also need a nice big table showing all the "alias" strings and what they
> really mean on each platform.

A good place to start:

http://www.stack.nl/~marcov/delphistringtypes.txt
 
> If I sometimes compare Java's string handling and types to FPC's Object
> Pascal implementation - I wonder if we as Object Pascal developers have
> got nuts! :-) Java's string handling seems so simple, even with Unicode
> (or not) support.

Java is not portable. The VM hides the platform differences. Object Pascal
interfaces with the world directly, without filtering or marshalling.


Re: [fpc-devel] FPC 2.3.1 seems a mixed mess with Unicode support

2009-09-17 Thread Graeme Geldenhuys
Luiz Americo Pereira Camara wrote:
> 
> Yes. RTLString would be just an alias to UnicodeString in win32 and 
> UTF8String in unixes

Bad news for Michael. :-) We would have to have serious documentation on
all the string types supported by FPC - I'm losing count!  We would
also need a nice big table showing all the "alias" strings and what they
really mean on each platform.

If I sometimes compare Java's string handling and types to FPC's Object
Pascal implementation, I wonder if we as Object Pascal developers have
gone nuts! :-) Java's string handling seems so simple, even with Unicode
(or not) support.


Regards,
  - Graeme -



Re: [fpc-devel] FPC 2.3.1 seems a mixed mess with Unicode support

2009-09-17 Thread Graeme Geldenhuys
Michael Schnell wrote:
>> Neither that much space nor that much time is required.
> 
> Any pointers regarding a decent estimation ?

Open a document with OpenOffice and search for text. It's as simple as
that. You are seriously exaggerating with the GByte-sized lookup
tables etc...

Maybe you should visit unicode.org first, and do some reading.


Regards,
  - Graeme -



Re: [fpc-devel] FPC 2.3.1 seems a mixed mess with Unicode support

2009-09-17 Thread Vincent Snijders


> http://www.stack/nl/~marcov/unicode.jpg

http://www.stack.nl/~marcov/unicode.jpg




Re: [fpc-devel] FPC 2.3.1 seems a mixed mess with Unicode support

2009-09-17 Thread Marco van de Voort
In our previous episode, Michael Schnell said:
> Jonas Maebe wrote:
> > 
> > Neither that much space nor that much time is required.
> 
> Any pointers regarding a decent estimation ?

http://www.stack/nl/~marcov/unicode.jpg

there is a v5 now though.
 
> As there are billions of possible Unicode "characters" and most of them
>  potentially can be alternately depicted by one or multiple
> multi-Unicode surrogates, I don't share your optimism.

It's not that much. In the case of non-reducible multi-codepoint chars
they probably simply order the various codepoints in some fixed way in
canonical form, drastically reducing the number of combinations.

However, that still doesn't solve equivalent chars. That really needs
language-dependent tables. A charset and the language-dependent
interpretation of it are two different things.
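The fixed order Marco guesses at is what Unicode calls canonical ordering: combining marks that follow a base letter are sorted by their canonical combining class, so two sequences differing only in mark order become identical. The sketch below uses the simple bubble-sort formulation and assumes a hypothetical three-entry excerpt of the combining class table; a real implementation loads the full table from the Unicode Character Database. MakeCP3 and SameAfterOrdering are illustrative helpers.

```pascal
program CanonicalOrderingSketch;
{$mode objfpc}{$H+}

type
  TCodePoints = array of LongWord;

// Tiny hypothetical excerpt of the canonical combining class (ccc) table.
function CombiningClass(cp: LongWord): Byte;
begin
  case cp of
    $0323: Result := 220;  // COMBINING DOT BELOW
    $0301: Result := 230;  // COMBINING ACUTE ACCENT
    $0308: Result := 230;  // COMBINING DIAERESIS
  else
    Result := 0;           // starters such as base letters
  end;
end;

// Canonical ordering: inside each run of combining marks, bubble marks with a
// lower combining class ahead of higher ones. Starters (class 0) never move,
// and marks with equal classes keep their relative order.
procedure CanonicalOrder(var cps: TCodePoints);
var
  i: Integer;
  swapped: Boolean;
  tmp: LongWord;
begin
  repeat
    swapped := False;
    for i := 0 to High(cps) - 1 do
      if (CombiningClass(cps[i + 1]) <> 0) and
         (CombiningClass(cps[i]) > CombiningClass(cps[i + 1])) then
      begin
        tmp := cps[i];
        cps[i] := cps[i + 1];
        cps[i + 1] := tmp;
        swapped := True;
      end;
  until not swapped;
end;

function MakeCP3(c0, c1, c2: LongWord): TCodePoints;
begin
  SetLength(Result, 3);
  Result[0] := c0;
  Result[1] := c1;
  Result[2] := c2;
end;

// Orders copies of both sequences (dynamic arrays are shared by reference),
// then compares them element by element.
function SameAfterOrdering(const x, y: TCodePoints): Boolean;
var
  xs, ys: TCodePoints;
  i: Integer;
begin
  xs := Copy(x, 0, Length(x));
  ys := Copy(y, 0, Length(y));
  CanonicalOrder(xs);
  CanonicalOrder(ys);
  Result := Length(xs) = Length(ys);
  if Result then
    for i := 0 to High(xs) do
      if xs[i] <> ys[i] then
        Exit(False);
end;

begin
  // 'a' + acute + dot below versus 'a' + dot below + acute
  WriteLn('Equivalent after canonical ordering: ',
    SameAfterOrdering(MakeCP3($61, $0301, $0323), MakeCP3($61, $0323, $0301)));
end.
```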


Re: [fpc-devel] FPC 2.3.1 seems a mixed mess with Unicode support

2009-09-17 Thread Jonas Maebe


On 17 Sep 2009, at 09:55, Michael Schnell wrote:

> Jonas Maebe wrote:
>> Neither that much space nor that much time is required.
>
> Any pointers regarding a decent estimation ?

No.

> As there are billions of possible Unicode "characters" and most of them
> potentially can be alternately depicted by one or multiple
> multi-Unicode surrogates, I don't share your optimism.

There are many existing OSes out there which deal perfectly well with
this (such as Mac OS X, which does auto-normalization in several
cases). It's not like this is new rocket science; unicode has been
around for quite a while and all this stuff has been dealt with by
other people several years ago already.



Jonas


Re: [fpc-devel] FPC 2.3.1 seems a mixed mess with Unicode support

2009-09-17 Thread Michael Schnell
Jonas Maebe wrote:
> 
> Neither that much space nor that much time is required.

Any pointers regarding a decent estimation ?

As there are billions of possible Unicode "characters" and most of them
 potentially can be alternately depicted by one or multiple
multi-Unicode surrogates, I don't share your optimism.

-Michael


Re: [fpc-devel] FPC 2.3.1 seems a mixed mess with Unicode support

2009-09-17 Thread Jonas Maebe


On 17 Sep 2009, at 09:33, Michael Schnell wrote:

> Of course I have been supposing that (if possible) the RTL would call a
> library function. But it makes no difference: If the algorithm to
> "normalize" the multi-character codes (if this is possible at all) needs
> a table of several GBytes and CPU time of several Minutes, this is not a
> viable option, may it be done in RTL or in library code.

Neither that much space nor that much time is required.


Jonas


Re: [fpc-devel] FPC 2.3.1 seems a mixed mess with Unicode support

2009-09-17 Thread Michael Schnell
Jonas Maebe wrote:
> The way to implement stuff like that is to call the appropriate library
> functions. It makes no sense to completely re-implement everything in
> the RTL.

Of course I have been supposing that (if possible) the RTL would call a
library function. But it makes no difference: if the algorithm to
"normalize" the multi-character codes (if this is possible at all) needs
a table of several GBytes and CPU time of several minutes, this is not a
viable option, whether it is done in the RTL or in library code.

So writing software that deals decently with this (e.g. software that finds,
in a document file, a text the user types into an Edit field) seems
impossible.

-Michael


Re: [fpc-devel] FPC 2.3.1 seems a mixed mess with Unicode support

2009-09-16 Thread Luiz Americo Pereira Camara

Martin Schreiber wrote:
> On Tuesday 15 September 2009 18:04:33 Luiz Americo Pereira Camara wrote:
>> In my view, to get FPC's unicode support into a good state it would be
>> necessary to implement the encoding field in the string type, so that
>> converting strings can be done system-independently (this seems to be
>> the case of the cpstr branch), and to add an RTLString type to minimize
>> conversions when creating a unicode RTL.
>
> But as an additional string type, the fast and small FPC UnicodeString type we
> have now should be preserved.

Yes. RTLString would be just an alias to UnicodeString in win32 and 
UTF8String in unixes


Luiz
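A minimal sketch of the alias described above, assuming a compile-time switch on the target platform. RTLString here is the hypothetical type proposed in this thread, declared locally in the program; it is not meant to represent an existing FPC declaration.

```pascal
program RTLStringSketch;
{$mode objfpc}{$H+}

type
  // Hypothetical RTLString: an alias for whatever string type matches the
  // platform's system calls, so no conversion is needed at the OS boundary.
{$ifdef MSWINDOWS}
  RTLString = UnicodeString;  // UTF-16, as used by the Win32 "W" API functions
{$else}
  RTLString = UTF8String;     // byte strings, UTF-8 by common Unix convention
{$endif}

var
  s: RTLString;
begin
  s := 'hello';
  // Code unit size: 2 bytes with UnicodeString, 1 byte with UTF8String.
  WriteLn('Code unit size: ', SizeOf(s[1]));
end.
```

Code that only passes an RTLString through to the OS never pays for a conversion; code that mixes it with a fixed-encoding type still does.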



Re: [fpc-devel] FPC 2.3.1 seems a mixed mess with Unicode support

2009-09-16 Thread Graeme Geldenhuys
Jonas Maebe wrote:
> 
> The way to implement stuff like that is to call the appropriate  
> library functions. It makes no sense to completely re-implement  
> everything in the RTL.
> 
> Such API-calls can of course be wrapped by the RTL, similar to the way  
> there are already function such as sysutils.ansicomparestr()/ 

Yes, I would imagine we would have something similar to that. It seems
the most logical approach and, as you mentioned, it keeps backward
compatibility and speed for the non-unicode RTL functions.

 e.g.: sysutils.unicomparetext()

Internally, this function could normalize the text first and then do
some comparison, etc...
some comparison, etc...


Regards,
  - Graeme -
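A rough sketch of the normalize-then-compare idea above. UniCompareNormalized, ToyNormalize, TryCompose and FromCodeUnits are hypothetical names, and the three-entry composition table is an assumption made for illustration only: it is nowhere near real NFC, which needs the full Unicode composition data or an OS/library call as Jonas suggests. The function also does no case folding.

```pascal
program UniCompareSketch;
{$mode objfpc}{$H+}

// Builds a UnicodeString from UTF-16 code unit values, so the demo does not
// depend on the source file's encoding.
function FromCodeUnits(const Units: array of Word): UnicodeString;
var
  i: Integer;
begin
  SetLength(Result, Length(Units));
  for i := 0 to High(Units) do
    Result[i + 1] := WideChar(Units[i]);
end;

// Toy composition table with three entries (hypothetical, not real NFC data).
function TryCompose(Base, Mark: WideChar; out Composed: WideChar): Boolean;
begin
  Result := True;
  if (Base = WideChar($0061)) and (Mark = WideChar($0308)) then
    Composed := WideChar($00E4)        // a + diaeresis -> ä
  else if (Base = WideChar($006F)) and (Mark = WideChar($0308)) then
    Composed := WideChar($00F6)        // o + diaeresis -> ö
  else if (Base = WideChar($0041)) and (Mark = WideChar($030A)) then
    Composed := WideChar($00C5)        // A + ring above -> Å
  else
    Result := False;
end;

// Rough stand-in for NFC: map U+212B ANGSTROM SIGN to U+00C5 and compose the
// pairs known to TryCompose. Not a conforming normalizer.
function ToyNormalize(const S: UnicodeString): UnicodeString;
var
  i: Integer;
  c, composed: WideChar;
begin
  Result := '';
  i := 1;
  while i <= Length(S) do
  begin
    c := S[i];
    if c = WideChar($212B) then
      c := WideChar($00C5);
    if (i < Length(S)) and TryCompose(c, S[i + 1], composed) then
    begin
      c := composed;
      Inc(i);                          // the combining mark has been consumed
    end;
    Result := Result + c;
    Inc(i);
  end;
end;

// Normalize both sides, then compare code units (-1, 0 or 1).
function UniCompareNormalized(const S1, S2: UnicodeString): Integer;
var
  n1, n2: UnicodeString;
begin
  n1 := ToyNormalize(S1);
  n2 := ToyNormalize(S2);
  if n1 < n2 then
    Result := -1
  else if n1 > n2 then
    Result := 1
  else
    Result := 0;
end;

var
  precomposed, decomposed: UnicodeString;
begin
  precomposed := FromCodeUnits([$00F6]);          // ö
  decomposed := FromCodeUnits([$006F, $0308]);    // o + combining diaeresis
  WriteLn('Plain "=":            ', precomposed = decomposed);
  WriteLn('UniCompareNormalized: ', UniCompareNormalized(precomposed, decomposed));
end.
```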



Re: [fpc-devel] FPC 2.3.1 seems a mixed mess with Unicode support

2009-09-16 Thread Jonas Maebe


On 16 Sep 2009, at 16:34, Michael Schnell wrote:

> Does the FPC rtl compare two unicode-strings a and b as equal with
> "if a=b ..."
> even when both print as "ä" and one is coded as a single character and
> the other is coded as an a and a " double dot superscript" ?

No, it just compares the literal bytes (and it will probably keep
doing so forever, both for Delphi and for backwards compatibility).

> How should it do that ? it would need a table of the codings of all
> possible multi-unicode-character encodings ?

The way to implement stuff like that is to call the appropriate
library functions. It makes no sense to completely re-implement
everything in the RTL.

Such API calls can of course be wrapped by the RTL, similar to the way
there are already functions such as sysutils.ansicomparestr()/
sysutils.ansicomparetext()/... etc. These also simply call through to
OS-supplied functionality.



Jonas
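The answer above can be checked directly: with UnicodeString, "=" compares code units, so a precomposed ä (U+00E4) and a + combining diaeresis (U+0061 U+0308) are not equal even though they render the same. A small sketch; the Precomposed and Decomposed helpers are illustrative and build the strings from code unit values so the demo does not depend on the source file encoding.

```pascal
program LiteralCompareSketch;
{$mode objfpc}{$H+}

// "ä" as a single precomposed code point.
function Precomposed: UnicodeString;
begin
  SetLength(Result, 1);
  Result[1] := WideChar($00E4);  // LATIN SMALL LETTER A WITH DIAERESIS
end;

// "ä" as base letter plus combining mark (canonically equivalent).
function Decomposed: UnicodeString;
begin
  SetLength(Result, 2);
  Result[1] := WideChar($0061);  // LATIN SMALL LETTER A
  Result[2] := WideChar($0308);  // COMBINING DIAERESIS
end;

var
  a, b: UnicodeString;
begin
  a := Precomposed;
  b := Decomposed;
  // '=' compares code unit by code unit, so this is False although both
  // strings render as the same character.
  WriteLn('a = b:     ', a = b);
  WriteLn('Length(a): ', Length(a));
  WriteLn('Length(b): ', Length(b));
end.
```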


Re: [fpc-devel] FPC 2.3.1 seems a mixed mess with Unicode support

2009-09-16 Thread Michael Schnell
Jonas Maebe wrote:
> 
> On 16 Sep 2009, at 11:44, Michael Schnell wrote:
> 
> Don't analyse them character by character, but use standard functions to
> compare them. 

Does the FPC rtl compare two unicode-strings a and b as equal with
 "if a=b ..."
even when both print as "ä" and one is coded as a single character and
the other is coded as an a and a " double dot superscript" ?

How should it do that ? it would need a table of the codings of all
possible multi-unicode-character encodings ?

-Michael



Re: [fpc-devel] FPC 2.3.1 seems a mixed mess with Unicode support

2009-09-16 Thread Marco van de Voort
In our previous episode, Thaddy said:
> > It is. Widestring always worked more or less, on both FPC,Kylix and Delphi.
> > But the COM backed versions (FPC2.2+ (?) and Delphi) suffered from
> > performance problems
> As I wrote it should be opaque ( = transparent, btw).
> At least for windows I overcame most of the problems that widestring is 
> using COM by writing a simple COM Delphi memorymanager replacement years 
> ago in pre-userspace. Thus all reference counting (at least at block 
> level, but also for delphi strings) is managed by COM. It is still 
> available at my website. And yes, it is rather slow. But string 
> manipulation is slow anyway.

The "slow" I mentioned was relative to the non-COM Kylix widestring handling.



Re: [fpc-devel] FPC 2.3.1 seems a mixed mess with Unicode support

2009-09-16 Thread Thaddy

Marco van de Voort wrote:
>> This should be transparent for the non-library user code
>
> It is. Widestring always worked more or less, on both FPC, Kylix and Delphi.
> But the COM backed versions (FPC2.2+ (?) and Delphi) suffered from
> performance problems

As I wrote it should be opaque ( = transparent, btw).
At least for Windows I overcame most of the problems caused by widestring
using COM, by writing a simple COM Delphi memorymanager replacement years
ago in pre-userspace. Thus all reference counting (at least at block
level, but also for delphi strings) is managed by COM. It is still
available at my website. And yes, it is rather slow. But string
manipulation is slow anyway.
I had to do this to make COM production code written in Delphi work more
reliably. The solution had good reviews from Borland techies at the time.




Re: [fpc-devel] FPC 2.3.1 seems a mixed mess with Unicode support

2009-09-16 Thread Jonas Maebe


On 16 Sep 2009, at 11:44, Michael Schnell wrote:

> Jonas Maebe wrote:
>> Analysing strings by hand is not a very smart thing to do with unicode
>> strings.
>
> How should it be avoided if I want to react on a user input or on a
> string read from a file ?

Don't analyse them character by character, but use standard functions
to compare them. Any unicode support library worth its salt will offer
you many different ways to compare strings, because depending on the
context you may need different ways:

a) the locale may matter (e.g., depending on whether "." means
"decimal point" or "thousands separator", a comparison result may be
different)
b) you have many different ways to order (unicode) strings. E.g.,
these are the options that Apple's CFString comparison offers:  (note
that not all of those flags are about regular comparisons, and some of
them are just for performance reasons). See in particular flags such
as kCFCompareNonliteral, kCFCompareWidthInsensitive and
kCFCompareLocalized.

This indeed causes problems with Pascal's generic comparison
operators. I guess we will either have to define a particular
behaviour for them (presumably whatever CodeGear chose), add some
global variable that you can set to influence the behaviour, or tell
people to use CompareText() and friends (and probably add variants
with various options).

The upside of these complications (which have always existed, but most
people just ignored them and their programs only worked with one or
two locales and/or encodings) is that if you deal with them properly in
the context of unicode, then your code will probably automatically
behave "correctly" with many locales/scripts.



Jonas
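The existing SysUtils routines already show how each comparison function bakes in a different choice. A short sketch using CompareStr, CompareText and SameText from FPC's SysUtils; CompareText and SameText fold case only for ASCII letters, which is exactly the kind of limitation a unicode-aware variant with options would have to address. The comment on AnsiCompareText reflects the delegation described earlier in the thread; the exact results of that call depend on the platform, so it is not exercised here.

```pascal
program CompareFunctionsSketch;
{$mode objfpc}{$H+}

uses
  SysUtils;

begin
  // CompareStr: case-sensitive, ordinal comparison of the characters.
  WriteLn('CompareStr(apple, Apple) = 0:  ', CompareStr('apple', 'Apple') = 0);
  // CompareText: case-insensitive, but only for the ASCII letters A..Z.
  WriteLn('CompareText(apple, Apple) = 0: ', CompareText('apple', 'Apple') = 0);
  // SameText: Boolean convenience wrapper with the same ASCII-only rules.
  WriteLn('SameText(apple, APPLE):        ', SameText('apple', 'APPLE'));
  // AnsiCompareStr/AnsiCompareText instead call through the installed string
  // manager (on Unix typically provided by the cwstring unit), so their result
  // can depend on the OS and locale.
end.
```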


Re: [fpc-devel] FPC 2.3.1 seems a mixed mess with Unicode support

2009-09-16 Thread Michael Schnell
Jonas Maebe wrote:
> 
> Analysing strings by hand is not a very smart thing to do with unicode
> strings. 

How should it be avoided if I want to react on a user input or on a
string read from a file ?

That is what a great many user programs do.

-Michael


Re: [fpc-devel] FPC 2.3.1 seems a mixed mess with Unicode support

2009-09-16 Thread Jonas Maebe


On 16 Sep 2009, at 11:30, Michael Schnell wrote:

> Jonas Maebe wrote:
>> There are definitions of "canonical forms" (both "composed" and
>> "decomposed") of utf strings ...
>
> So unless the rtl automatically offers this, the user is required to
> take care of this by hand any time he tries to analyze a string in
> any way.

Analysing strings by hand is not a very smart thing to do with unicode
strings. And if you want to do that, you most certainly should convert
them to a canonical form first.

> Code from hell

That's what assembler programmers said when compilers were introduced
("why can't I control anymore exactly how my data is laid out so I can
save 2 instructions in my function, because I know that variable A
comes right after variable B in memory").



Jonas


Re: [fpc-devel] FPC 2.3.1 seems a mixed mess with Unicode support

2009-09-16 Thread Michael Schnell
Jonas Maebe wrote:
> 
> There are definitions of "canonical forms" (both "composed" and
> "decomposed") of utf strings ...

So unless the rtl automatically offers this, the user is required to
take care of this by hand any time he tries to analyze a string in any way.

Code from hell 

-Michael


Re: [fpc-devel] FPC 2.3.1 seems a mixed mess with Unicode support

2009-09-16 Thread Martin Schreiber
On Tuesday 15 September 2009 15:31:36 Thaddy wrote:

>
> afaik widestrings are reference counted in Delphi. PWideChars not.

According to my experience, the Delphi7/Kylix3 documentation and this article:
http://edn.embarcadero.com/article/21301
"
WideStrings are now reference counted. In Windows, the Delphi WideString is 
implemented as an Ole BSTR to maximize data compatibility with OLE and 
ActiveX APIs. Ole BSTRs / WideStrings are not reference counted like Delphi 
AnsiStrings, so WideStrings tend to be a bit promiscuous in copying 
themselves all over the place. 
In Linux, there is no WideString compatibility requirement or issue, so we've 
reimplemented WideStrings to use the same copy-on-write reference count 
semantics as AnsiStrings. In fact, Kylix WideStrings use many of the same 
internal RTL support functions as AnsiStrings! How's that for code reuse!
"
you are wrong.

Martin


Re: [fpc-devel] FPC 2.3.1 seems a mixed mess with Unicode support

2009-09-16 Thread Marco van de Voort
In our previous episode, Michael Schnell said:
> Marco van de Voort wrote:
> > 
> > Yes, but not by Delphi but by COM.
> 
> This should be transparent for the non-library user code

It is. Widestring has always worked more or less, on FPC, Kylix and Delphi alike.
But the COM-backed versions (FPC 2.2+ (?) and Delphi) suffered from
performance problems.


Re: [fpc-devel] FPC 2.3.1 seems a mixed mess with Unicode support

2009-09-16 Thread Jonas Maebe


On 16 Sep 2009, at 09:24, Jonas Maebe wrote:

> so you cannot have to files that have the name "ä"

*two files


Jonas


Re: [fpc-devel] FPC 2.3.1 seems a mixed mess with Unicode support

2009-09-16 Thread Jonas Maebe


On 16 Sep 2009, at 09:00, Michael Schnell wrote:

> But if this conversion is possible (even if not in all cases)
> theoretically but not practically, this means that there is _no_ way to
> determine if Unicode strings are identical.


There are definitions of "canonical forms" (both "composed" and  
"decomposed") of utf strings that do enable this sort of stuff. E.g.,  
Apple uses this for their HFS+ file system so you cannot have to files  
that have the name "ä" in the same directory (since there are multiple  
ways in unicode to represent this character).



Jonas


Re: [fpc-devel] FPC 2.3.1 seems a mixed mess with Unicode support

2009-09-16 Thread Michael Schnell
Marco van de Voort wrote:
> 
> Yes, but not by Delphi but by COM.

This should be transparent for the non-library user code

-Michael


Re: [fpc-devel] FPC 2.3.1 seems a mixed mess with Unicode support

2009-09-16 Thread Michael Schnell
Marco van de Voort wrote:
> In our previous episode, Michael Schnell said:
>> If we really want a "character", MyChar would need to be a 32-Bit thing,
>> and (in case of UTF, the [n] notation would need to scan the Unicode
>> byte stream to find it, but I don't know if it's implemented in that way.)
> 
> Afaik a character in the unicode sense can consist out of multiple
> codepoints. (e.g. for languages that have many possibilities of combining
> "accents" where there doesn't exist a glyph for every combination)
> 
> So a character (as something that prints a whole) can consist out of
> multiple 32-bit values (codepoints)

Even Worse !!!

So  "Unicode Character" does not make sense at all.

I suppose converting a combined character into a single character is not
possible as it would need a huge table.

But if this conversion is possible (even if not in all cases)
theoretically but not practically, this means that there is _no_ way to
determine if Unicode strings are identical.

This makes programming a profoundly obscene adventure, and we had better
start breeding cattle instead.

Obviously combined Unicode characters are code from hell and should be
banned completely :( .

-Michael



Re: [fpc-devel] FPC 2.3.1 seems a mixed mess with Unicode support

2009-09-15 Thread Martin Schreiber
On Tuesday 15 September 2009 18:04:33 Luiz Americo Pereira Camara wrote:

>
> In my view, to get the fpc unicode support in a good state would be
> necessary to implement the encoding field in the string type so
> converting strings can be done system independently (seems to be the
> case of cpstr branch) and add a RTLString type to minimize conversions
> when creating a unicode RTL
>
But as an additional string type, the fast and small FPC UnicodeString type we 
have now should be preserved.

Martin


Re: [fpc-devel] FPC 2.3.1 seems a mixed mess with Unicode support

2009-09-15 Thread Luiz Americo Pereira Camara

Graeme Geldenhuys wrote:
> Martin Schreiber wrote:
>> I think there is a misunderstanding. FPC UnicodeString type is identical to
>> widestring on all platforms except Windows. On Windows widestring is actual a
>> not reference counted OLE-string.
>
> The problem as, that it's very different to Delphi. Bottom line: not
> Delphi compatible and very misleading.

In my view, to get FPC's Unicode support into a good state, it would be
necessary to implement the encoding field in the string type, so that
strings can be converted system-independently (which seems to be the
case in the cpstr branch), and to add an RTLString type to minimize
conversions when creating a Unicode RTL.

> To quote Jonas Maebe:
> "The problem is that Delphi 2009's "unicodestring" type is something
> completely different. It is what you (Florian) are developing in the
> cpstr branch. Our current unicodestring is an utf16string or something
> like that."


Luiz


Re: [fpc-devel] FPC 2.3.1 seems a mixed mess with Unicode support

2009-09-15 Thread Felipe Monteiro de Carvalho
On Tue, Sep 15, 2009 at 8:40 AM, Marco van de Voort  wrote:
> So D2009 is revolution, not evolution, and NOT backwards compatible. The
> question if FPC should do the same. And if so, when.

IMHO never in the default modes, only in a new delphi2009 or similar
mode, if/when someone needs it. I use Delphi mode a lot to keep the
same code compilable with Delphi 7, which gives smaller executables on
Windows, and I wouldn't be happy if the standard Delphi mode changed to
Delphi 2009's incompatible behaviour.

We can't change our codebases every time Embarcadero decides to break
compatibility, and in my view they are now too far behind technologically
for anyone to claim that we need to follow them. Ideally there should be a
standard to guide Pascal language development, so that this kind of thing
doesn't happen (evolution along different paths).

-- 
Felipe Monteiro de Carvalho


Re: [fpc-devel] FPC 2.3.1 seems a mixed mess with Unicode support

2009-09-15 Thread Thaddy

Marco van de Voort wrote:
> In our previous episode, Thaddy said:
>> afaik widestrings are reference counted in Delphi. PWideChars not.
>
> Yes, but not by Delphi but by COM.

Correct. Which means it's opaque on Windows platforms, but not on
others. That's also the reason Kylix behaves differently at the
deep end. I suggest solving it at the platform level and trying to keep it
opaque.




Re: [fpc-devel] FPC 2.3.1 seems a mixed mess with Unicode support

2009-09-15 Thread Graeme Geldenhuys
Jonas Maebe wrote:
> 
> It affects how the compiler interprets characters > #127 inside  
> constant strings appearing in your source code. This can affect both  

Thanks Jonas. I'll make a mental note of that.


Regards,
  - Graeme -

-- 
fpGUI Toolkit - a cross-platform GUI toolkit using Free Pascal
http://opensoft.homeip.net/fpgui/



Re: [fpc-devel] FPC 2.3.1 seems a mixed mess with Unicode support

2009-09-15 Thread Marco van de Voort
In our previous episode, Thaddy said:
> >   
> afaik widestrings are reference counted in Delphi. PWideChars not.

Yes, but not by Delphi but by COM.


Re: [fpc-devel] FPC 2.3.1 seems a mixed mess with Unicode support

2009-09-15 Thread Thaddy

Martin Schreiber wrote:
> On Tuesday 15 September 2009 15:21:54 Martin Schreiber wrote:
>> On Tuesday 15 September 2009 15:07:48 Jonas Maebe wrote:
>>> On 15 Sep 2009, at 14:54, Michael Schnell wrote:
>>>> Martin Schreiber wrote:
>>>>> On Windows widestring is actual a
>>>>> not reference counted OLE-string.
>>>>
>>>> How can decent (and System independent) coding be done with not
>>>> reference counting (variable length) strings ?
>>>
>>> Ask Microsoft and Borland. Microsoft defined their OLE-string like
>>> that, and Borland defined Ansistring on Windows that way.
>>
>> Delphi widestrings on Windows, Kylix widestrings on Linux and Delphi/Kylix
>> ansistrings are reference counted.
>
> Hmm, this is misleading, again:
>
> Delphi widestrings on Windows are not reference counted, Kylix widestrings on
> Linux and Delphi/Kylix ansistrings are reference counted.
>
> Martin

afaik widestrings are reference counted in Delphi. PWideChars not.


Re: [fpc-devel] FPC 2.3.1 seems a mixed mess with Unicode support

2009-09-15 Thread Martin Schreiber
On Tuesday 15 September 2009 15:21:54 Martin Schreiber wrote:
> On Tuesday 15 September 2009 15:07:48 Jonas Maebe wrote:
> > On 15 Sep 2009, at 14:54, Michael Schnell wrote:
> > > Martin Schreiber wrote:
> > >> On Windows widestring is actual a
> > >> not reference counted OLE-string.
> > >
> > > How can decent (and System independent) coding be done with not
> > > reference counting (variable length) strings ?
> >
> > Ask Microsoft and Borland. Microsoft defined their OLE-string like
> > that, and Borland defined Ansistring on Windows that way.
>
> Delphi widestrings on Windows, Kylix widestrings on Linux and Delphi/Kylix
> ansistrings are reference counted.
>
Hmm, this is misleading, again:

Delphi widestrings on Windows are not reference counted, Kylix widestrings on 
Linux and Delphi/Kylix ansistrings are reference counted.

Martin


Re: [fpc-devel] FPC 2.3.1 seems a mixed mess with Unicode support

2009-09-15 Thread Martin Schreiber
On Tuesday 15 September 2009 15:07:48 Jonas Maebe wrote:
> On 15 Sep 2009, at 14:54, Michael Schnell wrote:
> > Martin Schreiber wrote:
> >> On Windows widestring is actual a
> >> not reference counted OLE-string.
> >
> > How can decent (and System independent) coding be done with not
> > reference counting (variable length) strings ?
>
> Ask Microsoft and Borland. Microsoft defined their OLE-string like
> that, and Borland defined Ansistring on Windows that way.
>
Delphi widestrings on Windows, Kylix widestrings on Linux and Delphi/Kylix 
ansistrings are reference counted.

Martin


Re: [fpc-devel] FPC 2.3.1 seems a mixed mess with Unicode support

2009-09-15 Thread Jonas Maebe


On 15 Sep 2009, at 14:54, Michael Schnell wrote:

> Martin Schreiber wrote:
>> On Windows widestring is actual a
>> not reference counted OLE-string.
>
> How can decent (and System independent) coding be done with not
> reference counting (variable length) strings ?


Ask Microsoft and Borland. Microsoft defined their OLE-string like  
that, and Borland defined Ansistring on Windows that way.



Jonas


Re: [fpc-devel] FPC 2.3.1 seems a mixed mess with Unicode support

2009-09-15 Thread Jonas Maebe


On 15 Sep 2009, at 14:58, Graeme Geldenhuys wrote:

> Does the $codepage only affect the WideString Manager?  It doesn't
> affect AnsiString (String type) at all?

It affects how the compiler interprets characters > #127 inside
constant strings appearing in your source code. This can affect both
ansistrings and widestrings in your program, depending on whether or
not you assign constant strings containing such characters to
ansistring or widestring variables, respectively (since depending on the
codepage, those characters will mean something different and hence your
strings will get different content).



Jonas


Re: [fpc-devel] FPC 2.3.1 seems a mixed mess with Unicode support

2009-09-15 Thread Jonas Maebe


On 15 Sep 2009, at 14:47, Martin Schreiber wrote:

> On Tuesday 15 September 2009 14:34:52 Jonas Maebe wrote:
>> I don't think that the default source code encoding has ever been
>> changed. And the way to specify it is also quite old already.
>
> Wasn't a string constant containing a character > #127 treated as
> widestring earlier?

I don't think so. There was the following behaviour, but that was a  
bug: . That change in itself was due to , but that had nothing to do with Lazarus.


That change dates from August 2007, and was due to . Another thing that was changed at the same time, was that
a) *if* a string was parsed as widestring (e.g., if you have  
{$codepage utf-8} and the string has characters > #127),

/and/
b) you directly assigned this constant widestring to an ansistring in  
the program


then the compiler would no longer "convert" this constant widestring
to an "ansistring" at compile time, but would instead insert a call to the
widestring manager to convert it at run time. The reason was not
Lazarus, but simply that the compiler has no clue what the actual
code page of the ansistring will be at run time in the first place.


Before that, I believe it just replaced all characters > #127 with '?' at
compile time.



Jonas


Re: [fpc-devel] FPC 2.3.1 seems a mixed mess with Unicode support

2009-09-15 Thread Graeme Geldenhuys
Florian Klaempfl wrote:
> 
> Because the source might not be written by you and because we want
> consistent behaviour.

OK, thanks.

Does the $codepage only affect the WideString Manager?  It doesn't
affect AnsiString (String type) at all?



Regards,
  - Graeme -



Re: [fpc-devel] FPC 2.3.1 seems a mixed mess with Unicode support

2009-09-15 Thread Marco van de Voort
In our previous episode, Michael Schnell said:
> If we really want a "character", MyChar would need to be a 32-Bit thing,
> and (in case of UTF, the [n] notation would need to scan the Unicode
> byte stream to find it, but I don't know if it's implemented in that way.)

Afaik a character in the Unicode sense can consist of multiple
code points (e.g. for languages that have many possibilities of combining
"accents", where a glyph doesn't exist for every combination).

So a character (as something that prints as a whole) can consist of
multiple 32-bit values (code points).


Re: [fpc-devel] FPC 2.3.1 seems a mixed mess with Unicode support

2009-09-15 Thread Michael Schnell
Martin Schreiber wrote:
> On Windows widestring is actual a 
> not reference counted OLE-string.

How can decent (and system-independent) coding be done with
non-reference-counted (variable-length) strings?

-Michael


Re: [fpc-devel] FPC 2.3.1 seems a mixed mess with Unicode support

2009-09-15 Thread Michael Schnell
Jonas Maebe wrote:
> Our current unicodestring is an utf16string or something like that.
> 

I don't claim that this is bad, but it can't be "Delphi compatible" at all.

-Michael


Re: [fpc-devel] FPC 2.3.1 seems a mixed mess with Unicode support

2009-09-15 Thread Martin Schreiber
On Tuesday 15 September 2009 14:34:52 Jonas Maebe wrote:
>
> I don't think that the default source code encoding has ever been
> changed. And the way to specify it is also quite old already.
>
Wasn't a string constant containing a character > #127 treated as widestring 
earlier?

Martin


Re: [fpc-devel] FPC 2.3.1 seems a mixed mess with Unicode support

2009-09-15 Thread Michael Schnell
Micha Nelissen wrote:
> Graeme Geldenhuys wrote:
>>   MyString := '世界您好';
>>   MyChar := MyString[1];
>>   writeln(MyChar);
>> end.
> 
> Extracting a Char from a UnicodeString? What's that supposed to do?

If "Char" is an 8-bit coded thing, this does not make sense to me
(using the [n] notation to take the n'th byte of UTF-8, UTF-16 or
WideString data does not seem very sensible). I feel the compiler
should issue a warning on that code.

If we really want a "character", MyChar would need to be a 32-bit thing,
and (in the case of UTF) the [n] notation would need to scan the Unicode
byte stream to find it, but I don't know whether it's implemented that way.
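The idea of a 32-bit "character" found by scanning can be sketched as below. `CodePointAt` is a hypothetical helper, not an RTL function, and the sketch assumes UnicodeString holds UTF-16: it walks the string from the start and merges a high surrogate followed by a low surrogate into one code point, so finding the n-th code point costs O(n) instead of the O(1) of plain indexing. Even a 32-bit code point is not necessarily a whole user-perceived character, since combining marks may follow it.

```pascal
program NthCodePoint;

{$mode objfpc}{$H+}

{ Hypothetical helper: returns the N-th (1-based) code point of a UTF-16
  string by scanning from the start, or -1 if N is out of range.
  A high surrogate ($D800..$DBFF) followed by a low surrogate
  ($DC00..$DFFF) is combined into one code point above U+FFFF. }
function CodePointAt(const S: UnicodeString; N: Integer): LongInt;
var
  I, Count: Integer;
  HighUnit, LowUnit: LongInt;
begin
  Result := -1;
  Count := 0;
  I := 1;
  while I <= Length(S) do
  begin
    HighUnit := Ord(S[I]);
    Inc(Count);
    if (HighUnit >= $D800) and (HighUnit <= $DBFF) and (I < Length(S))
       and (Ord(S[I + 1]) >= $DC00) and (Ord(S[I + 1]) <= $DFFF) then
    begin
      LowUnit := Ord(S[I + 1]);
      if Count = N then
        Exit($10000 + ((HighUnit - $D800) shl 10) + (LowUnit - $DC00));
      Inc(I, 2);  { skip both halves of the surrogate pair }
    end
    else
    begin
      if Count = N then
        Exit(HighUnit);  { a single code unit is the whole code point }
      Inc(I);
    end;
  end;
end;

var
  S: UnicodeString;
begin
  { 'a', then U+1D11E MUSICAL SYMBOL G CLEF stored as the pair $D834 $DD1E }
  SetLength(S, 3);
  S[1] := WideChar($0061);
  S[2] := WideChar($D834);
  S[3] := WideChar($DD1E);
  WriteLn(HexStr(CodePointAt(S, 1), 5));  { 00061 }
  WriteLn(HexStr(CodePointAt(S, 2), 5));  { 1D11E }
  WriteLn(CodePointAt(S, 3));             { -1: there are only two code points }
end.
```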


-Michael


Re: [fpc-devel] FPC 2.3.1 seems a mixed mess with Unicode support

2009-09-15 Thread Jonas Maebe


On 15 Sep 2009, at 14:23, Martin Schreiber wrote:

> On Tuesday 15 September 2009 14:19:51 Graeme Geldenhuys wrote:
>> Florian Klaempfl wrote:
>>> Graeme said he did tell the compiler that he uses utf-8.
>>
>> My mistake, I assumed that because my Linux system defaults to UTF-8 it
>> will use utf-8. I guess I was wrong.
>
> There was a change because of Lazarus utf-8 system AFAIK.


I don't think that the default source code encoding has ever been  
changed. And the way to specify it is also quite old already.



Jonas


Re: [fpc-devel] FPC 2.3.1 seems a mixed mess with Unicode support

2009-09-15 Thread Florian Klaempfl
Graeme Geldenhuys wrote:
> Jonas Maebe wrote:
>> That's because you did not specify the code page of the source code,  
>> in which case it's parsed as CP 8859-1.
> 
> Even though my Linux system defaults to UTF-8? Umm
> 
> 
>> Add {$codepage utf-8} or save  
> 
> Adding that with test3.pas and then it works.
> 
> 
> $ ./test4
> 世
> 
> 
> 
> Why doesn't the widestring manager default to the system defaults of my
> platform: UTF-8?  

Because the source might not be written by you and because we want
consistent behaviour. People wouldn't be happy if depending on the
person (or host) compiling the source, the output would be different.


Re: [fpc-devel] FPC 2.3.1 seems a mixed mess with Unicode support

2009-09-15 Thread Martin Schreiber
On Tuesday 15 September 2009 14:19:51 Graeme Geldenhuys wrote:
> Florian Klaempfl wrote:
> > Graeme said he did tell the compiler that he uses utf-8.
>
> My mistake, I assumed that because my Linux system defaults to UTF-8 it
> will use utf-8. I guess I was wrong.
>
There was a change because of Lazarus utf-8 system AFAIK.

Martin


Re: [fpc-devel] FPC 2.3.1 seems a mixed mess with Unicode support

2009-09-15 Thread Jonas Maebe


On 15 Sep 2009, at 14:18, Graeme Geldenhuys wrote:

> Why doesn't the widestring manager default to the system defaults of my
> platform: UTF-8?


It does default to the system code page. It's the compiler that  
doesn't while parsing your source file.



Jonas


Re: [fpc-devel] FPC 2.3.1 seems a mixed mess with Unicode support

2009-09-15 Thread Jonas Maebe


On 15 Sep 2009, at 14:15, Florian Klaempfl wrote:

> Jonas Maebe wrote:
>>> ä  is not 世 as the website described the result to be.
>>
>> That's because you did not specify the code page of the source code, in
>> which case it's parsed as CP 8859-1. Add {$codepage utf-8} or save the
>> file with an UTF-8 BOM and it will work fine.
>
> Graeme said he did tell the compiler that he uses utf-8.



Then he probably made a mistake, because it works for me:

$ cat tt.pp
{$codepage utf-8}

program test3;

{$mode objfpc}{$H+}
uses
 cwstring, Classes, SysUtils;

var
 MyChar: UnicodeChar;
 MyString: UnicodeString;
begin
 MyString := '世界您好';
 MyChar := MyString[1];
 writeln(MyChar);
end.
$ ppn34 tt
Target OS: Linux for i386
Compiling tt.pp
Linking tt
16 lines compiled, 2.4 sec
$ ./tt
世
$ echo $LANG
en_US.UTF-8
$ uname -a
Linux traxx 2.6.18-92.1.22.el5PAE #1 SMP Tue Dec 16 07:10:07 EST 2008  
i686 i686 i386 GNU/Linux


If I remove the "{$codepage utf-8}", then I also get ä as output.
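The arithmetic behind the ä can be reproduced with a short sketch (not from the thread; `EncodeUtf8ThreeBytes` is a hypothetical helper that only covers code points U+0800..U+FFFF). 世 is U+4E16, which UTF-8 stores as the bytes E4 B8 96. Without {$codepage utf-8}, the compiler reads the source as ISO 8859-1, where each byte value $xx maps to code point U+00xx, so the first character of the constant becomes U+00E4, which is ä.

```pascal
program WhyUmlaut;

{$mode objfpc}{$H+}

type
  TUtf8Triple = array[0..2] of Byte;

{ Hypothetical helper: UTF-8 encoding of a code point in U+0800..U+FFFF,
  using the bit pattern 1110xxxx 10xxxxxx 10xxxxxx. Surrogates and other
  ranges are deliberately not handled. }
function EncodeUtf8ThreeBytes(CodePoint: LongWord): TUtf8Triple;
begin
  Result[0] := Byte($E0 or (CodePoint shr 12));
  Result[1] := Byte($80 or ((CodePoint shr 6) and $3F));
  Result[2] := Byte($80 or (CodePoint and $3F));
end;

var
  Bytes: TUtf8Triple;
begin
  Bytes := EncodeUtf8ThreeBytes($4E16);  { U+4E16 is the first ideograph of the test string }
  WriteLn(HexStr(Bytes[0], 2), ' ', HexStr(Bytes[1], 2), ' ', HexStr(Bytes[2], 2));  { E4 B8 96 }
  { ISO 8859-1 maps byte $xx to U+00xx, so the first byte alone becomes U+00E4 }
  WriteLn('U+', HexStr(Ord(WideChar(Bytes[0])), 4));  { U+00E4 }
end.
```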


Jonas


Re: [fpc-devel] FPC 2.3.1 seems a mixed mess with Unicode support

2009-09-15 Thread Graeme Geldenhuys
Florian Klaempfl wrote:
> 
> Graeme said he did tell the compiler that he uses utf-8.

My mistake, I assumed that because my Linux system defaults to UTF-8 it
will use utf-8. I guess I was wrong.


Regards,
  - Graeme -



Re: [fpc-devel] FPC 2.3.1 seems a mixed mess with Unicode support

2009-09-15 Thread Graeme Geldenhuys
Jonas Maebe wrote:
> 
> That's because you did not specify the code page of the source code,  
> in which case it's parsed as CP 8859-1.

Even though my Linux system defaults to UTF-8? Umm


> Add {$codepage utf-8} or save  

Adding that to test3.pas makes it work.


$ ./test4
世



Why doesn't the widestring manager default to the system defaults of my
platform: UTF-8?  Sorry if this sounds dumb, but I never use WideString.
I use String with UTF-8 encoded content - similar to Lazarus IDE.


Regards,
  - Graeme -



Re: [fpc-devel] FPC 2.3.1 seems a mixed mess with Unicode support

2009-09-15 Thread Florian Klaempfl
Jonas Maebe wrote:
> 
> On 15 Sep 2009, at 13:53, Graeme Geldenhuys wrote:
> 
>> Florian Klaempfl wrote:
>>>
>>> Do you use the cwstrings unit? Did you tell the encoding (UTF-8?) to the
>>> compiler? Did you use the UnicodeChar instead of Char?
>>
>>
>> Yes to all, and the example still doesn't work.
   ^^

>>
>> 
>> $ ./test3
>> ä
>> 
>>
>> ä  is not 世 as the website described the result to be.
> 
> That's because you did not specify the code page of the source code, in
> which case it's parsed as CP 8859-1. Add {$codepage utf-8} or save the
> file with an UTF-8 BOM and it will work fine.

Graeme said he did tell the compiler that he uses utf-8.


Re: [fpc-devel] FPC 2.3.1 seems a mixed mess with Unicode support

2009-09-15 Thread Florian Klaempfl
Jonas Maebe wrote:
> 
> On 15 Sep 2009, at 13:44, Florian Klaempfl wrote:
> 
>> Graeme Geldenhuys wrote:
>>> Micha Nelissen wrote:
>>>> Extracting a Char from a UnicodeString? What's that supposed to do?
>>>
>>> Follow the URL I posted with that example.
>>>
>>> I don't claim to know everything regarding Unicode. Florian said FPC
>>> supports Unicode,
>>
>> I said it supports the Unicodestring type.
> 
> The problem is that Delphi 2009's "unicodestring" type is something
> completely different. It is what you are developing in the cpstr branch.
> Our current unicodestring is an utf16string or something like that.

No. In D2009, UnicodeString means UTF-16 encoding. It has the encoding
and code point size stored, but as soon as you change the encoding, it's
no longer a UnicodeString.


Re: [fpc-devel] FPC 2.3.1 seems a mixed mess with Unicode support

2009-09-15 Thread Marco van de Voort
In our previous episode, Graeme Geldenhuys said:
> > I think there is a misunderstanding. FPC UnicodeString type is identical to
> > widestring on all platforms except Windows. On Windows widestring is actual a
> > not reference counted OLE-string.
> 
> The problem as, that it's very different to Delphi. Bottom line: not
> Delphi compatible and very misleading.

I'm not entirely sure we'll ever be D2009 compatible in every way. It quite
radically breaks with the past.


Re: [fpc-devel] FPC 2.3.1 seems a mixed mess with Unicode support

2009-09-15 Thread Jonas Maebe


On 15 Sep 2009, at 13:53, Graeme Geldenhuys wrote:

> Florian Klaempfl wrote:
>> Do you use the cwstrings unit? Did you tell the encoding (UTF-8?) to the
>> compiler? Did you use the UnicodeChar instead of Char?
>
> Yes to all, and the example still doesn't work.
>
> $ ./test3
> ä
>
> ä  is not 世 as the website described the result to be.


That's because you did not specify the code page of the source code,  
in which case it's parsed as CP 8859-1. Add {$codepage utf-8} or save  
the file with an UTF-8 BOM and it will work fine.



Jonas


Re: [fpc-devel] FPC 2.3.1 seems a mixed mess with Unicode support

2009-09-15 Thread Graeme Geldenhuys
Martin Schreiber wrote:
> I think there is a misunderstanding. FPC UnicodeString type is identical to 
> widestring on all platforms except Windows. On Windows widestring is actual a 
> not reference counted OLE-string.

The problem is that it's very different from Delphi. Bottom line: not
Delphi compatible, and very misleading.

To quote Jonas Maebe:
"The problem is that Delphi 2009's "unicodestring" type is something
completely different. It is what you (Florian) are developing in the
cpstr branch. Our current unicodestring is an utf16string or something
like that."


Regards,
  - Graeme -



Re: [fpc-devel] FPC 2.3.1 seems a mixed mess with Unicode support

2009-09-15 Thread Martin Schreiber
On Tuesday 15 September 2009 12:56:08 Graeme Geldenhuys wrote:

>
> > Who says that? What is not supported?
>
> This I found myself:
> * Unicode+Variants... varUString type.
>
> By Google'ing for some D2009 unicode examples and searching the FPC
> source. Here I list a few that I found in under 5 minutes:
>
Graeme,
I think there is a misunderstanding. The FPC UnicodeString type is identical to
widestring on all platforms except Windows. On Windows, widestring is actually a
non-reference-counted OLE string.
With widestrings and the automatic conversion to the system 8-bit encoding
by the widestring manager, FPC has had everything needed for transparent,
platform-independent Unicode handling for years. Now that we also have the fast
reference-counted widestring on Windows, the last problem has been solved. ;-)

Martin


Re: [fpc-devel] FPC 2.3.1 seems a mixed mess with Unicode support

2009-09-15 Thread Graeme Geldenhuys
Jonas Maebe wrote:
> 
> The problem is that Delphi 2009's "unicodestring" type is something  
> completely different.

Correct.  I don't know the 'cpstr' branch so I can't comment on that.


Regards,
  - Graeme -



Re: [fpc-devel] FPC 2.3.1 seems a mixed mess with Unicode support

2009-09-15 Thread Marco van de Voort
In our previous episode, Micha Nelissen said:
> >>>   MyChar := MyString[1];
> >>>   writeln(MyChar);
> >>> end.
> >> Extracting a Char from a UnicodeString? What's that supposed to do?
> > 
> > CHAR is a 16-bit wchar in D2009.  Simularly, pchar is a pointer to a 16-bits
> > char. (pansichar being the 1-byte one).

.. and most importantly STRING is unicodestring. So running D2009 unittests
on FPC, or claiming unicode compatibility with D2009 is totally useless atm,
unless we have some clue how we are going to deal with defaults.

(per platform, depending on default granularity of the target, except in
Delphi mode, switches for per unit behaviour etc etc).

Note that _IF_ we really follow D2009, without any additional FPC specific 
stuff, this might mean a complete fork of both FPC and Lazarus into
pre-D2009 and D2009+ modes.
 
> And if MyChar is declared as a WideChar? Then it does work?

No.

> Isn't it 
> like assigning a LongInt to an Integer? It might be cut, screwed or stay 
> the same (depending on sizeof(Integer)).

No, since string[1] is a 16-bit expression. Delphi string support works with
encoding granularity not with codepoints, or even chars. Only some
specialized functions allow character based access with full Unicode range.

See also http://www.stack.nl/~marcov/unicode.pdf though that could have a
few updates.
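Marco's point that `string[1]` is a 16-bit expression, i.e. that indexing works on UTF-16 code units rather than on characters, can be illustrated with a minimal Free Pascal sketch. It assumes a compiler that has the UnicodeString type (FPC 2.3.1 or later); `MakeSample` is a hypothetical helper, and the string is built from raw code units so the result does not depend on how the source file is encoded.

```pascal
program CodeUnits;

{$mode objfpc}{$H+}

// Hypothetical helper: U+4E16 (inside the BMP) followed by U+20000
// (outside the BMP, stored as the surrogate pair $D840 $DC00), built from
// raw code units so the source file's encoding plays no role.
function MakeSample: UnicodeString;
begin
  SetLength(Result, 3);
  Result[1] := WideChar($4E16);
  Result[2] := WideChar($D840);
  Result[3] := WideChar($DC00);
end;

var
  S: UnicodeString;
begin
  S := MakeSample;
  writeln(Length(S));  // 3 code units, although S holds only 2 code points
  writeln(Ord(S[1]));  // 19990 ($4E16): a complete character
  writeln(Ord(S[2]));  // 55360 ($D840): half of a surrogate pair, not a character
end.
```

Length and indexing both count 16-bit units, so anything above U+FFFF needs surrogate-aware helpers to be treated as one character.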





Re: [fpc-devel] FPC 2.3.1 seems a mixed mess with Unicode support

2009-09-15 Thread Jonas Maebe


On 15 Sep 2009, at 13:44, Florian Klaempfl wrote:

> Graeme Geldenhuys schrieb:
>> Micha Nelissen het geskryf:
>>> Extracting a Char from a UnicodeString? What's that supposed to do?
>>
>> Follow the URL I posted with that example.
>>
>> I don't claim to know everything regarding Unicode. Florian said FPC
>> supports Unicode,
>
> I said it supports the Unicodestring type.

The problem is that Delphi 2009's "unicodestring" type is something
completely different. It is what you are developing in the cpstr
branch. Our current unicodestring is a utf16string or something like
that.



Jonas


Re: [fpc-devel] FPC 2.3.1 seems a mixed mess with Unicode support

2009-09-15 Thread Graeme Geldenhuys
Florian Klaempfl het geskryf:
> 
> Do you use the cwstring unit? Did you tell the encoding (UTF-8?) to the
> compiler? Did you use the UnicodeChar instead of Char?


Yes to all, and the example still doesn't work.


$ ./test3
ä


ä is not 世, which is what the website says the result should be.


program test3;

{$mode objfpc}{$H+}
uses
  cwstring, Classes, SysUtils;

var
  MyChar: UnicodeChar;
  MyString: UnicodeString;
begin
  MyString := '世界您好';
  MyChar := MyString[1];
  writeln(MyChar);
end.
-
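One possible reading of the ä output (an assumption, not something established in the thread): 世 is U+4E16, which UTF-8 encodes as the bytes $E4 $B8 $96, and the byte $E4 displays as ä in ISO-8859-1/Windows-1252. If the literal reached the program byte by byte instead of being converted from UTF-8, `MyString[1]` would be exactly that first byte. The sketch below prints those bytes; `Utf8Of` is a hypothetical helper, and it assumes the System unit's `UTF8Encode`.

```pascal
program FirstByte;

{$mode objfpc}{$H+}

// Hypothetical helper: UTF-8 form of a single BMP code unit. The character
// is built from its code point, so the source file's encoding is irrelevant.
function Utf8Of(CodeUnit: Word): UTF8String;
var
  W: UnicodeString;
begin
  SetLength(W, 1);
  W[1] := WideChar(CodeUnit);
  Result := UTF8Encode(W);
end;

var
  U: UTF8String;
  I: Integer;
begin
  U := Utf8Of($4E16);                      // the first character of the test string
  writeln('UTF-8 length: ', Length(U));    // 3 bytes for one character
  for I := 1 to Length(U) do
    write(Ord(U[I]), ' ');                 // 228 184 150, i.e. $E4 $B8 $96
  writeln;
end.
```

With the source saved as UTF-8, declaring that to the compiler ({$codepage utf8} in the source, or the -Fcutf8 command-line option) is the usual way to get such a literal converted to UTF-16 as intended.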


Regards,
  - Graeme -



Re: [fpc-devel] FPC 2.3.1 seems a mixed mess with Unicode support

2009-09-15 Thread Florian Klaempfl
Graeme Geldenhuys schrieb:
> Florian Klaempfl het geskryf:
>> Then you shouldn't cry if a release candidate breaks your stuff :)
> 
> I'm not crying. I go through this process on every new FPC release. It's
> part of my job. :-)
> 
> 
>> We can only fix stuff we know about.
> 
> And I only know about things (issues) when I get our code ready for a
> new release. ;-)
> 
> As I said, before I used to stay compatible with "unstable" FPC
> releases, but that was just way too much work. Code constantly broke
> etc... Now I do this just before a new release, when the "FPC release
> candidate" is a lot more stable.
> 
> 
>> File bug reports if you need and use it.
> 
> I'll start filing them - as long as Vincent doesn't get on my case
> again. ;-)
> 
> 
>>> -
>>> program test2;
>>>
>>> {$mode objfpc}{$H+}
>>> uses
>>>   Classes, SysUtils;
>>>
>>> var
>>>   MyChar: Char;
>>>   MyString: UnicodeString;
>>> begin
>>>   MyString := '世界您好';
>>>   MyChar := MyString[1];
>>>   writeln(MyChar);
>>> end.
>>> -
>> BTW: This is UCS coding style as well ;)
> 
> Even worse then! FPC is not Unicode or UCS-2 compatible.

Do you use the cwstring unit? Did you tell the encoding (UTF-8?) to the
compiler? Did you use the UnicodeChar instead of Char?


Re: [fpc-devel] FPC 2.3.1 seems a mixed mess with Unicode support

2009-09-15 Thread Florian Klaempfl
Micha Nelissen schrieb:
> Graeme Geldenhuys wrote:
>>   MyString := '世界您好';
>>   MyChar := MyString[1];
>>   writeln(MyChar);
>> end.
> 
> Extracting a Char from a UnicodeString? What's that supposed to do?

As I said, it's UCS coding style :)


Re: [fpc-devel] FPC 2.3.1 seems a mixed mess with Unicode support

2009-09-15 Thread Florian Klaempfl
Graeme Geldenhuys schrieb:
> Micha Nelissen het geskryf:
>> Extracting a Char from a UnicodeString? What's that supposed to do?
> 
> Follow the URL I posted with that example.
> 
> I don't claim to know everything regarding Unicode. Florian said FPC
> supports Unicode, 

I said it supports the Unicodestring type.

> so I simply tested a few D2009 examples I could find.

See my other mails ;)


Re: [fpc-devel] FPC 2.3.1 seems a mixed mess with Unicode support

2009-09-15 Thread Micha Nelissen

Marco van de Voort wrote:
> In our previous episode, Micha Nelissen said:
>> Graeme Geldenhuys wrote:
>>>   MyString := '世界您好';
>>>   MyChar := MyString[1];
>>>   writeln(MyChar);
>>> end.
>>
>> Extracting a Char from a UnicodeString? What's that supposed to do?
>
> CHAR is a 16-bit wchar in D2009.  Similarly, pchar is a pointer to a 16-bit
> char. (pansichar being the 1-byte one).

And if MyChar is declared as a WideChar? Then it does work? Isn't it
like assigning a LongInt to an Integer? It might be cut, screwed or stay
the same (depending on sizeof(Integer)).


Micha


Re: [fpc-devel] FPC 2.3.1 seems a mixed mess with Unicode support

2009-09-15 Thread Marco van de Voort
In our previous episode, Micha Nelissen said:
> Graeme Geldenhuys wrote:
> >   MyString := '世界您好';
> >   MyChar := MyString[1];
> >   writeln(MyChar);
> > end.
> 
> Extracting a Char from a UnicodeString? What's that supposed to do?

CHAR is a 16-bit wchar in D2009.  Similarly, pchar is a pointer to a 16-bit
char. (pansichar being the 1-byte one).

So D2009 is revolution, not evolution, and NOT backwards compatible. The
question is whether FPC should do the same, and if so, when.

Also note that the portability aspects of D2009 haven't really been worked
out at all atm. (doing everything in UTF-16 on basically UTF-8 platforms is
IMHO not sane)



Re: [fpc-devel] FPC 2.3.1 seems a mixed mess with Unicode support

2009-09-15 Thread Graeme Geldenhuys
Micha Nelissen het geskryf:
> 
> Extracting a Char from a UnicodeString? What's that supposed to do?

Follow the URL I posted with that example.

I don't claim to know everything regarding Unicode. Florian said FPC
supports Unicode, so I simply tested a few D2009 examples I could find.
Hardly any worked under FPC.

Regards,
  - Graeme -



Re: [fpc-devel] FPC 2.3.1 seems a mixed mess with Unicode support

2009-09-15 Thread Micha Nelissen

Graeme Geldenhuys wrote:
>   MyString := '世界您好';
>   MyChar := MyString[1];
>   writeln(MyChar);
> end.

Extracting a Char from a UnicodeString? What's that supposed to do?

Micha


Re: [fpc-devel] FPC 2.3.1 seems a mixed mess with Unicode support

2009-09-15 Thread Graeme Geldenhuys
Florian Klaempfl het geskryf:
> 
> Then you shouldn't cry if a release candidate breaks your stuff :)

I'm not crying. I go through this process on every new FPC release. It's
part of my job. :-)


> We can only fix stuff we know about.

And I only know about things (issues) when I get our code ready for a
new release. ;-)

As I said, before I used to stay compatible with "unstable" FPC
releases, but that was just way too much work. Code constantly broke
etc... Now I do this just before a new release, when the "FPC release
candidate" is a lot more stable.


> File bug reports if you need and use it.

I'll start filing them - as long as Vincent doesn't get on my case
again. ;-)


>> -
>> program test2;
>>
>> {$mode objfpc}{$H+}
>> uses
>>   Classes, SysUtils;
>>
>> var
>>   MyChar: Char;
>>   MyString: UnicodeString;
>> begin
>>   MyString := '世界您好';
>>   MyChar := MyString[1];
>>   writeln(MyChar);
>> end.
>> -
> 
> BTW: This is UCS coding style as well ;)

Even worse then! FPC is not Unicode or UCS-2 compatible.


Regards,
  - Graeme -



Re: [fpc-devel] FPC 2.3.1 seems a mixed mess with Unicode support

2009-09-15 Thread Marco van de Voort
In our previous episode, Florian Klaempfl said:
> > Exactly. 
> 
> The "1-byte" stuff has nothing to do with unicode but with code page
> aware strings.

Doesn't it have certain consequences for unicodestring <-> ansistring
conversions? Most notably, to avoid a procedure with "ansistring" in its
interface always forcing the default encoding, unless I do everything
manually.
 
To be honest, I have been waiting on this functionality to really start with
unicodestrings.

Maybe fcl-xml should be converted to unicodestring as a first attempt? Under
ifdef maybe?

I did this as a port to D2009 already, so I'm willing to attempt this.


Re: [fpc-devel] FPC 2.3.1 seems a mixed mess with Unicode support

2009-09-15 Thread Florian Klaempfl
Graeme Geldenhuys schrieb:
> Marco van de Voort het geskryf:
>> MSE has no D2009 tested code afaik.
> 
> MSE has no unit tests, period!
> 
>> As far as I know unicodestring support is not at D2009 level, since the
> >> 1-byte stuff and the format of the internals are still missing/different?
> 
> Exactly. 

The "1-byte" stuff has nothing to do with unicode but with code page
aware strings.

> Plus, from what I can see from the unit tests in tiOPF, Unicode
> is not supported in Variants either. varUString type doesn't even exist.

As I said, file bug reports, otherwise we don't know about it.



Re: [fpc-devel] FPC 2.3.1 seems a mixed mess with Unicode support

2009-09-15 Thread Florian Klaempfl
Graeme Geldenhuys schrieb:
> Graeme Geldenhuys het geskryf:
>>> Who says that? What is not supported?
> 
> Let me know how far you get with this example as well.
> 
> http://compaspascal.blogspot.com/2008/10/delphi-2009-strings-explained-by.html

Didn't we talk about *unicode* ?


Re: [fpc-devel] FPC 2.3.1 seems a mixed mess with Unicode support

2009-09-15 Thread Florian Klaempfl
Graeme Geldenhuys schrieb:
> Florian Klaempfl het geskryf:
>> You should have tested with the unicode string branch one year ago ;)
> 
> I gave up a long time ago testing "unstable" FPC branches with
> production code. Things change too often. I only test with the new FPC
> when it is announced that a new version ('fixes' branch is created) is
> on its way.

Then you shouldn't cry if a release candidate breaks your stuff :) We
can only fix stuff we know about.

> 
> 
>> Who says that? What is not supported?
> 
> This I found myself:
> * Unicode+Variants... varUString type.
> 
> By Googling for some D2009 unicode examples and searching the FPC
> source. Here I list a few that I found in under 5 minutes:

File bug reports if you need and use it.

> 
> http://www.bobswart.nl/Weblog/Blog.aspx?RootId=5:2894
> * No TCharacter Class
> 
> http://www.bobswart.nl/Weblog/Blog.aspx?RootId=5:2879
> * No CharInSet() function
> * No IsSurrogate() function
> * No IsSurrogatePair() function

Can be done within minutes if needed.

> 
> http://www.bobswart.nl/Weblog/Blog.aspx?RootId=5:2874
> * No ConvertFromUtf32() function
> 
> http://edn.embarcadero.com/article/38437
> * As per Embarcadero example. I replaced String with UnicodeString.
> Compile and run and I do not see the 世 character as mentioned in the
> article.

Indeed, FPC keeps a certain degree of backward compatibility.
sizeof(Char) will always be 1, at least in the language modes currently
implemented in FPC.

> -
> program test2;
> 
> {$mode objfpc}{$H+}
> uses
>   Classes, SysUtils;
> 
> var
>   MyChar: Char;
>   MyString: UnicodeString;
> begin
>   MyString := '世界您好';
>   MyChar := MyString[1];
>   writeln(MyChar);
> end.
> -

BTW: This is UCS coding style as well ;)

> 
> 
> So that was just in 5 minutes. I'm pretty confident, if I keep going I can
> list a lot more. Hence I can't believe you think FPC supports Unicode
> like Delphi 2009+ does.

I implemented what people needed so far.

> 
> 
>> claims to support the UnicodeString type fully and it can't be that bad
>> because e.g. MSE is using it afaik.
> 
> MSE only uses the BMP, nothing above that. No support for surrogate
> pairs. So at best MSE is only UCS-2 compliant and NOT Unicode compliant.
> Don't confuse the two.

This is only a matter of the helper functions.
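To make "only a matter of the helper functions" concrete, here is a sketch of the UTF-16 arithmetic that surrogate-pair helpers perform. It is not the Delphi 2009 or FPC RTL API: `IsHighSurrogate`, `IsLowSurrogate`, `CodePointToUtf16` and `CodePointAt` are hypothetical names, and `CodePointToUtf16` assumes its argument is a valid code point outside the surrogate range.

```pascal
program SurrogateHelpers;

{$mode objfpc}{$H+}

// Hypothetical helpers, not the Delphi 2009 / FPC RTL routines.
function IsHighSurrogate(C: WideChar): Boolean;
begin
  Result := (Ord(C) >= $D800) and (Ord(C) <= $DBFF);
end;

function IsLowSurrogate(C: WideChar): Boolean;
begin
  Result := (Ord(C) >= $DC00) and (Ord(C) <= $DFFF);
end;

// Encodes a code point (U+0000..U+10FFFF, surrogate range excluded)
// as one UTF-16 code unit (BMP) or as a surrogate pair.
function CodePointToUtf16(CP: Cardinal): UnicodeString;
begin
  if CP < $10000 then
  begin
    SetLength(Result, 1);
    Result[1] := WideChar(CP);
  end
  else
  begin
    Dec(CP, $10000);
    SetLength(Result, 2);
    Result[1] := WideChar($D800 + (CP shr 10));    // high surrogate
    Result[2] := WideChar($DC00 + (CP and $3FF));  // low surrogate
  end;
end;

// Decodes the code point starting at index I; Len receives the number of
// code units consumed (2 for a valid surrogate pair, otherwise 1).
function CodePointAt(const S: UnicodeString; I: Integer; out Len: Integer): Cardinal;
begin
  // Short-circuit evaluation ({$B-}, the default) keeps S[I + 1] in range
  if IsHighSurrogate(S[I]) and (I < Length(S)) and IsLowSurrogate(S[I + 1]) then
  begin
    Result := $10000 + ((Cardinal(Ord(S[I])) - $D800) shl 10)
                     + (Cardinal(Ord(S[I + 1])) - $DC00);
    Len := 2;
  end
  else
  begin
    Result := Ord(S[I]);  // BMP character or unpaired surrogate, taken as-is
    Len := 1;
  end;
end;

var
  S: UnicodeString;
  I, Len, Count: Integer;
begin
  S := CodePointToUtf16($4E16) + CodePointToUtf16($20000);
  writeln('Code units:  ', Length(S));                // 3
  I := 1;
  Count := 0;
  while I <= Length(S) do
  begin
    writeln('Code point:  ', CodePointAt(S, I, Len));  // 19990, then 131072
    Inc(I, Len);
    Inc(Count);
  end;
  writeln('Code points: ', Count);                    // 2
end.
```

Walking the string with `CodePointAt` and advancing by `Len` is what separates full Unicode handling from the BMP-only (UCS-2 style) indexing discussed above.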


Re: [fpc-devel] FPC 2.3.1 seems a mixed mess with Unicode support

2009-09-15 Thread Graeme Geldenhuys
Graeme Geldenhuys het geskryf:
> 
>> Who says that? What is not supported?

Let me know how far you get with this example as well.

http://compaspascal.blogspot.com/2008/10/delphi-2009-strings-explained-by.html


Regards,
  - Graeme -



Re: [fpc-devel] FPC 2.3.1 seems a mixed mess with Unicode support

2009-09-15 Thread Graeme Geldenhuys
Florian Klaempfl het geskryf:
> 
> You should have tested with the unicode string branch one year ago ;)

I gave up a long time ago testing "unstable" FPC branches with
production code. Things change too often. I only test with the new FPC
when it is announced that a new version ('fixes' branch is created) is
on its way.


> Who says that? What is not supported?

This I found myself:
* Unicode+Variants... varUString type.

By Googling for some D2009 unicode examples and searching the FPC
source. Here I list a few that I found in under 5 minutes:

http://www.bobswart.nl/Weblog/Blog.aspx?RootId=5:2894
* No TCharacter Class

http://www.bobswart.nl/Weblog/Blog.aspx?RootId=5:2879
* No CharInSet() function
* No IsSurrogate() function
* No IsSurrogatePair() function

http://www.bobswart.nl/Weblog/Blog.aspx?RootId=5:2874
* No ConvertFromUtf32() function

http://edn.embarcadero.com/article/38437
* As per Embarcadero example. I replaced String with UnicodeString.
Compile and run and I do not see the 世 character as mentioned in the
article.
-
program test2;

{$mode objfpc}{$H+}
uses
  Classes, SysUtils;

var
  MyChar: Char;
  MyString: UnicodeString;
begin
  MyString := '世界您好';
  MyChar := MyString[1];
  writeln(MyChar);
end.
-


So that was just in 5 minutes. I'm pretty confident, if I keep going I can
list a lot more. Hence I can't believe you think FPC supports Unicode
like Delphi 2009+ does.


> claims to support the UnicodeString type fully and it can't be that bad
> because e.g. MSE is using it afaik.

MSE only uses the BMP, nothing above that. No support for surrogate
pairs. So at best MSE is only UCS-2 compliant and NOT Unicode compliant.
Don't confuse the two.


Regards,
  - Graeme -



Re: [fpc-devel] FPC 2.3.1 seems a mixed mess with Unicode support

2009-09-15 Thread Martin Schreiber
On Tuesday 15 September 2009 11:49:13 Florian Klaempfl wrote:

> Who says that? What is not supported? Which issue report? FPC 2.3.1
> claims to support the UnicodeString type fully and it can't be that bad
> because e.g. MSE is using it afaik.

Correct, it is enabled for MSEide+MSEgui SVN trunk which is adapted to FPC 
fixes_2_4. No problems found up to now.
FPC UnicodeString type (reference counted widestring on all platforms) is 
exactly what MSEgui needs, thank you very much Florian.

Martin


Re: [fpc-devel] FPC 2.3.1 seems a mixed mess with Unicode support

2009-09-15 Thread Graeme Geldenhuys
Marco van de Voort het geskryf:
> 
> MSE has no D2009 tested code afaik.

MSE has no unit tests, period!

> As far as I know unicodestring support is not at D2009 level, since the
> 1-byte stuff and the format of the internals are still missing/different?

Exactly. Plus, from what I can see from the unit tests in tiOPF, Unicode
is not supported in Variants either. varUString type doesn't even exist.



Regards,
  - Graeme -



Re: [fpc-devel] FPC 2.3.1 seems a mixed mess with Unicode support

2009-09-15 Thread Marco van de Voort
In our previous episode, Florian Klaempfl said:
> > As far as I know FPC doesn't have Unicode support like Delphi 2009+ has.
> > Yet, when I query a WideString property, the RTTI functions now return
> > tkUString. tkUString is the Delphi Unicode string - but FPC doesn't
> > support that yet?
> 
> Who says that? What is not supported? Which issue report? FPC 2.3.1
> claims to support the UnicodeString type fully and it can't be that bad
> because e.g. MSE is using it afaik.

MSE has no D2009 tested code afaik.

As far as I know unicodestring support is not at D2009 level, since the
1-byte stuff and the format of the internals are still missing/different?

