Re: [fpc-devel] Unicode support - for the 20th time... ;-)

2008-11-21 Thread Marco van de Voort
In our previous episode, Daniël Mantione said:
 Full Unicode support is for FPC 2.4. If you need it today, widestrings are 
 your best option.

Is it? Because that might mean yet another 2.2 fixes branch release to fix
up the delay that this will cause to 2.4
___
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-devel


Re: [fpc-devel] Unicode support - for the 20th time... ;-)

2008-11-21 Thread Marco van de Voort
In our previous episode, Florian Klaempfl said:
 They add it only because they insist on using utf-8 :)

That's perfectly normal on *nix.


Re: [fpc-devel] Unicode support - for the 20th time... ;-)

2008-11-21 Thread Daniël Mantione



On Fri, 21 Nov 2008, Marco van de Voort wrote:

 In our previous episode, Daniël Mantione said:
  Full Unicode support is for FPC 2.4. If you need it today, widestrings are
  your best option.

 Is it? Because that might mean yet another 2.2 fixes branch release to fix
 up the delay that this will cause to 2.4

People were complaining about the current FPC, not being aware of the
new UTF16 string type in FPC 2.3. Perhaps it indeed needs postponing to an
even later release, but at any rate Unicode support should not be expected for
2.2.


Daniël


Re: [fpc-devel] Unicode support - for the 20th time... ;-)

2008-11-21 Thread Marco van de Voort
In our previous episode, Daniël Mantione said:
  If you want to help, we need to implement the Delphi 2009 encoding aware 
  string type, both runtime support as well as the compiler support.
  A previous discussion showed that this also breaks a lot of old code and is 
  not really nice.
 
 As I understand it, the incompatibility from Delphi 2009 comes from the
 fact that any char or string by default becomes 2 bytes, not from adding
 encoding information to the type.

Yes. The added strings are perfectly compatible (ansistring for 1-byte
encodings, unicodestring for 2-byte encodings). The incompatible part comes
from switching the default string/char type to the 2-byte variants.

  So a better concept seems to have a dedicated type for any possible Coding 
  (ANSISTring of course locale-depending, UTF8String, UTF16String, maybe 
  UCS2String, too) and let the user choose (e.g. by a {$ compiler option) 
  which one he want to be used for String and WideString. This would 
  allow 
  for simple compiler magic to perform any necessary conversion (including 
  assigning constants).
 
 Isn't this the same??

Typing-wise it is the same (except for the nonexistence of UCS2);
implementation-wise it is not.

But Michael obviously hasn't read the PDF.


Re: [fpc-devel] Unicode and Lazarus

2008-11-21 Thread Felipe Monteiro de Carvalho
On Thu, Nov 20, 2008 at 9:05 AM, Daniël Mantione wrote:
 On the other hand Lazarus may want to move to a string depending on
 platform too, to attract both Delphi < 2009 and Delphi >= 2009 users.

I don't see this change any time soon, because it would break too much
of existing code. If Delphi 2009 becomes really popular and there is a
large need to migrate projects to Lazarus we may think of a solution.

At the moment a fully working UTF8String is what we need.

-- 
Felipe Monteiro de Carvalho


Re: [fpc-devel] Unicode support in RTL - Roadmap

2008-11-21 Thread Michael Schnell



http://wiki.freepascal.org/FPC_Unicode_support#Roadmap_of_RTL_Unicode_support
  


This page does not talk about UTF8Strings being counted in code elements
vs in code points.


I don't consider it settled that they are always counted in code
elements. IMHO this should be seriously discussed, and a solution should
be found that lets the user select either way, so as to either get
fast code or not break old code.


-Michael


Re: [fpc-devel] Re: Unicode and Lazarus

2008-11-21 Thread Felipe Monteiro de Carvalho
On Thu, Nov 20, 2008 at 9:09 AM, Mattias Gärtner wrote:
 So the roadmap from LCL pov is:
 - a RTL using unicode strings
 - changing the string types in the lazarus code
 - a fpc release with the unicode RTL

From what I've heard about the Unicode RTL, the FPC developers recommend
that we build our own set of routines/classes using UTF8String.

Later they could be added to Free Pascal. This is more or less what we
are doing at the moment, so we should just continue in the same
direction.

-- 
Felipe Monteiro de Carvalho


Re: [fpc-devel] Unicode and Lazarus

2008-11-21 Thread Felipe Monteiro de Carvalho
On Thu, Nov 20, 2008 at 9:05 AM, Daniël Mantione wrote:
 There will be a real UTF8string, i.e. ansistring with UTF-8 encoding as part
 of type information, this will help Lazarus users to get rid of the
 utf8encode/utf8decode.

When? Is this planned for 2.4?

thanks,
-- 
Felipe Monteiro de Carvalho


Re: [fpc-devel] Unicode and Lazarus

2008-11-21 Thread Daniël Mantione



On Fri, 21 Nov 2008, Felipe Monteiro de Carvalho wrote:

 On Thu, Nov 20, 2008 at 9:05 AM, Daniël Mantione wrote:
  There will be a real UTF8string, i.e. ansistring with UTF-8 encoding as part
  of type information, this will help Lazarus users to get rid of the
  utf8encode/utf8decode.

 When? Is this planned for 2.4?


See Marco's e-mail [EMAIL PROTECTED] from 10:21.

Daniël


Re: [fpc-devel] Unicode and Lazarus

2008-11-21 Thread Felipe Monteiro de Carvalho
On Fri, Nov 21, 2008 at 7:30 AM, Florian Klaempfl wrote:
 This is easily said, please create examples and descriptions how fully
 working is defined.

// Should actually convert from widestring to utf-8 when using encoding utf-8
program utf8test1;

{$encoding utf-8} // or is it utf8?

var
  Str: UTF8String;
begin
  Str := 'ção';
  if Length(Str) = 5 then Success
  else Fail;
end.

// Should work on all platforms. Passing the UTF8String to a routine that
// requires ansistring should do the proper conversion
program utf8test2;

{$encoding utf-8} // or is it utf8?

var
  Str: UTF8String;
begin
  Str := 'ção';
  WriteLn(Str);
end.

-- 
Felipe Monteiro de Carvalho


Re: [fpc-devel] Unicode support - for the 20th time... ;-)

2008-11-21 Thread Felipe Monteiro de Carvalho
On Fri, Nov 21, 2008 at 7:01 AM, Marco van de Voort wrote:
 Is it? Because that might mean yet another 2.2 fixes branch release to fix
 up the delay that this will cause to 2.4

Another 2.2 fixes branch release is a good idea, because it contains a
fix for static methods which is necessary for Cocoa projects.

-- 
Felipe Monteiro de Carvalho


Re: [fpc-devel] Unicode support - for the 20th time... ;-)

2008-11-21 Thread Vincent Snijders

Felipe Monteiro de Carvalho wrote:

 On Fri, Nov 21, 2008 at 7:01 AM, Marco van de Voort wrote:
  Is it? Because that might mean yet another 2.2 fixes branch release to fix
  up the delay that this will cause to 2.4

 Another 2.2 fixes branch release is a good idea, because it contains a
 fix for static methods which is necessary for Cocoa projects.



When Marco said yet another 2.2 fixes branch release, he meant 2.2.6.

Vincent


Re: [fpc-devel] Unicode support in RTL - Roadmap

2008-11-21 Thread Felipe Monteiro de Carvalho
On Fri, Nov 21, 2008 at 7:30 AM, Michael Schnell wrote:
 This page does not talk about UTF8Strings being counted in code elements vs
 in code points.

 I don't consider it understood that they in any case are counted in code
 elements. IMHO this should be seriously discussed and a solution should be
 found that the user can select either way to be able to do either fast code
 or not break old code.

I prefer it to be counted in bytes. If it is counted in bytes then I
can build a routine that counts in real chars. And we already have a
lot of code to handle utf-8 inside ansistring which depends on that.

Counting the elements in real chars is very inefficient.

-- 
Felipe Monteiro de Carvalho
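
A routine of the kind described above, one that counts real chars on top of a byte-counted Length, could look roughly like the sketch below. CountCodePoints is a hypothetical name, not an existing RTL function. The sketch assumes well-formed UTF-8 held in a plain AnsiString, and the test string is spelled as raw bytes so that the result does not depend on the source file encoding.

```pascal
program utf8count;

{$mode objfpc}{$H+}

// Hypothetical helper, not an RTL routine: counts code points by
// skipping UTF-8 continuation bytes (bit pattern 10xxxxxx).
function CountCodePoints(const S: AnsiString): Integer;
var
  I: Integer;
begin
  Result := 0;
  for I := 1 to Length(S) do
    if (Ord(S[I]) and $C0) <> $80 then
      Inc(Result);
end;

var
  S: AnsiString;
begin
  // 'ção' as raw UTF-8 bytes: C3 A7 (ç), C3 A3 (ã), 6F (o)
  S := #$C3#$A7#$C3#$A3'o';
  WriteLn('bytes:       ', Length(S));
  WriteLn('code points: ', CountCodePoints(S));
end.
```

Length stays a constant-time lookup here, while the code point count needs a full pass over the string, which is the inefficiency mentioned above.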


Re: [fpc-devel] Unicode and Lazarus

2008-11-21 Thread Michael Schnell



 // Should actually convert from widestring to utf-8 when using encoding utf-8
 program utf8test1;

In fact it should automatically convert (as correctly as possible)
between all available string types (ANSI, UTF8, UTF16).

It should provide appropriate char types for all available string types.

It should offer a user-selectable way of counting elements in UTF strings
(code element counting or code point counting).

The RTL would need to provide the appropriate objects (e.g. StringList)
for all available string types.


-Michael


Re: [fpc-devel] Unicode support - for the 20th time... ;-)

2008-11-21 Thread Felipe Monteiro de Carvalho
On Fri, Nov 21, 2008 at 7:43 AM, Vincent Snijders wrote:
 When Marco said yet another 2.2 fixes branch release, he meant 2.2.6.

Ah, ok ... =)

So my comment would then be changed to:

Unicode is what is most discussed and needed at the moment. What is
the point in making a major release without any major change? For me
it doesn't matter if it will take time.

-- 
Felipe Monteiro de Carvalho


Re: [fpc-devel] Unicode and Lazarus

2008-11-21 Thread Florian Klaempfl
Felipe Monteiro de Carvalho wrote:
 On Fri, Nov 21, 2008 at 7:30 AM, Florian Klaempfl wrote:
  This is easily said, please create examples and descriptions how fully
  working is defined.
 
 // Should actually convert from widestring to utf-8 when using encoding utf-8
 program utf8test1;
 
 {$encoding utf-8} // or is it utf8?
 
 var
   Str: UTF8String;
 begin
   Str := 'ção';
   if Length(Str) = 5 then Success
   else Fail;
 end.
 
 // Should work on all platforms. Passing the UTF8String to a routine that
 // requires ansistring should do the proper conversion
 program utf8test2;
 
 {$encoding utf-8} // or is it utf8?
 
 var
   Str: UTF8String;
 begin
   Str := 'ção';
   WriteLn(Str);
 end.
 

Big deal, I simply enable operator overloading for unique string types
to get this working, then everybody is happy and we've unicode support?


Re: [fpc-devel] Unicode and Lazarus

2008-11-21 Thread Felipe Monteiro de Carvalho
On Fri, Nov 21, 2008 at 7:30 AM, Florian Klaempfl wrote:
 This is easily said, please create examples and descriptions how fully
 working is defined.

It would be really good if there was a guide, preferably in the wiki,
to explain how to add a new test case to Free Pascal.

I have already some test cases in mind, like making sure static
methods compile (an error in 2.2.2) and then after some discussion the
utf-8 test cases. At the moment I can't add the test cases because I
don't know how to.

thanks,
-- 
Felipe Monteiro de Carvalho


Re: [fpc-devel] Unicode support in RTL - Roadmap

2008-11-21 Thread Michael Schnell



 I prefer it to be counted in bytes. If it is counted in bytes then I
 can build a routine that counts in real chars. And we already have a
 lot of code to handle utf-8 inside ansistring which depends on that.

 Counting the elements in real chars is very inefficient.

This is commonly agreed. But counting in code elements breaks old code, and
counting in code points is sometimes handier. That is why I vote for
making the default syntax (s[i], pos(), copy(), ...) user selectable,
while of course providing dedicated functions for both flavors.


-Michael


Re: [fpc-devel] Unicode and Lazarus

2008-11-21 Thread Florian Klaempfl
Felipe Monteiro de Carvalho wrote:
 On Fri, Nov 21, 2008 at 7:30 AM, Florian Klaempfl wrote:
  This is easily said, please create examples and descriptions how fully
  working is defined.
 
 It would be really good if there was a guide, preferably in the wiki,
 to explain how to add a new test case to Free Pascal.
 
 I have already some test cases in mind, like making sure static
 methods compile (an error in 2.2.2) 

I'm quite sure I made a test case when I fixed it.

 and then after some discussion the
 utf-8 test cases. At the moment I can't add the test cases because I
 don't know how to.

Just create a program which returns 0 if everything was ok and another
value if it fails, and attach it to a bug report. The program may only
depend on FPC units, not LCL or anything else.
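
As a minimal sketch of that convention, the length check from the utf-8 test case earlier in the thread could be turned into a program that reports its result through the exit code. The program name is made up for illustration, and the string is assumed to be a plain AnsiString filled with raw UTF-8 bytes, so the sketch does not depend on any source encoding directive.

```pascal
program tutf8len;

{$mode objfpc}{$H+}

var
  Str: AnsiString;
begin
  // 'ção' as raw UTF-8 bytes: C3 A7 (ç), C3 A3 (ã), 6F (o)
  Str := #$C3#$A7#$C3#$A3'o';

  // Non-zero exit code signals a failed test.
  if Length(Str) <> 5 then
    Halt(1);

  // Reaching the end of the program exits with code 0: test passed.
end.
```

The program uses only the System unit, so it meets the "FPC units only" requirement.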


Re: [fpc-devel] Unicode and Lazarus

2008-11-21 Thread Felipe Monteiro de Carvalho
On Fri, Nov 21, 2008 at 7:49 AM, Florian Klaempfl wrote:
 Big deal, I simply enable operator overloading for unique string types
 to get this working, then everybody is happy and we've unicode support?

Indeed that could work. But the operator overloading would need to
override the widestring manager. Actually it will be a bit confusing to
have 2 methods to change the assignments: the widestring manager and
the operator overloading.

We also need a {$ to set which string type string should be.

And then we need a set of RTL routines using utf8string, but Lazarus
developers/users can write them after the other parts are working.

-- 
Felipe Monteiro de Carvalho


Re: [fpc-devel] Unicode support in RTL - Roadmap

2008-11-21 Thread Graeme Geldenhuys
On Fri, Nov 21, 2008 at 11:30 AM, Michael Schnell wrote:

 http://wiki.freepascal.org/FPC_Unicode_support#Roadmap_of_RTL_Unicode_support

 This page does not talk about UTF8Strings being counted in code elements vs
 in code points.

I only added the roadmap section, the rest of the content existed
before. You are welcome to amend the content.


Regards,
  - Graeme -


___
fpGUI - a cross-platform Free Pascal GUI toolkit
http://opensoft.homeip.net/fpgui/


Re: [fpc-devel] Unicode and Lazarus

2008-11-21 Thread Graeme Geldenhuys
On Fri, Nov 21, 2008 at 11:51 AM, Felipe Monteiro de Carvalho wrote:

 It would be really good if there was a guide, preferably in the wiki,
 to explain how to add a new test case to Free Pascal.

 I have already some test cases in mind, like making sure static
 methods compile (an error in 2.2.2) and then after some discussion the
 utf-8 test cases. At the moment I can't add the test cases because I
 don't know how to.

For everything but deep compiler stuff, I would think fpcUnit should
do perfectly. After all, that is what fpcUnit (or unit testing in
general) is for. And unit tests can have a GUI or Text (console) test
runner.  The latter being handy for automated runs - daily or hourly.
tiOPF project does this and we have around 1600 unit tests.


Regards,
  - Graeme -


___
fpGUI - a cross-platform Free Pascal GUI toolkit
http://opensoft.homeip.net/fpgui/


Re: [fpc-devel] Unicode support in RTL - Roadmap

2008-11-21 Thread Michael Schnell



 I only added the roadmap section, the rest of the content existed
 before. You are welcome to amend the content.
  
I'd rightfully be severely bashed by those who actually will be required 
to do the work ;) .


-Michael


Re: [fpc-devel] Unicode and Lazarus

2008-11-21 Thread Mattias Gärtner
Quoting Graeme Geldenhuys:

 On Fri, Nov 21, 2008 at 11:45 AM, Michael Schnell wrote:
 
  In fact it should automatically convert (as correctly as possible) between
  all available string types (ANSI, UTF8, UTF16).

 And the compiler should produce a warning if you assign a UTF8 or UTF16
 string to an ANSI string, mentioning that conversion is not 100%
 possible and you stand a chance of losing data.

... and a possibility to tell the compiler 'Thanks, I know. Don't bark about
this place any longer'.


Mattias



Re: [fpc-devel] Unicode and Lazarus

2008-11-21 Thread Graeme Geldenhuys
On Fri, Nov 21, 2008 at 12:47 PM, Mattias Gärtner wrote:

 ... and a possibility to tell the compiler 'Thanks, I know. Don't bark about
 this place any longer'.

:-) Yes definitely!  Like the wish for Parameter not being used or
Sender not being used etc... Those drive me nuts!


Regards,
  - Graeme -


___
fpGUI - a cross-platform Free Pascal GUI toolkit
http://opensoft.homeip.net/fpgui/


Re: [fpc-devel] wrong rtti default value in the fixes_2_2 branch (dont know about trunk)

2008-11-21 Thread Paul Ishenin

Michael Van Canneyt wrote:
 I fixed the bug in trunk. Please do some tests in Lazarus with the 12114 revision
 of the compiler. If all works still OK and the testsuites don't give any regressions,
 I'll merge it to the fix branch.

Nothing bad happened here, at least nothing that I noticed. If you have no
related tracker issues, then maybe you will merge your fix?


Best regards,
Paul Ishenin.


Re: [fpc-devel] wrong rtti default value in the fixes_2_2 branch (dont know about trunk)

2008-11-21 Thread Michael Van Canneyt


On Fri, 21 Nov 2008, Paul Ishenin wrote:

 Michael Van Canneyt wrote:
  I fixed the bug in trunk. Please do some tests in Lazarus with the 12114
  revision of the compiler. If all works still OK and the testsuites don't
  give any regressions, I'll merge it to the fix branch.

 Nothing bad happened here, at least nothing that I noticed. If you have no
 related tracker issues, then maybe you will merge your fix?

I will do so tonight.

Michael.


Re: [fpc-devel] Unicode support - for the 20th time... ;-)

2008-11-21 Thread Marco van de Voort
In our previous episode, Felipe Monteiro de Carvalho said:
  When Marco said yet another 2.2 fixes branch release, he meant 2.2.6.
 
 Ah, ok ... =)
 
 So my comment would then be changed to:
 
 Unicode is what is most discussed and needed at the moment. What is
 the point in making a major release without any major change? For me
 it doesn't matter if it will take time.

Both branches are diverging, and merging gets more difficult. Also all
changes (like a lot of alignment stuff for ARM) would be held up.

Note that I'm happy either way. I just want a regular release schedule
no matter what course (early 2.4 or late 2.4) is taken, and to paint the
consequences of declaring the Unicode stuff.

I want to avoid the self-delusion of painting optimistic time schedules and
saying this time a major release preparation won't take as long as last
time.

It is late November now, and 2.4 preparation hasn't started, so I have
doubts we'll see 2.4 before summer, even _IF_ we don't wait for more Unicode
functionality (ansistring_with_UTF8, TEncoding)




Re: [fpc-devel] Unicode and Lazarus

2008-11-21 Thread Jonas Maebe


On 21 Nov 2008, at 10:51, Felipe Monteiro de Carvalho wrote:


It would be really good if there was a guide, preferably in the wiki,
to explain how to add a new test case to Free Pascal.


It is documented here: 
http://svn.freepascal.org/cgi-bin/viewvc.cgi/trunk/tests/readme.txt?view=markup

You can find tons of examples under 
http://svn.freepascal.org/cgi-bin/viewvc.cgi/trunk/tests/

Description of the subdirectory names (stolen from Florian):

test:   systematic tests, usually developed by test driven development
webtbs: tests derived from bug tracker bugs requiring successful
        compilation and run
webtbf: tests derived from bug tracker bugs requiring failing of
        compilation
tbs:    tests derived from non tracker reports or ideas while fixing
        something requiring successful compilation and run
tbf:    tests derived from non tracker reports or ideas while fixing
        something requiring failing compilation


Jonas


Re: [fpc-devel] Unicode and Lazarus

2008-11-21 Thread Graeme Geldenhuys
On Fri, Nov 21, 2008 at 11:45 AM, Michael Schnell wrote:

 In fact it should automatically convert (as correctly as possible) between
 all available string types (ANSI, UTF8, UTF16).

And the compiler should produce a warning if you assign a UTF8 or UTF16
string to an ANSI string, mentioning that conversion is not 100%
possible and you stand a chance of losing data.


Regards,
  - Graeme -


___
fpGUI - a cross-platform Free Pascal GUI toolkit
http://opensoft.homeip.net/fpgui/


Re: [fpc-devel] Unicode support in RTL - Roadmap

2008-11-21 Thread Sergei Gorelkin

Michael Schnell wrote:

  I prefer it to be counted in bytes. If it is counted in bytes then I
  can build a routine that counts in real chars. And we already have a
  lot of code to handle utf-8 inside ansistring which depends on that.

  Counting the elements in real chars is very inefficient.

 This is commonly agreed. But counting in code elements breaks old code, and
 counting in code points is sometimes handier. That is why I vote for
 making the default syntax (s[i], pos(), copy(), ...) user selectable,
 while of course providing dedicated functions for both flavors.

If Length() returned its value in chars, what length in *bytes*
would the following call set:

SetLength(utfstring_1, Length(utfstring_2));

??

Regards,
Sergei




Re: [fpc-devel] Unicode support in RTL - Roadmap

2008-11-21 Thread Michael Schnell






 If Length() returned its value in chars, what length in *bytes*
 would the following call set:

 SetLength(utfstring_1, Length(utfstring_2));

I don't really understand your question.

I think we would need to have two different functions,
UTF8ElementLength(UTF8String) and UTF8PointLength(UTF8String), the first
giving the string length in code elements (bytes) and the second giving the
length in code points (Unicode characters).

So UTF8ElementLength('Ü') would be 2 and UTF8PointLength('Ü') would be 1.

I think we should have a third function Length(UTF8String) that can be
selected by the user (e.g. via a {$ option) to be mapped to either of the
two.

The same would be necessary for the SetLength function, e.g.

(1) UTF8ElementSetLength(utfstring_1, UTF8ElementLength(utfstring_2));
or
(2) UTF8PointSetLength(utfstring_1, UTF8PointLength(utfstring_2));

(2) would work as expected if the purpose is to delete all but the first
n characters in a string.

I don't see a decent use for (1) other than creating a string long
enough to use as a buffer for e.g. TStream.Read.

I do see that there in fact is a compatibility problem when porting old
code with the setting UTF8Count=Point. Here

SetLength(utfstring_1, Length(utfstring_2));

would be translated as

UTF8PointSetLength(utfstring_1, UTF8PointLength(utfstring_2));

which does not make sense if UTF8PointLength(utfstring_1) is smaller
than UTF8PointLength(utfstring_2).


-Michael


Re: [fpc-devel] Unicode support in RTL - Roadmap

2008-11-21 Thread Jonas Maebe


On 21 Nov 2008, at 14:50, Michael Schnell wrote:

  If Length() returned its value in chars, what length in *bytes*
  would the following call set:

  SetLength(utfstring_1, Length(utfstring_2));

 I don't really understand your question.

 I think we would need to have two different functions,
 UTF8ElementLength(UTF8String) and UTF8PointLength(UTF8String), the
 first giving the string length in code elements (bytes) and the second
 giving the length in code points (Unicode characters).

 So UTF8ElementLength('Ü') would be 2 and UTF8PointLength('Ü') would
 be 1.

Or 2, depending on whether it's precomposed or decomposed.

 I think we should have a third function Length(UTF8String) that can
 be selected by the user (e.g. via a {$ option) to be mapped to either
 of the two.

He's simply talking about the case where Length is mapped to your
proposed UTF8PointLength.

 I do see that there in fact is a compatibility problem when porting
 old code with the setting UTF8Count=Point. Here

 SetLength(utfstring_1, Length(utfstring_2));

 would be translated as

 UTF8PointSetLength(utfstring_1, UTF8PointLength(utfstring_2));

 which does not make sense if UTF8PointLength(utfstring_1) is smaller
 than UTF8PointLength(utfstring_2).

It does not make any sense under any circumstances, because there is
no way for UTF8PointSetLength to know how many bytes it has to
allocate when you pass a value (any value, regardless of where it
comes from) to it.



Jonas
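
The "or 2" remark can be made concrete with a small sketch that builds both representations of 'Ü' from raw UTF-8 bytes. CountCodePoints is a hypothetical helper rather than an RTL routine, and the byte sequences are assumptions spelled out in the comments: U+00DC for the precomposed form, U+0055 followed by U+0308 for the decomposed one.

```pascal
program utf8forms;

{$mode objfpc}{$H+}

// Hypothetical helper (not an RTL routine): counts code points in
// well-formed UTF-8 by skipping continuation bytes (10xxxxxx).
function CountCodePoints(const S: AnsiString): Integer;
var
  I: Integer;
begin
  Result := 0;
  for I := 1 to Length(S) do
    if (Ord(S[I]) and $C0) <> $80 then
      Inc(Result);
end;

var
  Precomposed, Decomposed: AnsiString;
begin
  Precomposed := #$C3#$9C;     // U+00DC LATIN CAPITAL LETTER U WITH DIAERESIS
  Decomposed  := 'U'#$CC#$88;  // U+0055 followed by U+0308 COMBINING DIAERESIS

  WriteLn('precomposed: ', Length(Precomposed), ' bytes, ',
          CountCodePoints(Precomposed), ' code point(s)');
  WriteLn('decomposed:  ', Length(Decomposed), ' bytes, ',
          CountCodePoints(Decomposed), ' code point(s)');

  // Both forms display as the same character but differ byte for byte.
  WriteLn('byte-wise equal: ', Precomposed = Decomposed);
end.
```

Neither the byte count nor the code point count matches what a user would call "one character" in both cases, which is why a code point based Length does not settle the question either.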


Re: [fpc-devel] Unicode support in RTL - Roadmap

2008-11-21 Thread Sergei Gorelkin

Michael Schnell wrote:


 I don't really understand your question.

 I think we would need to have two different functions,
 UTF8ElementLength(UTF8String) and UTF8PointLength(UTF8String), the first
 giving the string length in code elements (bytes) and the second giving the
 length in code points (Unicode characters).

 So UTF8ElementLength('Ü') would be 2 and UTF8PointLength('Ü') would be 1.

 I think we should have a third function Length(UTF8String) that can be
 selected by the user (e.g. via a {$ option) to be mapped to either of the
 two.

 The same would be necessary for the SetLength function, e.g.

 (1) UTF8ElementSetLength(utfstring_1, UTF8ElementLength(utfstring_2));
 or
 (2) UTF8PointSetLength(utfstring_1, UTF8PointLength(utfstring_2));

 (2) would work as expected if the purpose is to delete all but the first
 n characters in a string.

 I don't see a decent use for (1) other than creating a string long
 enough to use as a buffer for e.g. TStream.Read.

 I do see that there in fact is a compatibility problem when porting old
 code with the setting UTF8Count=Point. Here

 SetLength(utfstring_1, Length(utfstring_2));

 would be translated as

 UTF8PointSetLength(utfstring_1, UTF8PointLength(utfstring_2));

 which does not make sense if UTF8PointLength(utfstring_1) is smaller
 than UTF8PointLength(utfstring_2).


The SetLength function is used mostly for allocating the storage for the 
new strings. Yes, it can be used for truncating the overlong strings, 
but truncating can be perfectly done with Delete (or UTF8Delete).


As you mentioned yourself, allocating utf-8 strings using length in 
codepoints is senseless. This is exactly what I wanted to say initially.


What follows is that for calls like SetLength(str1, Pos('foo', str2)) 
you also cannot freely change the return value of Pos() from elements to 
codepoints. And so on, and so forth.


Regards,
Sergei
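
The allocation use of SetLength can be illustrated with a short sketch. It assumes a plain AnsiString holding raw UTF-8 bytes and the current byte-based Length; the variable names are made up for the example.

```pascal
program setlengthbuf;

{$mode objfpc}{$H+}

var
  Src, Dst: AnsiString;
begin
  // 'ção' as raw UTF-8 bytes: 3 code points, 5 bytes.
  Src := #$C3#$A7#$C3#$A3'o';

  // Dst is sized as a buffer and then filled with a raw memory copy,
  // so the size passed to SetLength has to be a byte count.
  SetLength(Dst, Length(Src));
  Move(Src[1], Dst[1], Length(Src));

  if Dst = Src then
    WriteLn('copied ', Length(Dst), ' bytes')
  else
    Halt(1);
end.
```

If Length reported code points instead, the same two lines would allocate 3 bytes and then copy 3 bytes, cutting the string off in the middle of 'ã'.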


Re: [fpc-devel] Unicode support in RTL - Roadmap

2008-11-21 Thread Michael Schnell


  So UTF8ElementLength('Ü') would be 2 and UTF8PointLength('Ü') would
  be 1.

 Or 2, depending on whether it's precomposed or decomposed.

I seem to remember that we discussed this some time ago and the result
was that the composed (Mac style?) characters in fact are a single code
point (Unicode character) that consists of two (maybe more?) complete
code points that are tied together by some special coding, so IMHO it
can be considered a single Unicode character in both cases. If this
would result in a huge table of possibly composed characters, I think we
would stick to the concept of providing decent functionality and
restrict ourselves to those that are currently used by the customers we
normally address (Mac in Europe and America). A method to provide an
extended composition table should be provided so that those who really
need it can help themselves.

  which does not make sense if UTF8PointLength(utfstring_1) is smaller
  than UTF8PointLength(utfstring_2).

 It does not make any sense under any circumstances, because there is
 no way for UTF8PointSetLength to know how many bytes it has to
 allocate when you pass a value (any value, regardless of where it
 comes from) to it.

If UTF8PointLength(utfstring_1) is greater than
UTF8PointLength(utfstring_2), no new bytes need to be allocated and the
function is just equivalent to

utfstring_1 := UTF8PointCopy(utfstring_1, 1, UTF8PointLength(utfstring_2));

To me this does not seem to pose any problem.

-Michael
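
The truncating case can be sketched on top of the existing byte-based SetLength, without changing what Length returns. TruncateToCodePoints is a hypothetical helper (the UTF8Point* routines named in this thread are proposals, not existing RTL functions), and the sketch assumes well-formed UTF-8 in an AnsiString and a non-negative MaxPoints.

```pascal
program utf8trunc;

{$mode objfpc}{$H+}

// Hypothetical helper: keeps only the first MaxPoints code points of S.
procedure TruncateToCodePoints(var S: AnsiString; MaxPoints: Integer);
var
  I, Points: Integer;
begin
  Points := 0;
  for I := 1 to Length(S) do
    // A byte that is not a continuation byte (10xxxxxx) starts a code point.
    if (Ord(S[I]) and $C0) <> $80 then
    begin
      if Points = MaxPoints then
      begin
        // Byte I starts code point MaxPoints + 1: cut just before it.
        SetLength(S, I - 1);
        Exit;
      end;
      Inc(Points);
    end;
end;

var
  S: AnsiString;
begin
  S := #$C3#$A7#$C3#$A3'o';   // 'ção': 3 code points, 5 bytes
  TruncateToCodePoints(S, 2); // keep 'çã'
  WriteLn('bytes left: ', Length(S));
  if Length(S) <> 4 then
    Halt(1);
end.
```

Only shrinking is covered here; growing a string to a given number of code points has no defined byte size, which is the objection raised in the replies.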


Re: [fpc-devel] Unicode support in RTL - Roadmap

2008-11-21 Thread Michael Schnell


 you also cannot freely change the return value of Pos() from elements
 to codepoints.

Of course the counting needs to be consistent for all string functions.
So changing it on the fly is dangerous (if you keep a count value in
an integer variable). But this is up to the user.


-Michael



Re: [fpc-devel] Unicode support in RTL - Roadmap

2008-11-21 Thread Jonas Maebe


On 21 Nov 2008, at 16:16, Michael Schnell wrote:

   So UTF8ElementLength('Ü') would be 2 and UTF8PointLength('Ü')
   would be 1.

  Or 2, depending on whether it's precomposed or decomposed.

 I seem to remember that we discussed this some time ago and the
 result was that the composed (Mac style?)


Decomposed and precomposed have nothing to do with Windows vs Mac OS X  
vs Linux vs whatever. They are both equally valid ways to represent  
UTF strings and both have their uses (on all platforms). All programs  
should also be prepared to deal with them, since you never know what  
kind of input you will get.
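
A minimal sketch of the two representations, assuming nothing beyond the Unicode normalization forms themselves. It is written in Python only because that is easy to run; the helper names code_points and utf8_bytes are made up for this illustration and are not FPC or LCL routines.

```python
import unicodedata

# U+00DC LATIN CAPITAL LETTER U WITH DIAERESIS, as one precomposed code point
precomposed = "\u00dc"
# The canonically equivalent decomposed form: 'U' followed by U+0308 COMBINING DIAERESIS
decomposed = unicodedata.normalize("NFD", precomposed)


def code_points(s):
    """Number of Unicode code points in s."""
    return len(s)


def utf8_bytes(s):
    """Number of bytes (UTF-8 code units) needed to encode s."""
    return len(s.encode("utf-8"))


print(code_points(precomposed), utf8_bytes(precomposed))  # 1 2
print(code_points(decomposed), utf8_bytes(decomposed))    # 2 3
```

Both strings render as the same Ü, so a length function that counts code points returns 1 for one input and 2 for the other, and a program cannot know in advance which form it will receive.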


characters in fact are a single code point (Unicode character) that  
consists of two (maybe more?) complete code points that are tied  
together by some special coding, so IMHO it can be considered as a  
single Unicode character in both cases. If this would result in a  
huge table of possibly composed characters, I think we would stick to  
the concept of providing decent functionality and restrict ourselves to  
those that are currently used by the customers we normally address  
(Mac in Europe and America).


I think you are talking about a different "we". Further, inventing our  
own meanings for what a code point or Unicode character is would be an  
extremely bad idea (you'd also have to rename the UTF*Point* routines to  
UTF*FPCLikeChar* so they properly indicate the fact that they do not  
deal with code points). UTF by itself already has enough variations to  
deal with; we will not add our own.


which does not make sense if UTF8PointLength(utfstring_1) is  
smaller than UTF8PointLength(utfstring_2).
It does not make any sense under any circumstances, because there  
is no way for UTF8PointSetLength to know how many bytes it has to  
allocate when you pass a value (any value, regardless of where it  
comes from) to it.
If UTF8PointLength(utfstring_1) is greater than  
UTF8PointLength(utfstring_2) no new bytes need to be allocated


but the function is just equivalent to

utfstring_1 := UTF8PointCopy(utfstring_1, 1,  
UTF8PointLength(utfstring_2));


To me this does not seem to pose any problem.


Except if the point is to reserve exactly enough space for utfstring1  
and to overwrite its contents with something else afterwards (using  
move() or whatever). That's a very common use of setlength (at least  
in the FPC run time library, and I guess elsewhere as well). The fact  
that it also doesn't work if the string has to be made longer is  
basically the same problem.
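
To illustrate why a length given in code points cannot tell a SetLength-style routine how much memory to reserve, here is a small sketch. It is Python, purely for illustration, and utf8_size_range is a hypothetical helper, not an existing routine. Strings with the same number of code points need very different numbers of bytes in UTF-8:

```python
def utf8_size_range(code_point_count):
    """Smallest and largest UTF-8 size, in bytes, of a string with this many code points."""
    # Every code point takes between 1 byte (ASCII) and 4 bytes (beyond the BMP).
    return code_point_count, 4 * code_point_count


samples = [
    "abc",                 # three ASCII letters
    "a\u00fcc",            # a, u with diaeresis, c
    "\u20ac\u20ac\u20ac",  # three euro signs
    "\U0001d11e" * 3,      # three musical G clefs (outside the BMP)
]

# (code points, UTF-8 bytes) for each sample
sizes = [(len(s), len(s.encode("utf-8"))) for s in samples]
for points, size in sizes:
    print(points, "code points ->", size, "bytes")

print(utf8_size_range(3))
```

All four samples have three code points, yet the buffer that has to be reserved depends entirely on which code points will be stored in it afterwards.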


Your system just does not work, and the more examples you give the  
more it falls down, as far as I can see. Please first write a wiki  
page explaining how to deal with all cases, or at least noting which  
cases will not work. Only then is it possible to decide whether or  
not it is both feasible and worthwhile to go through the trouble of  
implementing all this. Without it, I feel I am mainly wasting my time  
writing these mails because it seems you haven't thought it through  
yet at all.



Jonas


Re: [fpc-devel] Unicode support in RTL - Roadmap

2008-11-21 Thread Michael Schnell
If your point is that there is no way to allow legacy code to be 
used with a String type that holds UTF8 data, and that it is not 
possible (or desirable) to allow for code for simple cases that 
is understandable to someone who does not want to go into the full 
depth of UTF8, I can totally accept this.


But in that case the normal user just should not use UTF8 but 
WideStrings, which in most European/American projects can be considered 
to be UCS2 coded. (This is the way that D2009 seems to go.)


With that, of course, the UTF8 API of LCL is not at all desirable.

-Michael


Re: [fpc-devel] Unicode support in RTL - Roadmap

2008-11-21 Thread Daniël Mantione



On Fri, 21 Nov 2008, Michael Schnell wrote:

If your point is that there is no way to allow legacy code to be used 
with a String type that holds UTF8 data, and that it is not possible (or 
desirable) to allow for code for simple cases that is understandable 
to someone who does not want to go into the full depth of UTF8, I can 
totally accept this.


Legacy code that assumes ASCII can be used in UTF-8. Code that needs to 
deal with higher code points needs to be rewritten and the user must 
understand the full UTF-8 spec. There is no other way to hide this.
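
A small sketch of why ASCII-only logic keeps working on UTF-8 data: in UTF-8, byte values below 128 never occur inside a multi-byte sequence, so a byte-wise search for an ASCII delimiter cannot hit the middle of a character. The example is Python, used only for illustration, and split_key_value is a made-up stand-in for a piece of legacy parsing code.

```python
def split_key_value(raw: bytes):
    """Byte-oriented 'legacy' parser: split at the first '=' byte."""
    pos = raw.find(b"=")
    if pos < 0:
        return raw, b""
    return raw[:pos], raw[pos + 1:]


# "Größe=Wünsche" encoded as UTF-8; the parser above knows nothing about UTF-8.
line = "Gr\u00f6\u00dfe=W\u00fcnsche".encode("utf-8")
key, value = split_key_value(line)

# Both halves are still valid UTF-8 and decode to the original text.
print(key.decode("utf-8"), value.decode("utf-8"))

# Every byte of a non-ASCII character is >= 0x80, so it can never equal b"=".
print(all(b >= 0x80 for b in "\u00fc\u00f6\u00df".encode("utf-8")))
```

As soon as the code indexes, counts or truncates by "character", this guarantee no longer helps and the code has to be written with the encoding in mind.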


But in that case the normal user just should not use UTF8 but WideStrings, 
which in most European/American projects can be considered to be UCS2 coded. 
(This is the way that D2009 seems to go.)


I agree with your observation.


With that, of course, the UTF8 API of LCL is not at all desirable.


LCL had its reasons to go UTF8.

Daniël


Re: [fpc-devel] Unicode support in RTL - Roadmap

2008-11-21 Thread Florian Klaempfl
Folks, before you waste your time again with endless discussions, have
a look at Yury's work on a Unicode RTL, test it and help with patches
and suggestions. It's available in svn at
http://svn.freepascal.org/svn/fpc/branches/unicodertl


Re: [fpc-devel] Unicode support in RTL - Roadmap

2008-11-21 Thread Michael Schnell


Legacy code that assumes ASCII can be used in UTF-8. Code that needs 
to deal with higher code points needs to be rewritten 
That is any program that formerly used (ANSI) String, is now 
automatically converted to use UTF8, and is to be released in 
Germany, France 



With that, of course, the UTF8 API of LCL is not at all desirable.

LCL had its reasons to go UTF8.
And thus forces all users to understand the full UTF-8 spec and to 
rewrite their programs, even though the old code perfectly compiles and 
up to a certain extent seems to work.


This is what I think is not at all desirable :( .

-Michael


Re: [fpc-devel] Unicode support in RTL - Roadmap

2008-11-21 Thread Yury Sidorov

From: Florian Klaempfl [EMAIL PROTECTED]
Folks, before you waste your time again with endless discussions, 
have a look at Yury's work on a Unicode RTL, test it and help with 
patches and suggestions. It's available in svn at
http://svn.freepascal.org/svn/fpc/branches/unicodertl


It works for win32 only for now. Only the system unit is finished. Work 
in progress...


Yury. 


Re: [fpc-devel] Unicode support in RTL - Roadmap

2008-11-21 Thread Felipe Monteiro de Carvalho
On Fri, Nov 21, 2008 at 2:42 PM, Michael Schnell [EMAIL PROTECTED] wrote:
 And thus forces all users to understand the full UTF-8 spec and to rewrite
 their programs, even though the old code perfectly compiles and up to a
 certain extent seems to work.

 This is what I think is not at all desirable :( .

Your comments are absolutely vague and meaningless. Not to mention
they also don't propose an alternative.

Sorry to be blunt, but so were your comments.

-- 
Felipe Monteiro de Carvalho


Re: [fpc-devel] Unicode support in RTL - Roadmap

2008-11-21 Thread Martin Friebe

Felipe Monteiro de Carvalho wrote:

On Fri, Nov 21, 2008 at 2:42 PM, Michael Schnell [EMAIL PROTECTED] wrote:
  

And thus forces all users to understand the full UTF-8 spec and to rewrite
their programs, even though the old code perfectly compiles and up to a
certain extent seems to work.

This is what I think is not at all desirable :( .


Your comments are absolutely vague and meaningless. Not to mention
they also don't propose an alternative.

Sorry to be blunt, but so were your comments


I must agree with the "FPC cannot do it all automatically" line (much 
as I regret it, and admit how beautiful it would be if FPC could).


What I mean is:

1) Any application/program that currently compiles and works (using 
non-UTF8, never mind whether ASCII or ANSI) will keep working if compiled 
in *non*-UTF8 mode.


2) If such a program is to be extended to support UTF8, then decisions 
are needed that cannot be made without knowing what the program is 
doing, or even in which context within the same program the operation 
takes place.
Such knowledge is only available to the programmer of the application, 
therefore the application must be changed to include these decisions. FPC 
simply cannot make them. (And even a {$SWITCH} would not solve the issue.)


An example is the composed and decomposed ü:

- If you edit a text (human-readable text), or search in a text, you 
certainly do want to treat both representations as equal (a Find 
dialog must find both).
- If the same text editor saves the file, it must treat them as not 
equal. Assume the user has 2 files named wünsche.txt in the same folder. 
The file system allows this, because one name is decomposed and the other 
is composed. If the user opened the text from the composed version, it 
must be written back to the composed version. If the user opened 
it from the decomposed version, it must be written back to the decomposed 
version. Otherwise a completely unrelated file would simply be 
overwritten, and its contents lost. (The same applies if the application 
iterates through the directory contents and compares file names. So here 
the same comparison that would be used by the Find dialog must 
behave differently.)


FPC simply cannot know whether a string contains a file name, which must be 
kept exactly as it is, or some human-readable text, which 
would benefit from normalisation.
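
The two kinds of comparison can be sketched as follows. This is an illustration in Python, not a proposal for an FPC API; same_text and same_file_name are made-up names, and whether a given file system really accepts both names side by side depends on that file system.

```python
import unicodedata


def same_text(a: str, b: str) -> bool:
    """Comparison for human-readable text: canonically equivalent strings match."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


def same_file_name(a: str, b: str) -> bool:
    """Comparison for identifiers such as file names: code points must match exactly."""
    return a == b


composed = "w\u00fcnsche.txt"      # ü as the single code point U+00FC
decomposed = "wu\u0308nsche.txt"   # u followed by U+0308 COMBINING DIAERESIS

print(same_text(composed, decomposed))       # what a Find dialog wants
print(same_file_name(composed, decomposed))  # what file handling wants
```

Both functions receive plain strings, which is exactly the problem: nothing in the data says which of the two comparisons is the right one, only the calling code knows.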


If you are going to put a compiler switch in front of each statement to 
indicate what is needed, you may as well change the statements. There is no 
single setting for the whole application, as both of the above examples 
occur within a single application.


You could use two different UTF8String types that behave differently on 
decomposed chars (I am *not* proposing this as a solution). But then you 
cannot just recompile your app by saying "string now means UTF8String" 
throughout the whole application. Again you have to go through all of 
the source code and edit the app. So you may as well just go through the 
source code and add the appropriate UTF8 clean-up calls to those parts of 
the code that need them.


In the end, switching an application to unicode means that within the 
same app different parts are going to need different handling of unicode 
(where no such difference existed for ascii/ansi). And no compiler can 
figure out which part will need which behaviour.





Re: [fpc-devel] wrong rtti default value in the fixes_2_2 branch (dont know about trunk)

2008-11-21 Thread Michael Van Canneyt


On Fri, 21 Nov 2008, Paul Ishenin wrote:

 Michael Van Canneyt wrote:
  I fixed the bug in trunk. Please do some tests in Lazarus with the 12114
  revision of the compiler. If all works still OK and the testsuites don't
  give any regressions, I'll merge it to the fix branch.

 Nothing bad happened here, at least nothing that I noticed. If you have no
 related tracker issues, then maybe you will merge your fix?

Merged.

Michael.


[fpc-devel] new 27 page document describing Unicode support in D2009

2008-11-21 Thread Graeme Geldenhuys
Hello,

I thought you guys might find this interesting. It's a new 27 page
document describing Unicode support in D2009.

http://dn.codegear.com/article/38980

--
Abstract: Learn more about the new Unicode support in Delphi 2009 and
CodeGear RAD Studio 2009 in this white paper by Marco Cantù

Delphi and Unicode

One of the most relevant new features of Delphi 2009 is its complete
support for the Unicode character set. While Delphi applications
written exclusively for the English language and based on a
26-character alphabet were already working fine and will keep working
fine in Delphi 2009, applications written for most other languages
spoken around the world will have a distinct benefit by this change.

Learn more about Unicode in Delphi 2009 and CodeGear RAD Studio 2009
in this white paper.
--

Regards,
  - Graeme -


___
fpGUI - a cross-platform Free Pascal GUI toolkit
http://opensoft.homeip.net/fpgui/


[fpc-devel] Re: new 27 page document describing Unicode support in D2009

2008-11-21 Thread Graeme Geldenhuys
On Fri, Nov 21, 2008 at 11:08 PM, Graeme Geldenhuys
[EMAIL PROTECTED] wrote:

 I thought you guys might find this interesting. It's a new 27 page
 document describing Unicode support in D2009.

 http://dn.codegear.com/article/38980

Seeing that I don't own D2009 and have only read about its Unicode support,
I found some of the information interesting, and these were things we
argued about on this mailing list.

For example:

1...
  Length() returns the bytes for UTF8String
  but Length() returns the elements (what we know as characters) for
String or UTF16 strings.
  Length() also returns bytes for AnsiString.


var
  str8: Utf8String;
  str16: string;
begin
  str8 := 'Cantù';
  Memo1.Lines.Add ('UTF-8');
  Memo1.Lines.Add('Length: ' + IntToStr (Length (str8)));
  Memo1.Lines.Add('5: ' + IntToStr (Ord (str8[5])));
  Memo1.Lines.Add('6: ' + IntToStr (Ord (str8[6])));
  str16 := str8;
  Memo1.Lines.Add ('UTF-16');
  Memo1.Lines.Add('Length: ' + IntToStr (Length (str16)));
  Memo1.Lines.Add('5: ' + IntToStr (Ord (str16[5])));
end;

As you might expect, the str8 string has a length of 6 (meaning 6 bytes),
while the str16 string has a length of 5 (meaning 10 bytes, though). Notice
that Length invariably returns the number of string elements, which in case
of variable-length representations does not match the number of Unicode code
points represented by the string. This is the output of the program:
UTF-8
Length: 6
5: 195
6: 185
UTF-16
Length: 5
5: 249
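
The numbers in that output follow directly from the two encodings and can be checked without Delphi. The sketch below is Python, used only for illustration; note that Python indexes from 0, so the Pascal positions 5 and 6 become 4 and 5.

```python
text = "Cant\u00f9"  # "Cantù"; ù is U+00F9

utf8 = text.encode("utf-8")        # one byte per element
utf16 = text.encode("utf-16-le")   # two bytes per element, no BOM

# Split the UTF-16 byte sequence into its 16-bit code units.
utf16_units = [
    int.from_bytes(utf16[i:i + 2], "little") for i in range(0, len(utf16), 2)
]

print("UTF-8")
print("Length:", len(utf8))          # 6
print("5:", utf8[4])                 # 195 (0xC3, lead byte of ù)
print("6:", utf8[5])                 # 185 (0xB9, continuation byte of ù)
print("UTF-16")
print("Length:", len(utf16_units))   # 5
print("5:", utf16_units[4])          # 249 (U+00F9)
```

In UTF-8 the ù occupies two elements, neither of which is the code point value 249; in UTF-16 it is a single element equal to the code point.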



2...   TStrings can now take an encoding parameter to specify how it
should load or save files.

-
STREAMING TSTRINGS
The LoadFromFile and SaveToFile methods of the TStrings class can be called
with an encoding. If you write a string list to a text file without providing
a specific encoding, the class will use TEncoding.Default, which uses the
internal DefaultEncoding, in turn extracted at the first occurrence from the
current Windows code page. In other words, if you save a file you'll get the
same ANSI file as before.
Of course, you can also easily force the file to a different format, for
example the UTF-16 format:

Memo1.Lines.SaveToFile('test.txt',  TEncoding.Unicode);
-


Anyway, there are a lot more interesting facts in this document. It is well
worth reading to get a better understanding of Unicode.


Regards,
  - Graeme -


___
fpGUI - a cross-platform Free Pascal GUI toolkit
http://opensoft.homeip.net/fpgui/


Re: [fpc-devel] Re: new 27 page document describing Unicode support in D2009

2008-11-21 Thread Marco van de Voort
In our previous episode, Graeme Geldenhuys said:
  I thought you guys might find this interesting. It's a new 27 page
  document describing Unicode support in D2009.
 
  http://dn.codegear.com/article/38980
 
 Seeing that I don't own D2009 and have only read about its Unicode support,
 I found some of the information interesting, and these were things we
 argued about on this mailing list.

This is all information that has been on the blogs since July. Note that
TCharacter is a sealed class, something that FPC doesn't support yet.

The whole "TEncoding/TCharacter as a bastard class" stuff seems to be done for
the sake of .NET compatibility (as noted in the document), but Borland changed
the course of its .NET efforts after Tiburon. Sigh.


Re: [fpc-devel] Re: new 27 page document describing Unicode support in D2009

2008-11-21 Thread Luiz Americo Pereira Camara

Graeme Geldenhuys wrote:

On Fri, Nov 21, 2008 at 11:08 PM, Graeme Geldenhuys
[EMAIL PROTECTED] wrote:
  

I thought you guys might find this interesting. It's a new 27 page
document describing Unicode support in D2009.

http://dn.codegear.com/article/38980



Seeing that I don't own D2009 and have only read about its Unicode support,
I found some of the information interesting, and these were things we
argued about on this mailing list.

For example:

1...
  Length() returns the bytes for UTF8String
  but Length() returns the elements (what we know as characters) for
String or UTF16 strings.
  


No. Length for String returns the number of code units (the number of 
WideChar elements in the UnicodeString case). When there are surrogate 
pairs, the number of code points (characters) and the number of code 
units differ. See the excerpt:



A way to create a string with surrogate pairs is to use the 
ConvertFromUtf32 function that returns a string with the surrogate pair 
(two WideChar) in the proper circumstances, like the following:

var
  str1: string;
begin
  str1 := 'Surr. ' + ConvertFromUtf32($1D11E);

Now if you ask for the string length, you'll get 8, which is the number 
of WideChar, but not the number of logical Unicode code points in the 
string. If you print the string you get the proper effect (well, at least 
Windows will generally show one square block as placeholder of the 
surrogate pair, rather than two).
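
The count of 8 can be reproduced by hand. The following is a sketch in Python, used only for illustration; surrogate_pair and utf16_length are hypothetical helper names, and the arithmetic is the standard UTF-16 encoding rule for code points above U+FFFF.

```python
def surrogate_pair(code_point):
    """High and low surrogate for a code point above U+FFFF."""
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)     # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits
    return high, low


def utf16_length(s):
    """Number of 16-bit code units in the UTF-16 encoding of s."""
    return len(s.encode("utf-16-le")) // 2


s = "Surr. " + chr(0x1D11E)  # U+1D11E MUSICAL SYMBOL G CLEF

print(len(s))                               # 7 code points
print(utf16_length(s))                      # 8 code units
print([hex(u) for u in surrogate_pair(0x1D11E)])
```

The six characters of 'Surr. ' take one code unit each and the G clef takes two, which is where the difference between 7 and 8 comes from.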






Re: [fpc-devel] Re: new 27 page document describing Unicode support in D2009

2008-11-21 Thread Sergei Gorelkin

Graeme Geldenhuys wrote:

On Fri, Nov 21, 2008 at 11:08 PM, Graeme Geldenhuys
[EMAIL PROTECTED] wrote:

I thought you guys might find this interesting. It's a new 27 page
document describing Unicode support in D2009.

http://dn.codegear.com/article/38980


Seeing that I don't own D2009 and have only read about its Unicode support,
I found some of the information interesting, and these were things we
argued about on this mailing list.

Well, with the exception of the class helper for TStrings (notably, 
they call it a hack themselves :) the design looks rather clean. Since 
each string stores its element size, both ansi and unicode strings are 
probably handled with a common set of procedures, avoiding RTL size bloat.


And they explain why there is no compiler option for switching back and 
forth.


Unfortunately, the article does not provide information about how things 
like Pos() and Copy() work with utf8 strings. However, one may take the 
words "utf-8 support is more limited than utf-16" to mean that they 
continue to work with elements (bytes).


Regards,
Sergei


Re: [fpc-devel] Re: new 27 page document describing Unicode support in D2009

2008-11-21 Thread Luiz Americo Pereira Camara

Sergei Gorelkin wrote:


Well, with the exception of the class helper for TStrings (notably, 
they call it a hack themselves :) the design looks rather clean. 
Since each string stores its element size, both ansi and unicode 
strings are probably handled with a common set of procedures, avoiding 
RTL size bloat.




I also like the design, since it is flexible enough to allow the programmer 
to work with different encodings.


And they explain why there is no compiler option for switching back 
and forth.


Unfortunately, the article does not provide information about how 
things like Pos() and Copy() work with utf8 strings. 
There's an explanation of those functions here: 
http://www.jacobthurman.com/?p=30 (see the comments). Basically, they handle 
code units and not code points (characters).
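
What working in code units means for a Copy-like operation on UTF-8 data can be sketched like this. It is Python, used only for illustration; is_boundary and safe_prefix are hypothetical helpers, not Delphi or FPC routines.

```python
def is_boundary(data: bytes, index: int) -> bool:
    """True if a code point starts at data[index] (or index is the end of data)."""
    # Continuation bytes in UTF-8 always have the bit pattern 10xxxxxx.
    return index == len(data) or (data[index] & 0xC0) != 0x80


def safe_prefix(data: bytes, count: int) -> bytes:
    """At most `count` bytes of data, shortened so no code point is cut in half."""
    count = min(count, len(data))
    while not is_boundary(data, count):
        count -= 1
    return data[:count]


data = "Cant\u00f9".encode("utf-8")  # 6 bytes: C a n t 0xC3 0xB9

naive = data[:5]  # byte-wise copy of the first 5 code units
try:
    naive.decode("utf-8")
    naive_is_valid = True
except UnicodeDecodeError:
    naive_is_valid = False

print(naive_is_valid)        # the plain copy cut the ù in half
print(safe_prefix(data, 5))  # backs up to the previous code point boundary
```

A routine that counts in code units leaves this kind of boundary check to the caller, which is the price for keeping Pos and Copy cheap and consistent with Length.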


However, one may take the words "utf-8 support is more limited than 
utf-16" to mean that they continue to work with elements (bytes).



Yes. This is a good decision also IMO.

Luiz


Re: [fpc-devel] Unicode support in RTL - Roadmap

2008-11-21 Thread Luiz Americo Pereira Camara

Graeme Geldenhuys wrote:

Hi,

I have added a Roadmap section in the following wiki page. If you find
anything missing or not 100% implemented, please add it to the wiki
page.

http://wiki.freepascal.org/FPC_Unicode_support#Roadmap_of_RTL_Unicode_support
  


I started a wiki page to list the use cases where developers (fpc 
users) are facing problems when dealing with Unicode. This can be useful 
to define what programmers expect from the fpc Unicode 
support. Optionally, suggestions can be made on how fpc could handle each case.


http://wiki.freepascal.org/unicode_use_cases

Luiz