[fpc-devel] Unicode support (again)

2008-11-10 Thread Graeme Geldenhuys
Hi everybody,

I know we have had many discussions in the past on how to implement
Unicode support in FPC. From what I remember, much of it came down to
"let's see what CodeGear does with D2009".

So now that D2009 is out, is there any further work being done on
Unicode support in FPC? Is anybody working on it at the moment? If
so, is there something I can help test? Does anybody know what CodeGear
did with Unicode-enabled locale strings like ThousandSeparator etc.?
I remember FPC has a major issue with locale information and Unicode
support.

Also, if FPC is not going to follow Delphi's implementation to the
letter, could somebody summarize what has already been decided for
FPC+Unicode?


Regards,
  - Graeme -


___
fpGUI - a cross-platform Free Pascal GUI toolkit
http://opensoft.homeip.net/fpgui/
___
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-devel


Re: [fpc-devel] Unicode support (again)

2008-11-10 Thread Jonas Maebe


On 10 Nov 2008, at 17:00, Vincent Snijders wrote:


> procedure TForm1.Button1Click(Sender: TObject);
> var
>  w: widestring;
>  i: integer;
> begin
>  w := UTF8Decode('hallo äöü');
>  Edit1.Caption := UTF8Encode(w);


Note that if the file has been saved with a UTF-8 BOM, then the
compiler will at compile time create a widestring containing the
UTF-16 version of 'hallo äöü'. If you then pass this to a function
expecting an ansistring (such as UTF8Decode), then the widestring
manager will be used to convert that string to the ansi encoding, and
this converted string will be passed to UTF8Decode. So then you'll pass
an ansi-encoded string to UTF8Decode rather than a UTF-8-encoded string
(unless ansi = utf-8 for the current execution).


It seems much more advisable to me to save the file with a UTF-8 BOM,
or even better to add {$encoding utf-8} (and/or to pass -Fcutf-8 to
the compiler) and then just use


Edit1.Caption := UTF8Encode('hallo äöü');
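
[Editor's sketch, not part of the original message.] The recommendation above can be tried outside Lazarus with a small console program. This is a minimal sketch that assumes the source file really is saved as UTF-8 and that the literal uses precomposed umlauts. It uses the {$codepage utf8} spelling of the directive, and the program and variable names are made up for illustration.

```pascal
program LiteralDemo;
{$mode objfpc}{$H+}
{$codepage utf8}  // tell the compiler the source bytes are UTF-8

var
  w: WideString;
  u: UTF8String;
begin
  // The literal is stored as UTF-16 at compile time, so assigning it to a
  // WideString needs no run-time conversion through the widestring manager.
  w := 'hallo äöü';
  // UTF8Encode takes the widestring directly and returns UTF-8 bytes.
  u := UTF8Encode(w);
  WriteLn(Length(w));  // UTF-16 code units: 9 for precomposed text
  WriteLn(Length(u));  // UTF-8 bytes: 12, since each umlaut takes 2 bytes
end.
```

The point of the sketch is that no ansi-encoded intermediate string is ever created, which is the step that goes wrong when a widestring constant is passed to UTF8Decode.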


Jonas


Re: [fpc-devel] Unicode support (again)

2008-11-10 Thread Jonas Maebe


On 10 Nov 2008, at 17:22, Jonas Maebe wrote:

> It seems much more advisable to me to save the file with a UTF-8
> BOM, or even better to add {$encoding utf-8}


Well, that should have been {$codepage utf-8}


Jonas


Re: [fpc-devel] Unicode support (again)

2008-11-10 Thread Vincent Snijders

Michael Schnell wrote:
> I found that the current FPC does have Unicode support, but there are
> some problems.




I am going to give it another try, maybe it helps somebody.

> - by design (for the sake of speed), UTF8String (and WideString when
> surrogate codes are used) count in code units and not in Unicode
> characters, so the behavior is unexpected when doing things like s[i],
> pos(s), copy(), delete(), ... There are no _slow_ functions that do the
> expected versions of s[i], pos(s), copy(), delete(), ... (I've yet to
> find out how I can print just the first character of a UTF8String :)
>
> - there are different options for how the compiler expects the source
> file to be encoded. Seemingly if it detects it to be UTF8 coded and a
> certain (otherwise correct) option is set, even s := 'hallo äöü';
> does not work as expected if s is a WideString. (Lazarus with
> default settings suffers from this problem).


Create a new Lazarus project, drop a memo, a button and an edit on a form, and
add the LCLProc unit to the uses clause. Create an OnClick handler for the
button and add the following code:


procedure TForm1.Button1Click(Sender: TObject);
var
  w: widestring;
  i: integer;
begin
  w := UTF8Decode('hallo äöü');
  Edit1.Caption := UTF8Encode(w);
  Memo1.Clear;
  for i := 1 to UTF8Length(Edit1.Caption) do
    Memo1.Lines.Add(UTF8Copy(Edit1.Caption, i, 1));
end;

IMHO, this is working fine.

Vincent


Re: [fpc-devel] Unicode support (again)

2008-11-10 Thread Jonas Maebe


On 10 Nov 2008, at 16:48, Michael Schnell wrote:

> - there are different option on how the compiler expects the coding
> of the source file. Seemingly if it detects it to be UTF8 coded


The compiler only sets the encoding of the source to UTF-8 if the file
identifies itself as "I am UTF-8 encoded" (by starting with a UTF-8
BOM). The compiler does not guess about the encoding in any way.
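
[Editor's sketch, not part of the original message.] To make "starting with a UTF-8 BOM" concrete: the BOM is the three bytes EF BB BF at the very start of the file. The check below only illustrates that rule. It is not the compiler's own code, and HasUTF8BOM is a hypothetical helper name.

```pascal
program BomDemo;
{$mode objfpc}{$H+}

// True when the raw bytes of a file start with the UTF-8 BOM (EF BB BF).
// Data is assumed to hold the file contents byte for byte.
function HasUTF8BOM(const Data: AnsiString): Boolean;
begin
  Result := (Length(Data) >= 3) and
            (Data[1] = #$EF) and (Data[2] = #$BB) and (Data[3] = #$BF);
end;

begin
  WriteLn(HasUTF8BOM(#$EF#$BB#$BF'program x;'));  // TRUE
  WriteLn(HasUTF8BOM('program x;'));              // FALSE
end.
```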



> and a certain (otherwise correct) option is set,


Which option?

> even s := 'hallo äöü';  does not work correctly as expected if s
> is a WideString. (Lazarus with default settings suffers from this
> problem).




Jonas


Re: [fpc-devel] Unicode support (again)

2008-11-10 Thread Felipe Monteiro de Carvalho
On Mon, Nov 10, 2008 at 1:48 PM, Michael Schnell wrote:
> , ... There are no _slow_ functions that do the expected versions
> of s[i], pos(s), copy(), delete(), ... (I've yet to find out how I can print
> just the first character of a UTF8String :)

Lazarus has a set of UTF-8-ready routines that work on UTF-8 stored inside an AnsiString.
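
[Editor's sketch, not part of the original message.] For the "print just the first character" question, the idea behind such routines can be shown in plain Pascal: the lead byte of a UTF-8 sequence tells you how many bytes belong to the first code point. The helper names below are hypothetical and are not the Lazarus routines. The sketch assumes well-formed UTF-8 and falls back to one byte for an invalid lead byte.

```pascal
program FirstCharDemo;
{$mode objfpc}{$H+}

// Byte length of the UTF-8 sequence that starts with lead byte b.
// Invalid lead bytes are treated as a single byte.
function UTF8SeqLen(b: Byte): Integer;
begin
  if b < $80 then Result := 1                   // 0xxxxxxx: ASCII
  else if (b and $E0) = $C0 then Result := 2    // 110xxxxx
  else if (b and $F0) = $E0 then Result := 3    // 1110xxxx
  else if (b and $F8) = $F0 then Result := 4    // 11110xxx
  else Result := 1;
end;

// Returns the bytes of the first code point of s, or '' for an empty string.
function FirstUTF8Char(const s: AnsiString): AnsiString;
begin
  if s = '' then
    Result := ''
  else
    Result := Copy(s, 1, UTF8SeqLen(Ord(s[1])));
end;

var
  s: AnsiString;
begin
  s := #$C3#$A4'bc';                   // "äbc" written as explicit UTF-8 bytes
  WriteLn(Length(FirstUTF8Char(s)));   // 2: the two bytes of U+00E4
end.
```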

-- 
Felipe Monteiro de Carvalho


Re: [fpc-devel] Unicode support (again)

2008-11-10 Thread Mattias Gaertner
On Mon, 10 Nov 2008 15:04:01 -0200
Felipe Monteiro de Carvalho wrote:

> On Mon, Nov 10, 2008 at 1:48 PM, Michael Schnell wrote:
> > , ... There are no _slow_ functions that do the expected versions
> > of s[i], pos(s), copy(), delete(), ... (I've yet to find out how I
> > can print just the first character of a UTF8String :)
>
> Lazarus has a set of utf-8 ready routines, using utf-8 inside of a
> ansistring.

Yes.
Keep in mind that they work on Unicode code points. Characters built
from a base letter plus combining marks are treated as several units.
For example, an umlaut can be 2 code points (3 bytes in UTF-8). The
same problem exists for UTF-16 with widestrings.
We still need a normalization function.
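
[Editor's sketch, not part of the original message.] The "2 code points (3 bytes)" case can be reproduced with explicit byte sequences. This minimal sketch counts code points by skipping UTF-8 continuation bytes. CodePointCount is a hypothetical helper and it assumes well-formed input.

```pascal
program CodePointDemo;
{$mode objfpc}{$H+}

// Counts Unicode code points in UTF-8 data: every byte that is not a
// continuation byte (10xxxxxx) starts a new code point.
function CodePointCount(const s: AnsiString): Integer;
var
  i: Integer;
begin
  Result := 0;
  for i := 1 to Length(s) do
    if (Ord(s[i]) and $C0) <> $80 then
      Inc(Result);
end;

const
  Precomposed: AnsiString = #$C3#$A4;     // U+00E4, "ä" as one code point
  Decomposed:  AnsiString = 'a'#$CC#$88;  // U+0061 + U+0308 (combining diaeresis)
begin
  WriteLn(Length(Precomposed), ' ', CodePointCount(Precomposed));  // 2 1
  WriteLn(Length(Decomposed), ' ', CodePointCount(Decomposed));    // 3 2
end.
```

Both strings render as the same letter, yet they differ in byte length and in code point count, which is why a normalization step is needed before comparing or indexing by "character".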


Mattias
 


[fpc-devel] Re: Unicode support (again)

2008-11-10 Thread Graeme Geldenhuys
On Mon, Nov 10, 2008 at 4:54 PM, Graeme Geldenhuys wrote:
> I know we have had many discussions in the past on how to implement
> Unicode support in FPC. From what I remember, much of it came down to
> "let's see what CodeGear does with D2009".

OK, so here goes again yet another discussion... :-)

What I meant is, does Delphi 2009 solve all the issues you guys have
just mentioned?

* Unicode source code?
* Copy, Pos etc. functions?
* Normalization?
* Does Vincent's example work as follows in D2009:

procedure TForm1.Button1Click(Sender: TObject);
var
 i: integer;
begin
 Edit1.Caption := 'hallo äöü';
 Memo1.Clear;
 for i := 1 to Length(Edit1.Caption) do
   Memo1.Lines.Add(Copy(Edit1.Caption, i,1));
end;


Regards,
  - Graeme -




Re: [fpc-devel] asm offset question

2008-11-10 Thread ABorka
True, but if the programs only run on PCs (Windows and Linux on Intel
processors in this case) it should work. Not everyone who considers
using FPC/Lazarus wants to run the compiled programs on 15 platforms.
Sometimes all that is needed is 1 platform.
There are some encrypting/decrypting functions that would be hard to
write in anything other than asm, especially if they are already done.


Some projects I just wanted to try to compile strictly on Windows,
without wanting to go to other OSes, just to see if it is possible at
all to move projects over to FPC/Lazarus from Delphi with a reasonable
amount of work. Until that works, going to other OSes is not
really viable anyway.
Not having a debugger in Lazarus with a properly working watch window
and CPU window doesn't help with these tasks either.


The offset operator and the aam and aad asm instructions do not work
as they do in Delphi, but fortunately there are workarounds until they
are fixed in FPC, if ever.


http://bugs.freepascal.org/view.php?id=12595

aam 16 (or aam 8, etc.)
aad 16 (or aad 8, etc.)
Both give compiler errors that point to the wrong source line. You need
to use db directives to put the machine code there to work around it.


mov esi, offset variable
Gives a compiler error, but the functionality can be replaced with the
lea esi, variable instruction as a workaround.
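
[Editor's sketch, not part of the original message.] Both workarounds can be sketched as follows. This assumes a 32-bit x86 target, Intel assembler syntax and FPC's default register calling convention, where the first parameter arrives in EAX and the result is returned in EAX/AX. Counter, CounterAddr and SplitNibbles are hypothetical names, and the sketch has not been checked against the compiler version discussed in this thread.

```pascal
program AsmWorkarounds;
{$mode objfpc}
{$asmmode intel}

{$ifdef CPUI386}
var
  Counter: LongInt = 0;  // hypothetical global, used only for this sketch

// Returns the address of Counter. "lea" stands in for the rejected
// "mov eax, offset Counter".
function CounterAddr: Pointer; assembler;
asm
  lea eax, Counter
end;

// Stands in for the rejected "aam 16": the instruction is emitted as raw
// bytes, opcode $D4 followed by the immediate $10.
// Effect: AH := AL div 16, AL := AL mod 16. The result is returned in AX.
function SplitNibbles(b: Byte): Word; assembler;
asm
  db $D4, $10
end;
{$endif}

begin
{$ifdef CPUI386}
  WriteLn(CounterAddr = @Counter);
  WriteLn(HexStr(SplitNibbles($AB), 4));
{$else}
  WriteLn('This sketch targets i386 only.');
{$endif}
end.
```

The same db pattern applies to aad, whose opcode is $D5 followed by the immediate byte.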




Michael Schnell wrote:
> IMHO, it's not a good idea to port ASM code to TP (as TP's purpose is
> platform independence). So at best you should rewrite this in Pascal.
> Normally with modern PCs the performance decrease is not noticeable.
>
> -Michael





Re: [fpc-devel] asm offset question

2008-11-10 Thread Michael Schnell
IMHO, it's not a good idea to port ASM code to TP (as TP's purpose is 
platform independence). So at best you should rewrite this in Pascal. 
Normally with modern PCs the performance decrease is not noticeable.


-Michael


Re: [fpc-devel] Zero terminated strings

2008-11-10 Thread dmitry boyarintsev
Strings are always null-terminated, for Delphi compatibility.

The zero char is located at s[Length(s)+1], but it should never be accessed directly.
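
[Editor's sketch, not part of the original message.] A safe way to observe the terminator is through a PChar cast rather than by indexing past the end of the string. A minimal sketch, with made-up variable names:

```pascal
program NulDemo;
{$mode objfpc}{$H+}

var
  s: AnsiString;
  p: PChar;
begin
  s := 'abc';
  p := PChar(s);                 // points at the string's character data
  WriteLn(Length(s));            // 3: the terminator is not counted
  WriteLn(Ord(p[Length(s)]));    // 0: PChar is zero-based, so p[3] is the terminator
end.
```

This is what makes it possible to pass an AnsiString to C-style APIs via PChar(s) without copying.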