[fpc-devel] Unicode support (again)
Hi everybody, I know we have had many discussions in the past on how to implement Unicode support in FPC. From what I remember, a lot of it came down to "let's see what CodeGear does with D2009". So now that D2009 is out, is any further work being done on Unicode support in FPC? Is anybody working on it at the moment? If so, is there something I can help test? Does anybody know what CodeGear did with Unicode-enabled locale strings like ThousandSeparator etc.? I remember FPC has a major issue with locale information and Unicode support. Also, if FPC is not going to follow Delphi's implementation to the letter, could somebody summarize what has already been decided for FPC+Unicode? Regards, - Graeme - ___ fpGUI - a cross-platform Free Pascal GUI toolkit http://opensoft.homeip.net/fpgui/ ___ fpc-devel maillist - fpc-devel@lists.freepascal.org http://lists.freepascal.org/mailman/listinfo/fpc-devel
Re: [fpc-devel] Unicode support (again)
On 10 Nov 2008, at 17:00, Vincent Snijders wrote: procedure TForm1.Button1Click(Sender: TObject); var w: widestring; i: integer; begin w := UTF8Decode('hallo äöü'); Edit1.Caption := UTF8Encode(w); Note that if the file has been saved with a UTF-8 BOM, the compiler will, at compile time, create a widestring containing the UTF-16 version of 'hallo äöü'. If you then pass this to a function expecting an ansistring (such as UTF8Decode), the widestring manager will be used to convert that string to the current ANSI code page, and this converted string will be passed to UTF8Decode. So you'll pass an ANSI-encoded string to UTF8Decode rather than a UTF-8-encoded one (unless the ANSI code page of the running system happens to be UTF-8). It seems much more advisable to me to save the file with a UTF-8 BOM, or even better to add {$encoding utf-8} (and/or to pass -Fcutf-8 to the compiler) and then just use Edit1.Caption := UTF8Encode('hallo äöü'); Jonas
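A minimal sketch of the approach Jonas recommends, assuming an FPC version that supports the {$codepage utf8} directive and assuming this source file is itself saved as UTF-8; the program name CodepageDemo is hypothetical. Because the compiler knows the source encoding, it converts the literal to UTF-16 at compile time, and UTF8Encode then produces UTF-8 directly from UTF-16 without going through the ANSI code page.

```pascal
program CodepageDemo;  { hypothetical example program }

{$mode objfpc}{$H+}
{$codepage utf8}  { declare that this source file is UTF-8 encoded }

var
  W: WideString;
  U: UTF8String;
begin
  { Converted to UTF-16 at compile time, because the source code page is known }
  W := 'hallo äöü';
  { UTF-16 -> UTF-8, no ANSI code page involved }
  U := UTF8Encode(W);
  WriteLn('UTF-16 code units: ', Length(W));  { 9 }
  WriteLn('UTF-8 bytes:       ', Length(U));  { 12 }
end.
```

Without the directive (and without a BOM), the same literal would be taken byte by byte, which is how the double conversion Jonas describes can creep in.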
Re: [fpc-devel] Unicode support (again)
On 10 Nov 2008, at 17:22, Jonas Maebe wrote: It seems much more advisable to me to save the file with a UTF-8 BOM, or even better to add {$encoding utf-8} Well, I meant {$codepage utf-8} Jonas
Re: [fpc-devel] Unicode support (again)
Michael Schnell wrote: I found that the current FPC does have Unicode support, but there are some problems. I'll give it another try; maybe it helps somebody. - By design (for speed's sake), UTF8String (and WideString when surrogate pairs are used) counts in code units and not in Unicode characters, so the behavior is unexpected when doing things like s[i], pos(s), copy(), delete(), ... There are no _slow_ functions that provide the expected behavior of s[i], pos(s), copy(), delete(), ... (I've yet to find out how I can print just the first character of a UTF8String :) - There are different options for how the compiler expects the source file to be encoded. Seemingly, if it detects the file to be UTF-8 encoded and a certain (otherwise correct) option is set, even s := 'hallo äöü'; does not work as expected if s is a WideString. (Lazarus with default settings suffers from this problem.) Create a new Lazarus project, drop a memo, a button and an edit onto a form, and add the LCLProc unit to the uses clause. Create an OnClick handler for the button and add the following code: procedure TForm1.Button1Click(Sender: TObject); var w: widestring; i: integer; begin w := UTF8Decode('hallo äöü'); Edit1.Caption := UTF8Encode(w); Memo1.Clear; for i := 1 to UTF8Length(Edit1.Caption) do Memo1.Lines.Add(UTF8Copy(Edit1.Caption, i,1)); end; IMHO, this is working fine. Vincent
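Vincent's example relies on UTF8Length and UTF8Copy from the Lazarus LCLProc unit. As a rough idea of how such routines can work (a hypothetical sketch, not the LCLProc source), code points in a UTF-8 AnsiString can be counted by skipping continuation bytes, which always have the bit pattern 10xxxxxx. The names Utf8CodePointCount, Utf8BytePos and Utf8CodePointCopy are made up for illustration, and the input is assumed to be valid UTF-8.

```pascal
program Utf8CodePoints;  { hypothetical sketch, not the LCLProc implementation }

{$mode objfpc}{$H+}

{ Bytes of the form 10xxxxxx continue a multi-byte UTF-8 sequence }
function IsContinuationByte(C: Char): Boolean;
begin
  Result := (Ord(C) and $C0) = $80;
end;

{ Number of code points = number of bytes that are not continuation bytes.
  Assumes S holds valid UTF-8; no validation is done. }
function Utf8CodePointCount(const S: AnsiString): Integer;
var
  I: Integer;
begin
  Result := 0;
  for I := 1 to Length(S) do
    if not IsContinuationByte(S[I]) then
      Inc(Result);
end;

{ Byte index where code point number CodePoint (1-based) starts,
  or Length(S) + 1 if S has fewer code points }
function Utf8BytePos(const S: AnsiString; CodePoint: Integer): Integer;
var
  Seen: Integer;
begin
  Seen := 0;
  Result := 1;
  while Result <= Length(S) do
  begin
    if not IsContinuationByte(S[Result]) then
    begin
      Inc(Seen);
      if Seen = CodePoint then
        Exit;
    end;
    Inc(Result);
  end;
end;

{ Like Copy(), but Index and Count are measured in code points, not bytes }
function Utf8CodePointCopy(const S: AnsiString; Index, Count: Integer): AnsiString;
var
  StartByte, EndByte: Integer;
begin
  Result := '';
  if Index < 1 then
    Index := 1;
  if Count <= 0 then
    Exit;
  StartByte := Utf8BytePos(S, Index);
  EndByte := Utf8BytePos(S, Index + Count);  { first byte after the range }
  Result := Copy(S, StartByte, EndByte - StartByte);
end;

var
  S: AnsiString;
begin
  { 'hallo äöü' spelled out as UTF-8 bytes, so the source file encoding is irrelevant }
  S := 'hallo '#$C3#$A4#$C3#$B6#$C3#$BC;
  WriteLn(Length(S));                   { 12 bytes }
  WriteLn(Utf8CodePointCount(S));       { 9 code points }
  WriteLn(Utf8CodePointCopy(S, 1, 1));  { the first character: h }
end.
```

Note the linear scan in Utf8BytePos: locating the n-th character means walking the string from the start, which is the speed cost that byte-indexed s[i] avoids.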
Re: [fpc-devel] Unicode support (again)
On 10 Nov 2008, at 16:48, Michael Schnell wrote: - There are different options for how the compiler expects the source file to be encoded. Seemingly, if it detects the file to be UTF-8 encoded The compiler only sets the encoding of the source to UTF-8 if the file identifies itself as "I am UTF-8 encoded" (by starting with a UTF-8 BOM). The compiler does not guess the encoding in any way. and a certain (otherwise correct) option is set, Which option? even s := 'hallo äöü'; does not work as expected if s is a WideString. (Lazarus with default settings suffers from this problem.) Jonas
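For illustration, the BOM check Jonas describes amounts to looking at the first three bytes of the file: the UTF-8 encoding of U+FEFF is EF BB BF. The function HasUtf8Bom below is a hypothetical sketch that works on the raw file contents; it is not the compiler's actual code. The {$codepage} directive and the -Fc option set the source code page explicitly instead of relying on a BOM.

```pascal
program BomCheck;  { hypothetical sketch, not the compiler's own code }

{$mode objfpc}{$H+}

const
  Utf8Bom = #$EF#$BB#$BF;  { UTF-8 encoding of U+FEFF }

{ True if the raw file contents start with the UTF-8 byte order mark }
function HasUtf8Bom(const Contents: AnsiString): Boolean;
begin
  Result := Copy(Contents, 1, 3) = Utf8Bom;
end;

begin
  WriteLn(HasUtf8Bom(Utf8Bom + 'program p;'));  { TRUE }
  WriteLn(HasUtf8Bom('program p;'));            { FALSE }
end.
```

A file without these three bytes is simply not treated as UTF-8; nothing in its contents is inspected beyond that.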
Re: [fpc-devel] Unicode support (again)
On Mon, Nov 10, 2008 at 1:48 PM, Michael Schnell wrote: ... There are no _slow_ functions that provide the expected behavior of s[i], pos(s), copy(), delete(), ... (I've yet to find out how I can print just the first character of a UTF8String :) Lazarus has a set of UTF-8-ready routines that work on UTF-8 stored inside an AnsiString. -- Felipe Monteiro de Carvalho
Re: [fpc-devel] Unicode support (again)
On Mon, 10 Nov 2008 15:04:01 -0200 Felipe Monteiro de Carvalho wrote: On Mon, Nov 10, 2008 at 1:48 PM, Michael Schnell wrote: ... There are no _slow_ functions that provide the expected behavior of s[i], pos(s), copy(), delete(), ... (I've yet to find out how I can print just the first character of a UTF8String :) Lazarus has a set of UTF-8-ready routines that work on UTF-8 stored inside an AnsiString. Yes. Keep in mind that they work in Unicode code points. Composed characters are treated as several units. For example, an umlaut can be 2 code points (3 bytes). The same problem exists for UTF-16 with widestrings. We still need a normalize function. Mattias
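To make Mattias' point concrete: the visible character "ä" can be stored either as the single precomposed code point U+00E4 or as "a" followed by U+0308 COMBINING DIAERESIS. The hypothetical program below (all names illustrative) counts code points by skipping UTF-8 continuation bytes and shows that code-point routines and UTF-16 lengths disagree between the two forms, and that a plain string comparison treats them as different. Unicode normalization (for example to NFC) is what would map the second form onto the first; it is not implemented here.

```pascal
program ComposedChars;  { hypothetical demo }

{$mode objfpc}{$H+}

{ Count UTF-8 code points by skipping continuation bytes (10xxxxxx) }
function Utf8CodePointCount(const S: AnsiString): Integer;
var
  I: Integer;
begin
  Result := 0;
  for I := 1 to Length(S) do
    if (Ord(S[I]) and $C0) <> $80 then
      Inc(Result);
end;

var
  Precomposed, Decomposed: AnsiString;
  W1, W2: WideString;
begin
  Precomposed := #$C3#$A4;    { U+00E4 LATIN SMALL LETTER A WITH DIAERESIS }
  Decomposed := 'a'#$CC#$88;  { U+0061 followed by U+0308 COMBINING DIAERESIS }

  WriteLn(Length(Precomposed), ' bytes, ', Utf8CodePointCount(Precomposed), ' code point');
  WriteLn(Length(Decomposed), ' bytes, ', Utf8CodePointCount(Decomposed), ' code points');

  { The same ambiguity exists in UTF-16 }
  SetLength(W1, 1);
  W1[1] := WideChar($00E4);
  SetLength(W2, 2);
  W2[1] := WideChar($0061);
  W2[2] := WideChar($0308);
  WriteLn(Length(W1), ' vs ', Length(W2), ' UTF-16 code units');

  { Both render as the same character, yet a binary comparison says they differ }
  WriteLn(W1 = W2);
end.
```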
[fpc-devel] Re: Unicode support (again)
On Mon, Nov 10, 2008 at 4:54 PM, Graeme Geldenhuys wrote: I know we have had many discussions in the past on how to implement Unicode support in FPC. From what I remember, a lot of it came down to "let's see what CodeGear does with D2009". OK, so here goes yet another discussion... :-) What I meant is, does Delphi 2009 solve all the issues you guys have just mentioned?
* Unicode source code?
* Copy, Pos etc. functions?
* Normalization?
* Does Vincent's example work as follows in D2009: procedure TForm1.Button1Click(Sender: TObject); var i: integer; begin Edit1.Caption := 'hallo äöü'; Memo1.Clear; for i := 1 to Length(Edit1.Caption) do Memo1.Lines.Add(Copy(Edit1.Caption, i,1)); end;
Regards, - Graeme -
Re: [fpc-devel] asm offset question
True, but if the programs only run on PCs (Windows and Linux on Intel processors in this case), it should work. Not everyone who considers using FPC/Lazarus wants to run the compiled programs on 15 platforms; sometimes 1 platform is all that is needed. There are some encryption/decryption functions that would be hard to write in anything other than asm, especially if they are already done. For some projects I just wanted to try compiling strictly on Windows, without going to other OSes, just to see if it is possible at all to move projects over from Delphi to FPC/Lazarus with a reasonable amount of work. Until that works, going to other OSes is not really viable anyway. Not having a debugger in Lazarus with a properly working watch window and CPU window doesn't help with these tasks either. The offset operator and the aam and aad instructions do not work as they do in Delphi, but fortunately there are workarounds until they are fixed in FPC, if ever. http://bugs.freepascal.org/view.php?id=12595 aam 16 (or aam 8, etc.) aad 16 (or aad 8, etc.) Both give compiler errors that point to the wrong source line. You need to use db directives to put the machine code there to circumvent it. mov esi, offset variable gives a compiler error, but the functionality can be replaced with the lea esi, variable instruction as a workaround. Michael Schnell wrote: IMHO, it's not a good idea to port ASM code to TP (as TP's purpose is platform independence). So at best you should rewrite this in Pascal. Normally with modern PCs the performance decrease is not noticeable. -Michael
Re: [fpc-devel] asm offset question
IMHO, it's not a good idea to port ASM code to TP (as TP's purpose is platform independence). So at best you should rewrite this in Pascal. Normally with modern PCs the performance decrease is not noticeable. -Michael
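Along the lines of Michael's suggestion, the two instructions from the bug report are simple enough to express in Pascal. A sketch, assuming the documented x86 semantics (AAM imm8: AH := AL div imm8, AL := AL mod imm8; AAD imm8: AL := (AL + AH * imm8) and $FF, AH := 0) and ignoring the flags the real instructions set; the procedure names Aam and Aad are hypothetical. As with the real AAM, a base of 0 triggers a division-by-zero error.

```pascal
program AamAadInPascal;  { hypothetical sketch; CPU flags are not modelled }

{$mode objfpc}{$H+}

{ AAM imm8: AH := AL div Base; AL := AL mod Base }
procedure Aam(var AH, AL: Byte; Base: Byte);
var
  Tmp: Byte;
begin
  Tmp := AL;
  AH := Tmp div Base;
  AL := Tmp mod Base;
end;

{ AAD imm8: AL := (AL + AH * Base) and $FF; AH := 0 }
procedure Aad(var AH, AL: Byte; Base: Byte);
begin
  AL := Byte((AL + AH * Base) and $FF);
  AH := 0;
end;

var
  AH, AL: Byte;
begin
  AH := 0;
  AL := $3F;
  Aam(AH, AL, 16);       { like "aam 16": split AL into two nibbles }
  WriteLn(AH, ' ', AL);  { 3 15 }
  Aad(AH, AL, 16);       { like "aad 16": recombine them }
  WriteLn(AH, ' ', AL);  { 0 63 }
end.
```

The operands are passed as separate bytes rather than an AX register image, which keeps the intent readable in code that no longer needs to be assembly.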
Re: [fpc-devel] Zero terminated strings
Ansistrings are always null-terminated, for Delphi compatibility. The zero char is located at s[Length(s)+1], but it should never be accessed directly.
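A small sketch of what the guaranteed terminator buys you: PChar(S) can be handed to code expecting a C-style string without copying, and the terminator can be observed through the PChar rather than by indexing S[Length(S)+1]. The program is hypothetical; StrLen is the SysUtils routine. If S contains embedded #0 characters, StrLen stops at the first one and will not equal Length(S).

```pascal
program NullTerminator;  { hypothetical demo }

{$mode objfpc}{$H+}

uses
  SysUtils;  { StrLen }

var
  S: AnsiString;
  P: PChar;
begin
  S := 'hello';
  P := PChar(S);                { no copy: points at the string's character data }
  WriteLn(StrLen(P));           { 5: StrLen stops at the implicit #0 }
  WriteLn(Ord(P[Length(S)]));   { 0: the terminator, read via the zero-based PChar }
end.
```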