[fpc-devel] Unicode support - for the 20th time... ;-)

2008-11-20 Thread Graeme Geldenhuys
Hello again, We are seeing more and more hacks being applied to projects trying to scramble around the missing FPC feature - no built-in Unicode support. A simple example in Lazarus: loading a UTF-8 encoded file into a TMemo. Normally you would write code as follows (for ANSI text):

Re: [fpc-devel] Unicode support - for the 20th time... ;-)

2008-11-20 Thread dmitry boyarintsev
shorter (and faster) hacky crap: ls := TStringList.Create; ls.LoadFromFile('someunicodefile.txt'); Memo.Text := UTF8Encode(ls.Text); ls.Free ___ fpc-devel maillist - fpc-devel@lists.freepascal.org

Re: [fpc-devel] Unicode support - for the 20th time... ;-)

2008-11-20 Thread Florian Klaempfl
Graeme Geldenhuys wrote: Hello again, We are seeing more and more hacks being applied to projects trying to scramble around the missing FPC feature - no built-in Unicode support. A simple example in Lazarus: loading a UTF-8 encoded file into a TMemo. Normally you would write code as

Re: [fpc-devel] Unicode support - for the 20th time... ;-)

2008-11-20 Thread Daniël Mantione
On Thu, 20 Nov 2008, Graeme Geldenhuys wrote: All that crap just to load a simple text file that contains unicode content!!! :-( And the other problem is that the hack above assumes the file's content is UTF-8 encoded. If the content is UTF-16 encoded, you need yet another hack. :-( As far

Re: [fpc-devel] Unicode support - for the 20th time... ;-)

2008-11-20 Thread Graeme Geldenhuys
On Thu, Nov 20, 2008 at 11:12 AM, Florian Klaempfl [EMAIL PROTECTED] wrote: Ok, two questions for the example above: - how do you maintain backward compatibility? - how do you load a plain old ansi file? If the file is UTF-8 or ANSI, the above should work. UTF-8 was designed to be backward

Re: [fpc-devel] Unicode support - for the 20th time... ;-)

2008-11-20 Thread Florian Klaempfl
Graeme Geldenhuys wrote: On Thu, Nov 20, 2008 at 11:12 AM, Florian Klaempfl [EMAIL PROTECTED] wrote: Ok, two questions for the example above: - how do you maintain backward compatibility? - how do you load a plain old ansi file? If the file is UTF-8 or ANSI, the above should work. UTF-8

Re: [fpc-devel] Unicode support - for the 20th time... ;-)

2008-11-20 Thread Daniël Mantione
On Thu, 20 Nov 2008, Graeme Geldenhuys wrote: On Thu, Nov 20, 2008 at 11:12 AM, Florian Klaempfl [EMAIL PROTECTED] wrote: Ok, two questions for the example above: - how do you maintain backward compatibility? - how do you load a plain old ansi file? If the file is UTF-8 or ANSI, the

Re: [fpc-devel] Unicode support - for the 20th time... ;-)

2008-11-20 Thread Graeme Geldenhuys
On Thu, Nov 20, 2008 at 11:28 AM, Daniël Mantione [EMAIL PROTECTED] wrote: These instructions are highly unproductive. Work on being able to compile the RTL in either ansi/unicode depending on the platform has started. Full Unicode support is for FPC 2.4. Well, that's the first I heard of it.

Re: [fpc-devel] Unicode support - for the 20th time... ;-)

2008-11-20 Thread Aleksa Todorovic
On Thu, Nov 20, 2008 at 10:06, Graeme Geldenhuys [EMAIL PROTECTED] wrote: Unfortunately that doesn't work if the file contains unicode content, so the following hack is required which is quite nasty: ls := TStringList.Create; ls.LoadFromFile('someunicodefile.txt'); for i := 0 to

Re: [fpc-devel] Unicode support - for the 20th time... ;-)

2008-11-20 Thread Michael Schnell
FPC supports Unicode; in 2.3.x the UnicodeString type is available, a ref-counted UTF-16 string on all platforms. Is the same used by TStringList? I don't think so, otherwise LoadFromFile would need to be aware of several possible file encodings. And I suppose the utf8-API of the LCL

Re: [fpc-devel] Unicode support - for the 20th time... ;-)

2008-11-20 Thread Daniël Mantione
On Thu, 20 Nov 2008, Graeme Geldenhuys wrote: On Thu, Nov 20, 2008 at 11:37 AM, Florian Klaempfl [EMAIL PROTECTED] wrote: FPC supports Unicode; in 2.3.x the UnicodeString type is available, a ref-counted UTF-16 string on all platforms. OK, I'll try to switch fpGUI's TfpgString

Re: [fpc-devel] Unicode support - for the 20th time... ;-)

2008-11-20 Thread Michael Schnell
Full Unicode support is for FPC 2.4. If you need it today, widestrings are your best option. Unfortunately working with WideString in Lazarus is close to impossible as the LCL API is done with UTF8String and there is no correct automatic conversion between UTF8String and WideString, as the

Re: [fpc-devel] Unicode support - for the 20th time... ;-)

2008-11-20 Thread Graeme Geldenhuys
On Thu, Nov 20, 2008 at 12:07 PM, Michael Schnell [EMAIL PROTECTED] wrote: Russian locale requires a 1 byte char. Hmmm. We did lots of non-Unicode Delphi programs with a Russian ANSI variant. Well, I have a Russian user of fpGUI. He noted quite a few issues with FPC's locale variables and

[fpc-devel] Unicode and Lazarus

2008-11-20 Thread Felipe Monteiro de Carvalho
I started a separate thread for this lazarus part of the unicode talk. On Thu, Nov 20, 2008 at 7:37 AM, Florian Klaempfl [EMAIL PROTECTED] wrote: And that's why I urge all core FPC developers to try and finalize a Unicode design. Otherwise you leave it up to developers to keep adding

Re: [fpc-devel] Unicode support - for the 20th time... ;-)

2008-11-20 Thread Michael Schnell
If you want to help, we need to implement the Delphi 2009 encoding aware string type, both runtime support as well as the compiler support. A previous discussion showed that this also breaks a lot of old code and is not really nice. So a better concept seems to have a dedicated type for

[fpc-devel] Re: Unicode and Lazarus

2008-11-20 Thread Felipe Monteiro de Carvalho
If a real utf8string would be a solution for Lazarus (I am not saying it is, but it could be), we need to have a directive to change the default string into utf8string, to avoid a huge amount of code suddenly needing to be changed. Then only ansistring needs to be changed. -- Felipe Monteiro de

Re: [fpc-devel] Unicode support - for the 20th time... ;-)

2008-11-20 Thread Daniël Mantione
On Thu, 20 Nov 2008, Michael Schnell wrote: If you want to help, we need to implement the Delphi 2009 encoding aware string type, both runtime support as well as the compiler support. A previous discussion showed that this also breaks a lot of old code and is not really nice. As I

Re: [fpc-devel] Unicode and Lazarus

2008-11-20 Thread Michael Schnell
Maybe a real UTF8String? Does this mean teaching the compiler to tell the type UTF8String from the type ANSIString and do the appropriate conversion automatically (and do the assignment of constants appropriately)? I suppose this in fact would solve a lot of problems for Lazarus. If on top of

Re: [fpc-devel] Unicode and Lazarus

2008-11-20 Thread Daniël Mantione
On Thu, 20 Nov 2008, Bernd Mueller wrote: Felipe Monteiro de Carvalho wrote: I would like to hear if others actually have a better proposal for Lazarus. Sorry, I have no idea since I am doing primarily embedded stuff. Speed and backward compatibility are the most important factors to

Re: [fpc-devel] Unicode and Lazarus

2008-11-20 Thread Michael Schnell
But it seems that not everybody is happy with the current Codegear Unicode solution: https://forums.codegear.com/thread.jspa?threadID=7140&tstart=0 This is neither backwards compatible, nor nice, nor fast nor small :( After reading this thread, I am not sure, if Delphi 2009

Re: [fpc-devel] Unicode support - for the 20th time... ;-)

2008-11-20 Thread Daniël Mantione
On Thu, 20 Nov 2008, Michael Schnell wrote: The file is assumed to be in system encoding (which can be UTF-8). Support for reading of other encodings has not been decided on yet and is not part of the initial plan. What is system encoding regarding different OS, locale, ... ?

Re: [fpc-devel] Unicode support - for the 20th time... ;-)

2008-11-20 Thread Graeme Geldenhuys
On Thu, Nov 20, 2008 at 12:50 PM, Daniël Mantione [EMAIL PROTECTED] wrote: What is system encoding regarding different OS, locale, ... ? System encoding is the encoding your files are written in when doing an echo Hello > file.txt. Good explanation Daniël. :-) I always wonder that same

Re: [fpc-devel] Unicode support - for the 20th time... ;-)

2008-11-20 Thread Michael Schnell
* Copy, Length, Pos etc...? Yup. * What about usage like: SomeString[x] := 'A'; String element based. This also holds for Copy, Length, Pos, etc. I think it would be a good idea to provide dedicated functions for the element based (fast) and the character based (old style

Re: [fpc-devel] Unicode support - for the 20th time... ;-)

2008-11-20 Thread Michael Schnell
System encoding is the encoding your files are written in when doing an echo Hello > file.txt. Nice point :) I suppose with my German WinXP, system encoding is German ANSI. Does it hold only for files? I suppose WinXP provides an OS API with WideStrings (supposedly UCS16). But how do I have

Re: [fpc-devel] Unicode support - for the 20th time... ;-)

2008-11-20 Thread Michael Schnell
Isn't this the same?? I understand that D2009 uses dynamic code information, while my suggestion is based on several different (static) types. I feel that static types are a lot easier to implement and if using them correctly, the user can tune the program to be as fast as possible or as

Re: [fpc-devel] Unicode support - for the 20th time... ;-)

2008-11-20 Thread Michael Schnell
UCS16 UTF16 :) -Michael

Re: [fpc-devel] Unicode and Lazarus

2008-11-20 Thread Daniël Mantione
On Thu, 20 Nov 2008, Felipe Monteiro de Carvalho wrote: So, what kind of support could be implemented in Free Pascal to improve things for Lazarus and its users? Maybe a real UTF8String? There will be a real UTF8string, i.e. ansistring with UTF-8 encoding as part of type information,

Re: [fpc-devel] Unicode support - for the 20th time... ;-)

2008-11-20 Thread Graeme Geldenhuys
On Thu, Nov 20, 2008 at 12:55 PM, Michael Schnell [EMAIL PROTECTED] wrote: * What about usage like: SomeString[x] := 'A'; String element based. This also holds for Copy, Length, Pos, etc. I think it would be a good idea to provide dedicated functions for the element based (fast) and the

Re: [fpc-devel] Re: Unicode and Lazarus

2008-11-20 Thread Mattias Gärtner
Quoting Felipe Monteiro de Carvalho [EMAIL PROTECTED]: If a real utf8string would be a solution for Lazarus (I am not saying it is, but it could be), we need to have a directive to change the default string into utf8string, to avoid a huge amount of code suddenly needing to be changed. Then

Re: [fpc-devel] Unicode support - for the 20th time... ;-)

2008-11-20 Thread Daniël Mantione
On Thu, 20 Nov 2008, Michael Schnell wrote: Isn't this the same?? I understand that D2009 uses dynamic code information, while my suggestion is based on several different (static) types. As I understand it, it is static. type cp850string=ansistring(CP_850);
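
The static reading of the declaration quoted above can be sketched in the syntax that later FPC releases adopted. This is only an illustration, assuming FPC 3.0 or newer, where a code page can be attached to an AnsiString type. The type name TCP850String and the numeric code page 850 (standing in for the CP_850 constant in the message) are illustrative choices, not names taken from the thread.

```pascal
program CodePageTypes;
{$mode objfpc}{$H+}

type
  // Hypothetical name; 850 is the numeric code page used in place of CP_850.
  TCP850String = type AnsiString(850);

var
  A: TCP850String;
  U: UTF8String;
begin
  A := 'abc';
  // Source and destination code pages are both part of the declared types,
  // so the compiler knows which conversion to insert for this assignment.
  U := A;
  WriteLn('code page of A: ', StringCodePage(A));
  WriteLn('code page of U: ', StringCodePage(U));
end.
```

The example sticks to ASCII content on purpose: converting non-ASCII text between code pages may additionally require a string manager (on Unix, typically the cwstring unit), which is outside this sketch.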

Re: [fpc-devel] Unicode and Lazarus

2008-11-20 Thread Martin Friebe
Daniël Mantione wrote: On Thu, 20 Nov 2008, Felipe Monteiro de Carvalho wrote: So, what kind of support could be implemented in Free Pascal to improve things for Lazarus and its users? Maybe a real UTF8String? There will be a real UTF8string, i.e. ansistring with UTF-8 encoding as part of

Re: [fpc-devel] Unicode and Lazarus

2008-11-20 Thread Daniël Mantione
On Thu, 20 Nov 2008, Martin Friebe wrote: Daniël Mantione wrote: On Thu, 20 Nov 2008, Felipe Monteiro de Carvalho wrote: So, what kind of support could be implemented in Free Pascal to improve things for Lazarus and its users? Maybe a real UTF8String? There will be a real UTF8string,

Re: [fpc-devel] Unicode support - for the 20th time... ;-)

2008-11-20 Thread peter green
For best backward compatibility, I would say Copy, Length, Pos etc should be character based by default. The thing is we can't reasonably provide functions based on what a user would see as a character because doing so would require huge lookup tables (one user visible character != one
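
The difference between element (code unit) counts and code point counts can be shown with a small sketch. The helper Utf8CodePointCount is a hypothetical name invented for this illustration, not an RTL function, and it assumes the input is well-formed UTF-8.

```pascal
program CountCodePoints;
{$mode objfpc}{$H+}

// Counts UTF-8 code points by skipping continuation bytes (bit pattern 10xxxxxx).
// Assumes S holds well-formed UTF-8; no validation is attempted.
function Utf8CodePointCount(const S: AnsiString): Integer;
var
  i: Integer;
begin
  Result := 0;
  for i := 1 to Length(S) do
    if (Ord(S[i]) and $C0) <> $80 then
      Inc(Result);
end;

var
  S: AnsiString;
begin
  // 'a', then U+00E4 encoded as the two bytes C3 A4, then 'b'
  SetLength(S, 4);
  S[1] := 'a';
  S[2] := Chr($C3);
  S[3] := Chr($A4);
  S[4] := 'b';
  WriteLn('code units (bytes): ', Length(S));
  WriteLn('code points:        ', Utf8CodePointCount(S));
end.
```

Length reports 4 here while the helper reports 3. The element-based call is a constant-time lookup and the code point count is a linear scan, which is the speed trade-off being discussed. Neither number answers the question of user-visible characters.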

Re: [fpc-devel] Unicode support - for the 20th time... ;-)

2008-11-20 Thread Michael Schnell
For best backward compatibility, I would say Copy, Length, Pos etc should be character based by default. Agreed. Then introduce more optimised versions like ElementCopy, ElementLength, etc... Old programs will work out of the box, but might experience a minor speed penalty, until the

Re: [fpc-devel] Unicode support - for the 20th time... ;-)

2008-11-20 Thread Michael Schnell
The thing is we can't reasonably provide functions based on what a user would see as a character because doing so would require huge lookup tables (one user visible character != one code point) so the best we can do is code point based which isn't really much better for most tasks than code

Re: [fpc-devel] Unicode support - for the 20th time... ;-)

2008-11-20 Thread Michael Schnell
type cp850string=ansistring(CP_850); utf8string=ansistring(CP_UTF8); Why not use the current locale for this ? Would that be just ANSIString ? a:=b; {Compiler knows conversion to perform at compile time. I suppose the conversion function is provided with the locale and this it

Re: [fpc-devel] Re: Unicode and Lazarus

2008-11-20 Thread Michael Schnell
Compiler support for a unicode string is not enough for the LCL. As long as base classes like TStrings uses ansistrings, the LCL must use a string type, that does no conversion. Of course you are right that the RTL needs to be made up accordingly. Maybe TStrings and friends are needed in

Re: [fpc-devel] Unicode support - for the 20th time... ;-)

2008-11-20 Thread listmember
Ok, two questions for the example above: - how do you maintain backward compatibility? - how do you load a plain old ansi file? You could alter the LoadFromFile(), LoadFromStream(), SaveToFile(), SaveToStream() routines like below: procedure TStringList.LoadFromFile(AFileName: TFilename;

Re: [fpc-devel] Re: Unicode and Lazarus

2008-11-20 Thread Graeme Geldenhuys
On Thu, Nov 20, 2008 at 1:50 PM, Michael Schnell [EMAIL PROTECTED] wrote: Compiler support for a unicode string is not enough for the LCL. As long as base classes like TStrings uses ansistrings, the LCL must use a string type, that does no conversion. Of course you are right that the RTL

Re: [fpc-devel] Unicode support - for the 20th time... ;-)

2008-11-20 Thread Graeme Geldenhuys
On Thu, Nov 20, 2008 at 1:22 PM, peter green [EMAIL PROTECTED] wrote: The thing is we can't reasonably provide functions based on what a user would see as a character because doing so would require huge lookup tables (one user visible character != one code point) so the best we can do is code

Re: [fpc-devel] Unicode support - for the 20th time... ;-)

2008-11-20 Thread Jonas Maebe
On 20 Nov 2008, at 13:13, Graeme Geldenhuys wrote: I think basing those functions on code points should suffice. I also think as soon as strings are assigned or loaded from file, they should be normalized. So two code points like the A and Umlaut code points would become one. How would one
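
To make the normalization point concrete, the sketch below builds the same visible letter two ways: precomposed (U+00E4) and as a base letter plus a combining diaeresis (U+0061 U+0308). It assumes the UnicodeString type mentioned elsewhere in this thread; the variable names are illustrative only.

```pascal
program ComposedVsDecomposed;
{$mode objfpc}{$H+}

var
  Composed, Decomposed: UnicodeString;
begin
  // Precomposed form: a single code point, U+00E4
  SetLength(Composed, 1);
  Composed[1] := WideChar($00E4);

  // Decomposed form: U+0061 ('a') followed by U+0308 (combining diaeresis)
  SetLength(Decomposed, 2);
  Decomposed[1] := WideChar($0061);
  Decomposed[2] := WideChar($0308);

  WriteLn('composed length:   ', Length(Composed));
  WriteLn('decomposed length: ', Length(Decomposed));
end.
```

The two strings have lengths 1 and 2 even though they would normally render identically, so code point based Length, Pos or comparison give different answers unless the text is normalized to one form first. Performing that normalization needs Unicode composition data, which this sketch does not attempt.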

Re: [fpc-devel] Unicode support - for the 20th time... ;-)

2008-11-20 Thread Mattias Gärtner
Quoting Graeme Geldenhuys [EMAIL PROTECTED]: On Thu, Nov 20, 2008 at 1:22 PM, peter green [EMAIL PROTECTED] wrote: The thing is we can't reasonably provide functions based on what a user would see as a character because doing so would require huge lookup tables (one user visible

Re: [fpc-devel] Unicode support - for the 20th time... ;-)

2008-11-20 Thread Vincent Snijders
Graeme Geldenhuys wrote: Hello again, We are seeing more and more hacks being applied to projects trying to scramble around the missing FPC feature - no built-in Unicode support. A simple example in Lazarus: loading a UTF-8 encoded file into a TMemo. Normally you would write code as

[fpc-devel] UnicodeString and RTL

2008-11-20 Thread Graeme Geldenhuys
Hi, Is there any list of missing features for UnicodeString in the RTL? For example: * I can't seem to find a UnicodeString version of TStrings or TStringList Any more such cases? I would like to create an RTL UnicodeString RoadMap, so the missing parts can be known and implemented.

Re: [fpc-devel] UnicodeString and RTL

2008-11-20 Thread Florian Klaempfl
Graeme Geldenhuys wrote: Hi, Is there any list of missing features for UnicodeString in the RTL? For example: * I can't seem to find a UnicodeString version of TStrings or TStringList Any more such cases? No idea, nobody complained so far ;) Just create one. I would like

Re: [fpc-devel] UnicodeString and RTL

2008-11-20 Thread Aleksa Todorovic
Or... it could be implemented using generics, so one can choose: TStringList<UnicodeString> TStringList<AnsiString> TStringList<ShortString> (sorry for C++ish syntax, but I hope you understand) On Thu, Nov 20, 2008 at 15:07, Florian Klaempfl [EMAIL PROTECTED] wrote: Graeme Geldenhuys wrote: Hi,

Re: [fpc-devel] UnicodeString and RTL

2008-11-20 Thread Graeme Geldenhuys
On Thu, Nov 20, 2008 at 4:10 PM, Aleksa Todorovic [EMAIL PROTECTED] wrote: Or... it could be implemented using generics, so one can choose: TStringList<UnicodeString> TStringList<AnsiString> TStringList<ShortString> (sorry for C++ish syntax, but I hope you understand) I somehow managed to skip

Re: [fpc-devel] UnicodeString and RTL

2008-11-20 Thread Florian Klaempfl
Graeme Geldenhuys wrote: On Thu, Nov 20, 2008 at 4:10 PM, Aleksa Todorovic [EMAIL PROTECTED] wrote: Or... it could be implemented using generics, so one can choose: TStringList<UnicodeString> TStringList<AnsiString> TStringList<ShortString> (sorry for C++ish syntax, but I hope you understand)

Re: [fpc-devel] Unicode support - for the 20th time... ;-)

2008-11-20 Thread Zaher Dirkey
That should be called a conversion, not a hack; it is the same when you work with the ANSI version of Lazarus/Delphi and then try to load from a Unicode file.

Re: [fpc-devel] UnicodeString and RTL

2008-11-20 Thread Michael Schnell
As long as the ANSIString and UTF8String and String types are the same to the compiler this question does not make too much sense. Well those all refer to ANSI string types. What do you mean by this ? These refer to Byte String Types I was referring to WideString and UnicodeString

Re: [fpc-devel] UnicodeString and RTL

2008-11-20 Thread Graeme Geldenhuys
On Thu, Nov 20, 2008 at 4:46 PM, Michael Schnell [EMAIL PROTECTED] wrote: UTF8 _is_ a Unicode coding and thus UTF8String _should_be_ a Unicode String type (of course it is not in the current implementation, as the compiler can't tell it from ANSIString, but that is exactly what we are

Re: [fpc-devel] Unicode support - for the 20th time... ;-)

2008-11-20 Thread Zaher Dirkey
I meant TStringList must not do the converting; converting strings must happen outside of TStringList (or in special methods added to it), and without detecting the encoding inside the file in LoadFromFile or LoadFromStream. Detecting may use the Seek function of the stream, and that breaks loading from a TCP/IP connection or

Re: [fpc-devel] UnicodeString and RTL

2008-11-20 Thread Michael Schnell
UnicodeString (the type in FPC 2.3.1) is a UTF-16 type, I was not aware that there is a type with this name. Why does it exist ? WideString that is not Unicode does not make much sense. -Michael

Re: [fpc-devel] Unicode support - for the 20th time... ;-)

2008-11-20 Thread Michael Schnell
Zaher Dirkey wrote: I meant TStringList must not do the converting. If it's known that a file is in some encoding and the instance of TStringList uses another one, I suppose LoadFromFile needs to do the re-encoding appropriately. -Michael
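
When the encoding is not known in advance, one common heuristic is to look for a byte order mark at the start of the data. The sketch below is an illustration only: the type TGuessedEncoding and the function GuessEncodingFromBOM are hypothetical names, not part of TStringList or the RTL, and the check works on bytes that have already been read, so it does not depend on seeking in a stream.

```pascal
program BomSniff;
{$mode objfpc}{$H+}

type
  // Hypothetical enumeration for this sketch only.
  TGuessedEncoding = (geUnknown, geUTF8, geUTF16LE, geUTF16BE);

// Inspects the leading bytes of Buf for a byte order mark.
// Files without a BOM (plain ANSI, or UTF-8 without BOM) yield geUnknown.
// Limitation: a UTF-32LE BOM (FF FE 00 00) would be reported as UTF-16LE.
function GuessEncodingFromBOM(const Buf: AnsiString): TGuessedEncoding;
begin
  Result := geUnknown;
  if (Length(Buf) >= 3) and (Ord(Buf[1]) = $EF) and
     (Ord(Buf[2]) = $BB) and (Ord(Buf[3]) = $BF) then
    Result := geUTF8
  else if (Length(Buf) >= 2) and (Ord(Buf[1]) = $FF) and (Ord(Buf[2]) = $FE) then
    Result := geUTF16LE
  else if (Length(Buf) >= 2) and (Ord(Buf[1]) = $FE) and (Ord(Buf[2]) = $FF) then
    Result := geUTF16BE;
end;

var
  Buf: AnsiString;
begin
  // Sample buffer: UTF-8 BOM followed by one ASCII byte
  SetLength(Buf, 4);
  Buf[1] := Chr($EF);
  Buf[2] := Chr($BB);
  Buf[3] := Chr($BF);
  Buf[4] := 'x';
  if GuessEncodingFromBOM(Buf) = geUTF8 then
    WriteLn('UTF-8 BOM found')
  else
    WriteLn('no UTF-8 BOM');
end.
```

A BOM is optional in UTF-8 files, so a geUnknown result says nothing about whether the content is ANSI or UTF-8. In that case the caller would still have to be told the encoding, which is the situation described in the message above.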

Re: [fpc-devel] UnicodeChar and Locale variables don't work

2008-11-20 Thread Florian Klaempfl
Graeme Geldenhuys wrote: Hi How am I supposed to handle unicode characters for locale variables? All locale variables like ThousandSeparator are of type Char and there are no overloaded UnicodeChar versions. This causes problems in Russian locales as the example below shows. c :=

Re: [fpc-devel] UnicodeChar and Locale variables don't work

2008-11-20 Thread Graeme Geldenhuys
On 11/20/08, Florian Klaempfl [EMAIL PROTECTED] wrote: Well, this is one of the thousands of little problems which need to be solved ... OK, I'll add this to the RoadMap wiki page as well... Regards, - Graeme - ___ fpGUI - a cross-platform

Re: [fpc-devel] UnicodeString and RTL

2008-11-20 Thread Graeme Geldenhuys
On 11/20/08, Michael Schnell [EMAIL PROTECTED] wrote: UnicodeString (the type in FPC 2.3.1) is a UTF-16 type, I was not aware that there is a type with this name. Why does it exist ? WideString that is not Unicode does not make much sense. I'm new to this, but as far as I understand,

[fpc-devel] Unicode support in RTL - Roadmap

2008-11-20 Thread Graeme Geldenhuys
Hi, I have added a Roadmap section in the following wiki page. If you find anything missing or not 100% implemented, please add it to the wiki page. http://wiki.freepascal.org/FPC_Unicode_support#Roadmap_of_RTL_Unicode_support This applies to FPC 2.3.x Regards, - Graeme -