Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?
On Sat, Dec 28, 2013 at 10:39:30AM -0200, Marcos Douglas wrote:
>>> I understand. But if the major companies prefer to use C# or Java instead of Delphi, well, they don't care about Delphi compatibility. If they cared, why would they be leaving Delphi?
>>
>> If they leave Delphi compatibility, they normally don't go for a marginal OSS compiler.
>
> So you're saying that FPC cannot survive without Delphi?

Survive is a big word, and very black and white. Not sustainable at the current level with a serious hope of continued growth? Probably.

--
___
Lazarus mailing list
Lazarus@lists.lazarus.freepascal.org
http://lists.lazarus.freepascal.org/mailman/listinfo/lazarus
Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?
On Sat, Dec 28, 2013 at 02:09:58PM +0100, Florian Klämpfl wrote:
>>> marginal OSS compiler.
>>
>> So you're saying that FPC cannot survive without Delphi?
>
> Define survive. But I am indeed saying that FPC's usage would drop significantly if Delphi were no longer around. For a few years it might increase because people would use FPC to rescue old sources, but after

IMHO that effect has always been exaggerated. If it happens at all, it doesn't really lead to growth in FPC/Lazarus, only to people using the finished product in silence.
Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?
On Fri, Dec 27, 2013 at 9:07 PM, Graeme Geldenhuys gra...@geldenhuys.co.uk wrote:
> On 2013-12-27 22:49, Marco van de Voort wrote:
>> Just look at e.g. the forums. All people are asking about Delphi packages.
>
> And once those Delphi packages are ported to Free Pascal, nobody needs Delphi any more. ;-)

Touché.

>> Point is if you make conversion harder PEOPLE WILL NOT EVEN TRY!
>
> Converting to Free Pascal and Lazarus will *always* be easier than rewriting everything in C# or Java - no matter how many incompatibilities Free Pascal might have with Delphi. The language still stays a lot more similar than the alternative.

I totally agree.

> Yet, looking at the current employment market, it seems most companies opted to rewrite their Delphi projects in C# and Java - so they took the even harder route! Why? Probably due to more innovation happening in those other languages.

...most companies opted to rewrite their Delphi projects in C# and Java...
I agree. I see this here in Brazil too. So, if companies prefer to rewrite everything in another language, this is further proof that people do not want compatibility with Delphi (so much).

Regards,
Marcos Douglas
Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?
On 2013-12-27 23:55, Marco van de Voort wrote:
> Somehow there is still more Delphi use than Lazarus, so I'll bounce back the statistics request.

Umm, I never quoted any stats. Free Pascal and Lazarus projects are like Linux - there is NO reliable way of tracking usage. So at best, any usage claims or stats are just a best guess, nothing more.

Regards,
  - Graeme -

--
fpGUI Toolkit - a cross-platform GUI toolkit using Free Pascal
http://fpgui.sourceforge.net/
Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?
On 28.12.2013 11:01, Marcos Douglas wrote:
>> incompatibilities Free Pascal might have with Delphi. The language still stays a lot more similar than the alternative. Yet, looking at the current employment market, it seems most companies opted to rewrite their Delphi projects in C# and Java - so they took the even harder route! Why? Probably due to more innovation happening in those other languages.

I am pretty sure this is not the case. It is a story of “No one ever got fired for buying IBM.”

> ...most companies opted to rewrite their Delphi projects in C# and Java...
> I agree. I see this here in Brazil too. So, if companies prefer to rewrite everything in another language, this is further proof that people do not want compatibility with Delphi (so much).

And you think they would instead switch to some marginal OSS language which is compatible with nothing and which nobody knows? C# and Java are used because they have a huge user base (users in the sense of programmers who know them) and are developed by huge companies, so people expect their code base to have a future.
Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?
On 28.12.2013 13:02, Marcos Douglas wrote:
> On Sat, Dec 28, 2013 at 9:41 AM, Florian Klämpfl flor...@freepascal.org wrote:
>> On 28.12.2013 11:01, Marcos Douglas wrote:
>>> [...]
>>> So, if companies prefer to rewrite everything in another language, this is further proof that people do not want compatibility with Delphi (so much).
>>
>> And you think they would instead switch to some marginal OSS language which is compatible with nothing and which nobody knows? C# and Java are used because they have a huge user base (users in the sense of programmers who know them) and are developed by huge companies, so people expect their code base to have a future.
>
> I understand. But if the major companies prefer to use C# or Java instead of Delphi, well, they don't care about Delphi compatibility. If they cared, why would they be leaving Delphi?

If they leave Delphi compatibility, they normally don't go for a marginal OSS compiler.
Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?
On Sat, Dec 28, 2013 at 10:19 AM, Florian Klämpfl flor...@freepascal.org wrote:
> On 28.12.2013 13:02, Marcos Douglas wrote:
>> On Sat, Dec 28, 2013 at 9:41 AM, Florian Klämpfl flor...@freepascal.org wrote:
>>> On 28.12.2013 11:01, Marcos Douglas wrote:
>>>> [...]
>>>> So, if companies prefer to rewrite everything in another language, this is further proof that people do not want compatibility with Delphi (so much).
>>>
>>> And you think they would instead switch to some marginal OSS language which is compatible with nothing and which nobody knows? C# and Java are used because they have a huge user base (users in the sense of programmers who know them) and are developed by huge companies, so people expect their code base to have a future.
>>
>> I understand. But if the major companies prefer to use C# or Java instead of Delphi, well, they don't care about Delphi compatibility. If they cared, why would they be leaving Delphi?
>
> If they leave Delphi compatibility, they normally don't go for a marginal OSS compiler.

So you're saying that FPC cannot survive without Delphi?

Marcos Douglas
Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?
On Sat, Dec 28, 2013 at 10:37 AM, Jürgen Hestermann juergen.hesterm...@gmx.de wrote:
> On 2013-12-28 13:19, Florian Klämpfl wrote:
>>> I understand. But if the major companies prefer to use C# or Java instead of Delphi, well, they don't care about Delphi compatibility. If they cared, why would they be leaving Delphi?
>>
>> If they leave Delphi compatibility, they normally don't go for a marginal OSS compiler.
>
> The question is: why did they use Delphi at all before? If the reason was that Delphi was a very common and widespread programming environment, then it is understandable behaviour to move to the next mainstream environment as soon as budget and time allow. Such people would never care about FPC/Lazarus (even if it were fully Delphi compatible). They would never think about using it. So making FPC/Lazarus compatible would not retain any users from this group.
> If the reason was that they like Pascal as an easy-to-learn and easy-to-maintain language, then they will invest in migration even if not all parts are identical to Delphi. Just the opposite: they may like that not all misconceptions are repeated in FPC/Lazarus, and they may like that it is open source.

+1

Marcos Douglas
Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?
On 28.12.2013 13:39, Marcos Douglas wrote:
>> If they leave Delphi compatibility, they normally don't go for a marginal OSS compiler.
>
> So you're saying that FPC cannot survive without Delphi?

Define survive. But I am indeed saying that FPC's usage would drop significantly if Delphi were no longer around. For a few years it might increase because people would use FPC to rescue old sources, but after that FPC's usage would probably decay significantly.
Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?
On Sat, Dec 28, 2013 at 11:09 AM, Florian Klämpfl flor...@freepascal.org wrote:
> On 28.12.2013 13:39, Marcos Douglas wrote:
>>> If they leave Delphi compatibility, they normally don't go for a marginal OSS compiler.
>>
>> So you're saying that FPC cannot survive without Delphi?
>
> Define survive.

To remain alive or in existence.

> But I am indeed saying that FPC's usage would drop significantly if Delphi were no longer around. For a few years it might increase because people would use FPC to rescue old sources, but after that FPC's usage would probably decay significantly.

Well, this is very frustrating... and even more so because it was you, the main FPC developer, who wrote it. :-(

Regards,
Marcos Douglas
Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?
Flávio Etrusco wrote:
>> Yes, but we all know you are a special case :-)
>> Point is if you make conversion harder PEOPLE WILL NOT EVEN TRY!
>
> I tend to agree with Graeme on this one.
> -Flávio

My opinion as well, and it's already hard at this moment, as generics and string/UTF-8 handling are the first problems people will encounter when moving XE* code to FPC.
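To illustrate the generics half of that remark, here is a minimal sketch, assuming FPC's default objfpc mode and its fgl unit, of how a generic list is instantiated with FPC syntax. Delphi XE code would write the type as TList<Integer> instead; FPC accepts that Delphi-style form when a unit is compiled in Delphi mode. TIntList and SumOfList are hypothetical names used only for this example.

```pascal
program GenericsSyntax;

{$mode objfpc}{$H+}

uses
  fgl; // FPC's lightweight generic containers (TFPGList and friends)

type
  // objfpc mode: a generic type is instantiated with the "specialize" keyword.
  // Delphi-style code writes TList<Integer> directly; FPC accepts that form
  // when the unit is compiled in Delphi mode.
  TIntList = specialize TFPGList<Integer>;

// Fills a specialized list from an open array and sums its items.
function SumOfList(const Values: array of Integer): Integer;
var
  List: TIntList;
  I: Integer;
begin
  List := TIntList.Create;
  try
    for I := Low(Values) to High(Values) do
      List.Add(Values[I]);
    Result := 0;
    for I := 0 to List.Count - 1 do
      Inc(Result, List[I]);
  finally
    List.Free;
  end;
end;

begin
  WriteLn(SumOfList([1, 2, 3, 4])); // prints 10
end.
```

The container code itself is ordinary Object Pascal; in a port, the syntax of the instantiation line (and the unit that provides the container) is usually what has to change.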
Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?
On Sat, Dec 28, 2013 at 2:35 PM, Marius fpclaza...@home.nl wrote:
> Flávio Etrusco wrote:
>>> Yes, but we all know you are a special case :-)
>>> Point is if you make conversion harder PEOPLE WILL NOT EVEN TRY!
>>
>> I tend to agree with Graeme on this one.
>> -Flávio
>
> My opinion as well, and it's already hard at this moment, as generics and string/UTF-8 handling are the first problems people will encounter when moving XE* code to FPC.

Oops, I misquoted. Graeme actually replied to that opinion with this:

  Converting to Free Pascal and Lazarus will *always* be easier than rewriting everything in C# or Java - no matter how many incompatibilities Free Pascal might have with Delphi. The language still stays a lot more similar than the alternative. Yet, looking at the current employment market, it seems most companies opted to rewrite their Delphi projects in C# and Java - so they took the even harder route! Why? Probably due to more innovation happening in those other languages.

IOW, I think that for some years already, innovation would have been a much better selling point for Free Pascal than Delphi compatibility.

-Flávio
Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?
Sven Barth wrote:
On 26.12.2013 17:02, Sven Barth wrote:
On 26.12.2013 12:30, Hans-Peter Diettrich drdiettri...@aol.com wrote:
Sven Barth wrote:
On 26.12.2013 02:19, Hans-Peter Diettrich drdiettri...@aol.com wrote:

Please specify AnsiString, of which encoding? When I concatenate an AnsiString and a UTF8String and assign the result to an OEMString

  o := a + u;

then I get these warnings in XE:

  [DCC Warning] ConcTest.dpr(20): W1057 Implicit string cast from 'AnsiString' to 'string'
  [DCC Warning] ConcTest.dpr(20): W1057 Implicit string cast from 'UTF8String' to 'string'
  [DCC Warning] ConcTest.dpr(20): W1058 Implicit string cast with potential data loss from 'string' to 'OEMString'

I cannot see the system code page used here.

Try to make o of type RawByteString. And maybe also use more than two strings.

As I already stated: RawByteString is not for application use!

Ok, I didn't remember the situation correctly. When searching for Jonas' mail I mentioned below, I also found this, which is what I was referring to:

=== quote of Jonas begin ===
  var
    mypath: utf8string;
    sr: tsearchrec;
  begin
    { assign some utf-8 string to mypath }
    if findfirst(mypath+allfilesmask,faAnyFile,sr)=0 then
      begin
        ...
      end;
  end

Delphi has no problem with this code, because all strings are upgraded to UnicodeString. If the DefaultSystemCodePage is something other than UTF-8, the result of mypath+allfilesmask will be downgraded to DefaultSystemCodePage because the string constant allfilesmask is encoded using that code page.

Delphi has no rule of downgrading. When mypath+allfilesmask is assigned to a variable, the result has the correct encoding, not necessarily CP_ACP.

This happens because the rule that concatenating ansistrings with different encodings results in an ansistring with the encoding of the destination ansistring is followed, and the destination ansistring here is a rawbytestring (the first argument of findfirst), in which case the ansi encoding is used.

Again: RawByteString is a mess and should be used with care. The first argument of FindFirst (file mask) certainly *can not* be a RawByteString.
=== quote of Jonas end ===

What I want to point out are the string function overloads, where Delphi supplies only string (UTF-16) and RawByteString arguments, and AnsiString(CP_ACP) in unit AnsiStrings. FPC could add UTF8String overloads and use these when dealing with AnsiStrings of an encoding different from CP_ACP.

That was already discussed some time ago among the devs and was deemed not usable by Jonas. I'll try to find his mail with his explanation.

=== quote of Jonas begin ===
Adding explicitly named UTF-8 versions of routines with constant or value rawbytestring arguments (FindFirstUTF8 etc.) with UTF8String arguments that internally simply call through to the rawbytestring versions could perhaps be useful. Interestingly, Lazarus users probably won't suffer from this particular problem, as they already use such routines from the LCL, and those routines can be adapted by simply removing all the UTF8ToSys calls (they will keep working in their current state though; they simply keep suffering from the same data loss issues they had before).
=== quote of Jonas end ===

I see no argument for or against UTF-8 overloads here.

Please note that Jonas states that differently named overloads would be needed. Equally named UTF8String overloads won't necessarily work correctly. You see the need for making RawByteString compiler magic? :-]

It should be used only as the last resort, when no other string type matches a given string encoding.
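A minimal sketch, assuming FPC 3.0 or later (where AnsiString values carry a code page, i.e. the "strings with Encoding" under discussion), of how the lossy downgrade Jonas describes can be sidestepped by doing explicitly what Delphi does implicitly: convert the operands to UnicodeString, concatenate, and convert once at the end. JoinMaskUTF8 and MaskW are hypothetical names; MaskW merely stands in for allfilesmask.

```pascal
program ConcatUpgrade;

{$mode objfpc}{$H+}

uses
  {$IFDEF UNIX}cwstring,{$ENDIF} // widestring manager for code page conversions on Unix
  SysUtils;

const
  // Hypothetical stand-in for the RTL's allfilesmask constant
  MaskW: UnicodeString = '*';

// Concatenate in UTF-16 first (what Delphi does implicitly), then convert
// once to UTF-8. No intermediate result passes through DefaultSystemCodePage,
// so characters outside the ANSI code page survive.
function JoinMaskUTF8(const Path: UTF8String): UTF8String;
var
  Joined: UnicodeString;
begin
  Joined := UnicodeString(Path) + MaskW;
  Result := UTF8String(Joined);
end;

var
  Dir, Mask: UTF8String;
begin
  // 'dir' + U+00E4 + '/', built without depending on the source file encoding
  Dir := UTF8String(UnicodeString('dir') + WideChar($00E4) + UnicodeString('/'));
  Mask := JoinMaskUTF8(Dir);
  WriteLn('code page: ', StringCodePage(Mask)); // 65001 (CP_UTF8)
  WriteLn('bytes:     ', Length(Mask));         // 7: 'dir' + 2-byte U+00E4 + '/' + '*'
end.
```

This trades one extra conversion for predictability; it does not settle which overloads the RTL itself should offer.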
As for FindFirst, a choice of encoding for the mask string exists only on Windows, depending on whether the A or W API is used. Other targets have a dedicated encoding for file names, which should be used in all file and directory functions. Even on Windows only the W API should be used nowadays; the A API (as used in older Delphi versions) was only needed to support legacy Win9x systems, where not all W versions of the functions were available.

DoDi
Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?
Juha Manninen wrote:
> It happened again. The word Unicode was mentioned and the result is an endless debate about how it should be done. Now 100 messages and counting ...

Now that strings with an encoding are in pre-release, the debate enters a whole new round.

> I personally don't care much what the default encoding will be, but I wonder how easy it will be to use UTF-8 for my employer's code.
> The situation with FPC will be better than with Delphi because FPC does not ALWAYS convert automatically to the default encoding. It only converts when the conversion is needed. For example, TStringList can be used for UTF8Strings and it does not trigger automatic conversion. Isn't it so? Please correct me if I still got it wrong.

That's the old state, where strings have no stored encoding. As soon as AnsiStrings have an encoding, the default encoding becomes important for reducing automatic conversions. When the RTL is converted to UTF-16, you'll have to accept either this new default encoding or any number of automatic conversions between AnsiStrings and UnicodeStrings.

> It means UTF-8 with FPC will be easier than UTF-8 with Delphi, even if UTF-16 were the default.

Delphi suffers from the use of CP_ACP, which was the only supported encoding before, and still is the only explicitly supported encoding when the AnsiStrings unit is used. In Lazarus we had the same "only one encoding" philosophy, except that here the default string type is UTF-8. With encoded AnsiStrings, the problem of other encodings and automatic conversions arises. Delphi solved most problems by changing string to UTF-16, so that only forced use of AnsiString will ever result in automatic conversions due to different string encodings.

In FPC/Lazarus the situation is somewhat different, because now the default string type could be UTF-8, UTF-16 or even CP_ACP, with a number of users voting for each of them. Technically the simplest solution would be to keep the de-facto standard UTF-8, as assumed by Lazarus. But when string becomes UTF-16, as in recent Delphi versions, Lazarus and the LCL will need heavy refactoring. That's the top discussion topic right now.

DoDi
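A minimal sketch of the point about automatic conversions, assuming FPC 3.0 or later, where AnsiString carries a code page: TStringList stores plain (CP_ACP) AnsiStrings, so whether a UTF8String added to it gets converted depends on the process-wide default code page. Here that default is set to UTF-8 with SetMultiByteConversionCodePage, so the round trip stays lossless even for a character (U+0416) that most ANSI code pages cannot represent. RoundTripThroughList is a hypothetical name.

```pascal
program StringListRoundTrip;

{$mode objfpc}{$H+}

uses
  {$IFDEF UNIX}cwstring,{$ENDIF} // widestring manager for code page conversions on Unix
  Classes, SysUtils;

// Stores a UTF-8 string in a TStringList (whose items are plain AnsiString)
// and reads it back as UTF-16 for comparison.
function RoundTripThroughList(const U: UnicodeString): UnicodeString;
var
  List: TStringList;
begin
  List := TStringList.Create;
  try
    List.Add(UTF8String(U));          // may be converted to the list's declared ANSI code page
    Result := UnicodeString(List[0]); // converted back using the item's stored code page
  finally
    List.Free;
  end;
end;

var
  Sample: UnicodeString;
begin
  // Make the process-wide ANSI code page UTF-8, so that no lossy conversion
  // can happen when UTF-8 data enters an AnsiString container.
  SetMultiByteConversionCodePage(CP_UTF8);
  Sample := UnicodeString('Cyrillic Zhe: ') + WideChar($0416);
  WriteLn(RoundTripThroughList(Sample) = Sample); // TRUE
end.
```

Choosing the default code page once, up front, is one way to keep UTF-8 data flowing through AnsiString-based classes without per-call conversions; it is a process-wide setting, so it affects every AnsiString conversion in the program.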
Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?
Marco van de Voort wrote:
> On Thu, Dec 26, 2013 at 12:28:54AM +0100, Hans-Peter Diettrich wrote:
>>> is dangerous if they are not all the same encoding. If there is any mismatch, it will be converted down to default encoding.
>>
>> Then the implementation is wrong.
>
> Wrong according to you.

Wrong, or better *broken*, with regard to expected results.

> Not wrong according to defined Microsoft applications.

Where do you see Microsoft applications using Ansi strings nowadays?

> This way of top-down thinking will turn FPC into a Java, where you are lugging along your own platform-within-a-platform everywhere.

That's what FPC and Lazarus do already: until now, they assume a UTF-8 environment. That's okay for all targets except Windows, where a UTF-8/UTF-16 conversion is required at the app/WinAPI boundary.

DoDi
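A minimal sketch of "convert once at the app/WinAPI boundary", assuming FPC 3.0 or later, whose SysUtils provides UnicodeString overloads of file functions such as FileExists. The application keeps its file names in UTF-8 and hands the RTL a UTF-16 string only at the call. FileExistsUTF8Sketch is a hypothetical helper, not the LCL routine with a similar name.

```pascal
program ApiBoundary;

{$mode objfpc}{$H+}

uses
  {$IFDEF UNIX}cwstring,{$ENDIF} // widestring manager for UTF-8 <-> UTF-16 on Unix
  SysUtils;

// Hypothetical helper: file names live as UTF-8 inside the application and
// are converted once, right at the OS boundary. The UnicodeString overload
// of FileExists is used so that, on Windows, the RTL can presumably go
// through the W (UTF-16) API instead of the A (ANSI code page) API.
function FileExistsUTF8Sketch(const FileName: UTF8String): Boolean;
begin
  Result := FileExists(UnicodeString(FileName));
end;

const
  TempName = 'api_boundary_demo.tmp';

begin
  FileClose(FileCreate(TempName));         // create an empty file
  WriteLn(FileExistsUTF8Sketch(TempName)); // TRUE
  DeleteFile(TempName);
  WriteLn(FileExistsUTF8Sketch(TempName)); // FALSE
end.
```

Keeping the conversion in one thin wrapper means the rest of the code never sees UTF-16, which is the shape of the boundary DoDi describes.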
Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?
On Thu, Dec 26, 2013 at 04:48:34PM +0200, Juha Manninen wrote:
> It happened again. The word Unicode was mentioned and the result is an endless debate about how it should be done. Now 100 messages and counting ...

This is because nothing definitive has been chosen yet, after 4+ years of discussion (this started in April 2009, when the first D2009 details were leaked).

It was a situation I hoped to avoid by going directly for two encodings per target. Or at least try to.
Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?
On Thu, Dec 26, 2013 at 11:53:38PM -0200, Marcos Douglas wrote:
>> If you totally drop Delphi compatibility you can do whatever you want. But IMHO that is more something for the Graemes and Martins (MSEGUI) of the world, not Lazarus.
>
> Ok... but if FPC, on Windows, will be UTF-16 and Lazarus continues using UTF-8, what is the difference?

Well, currently Lazarus has no other choice, since the Unicode FPC is not ready (only up to the Classes level). This alone means that the current uncertainty will persist for at least several years.

> This approach is not like Delphi. It has the RTL and VCL using the same encoding... FPC RTL and LCL will continue fighting! :(

I always considered the UTF-8 choice of Lazarus a temporary solution until FPC caught up with Delphi. The current situation really worries me, since at work I invested in FPC/Lazarus on the assumption that compatibility would increase, not decrease.
Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?
On 2013-12-27 09:16, Hans-Peter Diettrich wrote:
> But when string becomes UTF-16, as in recent Delphi versions, Lazarus and the LCL will need heavy refactoring. That's the top discussion topic right now.

Personally, I think FPC and Lazarus should get rid of "string" altogether! It should be a user-definable type that can be defined per project. E.g. projects could do the following:

  type
  {$IFDEF WINDOWS}
    UnicodeString = UTF16String;
  {$ENDIF}
  {$IFDEF UNIX}
    UnicodeString = UTF8String;
  {$ENDIF}
    String = UnicodeString;
    // or for backwards compatibility with old projects:
    // String = AnsiString;

Or they could simply say they prefer to work with a specific encoding, and use UTF16String or UTF8String directly. Then no alias type is needed.

Also, the very broken logic of UnicodeString = UTF16String should disappear. Unicode Standard <> UTF-16!!! That is just some sh*t Microsoft came up with and Delphi followed suit! It's just WRONG. The Unicode Standard consists of multiple encodings, not just one.

But that's just my 2c worth - and arguments like these have been raised before.

Regards,
  - Graeme -
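Graeme's point that UTF-16 is only one of several Unicode encoding forms can be made concrete. A minimal sketch, assuming FPC 3.0 or later: the same code point, U+1F600, takes two UTF-16 code units (a surrogate pair) but four UTF-8 bytes. TAppString is a hypothetical per-project alias in the spirit of the proposal above; it does not redefine string, which is a reserved word, and GrinningFace is likewise a hypothetical name.

```pascal
program EncodingForms;

{$mode objfpc}{$H+}

uses
  {$IFDEF UNIX}cwstring,{$ENDIF} // widestring manager for UTF-8 <-> UTF-16 on Unix
  SysUtils;

type
  // Hypothetical per-project alias; a name other than "string" is used
  // because string is a reserved word.
  {$IFDEF WINDOWS}
  TAppString = UnicodeString;
  {$ELSE}
  TAppString = UTF8String;
  {$ENDIF}

// U+1F600 written as its UTF-16 surrogate pair D83D DE00.
function GrinningFace: UnicodeString;
begin
  SetLength(Result, 2);
  Result[1] := WideChar($D83D);
  Result[2] := WideChar($DE00);
end;

var
  U16: UnicodeString;
  U8: UTF8String;
  App: TAppString;
begin
  U16 := GrinningFace;
  U8 := UTF8String(U16);
  App := U16;
  WriteLn('UTF-16 code units: ', Length(U16)); // 2
  WriteLn('UTF-8 bytes:       ', Length(U8));  // 4
  WriteLn('TAppString units:  ', Length(App)); // 2 on Windows, 4 elsewhere
end.
```

Note that Length counts code units of whichever encoding form the string uses, not characters, so neither form gives "one element per character" for code points outside the Basic Multilingual Plane.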
Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?
On 2013-12-27 12:21, Marco van de Voort wrote:
> The current situation really worries me, since at work I invested in FPC/Lazarus on the assumption that compatibility would increase, not decrease.

I think that's the root cause of the discussion: some want to make FPC/Lazarus into a (possibly exact) clone of Delphi (which means following every sh*t that is and will be put into that product), and others (like me) hope for a more Pascal-like programming environment that at least avoids future obscurities (and maybe even removes existing ones) that crept into Pascal with Borland/Embarcadero. Ease of use was the reason for the success of Turbo Pascal, but meanwhile this goal has been put aside and it is becoming a more C-like environment (with lots of ugly hacks...).

On the other hand, I understand the demand for compatibility if there is a lot of (Delphi) code that needs to be reused and cannot be changed easily.
Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?
On 2013-12-27 17:11, Jürgen Hestermann wrote:
> Some want to make FPC/Lazarus into a (possibly exact) clone of Delphi (which means following every sh*t that is and will be put into that product), and others (like me) hope for a more Pascal-like programming environment that at least avoids future obscurities (and maybe even removes existing ones) that crept into Pascal with Borland/Embarcadero.

Yeah, and I wonder what the plans are for the Free Pascal project, now that Embarcadero is changing the compiler and language - specifically for mobile platforms. EMBT is introducing even more sh*t - as you put it. ;-) Enhancements for desktop or web development are getting very little attention from EMBT these days.

Regards,
  - Graeme -
Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?
On 2013-12-27 13:42, Marco van de Voort wrote:
> Please read 4 years of discussion backlog. It is not all language, there is something called libraries, and they are generally installed precompiled.

I don't know of a single ISV that ships precompiled *.ppu files for FPC or Lazarus. They all include source code (yes, even most Delphi ISVs do this now) that can be compiled with a project.

Regards,
  - Graeme -
Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?
On 27/12/2013 18:34, Graeme Geldenhuys wrote:
> On 2013-12-27 13:42, Marco van de Voort wrote:
>> Please read 4 years of discussion backlog. It is not all language, there is something called libraries, and they are generally installed precompiled.
>
> I don't know of a single ISV that ships precompiled *.ppu files for FPC or Lazarus. They all include source code (yes, even most Delphi ISVs do this now) that can be compiled with a project.

raudus, see e.g. http://forum.lazarus.freepascal.org/index.php/topic,22059.0.html
Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?
On Fri, Dec 27, 2013 at 05:34:33PM +, Graeme Geldenhuys wrote:
>> Please read 4 years of discussion backlog. It is not all language, there is something called libraries, and they are generally installed precompiled.
>
> I don't know of a single ISV that ships precompiled *.ppu files for FPC or Lazarus.

What about FPC itself? Binary downloads outnumber source downloads 20-30 to 1.
Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?
On Fri, Dec 27, 2013 at 06:11:31PM +0100, Jürgen Hestermann wrote:
> I think that's the root cause of the discussion: some want to make FPC/Lazarus into a (possibly exact) clone of Delphi (which means following every sh*t that is and will be put into that product)

I always find it amusing that when you talk about compatibility, its proponents are always described as brainless drones that can't think for themselves.

Please come back after you have fixed 700 bugs.
Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?
On Fri, Dec 27, 2013 at 3:38 PM, Graeme Geldenhuys gra...@geldenhuys.co.uk wrote:
> On 2013-12-27 17:11, Jürgen Hestermann wrote:
>> Some want to make FPC/Lazarus into a (possibly exact) clone of Delphi (which means following every sh*t that is and will be put into that product), and others (like me) hope for a more Pascal-like programming environment that at least avoids future obscurities (and maybe even removes existing ones) that crept into Pascal with Borland/Embarcadero.
>
> Yeah, and I wonder what the plans are for the Free Pascal project, now that Embarcadero is changing the compiler and language - specifically for mobile platforms. EMBT is introducing even more sh*t - as you put it. ;-) Enhancements for desktop or web development are getting very little attention from EMBT these days.

Hmm... Maybe Martin Schreiber saw a dark future and has already taken the initiative to keep the true legacy alive. :-)

Marcos Douglas
Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?
On Fri, Dec 27, 2013 at 06:39:38PM -0200, Marcos Douglas wrote:
> If we continue to follow Delphi, it means that we are always one step behind.

If we stop following Delphi, we are multiple steps behind.

Most of the antis only agree on being anti; they only have simplistic top-down proposals and talk about an elusive "own way".

Fact is that the extensions of FPC are used much less than the Delphi compatibility aspect.
Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?
On 2013-12-27 21:12, Marco van de Voort wrote:
> Fact is that the extensions of FPC are used much less than the Delphi compatibility aspect.

And would you mind sharing how you came to that conclusion? Can you share the data from that research?

From my personal experience of using FPC since 2004-2005, there is NO need to support two compilers (Delphi and FPC) in a single project. Free Pascal is more than capable of standing on its own feet. I speak as someone who has personally ported large Delphi and Kylix projects not only to Free Pascal and Lazarus's LCL, but also to fpGUI - a completely VCL-incompatible UI toolkit. I have done this for multiple projects, frameworks and GUI widgets. My conclusion after all this: a conversion is NOT THAT HARD, and it's a great time to review old code too. So it has a triple positive: moving to a real cross-platform compiler, moving to a real cross-platform toolkit (be that LCL or fpGUI), and being able to review and improve old code and designs (your second attempt at software is ALWAYS better than your first).

Many projects have moved away from Delphi in the last few years - mostly to other languages like C# or Java. That requires a total rewrite - which is infinitely more work than moving to Free Pascal.

Regards,
  - Graeme -
Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?
On Fri, Dec 27, 2013 at 10:43:16PM +, Graeme Geldenhuys wrote:
>> Fact is that the extensions of FPC are used much less than the Delphi compatibility aspect.
>
> And would you mind sharing how you came to that conclusion? Can you share the data from that research?

Just look at e.g. the forums. All people are asking about Delphi packages.

> From my personal experience of using FPC since 2004-2005, there is NO need to support two compilers (Delphi and FPC) in a single project

Yes, but we all know you are a special case :-)

Point is, if you make conversion harder, PEOPLE WILL NOT EVEN TRY!
Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?
On Friday, 27 December 2013, Marcos Douglas wrote: I think the Lazarus team did not think the same. The Lazarus team has not thought about it much. The question is not acute yet and we already have a working Unicode system in LCL. Thinking of it now when
Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?
On 2013-12-27 22:49, Marco van de Voort wrote: Just look on e.g. the forums. All people are asking about Delphi packages. And once those Delphi packages are ported to Free Pascal, nobody needs Delphi any more. ;-) Point is if you make conversion harder PEOPLE WILL NOT EVEN TRY! Converting to Free Pascal and Lazarus will *always* be easier than rewriting everything in C# or Java, no matter how many incompatibilities Free Pascal might have with Delphi. The language still stays a lot more similar than the alternative. Yet, looking at the current employment market, it seems most companies opted to rewrite their Delphi projects in C# and Java, so they took the even harder route! Why? Probably due to more innovation happening in those other languages. Regards, - Graeme -
Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?
The Lazarus team has not thought about it much. The question is not acute yet and we already have a working Unicode system in LCL. Thinking about it now, while FPC's behavior is not decided, would be a waste of time. My comment about UTF-8 was based on poor knowledge and maybe wishful thinking. Some other Lazarus developers have more knowledge of the issue but they are wise enough not to enter this discussion. Let's wait till the FPC developers get their work done and then discuss the LCL. If you need a UTF-8 solution right now, it is possible with the LCL. It is also good to read the old mail threads because the same things get repeated again and again. Juha P.S. Learning to type on an iPad. I sent an unfinished text.
Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?
Marco van de Voort wrote: On Fri, Dec 27, 2013 at 06:39:38PM -0200, Marcos Douglas wrote: If we continue to follow Delphi, it means that we are always one step behind. If we stop following Delphi, we are multiple steps behind. FPC/Lazarus always was in front of Delphi. It has had Unicode support (UTF-8) for a long time, and supports multiple widgetsets, platforms and machines. Even the help system is better and more user friendly, as are the editing helpers. The compatibility problems are self-made, IMO. Compatibility with all versions of a continuously moving target is near impossible, or at least not feasible with the available manpower. Now that Delphi has introduced something really useful (encoded strings and automatic conversion), the new Unicode support should be integrated into FPC and Lazarus. When this works for the AnsiString version (UTF-8), somewhat compatible with D7, a UnicodeString (UTF-16) version can be considered, compatible with some newer Delphi version. DoDi
Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?
On Fri, Dec 27, 2013 at 11:07:02PM +, Graeme Geldenhuys wrote: Just look on e.g. the forums. All people are asking about Delphi packages. And once those Delphi packages are ported to Free Pascal, nobody needs Delphi any more. ;-) Somehow there is still more Delphi use than Lazarus, so I'll bounce back the statistics request. Point is if you make conversion harder PEOPLE WILL NOT EVEN TRY! Converting to Free Pascal and Lazarus will *always* be easier than rewriting everything in C# or Java, no matter how many incompatibilities Free Pascal might have with Delphi. The language still stays a lot more similar than the alternative. Yet, looking at the current employment market, it seems most companies opted to rewrite their Delphi projects in C# and Java, so they took the even harder route! Why? Probably due to more innovation happening in those other languages. Not really, it is simply vendors pushing it, bundling it with their products and giving the SDKs preferential treatment. Therefore I don't really think it is sane to compare FPC (or even Delphi) to C# and Java. Worse, one of the motivators, web frameworks, often needs server-side support, and getting into the ISPs' portfolios is hard, especially as a native language. But more importantly, whichever way you turn it, there are still way more new users coming from Delphi than from other sources (and I'm already being generous, since those other sources also include other Pascals). And I see the number of _knowledgeable_ users from old Delphi decreasing, and the more able people also working with /new/ Delphi (and e.g. testing Lazarus to see if they can get a subset running on some other target). This trend will only increase, and it follows the same pattern as the TP to Delphi migration years ago. Still, a definitive switch is some time off, since most OSS projects and vendors still support D7 (support for versions before that is getting scarce). But that is now.
Decisions for FPC/Lazarus NG will only come to fruition in 2-3 years. I think the D7 installed base will erode, but only very slowly. But the /activities/ of that installed base, and their investments in new (D7 level) code will erode quicker. Again this prediction is based on the same pattern as with TP.
Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?
On Sat, Dec 28, 2013 at 12:42:22AM +0100, Hans-Peter Diettrich wrote: If we continue to follow Delphi, it means that we are always one step behind. If we stop following Delphi, we are multiple steps behind. FPC/Lazarus always was in front of Delphi. It has had Unicode support (UTF-8) for a long time, and supports multiple widgetsets, platforms and machines. It did not. It had half a solution, like the well-known TNT components. Even the help system is better and more user friendly, as are the editing helpers. There is some potential there, but the content is still subpar compared to Delphi, especially the Lazarus/LCL part. Way subpar. The compatibility problems are self-made, IMO. Compatibility with all versions of a continuously moving target is near impossible, or at least not feasible with the available manpower. I don't see this at all. Yes, it is hard. Yes, it will be at a distance. But I don't see impossible. Also, major change is fairly rare. But the Unicode change has been a done deal for 6 versions. This is not about the bleeding edge, this is about planning steps that Embarcadero brought to production nearly 5 years ago, and which affect many levels of the code (more so than later additions, with the dotted unit name change being debatable). Now that Delphi has introduced something really useful (encoded strings and automatic conversion), the new Unicode support should be integrated into FPC and Lazarus. When this works for the AnsiString version (UTF-8), somewhat compatible with D7, a UnicodeString (UTF-16) version can be considered, compatible with some newer Delphi version. A utf8 ansistring version will by definition not be compatible.
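Marco's last point can be made concrete. D7-era code treats an AnsiString as one byte per character, and a UTF-8 AnsiString silently breaks that assumption. A minimal sketch in Python, used here only because its codecs make the byte counts easy to show. The sample word and the cp1252 code page are arbitrary assumptions.

```python
# Compare the same text in a single-byte Windows ANSI code page and in UTF-8.
word = "Müller"

ansi = word.encode("cp1252")  # D7-style AnsiString: one byte per character
utf8 = word.encode("utf-8")   # UTF-8: 'ü' needs two bytes

print(len(word), len(ansi), len(utf8))  # 6 6 7

# Byte indexing (Python index 1 is Pascal's s[2]) now returns half a character.
print(ansi[1:2].decode("cp1252"))  # ü
print(utf8[1:2])                   # b'\xc3'
```

Any D7 loop that walks `s[i]` and treats each byte as a whole character would split the two-byte `ü` in the UTF-8 case.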
Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?
On 26.12.2013 02:19, Hans-Peter Diettrich drdiettri...@aol.com wrote: Sven Barth wrote: If in 2.6.2 your three strings contain text of different encodings then the resulting string might be garbage from the user's POV. In trunk the encoding is part of each string and if they differ then each string will be converted to the default string encoding (defined by a global variable inside unit System) and thus the string might still be valid. If so, this flaw should be fixed immediately. Delphi uses lossless conversions, i.e. an up-cast to Unicode. No it does not. If the variables you concatenate are AnsiString and the variable or parameter you pass them to is AnsiString as well (AFAIK it even needs to be RawByteString) then the strings are converted to the system encoding before they are concatenated and passed. This is implemented in a Delphi-compatible way in FPC. Such problems can be avoided by making RawByteString a compiler magic that enforces a Unicode conversion whenever AnsiStrings of a different dynamic encoding have to be combined. RawByteString is already as magical as it gets and is exactly what it says on the tin: a raw byte string. No automatic conversions ever. This is a type that is needed for implementing String handling in the RTL, so overloading it with another meaning will only result in problems. If you want UTF-8 encoded strings then use UTF8String. Period. Furthermore the use of UTF-8 will allow for lossless conversions of AnsiStrings of any encoding, with the result still being an AnsiString. Here Delphi has the problem that a RawByteString result type requires a conversion of an intermediate Unicode string (UTF-16) into an AnsiString(CP_ACP), with possible losses. This is not required when FPC treats UTF-8 as a fully supported encoding, in addition to CP_ACP. It would also be a strong argument for using UTF-8 for UnicodeString, *instead* of UTF-16.
The related functions already exist in the FPC libraries, they only have to take precedence over CP_ACP (if different). Then additional UTF-8/16 conversions are required only on Windows, when calling external (API...) functions which expect/return WideStrings. UnicodeString is *defined* as a reference-counted string of 2-byte characters. There will be no change there. Regards, Sven
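The disagreement above is about what happens when AnsiStrings with different code pages are concatenated. Sven describes a down-cast to the system code page, while Hans-Peter argues for going through Unicode. This is a simplified Python model, not the RTL's actual code. Each string is a hypothetical (bytes, code page) pair, cp1252 is assumed as the default system code page, and the destination is assumed to be one that receives that code page.

```python
# Hypothetical model: an AnsiString is a (payload bytes, code page) pair.
DEFAULT_SYSTEM_CODE_PAGE = "cp1252"  # assumption: a Western European Windows setup


def concat_downcast(*parts):
    """Rule Sven describes (simplified): if the code pages differ, every part
    is converted to the default system code page before concatenation."""
    pages = {cp for _, cp in parts}
    if len(pages) == 1:
        return b"".join(data for data, _ in parts), pages.pop()
    joined = b"".join(
        data.decode(cp).encode(DEFAULT_SYSTEM_CODE_PAGE, errors="replace")
        for data, cp in parts
    )
    return joined, DEFAULT_SYSTEM_CODE_PAGE


def concat_upcast(*parts):
    """Alternative Hans-Peter argues for: go through Unicode and keep the
    result in UTF-8, so no character can be lost."""
    joined = b"".join(data.decode(cp).encode("utf-8") for data, cp in parts)
    return joined, "utf-8"


a = ("Grüße ".encode("cp1252"), "cp1252")
u = ("Ελλάδα".encode("utf-8"), "utf-8")

down, _ = concat_downcast(a, u)
up, _ = concat_upcast(a, u)
print(down.decode("cp1252"))  # Grüße ??????
print(up.decode("utf-8"))     # Grüße Ελλάδα
```

In the real RTL, the target code page depends on the destination type. This sketch fixes it to the system code page to isolate the data-loss effect.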
Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?
On 2013-12-25 19:50, Marco van de Voort wrote: In short, I don't think fighting the native encoding of a target is worth the shallow appeal of the one-encoding-rules-all principle. That is mostly pushed by people that don't even use Windows, and thus won't feel the pain. This is not true! I am programming for Windows exclusively (currently) and still want UTF8 everywhere. UTF8 is the most useful encoding *and* Lazarus uses it *and* it is used in many other situations. And I don't want to think about encodings all the time. I want a single string type in my programs. Therefore I now need to write my own Windows interface unit because FPC does not provide Unicode file API functions. If the Windows unit of FPC migrates to Unicode soon I still cannot use it because it would use the foolish UTF16 string type, which I would still need to convert to UTF8. What an incredibly short-sighted decision. The unique opportunity to establish a single Unicode string encoding (UTF8) within the whole Free Pascal/Lazarus programming environment has been missed.
Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?
Sven Barth wrote: On 26.12.2013 02:19, Hans-Peter Diettrich drdiettri...@aol.com wrote: Sven Barth wrote: If in 2.6.2 your three strings contain text of different encodings then the resulting string might be garbage from the user's POV. In trunk the encoding is part of each string and if they differ then each string will be converted to the default string encoding (defined by a global variable inside unit System) and thus the string might still be valid. If so, this flaw should be fixed immediately. Delphi uses lossless conversions, i.e. an up-cast to Unicode. No it does not. If the variables you concatenate are AnsiString and the variable or parameter you pass them to is AnsiString as well (AFAIK it even needs to be RawByteString) then the strings are converted to the system encoding before they are concatenated and passed. This is implemented in a Delphi-compatible way in FPC. Please specify: AnsiString of which encoding? When I concat an AnsiString and a UTF8String and assign it to an OEMString o := a + u; then I get these warnings in XE: [DCC Warning] ConcTest.dpr(20): W1057 Implicit string cast from 'AnsiString' to 'string' [DCC Warning] ConcTest.dpr(20): W1057 Implicit string cast from 'UTF8String' to 'string' [DCC Warning] ConcTest.dpr(20): W1058 Implicit string cast with potential data loss from 'string' to 'OEMString' I cannot see the system codepage used here. What I want to point out are the string function overloads, where Delphi supplies only string (UTF-16) and RawByteString arguments, and AnsiString(CP_ACP) in unit AnsiStrings. FPC could add UTF8String overloads and use these when dealing with AnsiStrings of an encoding different from CP_ACP. Such problems can be avoided by making RawByteString a compiler magic that enforces a Unicode conversion whenever AnsiStrings of a different dynamic encoding have to be combined. RawByteString is already as magical as it gets and is exactly what it says on the tin: a raw byte string.
No automatic conversions ever. This is a type that is needed for implementing String handling in the RTL, so overloading it with another meaning will only result in problems. If you want UTF-8 encoded strings then use UTF8String. Period. Please understand that the use of RawByteString in Delphi can lead to strings with the wrong encoding. This type should not be available for declaring variables, only for parameters and function results. This restriction requires compiler magic. UnicodeString is *defined* as a reference-counted string of 2-byte characters. There will be no change there. Sorry, I meant the generic string type. DoDi
Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?
On Wed, Dec 25, 2013 at 10:43:24PM +0100, Jy V wrote: Sorry Marco, No problem. On Wed, Dec 25, 2013 at 6:15 PM, Marco van de Voort mar...@stack.nl wrote: There is no utf8 on Windows. One can try to mess with the defaultcodepage, but that will probably only force a different kind of problems. I cannot let you answer alone and make you appear as the only knowledgeable reference for this important subject; it looks like defining default code page 65001 for Windows makes it a perfect fit to handle UTF-8 http://msdn.microsoft.com/en-us/library/dd317756%28VS.85%29.aspx As the others already said, that is mostly a target for conversions (you still need from/to utf8 conversion if you use utf8 documents, thus the conversion routines support utf8). That is something other than Windows APIs (and stuff like MSXML, ADO etc) supporting utf8. utf8 proponents then often say "just set utf8 as the default encoding", but some limited experiments of mine created more problems than they solved. IOW it is always suggested as the solution, but few to no people did anything substantial with it. I know the command shell can crash if you chcp 65001. It is mostly a suggestion to end debate and get their way. I'm going to study the links posted in this thread (from the Microsoft guy) to get more info myself too.
Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?
It happened again. The word Unicode was mentioned and the result is an endless debate about how it should be done. Now 100 messages and counting ... I personally don't care much what the default encoding will be, but I wonder how easy it will be to use UTF-8 for my employer's code. The situation with FPC will be better than with Delphi because FPC does not ALWAYS convert automatically to the default encoding. It only converts when the conversion is needed. For example, TStringList can be used for UTF8Strings and it does not trigger automatic conversion. Isn't it so? Please correct me if I still got it wrong. It means UTF-8 with FPC will be easier than UTF-8 with Delphi, even if UTF-16 were the default. Juha
Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?
On Thu, Dec 26, 2013 at 12:28:54AM +0100, Hans-Peter Diettrich wrote: is dangerous if they are not all the same encoding. If there is any mismatch, it will be converted down to default encoding. Then the implementation is wrong. Wrong according to you. Not wrong according to defined Microsoft applications. This way of top-down thinking will turn FPC into a Java, where you are lugging along your own platform-within-a-platform everywhere. IMHO this is not desirable. There is no utf8 on Windows. Yep, that's why the Unicode (W) API should be used. No problem with UTF-8 strings there :-) If you totally drop Delphi compatibility you can do whatever you want. But IMHO that is more something for the Graemes and Martins (MSEGUI) of the world, not Lazarus.
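The W API route quoted above works because UTF-8 and UTF-16 encode the same code points. Converting between them at the API boundary costs CPU time but never data, whereas squeezing the same name into an ANSI code page does lose data. A Python sketch follows. The file name and cp1252 are illustrative assumptions, and Python's codecs stand in for the real Windows conversion functions.

```python
# Round trip UTF-8 -> UTF-16LE (the W API's wide-character format) -> UTF-8.
name = "Привет_日本_😀.txt"

utf8 = name.encode("utf-8")
utf16 = utf8.decode("utf-8").encode("utf-16-le")
back = utf16.decode("utf-16-le").encode("utf-8")
print(back == utf8)  # True: nothing lost

# Forcing the same name into a single-byte ANSI code page is lossy.
ansi = name.encode("cp1252", errors="replace")
print(ansi.decode("cp1252"))  # ??????_??_?.txt
```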
Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?
On 26.12.2013 12:30, Hans-Peter Diettrich drdiettri...@aol.com wrote: Sven Barth wrote: On 26.12.2013 02:19, Hans-Peter Diettrich drdiettri...@aol.com wrote: Please specify: AnsiString of which encoding? When I concat an AnsiString and a UTF8String and assign it to an OEMString o := a + u; then I get these warnings in XE: [DCC Warning] ConcTest.dpr(20): W1057 Implicit string cast from 'AnsiString' to 'string' [DCC Warning] ConcTest.dpr(20): W1057 Implicit string cast from 'UTF8String' to 'string' [DCC Warning] ConcTest.dpr(20): W1058 Implicit string cast with potential data loss from 'string' to 'OEMString' I cannot see the system codepage used here. Try to make o of type RawByteString. And maybe also use more than two strings. What I want to point out are the string function overloads, where Delphi supplies only string (UTF-16) and RawByteString arguments, and AnsiString(CP_ACP) in unit AnsiStrings. FPC could add UTF8String overloads and use these when dealing with AnsiStrings of an encoding different from CP_ACP. That was already discussed some time ago between devs and was deemed not usable by Jonas. I'll try to find his mail with his explanation. Regards, Sven
Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?
On 26.12.2013 17:02, Sven Barth wrote: On 26.12.2013 12:30, Hans-Peter Diettrich drdiettri...@aol.com wrote: Sven Barth wrote: On 26.12.2013 02:19, Hans-Peter Diettrich drdiettri...@aol.com wrote: Please specify: AnsiString of which encoding? When I concat an AnsiString and a UTF8String and assign it to an OEMString o := a + u; then I get these warnings in XE: [DCC Warning] ConcTest.dpr(20): W1057 Implicit string cast from 'AnsiString' to 'string' [DCC Warning] ConcTest.dpr(20): W1057 Implicit string cast from 'UTF8String' to 'string' [DCC Warning] ConcTest.dpr(20): W1058 Implicit string cast with potential data loss from 'string' to 'OEMString' I cannot see the system codepage used here. Try to make o of type RawByteString. And maybe also use more than two strings. Ok, I didn't remember the situation correctly. When searching for the mail from Jonas I mention below, I also found this, which is what I was referring to: === quote of Jonas begin === var mypath: utf8string; sr: tsearchrec; begin { assign some utf-8 string to mypath } if findfirst(mypath+allfilesmask,faAnyFile,sr)=0 then begin ... end; end If the DefaultSystemCodePage is something different from UTF-8, the result of mypath+allfilesmask will be downgraded to DefaultSystemCodePage because the string constant allfilesmask is encoded using that code page. This is because the rule "concatenating ansistrings with different encodings results in an ansistring with the encoding of the destination ansistring" is followed, and the destination ansistring is a rawbytestring here (the first argument of findfirst), in which case the ansi encoding is used. === quote of Jonas end === What I want to point out are the string function overloads, where Delphi supplies only string (UTF-16) and RawByteString arguments, and AnsiString(CP_ACP) in unit AnsiStrings.
FPC could add UTF8String overloads and use these when dealing with AnsiStrings of an encoding different from CP_ACP. That was already discussed some time ago between devs and was deemed not usable by Jonas. I'll try to find his mail with his explanation. === quote of Jonas begin === Adding explicitly named UTF-8 versions of routines with constant or value rawbytestring arguments (FindFirstUTF8 etc) with UTF8String arguments and that internally simply call through to the rawbytestring versions could perhaps be useful. Interestingly, Lazarus users probably won't suffer from this particular problem as they already use such routines from the LCL, and those routines can be adapted by simply removing all the UTF8ToSys calls (they will keep working in their current state though, they simply keep suffering from the same data loss issues they had before). === quote of Jonas end === Please note that Jonas states that differently named overloads would be needed. Equally named UTF8String overloads won't necessarily work correctly. Regards, Sven
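Jonas' two points can be modelled in a few lines. The code page of a concatenation follows its destination, so a RawByteString parameter receives the system code page. A separately named wrapper with a UTF8String parameter builds the argument in UTF-8 first. This Python sketch is only a model. Strings are (bytes, code page) pairs, cp1252 stands in for a non-UTF-8 DefaultSystemCodePage, and `find_first`/`find_first_utf8` are hypothetical stand-ins that return the argument they would receive instead of touching the file system.

```python
DEFAULT_SYSTEM_CODE_PAGE = "cp1252"                  # assumed non-UTF-8 system
ALL_FILES_MASK = ("*.*".encode("cp1252"), "cp1252")  # constant in that code page


def concat_into(dest_cp, *parts):
    """Join (bytes, code page) parts after converting each one to dest_cp."""
    data = b"".join(
        raw.decode(cp).encode(dest_cp, errors="replace") for raw, cp in parts
    )
    return data, dest_cp


def find_first(*operands):
    """Stand-in for FindFirst(const Path: RawByteString; ...): per Jonas, a
    mixed-encoding argument expression ends up in the system code page."""
    return concat_into(DEFAULT_SYSTEM_CODE_PAGE, *operands)


def find_first_utf8(*operands):
    """Hypothetical FindFirstUTF8(const Path: UTF8String; ...): the argument
    is built as UTF-8 first; a real wrapper would then pass this single
    string on to the RawByteString version unchanged."""
    return concat_into("utf-8", *operands)


my_path = ("C:\\Данные\\".encode("utf-8"), "utf-8")

lossy, _ = find_first(my_path, ALL_FILES_MASK)
intact, _ = find_first_utf8(my_path, ALL_FILES_MASK)
print(lossy.decode("cp1252"))  # C:\??????\*.*
print(intact.decode("utf-8"))  # C:\Данные\*.*
```

The separate name matters because, as Sven notes, an equally named UTF8String overload would not reliably be picked for such mixed expressions.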
Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?
On Thu, Dec 26, 2013 at 1:04 PM, Marco van de Voort mar...@stack.nl wrote: On Thu, Dec 26, 2013 at 12:28:54AM +0100, Hans-Peter Diettrich wrote: is dangerous if they are not all the same encoding. If there is any mismatch, it will be converted down to default encoding. Then the implementation is wrong. Wrong according to you. Not wrong according to defined Microsoft applications. This way of top-down thinking will turn FPC into a Java, where you are lugging along your own platform-within-a-platform everywhere. IMHO this is not desirable. There is no utf8 on Windows. Yep, that's why the Unicode (W) API should be used. No problem with UTF-8 strings there :-) If you totally drop Delphi compatibility you can do whatever you want. But IMHO that is more something for the Graemes and Martins (MSEGUI) of the world, not Lazarus. Ok... but if FPC on Windows will be UTF-16 and Lazarus continues using UTF-8, what is the difference? This approach is not like Delphi, which has the RTL and VCL using the same encoding... FPC RTL and LCL will continue fighting! :( Marcos Douglas
Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?
On 2013-12-25 01:36, Hans-Peter Diettrich wrote: Whenever the encoding matters, most users and applications are best off with their regional Ansi encoding - all used characters are single bytes. You forget that using ANSI API functions on Windows not only has the drawback that you cannot access all files (those with Unicode characters in their names) but also that there is the limit of 255 characters for the path length (while Unicode API functions allow up to 32k characters). So you run into problems in 2 cases: 1.) if strings (file names) contain non-ANSI Unicode characters 2.) if paths are longer than 255 characters. Do you really advise people nowadays to restrict their programs so much by using ANSI API functions? I wouldn't. I was always wondering why so many programs fail with these 2 limitations on Windows after an alternative has been available for such a long time. Now you want to extend this time by yet another generation of programmers. That's not good. Hopefully not too many programmers follow this road... UTF-16 extends the range of languages whose characters can be assumed to have a fixed size, That's not true. You still cannot rely on a fixed number of bytes per character in UTF16 either. Also, UTF8 would not have any BOM problem while UTF16 and UTF32 do. So UTF16 has all the drawbacks of all encodings but no benefit (except that this awful decision is used by Windows).
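Jürgen's point that UTF-16 has no fixed width per character, and the byte-order question, can be checked quickly with Python's codecs. The sample characters are arbitrary; the last one lies outside the Basic Multilingual Plane.

```python
# Count UTF-16 code units and UTF-8 bytes for a few sample characters.
widths = {
    ch: (len(ch.encode("utf-16-le")) // 2, len(ch.encode("utf-8")))
    for ch in ["A", "é", "€", "𝄞"]
}
for ch, (units, nbytes) in widths.items():
    print(f"U+{ord(ch):04X}: {units} UTF-16 unit(s), {nbytes} UTF-8 byte(s)")

# Byte order: Python's generic "utf-16" codec prepends a BOM, while its
# "utf-8" codec does not (UTF-8 has a single byte order).
print("A".encode("utf-16")[:2] in (b"\xff\xfe", b"\xfe\xff"))  # True
print("A".encode("utf-8"))                                      # b'A'
```

U+1D11E needs a surrogate pair, so it occupies two UTF-16 code units.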
Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?
On 2013-12-24 17:13, Jürgen Hestermann wrote: All units used should use the same string encoding IMO. But which? UTF-8 of course! It's the newest Unicode encoding that overcomes all problems found in other encodings. It is also the only Unicode encoding that is backwards compatible with ASCII - hence the W3C and the rest of the Internet etc. standardised on it. It is also future proof and can (again) be extended to the full (4 byte) range or to using 5 or 6 byte code points [1]. Performance-wise, it is also NOT any slower than any of the other Unicode encodings. Probably the only reason UTF-16 is still being used is because of Windows - which used to use UCS2, and moving to UTF-16 was easier at the time (and I don't think UTF-8 existed at that point). [1] A couple of years back they limited the range of UTF-8 so that it stays compatible for now with the limited range of UTF-16. But the UTF-8 encoding can actually go all the way to 6 bytes per code point, which is an absolutely massive range. Regards, - Graeme -
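Graeme's points about ASCII compatibility and per-code-point length can be checked with a hand-written encoder. This Python sketch implements today's rules (RFC 3629), which stop at four bytes and U+10FFFF. The five- and six-byte forms he mentions belong to the older definition and are not produced here.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one Unicode scalar value by hand, following the RFC 3629 layout."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: U+{cp:X}")
    if cp < 0x80:        # 0xxxxxxx: ASCII bytes are unchanged
        return bytes([cp])
    if cp < 0x800:       # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:     # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx: the longest form still allowed
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


for cp in (0x41, 0xE9, 0x20AC, 0x1F600, 0x10FFFF):
    encoded = utf8_encode(cp)
    assert encoded == chr(cp).encode("utf-8")  # agrees with Python's codec
    print(f"U+{cp:04X}: {len(encoded)} byte(s) {encoded.hex(' ')}")
```

The first branch is why plain ASCII text is byte-for-byte valid UTF-8.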
Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?
On 2013-12-25 10:05, Graeme Geldenhuys wrote: But which? UTF-8 of course! It's the newest Unicode encoding that overcomes all problems found in other encodings. This guy explains it very well. https://www.youtube.com/watch?v=MijmeoH9LT4 Regards, - Graeme -
Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?
On 2013-12-25 10:03, Jürgen Hestermann wrote: So UTF16 has all drawbacks of all encodings but no benefit (except that this awful decision is used by Windows). +1 Regards, - Graeme -
Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?
Jürgen Hestermann wrote: On 2013-12-25 01:36, Hans-Peter Diettrich wrote: Whenever the encoding matters, most users and applications are best off with their regional Ansi encoding - all used characters are single bytes. You forget that using ANSI API functions on Windows not only has the drawback that you cannot access all files (those with Unicode characters in their names) but also that there is the limit of 255 characters for the path length (while Unicode API functions allow up to 32k characters). For that purpose (file names) I vote for a dedicated string type that matches the target platform requirements. Then the user does not have to look at filenames on a per-character basis. Do you really advise people nowadays to restrict their programs so much by using ANSI API functions? How many users have to use API functions which are bound to a single platform? And which of these do not understand how to handle strings of whatever encoding? DoDi
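Hans-Peter's idea of a dedicated file-name type could look roughly like this hypothetical Python sketch. The name is kept as text and only turned into bytes in the form a platform's file API expects. `NativeFileName` is an invented name. UTF-16LE with a terminating NUL mirrors the Windows W API's wide strings, and UTF-8 on other platforms is an assumption, since Unix file names are really just byte strings.

```python
import sys


class NativeFileName:
    """Hypothetical file-name type that hides the platform's encoding."""

    def __init__(self, text: str, platform: str = sys.platform):
        self.text = text
        self.platform = platform

    def native_bytes(self) -> bytes:
        if self.platform.startswith("win"):
            # Wide-character string for the W API, with a 2-byte NUL terminator.
            return self.text.encode("utf-16-le") + b"\x00\x00"
        # Assumption: a UTF-8 locale on Unix-like systems.
        return self.text.encode("utf-8")

    def join(self, part: str) -> "NativeFileName":
        # Path building happens on text, so callers never handle raw bytes.
        sep = "\\" if self.platform.startswith("win") else "/"
        return NativeFileName(self.text + sep + part, self.platform)


win = NativeFileName("Données", "win32").join("ä.txt")
nix = NativeFileName("Données", "linux").join("ä.txt")
print(win.native_bytes()[-4:])  # b't\x00\x00\x00'
print(nix.native_bytes())       # b'Donn\xc3\xa9es/\xc3\xa4.txt'
```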
Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?
On Wednesday 25 December 2013 11:03:55 Jürgen Hestermann wrote: So UTF16 has all the drawbacks of all encodings but no benefit (except that this awful decision is used by Windows). This is not true. Every time someone claims this nonsense I need to comment, but I will not argue again. ;-) Martin
Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?
On Tue, Dec 24, 2013 at 7:08 PM, Sven Barth pascaldra...@googlemail.com wrote: On 24.12.2013 15:34, Marcos Douglas m...@delfire.net wrote: Sorry if I say something crazy, but what do you think about using UTF-16 in {mode delphi} and UTF-8 in {mode fpc}? That is already the case with mode delphiunicode. But the big problem is classes and their inheritance. Take TStringList for example. Let's assume it's declared with String=AnsiString and you override it in a unit with String=UnicodeString; then you'll get problems with overloads/overrides, because UnicodeString <> AnsiString. Hmm, you're right. Understood. The mode concept is all good and well, but here it breaks down... :( So, if {mode} continues to be the way, I think it should be used at the platform level, not the unit level. Even if the programmer can change this, he will change it in all the code. On second thought, this should be used at the compiler level, not the source level. Regards, Marcos Douglas
Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?
On Tue, Dec 24, 2013 at 12:33:41PM -0200, Marcos Douglas wrote: But the prime point is that IMHO a utf8 Windows is insane, and it should be possible to port modern Delphi VCL apps at least to Windows. Preferably to all. Sorry if I say something crazy, but what do you think about using UTF-16 in {mode delphi} and UTF-8 in {mode fpc}? This is not possible. The precompiled files remain the same.
Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?
On Wed, Dec 25, 2013 at 09:57:05AM -0200, Marcos Douglas wrote: The mode concept is all good and well, but here it breaks down... :( So, if {mode} continues to be the way, I think it should be used at the platform level, not the unit level. That was the original proposal from me: add the encoding to the target (so i386-linux-utf8) and make a distro per target. Call them appropriately. Encoding-wise there are three options, ansi, utf8 and utf16, but not all options are relevant for all targets. E.g. most *nix are utf8, so it wouldn't make sense to make an ansi port. Windows does not support utf8, so only ansi and utf16 would make sense. Since that means typically two per target, it was suggested to combine this using dotted unit functionality. Keeping it in the same distribution at least gives hope of keeping encoding-agnostic units shared, but that required compiler extensions nobody started on. (I personally don't see the benefit in this.) Even if the programmer can change this, he will change it in all the code. On second thought, this should be used at the compiler level, not the source level. Something like that, but not the compiler: a unit directory on a per-project basis. Or a dotted unit prefix in the dotted alternative.
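Marco's per-target encoding scheme can be written down as a small table. This Python sketch is an illustration only. The target list is an arbitrary selection, and giving *nix targets utf8 plus utf16 is one reading of "typically two per target". Only `i386-linux-utf8` is a name taken from the message.

```python
# Encodings worth shipping per target, following Marco's reasoning:
# no ANSI port for UTF-8 *nix systems, no UTF-8 port for Windows.
TARGET_ENCODINGS = {
    "i386-linux": ["utf8", "utf16"],
    "x86_64-linux": ["utf8", "utf16"],
    "i386-win32": ["ansi", "utf16"],
    "x86_64-win64": ["ansi", "utf16"],
}


def distribution_names(targets):
    """Append the encoding to each target name, e.g. i386-linux-utf8."""
    return [f"{target}-{enc}" for target, encs in targets.items() for enc in encs]


names = distribution_names(TARGET_ENCODINGS)
print(names)
```

The dotted-unit alternative Marco mentions would instead keep one distribution and select encoding-specific units by prefix. That variant is not modelled here.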
Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?
On Tue, Dec 24, 2013 at 12:22:41PM -0200, Marcos Douglas wrote: IMO the biggest group are old-fashioned Delphi (D7) users, who want their existing Ansi/VCL code base supported *without* complications and incompatibilities introduced by the newer Delphi versions. The subject of this thread clearly indicates that UTF-8 is *not* a solution for this group of users. I started this thread. My problem isn't using UTF-8 on Windows... my problem is using different encodings in the same code, i.e., RTL and LCL. Yes. But the selection of UTF8, and the legacy concerns with that, are for Lazarus, and Lazarus alone. Always using functions to convert strings between RTL and LCL and vice versa is IMHO wrong, because the final code is confusing. In a huge application you still need to think: is this UTF-8 or ANSI/UTF-16? There are many scenarios up in the sky, and nothing is 100% certain, but it would at least be significantly better. It is already significantly better in trunk. The only problem on Windows is that you must only pass a string with a very clear encoding to an RTL function, so assignfile(f,s+s2+s3); is dangerous if they are not all the same encoding. If there is any mismatch, it will be converted down to the default encoding. It is defined, but somewhat special. That's my conclusion as well. But is that new audience worth abandoning the entire existing Lazarus audience? Of course nobody will abandon the entire existing Lazarus audience. Whether the RTL becomes UTF-16, UTF-32 or whatever, Lazarus will continue (I think) working with UTF-8. There is no utf8 on Windows. One can try to mess with the defaultcodepage, but that will probably only lead to a different kind of problem. On Windows there is only ansi or utf16, or keeping it manual.
Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?
On Wed, Dec 25, 2013 at 3:08 PM, Marco van de Voort mar...@stack.nl wrote: On Wed, Dec 25, 2013 at 09:57:05AM -0200, Marcos Douglas wrote: The mode concept is all good and well, but here it breaks down... :( So, if the {mode} continues to be a way, I think it should be used at platform level, not per unit level. That was the original proposal from me. Add encoding to the target (so i386-linux-utf8) and make a distro per target. Call them appropriately. Encoding-wise there are three options: ansi, utf8 and utf16, but not all options are relevant for all targets. E.g. most *nix are utf8, so it wouldn't make sense to make an ansi port. Windows does not support utf8, so only ansi and utf16 would make sense. Makes sense. Since that means typically two per target, it was suggested to combine this using dotted unit functionality. I did not understand this... dotted unit functionality? Keeping it in the same distribution at least gives hope of keeping encoding-agnostic units shared, but that required compiler extensions nobody started. (I personally don't see the benefit in this.) ... you mean 3 compiled units (ansi, utf8 and utf16) using dotted unit names functionality? Even if the programmer can change this, he will change it in all code. Thinking better, this is to be used at compiler level, not source level. Something like that, but not compiler, but unit directory on a per-project basis. Or dotted unit prefix in the dotted alternative. Maybe I did not understand: using only directories or per-project settings, we have the same problem as with the {mode} directive, i.e., TStringList could be compiled using UTF-16 by default, the programmer inherits from this class and compiles his own directory or project using UTF-8... something will break. Marcos Douglas
Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?
On Wed, Dec 25, 2013 at 3:15 PM, Marco van de Voort mar...@stack.nl wrote: On Tue, Dec 24, 2013 at 12:22:41PM -0200, Marcos Douglas wrote: IMO the biggest group are old-fashioned Delphi (D7) users, who want their existing Ansi/VCL code base supported *without* complications and incompatibilities introduced by the newer Delphi versions. The subject of this thread clearly indicates that UTF-8 is *not* a solution for this group of users. I started this thread. My problem isn't using UTF-8 on Windows... my problem is using different encodings in the same code, i.e., RTL and LCL. Yes. But the selection of UTF8, and the legacy concerns with that, are for Lazarus, and Lazarus alone. Always using functions to convert strings between RTL and LCL and vice versa is IMHO wrong, because the final code is confusing. In a huge application you still need to think: is this UTF-8 or ANSI/UTF-16? There are many scenarios up in the sky, and nothing is 100% certain, but it would at least be significantly better. It is already significantly better in trunk. When you say it is better in trunk, is that only in the FPC context, or are there improvements for Lazarus users too? The only problem on Windows is that you must only pass a string with a very clear encoding to an RTL function, so assignfile(f,s+s2+s3); is dangerous if they are not all the same encoding. If there is any mismatch, it will be converted down to the default encoding. Yes, but what is the difference between 2.6.2 and trunk in that case? It is defined, but somewhat special. That's my conclusion as well. But is that new audience worth abandoning the entire existing Lazarus audience? Of course nobody will abandon the entire existing Lazarus audience. Whether the RTL becomes UTF-16, UTF-32 or whatever, Lazarus will continue (I think) working with UTF-8. There is no utf8 on Windows. One can try to mess with the defaultcodepage, but that will probably only lead to a different kind of problem.
On Windows there is only ansi or utf16, or keeping it manual. You're right. But if we imagine a perfect world where FPC and Lazarus use the same encoding (no matter whether UTF-8 or UTF-16), everything would work. Do you agree? So, if the encoding chosen for everything were UTF-8, the RTL would only need to convert strings on Windows before calling API functions. The same would apply on Linux (or whatever platforms use UTF-8) if the encoding chosen were UTF-16. Is my thinking correct? Marcos Douglas
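Marcos's "convert only before calling the API" idea can be sketched in Python (illustration only, not FPC code). Windows "W" functions take null-terminated UTF-16LE strings, so a program that keeps UTF-8 internally converts at that boundary and nowhere else. `to_wide_api` and `from_wide_api` are hypothetical helper names. As Marco notes in his reply, the Windows unit itself is not wrapped, so calls made through it would bypass a helper like this.

```python
# Minimal sketch: UTF-8 inside the program, UTF-16LE only at the boundary
# to a Windows "W" function. Both helper names are hypothetical.


def to_wide_api(utf8_bytes: bytes) -> bytes:
    """Internal UTF-8 -> null-terminated UTF-16LE buffer for a W function."""
    return utf8_bytes.decode("utf-8").encode("utf-16-le") + b"\x00\x00"


def from_wide_api(buffer: bytes) -> bytes:
    """Null-terminated UTF-16LE buffer from a W function -> internal UTF-8."""
    text = buffer.decode("utf-16-le")
    return text.split("\x00", 1)[0].encode("utf-8")


name = "relatório.txt".encode("utf-8")  # internal representation: UTF-8
wide = to_wide_api(name)                # what would be handed to e.g. CreateFileW
print(len(name), len(wide))             # 14 28
print(from_wide_api(wide) == name)      # True
```

The conversion itself is simple. The hard part, as the rest of the thread argues, is making sure every path to the OS goes through such a boundary.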
Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?
On Wed, Dec 25, 2013 at 04:34:40PM -0200, Marcos Douglas wrote: There are many scenarios up in the sky, and nothing is 100% certain, but it would at least be significantly better. It is already significantly better in trunk. When you say it is better in trunk, is that only in the FPC context, or are there improvements for Lazarus users too? Functions like mkdir/findfirst/assign etc. are encoding safe. The only exception is the concatenation problem below. The only problem on Windows is that you must only pass a string with a very clear encoding to an RTL function, so assignfile(f,s+s2+s3); is dangerous if they are not all the same encoding. If there is any mismatch, it will be converted down to the default encoding. Yes, but what is the difference between 2.6.2 and trunk in that case? (1) UTF16 works fine; (2) you can actually pass utf8 to functions, as long as you are careful with concatenations. There is no utf8 on Windows. One can try to mess with the defaultcodepage, but that will probably only lead to a different kind of problem. On Windows there is only ansi or utf16, or keeping it manual. You're right. But if we imagine a perfect world where FPC and Lazarus use the same encoding (no matter whether UTF-8 or UTF-16), everything would work. Do you agree? The point (as shown by the above problem) is that the choice must align with what the OS offers. Otherwise you are an island again. E.g. the Windows unit (also in trunk) will only work in ansi or utf16. So, if the encoding chosen for everything were UTF-8, the RTL would only need to convert strings on Windows before calling API functions. The Windows unit is not wrapped, and the only Unicode available on Windows is UTF16. And the Windows target converts mixes of 1-byte strings (say ansi+utf8) to the default encoding (ansi). One can attempt to fix that by messing with Windows encoding settings, but the effect of doing that for large applications is unknown.
Another possibility is using only our own Unicode routines (linking the tables into each binary). But that again could lead to strange artefacts. The same would apply on Linux (or whatever platforms use UTF-8) if the encoding chosen were UTF-16. Yes. I don't think that is a good default choice either. But at least it has some merits for modern Delphi compat. Is my thinking correct? Oversimplified. The RTL will never abstract everything, and there is the issue of the default OS encoding. In short, I don't think fighting the native encoding of a target is worth the shallow appeal of the "one encoding rules all" principle. That is mostly pushed by people who don't even use Windows, and thus won't feel the pain.
Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?
On 25.12.2013 19:19, Marcos Douglas m...@delfire.net wrote: Since that means typically two per target, it was suggested to combine this using dotted unit functionality. I did not understand this... dotted unit functionality? Delphi XE2, with the introduction of FireMonkey, switched from normal unit names for the RTL to dotted ones (aka unit namespaces). E.g. SysUtils and Classes became System.SysUtils and System.Classes respectively, the Windows units moved into a Windows namespace (AFAIK) and Forms became VCL.Forms. Now the XE2 IDE and command line compiler also provide the possibility to specify multiple default namespaces (e.g. a VCL application would have System and VCL) to ensure backwards compatibility. Now the idea is to have dotted units in FPC where String=UnicodeString and the legacy non-dotted ones where String=AnsiString. That only leaves out Delphi 2009 to XE compatibility (which use non-dotted UnicodeString units), but that's a small price IMHO. There are also a few further problems that need to be tackled with that approach. Regards, Sven
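The "multiple default namespaces" mechanism Sven describes can be modelled in a few lines of Python. This is a hypothetical model, not Delphi's or FPC's actual lookup code. In particular, the order used here (the name as written first, then each default namespace in turn) is an assumption made for the sketch. The model shows why old `uses SysUtils, Forms;` clauses keep compiling once System and VCL are listed as defaults.

```python
from typing import Iterable, Optional, Set

# Hypothetical model of unit lookup with default namespaces. Assumption for
# this sketch: the name is first tried as written, then with each default
# namespace prefixed, in the order given.


def resolve_unit(name: str, available: Set[str],
                 default_namespaces: Iterable[str]) -> Optional[str]:
    if name in available:
        return name
    for ns in default_namespaces:
        candidate = f"{ns}.{name}"
        if candidate in available:
            return candidate
    return None  # unit not found


delphi_units = {"System.SysUtils", "System.Classes", "VCL.Forms"}
vcl_app_namespaces = ["System", "VCL"]

# Old source code still says "uses SysUtils, Forms;" and keeps compiling.
print(resolve_unit("SysUtils", delphi_units, vcl_app_namespaces))  # System.SysUtils
print(resolve_unit("Forms", delphi_units, vcl_app_namespaces))     # VCL.Forms
print(resolve_unit("Grids", delphi_units, vcl_app_namespaces))     # None
```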
Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?
Sorry Marco, On Wed, Dec 25, 2013 at 6:15 PM, Marco van de Voort mar...@stack.nl wrote: There is no utf8 on Windows. One can try to mess with the defaultcodepage, but that will probably only lead to a different kind of problem. I cannot let you answer alone and make you appear as the only knowledgeable reference for this important subject; it looks like defining the default code page 65001 for Windows makes it a perfect fit to handle UTF-8 http://msdn.microsoft.com/en-us/library/dd317756%28VS.85%29.aspx Jerome.
Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?
On 25.12.2013 19:35, Marcos Douglas m...@delfire.net wrote: The only problem on Windows is that you must only pass a string with a very clear encoding to an RTL function, so assignfile(f,s+s2+s3); is dangerous if they are not all the same encoding. If there is any mismatch, it will be converted down to the default encoding. Yes, but what is the difference between 2.6.2 and trunk in that case? If in 2.6.2 your three strings contain text of different encodings, then the resulting string might be garbage from the user's POV. In trunk the encoding is part of each string, and if they differ then each string will be converted to the default string encoding (defined by a global variable inside unit System), and thus the string might still be valid. Regards, Sven
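The trunk behaviour Sven describes can be mimicked in Python (illustration only, not the FPC RTL). Each 1-byte string carries its code page. When the parts of a concatenation disagree, every part is converted to the default code page first. The result is well defined, but characters the default code page cannot hold are lost. In this sketch the default is assumed to be cp1252 (Western European Windows) and unrepresentable characters become "?". Real Windows conversions may use best-fit mappings instead. `TaggedString` and `concat_to_default` are hypothetical names.

```python
from dataclasses import dataclass

# Minimal sketch, NOT the FPC RTL. Assumption: the default system code page
# is cp1252, and characters it cannot represent become "?".
DEFAULT_CODEPAGE = "cp1252"


@dataclass(frozen=True)
class TaggedString:
    data: bytes
    codepage: str

    def text(self) -> str:
        return self.data.decode(self.codepage)


def concat_to_default(*parts: TaggedString) -> TaggedString:
    """Concatenate; on a code page mismatch convert every part to the default."""
    pages = {p.codepage for p in parts}
    if len(pages) == 1:
        return TaggedString(b"".join(p.data for p in parts), pages.pop())
    converted = [p.text().encode(DEFAULT_CODEPAGE, errors="replace") for p in parts]
    return TaggedString(b"".join(converted), DEFAULT_CODEPAGE)


folder = TaggedString("C:\\Dados\\".encode("cp1252"), "cp1252")
file_name = TaggedString("Москва.txt".encode("utf-8"), "utf-8")
result = concat_to_default(folder, file_name)
print(result.codepage, result.text())  # cp1252 C:\Dados\??????.txt
```

This is why `assignfile(f,s+s2+s3)` with mixed encodings is "defined, but somewhat special". In 2.6.2 the raw bytes would simply have been appended.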
Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?
On Dec 25, 2013, at 3:43 PM, Jy V jyv...@gmail.com wrote: I cannot let you answer alone and make you appear as the only knowledgeable reference for this important subject; it looks like defining the default code page 65001 for Windows makes it a perfect fit to handle UTF-8 http://msdn.microsoft.com/en-us/library/dd317756%28VS.85%29.aspx Windows doesn't support using UTF-8 as the default system code page and never will. Michael Kaplan, from Microsoft, has talked about it a number of times in his blog. The original site is unfortunately offline, but it's available through the Internet Archive at https://web.archive.org/web/20120414160234/http://blogs.msdn.com/b/michkap/archive/2006/10/11/816996.aspx https://web.archive.org/web/20110108050100/http://blogs.msdn.com/b/michkap/archive/2006/07/04/656051.aspx The short answer is that all of the selectable ANSI code pages use at most 2 bytes per character, and UTF-8 can use up to 4, which would require auditing/updating huge amounts of code. -- Craig Peterson Scooter Software
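Craig's byte-width argument ("at most 2 bytes" versus "up to 4") is easy to check in Python. Python's codecs stand in for the Windows code pages here: cp1252 (Western), cp932 (Japanese), cp936 (Simplified Chinese). The sample characters are arbitrary. The check only measures encoded widths. It says nothing about how much Windows code actually assumes a 2-byte maximum.

```python
# Widest encoded character per code page, for a handful of sample characters.
SAMPLES = ["A", "é", "日", "中", "\U0001F600"]  # last one: an emoji outside the BMP


def max_bytes_per_char(codec: str) -> int:
    widths = []
    for ch in SAMPLES:
        try:
            widths.append(len(ch.encode(codec)))
        except UnicodeEncodeError:
            pass  # character not representable in this code page
    return max(widths)


for codec in ("cp1252", "cp932", "cp936", "utf-8"):
    print(codec, max_bytes_per_char(codec))
# prints: cp1252 1 / cp932 2 / cp936 2 / utf-8 4 (one per line)
```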
Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?
On 25.12.2013 22:44, Jy V jyv...@gmail.com wrote: Sorry Marco, On Wed, Dec 25, 2013 at 6:15 PM, Marco van de Voort mar...@stack.nl wrote: There is no utf8 on Windows. One can try to mess with the defaultcodepage, but that will probably only lead to a different kind of problem. I cannot let you answer alone and make you appear as the only knowledgeable reference for this important subject; it looks like defining the default code page 65001 for Windows makes it a perfect fit to handle UTF-8 http://msdn.microsoft.com/en-us/library/dd317756%28VS.85%29.aspx The Windows API *A functions use the system's (or more precisely the user's) code page, set through the selected locale, to convert ANSI to Unicode. So even if you could set the system's code page to UTF-8 (which you cannot), you'd need the user to change his/her code page to UTF-8, which is a definite no-go. Regards, Sven
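Sven's point can be illustrated in Python: an *A function reinterprets its bytes using the ANSI code page before handing them to the Unicode implementation. If a program passes UTF-8 bytes while that code page is a typical Western one, the text arrives as mojibake. `ansi_to_unicode` is a hypothetical stand-in for that internal conversion, not a Windows API.

```python
# Sketch: the same UTF-8 bytes interpreted under two different code pages.
# ansi_to_unicode is a hypothetical stand-in for the A-to-W conversion.


def ansi_to_unicode(data: bytes, ansi_codepage: str) -> str:
    return data.decode(ansi_codepage, errors="replace")


utf8_name = "café.txt".encode("utf-8")       # "é" is the two bytes C3 A9
print(ansi_to_unicode(utf8_name, "cp1252"))  # cafÃ©.txt (typical Western setting)
print(ansi_to_unicode(utf8_name, "utf-8"))   # café.txt (only if the code page were UTF-8)
```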
Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?
On Wed, Dec 25, 2013 at 7:41 PM, Sven Barth pascaldra...@googlemail.com wrote: On 25.12.2013 19:19, Marcos Douglas m...@delfire.net wrote: Since that means typically two per target, it was suggested to combine this using dotted unit functionality. I did not understand this... dotted unit functionality? Delphi XE2, with the introduction of FireMonkey, switched from normal unit names for the RTL to dotted ones (aka unit namespaces). E.g. SysUtils and Classes became System.SysUtils and System.Classes respectively, the Windows units moved into a Windows namespace (AFAIK) and Forms became VCL.Forms. Now the XE2 IDE and command line compiler also provide the possibility to specify multiple default namespaces (e.g. a VCL application would have System and VCL) to ensure backwards compatibility. Simple explanation, I understood, thank you. So is the new Delphi namespace virtual (e.g. there is no VCL.Forms.pas file, only a Forms.pas), or do they have two options, two files? If it is virtual and could be changed on the command line compiler, it looks like an idea that I had (posted on fpc-list) about namespaces, to use two units with the same name in the same project. ;-) Now the idea is to have dotted units in FPC where String=UnicodeString and the legacy non-dotted ones where String=AnsiString. That only leaves out Delphi 2009 to XE compatibility (which use non-dotted UnicodeString units), but that's a small price IMHO. There are also a few further problems that need to be tackled with that approach. I see. Well, it seems that the way has already been decided and is in development. Thanks again for the update, in a few words, about the implementations in trunk. Regards, Marcos Douglas
Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?
Marco van de Voort wrote: The only problem on Windows is that you must only pass a string with a very clear encoding to an RTL function, so assignfile(f,s+s2+s3); is dangerous if they are not all the same encoding. If there is any mismatch, it will be converted down to the default encoding. Then the implementation is wrong. All string conversion is done via UTF (lossless); the result can be either UTF-8 (FPC) or UTF-16 (Delphi). The final conversion depends on the target, i.e. the declaration of AssignFile. There is no utf8 on Windows. Yep, that's why the Unicode (W) API should be used. No problem with UTF-8 strings there :-) DoDi
Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?
Sven Barth wrote: If in 2.6.2 your three strings contain text of different encodings, then the resulting string might be garbage from the user's POV. In trunk the encoding is part of each string, and if they differ then each string will be converted to the default string encoding (defined by a global variable inside unit System), and thus the string might still be valid. If so, this flaw should be fixed immediately. Delphi uses lossless conversions, i.e. an up-cast to Unicode. BTW the use of RawByteString variables or parameters *in Delphi* can result in stored strings of an encoding that doesn't match the declaration of the target variable. This in turn can confuse the compiler, when two such strings of the same declared (static) encoding, but of different actual (dynamic) encoding, are simply appended without further checks/conversions. Such problems can be avoided by making RawByteString compiler magic that enforces a Unicode conversion whenever AnsiStrings of a different dynamic encoding have to be combined. Furthermore the use of UTF-8 will allow for lossless conversions of AnsiStrings of any encoding, with the result still being an AnsiString. Here Delphi has the problem that a RawByteString result type requires a conversion of an intermediate Unicode string (UTF-16) into an AnsiString(CP_ACP), with possible losses. This is not required when FPC treats UTF-8 as a fully supported encoding, in addition to CP_ACP; it would also be a strong argument for using UTF-8 for UnicodeString, *instead* of UTF-16. The related functions already exist in the FPC libraries, they only have to take precedence over CP_ACP (if different). Then additional UTF-8/16 conversions are required only on Windows, when calling external (API...) functions which expect/return WideStrings. Conclusion: FPC can treat RawByteString as *the one and only* string type of a variable dynamic encoding.
Procedures accepting RawByteString arguments either retain the dynamic encoding of these strings, or convert parameters of different encoding into UTF-8. A conversion back to a different encoding may be required *only* when a RawByteString is assigned to a variable or parameter in another subroutine call. There remains one problem with empty strings, whose declared encoding cannot be determined at runtime in the Delphi model, because empty strings are represented by Nil pointers. I can imagine two workarounds: adding an Encoding field to every string variable, or making empty strings point to a string constant of their static encoding. Alternatively, typed AnsiStrings and RawByteString can be dropped, so that every AnsiString variable or parameter can have any dynamic encoding (equivalent to RawByteString), with the favorite encoding being UTF-8. This would allow keeping Lazarus and other existing code unmodified; all necessary string conversions can be inserted by the compiler, and the obsolete UTF8... functions can be dropped. DoDi
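The lossless rule DoDi proposes can be sketched in Python (illustration only, not how FPC or Delphi implement RawByteString). Strings are modelled as (bytes, code page) pairs. Parts with the same code page are appended as-is. Mixed parts are decoded to Unicode and stored once as UTF-8, so no character is dropped. `concat_lossless` is a hypothetical name.

```python
from typing import Tuple

# Strings modelled as (bytes, code page) pairs; a hypothetical illustration.
TaggedBytes = Tuple[bytes, str]


def concat_lossless(*parts: TaggedBytes) -> TaggedBytes:
    pages = {cp for _, cp in parts}
    if len(pages) == 1:
        # same dynamic encoding: plain byte append, no conversion needed
        return b"".join(data for data, _ in parts), pages.pop()
    # mixed encodings: decode everything to Unicode, re-encode once as UTF-8
    text = "".join(data.decode(cp) for data, cp in parts)
    return text.encode("utf-8"), "utf-8"


western = ("café ".encode("cp1252"), "cp1252")
cyrillic = ("Москва".encode("cp1251"), "cp1251")
data, cp = concat_lossless(western, cyrillic)
print(cp, data.decode(cp))  # utf-8 café Москва
```

The sketch does not address the empty-string problem raised above. It simply assumes every part carries its code page explicitly.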
Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?
On 2013-12-23 23:08, Marco van de Voort wrote: But if I have to choose to kill one, it is utf8. It is the lesser used choice for unicode strings INSIDE APPLICATIONS. Yes, UTF8 is dominant in documents, but not in APIs. But in APIs it would not matter much to convert (in general the time for conversion is negligible compared to the time that is needed for the rest of the API call). I have written a file manager for Windows that can log and store millions of files in memory. It uses the (UTF16) Unicode API from Windows and stores the file names as UTF8 internally. There exists another file manager that uses UTF16 internally and can also log millions of files. When logging the same source I can't see any difference in performance (even when logging multiple times so that everything is cached!), although I have to convert and the other one does not. But the memory footprints are very different. UTF16 is the most horrible decision (all bad things combined). For what? Most of the sentiments I hear are echoed discussions on the web that are mostly about document encodings, NOT application internal encodings. IMO this decision is based on the assumption that one encoding is chosen for everything, so the same encoding is used *everywhere* as much as possible. Then UTF8 is the best solution. Why use UTF16/32? They cannot be treated the same as ancient ANSI strings either. So what would be the reason behind it? Just wasting memory? UTF8 has the lowest memory demand. Not according to 1 billion Chinese. How many of the strings stored and processed on a Chinese computer are in Chinese? A lot of the strings are still in English (HTML etc.). So for Asian countries the real memory demand is a mix and is not so easy to determine. In most western countries UTF8 definitely uses less memory. On the other hand, adapting the string encoding for each Widgetset/OS would be a can of worms IMO. If you feel that way, I think Delphi compatibility should prevail. Why this?
Free Pascal/Lazarus should mature and not repeat all the bad decisions of Borland/Embarcadero/... Note that the language support for utf8 breaks down when you pass e.g. a string to rawbytestring on Windows (because it is converted to the default 1-byte encoding, which is not utf8 in general). I am not sure what you are talking about here. For Windows I would use the Unicode (UTF16) API interface exclusively and convert to UTF8 internally. From then on, everything should be UTF8. As said, UTF8 on Windows is a crutch, and attempts to work around that move Lazarus in the direction of portability to everything as long as it follows unix philosophies, a la Cygwin. For me the decision of which Unicode encoding should be used is primarily OS independent. Just do the conversion once at the API interface level, but then internally use what was decided to be the best (UTF8 IMO). Conversions seem to be unavoidable anyway, so it is just a decision where and when they take place. And the API level is a good place IMO. And when other OSes use the same encoding it is even better, but that is not the reason to choose one or the other. A lot of additional knowledge about strings is put on the programmer because handling of strings has to be done differently depending on OS. No! That's just the aim: if *all* Free Pascal/Lazarus programmers can rely on having UTF8 in all cases, then you only need to handle UTF8 strings. No IFDEFs to handle UTF16 on Windows and UTF8 on Linux. The same code just works on *all* platforms! Constructs that happen to work with Linux will fail on Windows, because on Windows the default 1-byte encoding is not UTF8. The ANSI interface should not be used anymore. It is obsolete and only needed for ancient OSes like DOS. Programmers should not be encouraged to use it on modern platforms. Just use UTF8 *everywhere*. That should be the aim IMO.
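The memory argument in this exchange, and the "1 billion Chinese" objection to it, can be checked in Python by counting bytes for the same text in UTF-8 and in UTF-16 (no BOM). The sample strings are arbitrary examples, not measurements from either file manager. The mixed HTML sample reflects the point made in the message: real strings on Asian systems are often a mix, so the outcome depends on the data.

```python
from typing import Tuple

# Bytes needed for the same text in UTF-8 and in UTF-16LE (no BOM).


def encoded_sizes(text: str) -> Tuple[int, int]:
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


samples = {
    "ASCII file name": "Annual report 2013.txt",
    "Chinese only": "年度报告",
    "HTML with Chinese": '<p class="title">年度报告</p>',
}
for label, text in samples.items():
    utf8, utf16 = encoded_sizes(text)
    print(f"{label}: UTF-8 {utf8} bytes, UTF-16 {utf16} bytes")
# ASCII file name: 22 vs 44, Chinese only: 12 vs 8, HTML with Chinese: 33 vs 50
```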
Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?
On Tue, Dec 24, 2013 at 06:18:41AM +0100, Hans-Peter Diettrich wrote: Not necessarily. Supporting both on both platforms is a sane reason too. One can't ditch utf16 because of Delphi compatibility. It will be hard to ditch utf8 because of old Lazarus compatibility. In the meantime we have 2 Delphi compiler/RTL versions: - Ansi (Win32) - Unicode (UTF-16, multi target) and 4 GUI versions - VCL Win32 - CLX - VCL.NET - FireMonkey summing up to 8 versions in theory, and 3 versions in practice. The older delphi compilers are unsupported. We never supported anything but VCL 32/64, so this list seems artificially inflated to me. So what does Delphi compatible mean *really*? The same as it always has. VCL, and language level at a distance. The rest is irrelevant. The FPC compiler supports multiple targets, and most probably can be managed to support both string types using the *same code base* (maintenance issue!). Yes. IMO this does *not* apply to the libraries (RTL and LCL) RTL is less of a problem than one might think. The problem mostly only comes in at the classes level. and existing applications, where Lazarus counts as the most important and prominent application. Existing Lazarus applications are toast anyway, without changes. We can be happy to have one single LCL and IDE version, which is already incompatible due to the use of UTF-8 strings instead of Ansi. Multiple versions, for compatibility with the other Delphi combinations, are beyond *development capacities*. Then drop the old stuff, and simply go for full compatibility. Anything else will only cause the loss of all OSS Delphi projects (and even the commercial ones that support Lazarus). And people like me that are torn between both systems. This sheds a very different light on Delphi compatibility, meaning that a Unicode LCL and IDE can *not* be supported in parallel to the existing UTF-8 implementation. There is no existing UTF8 implementation that can be continued as is anyway. 
Dumping UTF-8 would discontinue support for the *entire* range of *existing* LCL applications, i.e. lose all the current Lazarus users :-( So what should be the intended *audience* for a future Lazarus version? IMO the biggest group are old-fashioned Delphi (D7) users, who want their existing Ansi/VCL code base supported *without* complications and incompatibilities introduced by the newer Delphi versions. The subject of this thread clearly indicates that UTF-8 is *not* a solution for this group of users. It was like that two years ago. But I see more and more people migrating to the Unicode versions, and updating packages. The D7 base is eroding, and worse, many of its users are mostly hedging bets to keep their codebases running, not to make new code (and we need people that DO things). It's like with Turbo Pascal in the (1.0.x) past. Yes, the numbers are huge, but all they say is they want something 100% compatible to effortlessly keep their codebases running. But when the time comes to actually _invest_ in the code again, they pick something that is at least halfway modern. And all you are stuck with is oldtimers and l33t tinkerers. That is the curse of supporting legacy targets: you can't do that forever without making yourself irrelevant. Keep in mind that any Lazarus solution in production use based on 2.8.x is years away. The current activity levels in that group will be even lower by then. Our decisions must be aimed not at the situation now, but be good for at least 5 years. Another important user group is targeting mobiles, where time will tell whether FM will ever succeed, or shares the fate of Kylix or VCL.NET. Everywhere I see FM (Mobile plugin) buyers, I see existing Delphi users hoping for an easy conversion to mobile and a quick buck to tide them over the crisis. Not real go-getters that really go for mobile. That makes me think this is not sustainable.
But Embarcadero is said to use it heavily internally, so they won't quickly kill it off, and I assume a certain kind of customer will adopt it. But IMHO for us it is irrelevant. IMO these should be happy already with fpGUI or mseGUI, no need to raise another competitor in this area. I don't really see any adoption there. Those teams and offerings are again an order of magnitude smaller than Lazarus, and for most of those users switching from Embarcadero to Lazarus is already the biggest step they are willing to make. But if I have to choose to kill one, it is utf8. It is the lesser used choice for unicode strings INSIDE APPLICATIONS. Yes, UTF8 is dominant in documents, but not in APIs. That's my conclusion as well. But is that new audience worth abandoning the entire existing Lazarus audience? I myself hope for the two-track approach. It satisfies multiple demands, and the extra work is offset by less rewriting of current Delphi sources and less discussion. But the prime point is that IMHO a utf8 Windows is
Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?
On Tue, Dec 24, 2013 at 3:18 AM, Hans-Peter Diettrich drdiettri...@aol.com wrote: Marco van de Voort wrote: On Mon, Dec 23, 2013 at 06:52:21PM +0100, Jürgen Hestermann wrote: On 2013-12-23 11:32, Marco van de Voort wrote: So I would say UTF16, and maybe, if there is demand, some can get utf8 :-) The question is: Should FPC and LCL use a fixed encoding for all platforms or should the encoding be adapted for each WidgetSet/OS? Not necessarily. Supporting both on both platforms is a sane reason too. One can't ditch utf16 because of Delphi compatibility. It will be hard to ditch utf8 because of old Lazarus compatibility. In the meantime we have 2 Delphi compiler/RTL versions: - Ansi (Win32) - Unicode (UTF-16, multi target) and 4 GUI versions - VCL Win32 - CLX - VCL.NET - FireMonkey summing up to 8 versions in theory, and 3 versions in practice. So what does Delphi compatible mean *really*? The FPC compiler supports multiple targets, and most probably can be managed to support both string types using the *same code base* (maintenance issue!). IMO this does *not* apply to the libraries (RTL and LCL) and existing applications, where Lazarus counts as the most important and prominent application. We can be happy to have one single LCL and IDE version, which is already incompatible due to the use of UTF-8 strings instead of Ansi. Multiple versions, for compatibility with the other Delphi combinations, are beyond *development capacities*. This sheds a very different light on Delphi compatibility, meaning that a Unicode LCL and IDE can *not* be supported in parallel to the existing UTF-8 implementation. Dumping UTF-8 would discontinue support for the *entire* range of *existing* LCL applications, i.e. lose all the current Lazarus users :-( So what should be the intended *audience* for a future Lazarus version?
IMO the biggest group are old-fashioned Delphi (D7) users, who want their existing Ansi/VCL code base supported *without* complications and incompatibilities introduced by the newer Delphi versions. The subject of this thread clearly indicates that UTF-8 is *not* a solution for this group of users. I started this thread. My problem isn't using UTF-8 on Windows... my problem is using different encodings in the same code, i.e., RTL and LCL. Always using functions to convert strings between RTL and LCL and vice versa is IMHO wrong, because the final code is confusing. In a huge application you still need to think: is this UTF-8 or ANSI/UTF-16? Another important user group is targeting mobiles, where time will tell whether FM will ever succeed, or shares the fate of Kylix or VCL.NET. IMO these should be happy already with fpGUI or mseGUI, no need to raise another competitor in this area. But if I have to choose to kill one, it is utf8. It is the lesser used choice for unicode strings INSIDE APPLICATIONS. Yes, UTF8 is dominant in documents, but not in APIs. That's my conclusion as well. But is that new audience worth abandoning the entire existing Lazarus audience? Of course nobody will abandon the entire existing Lazarus audience. Whether the RTL becomes UTF-16, UTF-32 or whatever, Lazarus will continue (I think) working with UTF-8. Marcos Douglas
Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?
On Tue, Dec 24, 2013 at 12:19 PM, Marco van de Voort mar...@stack.nl wrote: On Tue, Dec 24, 2013 at 06:18:41AM +0100, Hans-Peter Diettrich wrote: Not necessarily. Supporting both on both platforms is a sane option too. One can't ditch utf16 because of Delphi compatibility. It will be hard to ditch utf8 because of old Lazarus compatibility. In the meantime we have 2 Delphi compiler/RTL versions:
- Ansi (Win32)
- Unicode (UTF-16, multi target)
and 4 GUI versions:
- VCL Win32
- CLX
- VCL.NET
- FireMonkey
summing up to 8 versions in theory, and 3 versions in practice.

The older Delphi compilers are unsupported. We never supported anything but VCL 32/64, so this list seems artificially inflated to me.

So what does "Delphi compatible" *really* mean?

The same as it always has: VCL, and the language level at a distance. The rest is irrelevant.

The FPC compiler supports multiple targets, and can most probably be made to support both string types from the *same code base* (maintenance issue!).

Yes.

IMO this does *not* apply to the libraries (RTL and LCL)

The RTL is less of a problem than one might think. The problem mostly comes in only at the classes level.

and existing applications, where Lazarus counts as the most important and prominent application.

Existing Lazarus applications are toast anyway without changes.

We can be happy to have one single LCL and IDE version, which is already incompatible due to the use of UTF-8 strings instead of Ansi. Multiple versions, for compatibility with the other Delphi combinations, are beyond our *development capacities*.

Then drop the old stuff, and simply go for full compatibility. Anything else will only cause the loss of all OSS Delphi projects (and even the commercial ones that support Lazarus), and of people like me who are torn between both systems.

This sheds a very different light on Delphi compatibility, meaning that a Unicode LCL and IDE can *not* be supported in parallel to the existing UTF-8 implementation.
There is no existing UTF8 implementation that can be continued as is anyway.

Dumping UTF-8 would discontinue support for the *entire* range of *existing* LCL applications, i.e. lose all the current Lazarus users :-( So what should be the intended *audience* for a future Lazarus version? IMO the biggest group are old-fashioned Delphi (D7) users, who want their existing Ansi/VCL code base supported *without* the complications and incompatibilities introduced by the newer Delphi versions. The subject of this thread clearly indicates that UTF-8 is *not* a solution for this group of users.

It was like that two years ago. But I see more and more people migrating to the Unicode versions and updating packages. The D7 base is eroding. Worse, many of its users are mostly hedging their bets to keep their codebases running, not writing new code (and we need people who DO things). It's like Turbo Pascal in the (1.0.x) past. Yes, the numbers are huge, but all they say is that they want something 100% compatible so they can keep their codebases running effortlessly. But when the time comes to actually _invest_ in the code again, they pick something that is at least halfway modern. And all you are left with is oldtimers and l33t tinkerers. That is the curse of supporting legacy targets: you can't do it forever without making yourself irrelevant. Keep in mind that any Lazarus solution in production use based on 2.8.x is years away. The current activity levels in that group will be even lower by then. Our decisions must not be aimed at the situation now; they must hold for at least 5 years.

Another important user group is targeting mobiles, where time will tell whether FM will ever succeed, or share the fate of Kylix or VCL.NET.

Everywhere I see FM (Mobile plugin) buyers, I see existing Delphi users hoping for an easy conversion to mobile and a quick buck to tide them over the crisis. They are not real go-getters who really go for mobile. That makes me think this is not sustainable.
But Embarcadero is said to use it heavily internally, so they won't kill it off quickly, and I assume a certain kind of customer will adopt it. But IMHO for us it is irrelevant.

IMO these should already be happy with fpGUI or mseGUI; no need to raise another competitor in this area.

I don't really see any adoption there. Those teams and offerings are again an order of magnitude smaller than Lazarus, and for most of those users switching from Embarcadero to Lazarus is already the biggest step they are willing to make.

But if I have to choose one to kill, it is utf8. It is the less used choice for unicode strings INSIDE APPLICATIONS. Yes, UTF8 is dominant in documents, but not in APIs. That's my conclusion as well. But is that new audience worth abandoning the entire existing Lazarus audience?

I myself hope for the two-track approach. It satisfies multiple demands, and the extra work is offset by less rewriting from
Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?
On 24.12.2013 15:22, Marcos Douglas wrote: Always using functions to convert strings between RTL and LCL and vice versa is wrong IMHO, because the final code is confusing. In a huge application you still need to think: is this UTF-8 or ANSI/UTF-16? That's true. It's a pain to pay attention to this. All units used should use the same string encoding IMO. But which? I think that's the discussion in this thread. Whether the RTL ends up UTF-16, UTF-32 or whatever, Lazarus will continue -- I think -- to work using UTF-8. But that would be a real pain. In a program it should be possible to use strings without the need to convert back and forth between encodings. So all strings from/to FPC and LCL routines should have the same encoding.
Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?
On Tue, Dec 24, 2013 at 3:13 PM, Jürgen Hestermann juergen.hesterm...@gmx.de wrote: On 24.12.2013 15:22, Marcos Douglas wrote: Always using functions to convert strings between RTL and LCL and vice versa is wrong IMHO, because the final code is confusing. In a huge application you still need to think: is this UTF-8 or ANSI/UTF-16? That's true. It's a pain to pay attention to this. Someone agreed! :-) All units used should use the same string encoding IMO. But which? I think that's the discussion in this thread. Yes, this is the major problem... ;-) Whether the RTL ends up UTF-16, UTF-32 or whatever, Lazarus will continue -- I think -- to work using UTF-8. But that would be a real pain. Not "would be"... it IS a real pain today. In a program it should be possible to use strings without the need to convert back and forth between encodings. So all strings from/to FPC and LCL routines should have the same encoding. This will depend only on the FPC team... When I created this thread I was only looking for a way to minimize this problem, but... Marcos Douglas
Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?
On 24.12.2013 15:34, Marcos Douglas m...@delfire.net wrote: Sorry if I say something crazy, but what do you think about using UTF-16 in {mode delphi} and UTF-8 in {mode fpc}? That is already the case with mode delphiunicode. But the big problem is classes and their inheritance. Take TStringList for example. Let's assume it's declared with String=AnsiString and you override it in a unit with String=UnicodeString; then you'll get problems with overloads/overrides, because UnicodeString <> AnsiString. The mode concept is all well and good, but here it breaks down... :( Regards, Sven
Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?
Jürgen Hestermann wrote: The ANSI interface should not be used anymore. It is obsolete and only needed for ancient OSes like DOS. But programmers should not be encouraged to use it on modern platforms. Just use UTF8 *everywhere*. That should be the aim IMO. Whenever the encoding matters, most users and applications are best off with their regional Ansi encoding, where all used characters are single bytes. UTF-16 extends the range of languages whose characters can be assumed to have a fixed size, i.e. all character sets in the BMP. Such fixed-size characters IMO are at the top of most users' wishlists, so none of them will ever be happy with UTF-8. Certainly UTF-8 was the best choice when Delphi (and FPC) did not have native UTF-16 strings, but once we have Unicode strings, now or soon, it should be dropped. DoDi
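The "fixed size" argument for UTF-16 holds only inside the BMP. Below is a minimal Free Pascal sketch, assuming FPC 3.0 or later; the program name and sample code points are illustrative. It shows that a character outside the BMP (U+1F600) occupies two UTF-16 code units, a surrogate pair. Length() on a UnicodeString therefore counts code units, not characters.

```pascal
program SurrogateDemo;
{$mode objfpc}{$H+}

var
  InsideBmp, OutsideBmp: UnicodeString;
begin
  // U+4E2D (a CJK ideograph) lies inside the BMP: one UTF-16 code unit.
  InsideBmp := WideChar($4E2D);
  // U+1F600 lies outside the BMP: it is stored as the surrogate pair D83D DE00.
  SetLength(OutsideBmp, 2);
  OutsideBmp[1] := WideChar($D83D);
  OutsideBmp[2] := WideChar($DE00);
  WriteLn('U+4E2D  -> ', Length(InsideBmp), ' code unit(s)');   // 1
  WriteLn('U+1F600 -> ', Length(OutsideBmp), ' code unit(s)');  // 2
end.
```

Code that indexes a UnicodeString as if each element were a whole character is exact only for BMP text.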
Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?
Marcos Douglas wrote: On Tue, Dec 24, 2013 at 3:18 AM, Hans-Peter Diettrich drdiettri...@aol.com wrote: I started this thread. My problem isn't using UTF-8 on Windows... my problem is using different encodings in the same code, i.e., RTL and LCL. This mix would cause problems, of course. Always using functions to convert strings between RTL and LCL and vice versa is wrong IMHO, because the final code is confusing. In a huge application you still need to think: is this UTF-8 or ANSI/UTF-16? The simplest (feasible) solution IMO is to adapt the (OS...) string types behind the scenes, i.e. inside the RTL and widgetsets. Then you can have any common encoding in the application and library API, while encoding-dependent code is encapsulated in lower-level functions receiving explicit (Unicode, UTF8String...) string types, so that the compiler can insert the required conversions. Such explicit parameter types would also be required for legacy code, where a specific encoding is assumed. I'm not sure how this conversion process can be automated or supported; perhaps removing/renaming the traditional UTF8... functions would help in spotting the procedures that require special attention. The number of automatic conversions can be reduced in the next step, e.g. by adding overloads, or conditional code, for both string types one by one, as time permits. DoDi
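The proposal above relies on the compiler converting arguments whenever a routine declares an explicit string type. Here is a minimal sketch of that idea. It assumes FPC 3.0 or later, where strings carry code pages (this was released after the thread). It also assumes the cwstring unit on Unix to provide a full widestring manager. CodeUnitCount is a hypothetical helper, not an RTL routine.

```pascal
program ExplicitParamTypes;
{$mode objfpc}{$H+}
{$ifdef unix}
uses
  cwstring;  // installs a full widestring manager on Unix-like systems
{$endif}

// Hypothetical low-level routine: it states the encoding it needs (UTF-16),
// so the compiler inserts a conversion at any call site passing another string type.
function CodeUnitCount(const S: UnicodeString): SizeInt;
begin
  Result := Length(S);
end;

var
  Name: UTF8String;
begin
  // Build "naïve" from code points, so the source file encoding does not matter.
  Name := UTF8Encode(UnicodeString('na') + WideChar($EF) + 've');
  WriteLn(Length(Name));         // 6: bytes in UTF-8 (U+00EF takes two)
  WriteLn(CodeUnitCount(Name));  // 5: converted to UTF-16 on the way in
end.
```

The caller never writes a conversion call. The parameter type alone triggers it, which is the property DoDi's scheme depends on.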
Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?
On Sun, Dec 22, 2013 at 11:52:04PM +0100, Hans-Peter Diettrich wrote: Keeping UTF8 on Windows makes a majority platform seem only half supported. Not good either. Worse, it is Delphi incompatible. You favor a special FPC and Lazarus for Windows, in addition to the UTF-8 version for all other platforms? IMHO UTF8 is not a done deal, and Delphi compatibility requires UTF16 on the other platforms as well. Qt is UTF16, and so is Cocoa. Only GTK is UTF8. So I would say UTF16, and maybe, if there is demand, some can get utf8 :-)
Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?
Juha Manninen wrote: However, using a new Unicode Delphi would cause many problems, because all VCL functions and classes, including TStringList, expect UTF-16 strings. When using UTF8String, the compiler converts between encodings all the time. Then you can give your favorite string type a unique name, and set it to whatever is best in your favorite environment. DoDi
Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?
On Mon, Dec 23, 2013 at 1:58 PM, Hans-Peter Diettrich drdiettri...@aol.com wrote: Then you can give your favorite string type a unique name, and set it to whatever is best in your favorite environment. The favorite string type in this case would be UTF8String. It already has a name. Please see what I wrote earlier. Juha
Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?
On 2013-12-23 11:32, Marco van de Voort wrote: So I would say UTF16, and maybe, if there is demand, some can get utf8 :-) The question is: Should FPC and LCL use a fixed encoding for all platforms or should the encoding be adapted for each WidgetSet/OS? If it should be the same for all platforms then it should be UTF8 IMO. UTF16 is the most horrible decision (all bad things combined). UTF32 would at least have the advantage of a fixed character size but pays for this with *a lot* of memory consumption. UTF8 has the lowest memory demand (in general) and good backward compatibility. On the other hand, adapting the string encoding for each Widgetset/OS would be a can of worms IMO. A lot of additional knowledge about strings is put on the programmer, because string handling has to be done differently depending on the OS. That would be a hazardous decision and would only be of use if programs are written exclusively for one OS. But FPC/Lazarus is meant to be portable, so this should not be done.
Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?
On Mon, Dec 23, 2013 at 8:38 AM, Marco van de Voort mar...@stack.nl wrote: On Sun, Dec 22, 2013 at 05:06:27PM -0200, Marcos Douglas wrote: FPC 2.7.x can compile the windows unit in unicode (UTF16) mode. Most system and sysutils file related routines are already unicode (UTF16 with RawByteString overload). So FPC 2.7.x can compile the windows unit in unicode (UTF16) mode. But how will it work with Lazarus, which uses UTF-8? Not without conversions. UTF8 on Windows IMHO _NEVER_ was a good idea. Lazarus will not change to UTF-16 -- only for Windows -- so everything will stay the same for Windows programmers? I think it is too early to say what will happen, one way or the other. Everybody is still searching, and the current 2.6.x-based UTF8 support will need an overhaul anyway for 2.8.x. I think 2.8.x will be a transition version anyway, and a definitive unicode solution will only come in the major release after that. Ok, thanks for the explanation. Marcos Douglas
Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?
On Mon, Dec 23, 2013 at 06:52:21PM +0100, Jürgen Hestermann wrote: On 2013-12-23 11:32, Marco van de Voort wrote: So I would say UTF16, and maybe, if there is demand, some can get utf8 :-) The question is: Should FPC and LCL use a fixed encoding for all platforms or should the encoding be adapted for each WidgetSet/OS? Not necessarily. Supporting both on both platforms is a sane option too. One can't ditch utf16 because of Delphi compatibility. It will be hard to ditch utf8 because of old Lazarus compatibility. But if I have to choose one to kill, it is utf8. It is the less used choice for unicode strings INSIDE APPLICATIONS. Yes, UTF8 is dominant in documents, but not in APIs. If it should be the same for all platforms then it should be UTF8 IMO. UTF16 is the most horrible decision (all bad things combined). For what? Most of the sentiments I hear are echoes of discussions on the web that are mostly about document encodings, NOT application-internal encodings. However we UTF32 would at least have the advantage of a fixed character size but pays for this with *a lot* of memory consumption. (It is not a fixed character size, but a fixed codepoint size.) UTF8 has the lowest memory demand Not according to 1 billion Chinese. (in general) and good backward compatibility. Hardly. Only for western languages, and even there conversions often go wrong. That's why the whole BOM kludge became so important. On the other hand, adapting the string encoding for each Widgetset/OS would be a can of worms IMO. If you feel that way, I think Delphi compatibility should prevail. Old Lazarus code needs to be modified anyway. Note that the language support for utf8 breaks down when you pass e.g. a string to RawByteString on Windows (because it is converted to the default 1-byte encoding, which is not utf8 in general). As said, UTF8 on Windows is a crutch, and attempts to work around that move Lazarus toward portability to everything as long as it follows Unix philosophies, à la Cygwin.
IMHO a bad direction. FPC has in general avoided having an outright preference and IMHO should continue to do so. A lot of additional knowledge about strings is put on the programmer, because string handling has to be done differently depending on the OS. It will be anyway, even with utf8. Constructs that happen to work on Linux will fail on Windows, because on Windows the default 1-byte encoding is not UTF8. Moreover, I think people dismiss the Delphi compatibility card too easily. Way, way, way too easily. But FPC/Lazarus is meant to be portable, so this should not be done. FPC/Lazarus is supposed to be portable, not an emulated Unix on everything. Using another system's default encoding is emulation, not portability.
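Which encoding has "the lowest memory demand" depends on the script, as the exchange above points out. The following minimal sketch assumes FPC 3.0 or later; the sample strings are chosen only for illustration. It counts the bytes the same text needs in UTF-8 and in UTF-16. UTF-32 would always need 4 bytes per code point, which is why it gives a fixed code point size rather than a fixed character size.

```pascal
program EncodingSizes;
{$mode objfpc}{$H+}

procedure Report(const Name: string; const U: UnicodeString);
var
  Utf8Bytes, Utf16Bytes: SizeInt;
begin
  Utf8Bytes := Length(UTF8Encode(U));          // byte count of the UTF-8 form
  Utf16Bytes := Length(U) * SizeOf(WideChar);  // 2 bytes per UTF-16 code unit
  WriteLn(Name, ': UTF-8 = ', Utf8Bytes, ' bytes, UTF-16 = ', Utf16Bytes, ' bytes');
end;

var
  Chinese: UnicodeString;
begin
  Report('Latin', 'Lazarus');  // ASCII letters: 1 byte each in UTF-8
  // U+4E2D U+6587, built from code points to avoid source-encoding issues.
  SetLength(Chinese, 2);
  Chinese[1] := WideChar($4E2D);
  Chinese[2] := WideChar($6587);
  Report('Chinese', Chinese);  // BMP ideographs: 3 bytes each in UTF-8, 2 in UTF-16
end.
```

For the Latin sample UTF-8 is half the size of UTF-16 (7 vs 14 bytes). For the two ideographs it is 50% larger (6 vs 4 bytes).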
Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?
Marco van de Voort wrote: On Mon, Dec 23, 2013 at 06:52:21PM +0100, Jürgen Hestermann wrote: On 2013-12-23 11:32, Marco van de Voort wrote: So I would say UTF16, and maybe, if there is demand, some can get utf8 :-) The question is: Should FPC and LCL use a fixed encoding for all platforms or should the encoding be adapted for each WidgetSet/OS? Not necessarily. Supporting both on both platforms is a sane option too. One can't ditch utf16 because of Delphi compatibility. It will be hard to ditch utf8 because of old Lazarus compatibility. In the meantime we have 2 Delphi compiler/RTL versions:
- Ansi (Win32)
- Unicode (UTF-16, multi target)
and 4 GUI versions:
- VCL Win32
- CLX
- VCL.NET
- FireMonkey
summing up to 8 versions in theory, and 3 versions in practice. So what does "Delphi compatible" *really* mean? The FPC compiler supports multiple targets, and can most probably be made to support both string types from the *same code base* (maintenance issue!). IMO this does *not* apply to the libraries (RTL and LCL) and existing applications, where Lazarus counts as the most important and prominent application. We can be happy to have one single LCL and IDE version, which is already incompatible due to the use of UTF-8 strings instead of Ansi. Multiple versions, for compatibility with the other Delphi combinations, are beyond our *development capacities*. This sheds a very different light on Delphi compatibility, meaning that a Unicode LCL and IDE can *not* be supported in parallel to the existing UTF-8 implementation. Dumping UTF-8 would discontinue support for the *entire* range of *existing* LCL applications, i.e. lose all the current Lazarus users :-( So what should be the intended *audience* for a future Lazarus version? IMO the biggest group are old-fashioned Delphi (D7) users, who want their existing Ansi/VCL code base supported *without* the complications and incompatibilities introduced by the newer Delphi versions.
The subject of this thread clearly indicates that UTF-8 is *not* a solution for this group of users. Another important user group is targeting mobiles, where time will tell whether FM will ever succeed, or share the fate of Kylix or VCL.NET. IMO these should already be happy with fpGUI or mseGUI; no need to raise another competitor in this area. But if I have to choose one to kill, it is utf8. It is the less used choice for unicode strings INSIDE APPLICATIONS. Yes, UTF8 is dominant in documents, but not in APIs. That's my conclusion as well. But is that new audience worth abandoning the entire existing Lazarus audience? DoDi
Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?
On Sat, Dec 21, 2013 at 5:41 PM, Marcos Douglas m...@delfire.net wrote: LCL (and VCL) typically use events, like TNotifyEvent. They are basically just callback functions. Oh, not the same. I use Events a lot -- not only in Forms or GUI components -- in my core code, but PostMessage is very different. E.g., you call PostMessage, show a modal Form, and the process starts afterwards; the task code is not inside the Form instance and the Form knows nothing about the task. Ok, true. Some of the Windows messages have been ported to be cross-platform. I have used an OnIdle handler and sometimes threads when I want the action to happen later. Juha
Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?
On Sat, Dec 21, 2013 at 7:55 PM, Jürgen Hestermann juergen.hesterm...@gmx.de wrote: The bottom line is: Use only UTF-16 with Delphi and it works very well. I would not like Lazarus to do the same. UTF16 is the worst of all possible unicode encodings. I believe the LCL will continue to use UTF-8. Nobody knows yet how many changes will be needed later with new FPC versions, but no worries, that question is not acute now. Juha
Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?
On Sun, Dec 22, 2013 at 7:06 AM, Juha Manninen juha.mannine...@gmail.com wrote: On Sat, Dec 21, 2013 at 5:41 PM, Marcos Douglas m...@delfire.net wrote: LCL (and VCL) typically use events, like TNotifyEvent. They are basically just callback functions. Oh, not the same. I use Events a lot -- not only in Forms or GUI components -- in my core code, but PostMessage is very different. E.g., you call PostMessage, show a modal Form, and the process starts afterwards; the task code is not inside the Form instance and the Form knows nothing about the task. Ok, true. Some of the Windows messages have been ported to be cross-platform. I have used an OnIdle handler and sometimes threads when I want the action to happen later. I use threads too, but I like to make things as simple as possible, and threads can be hard sometimes. Using PostMessage is very easy and simple. Marcos Douglas
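The PostMessage pattern described above queues work so that it runs after the current code returns, without the form knowing about the task. That pattern can be modelled without Windows messages. Below is a minimal, hypothetical sketch in plain Free Pascal; PostCall, ProcessQueue and the queue types are invented for illustration. In an LCL application, Application.QueueAsyncCall offers a comparable facility, with the main loop doing the processing.

```pascal
program DeferredCalls;
{$mode objfpc}{$H+}

type
  TDeferredProc = procedure(Data: PtrInt);
  TDeferredItem = record
    Proc: TDeferredProc;
    Data: PtrInt;
  end;
  TDeferredQueue = array of TDeferredItem;

var
  Queue: TDeferredQueue;  // hypothetical stand-in for a message queue

// Like PostMessage: record the call and return immediately.
procedure PostCall(AProc: TDeferredProc; AData: PtrInt);
var
  N: SizeInt;
begin
  N := Length(Queue);
  SetLength(Queue, N + 1);
  Queue[N].Proc := AProc;
  Queue[N].Data := AData;
end;

// Like the message loop: run everything queued, including calls posted meanwhile.
procedure ProcessQueue;
var
  Pending: TDeferredQueue;
  I: SizeInt;
begin
  while Length(Queue) > 0 do
  begin
    Pending := Queue;  // take over the current batch
    Queue := nil;      // calls posted from now on go into a fresh queue
    for I := 0 to High(Pending) do
      Pending[I].Proc(Pending[I].Data);
  end;
end;

procedure StartTask(Data: PtrInt);
begin
  WriteLn('task ', Data, ' started');
end;

begin
  PostCall(@StartTask, 42);
  WriteLn('modal form shown (simulated)');  // the poster does not wait for the task
  ProcessQueue;
end.
```

The form-showing code and the task stay decoupled: the task runs only when the loop (here, ProcessQueue) gets control back.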
Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?
On Sun, Dec 15, 2013 at 06:13:32PM +0100, Reinier Olislagers wrote: FPC's context. These components do not use Lazarus' routines and that is the BIG problem. I need to remember to pass only ANSI strings to these components, and to convert the components' output strings for use in Lazarus. Why not just include a project reference to LCLBase (IIRC that should be enough) and always use the LCL units until FPC catches up? FPC 2.7.x can compile the windows unit in unicode (UTF16) mode. Most system and sysutils file related routines are already unicode (UTF16 with RawByteString overload).
Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?
On Wed, Dec 18, 2013 at 12:03:56PM +0100, Hans-Peter Diettrich wrote: Apart from that there's not much else you can do except contribute patches to help unicode-ise the FPC RTL... The new AnsiStrings (with encoding and automatic conversion) should be sufficient; Unicode is not required. In fact a move to a Unicode RTL would require either that Lazarus is converted too, or that 2 RTL flavors (Ansi and Unicode) be supported. Not a good idea, IMO. Keeping UTF8 on Windows makes a majority platform seem only half supported. Not good either. Worse, it is Delphi incompatible.
Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?
On Sun, Dec 22, 2013 at 4:56 PM, Marco van de Voort mar...@stack.nl wrote: On Sun, Dec 15, 2013 at 06:13:32PM +0100, Reinier Olislagers wrote: FPC's context. These components do not use Lazarus' routines and that is the BIG problem. I need to remember to pass only ANSI strings to these components, and to convert the components' output strings for use in Lazarus. Why not just include a project reference to LCLBase (IIRC that should be enough) and always use the LCL units until FPC catches up? FPC 2.7.x can compile the windows unit in unicode (UTF16) mode. Most system and sysutils file related routines are already unicode (UTF16 with RawByteString overload). So FPC 2.7.x can compile the windows unit in unicode (UTF16) mode. But how will it work with Lazarus, which uses UTF-8? Lazarus will not change to UTF-16 -- only for Windows -- so everything will stay the same for Windows programmers? Thanks, Marcos Douglas
Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?
Marco van de Voort wrote: Keeping UTF8 on Windows makes a majority platform seem only half supported. Not good either. Worse, it is Delphi incompatible. You favor a special FPC and Lazarus for Windows, in addition to the UTF-8 version for all other platforms? DoDi
Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?
On Sat, Dec 21, 2013 at 5:56 AM, Juha Manninen juha.mannine...@gmail.com wrote: On Sat, Dec 21, 2013 at 3:08 AM, Marcos Douglas m...@delfire.net wrote: I didn't understand. If I have a TStringList instance on Windows, I need to convert the Text property to ANSI. Some components, e.g. TMemo, do these conversions automatically, but that is different. TMemo is a GUI component. I know, of course... :) Then the string encoding matters and it must be converted to the native widgetset encoding. Still, the conversion is automatic. You don't need to care about it if you work with LCL components only and not with the WinAPI directly. Yes, and that is why I wrote that TMemo does these conversions automatically, but that is different. TStringList is not a GUI component. It can be used, for example, in an embedded Linux program with no GUI. Yes again. I use TStrings a lot to transfer information in many cases... no GUI involved. It does not need to know the encoding (except for sorting maybe). With FPC no automatic conversions happen. And that is one of these problems, because I need to convert the Text property to the right encoding. In Delphi things are different. The auto-conversion ALWAYS happens when assigning between e.g. UTF-8 and UTF-16. It has nothing to do with the WinAPI, or any other widgetset API. The native string is UTF-16. If you have

  var
    MyUTF8Str: UTF8String;
  ...
  StringList.Add(MyUTF8Str);    // triggers conversion
  MyUTF8Str := StringList[0];   // triggers conversion again

The amazing thing is that such code works. Delphi does a good job of converting the strings. That's it! I think you are talking about the new versions of Delphi, right? I always read that the new Unicode implementation in the new versions of Delphi is wrong, broke things, etc., but you are describing another view. These conversions, IMHO, could be automatic -- as Delphi does -- when I use the correct type of string, in this case UTF8String.
So I can write my packages using only UTF8String or UTF16String in all arguments, and the compiler converts for me. What is wrong with that approach? It is also reasonably fast, but still not acceptable in speed-critical code. This was the problem in my employer's code base. We are thinking about how to use UTF-8 for the core program without triggering many auto-conversions. One choice is to dump Delphi and use only FPC. For now the code still works with both. If you do not want automatic conversions, use the RawByteString type. Delphi does not do conversions in that case, right? Thank you, I'm learning. Marcos Douglas P.S. I am still wondering why you are so fond of the WinAPI when you have a nice cross-platform API available. Fond? Of course not! I use the WinAPI when I need to, or when I don't know a cross-platform way to do the same. I'm a classic Delphi programmer. I still use Delphi today (I stopped at version 7), but for all new projects I use Lazarus, and MSEgui a little. For example, I use PostMessage, SendMessage, PeekMessage a lot... Are these cross-platform? If not, how can I do the same?
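Since FPC 3.0 (released after this discussion), AnsiString-family strings in FPC carry a code page, so the behaviour asked about here can be observed directly. The following is a minimal sketch assuming FPC 3.0 or later. UTF8Encode/UTF8Decode convert explicitly. A RawByteString parameter receives its argument without conversion and keeps the argument's code page, which StringCodePage reports.

```pascal
program RawByteDemo;
{$mode objfpc}{$H+}

// A RawByteString parameter accepts any AnsiString-family string as-is:
// no code page conversion happens on the call.
procedure Inspect(const S: RawByteString);
begin
  WriteLn('code page ', StringCodePage(S), ', ', Length(S), ' bytes');
end;

var
  U16: UnicodeString;
  U8: UTF8String;
begin
  // "café" built from code points: 4 UTF-16 code units.
  SetLength(U16, 4);
  U16[1] := 'c';
  U16[2] := 'a';
  U16[3] := 'f';
  U16[4] := WideChar($E9);
  U8 := UTF8Encode(U16);  // explicit conversion: 5 bytes tagged CP_UTF8
  Inspect(U8);            // code page 65001, 5 bytes
  U16 := UTF8Decode(U8);  // and back to UTF-16
  WriteLn(Length(U16));   // 4
end.
```

This matches Juha's caveat: RawByteString avoids the conversion, but the receiving code must then check the code page itself rather than assume one.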
Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?
On Sat, Dec 21, 2013 at 4:33 PM, Marcos Douglas m...@delfire.net wrote: That's it! I think you are talking about the new versions of Delphi, right? I always read that the new Unicode implementation in the new versions of Delphi is wrong, broke things, etc., but you are describing another view. Yes, Delphi 2009+. Delphi 2009 will soon be 5 years old, not really new any more. IMO it does a good job for such a fundamental change in string type. Only code that relies on sizeof(char) = 1 does not work. That includes streaming strings, file I/O or I/O with some outside devices, and using Length(Str) as a parameter for GetMem(), Move() etc. Most clean code works amazingly well, if you are ok with using UTF-16 everywhere. These conversions, IMHO, could be automatic -- as Delphi does -- when I use the correct type of string, in this case UTF8String. So I can write my packages using only UTF8String or UTF16String in all arguments, and the compiler converts for me. What is wrong with that approach? Nothing wrong, I guess. I hope it will be possible with FPC. Still, let's not speculate more; we already had mail threads like this on the fpc-dev list that went on for months. If you do not want automatic conversions, use the RawByteString type. Delphi does not do conversions in that case, right? Thank you, I'm learning. You can sometimes bypass the conversion by using RawByteString, but it would be rather hackish. Remember, all VCL classes and string functions expect UTF-16. I don't want to try what happens if you pass them a UTF-8 encoded string using some hack. The bottom line is: Use only UTF-16 with Delphi and it works very well. For example, I use PostMessage, SendMessage, PeekMessage a lot... Are these cross-platform? If not, how can I do the same? LCL (and VCL) typically use events, like TNotifyEvent. They are basically just callback functions. Juha
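The breakage Juha lists (code assuming SizeOf(Char) = 1, e.g. Length(Str) passed to GetMem or Move) comes down to counting characters where bytes are needed. Here is a minimal sketch assuming FPC 3.0 or later in {$mode delphiunicode}. In that mode string, Char and PChar are UnicodeString, WideChar and PWideChar, as in Delphi 2009+.

```pascal
program CharSizeCopy;
{$mode delphiunicode}

var
  S: string;        // UnicodeString in this mode
  Buf: PChar;       // PWideChar in this mode
  ByteCount: SizeInt;
begin
  S := 'Hello';
  // Length counts UTF-16 code units; GetMem and Move need bytes.
  ByteCount := Length(S) * SizeOf(Char);  // 10, not 5
  GetMem(Buf, ByteCount + SizeOf(Char));  // room for a terminating #0
  try
    Move(PChar(S)^, Buf^, ByteCount);
    Buf[Length(S)] := #0;
    WriteLn(ByteCount);                                  // 10
    WriteLn(CompareByte(Buf^, PChar(S)^, ByteCount) = 0);  // the copy matches the source
  finally
    FreeMem(Buf);
  end;
end.
```

Passing Length(S) alone to Move would copy only the first half of the text. Multiplying by SizeOf(Char) keeps the same line correct under both Ansi and Unicode string settings.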
Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?
On Sat, Dec 21, 2013 at 1:18 PM, Juha Manninen juha.mannine...@gmail.com wrote: On Sat, Dec 21, 2013 at 4:33 PM, Marcos Douglas m...@delfire.net wrote: That's it! I think you are talking about the new versions of Delphi, right? I always read that the new Unicode implementation in the new versions of Delphi is wrong, broke things, etc., but you are describing another view. Yes, Delphi 2009+. Delphi 2009 will soon be 5 years old, not really new any more. IMO it does a good job for such a fundamental change in string type. Only code that relies on sizeof(char) = 1 does not work. That includes streaming strings, file I/O or I/O with some outside devices, and using Length(Str) as a parameter for GetMem(), Move() etc. Most clean code works amazingly well, if you are ok with using UTF-16 everywhere. These conversions, IMHO, could be automatic -- as Delphi does -- when I use the correct type of string, in this case UTF8String. So I can write my packages using only UTF8String or UTF16String in all arguments, and the compiler converts for me. What is wrong with that approach? Nothing wrong, I guess. I hope it will be possible with FPC. Still, let's not speculate more; we already had mail threads like this on the fpc-dev list that went on for months. If you do not want automatic conversions, use the RawByteString type. Delphi does not do conversions in that case, right? Thank you, I'm learning. You can sometimes bypass the conversion by using RawByteString, but it would be rather hackish. Remember, all VCL classes and string functions expect UTF-16. I don't want to try what happens if you pass them a UTF-8 encoded string using some hack. The bottom line is: Use only UTF-16 with Delphi and it works very well. I have always said here, on the FPC/Lazarus lists, that FPC should never follow Delphi, but you're making me change my mind about the Unicode implementation. Ok, no more speculation about how the next FPC will work with Unicode. For example, I use PostMessage, SendMessage, PeekMessage a lot...
Are these cross-platform? If not, how can I do the same? LCL (and VCL) typically use events, like TNotifyEvent. They are basically just callback functions. Oh, that's not the same. I use events a lot -- not only in forms or GUI components -- in my core code, but PostMessage is very different: e.g., you call PostMessage, show a modal form, and the processing starts afterwards; the task code is not inside the form instance and the form knows nothing about the task. Marcos Douglas
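To illustrate the SizeOf(Char) = 1 pitfall described in the message above, here is a minimal Free Pascal sketch. It is not taken from the thread, and the helper names ByteSizeOf and CopyToBuffer are hypothetical. With UnicodeString, Length() counts 2-byte UTF-16 code units. That count therefore has to be multiplied by SizeOf(WideChar) before it is used as a byte count for GetMem() or Move().

```pascal
program ByteCountDemo;
{$mode objfpc}{$H+}

// Length() of a UnicodeString counts UTF-16 code units (2 bytes each),
// so it must be scaled by SizeOf(WideChar) to get a byte count.
function ByteSizeOf(const S: UnicodeString): SizeInt;
begin
  Result := Length(S) * SizeOf(WideChar);
end;

// Hypothetical helper: copies S into a newly allocated, #0-terminated
// buffer. The caller must release the result with FreeMem.
function CopyToBuffer(const S: UnicodeString): PWideChar;
begin
  // Payload plus one extra WideChar for the terminator
  Result := GetMem(ByteSizeOf(S) + SizeOf(WideChar));
  if Length(S) > 0 then
    Move(PWideChar(S)^, Result^, ByteSizeOf(S)); // bytes, not Length(S)
  Result[Length(S)] := #0;
end;

var
  Buf: PWideChar;
begin
  Buf := CopyToBuffer('Hello');
  try
    WriteLn(UnicodeString(Buf));
  finally
    FreeMem(Buf);
  end;
end.
```

Passing Length(S) alone to GetMem or Move would copy only half of the string. That is exactly the kind of code that breaks when Char stops being one byte.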
Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?
On 2013-12-21 16:18, Juha Manninen wrote: The bottom line is: Use only UTF-16 with Delphi and it works very well. I would not like Lazarus to do the same. UTF-16 is the worst of all possible Unicode encodings.
Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?
On 12/20/2013 07:22 AM, Juha Manninen wrote: When Unicode is mentioned, usually people start to argue about how it SHOULD be done. :-) :-) :-) Especially because the big boss Delphi does not provide a really good model to follow. Delphi string handling is designed with UTF-16 in mind (using other encodings results in bad performance and other problems) and with no regard for portability at all. And in spite of that, there are still some souls who claim Unicode support is not a complicated thing :-( -Michael
Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?
On Wed, Dec 18, 2013 at 6:48 AM, Juha Manninen juha.mannine...@gmail.com wrote: What's more, UTF-16 is confusing because it has variations. It is all well explained here: http://www.utf8everywhere.org/ My experience: http://www.utf8bootcamp.org/
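The following minimal Free Pascal sketch shows one of those complications. It assumes that the "variations" include the difference between fixed-width UCS-2 and real UTF-16. Code points above U+FFFF are stored as surrogate pairs, so Length() of a UnicodeString counts UTF-16 code units, not characters. The function names are hypothetical, and the code does not validate unpaired surrogates.

```pascal
program SurrogateDemo;
{$mode objfpc}{$H+}

// Encodes one Unicode code point as UTF-16. Code points above U+FFFF
// need a surrogate pair, i.e. two WideChars.
function CodePointToUTF16(CP: Cardinal): UnicodeString;
begin
  if CP < $10000 then
  begin
    SetLength(Result, 1);
    Result[1] := WideChar(CP);
  end
  else
  begin
    Dec(CP, $10000);
    SetLength(Result, 2);
    Result[1] := WideChar($D800 + (CP shr 10));   // high surrogate
    Result[2] := WideChar($DC00 + (CP and $3FF)); // low surrogate
  end;
end;

// Counts code points: a high surrogate followed by a low surrogate
// counts as one; an unpaired surrogate counts as one on its own.
function CodePointCount(const S: UnicodeString): SizeInt;
var
  I: SizeInt;
begin
  Result := 0;
  I := 1;
  while I <= Length(S) do
  begin
    if (I < Length(S)) and (Ord(S[I]) >= $D800) and (Ord(S[I]) <= $DBFF)
      and (Ord(S[I + 1]) >= $DC00) and (Ord(S[I + 1]) <= $DFFF) then
      Inc(I, 2)
    else
      Inc(I);
    Inc(Result);
  end;
end;

var
  S: UnicodeString;
begin
  // 'A' (U+0041) followed by U+1F600, which is outside the BMP
  S := CodePointToUTF16($41) + CodePointToUTF16($1F600);
  WriteLn('UTF-16 code units: ', Length(S));         // 3
  WriteLn('Code points:       ', CodePointCount(S)); // 2
end.
```

UTF-16 is therefore variable-width, just like UTF-8. Code that assumes one WideChar per character is only correct for text inside the Basic Multilingual Plane.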
Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?
On Fri, Dec 20, 2013 at 3:19 AM, Jürgen Hestermann juergen.hesterm...@gmx.de wrote: On 2013-12-19 23:04, Marcos Douglas wrote: Well, the same problem... If there is no solution (for now), I prefer using SysToUTF8/UTF8ToSys because it is simpler than using the WideString API and converting to UnicodeString, UTF8Decode, etc. Don't you think? As Bart already mentioned, the ANSI (SYS) interface does *not* support Unicode. Also, you are not able to access long paths (longer than MAX_PATH, i.e. 260 characters) when using ANSI API functions. Therefore the [W]ide (Unicode) character API functions are a must. So, these limitations exist in Lazarus too, right? Marcos Douglas
Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?
On Fri, 20 Dec 2013 17:55:48 -0200 Marcos Douglas m...@delfire.net wrote: On Fri, Dec 20, 2013 at 3:19 AM, Jürgen Hestermann juergen.hesterm...@gmx.de wrote: On 2013-12-19 23:04, Marcos Douglas wrote: Well, the same problem... If there is no solution (for now), I prefer using SysToUTF8/UTF8ToSys because it is simpler than using the WideString API and converting to UnicodeString, UTF8Decode, etc. Don't you think? As Bart already mentioned, the ANSI (SYS) interface does *not* support Unicode. Also, you are not able to access long paths (longer than MAX_PATH, i.e. 260 characters) when using ANSI API functions. Therefore the [W]ide (Unicode) character API functions are a must. So, these limitations exist in Lazarus too, right? The file functions with UTF8 in their name use the W functions internally under Windows. Mattias
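The following minimal sketch illustrates the pattern Mattias describes: a file function that takes UTF-8 and calls the W API on Windows. MyFileExistsUTF8 is a hypothetical name and not the actual Lazarus implementation (Lazarus ships its own FileExistsUTF8). On non-Windows platforms the sketch simply assumes that the file system accepts the UTF-8 bytes as they are.

```pascal
program Utf8FileDemo;
{$mode objfpc}{$H+}

uses
  {$IFDEF WINDOWS}Windows,{$ENDIF} SysUtils;

// Hypothetical helper (not the Lazarus implementation): the caller passes
// a UTF-8 file name. On Windows it is converted to UTF-16 and handed to
// the W API, so characters outside the ANSI code page survive.
function MyFileExistsUTF8(const FileName: UTF8String): Boolean;
{$IFDEF WINDOWS}
var
  Attr: DWORD;
begin
  Attr := GetFileAttributesW(PWideChar(UTF8Decode(FileName)));
  // $FFFFFFFF = INVALID_FILE_ATTRIBUTES (lookup failed); also reject directories
  Result := (Attr <> $FFFFFFFF) and ((Attr and FILE_ATTRIBUTE_DIRECTORY) = 0);
end;
{$ELSE}
begin
  // Assumption: this platform's file system takes the UTF-8 bytes as-is
  Result := FileExists(FileName);
end;
{$ENDIF}

begin
  // The running executable itself should exist
  WriteLn(MyFileExistsUTF8(ParamStr(0)));
end.
```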
Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?
On Fri, Dec 20, 2013 at 4:22 AM, Juha Manninen juha.mannine...@gmail.com wrote: On Fri, Dec 20, 2013 at 2:47 AM, Marcos Douglas m...@delfire.net wrote: Not _only_ to UTF-16? Will it depend on the OS? No, an FPC string will know its encoding, and conversion to any encoding is done, but only when needed. Let's not go deeper into this subject here. The details of future FPC are still open and not yet documented. When Unicode is mentioned, people usually start to argue about how it SHOULD be done. You can search the fpc-pascal and fpc-devel archives for that. OK, you're right. For now (2.6.2) it works OK only for AnsiString... I'm talking about coding the TStringList class to work with UTF-8 without changing the string type of its arguments. Again no. TStringList in 2.6.2 works OK for UTF-8 encoded strings, too. The same is true for future FPC versions, because they are not hard-coded for UTF-16 (as Delphi is). I didn't understand. If I have a TStringList instance, on Windows, I need to convert the Text property to ANSI. Some components, e.g. TMemo, do these conversions automatically, but that is different. With Delphi you would need to copy the whole class, name it TUtf8StringList, and replace string with UTF8String. This new class must NOT inherit from Classes.TStringList. The same here... I think. No no no :) But you are talking about making a new StringList... that is not the proposal. ;-) Marcos Douglas
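The following minimal sketch illustrates Juha's point that TStringList in FPC simply stores the bytes it is given, so UTF-8 bytes go in and come back out unchanged. The constant and helper names are hypothetical. The non-ASCII character is written as #$C3#$A9 so the demo does not depend on the encoding of the source file.

```pascal
program StringListBytesDemo;
{$mode objfpc}{$H+}

uses
  Classes;

const
  // 'cafe' with an acute e, written as UTF-8 bytes (U+00E9 = #$C3#$A9)
  // so the demo does not depend on the encoding of this source file
  CafeUTF8 = 'caf'#$C3#$A9;

// Hypothetical helper: stores S in a TStringList and reads it back.
function RoundTripThroughList(const S: string): string;
var
  SL: TStringList;
begin
  SL := TStringList.Create;
  try
    SL.Add(S);
    Result := SL[0];
  finally
    SL.Free;
  end;
end;

begin
  // 4 characters but 5 bytes: the list just keeps the bytes
  WriteLn(Length(RoundTripThroughList(CafeUTF8)));  // 5
  WriteLn(RoundTripThroughList(CafeUTF8) = CafeUTF8);
end.
```

Sorting and case-insensitive comparison are where the encoding can start to matter, as Juha notes later in the thread.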
Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?
On Fri, Dec 20, 2013 at 7:16 AM, Michael Schnell mschn...@lumino.de wrote: On 12/20/2013 12:46 AM, Juha Manninen wrote: Yes, Delphi does that. Future FPC versions will do automatic conversion, too, but not only to UTF-16. It's a long-winded debate whether or not this is a good idea from a technical POV, but as Delphi does this, FPC seems to need to follow. In fact there are decent positive aspects. But it obviously is a negative aspect if TStringList and similar functions are implemented using a fixed encoding scheme, forcing conversions back and forth when e.g. using TStringList as an intermediate store. Here, a generic implementation (which Delphi does not provide) would be good. IMHO this is doable without losing Delphi compatibility or performance. +1 That's what I was talking about in my previous mail: using TStringList as an intermediate store. Marcos Douglas
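Here is one possible reading of the "generic implementation" idea, offered only as a sketch. With FPC's fgl unit, the element type of a list is a type parameter. A list of UTF8String items therefore stores and returns UTF8String without any conversion. This is not a drop-in TStringList replacement (it has no Text, CommaText or locale-aware sorting), and the type and function names are hypothetical.

```pascal
program TypedListDemo;
{$mode objfpc}{$H+}

uses
  fgl;

type
  // The element type is a type parameter: this list stores UTF8String items
  TUTF8List = specialize TFPGList<UTF8String>;

// Hypothetical helper: adds S to the typed list and reads it back. Source
// and element type are the same string type, so no encoding conversion
// is involved.
function RoundTripUTF8(const S: UTF8String): UTF8String;
var
  L: TUTF8List;
begin
  L := TUTF8List.Create;
  try
    L.Add(S);
    Result := L[0];
  finally
    L.Free;
  end;
end;

var
  S: UTF8String;
begin
  // 'cafe' with U+00E9, produced as UTF-8 bytes by UTF8Encode
  S := UTF8Encode(UnicodeString('caf') + WideChar($E9));
  WriteLn(Length(RoundTripUTF8(S)));  // 5
end.
```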
Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?
On Fri, Dec 20, 2013 at 8:44 PM, Mattias Gaertner nc-gaert...@netcologne.de wrote: On Fri, 20 Dec 2013 17:55:48 -0200 Marcos Douglas m...@delfire.net wrote: On Fri, Dec 20, 2013 at 3:19 AM, Jürgen Hestermann juergen.hesterm...@gmx.de wrote: On 2013-12-19 23:04, Marcos Douglas wrote: Well, the same problem... If there is no solution (for now), I prefer using SysToUTF8/UTF8ToSys because it is simpler than using the WideString API and converting to UnicodeString, UTF8Decode, etc. Don't you think? As Bart already mentioned, the ANSI (SYS) interface does *not* support Unicode. Also, you are not able to access long paths (longer than MAX_PATH, i.e. 260 characters) when using ANSI API functions. Therefore the [W]ide (Unicode) character API functions are a must. So, these limitations exist in Lazarus too, right? The file functions with UTF8 in their name use the W functions internally under Windows. I didn't know that, thanks. Marcos Douglas
Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?
On Sat, Dec 21, 2013 at 3:08 AM, Marcos Douglas m...@delfire.net wrote: I didn't understand. If I have a TStringList instance, on Windows, I need to convert the Text property to ANSI. Some components, e.g. TMemo, do these conversions automatically, but that is different. TMemo is a GUI component. There the string encoding matters, and it must be converted to the native widgetset encoding. Still, the conversion is automatic. You don't need to care about it if you work only with LCL components and not with the WinAPI directly. TStringList is not a GUI component. It can be used, for example, in an embedded Linux program with no GUI. It does not need to know the encoding (except maybe for sorting). With FPC no automatic conversions happen. In Delphi things are different. The auto-conversion ALWAYS happens when assigning between e.g. UTF-8 and UTF-16. It has nothing to do with the WinAPI or any other widgetset API. The native string is UTF-16. If you have
---
var
  MyUTF8Str: UTF8String;
...
StringList.Add(MyUTF8Str);   // triggers a conversion
MyUTF8Str := StringList[0];  // triggers a conversion again
---
The amazing thing is that such code works. Delphi does a good job of converting the strings. It is also reasonably fast, but still not acceptable in speed-critical code. This was the problem in my employer's code base. We are thinking about how to use UTF-8 for the core program without triggering many auto-conversions. One choice is to dump Delphi and use only FPC. For now the code still works with both. Juha P.S. I am still wondering why you are so fond of the WinAPI when you have a nice cross-platform API available.
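The following Free Pascal sketch makes the two implicit conversions in Juha's Delphi example visible by performing them explicitly with UTF8Encode and UTF8Decode. Each call allocates a new string and transcodes every character. That is why doing this on every StringList.Add and every read can matter in speed-critical code. The sample text ('a' followed by the euro sign U+20AC) is an arbitrary choice.

```pascal
program ConversionCostDemo;
{$mode objfpc}{$H+}

// Sample text: 'a' followed by the euro sign U+20AC, built from code
// units so the demo does not depend on the source file encoding.
function SampleText: UnicodeString;
begin
  Result := 'a';
  Result := Result + WideChar($20AC);
end;

var
  U, Back: UnicodeString;
  U8: UTF8String;
begin
  U := SampleText;
  U8 := UTF8Encode(U);     // UTF-16 -> UTF-8: new string, every char transcoded
  Back := UTF8Decode(U8);  // UTF-8 -> UTF-16: another new string
  WriteLn('UTF-16 code units: ', Length(U));  // 2
  WriteLn('UTF-8 bytes:       ', Length(U8)); // 4
  WriteLn('Round trip equal:  ', Back = U);
end.
```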
Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?
On 2013-12-18 02:16, Marcos Douglas wrote: On Tue, Dec 17, 2013 at 4:15 PM, Jürgen Hestermann juergen.hesterm...@gmx.de wrote: I am just writing a file manager for Windows (hopefully I can port it to Linux later) and I don't see any performance problems from using UTF-8 in my program while the API is UTF-16. Most (if not all) things that I do with files take much longer than the string conversion, so it does not matter much. OK. But how do you do it, using SysToUTF8 / UTF8ToSys? I use the following:
---
var
  X, Path : UTF8String;
  FW      : Win32_Find_DataW;
  H       : THandle;
...
H := FindFirstFileW(PWideChar(UTF8Decode(WinAPIPathName(Path))), FW);
...
X := UTF8Encode(UnicodeString(FW.cFileName));
---
where WinAPIPathName just prepends the \\?\ string to the path name to overcome the MAX_PATH (260 character) length limitation. Path is the UTF-8 string for the file search, and X holds the found file name(s) as UTF-8. When I later need an API call, I convert back:
---
...
Windows.DeleteFileW(PWideChar(UTF8Decode(WinAPIPathName(AppendDir(Path, X)))));
---
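The thread does not show the WinAPIPathName helper itself, so the following is a hypothetical reconstruction based only on the description in the message above. Two details of the \\?\ prefix are worth knowing. It is only meaningful for absolute paths, and UNC paths (\\server\share\...) take the form \\?\UNC\server\share\... instead.

```pascal
program LongPathDemo;
{$mode objfpc}{$H+}

// Hypothetical reconstruction of a WinAPIPathName-style helper; the
// original is not shown in the thread. Assumes Path is already absolute.
function WinAPIPathName(const Path: UTF8String): UTF8String;
begin
  if Copy(Path, 1, 4) = '\\?\' then
    Result := Path                                // already prefixed
  else if Copy(Path, 1, 2) = '\\' then
    Result := '\\?\UNC\' + Copy(Path, 3, MaxInt)  // \\server\share -> \\?\UNC\server\share
  else
    Result := '\\?\' + Path;                      // C:\dir -> \\?\C:\dir
end;

begin
  WriteLn(WinAPIPathName('C:\Data\file.txt'));
  WriteLn(WinAPIPathName('\\server\share\file.txt'));
end.
```

The result is meant to be passed on, after UTF8Decode, to the W functions as in the snippet above.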