Re: [Lazarus] TStrings, Windows and Unicode
On Wed, Oct 24, 2012 at 11:24 AM, Luca Olivetti wrote:
> On 24/10/2012 16:14, Howard Page-Clark wrote:
>> On 24/10/12 2:37, Marcos Douglas wrote:
>>> ...in my example I didn't do that and worked, why?
>>>
>>> I had not explained very well. When I said the file is Ok I mean the
>>> file on Windows Explorer, not in LCL.
>>> Do the test and you see.
>>
>> I guess the stringlist streaming saves the string with a BOM, and so
>> Explorer then recognises the encoding?
>
> There's no BOM, but notepad is capable of showing it correctly. The problem
> is, if you then save the file from notepad it will add a BOM, that will mess
> up the loading of the file in a stringlist (or, in my case, in a TIniFile).

Explained well. I tested and it's correct, thank you.

Marcos Douglas

--
___
Lazarus mailing list
Lazarus@lists.lazarus.freepascal.org
http://lists.lazarus.freepascal.org/mailman/listinfo/lazarus
Re: [Lazarus] TStrings, Windows and Unicode
On 24/10/2012 16:14, Howard Page-Clark wrote:
> On 24/10/12 2:37, Marcos Douglas wrote:
>> ...in my example I didn't do that and worked, why?
>>
>> I had not explained very well. When I said the file is Ok I mean the
>> file on Windows Explorer, not in LCL.
>> Do the test and you see.
>
> I guess the stringlist streaming saves the string with a BOM, and so
> Explorer then recognises the encoding?

There's no BOM, but notepad is capable of showing it correctly. The problem is, if you then save the file from notepad it will add a BOM, that will mess up the loading of the file in a stringlist (or, in my case, in a TIniFile).

Bye
--
Luca Olivetti
Wetron Automation Technology http://www.wetron.es
Tel. +34 935883004 Fax +34 935883007
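Luca's point about the BOM can be worked around by checking the first line after loading. The following is only a sketch under stated assumptions: StripUtf8Bom is a hypothetical helper (it is not part of the RTL or the LCL), and the file contents are simulated in memory instead of being read from a file that Notepad actually saved.

```pascal
program StripBomDemo;

{$mode objfpc}{$H+}

uses
  Classes, SysUtils;

const
  // The three bytes of the UTF-8 byte order mark.
  UTF8_BOM = #$EF#$BB#$BF;

// Hypothetical helper: returns S without a leading UTF-8 BOM, if present.
function StripUtf8Bom(const S: string): string;
begin
  if Copy(S, 1, Length(UTF8_BOM)) = UTF8_BOM then
    Result := Copy(S, Length(UTF8_BOM) + 1, MaxInt)
  else
    Result := S;
end;

var
  Lines: TStringList;
begin
  Lines := TStringList.Create;
  try
    // Simulate a file that was re-saved with a BOM in front of the text.
    Lines.Text := UTF8_BOM + '[section]' + LineEnding + 'key=value';
    // Only the first line can carry the BOM, so only that line is cleaned.
    if Lines.Count > 0 then
      Lines[0] := StripUtf8Bom(Lines[0]);
    WriteLn(Lines[0]);
  finally
    Lines.Free;
  end;
end.
```

With a real file, the same check would be applied right after LoadFromFile. For a TIniFile the cleaned text would have to be handed over in some other way (for example through a stream), which this sketch does not cover.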
Re: [Lazarus] TStrings, Windows and Unicode
On 24/10/12 2:37, Marcos Douglas wrote:
> ...in my example I didn't do that and worked, why?
>
> I had not explained very well. When I said the file is Ok I mean the
> file on Windows Explorer, not in LCL.
> Do the test and you see.

I guess the stringlist streaming saves the string with a BOM, and so Explorer then recognises the encoding?

Howard
Re: [Lazarus] TStrings, Windows and Unicode
On Wed, Oct 24, 2012 at 10:27 AM, Howard Page-Clark wrote:
> On 24/10/12 12:40, Marcos Douglas wrote:
>>> Nope.
>>> A stringlist is not LCL and it does not expect anything but strings.
>>
>> I agree...
>>
>>> If you write the contents of a stringlist to a file you get bytes you
>>> have put into the list. Nothing more nothing less.
>>
>> Yes, but...
>>
>>> If you want a UTF8 text file written, make sure you put UTF8 in it.
>>> Same for reading.
>>> When using filenames, you need to convert the name using the LCL function
>>> LCLtoSys (or something like that)
>>
>> ...in my example I didn't do that and worked, why?
>
> Because you added a UTF8 string to a stringlist. It was saved as UTF8 bytes
> (as Marc pointed out) and when later retrieved via a stringlist LoadFromFile
> call those bytes will be inserted into the stringlist just as they were
> saved, ready to display in UTF8 encoding.

I had not explained very well. When I said the file is Ok I mean the file on Windows Explorer, not in LCL.
Do the test and you see.

Marcos Douglas
Re: [Lazarus] TStrings, Windows and Unicode
On Wed, Oct 24, 2012 at 10:58 AM, Hans-Peter Diettrich wrote:
> Jürgen Hestermann wrote:
>> On 2012-10-24 08:58, Howard Page-Clark wrote:
>> > Internally the LCL uses only UTF8 encoding,
>> > so stringlists etc. all expect strings in that
>> > encoding and process them correctly when supplied
>> > (as here) with a string from the IDE Editor which
>> > is also UTF8. However the streaming code in SaveToFile
>> > eventually calls some Windows system routine which
>> > does not handle UTF8 and so requires conversion
>> > from the UTF8 filename to be correctly handled.
>>
>> That's one of the most important drawbacks of Lazarus:
>> The handling of strings is not consistent in all cases.
>> There is the theory of having UTF8 but in practice you have to
>> watch out because in *some* situations the UTF8 paradigm
>> is *not* valid and you have to convert strings yourself.
>
> This will change with the new Unicode and Ansi strings, which have a defined
> Encoding.
>
> Once the Encoding comes, I'd vote for a dedicated TFileName string type
> that matches the platform conventions. This type will allow filenames to be
> handled easily in the platform-specific part of the RTL, without too many
> string conversions.

+1

Marcos Douglas
Re: [Lazarus] TStrings, Windows and Unicode
On 24/10/12 12:40, Marcos Douglas wrote:
>> Nope.
>> A stringlist is not LCL and it does not expect anything but strings.
>
> I agree...
>
>> If you write the contents of a stringlist to a file you get bytes you
>> have put into the list. Nothing more nothing less.
>
> Yes, but...
>
>> If you want a UTF8 text file written, make sure you put UTF8 in it.
>> Same for reading.
>> When using filenames, you need to convert the name using the LCL function
>> LCLtoSys (or something like that)
>
> ...in my example I didn't do that and worked, why?

Because you added a UTF8 string to a stringlist. It was saved as UTF8 bytes (as Marc pointed out) and when later retrieved via a stringlist LoadFromFile call those bytes will be inserted into the stringlist just as they were saved, ready to display in UTF8 encoding.

Howard
Re: [Lazarus] TStrings, Windows and Unicode
Jürgen Hestermann wrote:
> On 2012-10-24 08:58, Howard Page-Clark wrote:
> > Internally the LCL uses only UTF8 encoding,
> > so stringlists etc. all expect strings in that
> > encoding and process them correctly when supplied
> > (as here) with a string from the IDE Editor which
> > is also UTF8. However the streaming code in SaveToFile
> > eventually calls some Windows system routine which
> > does not handle UTF8 and so requires conversion
> > from the UTF8 filename to be correctly handled.
>
> That's one of the most important drawbacks of Lazarus:
> The handling of strings is not consistent in all cases.
> There is the theory of having UTF8 but in practice you have to
> watch out because in *some* situations the UTF8 paradigm
> is *not* valid and you have to convert strings yourself.

This will change with the new Unicode and Ansi strings, which have a defined Encoding.

Once the Encoding comes, I'd vote for a dedicated TFileName string type that matches the platform conventions. This type will allow filenames to be handled easily in the platform-specific part of the RTL, without too many string conversions.

DoDi
Re: [Lazarus] TStrings, Windows and Unicode
On Wed, Oct 24, 2012 at 7:39 AM, Marc Weustink wrote:
> Howard Page-Clark wrote:
>> On 24/10/12 2:25, Marcos Douglas wrote:
>>> In the example below, running on Windows:
>>> The name of the file, after saving, is garbled (not "atenção.txt"), but
>>> the content is correct ("atenção").
>>> I understand the filename, i.e., I need to use UTF8ToSys, but I do not
>>> understand the valid content.
>>>
>>> procedure TForm1.Button1Click(Sender: TObject);
>>> var
>>>   lStrings: TStrings;
>>> begin
>>>   lStrings := TStringList.Create;
>>>   lStrings.Text := 'atenção';
>>>   lStrings.SaveToFile('c:\atenção.txt');
>>>   lStrings.Free;
>>> end;
>>
>> Internally the LCL uses only UTF8 encoding, so stringlists etc. all
>> expect strings in that encoding and process them correctly when supplied
>
> Nope.
> A stringlist is not LCL and it does not expect anything but strings.

I agree...

> If you write the contents of a stringlist to a file you get bytes you have
> put into the list. Nothing more nothing less.

Yes, but...

> If you want a UTF8 text file written, make sure you put UTF8 in it.
> Same for reading.
> When using filenames, you need to convert the name using the LCL function
> LCLtoSys (or something like that)

...in my example I didn't do that and worked, why?
For me, the same example should be written like that:

procedure TForm1.Button1Click(Sender: TObject);
var
  lStrings: TStrings;
begin
  lStrings := TStringList.Create;
  lStrings.Text := UTF8ToSys('atenção');
  lStrings.SaveToFile(UTF8ToSys('c:\atenção.txt'));
  lStrings.Free;
end;

But, as I said, I have not used the UTF8ToSys function on the contents of the strings, so this should be broken, right?

BTW, I'm using FPC 2.6.1 and Lazarus trunk on Windows.

Marcos Douglas
Re: [Lazarus] TStrings, Windows and Unicode
Marc Weustink wrote:
> A stringlist is not LCL and it does not expect anything but strings.
> If you write the contents of a stringlist to a file you get bytes you have
> put into the list. Nothing more nothing less.

Thanks for the clarification.

Juergen Hestermann wrote:
> There is the theory of having UTF8 but in practice you have to
> watch out because in *some* situations the UTF8 paradigm
> is *not* valid and you have to convert strings yourself.
> But where are these places? Mostly they are not even documented.

There may be other places, but this applies at least to every RTL system call involving names (mainly file and directory routines). It's annoying, but not insurmountable. And when the new Unicode string is fully implemented these code-page-related woes will fade into history (well, that's the theory).
Re: [Lazarus] TStrings, Windows and Unicode
On 10/24/2012 12:39 PM, Marc Weustink wrote:
> A stringlist is not LCL and it does not expect anything but strings

Yep.

And, in fpc, String right now (by default) is a sequence of 8 bit units (e.g. UTF-8 coded, but they are often called "ANSIString", and thus locale based ANSI coding is used as well). This (with locale based ANSI) is inherited from and compatible with pre-2005 Delphi.

I read discussions in the fpc devel mailing list that this might change to define String (by default) as a sequence of 16 bit units (e.g. UCS-2 / UTF-16 coded by default) and thus upgrade to Delphi post 2005 compatibility.

On 10/24/2012, Jürgen Hestermann wrote:
> That's very frustrating.

... thus the frustration is bound to continue and increase.

-Michael
Re: [Lazarus] TStrings, Windows and Unicode
Howard Page-Clark wrote:
> On 24/10/12 2:25, Marcos Douglas wrote:
>> In the example below, running on Windows:
>> The name of the file, after saving, is garbled (not "atenção.txt"), but
>> the content is correct ("atenção").
>> I understand the filename, i.e., I need to use UTF8ToSys, but I do not
>> understand the valid content.
>>
>> procedure TForm1.Button1Click(Sender: TObject);
>> var
>>   lStrings: TStrings;
>> begin
>>   lStrings := TStringList.Create;
>>   lStrings.Text := 'atenção';
>>   lStrings.SaveToFile('c:\atenção.txt');
>>   lStrings.Free;
>> end;
>
> Internally the LCL uses only UTF8 encoding, so stringlists etc. all
> expect strings in that encoding and process them correctly when supplied

Nope.
A stringlist is not LCL and it does not expect anything but strings.
If you write the contents of a stringlist to a file you get bytes you have put into the list. Nothing more nothing less.
If you want a UTF8 text file written, make sure you put UTF8 in it.
Same for reading.
When using filenames, you need to convert the name using the LCL function LCLtoSys (or something like that)

Marc

> (as here) with a string from the IDE Editor which is also UTF8. However
> the streaming code in SaveToFile eventually calls some Windows system
> routine which does not handle UTF8 and so requires conversion from the
> UTF8 filename to be correctly handled.
>
> Howard
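Marc's statement that a stringlist writes out exactly the bytes that were put into it can be checked without touching the file system. The sketch below is an illustration, not code from the thread: SavedSize is a hypothetical helper, the word is spelled out as explicit UTF-8 bytes so that the result does not depend on the encoding of the source file, and it assumes the plain SaveToStream(Stream) call of the FPC releases discussed here, which does no encoding conversion.

```pascal
program RawBytesDemo;

{$mode objfpc}{$H+}

uses
  Classes, SysUtils;

const
  // 'atenção' as UTF-8 bytes: ç = C3 A7, ã = C3 A3.
  ATENCAO_UTF8 = 'aten'#$C3#$A7#$C3#$A3'o';

// Hypothetical helper: number of bytes a string list holding S writes out.
function SavedSize(const S: string): Int64;
var
  Lines: TStringList;
  Stream: TMemoryStream;
begin
  Lines := TStringList.Create;
  Stream := TMemoryStream.Create;
  try
    Lines.Add(S);
    Lines.SaveToStream(Stream);
    Result := Stream.Size;
  finally
    Stream.Free;
    Lines.Free;
  end;
end;

begin
  // Seven visible characters, but nine bytes in UTF-8.
  WriteLn('Bytes in string: ', Length(ATENCAO_UTF8));
  // The list writes those bytes plus one line ending: no BOM, no conversion.
  WriteLn('Bytes written:   ', SavedSize(ATENCAO_UTF8));
end.
```

The second number should equal the first plus Length(LineEnding), which is why a viewer that detects UTF-8 without a BOM shows the content correctly.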
Re: [Lazarus] TStrings, Windows and Unicode
On 2012-10-24 08:58, Howard Page-Clark wrote:
> Internally the LCL uses only UTF8 encoding,
> so stringlists etc. all expect strings in that
> encoding and process them correctly when supplied
> (as here) with a string from the IDE Editor which
> is also UTF8. However the streaming code in SaveToFile
> eventually calls some Windows system routine which
> does not handle UTF8 and so requires conversion
> from the UTF8 filename to be correctly handled.

That's one of the most important drawbacks of Lazarus:
The handling of strings is not consistent in all cases.
There is the theory of having UTF8 but in practice you have to watch out because in *some* situations the UTF8 paradigm is *not* valid and you have to convert strings yourself.

But where are these places? Mostly they are not even documented. So you are left with trial and error (or ransacking the sources, which are often not easy to understand). I often have the situation that these cases appear suddenly after using programs for a long time, because it took so long until someone used a file name with umlauts (or other non-ASCII characters).
That's very frustrating.
Re: [Lazarus] TStrings, Windows and Unicode
On 24/10/12 2:25, Marcos Douglas wrote:
> In the example below, running on Windows:
> The name of the file, after saving, is garbled (not "atenção.txt"), but
> the content is correct ("atenção").
> I understand the filename, i.e., I need to use UTF8ToSys, but I do not
> understand the valid content.
>
> procedure TForm1.Button1Click(Sender: TObject);
> var
>   lStrings: TStrings;
> begin
>   lStrings := TStringList.Create;
>   lStrings.Text := 'atenção';
>   lStrings.SaveToFile('c:\atenção.txt');
>   lStrings.Free;
> end;

Internally the LCL uses only UTF8 encoding, so stringlists etc. all expect strings in that encoding and process them correctly when supplied (as here) with a string from the IDE Editor which is also UTF8. However the streaming code in SaveToFile eventually calls some Windows system routine which does not handle UTF8 and so requires conversion from the UTF8 filename to be correctly handled.

Howard
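Putting the advice in this thread together: the content stays as UTF-8 and only the file name is converted. The following is a minimal sketch under stated assumptions. SaveUtf8Text is a hypothetical helper, UTF8ToSys is taken from the LazUTF8 unit of the LazUtils package (older Lazarus releases declare it in FileUtil), and the temporary directory is used only to keep the demo harmless.

```pascal
program SaveWithUtf8Name;

{$mode objfpc}{$H+}

uses
  Classes, SysUtils,
  LazUTF8; // assumption: provides UTF8ToSys (FileUtil in older releases)

// Hypothetical helper: saves UTF-8 text under a UTF-8 encoded file name.
procedure SaveUtf8Text(const Utf8Text, Utf8FileName: string);
var
  Lines: TStringList;
begin
  Lines := TStringList.Create;
  try
    // Content: stored and written byte for byte, so it stays UTF-8.
    Lines.Text := Utf8Text;
    // File name: converted to the system code page for the RTL file routines.
    Lines.SaveToFile(UTF8ToSys(Utf8FileName));
  finally
    Lines.Free;
  end;
end;

begin
  // 'atenção' written as explicit UTF-8 bytes (ç = C3 A7, ã = C3 A3).
  SaveUtf8Text('aten'#$C3#$A7#$C3#$A3'o',
               GetTempDir + 'aten'#$C3#$A7#$C3#$A3'o.txt');
end.
```

Converting the content with UTF8ToSys as well would turn the file into a code-page-dependent ANSI file, which is the opposite of what the original example produced. Note also that the name conversion can only succeed for characters that exist in the system code page.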
[Lazarus] TStrings, Windows and Unicode
Hi,

In the example below, running on Windows:
The name of the file, after saving, is garbled (not "atenção.txt"), but the content is correct ("atenção").
I understand the filename, i.e., I need to use UTF8ToSys, but I do not understand the valid content.

procedure TForm1.Button1Click(Sender: TObject);
var
  lStrings: TStrings;
begin
  lStrings := TStringList.Create;
  lStrings.Text := 'atenção';
  lStrings.SaveToFile('c:\atenção.txt');
  lStrings.Free;
end;

Thanks,

Marcos Douglas