Re: [Lazarus] TStrings, Windows and Unicode

2012-10-24 Thread Marcos Douglas
On Wed, Oct 24, 2012 at 11:24 AM, Luca Olivetti wrote:
> On 24/10/2012 16:14, Howard Page-Clark wrote:
>
>> On 24/10/12 2:37, Marcos Douglas wrote:
>>
>>>> ...in my example I didn't do that and worked, why?
>>>
>>>
>>> I had not explained very well. When I said the file is Ok I mean the
>>> file on Windows Explorer, not in LCL.
>>> Do the test and you see.
>>
>>
>> I guess the stringlist streaming saves the string with a BOM, and so
>> Explorer then recognises the encoding?
>
>
>
> There's no BOM, but notepad is capable of showing it correctly. The problem
> is, if you then save the file from notepad it will add a BOM, that will mess
> up the loading of the file in a stringlist (or, in my case, in a TIniFile).

Explained well. I tested and it's correct, thank you.

Marcos Douglas

--
___
Lazarus mailing list
Lazarus@lists.lazarus.freepascal.org
http://lists.lazarus.freepascal.org/mailman/listinfo/lazarus


Re: [Lazarus] TStrings, Windows and Unicode

2012-10-24 Thread Luca Olivetti

On 24/10/2012 16:14, Howard Page-Clark wrote:
> On 24/10/12 2:37, Marcos Douglas wrote:
>>> ...in my example I didn't do that and worked, why?
>>
>> I had not explained very well. When I said the file is Ok I mean the
>> file on Windows Explorer, not in LCL.
>> Do the test and you see.
>
> I guess the stringlist streaming saves the string with a BOM, and so
> Explorer then recognises the encoding?

There's no BOM, but Notepad is capable of showing it correctly. The
problem is, if you then save the file from Notepad it will add a BOM,
which will mess up the loading of the file in a stringlist (or, in my
case, in a TIniFile).


Bye
--
Luca Olivetti
Wetron Automation Technology http://www.wetron.es
Tel. +34 935883004  Fax +34 935883007
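
To illustrate the BOM problem described above: a UTF-8 BOM is the three
bytes EF BB BF at the very start of the file, so a stringlist that keeps
raw bytes ends up with them glued to the first line. What follows is only
a minimal sketch, assuming the strings are handled as raw UTF-8 bytes as
in this thread. StripUtf8Bom is a hypothetical helper, not an RTL or LCL
function, and depending on the FPC version LoadFromFile may already deal
with the BOM by itself.

```pascal
program StripBomDemo;
{$mode objfpc}{$H+}

// Hypothetical helper (not part of the RTL or LCL): returns S without a
// leading UTF-8 BOM (bytes EF BB BF); S is returned unchanged otherwise.
function StripUtf8Bom(const S: string): string;
begin
  if (Length(S) >= 3) and (S[1] = #$EF) and (S[2] = #$BB) and (S[3] = #$BF) then
    Result := Copy(S, 4, Length(S) - 3)
  else
    Result := S;
end;

var
  FirstLine: string;
begin
  // Simulate the first line of a file that Notepad saved with a BOM.
  FirstLine := Chr($EF) + Chr($BB) + Chr($BF) + '[Section]';
  WriteLn(Length(FirstLine));        // byte length, BOM included
  WriteLn(StripUtf8Bom(FirstLine));  // the same line without the BOM
end.
```

With a TStringList one would apply the helper to Lines[0] after
LoadFromFile (guarded by Lines.Count > 0). For a TIniFile the file
would probably have to be cleaned before it is opened.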



Re: [Lazarus] TStrings, Windows and Unicode

2012-10-24 Thread Howard Page-Clark

On 24/10/12 2:37, Marcos Douglas wrote:
>> ...in my example I didn't do that and worked, why?
>
> I had not explained very well. When I said the file is Ok I mean the
> file on Windows Explorer, not in LCL.
> Do the test and you see.

I guess the stringlist streaming saves the string with a BOM, and so
Explorer then recognises the encoding?


Howard




Re: [Lazarus] TStrings, Windows and Unicode

2012-10-24 Thread Marcos Douglas
On Wed, Oct 24, 2012 at 10:27 AM, Howard Page-Clark  wrote:
> On 24/10/12 12:40, Marcos Douglas wrote:
>
>>> Nope.
>>> A stringlist is not LCL and it does not expect anything but strings.
>>
>>
>> I agree...
>>
>>> If you write the contents of a stringlist to a file you get bytes you
>>> have
>>> put into the list. Nothing more nothing less.
>>
>>
>> Yes, but...
>>
>>> If you want a UTF8 text file written, make sure you put UTF8 in it.
>>> Same for reading.
>>> When using filenames, you need to convert the name using the LCL function
>>> LCLtoSys (or something like that)
>>
>>
>> ...in my example I didn't do that and worked, why?
>
>
> Because you added a UTF8 string to a stringlist. It was saved as UTF8 bytes
> (as Marc pointed out) and when later retrieved via a stringlist LoadFromFile
> call those bytes will be inserted into the stringlist just as they were
> saved, ready to display in UTF8 encoding.

I did not explain that very well. When I said the file is OK, I meant
the file as shown in Windows Explorer, not in the LCL.
Run the test and you will see.

Marcos Douglas



Re: [Lazarus] TStrings, Windows and Unicode

2012-10-24 Thread Marcos Douglas
On Wed, Oct 24, 2012 at 10:58 AM, Hans-Peter Diettrich
 wrote:
> Jürgen Hestermann wrote:
>
>> On 2012-10-24 08:58, Howard Page-Clark wrote:
>>  > Internally the LCL uses only UTF8 encoding,
>>  > so stringlists etc. all expect strings in that
>>  > encoding and process them correctly when supplied
>>  > (as here) with a string from the IDE Editor which
>>  > is also UTF8. However the streaming code in SaveToFile
>>  > eventually calls some Windows system routine which
>>  > does not handle UTF8 and so requires conversion
>>  > from the UTF8 filename to be correctly handled.
>>
>>
>> That's one of the most important drawbacks of Lazarus:
>> The handling of strings is not consistent in all cases.
>> There is the theory of having UTF8 but in practice you have to
>> watch out because in *some* situations the UTF8 paradigma
>> is *not* valid and you have to convert strings yourself.
>
>
> This will change with the new Unicode and Ansi strings, which have a defined
> Encoding.
>
> Once the Encoding comes, I'd vote for an dedicated TFileName string type,
> that matches the platform conventions. This type will allow to handle
> filenames in the platform specific part of the RTL easily, without too many
> string conversions.

+1

Marcos Douglas



Re: [Lazarus] TStrings, Windows and Unicode

2012-10-24 Thread Howard Page-Clark

On 24/10/12 12:40, Marcos Douglas wrote:
>> Nope.
>> A stringlist is not LCL and it does not expect anything but strings.
>
> I agree...
>
>> If you write the contents of a stringlist to a file you get bytes you have
>> put into the list. Nothing more nothing less.
>
> Yes, but...
>
>> If you want a UTF8 text file written, make sure you put UTF8 in it.
>> Same for reading.
>> When using filenames, you need to convert the name using the LCL function
>> LCLtoSys (or something like that)
>
> ...in my example I didn't do that and worked, why?

Because you added a UTF8 string to a stringlist. It was saved as UTF8
bytes (as Marc pointed out) and when later retrieved via a stringlist
LoadFromFile call those bytes will be inserted into the stringlist just
as they were saved, ready to display in UTF8 encoding.


Howard
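
The "bytes in, bytes out" behaviour described above can be checked with a
small hex dump. This is only a sketch: the word is spelled out as explicit
UTF-8 bytes so that the result does not depend on the encoding of the
source file, and it assumes the stringlist writes its contents unchanged,
as stated in this thread for the FPC version in use. On such a setup one
would expect to see the pairs C3 A7 and C3 A3 in the dump, followed by the
line ending.

```pascal
program StringListBytes;
{$mode objfpc}{$H+}

uses
  Classes, SysUtils;

var
  Lines: TStringList;
  Stream: TMemoryStream;
  Data: PByte;
  I: Integer;
begin
  Lines := TStringList.Create;
  Stream := TMemoryStream.Create;
  try
    // 'atenção' written as raw UTF-8 bytes (ç = C3 A7, ã = C3 A3).
    Lines.Add('aten' + Chr($C3) + Chr($A7) + Chr($C3) + Chr($A3) + 'o');

    // Save to memory instead of a file, so no file name is involved.
    Lines.SaveToStream(Stream);

    // Print every saved byte as two hex digits.
    Data := PByte(Stream.Memory);
    for I := 0 to Integer(Stream.Size) - 1 do
      Write(IntToHex(Data[I], 2), ' ');
    WriteLn;
  finally
    Stream.Free;
    Lines.Free;
  end;
end.
```

Saving to a stream keeps the content question separate from the file
name question, which is the part that needs conversion.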




Re: [Lazarus] TStrings, Windows and Unicode

2012-10-24 Thread Hans-Peter Diettrich

Jürgen Hestermann wrote:
> On 2012-10-24 08:58, Howard Page-Clark wrote:
>> Internally the LCL uses only UTF8 encoding,
>> so stringlists etc. all expect strings in that
>> encoding and process them correctly when supplied
>> (as here) with a string from the IDE Editor which
>> is also UTF8. However the streaming code in SaveToFile
>> eventually calls some Windows system routine which
>> does not handle UTF8 and so requires conversion
>> from the UTF8 filename to be correctly handled.
>
> That's one of the most important drawbacks of Lazarus:
> The handling of strings is not consistent in all cases.
> There is the theory of having UTF8 but in practice you have to
> watch out because in *some* situations the UTF8 paradigm
> is *not* valid and you have to convert strings yourself.

This will change with the new Unicode and Ansi strings, which have a
defined Encoding.

Once the Encoding comes, I'd vote for a dedicated TFileName string
type that matches the platform conventions. This type would allow
filenames to be handled easily in the platform-specific part of the
RTL, without too many string conversions.


DoDi




Re: [Lazarus] TStrings, Windows and Unicode

2012-10-24 Thread Marcos Douglas
On Wed, Oct 24, 2012 at 7:39 AM, Marc Weustink  wrote:
> Howard Page-Clark wrote:
>>
>> On 24/10/12 2:25, Marcos Douglas wrote:
>>>
>>> In the example below, running on Windows:
>>> The name of file, after saved, is "atenção.txt" but the content is
>>> "atenção".
>>> I understand the filename, i.e, I need to use UTF8ToSys but I do not
>>> understand the valid content.
>>>
>>> procedure TForm1.Button1Click(Sender: TObject);
>>> var
>>>lStrings: TStrings;
>>> begin
>>>lStrings := TStringList.Create;
>>>lStrings.Text := 'atenção';
>>>lStrings.SaveToFile('c:\atenção.txt');
>>>lStrings.Free;
>>> end;
>>
>>
>> Internally the LCL uses only UTF8 encoding, so stringlists etc. all
>> expect strings in that encoding and process them correctly when supplied
>
>
> Nope.
> A stringlist is not LCL and it does not expect anything but strings.

I agree...

> If you write the contents of a stringlist to a file you get bytes you have
> put into the list. Nothing more nothing less.

Yes, but...

> If you want a UTF8 text file written, make sure you put UTF8 in it.
> Same for reading.
> When using filenames, you need to convert the name using the LCL function
> LCLtoSys (or something like that)

...in my example I didn't do that and worked, why?

For me, the same example should be written like that:

procedure TForm1.Button1Click(Sender: TObject);
var
  lStrings: TStrings;
begin
  lStrings := TStringList.Create;
  lStrings.Text := UTF8ToSys('atenção');
  lStrings.SaveToFile(UTF8ToSys('c:\atenção.txt'));
  lStrings.Free;
end;

But, as I said, I have not used the UTF8ToSys function on the contents
of the strings, so this should be broken, right?

BTW, I'm using FPC 2.6.1 and Lazarus trunk on Windows.

Marcos Douglas



Re: [Lazarus] TStrings, Windows and Unicode

2012-10-24 Thread Howard Page-Clark

Marc Weustink wrote:
> A stringlist is not LCL and it does not expect anything but strings.
> If you write the contents of a stringlist to a file you get bytes you
> have put into the list. Nothing more nothing less.


Thanks for the clarification.

Juergen Hestermann wrote:

> There is the theory of having UTF8 but in practice you have to
> watch out because in *some* situations the UTF8 paradigm
> is *not* valid and you have to convert strings yourself.
> But where are these places? Mostly they are not even documented.

There may be other places, but this applies at least to every RTL system 
call involving names (mainly file and directory routines). It's 
annoying, but not insurmountable. And when the new Unicode string is 
fully implemented these code-page-related woes will fade into history 
(well, that's the theory).





Re: [Lazarus] TStrings, Windows and Unicode

2012-10-24 Thread Michael Schnell

On 10/24/2012 12:39 PM, Marc Weustink wrote:
> A stringlist is not LCL and it does not expect anything but strings

Yep.

And in FPC, String right now (by default) is a sequence of 8-bit units
(e.g. UTF-8 coded, but since they are often called "AnsiString",
locale-based ANSI coding is used as well). This (with locale-based ANSI)
is inherited from and compatible with pre-2005 Delphi.

I read discussions on the fpc-devel mailing list that this might change
to define String (by default) as a sequence of 16-bit units (e.g. UCS-2 /
UTF-16 coded by default), an upgrade to post-2005 Delphi compatibility.

On 10/24/2012, Jürgen Hestermann wrote:
> That's very frustrating.

... thus the frustration is bound to continue and increase.

-Michael
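
A short illustration of "a sequence of 8-bit units": with such a string,
Length counts bytes, not characters. The sketch below assumes the default
8-bit String described above. Utf8CodePoints is a hypothetical helper
written for this example, not an RTL function; it counts code points by
skipping UTF-8 continuation bytes (those of the form 10xxxxxx).

```pascal
program ByteLengthDemo;
{$mode objfpc}{$H+}

// Hypothetical helper: counts UTF-8 code points in S by counting every
// byte that is not a continuation byte (bit pattern 10xxxxxx).
function Utf8CodePoints(const S: string): Integer;
var
  I: Integer;
begin
  Result := 0;
  for I := 1 to Length(S) do
    if (Ord(S[I]) and $C0) <> $80 then
      Inc(Result);
end;

var
  S: string;
begin
  // 'atenção' as raw UTF-8 bytes: ç and ã take two bytes each.
  S := 'aten' + Chr($C3) + Chr($A7) + Chr($C3) + Chr($A3) + 'o';
  WriteLn(Length(S));          // number of bytes: 9
  WriteLn(Utf8CodePoints(S));  // number of code points: 7
end.
```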




Re: [Lazarus] TStrings, Windows and Unicode

2012-10-24 Thread Marc Weustink

Howard Page-Clark wrote:
> On 24/10/12 2:25, Marcos Douglas wrote:
>> In the example below, running on Windows:
>> The name of file, after saved, is "atenção.txt" but the content is
>> "atenção".
>> I understand the filename, i.e, I need to use UTF8ToSys but I do not
>> understand the valid content.
>>
>> procedure TForm1.Button1Click(Sender: TObject);
>> var
>>    lStrings: TStrings;
>> begin
>>    lStrings := TStringList.Create;
>>    lStrings.Text := 'atenção';
>>    lStrings.SaveToFile('c:\atenção.txt');
>>    lStrings.Free;
>> end;
>
> Internally the LCL uses only UTF8 encoding, so stringlists etc. all
> expect strings in that encoding and process them correctly when supplied

Nope.
A stringlist is not LCL and it does not expect anything but strings.
If you write the contents of a stringlist to a file you get bytes you
have put into the list. Nothing more nothing less.

If you want a UTF8 text file written, make sure you put UTF8 in it.
Same for reading.
When using filenames, you need to convert the name using the LCL
function LCLtoSys (or something like that)

Marc

> (as here) with a string from the IDE Editor which is also UTF8. However
> the streaming code in SaveToFile eventually calls some Windows system
> routine which does not handle UTF8 and so requires conversion from the
> UTF8 filename to be correctly handled.
>
> Howard




Re: [Lazarus] TStrings, Windows and Unicode

2012-10-24 Thread Jürgen Hestermann

On 2012-10-24 08:58, Howard Page-Clark wrote:
> Internally the LCL uses only UTF8 encoding,
> so stringlists etc. all expect strings in that
> encoding and process them correctly when supplied
> (as here) with a string from the IDE Editor which
> is also UTF8. However the streaming code in SaveToFile
> eventually calls some Windows system routine which
> does not handle UTF8 and so requires conversion
> from the UTF8 filename to be correctly handled.


That's one of the most important drawbacks of Lazarus:
The handling of strings is not consistent in all cases.
In theory everything is UTF8, but in practice you have to
watch out, because in *some* situations the UTF8 paradigm
is *not* valid and you have to convert strings yourself.

But where are these places? Mostly they are not even documented.
So you are left with trial and error (or ransacking the sources,
which are often not easy to understand).
I often find that these cases surface suddenly after a program has
been in use for a long time, because it took that long until someone
used a file name with umlauts (or other non-ASCII characters).
That's very frustrating.





Re: [Lazarus] TStrings, Windows and Unicode

2012-10-23 Thread Howard Page-Clark

On 24/10/12 2:25, Marcos Douglas wrote:
> In the example below, running on Windows:
> The name of file, after saved, is "atenção.txt" but the content is "atenção".
> I understand the filename, i.e, I need to use UTF8ToSys but I do not
> understand the valid content.
>
> procedure TForm1.Button1Click(Sender: TObject);
> var
>    lStrings: TStrings;
> begin
>    lStrings := TStringList.Create;
>    lStrings.Text := 'atenção';
>    lStrings.SaveToFile('c:\atenção.txt');
>    lStrings.Free;
> end;


Internally the LCL uses only UTF8 encoding, so stringlists etc. all 
expect strings in that encoding and process them correctly when supplied 
(as here) with a string from the IDE Editor which is also UTF8. However 
the streaming code in SaveToFile eventually calls some Windows system 
routine which does not handle UTF8 and so requires conversion from the 
UTF8 filename to be correctly handled.


Howard
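
For reference, the file name conversion mentioned above might look like
the sketch below: only the file name goes through UTF8ToSys, the content
is left as it is. Treat the details as assumptions. The unit name LazUTF8
is where UTF8ToSys is believed to live in later Lazarus releases (around
the time of this thread it was in FileUtil), the project needs the
LazUtils package, and the source file is assumed to be saved as UTF-8 by
the IDE editor. The relative file name is chosen for the example only.

```pascal
program SaveWithConvertedName;
{$mode objfpc}{$H+}

uses
  Classes,
  LazUTF8; // assumption: provides UTF8ToSys (FileUtil in older Lazarus)

var
  Lines: TStringList;
begin
  Lines := TStringList.Create;
  try
    // Content: left untouched, so the UTF-8 bytes are written as they are.
    Lines.Text := 'atenção';

    // File name: converted from UTF-8 to the system code page before it
    // reaches the RTL file routine.
    Lines.SaveToFile(UTF8ToSys('atenção.txt'));
  finally
    Lines.Free;
  end;
end.
```

The conversion can only succeed for characters that exist in the system
code page. Names outside it would need a Unicode-aware file API instead.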




[Lazarus] TStrings, Windows and Unicode

2012-10-23 Thread Marcos Douglas
Hi,

In the example below, running on Windows:
The name of the file, after saving, is "atenÃ§Ã£o.txt" but the content is "atenção".
I understand the filename part, i.e. I need to use UTF8ToSys, but I do not
understand why the content is valid.

procedure TForm1.Button1Click(Sender: TObject);
var
  lStrings: TStrings;
begin
  lStrings := TStringList.Create;
  lStrings.Text := 'atenção';
  lStrings.SaveToFile('c:\atenção.txt');
  lStrings.Free;
end;

Thanks,

Marcos Douglas
