Re: [Lazarus] Feature Request: Insert {codepage UTF8} per default

2016-03-31 Thread Michael W. Vogel
On 31.03.2016 at 17:04, Juha Manninen wrote: Anyway, the original issue was about inserting {codepage UTF8} automatically into every unit. We can conclude it is not a good idea. It does not solve anything when using plain constants with the default String type, but it adds conversion overhead. It

Re: [Lazarus] Feature Request: Insert {codepage UTF8} per default

2016-03-31 Thread Michael W. Vogel
On 31.03.2016 at 12:44, Mattias Gaertner wrote: On Thu, 31 Mar 2016 00:16:13 +0200 "Michael W. Vogel" wrote: [...] I've tested the example too and I got different results with different options. The test was: - BOM / no BOM at the beginning of the source file - {$codepage

Re: [Lazarus] Feature Request: Insert {codepage UTF8} per default

2016-03-31 Thread Bart
On 3/31/16, Juha Manninen wrote: >> In my fantasy scenario the String would of course have the meaning of >> UnicodeString. > > That is not inherently any better (or worse) than a UTF-8 based > solution. No, but I don't see FPC moving towards String equals

Re: [Lazarus] Feature Request: Insert {codepage UTF8} per default

2016-03-31 Thread Juha Manninen
On Thu, Mar 31, 2016 at 5:20 PM, Bart wrote: > In my fantasy scenario the String would of course have the meaning of > UnicodeString. That is not inherently any better (or worse) than a UTF-8 based solution. Delphi just happened to implement it so, for various reasons.

Re: [Lazarus] Feature Request: Insert {codepage UTF8} per default

2016-03-31 Thread Bart
On 3/31/16, Juha Manninen wrote: > I doubt you will change every "String" into "UnicodeString" in your code. > Somehow you missed the fundamental idea of our new Unicode system. > "String" has Unicode and you don't need to care about it, or even > about endianness. >

Re: [Lazarus] Feature Request: Insert {codepage UTF8} per default

2016-03-31 Thread Juha Manninen
On Thu, Mar 31, 2016 at 4:25 PM, Bart wrote: > in this scenario adding {$codepage utf8} may be the wise thing to do: > it eliminates all confusion about the intended encoding of the string > constant. How is a conversion to UTF-16 and then back to UTF-8 less confusing than
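The round trip Juha refers to can be pictured at the byte level. What follows is a minimal Python sketch, not FPC code. It assumes, as the post states, that a literal compiled under {$codepage utf8} is held as UTF-16 and converted back to UTF-8 when it is assigned to a String whose default codepage is UTF-8. The literal 'ÄAÄ' and all variable names are made up for illustration.

```python
# Illustration only: Python stands in for the conversions discussed in the
# thread; this is not what FPC literally executes.
source_bytes = "ÄAÄ".encode("utf-8")   # bytes as stored in a UTF-8 source file

# Step 1 (assumed): the literal is decoded and held as UTF-16.
as_utf16 = source_bytes.decode("utf-8").encode("utf-16-le")

# Step 2 (assumed): on assignment to a UTF-8 String it is converted back.
back_to_utf8 = as_utf16.decode("utf-16-le").encode("utf-8")

print(source_bytes.hex(" "))            # UTF-8 bytes of the literal
print(as_utf16.hex(" "))                # the same text as UTF-16 code units
print(back_to_utf8 == source_bytes)     # True: the round trip changes nothing
```

Under these assumptions the result is byte-identical to the input, so for a plain String constant the two conversions add work without changing the stored text.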

Re: [Lazarus] Feature Request: Insert {codepage UTF8} per default

2016-03-31 Thread Mattias Gaertner
On Thu, 31 Mar 2016 15:25:03 +0200 Bart wrote: > On 3/31/16, Mattias Gaertner wrote: > > >> Will all this mess go away if we would go the Delphi way > >> (String=UnicodeString)? > >> (I know *nix users are going to hate me now) > > > > Which

Re: [Lazarus] Feature Request: Insert {codepage UTF8} per default

2016-03-31 Thread Bart
On 3/31/16, Mattias Gaertner wrote: >> Will all this mess go away if we would go the Delphi way >> (String=UnicodeString)? >> (I know *nix users are going to hate me now) > > Which mess do you mean? > As long as you have to consider codepages, you can get a mess. When

Re: [Lazarus] Feature Request: Insert {codepage UTF8} per default

2016-03-31 Thread Mattias Gaertner
On Thu, 31 Mar 2016 14:32:27 +0200 Bart wrote: > On 3/31/16, Mattias Gaertner wrote: >[...] > So, when my use case for string constants with diacritics in real life > most of the time is just captions for buttons/menus etc., the extra > overhead

Re: [Lazarus] Feature Request: Insert {codepage UTF8} per default

2016-03-31 Thread Bart
On 3/31/16, Mattias Gaertner wrote: >> AFAIK the IDE does not save the file with a BOM, so the compiler may >> very well decide that my source file has ACP codepage? > > Yes and no. > When the compiler assumes ACP, it treats the string specially. It does > not convert it

Re: [Lazarus] Feature Request: Insert {codepage UTF8} per default

2016-03-31 Thread Mattias Gaertner
On Wed, 30 Mar 2016 18:16:32 +0200 Bart wrote: >[...] > > Any valid UTF-8 string should work, including diacritics. > Without the codepage identifier? Yes, if you use LazUTF8. If you don't use LazUTF8 and assign a literal to a UnicodeString you need the codepage. > Quote
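Why a literal assigned to a UnicodeString depends on the codepage can be sketched in Python (again, not FPC code). The sketch assumes a Windows system codepage of 1252 and the literal 'ÄAÄ', and it models the compiler's assumption about the source encoding as a plain decode step. The helper name `widen` is hypothetical.

```python
import struct

def widen(source_bytes: bytes, assumed_codepage: str) -> list:
    """Return the UTF-16 code units obtained when the literal's bytes are
    interpreted in assumed_codepage and widened to a 16-bit string."""
    utf16 = source_bytes.decode(assumed_codepage).encode("utf-16-le")
    return list(struct.unpack(f"<{len(utf16) // 2}H", utf16))

literal = "ÄAÄ".encode("utf-8")       # what a UTF-8 source file contains

declared = widen(literal, "utf-8")    # source known to be UTF-8
guessed = widen(literal, "cp1252")    # source taken for the system codepage

print([f"U+{u:04X}" for u in declared])
print([f"U+{u:04X}" for u in guessed])
```

With the UTF-8 interpretation the three characters become three code units; with the cp1252 interpretation every byte of the source becomes its own character, so the widened string holds five unrelated code units.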

Re: [Lazarus] Feature Request: Insert {codepage UTF8} per default

2016-03-31 Thread Mattias Gaertner
On Thu, 31 Mar 2016 00:16:13 +0200 "Michael W. Vogel" wrote: >[...] > I've tested the example too and I got different results with different > options. The test was: > - BOM / no BOM at the beginning of the source file > - {$codepage UTF8} or not The compiler understands

Re: [Lazarus] Feature Request: Insert {codepage UTF8} per default

2016-03-31 Thread Mattias Gaertner
On Thu, 31 Mar 2016 01:20:14 +0200 Bart wrote: >[...] > I was wondering why DefaultSystemCodepage would return CP_ACP on > Graeme's FreeBSD with a UTF8 locale? The problem only exists on Windows (more exactly: on any OS with system codepage <> CP_UTF8). Mattias

Re: [Lazarus] Feature Request: Insert {codepage UTF8} per default

2016-03-31 Thread Mattias Gaertner
On Thu, 31 Mar 2016 00:26:44 +0200 Bart wrote: > On 3/30/16, Juha Manninen wrote: >[...] > I think the statement in the wiki that {$codepage utf8} is not needed is > wrong. You can use UTF-8 without the {$codepage utf8}. But there are cases

Re: [Lazarus] Feature Request: Insert {codepage UTF8} per default

2016-03-31 Thread Sven Barth
On 31.03.2016 at 00:48, "Bart" wrote: > > On 3/31/16, Graeme Geldenhuys wrote: > > > [~]$ echo $LANG > > en_GB.UTF-8 > > This is what I think is happening to your test (Sven can probably > explain it better): Jonas would probably be a better

Re: [Lazarus] Feature Request: Insert {codepage UTF8} per default

2016-03-30 Thread Bart
On 3/31/16, Maxim Ganetsky wrote: > Lazarus switches DefaultSystemCodePage to 65001, so your example works > OK here without codepage directive (when inserted into LCL dependent > project, of course). To me it is unclear whether "your example" refers to Graeme or to me. Either

Re: [Lazarus] Feature Request: Insert {codepage UTF8} per default

2016-03-30 Thread Maxim Ganetsky
On 31.03.2016 at 1:48, Bart wrote: On 3/31/16, Graeme Geldenhuys wrote: [~]$ echo $LANG en_GB.UTF-8 This is what I think is happening to your test (Sven can probably explain it better): Since your locale is UTF8, CP_ACP and CP_UTF8 refer to the same codepage,

Re: [Lazarus] Feature Request: Insert {codepage UTF8} per default

2016-03-30 Thread Bart
On 3/31/16, Graeme Geldenhuys wrote: > [~]$ echo $LANG > en_GB.UTF-8 This is what I think is happening to your test (Sven can probably explain it better): Since your locale is UTF8, CP_ACP and CP_UTF8 refer to the same codepage, therefore the contents of S1 in

Re: [Lazarus] Feature Request: Insert {codepage UTF8} per default

2016-03-30 Thread Graeme Geldenhuys
On 2016-03-30 23:29, Bart wrote: > DefaultSystemCodePage = 0? > > What system are you on (Linux, Windows) and what locale? 64-bit FreeBSD 10.1 with English (UK) locale. I also used FPC 3.0.0 released compiler installed from the official .tar file. [~]$ echo $LANG en_GB.UTF-8 My test program

Re: [Lazarus] Feature Request: Insert {codepage UTF8} per default

2016-03-30 Thread Bart
On 3/30/16, Graeme Geldenhuys wrote: > Just thought I would let you know that with or without the {$codepage > utf8}, your code works just fine here. Source code is saved in a UTF-8 > encoding with no BOM marker. > >

Re: [Lazarus] Feature Request: Insert {codepage UTF8} per default

2016-03-30 Thread Bart
On 3/30/16, Juha Manninen wrote: > If your "s1" is a plain String then something has changed. IIRC it worked > well. It is a plain string. And it behaves like the quote said. The compiler treats my source file as ACP. > I am out of energy for the string encoding issue

Re: [Lazarus] Feature Request: Insert {codepage UTF8} per default

2016-03-30 Thread Graeme Geldenhuys
On 2016-03-30 17:16, Bart wrote: > Without {$codepage utf8} it outputs: > DefaultSystemcodePage = 1252 > TestUtf8 = $C3 $84 $41 $C3 $84 > S1 = $C3 $84 $41 $C3 $84 [0] > Ã"AÃ" > > The compiler treats my source as if it were written in my system's codepage. > With cp1252 S1 now contains
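The byte dump in the quoted output can be reproduced outside FPC. Below is a minimal Python sketch, assuming the test string is 'ÄAÄ' (which is what the bytes $C3 $84 $41 $C3 $84 decode to as UTF-8) and that the bytes are then displayed as cp1252, matching the reported DefaultSystemCodePage of 1252. Variable names are illustrative only.

```python
# Illustration only: reproduces the byte values from the quoted output in
# Python; it does not run or model the FPC test program itself.
text = "ÄAÄ"
utf8_bytes = text.encode("utf-8")

# Format the bytes the way the quoted output does ($XX, space separated).
hex_dump = " ".join(f"${b:02X}" for b in utf8_bytes)

# Interpret the very same bytes as cp1252, one byte per character.
misread = utf8_bytes.decode("cp1252")

print(hex_dump)
print(misread)
```

Each Ä occupies two bytes in UTF-8, and read as cp1252 those two bytes show up as two separate characters (Ã followed by a low double quotation mark, which some consoles render as a plain quote).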

Re: [Lazarus] Feature Request: Insert {codepage UTF8} per default

2016-03-30 Thread Juha Manninen
On Wed, Mar 30, 2016 at 7:16 PM, Bart wrote: > [...] > I would say that this experiment contradicts the statement in > http://wiki.freepascal.org/Better_Unicode_Support_in_Lazarus#String_Literals > ? If your "s1" is a plain String then something has changed. IIRC it worked

Re: [Lazarus] Feature Request: Insert {codepage UTF8} per default

2016-03-30 Thread Bart
On 3/30/16, Juha Manninen wrote: > Do your files have UTF-8 encoding? It is a necessity for the Unicode > system to work. Yes, all my code is either from Lazarus or from my own editor (which is a synedit). > Any valid UTF-8 string should work, including diacritics.

Re: [Lazarus] Feature Request: Insert {codepage UTF8} per default

2016-03-30 Thread Juha Manninen
On Wed, Mar 30, 2016 at 3:12 PM, Michael W. Vogel wrote: >> The cases fail with UTF-8 file encoding. > I don't understand this. I meant that some cases fail even when the file encoding is UTF-8. File encoding is not the issue. >>

Re: [Lazarus] Feature Request: Insert {codepage UTF8} per default

2016-03-30 Thread Sven Barth
On 30.03.2016 at 11:23, "Juha Manninen" wrote: > > Ok, FPC had UnicodeString earlier than I remembered. > Currently WideString is often used with WinAPI when UnicodeString > should be used, as Marco reminded in another discussion. The WinAPI does not know UnicodeString.

Re: [Lazarus] Feature Request: Insert {codepage UTF8} per default

2016-03-30 Thread Michael W. Vogel
On 30.03.2016 at 13:02, Juha Manninen wrote: Conversions in your testproject may work, but you ignored the forum link I gave earlier. There "malcome" gave examples that fail. I didn't ignore it, but I'm not that fast at testing the examples there. I'll try the examples there for myself with and

Re: [Lazarus] Feature Request: Insert {codepage UTF8} per default

2016-03-30 Thread Juha Manninen
On Wed, Mar 30, 2016 at 1:03 PM, Bart wrote: > The IDE at least runs fine (in my locale on Windows) with -FcUTF8. The Lazarus IDE has no string constants beyond 7-bit ASCII, so encoding obviously does not matter there. > (I have it there because I build all my projects with this

Re: [Lazarus] Feature Request: Insert {codepage UTF8} per default

2016-03-30 Thread Juha Manninen
On Wed, Mar 30, 2016 at 12:38 PM, Michael W. Vogel wrote: > With the hack that the LCL makes and the added {$codepage UTF8} all > conversions work like a charm (see added testproject). Conversions in your testproject may work, but you ignored the forum link I gave earlier.

Re: [Lazarus] Feature Request: Insert {codepage UTF8} per default

2016-03-30 Thread Bart
On 3/30/16, Mattias Gaertner wrote: > LCL applications nowadays use CP_UTF8 as default. We (laz team) tested > adding > -FcUTF8 and it failed in too many cases. Also it adds some overhead. So we > decided to *not* add it by default. The IDE at least runs fine (in my

Re: [Lazarus] Feature Request: Insert {codepage UTF8} per default

2016-03-30 Thread Michael W. Vogel
On 30.03.2016 at 10:13, Juha Manninen wrote: I don't know what a Predefined String is. Sorry, I was not clear. I mean a string with a declared codepage http://wiki.freepascal.org/FPC_Unicode_support#Declared_code_page

Re: [Lazarus] Feature Request: Insert {codepage UTF8} per default

2016-03-30 Thread Michael W. Vogel
On 30.03.2016 at 10:11, Mattias Gaertner wrote: You have to distinguish between with CP_UTF8 as default and with CP_ACP as default. Yes, I know it. What I mean with default: Go to Project -> New Project ... -> Application Now a new Application is created. With the added patch {$codepage

Re: [Lazarus] Feature Request: Insert {codepage UTF8} per default

2016-03-30 Thread Martin Schreiber
On Wednesday 30 March 2016 11:23:49 Juha Manninen wrote: > > If one wants to handle BMP-chars comfortably and with good performance > > one has to convert from utf-8 in AnsiString to UnicodeString first. > > Maybe, but BMP-chars are not enough for a proper Unicode support. But they are enough to
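The point about BMP characters can be made concrete with a short Python sketch (not FPC code). It counts the 16-bit code units that UTF-16 needs for one character inside the Basic Multilingual Plane and one outside it. The two sample characters and the helper name `utf16_units` are chosen purely for illustration.

```python
import struct

def utf16_units(text: str) -> list:
    """Return the UTF-16 code units (16-bit values) that encode text."""
    data = text.encode("utf-16-le")
    return list(struct.unpack(f"<{len(data) // 2}H", data))

bmp_char = "Ä"               # U+00C4, inside the BMP
astral_char = "\U0001F600"   # U+1F600, outside the BMP

print(len(utf16_units(bmp_char)))                   # one code unit
print([hex(u) for u in utf16_units(astral_char)])   # a surrogate pair
```

A character outside the BMP takes two code units in a 16-bit string, so indexing by code unit is only equivalent to indexing by character as long as the text stays within the BMP.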

Re: [Lazarus] Feature Request: Insert {codepage UTF8} per default

2016-03-30 Thread Juha Manninen
Ok, FPC had UnicodeString earlier than I remembered. Currently WideString is often used with WinAPI when UnicodeString should be used, as Marco reminded in another discussion. Anyway, the problems found by Michael W. Vogel and "malcome" all deal with constants. Assignment between variables always

Re: [Lazarus] Feature Request: Insert {codepage UTF8} per default

2016-03-30 Thread Martin Schreiber
On Wednesday 30 March 2016 10:13:36 Juha Manninen wrote: > With Unicodestring we don't need to care about backwards compatibility > really because it is such a new type. Ouch! WideString was introduced in Delphi 4 IIRC, and FPC had a 16-bit string, reference counted on all platforms, which worked

Re: [Lazarus] Feature Request: Insert {codepage UTF8} per default

2016-03-30 Thread Juha Manninen
No, originally we had -FcUTF8 set by default but it caused more problems. See: http://forum.lazarus.freepascal.org/index.php?topic=30022 > In most cases the string magic works without a defined {$codepage utf8}, > but not if you want to assign a const to a Predefined String or

Re: [Lazarus] Feature Request: Insert {codepage UTF8} per default

2016-03-30 Thread Mattias Gaertner
> On 29 March 2016 at 23:20, "Michael W. Vogel" wrote: >[...] > I'm thinking about the thread here > http://forum.lazarus.freepascal.org/index.php/topic,31939.msg206688.html#msg206688[...] > In most cases the string magic works without a defined {$codepage