Re: [fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion

2016-04-04 Thread wkitty42

On 04/04/2016 06:23 AM, tobiasgie...@gmail.com wrote:

>> OK, I just confirmed. Adding clocale to my 5-line test program doesn't
>> affect the DefaultSystemCodePage result, but as soon as I add cwstring to
>> the uses clause, then DefaultSystemCodePage returns 65001.
>
> On Mac, not even cwstring does that. It sets the DefaultSystemCodePage to
> 20127.
>
> So, on Mac, the DefaultSystemCodePage is not "typically" set to UTF_8. It is
> never set to UTF_8 unless you do it yourself.


FWIW: I keep seeing the argument "on Mac", but the OS on that Mac has never been
mentioned... AFAIK there is more than one OS for Mac, or at least more than one
version of the OS... It is possible that the default has changed, plus there's
whatever was selected for the language during the installation... This
really should be clarified for your Mac and its OS...


--
 NOTE: No off-list assistance is given without prior approval.
   *Please keep mailing list traffic on the list* unless
   private contact is specifically requested and granted.
___
fpc-pascal maillist  -  fpc-pascal@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal


Re: [fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion

2016-04-04 Thread Seth Grover
> Not necessarily:
> if you are dealing with UTF8/UnicodeString and other codepages, it is
> quite likely, and preferred, that you have unit cwstring included.
> Michael.

Question about this: If I use cthreads, cwstring, and load my own memory
manager, what should the order in the uses clause be? I know if they are
loaded/unloaded in the wrong order you can get SIGSEGVs during shutdown.

From my experience, cthreads needs to be first. So should it be:

cthreads, my_memory_manager, cwstring

or

cthreads, cwstring, my_memory_manager?

Thanks,

-SG

--
Seth Grover

Re: [fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion

2016-04-04 Thread Jonas Maebe


tobiasgiesen wrote on Mon, 04 Apr 2016:


>>> Terminal has LC_CTYPE=UTF-8.
>>
>> What about LC_ALL?
>
> My Mac OS installations do not have LC_ALL.
>
> But I just noticed that Carbon GUI programs do not get LC_CTYPE in their
> environment either.

If none of the environment variables related to code pages are set,
FPC falls back to UTF-8 for (among others) OS X.

> So maybe cwstring needs to be fixed for Carbon GUI Mac OS X programs.

If you get ASCII, it means that one of the LC_ALL, LC_CTYPE and/or
LANG environment variables is set to a value that corresponds to
ASCII (such as "C"), or to a value that is not recognised as, or
cannot be translated into, a Windows code page number.

> What I see in the environment is
> __CF_USER_TEXT_ENCODING=0x1F5:0x0:0x0
>
> I think Carbon apps should override DefaultSystemCodePage, because the
> Carbon interfaces always use UTF-8, they do not care about any
> environment strings.


On OS X, unlike on Windows, there is no inherent difference between  
"GUI" (be it Carbon, Cocoa, or --most likely-- a mixture of the two)  
and "non-GUI" applications. You can have command line applications  
linking to a Carbon framework to deal with aliases, and a GUI  
application calling into libutil to open a pseudo tty.


The above environment variable is also unrelated to Carbon, but comes  
from CoreFoundation. 0x1F5 is the hexadecimal value of your user ID.  
At least one of the 0x0's indeed refers to the default/ansi encoding  
of CoreFoundation, but it's definitely not the value you want to use.  
It's the value of the MacRoman text encoding.
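To make the field layout concrete, here is a small sketch that splits the value quoted above and decodes its hexadecimal fields: 0x1F5 is 1*256 + 15*16 + 5 = 501, the user ID. `CFField` is a hypothetical helper, not a CoreFoundation or RTL function. The reply above only says that "at least one of the 0x0's" is CoreFoundation's default encoding (0 = MacRoman), so the sketch prints both remaining fields rather than guessing which is which.

```pascal
program ParseCFUserTextEncoding;

{$mode objfpc}{$H+}

uses
  SysUtils, StrUtils;

// Hypothetical helper: returns the Index-th (1-based) colon-separated
// hexadecimal field of a value such as '0x1F5:0x0:0x0' as an integer.
function CFField(const Value: string; Index: Integer): LongInt;
var
  Field: string;
begin
  Field := ExtractDelimited(Index, Value, [':']);
  // StrToInt understands the Pascal '$' hex prefix, so rewrite '0x1F5' as '$1F5'.
  if LowerCase(Copy(Field, 1, 2)) = '0x' then
    Field := '$' + Copy(Field, 3, MaxInt);
  Result := StrToInt(Field);
end;

var
  Env: string;
begin
  Env := GetEnvironmentVariable('__CF_USER_TEXT_ENCODING');
  if Env = '' then
    Env := '0x1F5:0x0:0x0'; // the value reported in this thread
  WriteLn('user ID : ', CFField(Env, 1));
  // At least one of these is CoreFoundation's default encoding (0 = MacRoman).
  WriteLn('field 2 : ', CFField(Env, 2));
  WriteLn('field 3 : ', CFField(Env, 3));
end.
```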


That said, FPC 3.1.1 also contains an OS X/iOS-specific widestring  
manager unit that you can use instead of cwstring (iosxwstr), and that  
one will always default to UTF-8 (because the "ansi" code page of  
CoreFoundation only makes sense from a classic Mac backward  
compatibility standpoint, which we don't have to care about because we  
don't have a legacy code base that depends on this default setting --  
if someone would want to port code that depends on this to FPC, they  
would have to set this themselves).



Jonas




Re: [fpc-pascal] FPC 3 regression: cannot use TStringList for UTF-8 data any more?

2016-04-04 Thread Graeme Geldenhuys
On 2016-04-04 13:15, Sven Barth wrote:
> Qt uses UTF-16 as well...

I always thought that strange. After all, Qt was born as a Unix-type GUI
toolkit. Unless I got my facts wrong. Then again, it's only in recent
years that Unix-like systems moved to UTF-8. I think even FreeBSD didn't
use UTF-8 out of the box (until the last one or two releases).

Java also uses UTF-16. What I like about those two is that they only
have one string type. No confusion, but then they don't have as long a
history as Pascal, so no legacy code to worry about.


Regards,
  - Graeme -



Re: [fpc-pascal] FPC 3 regression: cannot use TStringList for UTF-8 data any more?

2016-04-04 Thread Sven Barth
On 04.04.2016 13:21, "Graeme Geldenhuys" <mailingli...@geldenhuys.co.uk> wrote:
>
> On 2016-04-04 12:06, Michael Van Canneyt wrote:
> > 1. Using UTF8 is a choice of lazarus. Other people may prefer UnicodeString.
> > On Windows, UnicodeString is more 'natural' or 'native'.
>
> Based on Internet standards and most popular OSes (mobile devices
> included), UTF-8 is king - so we all know Windows backed the wrong horse
> [encoding]. ;-)
>
>[...Graeme runs and hides...]
>

Qt uses UTF-16 as well...

(and our company's OS uses UTF-32)

Regards,
Sven

Re: [fpc-pascal] FPC 3 regression: cannot use TStringList for UTF-8 data any more?

2016-04-04 Thread Michael Van Canneyt



On Mon, 4 Apr 2016, Jonas Maebe wrote:

> Michael Van Canneyt wrote on Mon, 04 Apr 2016:
>
>> On Mon, 4 Apr 2016, Graeme Geldenhuys wrote:
>>
>>> [add LCL UTF-8 helper units to FPC]
>>>
>>> Though it could probably be added as early as FPC 3.0.2. It's simply
>>> two new units that need to be explicitly used by somebody to have any
>>> effect, so it will not break existing code otherwise [if not used].
>>
>> They should at least be renamed, to avoid confusion.
>>
>> Other than that, I personally see no objections.
>
> I do: it's more units that we have to maintain, process bug reports and
> feature requests for, etc (or, in case they are supposed to remain copies of
> the Lazarus units, then it's extra work keeping them in sync and given the
> non-synchronised release cycles, they will almost never be in sync). We
> already have plenty of work with our own code.

And that is why I wrote 'personally' :-)

Michael.


Re: [fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion

2016-04-04 Thread tobiasgiesen
> > Terminal has LC_CTYPE=UTF-8.
> 
> What about LC_ALL? 

My Mac OS installations do not have LC_ALL.

But I just noticed that Carbon GUI programs do not get LC_CTYPE in their
environment either.

So maybe cwstring needs to be fixed for Carbon GUI Mac OS X programs.

What I see in the environment is
__CF_USER_TEXT_ENCODING=0x1F5:0x0:0x0

I think Carbon apps should override DefaultSystemCodePage, because the
Carbon interfaces always use UTF-8, they do not care about any
environment strings.

Cheers,
Tobias




Re: [fpc-pascal] FPC 3 regression: cannot use TStringList for UTF-8 data any more?

2016-04-04 Thread Michael Van Canneyt



On Mon, 4 Apr 2016, Graeme Geldenhuys wrote:

> On 2016-04-04 12:06, Michael Van Canneyt wrote:
>> 1. Using UTF8 is a choice of lazarus. Other people may prefer UnicodeString.
>> On Windows, UnicodeString is more 'natural' or 'native'.
>
> Based on Internet standards and most popular OSes (mobile devices
> included), UTF-8 is king - so we all know Windows backed the wrong horse
> [encoding]. ;-)
>
>   [...Graeme runs and hides...]

Well, in 2016, I still only use UTF-8, even on Windows.
It works without problems if you know what you're doing.

>> 2. The release cycle of FPC is rather long, so updates will be available not
>> as fast as the lazarus team needs them.
>
> That's a valid point.
>
> Though it could probably be added as early as FPC 3.0.2. It's simply
> two new units that need to be explicitly used by somebody to have any
> effect, so it will not break existing code otherwise [if not used].

They should at least be renamed, to avoid confusion.

Other than that, I personally see no objections.

Michael.


Re: [fpc-pascal] FPC 3 regression: cannot use TStringList for UTF-8 data any more?

2016-04-04 Thread Graeme Geldenhuys
On 2016-04-04 12:06, Michael Van Canneyt wrote:
> 1. Using UTF8 is a choice of lazarus. Other people may prefer UnicodeString.
> On Windows, UnicodeString is more 'natural' or 'native'.

Based on Internet standards and most popular OSes (mobile devices
included), UTF-8 is king - so we all know Windows backed the wrong horse
[encoding]. ;-)

   [...Graeme runs and hides...]



> 2. The release cycle of FPC is rather long, so updates will be available not
> as fast as the lazarus team needs them.

That's a valid point.

Though it could probably be added as early as FPC 3.0.2. It's simply
two new units that need to be explicitly used by somebody to have any
effect, so it will not break existing code otherwise [if not used].


Regards,
  - Graeme -



Re: [fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion

2016-04-04 Thread Jonas Maebe


tobiasgiesen wrote on Mon, 04 Apr 2016:


>> How did you get a codepage 20127 Mac?
>
> The Mac is UTF-8,

This statement makes no sense. There is no "UTF-8" or "non-UTF-8" Mac.
The Unix environment on OS X can use any OS-supported code page.

> but cwstring or whatever does not realize it.

cwstring simply sets the code page based on how it is defined in your
environment.

> Terminal has LC_CTYPE=UTF-8.

What about LC_ALL? LC_ALL overrides LC_CTYPE, because that is how the
meaning of these environment variables is defined by POSIX (see e.g.
http://pubs.opengroup.org/onlinepubs/7908799/xbd/envvar.html )
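The precedence described above can be sketched as follows. This is a minimal illustration of the POSIX rule only and assumes nothing about cwstring's internals: `EffectiveCTypeLocale` is a hypothetical helper, and a real widestring manager additionally has to map the locale's character set (e.g. the `UTF-8` in `en_US.UTF-8`) to a code page number.

```pascal
program ShowCTypeLocale;

{$mode objfpc}{$H+}

uses
  SysUtils;

// Hypothetical helper. POSIX: a non-empty LC_ALL wins, then a non-empty
// LC_CTYPE, then LANG. An empty result means none of them is set, and the
// fallback is then up to the implementation.
function EffectiveCTypeLocale(const LcAll, LcCType, Lang: string): string;
begin
  if LcAll <> '' then
    Result := LcAll
  else if LcCType <> '' then
    Result := LcCType
  else
    Result := Lang;
end;

begin
  WriteLn('Effective LC_CTYPE locale: "',
    EffectiveCTypeLocale(GetEnvironmentVariable('LC_ALL'),
                         GetEnvironmentVariable('LC_CTYPE'),
                         GetEnvironmentVariable('LANG')), '"');
end.
```

Running it once as-is and once with e.g. `LC_ALL=C` set in the shell shows why a UTF-8 `LC_CTYPE` alone does not guarantee a UTF-8 code page.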



> Well I will just set the default codepages manually.


Then you will probably be back in a few months with another message  
complaining how FPC 3.0 supposedly breaks X or Y, because you are  
merely hiding the real issue (e.g. when calling an external program,  
which may then try to interpret your UTF-8 command line arguments as  
plain ASCII).



Jonas


Re: [fpc-pascal] FPC 3 regression: cannot use TStringList for UTF-8 data any more?

2016-04-04 Thread Michael Van Canneyt



On Mon, 4 Apr 2016, Graeme Geldenhuys wrote:


>> more complete solution for UTF-8. This is useful for many users. They
>> don't have to reinvent the wheel.
>
> Not having looked at the two units you mentioned... but if this is a
> general requirement for anybody using UTF-8 or similar with FPC 3.0,
> then wouldn't it be best to see if those units can be contributed to
> FPC's FCL? The ultimate "don't reinvent the wheel" location. ;-)


One would think so but:

1. Using UTF8 is a choice of lazarus. Other people may prefer UnicodeString.
   On Windows, UnicodeString is more 'natural' or 'native'.

2. The release cycle of FPC is rather long, so updates will be available not
   as fast as the lazarus team needs them.
   And in view of 1. that may be a problem.

If memory serves well, there was initially an attempt by Felipe to get some of
the functionality into FPC, but this was quickly abandoned due to the above
arguments...

Michael.


Re: [fpc-pascal] FPC 3 regression: cannot use TStringList for UTF-8 data any more?

2016-04-04 Thread Graeme Geldenhuys
On 2016-04-04 11:34, Mattias Gaertner wrote:
> for that. In fact you don't have to use LazUtils: some users simply
> copied the two units FPCAdds and LazUTF8. It's all open source.

This was not made clear until you explicitly mentioned it. Juha's
initial comment was vague on the matter, and the original poster never
mentioned they used Lazarus or LCL.


> Second I find it funny that the statement comes from you

I simply wanted an answer or explanation that benefits anybody using FPC.


> more complete solution for UTF-8. This is useful for many users. They
> don't have to reinvent the wheel.

Not having looked at the two units you mentioned... but if this is a
general requirement for anybody using UTF-8 or similar with FPC 3.0,
then wouldn't it be best to see if those units can be contributed to
FPC's FCL? The ultimate "don't reinvent the wheel" location. ;-)


Regards,
  - Graeme -



Re: [fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion

2016-04-04 Thread tobiasgiesen
> How did you get a codepage 20127 Mac?

The Mac is UTF-8, but cwstring or whatever does not realize it.

Since I cannot easily step into it with the debugger, I can't tell you
why.

Terminal has LC_CTYPE=UTF-8.

Well I will just set the default codepages manually.

Cheers,
Tobias



Re: [fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion

2016-04-04 Thread Mattias Gaertner
On Mon, 4 Apr 2016 11:34:18 +0100
Graeme Geldenhuys  wrote:

> On 2016-04-04 11:23, tobiasgie...@gmail.com wrote:
> > On Mac, not even cwstring does that. It sets the DefaultSystemCodePage
> > to 20127.
> 
> I just installed FPC 3.0 on my Macbook Pro (bought in the UK) and did
> the same test. Here DefaultSystemCodePage returned 65001. So I guess it
> depends on your OS X installation and which default locale settings were
> set up during install.

All my Macs since 10.4 had UTF-8 as default and I can't remember a
setting during install to change it.

How did you get a codepage 20127 Mac?


Mattias


Re: [fpc-pascal] FPC 3 regression: cannot use TStringList for UTF-8 data any more?

2016-04-04 Thread Graeme Geldenhuys
On 2016-04-04 11:40, Mattias Gaertner wrote:
> Or simply copy the two units FPCAdds, LazUTF8 or parts of them from
> here:

Thank you Juha and Mattias - I'll take a look at those to see what they do.

Regards,
  - Graeme -




Re: [fpc-pascal] FPC 3 regression: cannot use TStringList for UTF-8 data any more?

2016-04-04 Thread Mattias Gaertner
On Mon, 4 Apr 2016 13:27:05 +0300
Juha Manninen  wrote:

>[...]
> But yes, it requires Lazarus IDE because LazUtils is a Lazarus
> package. At least you must create and compile the project using
> Lazarus IDE.

Or simply copy the two units FPCAdds, LazUTF8 or parts of them from
here:
http://svn.freepascal.org/svn/lazarus/tags/lazarus_1_6/components/lazutils/


Mattias


Re: [fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion

2016-04-04 Thread Graeme Geldenhuys
On 2016-04-04 11:23, tobiasgie...@gmail.com wrote:
> On Mac, not even cwstring does that. It sets the DefaultSystemCodePage
> to 20127.

I just installed FPC 3.0 on my Macbook Pro (bought in the UK) and did
the same test. Here DefaultSystemCodePage returned 65001. So I guess it
depends on your OS X installation and which default locale settings were
set up during install.

Regards,
  - Graeme -



Re: [fpc-pascal] FPC 3 regression: cannot use TStringList for UTF-8 data any more?

2016-04-04 Thread Mattias Gaertner
On Mon, 4 Apr 2016 10:52:20 +0100
Graeme Geldenhuys  wrote:

> On 2016-04-04 10:27, Juha Manninen wrote:
> > Just use the new UTF-8 mode provided by Lazarus and remove all
> > explicit conversion functions.
> 
> This is the FPC mailing list. Not everybody here uses Lazarus or LCL, so
> making such a suggestion is wishful thinking. For example, your
> suggestion means nothing to me, I don't use LCL.

First of all it's part of LazUtils. So you don't have to use the LCL
for that. In fact you don't have to use LazUtils: some users simply
copied the two units FPCAdds and LazUTF8. It's all open source.

Second I find it funny that the statement comes from you - a notorious
promoter of software on forums/lists of competing projects.

And third setting the DefaultSystemCodePage is a good start, but not
enough. Instead of explaining all the gory details, Juha promoted a
more complete solution for UTF-8. This is useful for many users. They
don't have to reinvent the wheel.


Mattias


Re: [fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion

2016-04-04 Thread Michael Van Canneyt



On Mon, 4 Apr 2016, Graeme Geldenhuys wrote:


> On 2016-04-04 10:43, Michael Van Canneyt wrote:
>>
>> No. It says 'typically'.
>>
>> That doesn't necessarily mean it is so, but is mostly correct.
>
> It is still a very vague assumption. As I mentioned in my previous
> reply, that statement is false on my FreeBSD system too. So I should
> read the wiki as "typically incorrect" ;-)
>
>> No, that would be wrong. It normally is CP_UTF8 on Linux. But it can differ.
>> I know people that still use ISO-8859 (or something similar). Only the
>> programmer can know what is correct.
>
> Just curious, how do you change the default codepage on a Linux system?
> By exporting a new value to the LANG environment variable?

Yes. And various LC_ environment vars.

Michael.


Re: [fpc-pascal] FPC 3 regression: cannot use TStringList for UTF-8 data any more?

2016-04-04 Thread Juha Manninen
On Mon, Apr 4, 2016 at 12:52 PM, Graeme Geldenhuys wrote:
> This is the FPC mailing list. Not everybody here uses Lazarus or LCL, so
> making such a suggestion is wishful thinking. For example, your
> suggestion means nothing to me, I don't use LCL.

Yes, I should have mentioned that this feature does not require LCL.
It only requires LazUtils package and LazUTF8 unit in your uses
section.
It can be used in cmd line and server programs and I guess in fpGUI,
too, although I have not tested.
But yes, it requires Lazarus IDE because LazUtils is a Lazarus
package. At least you must create and compile the project using
Lazarus IDE.

Anyway, this UTF-8 mode does more than set the default String encoding.
It also provides proper UTF-8 functions as backends for the RTL's
Ansi...() string functions.
It also uses cwstring, although that pulls in libc.
Then typical users' code is amazingly Delphi compatible despite the
different encoding, because code only seldom deals with individual
codepoints beyond 7-bit ASCII.

Juha


Re: [fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion

2016-04-04 Thread tobiasgiesen
> OK, I just confirmed. Adding clocale to my 5-line test program doesn't
> affect the DefaultSystemCodePage result, but as soon as I add cwstring
> to the uses clause, then DefaultSystemCodePage returns 65001.

On Mac, not even cwstring does that. It sets the DefaultSystemCodePage
to 20127.

So, on Mac, the DefaultSystemCodePage is not "typically" set to UTF_8.
It is never set to UTF_8 unless you do it yourself.

Cheers,
Tobias




Re: [fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion

2016-04-04 Thread Graeme Geldenhuys
On 2016-04-04 10:58, Jonas Maebe wrote:
> I have now updated the page to reflect this  
> fact.


Thanks Jonas, that's an important point to note.

Regards,
  - Graeme -



Re: [fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion

2016-04-04 Thread Graeme Geldenhuys
On 2016-04-04 11:07, Michael Van Canneyt wrote:
> if you are dealing with UTF8/UnicodeString and other codepages, it is
> quite likely, and preferred, that you have unit cwstring included.

OK, I just confirmed. Adding clocale to my 5-line test program doesn't
affect the DefaultSystemCodePage result, but as soon as I add cwstring
to the uses clause, then DefaultSystemCodePage returns 65001.

I never use the UnicodeString type, and was under the impression that
only UnicodeString (and WideString) requires the cwstring unit to
function correctly. I use UTF-8 encoded text everywhere (stored inside
String), so never bothered with the cwstring unit. But clearly in FPC
3.0 it is vital to include cwstring if you do anything text related.


Something funny:
  I thought it really funny when Delphi introduced Unicode support, and
how everybody struggled. I thought: "what a mess". Now FPC developers
seem to be in the same boat. :-/


Regards,
  - Graeme -

-- 
fpGUI Toolkit - a cross-platform GUI toolkit using Free Pascal
http://fpgui.sourceforge.net/

My public PGP key:  http://tinyurl.com/graeme-pgp


Re: [fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion

2016-04-04 Thread Graeme Geldenhuys
On 2016-04-04 10:43, Michael Van Canneyt wrote:
> 
> No. It says 'typically'.
> 
> That doesn't necessarily mean it is so, but is mostly correct.

It is still a very vague assumption. As I mentioned in my previous
reply, that statement is false on my FreeBSD system too. So I should
read the wiki as "typically incorrect" ;-)


> No, that would be wrong. It normally is CP_UTF8 on Linux. But it can differ.
> I know people that still use ISO-8859 (or something similar). Only the
> programmer can know what is correct.

Just curious, how do you change the default codepage on a Linux system?
By exporting a new value to the LANG environment variable?



Regards,
  - Graeme -



Re: [fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion

2016-04-04 Thread Michael Van Canneyt



On Mon, 4 Apr 2016, Graeme Geldenhuys wrote:


> On 2016-04-04 10:39, tobiasgie...@gmail.com wrote:
>> It says "On Linux and Mac OS X UTF-8 is typically the system codepage, so the
>> RTL uses here by default CP_UTF8."
>>
>> Not on my Mac. DefaultSystemCodePage is 20127 before I set it to 65001.
>
> Indeed, and on my FreeBSD system DefaultSystemCodePage returns 0.
>
> So the "typically" seems more like "unlikely in most cases".

Not necessarily:
if you are dealing with UTF8/UnicodeString and other codepages, it is
quite likely, and preferred, that you have unit cwstring included.

Michael.


Re: [fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion

2016-04-04 Thread Graeme Geldenhuys
On 2016-04-04 10:39, tobiasgie...@gmail.com wrote:
> It says "On Linux and Mac OS X UTF-8 is typically the system codepage, so the
> RTL uses here by default CP_UTF8."
> 
> Not on my Mac. DefaultSystemCodePage is 20127 before I set it to 65001.

Indeed, and on my FreeBSD system DefaultSystemCodePage returns 0.

So the "typically" seems more like "unlikely in most cases".

Regards,
  - Graeme -



Re: [fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion

2016-04-04 Thread Michael Van Canneyt



On Mon, 4 Apr 2016, Jonas Maebe wrote:



> Michael Van Canneyt wrote on Mon, 04 Apr 2016:
>
>> Don't be too hasty in changing things. Jonas (who created that page) is
>> usually very careful in such matters.
>
> I did forget to mention that the Default*CodePage variables are only
> initialised with the "real" values on *nix platforms if you include a
> widestring manager unit. I have now updated the page to reflect this fact.


I will update the docs with such info for the upcoming 3.0.2.
It clearly needs mentioning...

Michael.


Re: [fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion

2016-04-04 Thread Jonas Maebe


Michael Van Canneyt wrote on Mon, 04 Apr 2016:

> Don't be too hasty in changing things. Jonas (who created that page)
> is usually very careful in such matters.


I did forget to mention that the Default*CodePage variables are only  
initialised with the "real" values on *nix platforms if you include a  
widestring manager unit. I have now updated the page to reflect this  
fact.



Jonas


Re: [fpc-pascal] FPC 3 regression: cannot use TStringList for UTF-8 data any more?

2016-04-04 Thread Graeme Geldenhuys
On 2016-04-04 10:27, Juha Manninen wrote:
> Just use the new UTF-8 mode provided by Lazarus and remove all
> explicit conversion functions.

This is the FPC mailing list. Not everybody here uses Lazarus or LCL, so
making such a suggestion is wishful thinking. For example, your
suggestion means nothing to me, I don't use LCL.

Regards,
  - Graeme -



Re: [fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion

2016-04-04 Thread Jonas Maebe


tobiasgiesen wrote on Mon, 04 Apr 2016:


>> Your question was not about Lazarus but maybe you should read this:
>>   http://wiki.freepascal.org/Better_Unicode_Support_in_Lazarus
>
> Very interesting, but apparently there is some wrong info.
>
> It says "On Linux and Mac OS X UTF-8 is typically the system
> codepage, so the RTL uses here by default CP_UTF8."
>
> Not on my Mac. DefaultSystemCodePage is 20127 before I set it to 65001.


That's because you don't use the cwstring unit nor another widestring  
manager. In that case, the DefaultSystemCodePage is set to ASCII  
because the default string conversion routines on Unix platforms do  
not support anything else (neither for converting from  
ansi/shortstring to wide/unicodestring, nor between ansistrings using  
different codepages). This is not new with FPC 3.0, it has always been  
like that.
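A quick way to see what the RTL actually ends up with is a small test program along the lines of the ones discussed in this thread. The sketch below is only an illustration: it includes cwstring on Unix or, assuming FPC 3.1.1 or later on Darwin, the iosxwstr unit Jonas mentions as an OS X alternative. `CodePageName` is a hypothetical helper that labels the numbers quoted here (0, 20127, 65001).

```pascal
program ShowCodePages;

{$mode objfpc}{$H+}

uses
  {$ifdef unix}
    {$if defined(darwin) and (FPC_FULLVERSION >= 30101)}
  iosxwstr,  // OS X/iOS widestring manager (FPC 3.1.1+), defaults to UTF-8
    {$else}
  cwstring,  // takes the code pages from LC_ALL / LC_CTYPE / LANG
    {$endif}
  {$endif}
  SysUtils;  // for IntToStr; also keeps the uses clause non-empty elsewhere

// Hypothetical helper: readable labels for the values seen in this thread.
function CodePageName(CP: TSystemCodePage): string;
begin
  case CP of
    CP_ACP:  Result := 'CP_ACP (0)';
    20127:   Result := 'US-ASCII';
    CP_UTF8: Result := 'UTF-8';
  else
    Result := IntToStr(CP);
  end;
end;

begin
  WriteLn('DefaultSystemCodePage        = ', CodePageName(DefaultSystemCodePage));
  WriteLn('DefaultFileSystemCodePage    = ', CodePageName(DefaultFileSystemCodePage));
  WriteLn('DefaultRTLFileSystemCodePage = ', CodePageName(DefaultRTLFileSystemCodePage));
end.
```

Commenting out the widestring manager unit and running it again should show the difference described above.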



Jonas


Re: [fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion

2016-04-04 Thread Michael Van Canneyt



On Mon, 4 Apr 2016, tobiasgie...@gmail.com wrote:




>> On Mon, 4 Apr 2016, tobiasgie...@gmail.com wrote:
>>
>>>> Your question was not about Lazarus but maybe you should read this:
>>>>   http://wiki.freepascal.org/Better_Unicode_Support_in_Lazarus
>>>
>>> Very interesting, but apparently there is some wrong info.
>>>
>>> It says "On Linux and Mac OS X UTF-8 is typically the system codepage, so the
>>> RTL uses here by default CP_UTF8."
>>>
>>> Not on my Mac. DefaultSystemCodePage is 20127 before I set it to 65001.
>>>
>>> Should I fix this bit on the documentation page?
>>
>> No. It says 'typically'.
>
> That's the first part of the sentence.
>
> The second part says: "the RTL uses here by default CP_UTF8."
>
> That is wrong. It does not (at least not on Mac). This must be fixed.


Why do you think it is wrong?
It is perhaps wrong on your computer. Maybe it checks the environment?

Did you include clocale? etc. There are maybe 100 things that can change
this.

Don't be too hasty in changing things.
Jonas (who created that page) is usually very careful in such matters.


Michael.


Re: [fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion

2016-04-04 Thread Mattias Gaertner
On Mon, 4 Apr 2016 11:35:53 +0200 (CEST)
Michael Van Canneyt  wrote:

>[...]
> >> Then no conversions will be done for all ansistrings that contain UTF8.
> >
> > And this really means AnsiString, not AnsiString(something).
> 
> The latter cannot contain UTF8 unless you do some really nasty tricks... :-)

UTF8String is declared as AnsiString(CP_UTF8), and if you mix it with
AnsiString, the compiler adds conversion code, because at compile time
CP_ACP is not known to be UTF-8.
These kinds of traps confuse people.
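A minimal sketch of that trap, assuming FPC 3.0 semantics: the assignment `A := U` compiles into a code page conversion call because `AnsiString` means `CP_ACP`, which is only resolved to `DefaultSystemCodePage` at run time. The program builds the UTF-8 bytes explicitly so the source file's own encoding does not matter. What `A` ends up containing depends on the locale and the widestring manager, so no output is claimed here.

```pascal
program CodePageTrap;

{$mode objfpc}{$H+}

{$ifdef unix}
uses
  cwstring; // widestring manager, so real conversions are available on Unix
{$endif}

var
  Raw: RawByteString;
  U: UTF8String; // AnsiString(CP_UTF8): code page fixed at compile time
  A: AnsiString; // AnsiString(CP_ACP): resolved to DefaultSystemCodePage at run time
begin
  // UTF-8 bytes for 'café', written as byte values so the encoding of this
  // source file does not matter; then tag them as UTF-8 without converting.
  Raw := 'caf' + #$C3#$A9;
  SetCodePage(Raw, CP_UTF8, False);
  U := Raw; // same code page on both sides: no conversion

  // The compiler cannot know that CP_ACP will be UTF-8, so it emits a
  // conversion call here; it only becomes a no-op if DefaultSystemCodePage
  // happens to be CP_UTF8 at run time.
  A := U;

  WriteLn('DefaultSystemCodePage = ', DefaultSystemCodePage);
  WriteLn('U: code page ', StringCodePage(U), ', ', Length(U), ' bytes');
  WriteLn('A: code page ', StringCodePage(A), ', ', Length(A), ' bytes');
end.
```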

Mattias


Re: [fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion

2016-04-04 Thread tobiasgiesen
> 
> 
> On Mon, 4 Apr 2016, tobiasgie...@gmail.com wrote:
> 
> >> Your question was not about Lazarus but maybe you should read this:
> >>   http://wiki.freepascal.org/Better_Unicode_Support_in_Lazarus
> >
> > Very interesting, but apparently there is some wrong info.
> >
> > It says "On Linux and Mac OS X UTF-8 is typically the system codepage, so
> > the RTL uses here by default CP_UTF8."
> >
> > Not on my Mac. DefaultSystemCodePage is 20127 before I set it to 65001.
> >
> > Should I fix this bit on the documentation page?
> 
> No. It says 'typically'.

That's the first part of the sentence.

The second part says: "the RTL uses here by default CP_UTF8."

That is wrong. It does not (at least not on Mac). This must be fixed.

Cheers,
Tobias



Re: [fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion

2016-04-04 Thread Michael Van Canneyt



On Mon, 4 Apr 2016, tobiasgie...@gmail.com wrote:


>> Your question was not about Lazarus but maybe you should read this:
>>   http://wiki.freepascal.org/Better_Unicode_Support_in_Lazarus
>
> Very interesting, but apparently there is some wrong info.
>
> It says "On Linux and Mac OS X UTF-8 is typically the system codepage, so the
> RTL uses here by default CP_UTF8."
>
> Not on my Mac. DefaultSystemCodePage is 20127 before I set it to 65001.
>
> Should I fix this bit on the documentation page?


No. It says 'typically'.

That doesn't necessarily mean it is so, but is mostly correct.



Apparently the LCL assumes that FCL sets it to UTF-8, but it does not.

I think the LCL should explicitly set it to CP_UTF8 on Mac and Linux.


No, that would be wrong. it normally is CP_UTF8 on linux. But it can differ. 
I know people that still use ISO-8895 (or something similar). Only the

programmer can know what is correct.

Michael.


Re: [fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion

2016-04-04 Thread tobiasgiesen
> Your question was not about Lazarus but maybe you should read this:
>   http://wiki.freepascal.org/Better_Unicode_Support_in_Lazarus

Very interesting, but apparently there is some wrong info.

It says "On Linux and Mac OS X UTF-8 is typically the system codepage, so the
RTL uses here by default CP_UTF8."

Not on my Mac. DefaultSystemCodePage is 20127 before I set it to 65001.
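
A minimal sketch for checking which value the RTL picked on a given machine (the program name and message texts are illustrative). On Unix-like systems it pulls in cwstring, which initialises DefaultSystemCodePage from the current locale, so the printed value may differ with and without that unit, and between machines and locale settings.

```pascal
program checkcp; // illustrative program name

{$mode objfpc}{$H+}

{$ifdef unix}
uses
  cwstring; // derives the ANSI code page from the locale on Unix-like systems
{$endif}

begin
  // DefaultSystemCodePage is what CP_ACP (plain AnsiString) resolves to
  WriteLn('DefaultSystemCodePage = ', DefaultSystemCodePage);
  if DefaultSystemCodePage = CP_UTF8 then
    WriteLn('CP_ACP resolves to UTF-8')
  else
    WriteLn('CP_ACP is not UTF-8; UTF-8 data in plain AnsiStrings may be converted');
end.
```
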

Should I fix this bit on the documentation page?

Apparently the LCL assumes that FCL sets it to UTF-8, but it does not.

I think the LCL should explicitly set it to CP_UTF8 on Mac and Linux.

Cheers,
Tobias



Re: [fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion

2016-04-04 Thread Michael Van Canneyt



On Mon, 4 Apr 2016, Mattias Gaertner wrote:

> On Mon, 4 Apr 2016 10:32:58 +0200 (CEST)
> Michael Van Canneyt  wrote:
>
>> [...]
>> You cannot, but you can set DefaultSystemCodePage to CP_UTF8.
>
> I think it is important to note how to do this properly:
>
>   SetMultiByteConversionCodePage(CP_UTF8);
>   SetMultiByteRTLFileSystemCodePage(CP_UTF8);
>
> You should add these lines in an early initialization section. The
> beginning of your program might be too late.
>
>> Then no conversions will be done for all ansistrings that contain UTF8.
>
> And this really means AnsiString, not AnsiString(something).

The latter cannot contain UTF8 unless you do some really nasty tricks... :-)

Michael


Re: [fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion

2016-04-04 Thread Mattias Gaertner
On Mon, 4 Apr 2016 10:32:58 +0200 (CEST)
Michael Van Canneyt  wrote:

>[...]
> You cannot, but you can set DefaultSystemCodePage to CP_UTF8.

I think it is important to note how to do this properly:

  SetMultiByteConversionCodePage(CP_UTF8);
  SetMultiByteRTLFileSystemCodePage(CP_UTF8);

You should add these lines in an early initialization section. The
beginning of your program might be too late.
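
A minimal sketch of such an early initialization section, assuming a small dedicated unit (the name utf8acp is hypothetical). Unit initialization sections run before the main program block, roughly in uses-clause order (a unit's own dependencies are initialized first), so listing this unit early in the program's uses clause makes the two calls take effect before most other code runs.

```pascal
unit utf8acp; // hypothetical unit name

{$mode objfpc}{$H+}

interface

implementation

initialization
  // Make CP_ACP (the declared code page of plain AnsiString) resolve to
  // UTF-8, i.e. set DefaultSystemCodePage to CP_UTF8.
  SetMultiByteConversionCodePage(CP_UTF8);
  // Have RTL file routines interpret file names as UTF-8 as well.
  SetMultiByteRTLFileSystemCodePage(CP_UTF8);
end.
```

Usage, under the same assumptions: put it in the program's uses clause, e.g.

  uses {$ifdef unix}cwstring,{$endif} utf8acp;

We assume it should come after cwstring, since cwstring's own initialization sets DefaultSystemCodePage from the locale and could otherwise overwrite the value.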

> Then no conversions will be done for all ansistrings that contain UTF8.

And this really means AnsiString, not AnsiString(something).

Mattias


Re: [fpc-pascal] FPC 3 regression: cannot use TStringList for UTF-8 data any more?

2016-04-04 Thread Juha Manninen
On Mon, Apr 4, 2016 at 11:18 AM,   wrote:
> I use TStringList for UTF-8 strings. This is no longer possible, because
> automatic conversions cause question marks and data loss.

You are completely lost with this issue. The automatic conversion of
encodings is a big step forward.
Just use the new UTF-8 mode provided by Lazarus and remove all
explicit conversion functions.
  http://wiki.freepascal.org/Better_Unicode_Support_in_Lazarus

Juha


Re: [fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion

2016-04-04 Thread Juha Manninen
On Mon, Apr 4, 2016 at 11:36 AM,   wrote:
> Sorry, I was not able to come to that conclusion from the existing docs.

Your question was not about Lazarus but maybe you should read this:
  http://wiki.freepascal.org/Better_Unicode_Support_in_Lazarus
It works also without LCL.
The bottom line is: remove all explicit conversion functions.

Juha


Re: [fpc-pascal] FPC 3 regression: cannot use TStringList for UTF-8 data any more?

2016-04-04 Thread Jonas Maebe


tobiasgiesen wrote on Mon, 04 Apr 2016:

>> Then please update the wiki - it is user editable.
>
> Done:
> http://wiki.freepascal.org/FPC_Unicode_support#Backward_compatibility
>
> I hope this is correct.

It is incorrect in the sense that there is nothing utf8-specific about
the way your code (ab)used ansistrings. I will fix it, since that page
is more or less part of the official FPC documentation (since it's
linked from the FPC 3.0 release notes).



Jonas


Re: [fpc-pascal] FPC 3 regression: cannot use TStringList for UTF-8 data any more?

2016-04-04 Thread tobiasgiesen
> Then please update the wiki - it is user editable.

Done:
http://wiki.freepascal.org/FPC_Unicode_support#Backward_compatibility

I hope this is correct.

Cheers,
Tobias




Re: [fpc-pascal] FPC 3 regression: cannot use TStringList for UTF-8 data any more?

2016-04-04 Thread Graeme Geldenhuys
On 2016-04-04 09:43, tobiasgie...@gmail.com wrote:
> Very theoretical. What you really need to tell
> people is something like this:

Then please update the wiki - it is user editable. Even a seasoned
developer such as myself still needs to get my head around all this FPC
Unicode stuff. So any information and tips on the wiki would be greatly
appreciated.

I haven't moved to FPC 3.0 yet, but when I do, I too will have lots of
testing to do in my own code. I don't use LCL, but I have been storing
UTF-8 text inside AnsiStrings for years (on all platforms).

Regards,
  - Graeme -

-- 
fpGUI Toolkit - a cross-platform GUI toolkit using Free Pascal
http://fpgui.sourceforge.net/

My public PGP key:  http://tinyurl.com/graeme-pgp


Re: [fpc-pascal] FPC 3 regression: cannot use TStringList for UTF-8 data any more?

2016-04-04 Thread tobiasgiesen
> > I use TStringList for UTF-8 strings. This is no longer possible, because
> > automatic conversions cause question marks and data loss.
> 
> Lazarus uses TStringList with UTF-8 all over the place.
> 
> Please post a complete example demonstrating the problem.

Sorry - this was only theoretical, because of the Backward compatibility
section on the FPC Unicode Support page.

It says that a "defined way" to use strings is "you do not store data in
an ansistring that has been encoded using something else than the
system's default code page, and subsequently pass this string as-is to
an FPC RTL routine".

That would mean I cannot use TStringList for UTF-8.  The paragraph is
misleading, really. Very theoretical. What you really need to tell
people is something like this:

"Unicode aware Pascal code needs to set DefaultSystemCodePage to
CP_UTF8".

I am sorry but I was really shocked this morning when I saw the question
marks :)

Cheers,
Tobias



Re: [fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion

2016-04-04 Thread tobiasgiesen
> You cannot, but you can set DefaultSystemCodePage to CP_UTF8.
> Then no conversions will be done for all ansistrings that contain UTF8.

Fantastic. Many thanks. That fixes my problem entirely (I think).

Sorry, I was not able to come to that conclusion from the existing docs.

Cheers,
Tobias




Re: [fpc-pascal] FPC 3 regression: cannot use TStringList for UTF-8 data any more?

2016-04-04 Thread Michael Van Canneyt



On Mon, 4 Apr 2016, tobiasgie...@gmail.com wrote:

> Hello,
>
> disallowing "AnsiString" code for UTF-8 is a huge regression.
>
> I use TStringList for UTF-8 strings. This is no longer possible, because
> automatic conversions cause question marks and data loss.

Same answer as in my other mail. Set DefaultSystemCodePage to CP_UTF8.

> I also use a large amount of third-party libraries that use the AnsiString
> data type for UTF-8.
>
> I really want to use FPC 3 due to other things, but this is a deal
> breaker. Why not add a simple switch or even a run-time Boolean global
> variable to turn off codepage conversions?
>
> It behaves differently from Delphi too.

This depends on the version of Delphi :)

Michael.


Re: [fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion

2016-04-04 Thread Michael Van Canneyt



On Mon, 4 Apr 2016, Tobias Giesen wrote:

> Hello,
>
> my application uses the AnsiString type to store UTF-8 data. That was
> totally fine. Now in FPC 3, automatic conversions cause data loss. I
> get question marks replacing Chinese characters, for example.
>
> I do not fully understand at which points these conversions are done.
> The FPC 3 Unicode documentation says something about "passing it to a
> RTL routine".
>
> What about this code:
> var a,b:Ansistring;
> begin
>   a:=Utf8Encode(AWideString);
>   b:=Copy(a,1,10);
> end;
>
> Is "Copy" an RTL routine? Is this OK or not?
>
> Best for me would be to be able to turn the conversions off completely.

You cannot, but you can set DefaultSystemCodePage to CP_UTF8.
Then no conversions will be done for all ansistrings that contain UTF8.

Michael.


Re: [fpc-pascal] FPC 3 regression: cannot use TStringList for UTF-8 data any more?

2016-04-04 Thread Mattias Gaertner
On Mon, 04 Apr 2016 10:18:18 +0200
tobiasgie...@gmail.com wrote:

> Hello,
> 
> disallowing "AnsiString" code for UTF-8 is a huge regression.
> 
> I use TStringList for UTF-8 strings. This is no longer possible, because
> automatic conversions cause question marks and data loss.

Lazarus uses TStringList with UTF-8 all over the place.

Please post a complete example demonstrating the problem.

Mattias


[fpc-pascal] FPC 3 regression: cannot use TStringList for UTF-8 data any more?

2016-04-04 Thread tobiasgiesen
Hello,

disallowing "AnsiString" code for UTF-8 is a huge regression.

I use TStringList for UTF-8 strings. This is no longer possible, because
automatic conversions cause question marks and data loss.

I also use a large amount of third-party libraries that use the AnsiString
data type for UTF-8.

I really want to use FPC 3 due to other things, but this is a deal
breaker. Why not add a simple switch or even a run-time Boolean global
variable to turn off codepage conversions?

It behaves differently from Delphi too. 

Cheers,
Tobias



[fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion

2016-04-04 Thread Tobias Giesen
Hello,

my application uses the AnsiString type to store UTF-8 data. That was
totally fine. Now in FPC 3, automatic conversions cause data loss. I
get question marks replacing Chinese characters, for example.

I do not fully understand at which points these conversions are done.
The FPC 3 Unicode documentation says something about "passing it to a
RTL routine".

What about this code:
var a,b:Ansistring;
begin
  a:=Utf8Encode(AWideString);
  b:=Copy(a,1,10);
end;

Is "Copy" an RTL routine? Is this OK or not?

Best for me would be to be able to turn the conversions off completely.
Is that possible?

Cheers,
Tobias

