Re: [fpc-devel] utf8 in 2.6.0

2013-01-07 Thread Frank Church
On 5 January 2013 13:39, Mattias Gaertner nc-gaert...@netcologne.de wrote:

 On Sat, 5 Jan 2013 13:06:42 +
 Frank Church vfcli...@gmail.com wrote:

 [...]
  It is obvious that Unicode is not a simple topic, and among FPC/Lazarus
  developers/contributors I suspect that few, if any at all, have a detailed
  grasp of how it all hangs together in the current state of
  implementation.
  It brings to mind the parable of the 12 blind men and the elephant.

 The FPC and Lazarus UTF details are not that difficult. The
 complexity comes from adding Delphi *, third party libraries and
 old FPC, Lazarus versions.


  I think a diagram or graph of Unicode rules and their current state of
  implementation in FPC/Lazarus would go a long way to helping both
  developers and end users in this area. It is a topic which comes up
  regularly and it shows no sign of ever being properly
  resolved.

 For Lazarus:
 - works with fpc 2.6.x and 2.7.1
 - LCL and most code expect ansistrings to hold UTF-8.
 - pascal sources, lfm, po files are stored in UTF-8 without BOM.
   Special care has to be taken, when using widestrings/unicodestring.
 - there are UTF-8 functions and classes (most in package lazutils).
 - the IDE supports many encodings
 - all this is documented via wiki and fpdoc
 - no support for UTF-16 has been started


 [...]

 Mattias
 ___
 fpc-devel maillist  -  fpc-devel@lists.freepascal.org
 http://lists.freepascal.org/mailman/listinfo/fpc-devel



Glad to hear this.

-- 
Frank Church

===
http://devblog.brahmancreations.com


Re: [fpc-devel] utf8 in 2.6.0

2013-01-07 Thread Hans-Peter Diettrich

Martin Schreiber wrote:

but I fear we can not use that information for development with Free Pascal 
because:


The string is represented internally as a Unicode string encoded as UTF-16. 
Characters in the Basic Multilingual Plane (BMP) take 2 bytes, and characters 
not in the BMP require 4 bytes.


and

A control string is a sequence of one or more control characters, each of 
which consists of the # symbol followed by an unsigned integer constant from 
0 to 65,535 (decimal) or from $0 to $FFFF (hexadecimal) in UTF-16 encoding, 
and denotes the character corresponding to a specified code value. Each 
integer is represented internally by 2 bytes in the string. This is useful 
for representing control characters and multibyte characters.


which seems to be different from Free Pascal.


Correction:

You're right, Delphi treats control characters as UTF-16 codes, where 
FPC treats them as byte values (if less than 256).


I noticed the possible problem already, that the FPC interpretation of 
control characters is context sensitive. This leads to write-only code, 
because a change of the $codepage would require changing all control 
characters in that unit accordingly. This is in addition to the removal or 
addition of control characters > 255, which also leads to a different 
interpretation of the remaining control characters *and* to a different 
internal representation.


DoDi



Re: [fpc-devel] utf8 in 2.6.0

2013-01-05 Thread Jonas Maebe

On 05 Jan 2013, at 10:29, Martin Schreiber wrote:

 Are these stupid questions?

No, but I seem to be unable to explain how it works, since you keep asking about 
things I already tried to explain before but clearly failed to explain properly. 
I can keep repeating myself, but I'm not sure whether that will help anyone.

For example, I said that basically nothing changed in 2.7.x compared to 2.6.x, 
except that certain string constants are no longer automatically converted to 
utf-16 at compile time, and then you ask "Or should we not touch the theme 
strings and FPC anymore?". Since basically nothing changed except for a few 
less blind auto-conversions at compile time, why should you no longer be able 
to touch anything anymore?

Let me repeat: your string constants will be parsed by the compiler into 
character sequences with exactly the same content in both 2.6.x and 2.7.x (and 
with content I mean that if they would be converted to the same code page in 
2.6.x and in 2.7.x, you would end up with exactly the same binary data). 
Whether or not they contain character literals whose value is > #127 in the 
source code's code page, or explicit #xx, #xxx etc. expressions has no 
influence; nothing changed in the compiler on that account.

The *only* difference is that the compiler can now internally represent 
ansistrings with arbitrary code pages, and as a result the aforementioned 
character sequences may now be stored internally in the compiler in a different 
format, and also stored in the program in a different format if that can avoid 
conversions at run time. In particular, character sequences are no longer all 
converted immediately/by default/under all circumstances to UTF-16 in case 
characters > #127 need to be interpreted according to a particular code page 
(i.e., if a {$codepage xxx} directive is present).

The compiler will now only convert such character sequences to UTF-16, still at 
compile time (just like it was able to do in 2.6.x), if it is actually assigned 
to an UTF-16-encoded string, passed to an UTF-16 parameter etc. And the 
compiler will also convert it to another ansistring code page in case the 
character sequence appeared in e.g. a file with {$codepage utf-8} and is then 
assigned to a variable whose type is declared as type ansistring(850).
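
A minimal sketch of that last case, assuming FPC 2.7.1 or later (2.6.x has no ansistrings with a declared code page) and a source file saved as UTF-8. The program name and the type name TCp850String are made up for illustration; StringCodePage and HexStr are routines of the System unit.

```pascal
program cp850demo;
{$mode objfpc}{$H+}
{$codepage utf8}   // the literals in this file are read as UTF-8

type
  // hypothetical type name; "type ansistring(850)" is the declaration
  // form mentioned in the paragraph above
  TCp850String = type ansistring(850);

var
  s: TCp850String;
  i: Integer;
begin
  // a constant from a UTF-8 source file assigned to a CP850 variable
  s := 'äöü';
  // show the code page attached to the string and its raw bytes
  writeln('code page: ', StringCodePage(s));
  for i := 1 to length(s) do
    write(HexStr(ord(s[i]), 2), ' ');
  writeln;
end.
```

If the conversion happens as described, the dump should show three CP850 bytes (84 94 81 for ä, ö and ü) rather than the six UTF-8 bytes stored in the source file.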


Jonas


Re: [fpc-devel] utf8 in 2.6.0

2013-01-05 Thread Sven Barth

On 05.01.2013 11:30, Jonas Maebe wrote:

For example, I said that basically nothing changed in 2.7.x compared to 2.6.x, except 
that certain string constants are no longer automatically converted to utf-16 at compile 
time, and then you ask "Or should we not touch the theme strings and FPC 
anymore?". Since basically nothing changed except for a few less blind 
auto-conversions at compile time, why should you no longer be able to touch anything 
anymore?


I think it was meant more in the context of the mailing list than in 
a technical context. As in "we had this topic a thousand times and 
maybe it's better we shut up about it now before we get moderated". 
Though I could not see where you, Jonas, seemed upset about Martin's 
questions... (and in my opinion the answers cleared up many things - at 
least for me :) )


Regards,
Sven




Re: [fpc-devel] utf8 in 2.6.0

2013-01-05 Thread Martin Schreiber
On Saturday 05 January 2013 11:42:29 Sven Barth wrote:
 On 05.01.2013 11:30, Jonas Maebe wrote:
  For example, I said that basically nothing changed in 2.7.x compared to
  2.6.x, except that certain string constants are no longer automatically
  converted to utf-16 at compile time, and then you ask "Or should we not
  touch the theme strings and FPC anymore?". Since basically nothing
  changed except for a few less blind auto-conversions at compile time, why
  should you no longer be able to touch anything anymore?

 I think it was more meant in the context of the mailing list instead of
 a technical context. Like in "we had this topic a thousand times and
 maybe it's better we shut up about it now before we get moderated".

Correct. :-)

Martin


Re: [fpc-devel] utf8 in 2.6.0

2013-01-05 Thread Jonas Maebe

On 05 Jan 2013, at 12:16, Martin Schreiber wrote:

 On Saturday 05 January 2013 11:42:29 Sven Barth wrote:
 I think it was more meant in the context of the mailing list instead of
 a technical context. Like in "we had this topic a thousand times and
 maybe it's better we shut up about it now before we get moderated".
 
 Correct. :-)

Then maybe I should just stop completely answering any questions about this, 
because apparently not answering completely enough to your liking gets 
interpreted as telling you to shut up or getting moderated. Just like Sven I 
don't understand where this interpretation comes from, and I strongly resent 
it. I didn't answer because I thought the information was all in my previous 
answers already, and if someone else felt they could clarify it better than I 
did, they were free to do so.

My time is also finite, and trying to get me to elaborate further by getting me 
on my high horse because I feel I'm being misrepresented, is something that 
will not work very well in the long term. It will much more likely result in 
silence than in more help.


Jonas


Re: [fpc-devel] utf8 in 2.6.0

2013-01-05 Thread Martin Schreiber
On Saturday 05 January 2013 12:39:21 Jonas Maebe wrote:

 Then maybe I should just stop completely answering any questions about
 this, because apparently not answering completely enough to your liking
 gets interpreted as telling you to shut up or getting moderated. Just like
 Sven I don't understand where this interpretation comes from, and I
 strongly resent it. I didn't answer because I thought the information was
 all in my previous answers already, and if someone else felt they could
 clarify it better than I did, they were free to do so.

 My time is also finite, and trying to get me to elaborate further by
 getting me on my high horse because I feel I'm being misrepresented, is
 something that will not work very well in the long term. It will much more
 likely result in silence than in more help.

No, no, you misunderstand me. I am merely being careful not to annoy the FPC 
team; please accept my apology. But I need to decide whether FPC is still the 
right tool for my purposes.

Martin


Re: [fpc-devel] utf8 in 2.6.0

2013-01-05 Thread Michael Van Canneyt



On Sat, 5 Jan 2013, Martin Schreiber wrote:


On Saturday 05 January 2013 12:39:21 Jonas Maebe wrote:


Then maybe I should just stop completely answering any questions about
this, because apparently not answering completely enough to your liking
gets interpreted as telling you to shut up or getting moderated. Just like
Sven I don't understand where this interpretation comes from, and I
strongly resent it. I didn't answer because I thought the information was
all in my previous answers already, and if someone else felt they could
clarify it better than I did, they were free to do so.

My time is also finite, and trying to get me to elaborate further by
getting me on my high horse because I feel I'm being misrepresented, is
something that will not work very well in the long term. It will much more
likely result in silence than in more help.


No, no, you understand me wrong. I am merely cautious not to annoy the FPC
team, please accept my apology, but I need to decide if FPC is still the
right tool for my purposes.


Seeing that you have already invested lots of time in FPC, you could also ask 
yourself

'How can I help improve fpc so it remains the right tool for my purposes' ?

Or have you decided that cooperation with the FPC team is an impossibility ?

Michael.


Re: [fpc-devel] utf8 in 2.6.0

2013-01-05 Thread Martin Schreiber
On Saturday 05 January 2013 13:01:44 Michael Van Canneyt wrote:

 Seeing that you have already invested lots of time in FPC, you could also
 ask yourself

 'How can I help improve fpc so it remains the right tool for my purposes' ?

 Or have you decided that cooperation with the FPC team is an impossibility
 ?

It is not easy, mainly because the mission goal is so broad. A division of 
work would probably be the better solution: I make it my job to build a highly 
productive open source development environment for Free Pascal, and the FPC 
team makes a compiler which allows such a tool to be built.
But now we are off topic.

Martin


Re: [fpc-devel] utf8 in 2.6.0

2013-01-05 Thread Hans-Peter Diettrich

Martin Schreiber wrote:

but I fear we can not use that information for development with Free Pascal 
because:


The string is represented internally as a Unicode string encoded as UTF-16. 
Characters in the Basic Multilingual Plane (BMP) take 2 bytes, and characters 
not in the BMP require 4 bytes.


and

A control string is a sequence of one or more control characters, each of 
which consists of the # symbol followed by an unsigned integer constant from 
0 to 65,535 (decimal) or from $0 to $FFFF (hexadecimal) in UTF-16 encoding, 
and denotes the character corresponding to a specified code value. Each 
integer is represented internally by 2 bytes in the string. This is useful 
for representing control characters and multibyte characters.


which seems to be different from Free Pascal.


Where do you see a difference? The strings are stored in UTF-16, which 
is the same in every implementation, regardless of (possibly) different 
more verbose descriptions.


The new AnsiStrings are safe against misinterpretation, because they 
contain their encoding (codepage). Every char in an AnsiString now can 
be converted to one and only one Unicode char, when needed. This is not 
true for single AnsiChars, which still have no codepage information 
stored with them (in both Delphi and FPC). I strongly discourage the use 
of Char variables in all flavours (Char, AnsiChar, WideChar), because 
these are incapable of holding all possible Unicode characters. Only 
UnicodeChar or UCS4Char (if these exist) can hold all possible character 
codes, without possible codepage misinterpretation.


The discussion mostly covers the compilation of string *literals*, like 
'äöü' or #123, for which every compiler tries to find the best 
interpretation and internal representation. FPC has a $codepage 
directive, which tells the compiler that *all* literals in this unit 
shall be treated as strings of that codepage. This is essential for 
files stored as Ansi, which have no information about the codepage of 
the contained single-byte characters. Files stored with UTF-8 encoding, 
and a UTF-8 BOM at their beginning, are safe against misinterpretation.


When the compiler translates the source code string literals, it can 
store them either as Unicode (UTF-16) or as AnsiString of the given 
$codepage, depending on the *use* of the literal (type of the string 
variable in an assignment). This will reduce the number of implicit 
string conversions at runtime.


[Please correct me if I'm wrong]
DoDi



Re: [fpc-devel] utf8 in 2.6.0

2013-01-05 Thread Jy V
 For Lazarus:
 - works with fpc 2.6.x and 2.7.1
 - LCL and most code expect ansistrings to hold UTF-8.
 - pascal sources, lfm, po files are stored in UTF-8 without BOM.
   Special care has to be taken, when using widestrings/unicodestring.
 - there are UTF-8 functions and classes (most in package lazutils).
 - the IDE supports many encodings
 - all this is documented via wiki and fpdoc
 - no support for UTF-16 has been started


Your summary sounds clear to me,
and the strategy selected looks to be well designed,
I will bookmark this summary as the reference.


Re: [fpc-devel] utf8 in 2.6.0

2013-01-01 Thread Jonas Maebe

On 15 Dec 2012, at 19:35, Martin wrote:

 I am trying to figure out how to do that, or what I do wrong. I found a page 
 about $codepage, but it did not help 
 http://wiki.freepascal.org/LCL_Unicode_Support
 I didn't find the FPC-specific page, if it exists (I suspect it does).
 
 
 I am calling a function, function Foo(A: string), with {$mode objfpc}{$H+}.
 I call it with a constant that contains a German umlaut. Checked with a hex 
 editor, the constant in the source file is utf8.
 
 - If I save the source (in utf8) without a utf8 BOM, then it works fine on 
 Windows.
 - If I add a BOM, then the string received by the function appears to be 
 ascii (checked memory dump in debugger: oe becomes d6).
 - If I add {$codepage utf8} it also becomes ascii.
 
 If I do *not* add that, it seems something goes wrong with the encoding on a 
 PowerPC Mac. Unfortunately this is someone else's PC, and I have no more info.
 If I add it things also go wrong, only differently. Again no more info.
 
 ---
 
 I know the provided info, is very little.

There is not enough information to be able to give an answer. Source code, 
source code, source code.


Jonas


Re: [fpc-devel] utf8 in 2.6.0

2013-01-01 Thread Martin

On 01/01/2013 13:18, Jonas Maebe wrote:

On 15 Dec 2012, at 19:35, Martin wrote:


I am trying to figure out how to do that, or what I do wrong. I found a page 
about $codepage, but it did not help 
http://wiki.freepascal.org/LCL_Unicode_Support
I didn't find the FPC-specific page, if it exists (I suspect it does).


I am calling a function, function Foo(A: string), with {$mode objfpc}{$H+}.
I call it with a constant that contains a German umlaut. Checked with a hex 
editor, the constant in the source file is utf8.

- If I save the source (in utf8) without a utf8 BOM, then it works fine on 
Windows.
- If I add a BOM, then the string received by the function appears to be ascii 
(checked memory dump in debugger: oe becomes d6).
- If I add {$codepage utf8} it also becomes ascii.

If I do *not* add that, it seems something goes wrong with the encoding on a 
PowerPC Mac. Unfortunately this is someone else's PC, and I have no more info.
If I add it things also go wrong, only differently. Again no more info.

---

I know the provided info, is very little.

There is not enough information to be able to give an answer. Source code, 
source code, source code.



The problem is, the original issue does not happen on my hardware.
It is about an issue with the test case in components/EditorMacroScript, 
but it only happens on PowerPC hardware.
I only have/had an extract of the results. From the looks of what went 
wrong, and what the output was (calculating char positions), utf8 encoding 
was/is a strong suspect (not confirmed though).


On my hardware it is normally all fine, but fails if I add the 
$codepage. I could spend a lot of work boiling that down to a sample. 
But given that I couldn't even find the docs on what I really should 
expect, and therefore might be doing something wrong, I thought I would 
first look for what should happen.


Add {$codepage utf8}
on top of
  lazarus\components\macroscript\test\testscriptprocs.pas

and the behaviour changes so that the test will fail. (There are utf8 
constants in the source, and it appears that with the $codepage the 
called code does NOT get that utf8 string, but something else instead.)


Re: [fpc-devel] utf8 in 2.6.0

2013-01-01 Thread Jonas Maebe

On 01 Jan 2013, at 15:14, Martin wrote:

 On my hardware it is normally all fine, but fails if I add the $codepage. I 
 could spent a lot of work boiling that down to a sample. But given that I 
 couldn't even find the docs what I really should expect,

Without a {$codepage xxx} directive, string constants containing 
characters > #127 remain exactly as they appear in the source code.

With a {$codepage xxx} directive, string constants containing characters > #127 
are converted into unicodestrings during the parsing (according to the 
specified code page), and converted back into ansistrings (using the ansi 
code page of that particular program run) at run time if they are assigned to 
ansistring/shortstrings or passed to routines expecting such parameters.

Note that the above is for 2.6.x (as the subject mentions).
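
A small sketch for observing the two cases, assuming the source file is saved as UTF-8 without a BOM. The program name and the DumpBytes helper are made up for illustration; HexStr is a routine of the System unit.

```pascal
program literaldump;
{$mode objfpc}{$H+}
// Compile once as is, and once with the next line enabled, then compare.
// {$codepage utf8}

// Print every byte of the string that the callee actually receives.
procedure DumpBytes(const s: ansistring);
var
  i: Integer;
begin
  for i := 1 to length(s) do
    write(HexStr(ord(s[i]), 2), ' ');
  writeln;
end;

begin
  DumpBytes('Töne');   // the ö is stored as two bytes in a UTF-8 source file
end.
```

Without the directive the dump should show the source bytes unchanged (54 C3 B6 6E 65). With the directive the result depends on the ansi code page in effect when the program runs; on Unix-like systems that run-time conversion is expected to need a widestring manager such as the cwstring unit.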


Jonas


Re: [fpc-devel] utf8 in 2.6.0

2013-01-01 Thread Martin

On 01/01/2013 14:24, Jonas Maebe wrote:

On 01 Jan 2013, at 15:14, Martin wrote:


On my hardware it is normally all fine, but fails if I add the $codepage. I 
could spent a lot of work boiling that down to a sample. But given that I 
couldn't even find the docs what I really should expect,

Without a {$codepage xxx} directive, string constants containing 
characters > #127 remain exactly as they appear in the source code.

With a {$codepage xxx} directive, string constants containing characters > #127 are 
converted into unicodestrings during the parsing (according to the specified code page), and 
converted back into ansistrings (using the ansi code page of that particular 
program run) at run time if they are assigned to ansistring/shortstrings or passed to 
routines expecting such parameters.

Note that the above is for 2.6.x (as the subject mentions).



ok, leaves me with my original problem.

On said ppc, using the original file (no codepage directive). file 
should be identical (svn checkout) (yet on 2nd thought I can't be sure, 
that it wasn't open in an editor and saved with utf8 bom...).


So on that ppc something goes wrong. From the feedback I had, it looks 
exactly as if the encoding of the constant was changed.  So what I was 
looking for was a way to 100% prevent that. Something that tells the 
compiler: Whatever encoding you find or expect or whatever encoding the 
output should be, do not touch strings. just take them byte by byte.

Does that exist?


Re: [fpc-devel] utf8 in 2.6.0

2013-01-01 Thread Jonas Maebe

On 01 Jan 2013, at 15:40, Martin wrote:

 ok, leaves me with my original problem.
 
 On said ppc, using the original file (no codepage directive). file should be 
 identical (svn checkout) (yet on 2nd thought I can't be sure, that it wasn't 
 open in an editor and saved with utf8 bom...).
 
 So on that ppc something goes wrong. From the feedback I had, it looks 
 exactly as if the encoding of the constant was changed.  So what I was 
 looking for was a way to 100% prevent that. Something that tells the 
 compiler: Whatever encoding you find or expect or whatever encoding the 
 output should be, do not touch strings. just take them byte by byte.
 Does that exist?

As mentioned in my previous reply: if you don't use the codepage directive, 
then the compiler won't change anything. If you assign the string constant to a 
unicodestring or pass it as such a parameter, it will of course still be 
converted from ansi to utf-16 at run time.


Jonas


Re: [fpc-devel] utf8 in 2.6.0

2013-01-01 Thread Martin Schreiber
On Tuesday 01 January 2013 15:24:05 Jonas Maebe wrote:
 On 01 Jan 2013, at 15:14, Martin wrote:
  On my hardware it is normally all fine, but fails if I add the $codepage.
  I could spent a lot of work boiling that down to a sample. But given that
  I couldn't even find the docs what I really should expect,

 Without a {$codepage xxx} directive, string constants containing
 characters > #127 remain exactly as they appear in the source code.

 With a {$codepage xxx} directive, string constants containing
 characters > #127 are converted into unicodestrings during the parsing
 (according to the
 specified code page), and converted back into ansistrings (using the ansi
 code page of that particular program run) at run time if they are assigned
 to ansistring/shortstrings or passed to routines expecting such parameters.

 Note that the above is for 2.6.x (as the subject mentions).

How does it work in trunk?

Martin


Re: [fpc-devel] utf8 in 2.6.0

2013-01-01 Thread Jonas Maebe

On 01 Jan 2013, at 16:31, Martin Schreiber wrote:

 On Tuesday 01 January 2013 15:24:05 Jonas Maebe wrote:
 Without a {$codepage xxx} directive, string constants containing
 characters > #127 remain exactly as they appear in the source code.
 
 With a {$codepage xxx} directive, string constants containing
 characters > #127 are converted into unicodestrings during the parsing
 (according to the
 specified code page), and converted back into ansistrings (using the ansi
 code page of that particular program run) at run time if they are assigned
 to ansistring/shortstrings or passed to routines expecting such parameters.
 
 Note that the above is for 2.6.x (as the subject mentions).
 
 How does it work in trunk?

The strings are stored as ansistrings with the appropriate code page.


Jonas


Re: [fpc-devel] utf8 in 2.6.0

2013-01-01 Thread Martin Schreiber
On Tuesday 01 January 2013 16:44:28 Jonas Maebe wrote:
 On 01 Jan 2013, at 16:31, Martin Schreiber wrote:
  On Tuesday 01 January 2013 15:24:05 Jonas Maebe wrote:
  Without a {$codepage xxx} directive, string constants containing
  characters > #127 remain exactly as they appear in the source code.
 
  With a {$codepage xxx} directive, string constants containing
  characters > #127 are converted into unicodestrings during the parsing
  (according
  to the specified code page), and converted back into ansistrings (using
  the ansi code page of that particular program run) at run time if they
  are assigned to ansistring/shortstrings or passed to routines expecting
  such parameters.
 
  Note that the above is for 2.6.x (as the subject mentions).
 
  How does it work in trunk?

 The strings are stored as ansistrings with the appropriate code page.

So 

UnicodeStringVariable:= 'abcdäüö';

always will call a conversion function?

Martin


Re: [fpc-devel] utf8 in 2.6.0

2013-01-01 Thread Martin Schreiber
On Tuesday 01 January 2013 16:54:21 Martin Schreiber wrote:
 So
 
 UnicodeStringVariable:= 'abcdäüö';
 
 always will call a conversion function?

And how does this work

{$codepage 8859-1}
...
 UnicodeStringVar:= 'abcd'#228#246#252#1092#1080#1089#1074;


?
Martin




Re: [fpc-devel] utf8 in 2.6.0

2013-01-01 Thread Jonas Maebe

On 01 Jan 2013, at 16:54, Martin Schreiber wrote:

 On Tuesday 01 January 2013 16:44:28 Jonas Maebe wrote:
 The strings are stored as ansistrings with the appropriate code page.
 
 So 
 
 UnicodeStringVariable:= 'abcdäüö';
 
 always will call a conversion function?

The assignment node will insert a type conversion of the right hand side to 
unicodestring.

In 2.6.x, the right hand side will already be a unicodestring and nothing will 
happen.

In 2.7.x, the type conversion node will be simplified into a unicodestring 
constant because it is a typeconversion of a constant (just like int64(1) is 
also handled at compile time).

 And how works
 
 {$codepage 8859-1}
 ...
 UnicodeStringVar:= 'abcd'#228#246#252#1092#1080#1089#1074;
 
 
 ?

That string contains codepoints > #255 and hence is a unicodestring rather than 
a single byte string. No conversion at either compile or run time happens, and 
the codepage directive has no influence.
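
A sketch for inspecting what ends up in such a string. The program name is made up for illustration, and the code page name is copied from the question as written, on the assumption that the compiler accepts it.

```pascal
program widelit;
{$mode objfpc}{$H+}
{$codepage 8859-1}   // as in the question; said above to have no influence here

var
  u: unicodestring;
  i: Integer;
begin
  // the literal from the question: ASCII text, three codes below 256
  // and four codes above 255
  u := 'abcd'#228#246#252#1092#1080#1089#1074;
  // print one line per UTF-16 code unit
  for i := 1 to length(u) do
    writeln(i, ': $', HexStr(ord(u[i]), 4));
end.
```

If the literal is taken as a unicodestring constant as described, each #nnn should appear as one 16-bit code unit with exactly that value, for example $00E4 for #228 and $0444 for #1092.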


Jonas


Re: [fpc-devel] utf8 in 2.6.0

2013-01-01 Thread Martin Schreiber
Thanks, another question, or is the behavior already documented?

  UnicodeStringVar:= 'abcd'#228#246#252#1092#1080#1089#1074;
 
  
  ?

 That string contains codepoints > #255 and hence is a unicodestring rather
 than a single byte string. No conversion at either compile or run time
 happens, and the codepage directive has no influence.

 {$codepage utf8}
 ...
UnicodeStringVar:= 'abcd'#228#252#246;

Does it store 'abcdäüö' in trunk?

Martin


Re: [fpc-devel] utf8 in 2.6.0

2013-01-01 Thread Jonas Maebe

On 01 Jan 2013, at 17:51, Martin Schreiber wrote:

 Thanks, another question, or is the behavior already documented?

What you are asking about has always been the same. I don't know to what extent 
it is documented.

 {$codepage utf8}
 ...
 UnicodeStringVar:= 'abcd'#228#252#246;
 
 Does it store 'abcdäüö' in trunk?

I have no idea how anything I wrote suggests that it wouldn't. As mentioned, 
the only difference is that string constants containing characters > #127 are no 
longer always converted to unicodestring constants at compile time. They are 
ansistring constants with the appropriate code page by default, and hence are 
only converted (at compile time, since they are constants) to a different string 
type/code page when required.


Jonas



Re: [fpc-devel] utf8 in 2.6.0

2013-01-01 Thread Martin Schreiber
On Tuesday 01 January 2013 18:00:59 Jonas Maebe wrote:

 I have no idea how anything I wrote suggests that it wouldn't. As
 mentioned, the only difference is that string constants containing
 characters > #127 are no longer always converted to unicodestring constants
 at compile time. They are ansistring constants with the appropriate code
 page by default, and hence are only converted (at compile, since they are
 constants) to a different string type/code page when required.

So #n or #nn or #nnn or #nnnn or #nnnnn always means a Unicode codepoint and 
will be converted at compile time to an 8-bit character sequence depending on 
{$codepage} and stored in a cpstrnew with the codepage of {$codepage} if 
assigned to a cpstrnew variable?
And if the constant is assigned to a UnicodeString variable, the Unicode 
codepoints are converted and stored as a UTF-16 16-bit character sequence at 
compile time, regardless of whether they contain codepoints > 255?
Does somebody have a link to Embarcadero documentation about the matter? I assume 
FPC trunk does exactly the same as Delphi XE3 with strings?

Thanks for your patience, Martin


Re: [fpc-devel] utf8 in 2.6.0

2013-01-01 Thread Hans-Peter Diettrich

Jonas Maebe wrote:


And how works  {$codepage 8859-1} ... UnicodeStringVar:=
'abcd'#228#246#252#1092#1080#1089#1074;

 ?


That string contains codepoints > #255 and hence is a unicodestring
rather than a single byte string. No conversion at either compile or
run time happens, and the codepage directive has no influence.


Does this really mean that, when the codes > #255 are removed, the 
remaining codes have a different meaning?


DoDi



Re: [fpc-devel] utf8 in 2.6.0

2012-12-18 Thread Anton Kavalenka

On 15.12.2012 21:35, Martin wrote:
I am trying to figure out how to do that, or what I do wrong. I found 
a page about $codepage, but it did not help 
http://wiki.freepascal.org/LCL_Unicode_Support

I didn't find the FPC-specific page, if it exists (I suspect it does).


I am calling a function, function Foo(A: string), with {$mode objfpc}{$H+}.
I call it with a constant that contains a German umlaut. Checked 
with a hex editor, the constant in the source file is utf8.


- If I save the source (in utf8) without a utf8 BOM, then it works 
fine on Windows.
- If I add a BOM, then the string received by the function appears to 
be ascii (checked memory dump in debugger: oe becomes d6).

- If I add {$codepage utf8} it also becomes ascii.

If I do *not* add that, it seems something goes wrong with the encoding 
on a PowerPC Mac. Unfortunately this is someone else's PC, and I have 
no more info.

If I add it things also go wrong, only differently. Again no more info.

---

I know the provided info, is very little. If there is anything obvious 
then tell me.


Thanks


Probably this is due to a significant change in the FPC 2.7 RTL.

The *String* type carries its encoding internally;
under Windows it is ANSI by default.

Try writing a simple application that concatenates (s:=a+b) two strings 
with umlauted letters.

The resulting string loses the umlauts under Windows.
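
A minimal sketch of such a test follows. It assumes the source file is saved as UTF-8; the program name and the DumpBytes helper are hypothetical, and the result will depend on the compiler version and platform.

```pascal
program concatumlaut;
{$mode objfpc}{$H+}

// Hypothetical helper: print every byte of S as two hex digits.
procedure DumpBytes(const S: string);
var
  i: Integer;
begin
  for i := 1 to Length(S) do
    Write(HexStr(LongInt(Ord(S[i])), 2), ' ');
  WriteLn;
end;

var
  a, b, s: string;
begin
  a := 'Ä';
  b := 'Ö';
  s := a + b;      // the concatenation under test
  DumpBytes(a);
  DumpBytes(b);
  DumpBytes(s);
end.
```

If the concatenation is lossless, the third line is simply the first two lines joined; any other byte sequence means a conversion happened along the way.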

The only thing that helps at the RTL level is:
{$ifdef FPC}
SetMultiByteConversionCodePage(CP_UTF8);
{$endif}

This gives similar behaviour for RTL functions on both Windows and 
UNIX, but completely breaks file I/O.
You won't be able to open a file whose name translates to 
more than one byte per symbol,
because RTL I/O is ANSI-specific under Windows.

Another approach: use *UnicodeString*. Forget the *string* type.

regards,
Anton


Re: [fpc-devel] utf8 in 2.6.0

2012-12-18 Thread Sven Barth

On 18.12.2012 14:47, Anton Kavalenka wrote:


Probably this is due to a significant change in the FPC 2.7 RTL.

The *String* type carries its encoding internally;
under Windows it is ANSI by default.


Martin's question is related to 2.6.0 (see his mail's subject) not 2.7.1.

Regards,
Sven


[fpc-devel] utf8 in 2.6.0

2012-12-15 Thread Martin
I am trying to figure out how to do that, or what I am doing wrong. I found 
a page about $codepage, but it did not help: 
http://wiki.freepascal.org/LCL_Unicode_Support

I didn't find the FPC-specific page, if it exists (I suspect it does).


I am calling a function, function Foo(A:string), under {$mode objfpc}{$H+}.
I call it with a constant that contains a German umlaut. Checked with 
a hex editor, the constant in the source file is UTF-8.


- If I save the source (in UTF-8) without a UTF-8 BOM, then it works fine 
on Windows.
- If I add a BOM, then the string received by the function appears to be 
ascii (checked memory dump in debugger: oe becomes d6).

- If I add {$codepage utf8} it also becomes ascii.
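
The three cases above can also be compared with a small byte dump instead of a debugger. This is only a sketch: the program name is made up, and Foo here is a stand-in procedure that prints the bytes it receives. For reference, U+00D6 (Ö) is C3 96 in UTF-8 and D6 in ISO 8859-1 and Windows-1252.

```pascal
program dumpconst;
{$mode objfpc}{$H+}

// Stand-in for the Foo mentioned above: prints each received byte in hex.
procedure Foo(A: string);
var
  i: Integer;
begin
  for i := 1 to Length(A) do
    Write(HexStr(LongInt(Ord(A[i])), 2), ' ');
  WriteLn;
end;

begin
  // Save this file as UTF-8, once with and once without a BOM,
  // and once more with {$codepage utf8} added, then compare the output.
  Foo('Ö');
end.
```

A dump of C3 96 means the UTF-8 bytes were passed through unchanged; a dump of D6 means the constant was converted to a single-byte code page.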

If I do *not* add that, it seems something goes wrong with the encoding 
on a PowerPC Mac. Unfortunately this is someone else's PC, and I have no 
more info.

If I add it, things also go wrong, only differently. Again, no more info.

---

I know the provided info is very little. If there is anything obvious, 
then tell me.


Thanks


Re: [fpc-devel] utf8 in 2.6.0

2012-12-15 Thread Sven Barth

On 15.12.2012 19:35, Martin wrote:

I am trying to figure out how to do that, or what I am doing wrong. I found a
page about $codepage, but it did not help:
http://wiki.freepascal.org/LCL_Unicode_Support
I didn't find the FPC-specific page, if it exists (I suspect it does).


The page is this: http://wiki.freepascal.org/FPC_Unicode_support though 
it's rather outdated... :(


Otherwise I cannot help you, but it definitely has something to do with 
the different code page handling in 2.6.0 compared to 2.7.1.


Regards,
Sven