Re: [fpc-devel] Trying to understand the wiki-Page FPC Unicode support

2014-11-27 Thread Marco van de Voort
In our previous episode, Hans-Peter Diettrich said:
  concatenated without data loss and that the result is then converted to
  the target string's encoding (except in case the target is
  RawByteString). How that is implemented exactly is undefined; again in
  the meaning of undefined, not in the meaning of undefined when
  defined as meaning X.
 
 In this case the implementation is compiler specific, somewhat 
 different from undefined (in a RawByteString):
 CP_NONE: this value indicates that no code page information has been 
 associated with the string data. The result of any explicit or implicit 
 operation that converts this data to another code page is undefined.
 
 IMO the result is well defined: it's the string with the encoding of 
 that other codepage. An undefined result, as I understand it, would 
 mean the result can be anything, unrelated to the function input.

This is usually called implementation defined. But implementation defined
implies it will remain the same in every iteration of the compiler (usually
documented).  If that is not wanted/possible, then it is considered
undefined.

So even if a value happens to be defined in one version of the compiler, it
doesn't automatically make it implementation defined. It needs to be a
documented choice for that.
___
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel


Re: [fpc-devel] ThousandSeparator

2014-11-27 Thread Frederic Da Vitoria
2014-11-26 16:54 GMT+01:00 Hans-Peter Diettrich drdiettri...@aol.com:

 2) Formatted numbers, as entered by the user (maybe by copy/paste from
 other applications), can have various encodings. Before a conversion into
 binary values I'd remove all unexpected characters, except for the last
 (rightmost) '.' or ',', which then becomes the decimal separator as
 expected by the decoding function (RTL provided).


You mean that the first string to be converted to binary would
automatically set the decimal separator? That would seem dangerous to me.
What if the first string to be converted contained something like 11,000,
does this mean 11000 with thousand separator = comma (which would be true
in at least USA), or 11 with decimal separator = comma (which would be
true at least in France)? I can't think of any way to choose automatically.
AFAICS, the code needs either to use the system settings or to be told
explicitly by the developer. Even relying on the system settings may not be
enough, because one may need to import data formatted with different
national settings from the system's settings.
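
The ambiguity described above can be made concrete with a small sketch. It is written in Python purely for illustration; `parse_rightmost` and `parse_explicit` are hypothetical helper names, not RTL routines. The first function applies the quoted "rightmost '.' or ',' becomes the decimal separator" heuristic, the second is told both separators explicitly by the caller. Sign handling is left out to keep it short.

```python
DIGITS = "0123456789"


def parse_rightmost(text):
    """Heuristic from the quoted proposal (hypothetical helper):
    keep the digits, turn the rightmost '.' or ',' into the decimal
    point, and drop every other character."""
    last = max(text.rfind("."), text.rfind(","))  # -1 if neither occurs
    out = []
    for i, ch in enumerate(text):
        if ch in DIGITS:
            out.append(ch)
        elif i == last:
            out.append(".")
        # anything else (spaces, other separators) is dropped
    return float("".join(out))


def parse_explicit(text, decimal_sep, thousand_sep):
    """The caller states which separator is which (hypothetical helper)."""
    cleaned = text.replace(thousand_sep, "").replace(decimal_sep, ".")
    return float(cleaned)


# The same input gives different values depending on who decides.
us_reading = parse_explicit("11,000", decimal_sep=".", thousand_sep=",")
fr_reading = parse_explicit("11,000", decimal_sep=",", thousand_sep=" ")
guessed = parse_rightmost("11,000")
```

Under these assumptions the heuristic always sides with the "comma is the decimal separator" reading for `11,000`, which is exactly the case where a US-formatted value would be silently misread.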

-- 
Frederic Da Vitoria
(davitof)

Member of April - "promoting and defending free software" -
http://www.april.org


Re: [fpc-devel] Trying to understand the wiki-Page FPC Unicode support

2014-11-27 Thread Michael Schnell

On 11/26/2014 05:25 PM, Sven Barth wrote:




 So seemingly you could do MyStringType   = type 
AnsiString(CP_UTF16), and seemingly the size information is set 
according to this.


No, you can't, because the RTL does not handle that. For AnsiString 
the element size is *always* 1. It's hardcoded. AFAIK Delphi even does 
a compile error if you use CP_UTF16.




Thanks for the clarification.

I now understand that the Element Size field in the String header is 
quite dummy, as under the hood there are two completely separate 
concepts for one-byte-Strings and 2-Byte Strings and none for other 
Element sizes.


This to me is not obvious at all, as the language syntax and the String 
header data structure suggest a more universal paradigm for multiple 
string type brands, that each have an element-size and 
code-ID-number setting, handled by a common infrastructure.


The universal paradigm would allow for extensions (e.g. UTF-32, 
multiple 16 Bit Code pages, an additional fully dynamic String type, 
n-byte un-encoded string types), as I described in the Wiki page.


The dual mode concept of course does not provide such extensibility, 
and so I stop thinking about this (and bothering the community), and am 
happy that it just works as it is.


Thanks again,
-Michael


Re: [fpc-devel] Trying to understand the wiki-Page FPC Unicode support

2014-11-27 Thread Michael Schnell

On 11/26/2014 05:37 PM, Jonas Maebe wrote:

> invalid (in the meaning of undefined) in both FPC and Delphi.

Sorry (I am not a native speaker). But to me undefined and invalid 
have completely different meanings (in this context). An invalid use 
of the language would result in an error (compiler or runtime), while an 
undefined language construct would result in something that might work 
in some way, but there is no guarantee that the outcome is always the 
same (e.g. in another instance or another compiler version).



> CP_UTF16 and CP_UTF16BE can be returned by StringCodePage() when 
> called on a unicodestring, and that's it.


I now do understand (see my reply to Sven).
-Michael




Re: [fpc-devel] Trying to understand the wiki-Page FPC Unicode support

2014-11-27 Thread Michael Schnell

On 11/26/2014 09:30 PM, Hans-Peter Diettrich wrote:
>> So seemingly you could do MyStringType = type 
>> AnsiString(CP_UTF16), and seemingly the size information is set 
>> according to this.
>
> Not in Delphi XE.

Thanks for the clarification.

I did have some hope that fpc would be (or could be extended to be) 
better than Delphi in that respect.


I now do see the reason that resulted in the (to me rather queer) Naming 
 AnsiString for the code page aware string type. I erroneously 
supposed the syntax that finally would be used would be something like 
MyStringType   = type String(CP_UTF16), with no restriction to 
ANSI, but the CP_ constant defining as well a code page as an 
Element size, as suggested by the language syntax while working with 
string using auto-conversion, and by the structure of the string content 
header.


There still might be room for (fully compatible) improvement (as I 
described in the Wiki), but it's even more difficult to do than I supposed.


Thanks again,
-Michael


Re: [fpc-devel] Trying to understand the wiki-Page FPC Unicode support

2014-11-27 Thread Michael Schnell

On 11/26/2014 07:13 PM, Hans-Peter Diettrich wrote:


> Not all codepages have a fixed number of bytes per character.
> The string preamble contains the *element size* (1 for AnsiString), 
> just like with every dynamic array.

Sorry for sloppy wording. Of course I did mean element size 
(Character here obviously is not a printable item).


-Michael


Re: [fpc-devel] Trying to understand the wiki-Page FPC Unicode support

2014-11-27 Thread Jonas Maebe
On 26/11/14 23:41, Hans-Peter Diettrich wrote:
 In this case the implementation is compiler specific, somewhat
 different from undefined (in a RawByteString):
 CP_NONE: this value indicates that no code page information has been
 associated with the string data. The result of any explicit or implicit
 operation that converts this data to another code page is undefined.
 
 IMO the result is well defined: it's the string with the encoding of
 that other codepage.

Unless you actually tested this on all platforms and noted that is the
case, you cannot state this. And if you would actually test it, you
would discover that it is wrong
(http://bugs.freepascal.org/view.php?id=22501#c61238 ).

As mentioned in a previous discussion: don't use IMO (in my opinion)
when talking about testable facts. A testable fact is either true or
false, opinions do not enter the picture.

 An undefined result, as I understand it, would
 mean the result can be anything, unrelated to the function input.

Which is 100% correct.

 IMO a better wording should be found, that does not cause the current
 obvious confusion of some readers.

The confusion only occurs for readers that do not believe what is written.


Jonas


Re: [fpc-devel] Trying to understand the wiki-Page FPC Unicode support

2014-11-27 Thread Jonas Maebe
On 26/11/14 21:25, Hans-Peter Diettrich wrote:
 Jonas Maebe wrote:
 On 26/11/14 17:41, Tomas Hajny wrote:
 BTW, in this context - can users choose UTF16BE on little endian
 platforms (and vice versa)?

 No, because we do not have any routines that allow a user to set/change
 the codepage of a unicodestring (either at run time or at compile time).
 
 What about file I/O?
 It should be possible to read (and write) files of either endianness.

Standard I/O only supports single byte code pages (which should be
documented). Reading a unicodestring from a text file converts from the
single byte code page to the native-endianness UTF-16 format.


Jonas



Re: [fpc-devel] ThousandSeparator

2014-11-27 Thread Mattias Gaertner
On Thu, 27 Nov 2014 11:02:06 +0100
Sven Barth pascaldra...@googlemail.com wrote:

[...]
 Yes, there's a message for that and yes on non-Windows OSes this might be
 problematic...

AFAIK on Unix systems you can start each program with a different
locale. The system wide locale is just the default on start. Therefore
changing the system wide settings should not affect a running
application - aka that would be a bug.

Mattias


Re: [fpc-devel] ThousandSeparator

2014-11-27 Thread Mark Morgan Lloyd

Mattias Gaertner wrote:

On Thu, 27 Nov 2014 11:02:06 +0100
Sven Barth pascaldra...@googlemail.com wrote:


[...]
Yes, there's a message for that and yes on non-Windows OSes this might be
problematic...


AFAIK on Unix systems you can start each program with a different
locale. The system wide locale is just the default on start. Therefore
changing the system wide settings should not affect a running
application - aka that would be a bug.


You can also have completely different desktops etc. (Gnome on one 
display, KDE on another, a text session on a third), which would be 
unlikely to use the same notification mechanism.


--
Mark Morgan Lloyd
markMLl .AT. telemetry.co .DOT. uk

[Opinions above are the author's, not those of his employers or colleagues]


Re: [fpc-devel] ThousandSeparator

2014-11-27 Thread Mattias Gaertner
On Tue, 25 Nov 2014 20:12:31 +0100 (CET)
mar...@stack.nl (Marco van de Voort) wrote:

 In our previous episode, Michael Van Canneyt said:
   The ThousandSeparator is char and supports only 1 byte characters.
   For example French and Russian need more.
   Are there any plans to extend it?
  
  Plans: yes. Time: no.
  
  Maybe a widechar is sufficient ? 
 
 If you start changing the encoding of formatsettings (to something else than
 whatever string is), you get a lot of conversions. Even to just print a
 number you get thousandseparator (multiple times), decimal separator etc.
 
 I think biting the bullet and making it a string is better long term. If we
 ever target a different string it will also break less code.

+1

Linux uses strings too.
With LC_NUMERIC=ru_RU.utf8 the thousand separator is the two byte
nbsp/160. French for some reason uses simple space.

BTW, the clocale unit, which sets the formatsettings, simply takes the
first byte. It would be good if it would check if the character is
ASCII. If not and codepage is utf8 then use the default. 
Should I create a patch for 2.8.0?

Hopefully in 2.8.1+ the type has changed and clocale can use the
correct settings.
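
To illustrate why taking only the first byte goes wrong, here is a minimal sketch (Python, for illustration only; it is not the clocale code). It shows that the no-break space, code point 160, occupies two bytes in UTF-8 and that the first byte alone is not a valid character. `pick_separator` is a hypothetical helper implementing the "use it only if it is ASCII, else fall back to a default" check suggested above; the default of `b","` is an assumption made for the example.

```python
NBSP = "\u00a0"                      # no-break space, code point 160

utf8_bytes = NBSP.encode("utf-8")    # the two bytes 0xC2 0xA0
first_byte = utf8_bytes[:1]          # what "simply take the first byte" keeps


def is_valid_utf8(raw):
    """True if the bytes decode as UTF-8 on their own."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def pick_separator(raw, default=b","):
    """Hypothetical check: accept the locale's separator only when it is a
    single ASCII byte; otherwise use the (assumed) default."""
    if len(raw) == 1 and raw[0] < 0x80:
        return raw
    return default
```

With these definitions, `first_byte` is the lone lead byte `0xC2`, which is not a complete UTF-8 sequence, while `pick_separator(b" ")` keeps the plain space and `pick_separator(utf8_bytes)` falls back to the default.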

Mattias


Re: [fpc-devel] Regionalisation (Was ThousandSeparator)

2014-11-27 Thread Ralf Quint

On 11/27/2014 9:09 AM, peter green wrote:

Hans-Peter Diettrich wrote:


But back to the original problem: I managed to create another user, 
whose number format settings match the expectations of the Ez-Builder, 
while using my German keyboard. For Linux users this may sound like 
an easy job, but adding and configuring users in Win8 turned out as 
kind of a nightmare :-(
Win8 requires an eMail address for every new user, but entering a 
fake address only allows to create the account, without any chance to 
log in subsequently. Probably the requested password has to be 
established by mail, at least I found no way to disable or specify or 
reset the password for the new account.


From what I've heard there is a way to create a local only user in 
win8 but it's fairly hidden.

Yes, you can run Windows 8(.1) with a local user only

Ralf
(typing this on a Windows 8.1/64bit laptop with a local user account (only))



Re: [fpc-devel] ThousandSeparator

2014-11-27 Thread Hans-Peter Diettrich

Sven Barth wrote:

At my old company our Delphi application handled runtime changes to 
these settings rather well. For display the normal XToY (e.g. DateToStr) 
functions are used which use the DefaultFormatSettings which are updated 
automatically (the VCL's message loop triggers a repaint when format 
settings were changed in the system).


A repaint by itself doesn't change the strings. How do the new strings 
come into all the edit boxes, of all open forms?


Similarly, when the user changes the system language, can he expect that 
every running application updates itself, with changed menus etc., up to 
eventually open help viewers? What if the program is not prepared for a 
different language, because e.g. a tax assistant is bound to a specific 
country?


DoDi



Re: [fpc-devel] Trying to understand the wiki-Page FPC Unicode support

2014-11-27 Thread Hans-Peter Diettrich

Michael Schnell wrote:

I now understand that the Element Size field in the String header is 
quite dummy, as under the hood there are two completely separate 
concepts for one-byte-Strings and 2-Byte Strings and none for other 
Element sizes.


After a code review I realized that the element size field is specific 
to dynamic strings, not present in dynamic arrays. Since the element 
size is bound to the string type, it could be omitted in the FPC 
implementation. [With little win, when the record alignment is preserved]


This to me is not obvious at all, as the language syntax and the String 
header data structure suggest a more universal paradigm for multiple 
string type brands, that each have an element-size and 
code-ID-number setting, handled by a common infrastructure.


This may have been envisaged by the Delphi architects, but was not 
continued later.


The universal paradigm would allow for extensions (e.g. UTF-32, 
multiple 16 Bit Code pages, an additional fully dynamic String type, 
n-byte un-encoded string types), as I described in the Wiki page.


Even if feasible, such arbitrary string storage can dramatically 
increase the number of implicit string conversions. An *efficient* 
implementation would be based on a single program-wide string 
representation, with different encodings being handled only in an 
exchange with external data sources.


That standard encoding may be Ansi or Unicode; even Delphi allows for 
both models, where Ansi again suggests the use of one specific codepage 
(CP_ACP) for best performance.



<Cassandra>
After all I have the impression that the known RawByteString flaws will 
never be fixed in Delphi, in order to encourage the users to take the 
step to UnicodeString. Now the question is whether these flaws are fixed 
in FPC, or whether Lazarus will become the first project that definitely 
requires a complete move to UnicodeString, for reliable operation.

For best support of non-UTF-16 platforms I'd suggest to fix the flaws...
</Cassandra>

DoDi



Re: [fpc-devel] ThousandSeparator

2014-11-27 Thread Hans-Peter Diettrich

Frederic Da Vitoria wrote:
2014-11-26 16:54 GMT+01:00 Hans-Peter Diettrich drdiettri...@aol.com:


2) Formatted numbers, as entered by the user (maybe by copy/paste
from other applications), can have various encodings. Before a
conversion into binary values I'd remove all unexpected characters,
except for the last (rightmost) '.' or ',', which then becomes the
decimal separator as expected by the decoding function (RTL provided).


You mean that the first string to be converted to binary would 
automatically set the decimal separator?


No, my code would make no assumption about the format of strings edited 
by the user.


That would seem dangerous to 
me. What if the first string to be converted contained something like 
11,000, does this mean 11000 with thousand separator = comma (which 
would be true in at least USA), or 11 with decimal separator = comma 
(which would be true at least in France)? I can't think of any way to
choose automatically.


Okay, that would require more knowledge about the value kept in a 
specific input field, for range checks or the like. As long as thousands 
separators occur in the string, different from '.' or ',', they are 
quite easy to identify.


AFAICS, the code needs either to use the system 
settings or to be told explicitly by the developer. Even relying on the 
system settings may not be enough, because one may need to import data 
formatted with different national settings from the system's settings.


Right. When e.g. a CAD program is fed with sizes from an external data 
sheet, it cannot be expected that the figures in that file change 
together with the system language, and are converted between inch and 
meter, temperatures between F and C ;-)


So it looks to me as a stupid idea, when the user changes such system 
settings *while* such a program is running. Furthermore the use of 
national formatting conventions for the exchange of values across 
applications looks to me like another stupid idea. Would somebody expect 
or even like it, when e.g. constant declarations are converted when 
copied into a Lazarus editor, and the compiler would require that all 
constants in source code conform to the current settings?


As mentioned in other contributions, the number formatting seems to be a 
Windows specific problem. How to deal with imported numbers on other 
platforms, with arbitrary settings per application?



After all a program could, when notified of such changes, ask the user 
whether to continue or restart, or force a restart. Restart should be 
safe, but when the user decides to continue, he must be aware of 
possible problems. When the restart takes considerable time, the user 
may learn that his behaviour is not very clever ;-)


DoDi



Re: [fpc-devel] Trying to understand the wiki-Page FPC Unicode support

2014-11-27 Thread Hans-Peter Diettrich

Michael Schnell wrote:

On 11/26/2014 07:13 PM, Hans-Peter Diettrich wrote:


>> Not all codepages have a fixed number of bytes per character.
>> The string preamble contains the *element size* (1 for AnsiString), 
>> just like with every dynamic array.
> Sorry for sloppy wording. Of course I did mean element size 
> (Character here obviously is not a printable item).


I'd restrict the use of character to physical Char types, just to 
avoid any misinterpretation.


Printable items (glyphs) are independent from the storage format. 
Ligatures or umlauts can consist of multiple codepoints, and several 
Unicode codepoints are not even printable.


A single printable character, as selectable by a single cursor step, 
can consist of multiple codepoints, even (or just) in Unicode.
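
A minimal sketch of the distinction drawn here, in Python for illustration only. It compares one printable item, the umlaut "ä", in its precomposed and decomposed forms, and counts codepoints and UTF-16 code units for each. `utf16_units` is a hypothetical helper name.

```python
import unicodedata


def utf16_units(text):
    """Number of 16-bit code units needed to store text as UTF-16."""
    return len(text.encode("utf-16-le")) // 2


composed = "\u00e4"                                    # 'ä' as one codepoint
decomposed = unicodedata.normalize("NFD", composed)    # 'a' + U+0308

codepoints_composed = len(composed)
codepoints_decomposed = len(decomposed)

# A codepoint outside the BMP: one codepoint, but a surrogate pair in UTF-16.
clef = "\U0001d11e"                                    # MUSICAL SYMBOL G CLEF
clef_units = utf16_units(clef)
```

Both `composed` and `decomposed` show the same single glyph, yet the first is one codepoint and the second is two; the clef is one codepoint but two UTF-16 code units. None of the three counts (storage elements, codepoints, printable items) can be derived from another without looking at the data.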



That's why I'd expect that the FPC documentation includes a glossary and 
definition of the terms, which should be used in the documentation and 
discussions.


DoDi



Re: [fpc-devel] Trying to understand the wiki-Page FPC Unicode support

2014-11-27 Thread Hans-Peter Diettrich

Jonas Maebe wrote:

On 26/11/14 23:41, Hans-Peter Diettrich wrote:

In this case the implementation is compiler specific, somewhat
different from undefined (in a RawByteString):
CP_NONE: this value indicates that no code page information has been
associated with the string data. The result of any explicit or implicit
operation that converts this data to another code page is undefined.

IMO the result is well defined: it's the string with the encoding of
that other codepage.


Unless you actually tested this on all platforms and noted that is the
case, you cannot state this. And if you would actually test it, you
would discover that it is wrong
(http://bugs.freepascal.org/view.php?id=22501#c61238 ).


Bugs obviously violate some specification/definition, else it's not a 
bug, it's a feature ;-)



As mentioned in a previous discussion: don't use IMO (in my opinion)
when talking about testable facts. A testable fact is either true or
false, opinions do not enter the picture.


We're just talking about interpretations, not facts.



An undefined result, as I understand it, would
mean the result can be anything, unrelated to the function input.


Which is 100% correct.


Do you see any use for such function definitions, except in random 
generators?



IMO a better wording should be found, that does not cause the current
obvious confusion of some readers.


The confusion only occurs for readers that do not believe what is written.


Such statements come only from writers that do not believe that their 
words can be understood in various ways ;-)


DoDi



Re: [fpc-devel] Regionalisation (Was ThousandSeparator)

2014-11-27 Thread Hans-Peter Diettrich

Michael Thompson wrote:

I hear you, but this issue is so much wider than separators.  I know one 
software package that will only successfully export data to excel if the 
system regional is one of the English (xxx) variations (Australian 
guaranteed to work, not really played with the rest...).  In this case, 
the client (in Denmark) has one PC in a corner, set to Australian 
settings, just for exports...


This may be a relic from the time when Microsoft found it a good idea 
to nationalize VBA (VB, Word, Excel...). I appreciated that insofar 
as no English macro virus could become active in my German Word (with 
only German keywords). The same language barrier may prevent proper data 
export, maybe starting with slightly different keyword spellings like 
for color/colour.


Similar problems exist(ed) in RTF export, so that MS had to ship another 
WinHelp compiler (HC) for every new WinWord version, that worked around 
the new errors in the RTF sources exported from Word, even after VBA was 
reverted to unique English-only keywords and function names.



As an Australian developer, this is just embarrassing...


I never felt a need or reason for considering Microsoft products as 
anything but buggy toys, hardly usable outside the USA :-(



To come back to the current discussions, the introduction of Unicode (as 
UCS-2 and UTF-16) was a similar (typically American) mistake, totally 
ignoring e.g. any Chinese character set (in favor of Klingon?). Apple, 
as another US company, invented the decomposed Unicode filenames - for 
lack of oversight, or to establish artificial platform barriers?


The step from strictly national Ansi applications to Unicode is a very 
tiny one, compared to the leaps that have to be taken afterwards, in 
order to make the program really work in foreign countries. I wonder how 
e.g. Belgian, Canadian or Swiss software has been written in such 
multi-lingual countries before, and how it is written nowadays.


DoDi



Re: [fpc-devel] Trying to understand the wiki-Page FPC Unicode support

2014-11-27 Thread Hans-Peter Diettrich

Michael Schnell wrote:

On 11/26/2014 06:37 PM, Hans-Peter Diettrich wrote:


>> An AnsiString consists of AnsiChar's. The *meaning* of these char's 
>> (bytes) depends on their encoding, regardless of whether the used 
>> encoding is or is not stored with the string.
> I understand that the implementation (in Delphi) seems to be driven more 
> by the Wording (ANSI) than by the logical paradigm the language syntax 
> suggests. The language syntax and the string header fields suggest that 
> both the element-size as the code-ID-number need to be adhered to (be it 
> statically or dynamically - depending on the usage instance). E.g. there 
> are (at least) two Code pages for UTF-16 (LE and BE), that would 
> be worth supporting.


You are confusing codepages and encodings :-(

UTF-7, UTF-8, UTF-16 and UTF-16BE describe different representations of 
the same values (Unicode codepoints). And I agree, all commonly used 
encodings should be implemented, at least for data import/export.
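
The difference can be sketched as follows (Python, for illustration only; the sample text is an arbitrary choice). The UTF encodings store the same codepoints as different byte sequences and all round-trip losslessly, whereas Ansi code pages assign different characters to the same byte value.

```python
text = "A\u00e4\u20ac"   # 'A', 'ä', '€': three Unicode codepoints

# Different representations of the same codepoints.
utf_names = ["utf-7", "utf-8", "utf-16-le", "utf-16-be"]
encoded = {name: text.encode(name) for name in utf_names}

# Every representation decodes back to the identical codepoints.
round_trips = {name: encoded[name].decode(name) == text for name in utf_names}

# Code pages are a different matter: one byte value, different characters.
byte = b"\xe4"
as_cp1252 = byte.decode("cp1252")   # Western European: 'ä'
as_cp1251 = byte.decode("cp1251")   # Cyrillic: 'д'
```

Here `encoded["utf-16-le"]` and `encoded["utf-16-be"]` differ only in the byte order of each 16-bit unit, while `as_cp1252` and `as_cp1251` are genuinely different characters read from the same byte.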



>> It's essential to distinguish between low-level (physical) AnsiChar 
>> values, and *logical* characters possibly consisting of multiple 
>> AnsiChars.
> I now do see that the implementation is done following this concept. But 
> the language syntax and the string header field suggest a more versatile 
> paradigm, providing a universal reference counting element string type.


See it as a multi-level protocol for text processing. The bottom 
(physical) level deals with physical storage items (AnsiChar, 
WideChar...), and how they are stored in memory or files. Just as it 
doesn't make sense to deal with individual bytes of real numbers in 
computations, it doesn't make sense to deal with individual bytes 
(AnsiChars) of logical characters - except in type/encoding conversions. 


Higher levels deal with logical values, which can consist of multiple 
physical items, and may need different interpretations (in case of Ansi 
codepages). This level is partially covered now by AnsiString encodings 
and UTF-16 surrogate pairs, which allow to map the values into full 
Unicode (UCS-4) codepoints. But these codepoints still are not 
sufficient for a correct interpretation and manipulation of logical 
characters, which again can consist of multiple codepoints (decomposed 
umlauts, ligatures...). 


In a next level another (mostly language 
specific) interpretation may be required, like which logical characters 
have to be treated together (ligatures, non-breaking characters...). 
Some natural languages (Hebrew, Arabic...) require another special 
handling of (mixed) LTR/RTL reading, and of paths, influencing the 
graphical representation of character sequences; but that's nothing an 
application or library writer should have to deal with, such 
functionality should be provided by the target platform.


There must be a boundary between the standard (RTL) handling of the 
physical items and encodings, and higher text processing levels, up to 
language specific processing (how to break words, when to apply 
capitalization, syntax checks...), so that such special handling can be 
implemented in dedicated extensions (libraries, classes), by developers 
familiar with the rules and conventions of the natural languages.


For now we are talking only about the handling up to individual Unicode 
codepoints, and related string manipulation. For this, at least one 
string representation must exist, that covers the full Unicode range of 
codepoints (UTF-8 or UTF-16 for now). When such an implementation claims 
undefined behaviour, then this can only mean implementation flaws, 
resulting in something different from what can be expected from proper 
Unicode handling. This includes invalid parameter values in subroutine 
calls, which should result in proper (defined) runtime error reporting 
(AV, error result...).


WRT AnsiString encodings, the only acceptable (expected) differences 
can result from lossy conversions, when converting proper Unicode into a 
non-UTF encoding. Even then the results should be consistent, even if 
the concrete results depend on some external (platform...) convention or 
settings.


IMO.


>> That's why I wonder *when* exactly the result of such an expression 
>> *is* converted (implicitly) into the static encoding of the target 
>> variable, and when *not*.
> I understand that the idea is, to use the static encoding information 
> provided by the type definition whenever possible.


Right, but here whenever possible depends on the correspondence of 
static and dynamic encoding. When the dynamic encoding can *ever* be 
different from the static encoding, except for RawByteString, I consider 
it NOT possible to derive the need for a conversion from the static 
encoding. In the handling of floating-point values we may have to expect 
invalid operations (division by zero, overflow...) or values (NaN...), 
but NOT that a Double variable ever contains two Integer values - unless 
forced by dirty hacks out of compiler control. Why should this be 
different and acceptable with 

Re: [fpc-devel] ThousandSeparator

2014-11-27 Thread Sven Barth
On 28.11.2014 05:01, Hans-Peter Diettrich drdiettri...@aol.com wrote:

 Sven Barth wrote:


 At my old company our Delphi application handled runtime changes to
these settings rather well. For display the normal XToY (e.g. DateToStr)
functions are used which use the DefaultFormatSettings which are updated
automatically (the VCL's message loop triggers a repaint when format
settings were changed in the system).


 A repaint by itself doesn't change the strings. How do the new strings
come into all the edit boxes, of all open forms?

Of course it's not a mere simple repaint. Controls like TSpinEdit and
TDateEdit (both controls are designed to display localized values) will
convert their internal data value to a string using the current format
settings upon a repaint. If you have a mere edit then you need to handle
that yourself of course by also intercepting the settings change message.
Then you can convert your internal data value (I don't know about you, but
I store my values internally localization independent) to the correct
display value.
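
The approach described above can be sketched roughly like this (Python, for illustration only; it is not the VCL or LCL implementation). `FormatSettings`, `NumberField`, `format_number` and `on_settings_changed` are hypothetical names. The field keeps a localization-independent value and derives its display string from whatever settings are current when it is asked to paint.

```python
from dataclasses import dataclass


@dataclass
class FormatSettings:
    """Hypothetical stand-in for the relevant part of the format settings."""
    decimal_sep: str = "."
    thousand_sep: str = ","


def format_number(value, fs, decimals=2):
    """Format with Python's ',' and '.' first, then swap in the configured
    separators via a temporary marker so the two replacements cannot clash."""
    text = f"{value:,.{decimals}f}"
    return (text.replace(",", "\0")
                .replace(".", fs.decimal_sep)
                .replace("\0", fs.thousand_sep))


class NumberField:
    """Hypothetical edit control: stores the value, not the string."""

    def __init__(self, value, fs):
        self.value = value          # localization-independent internal value
        self.fs = fs

    def on_settings_changed(self, fs):
        # Called when the settings-change notification arrives.
        self.fs = fs

    def display_text(self):
        # Recomputed on every repaint from the current settings.
        return format_number(self.value, self.fs)


field = NumberField(1234567.5, FormatSettings())
before = field.display_text()
field.on_settings_changed(FormatSettings(decimal_sep=",", thousand_sep="."))
after = field.display_text()
```

Because only the display string is regenerated, the stored value is untouched by the settings change; a control that kept just the formatted string would have to re-parse it under the old settings first.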

 Similarly, when the user changes the system language, can he expect that
every running application updates itself, with changed menus etc., up to
eventually open help viewers? What if the program is not prepared for a
different language, because e.g. a tax assistant is bound to a specific
country?

Of course not every application does or even can handle this. Nevertheless
an OS like Windows provides the possibility for an application to handle
that situation to provide a more seamless experience for its users.

Regards,
Sven