On 12/03/2014 12:52 AM, Hans-Peter Diettrich wrote:
In Delphi *no* string can have a dynamic encoding of CP_NONE or CP_ACP,
If you really do have "Dynamic" strings, obviously, the *definition*
(i.e. CP_...) of such strings is strictly static (just for compiler use)
and can never be used as
On 12/03/2014 12:52 AM, Hans-Peter Diettrich wrote:
You forget that Jonas refers to *dynamic* string encodings, unknown at
compile time.
???
In your other mail you pointed out that fpc (unlike Delphi) does not
provide *dynamic* string encoding with RawByteString (and where else
would it
On 12/03/2014 10:42 AM, Michael Schnell wrote:
That is why I tried to invent a concept
BTW.:
I can't help with the implementation, but I'll be happy to do testing
and write documentation (e.g. in Wiki format).
-Michael
___
fpc-devel maillist -
On 12/03/2014 05:02 AM, Hans-Peter Diettrich wrote:
Michael Schnell wrote:
- It does not result in additional conversions.
It does, e.g. in searching or sorting of StringList, when it can contain
strings of different encodings. The choice of a unique encoding for
application strings (maybe C
Michael Schnell wrote:
On 11/29/2014 07:55 AM, Jonas Maebe wrote:
Exactly the same goes for converting strings with code page CP_NONE to
a different code page: your program is broken when it tries to do that,
While accessing an array beyond its bounds is not detectable at compile
time and a
Michael Schnell wrote:
On 11/28/2014 09:15 PM, Hans-Peter Diettrich wrote:
Apart from that, every encoding-tolerant code will execute much slower
than code without a need for checks and conversions everywhere.
As I pointed out I don't agree at all.
- The check is only two ASM instructions
On Tue, 02 Dec 2014 13:31:44 +0100
Michael Schnell wrote:
>[...]*defined* *to* *be* *undefined*
Ooh, that is soo meta. lol
Mattias
___
fpc-devel maillist - fpc-devel@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
On 11/29/2014 05:36 PM, Hans-Peter Diettrich wrote:
As Delphi doesn't allow for a dynamic encoding of CP_NONE, I don't
understand the purpose of the FPC description.
As you suggested in the other mail, the Delphi implementation of
RawByteString is decently flawed and this supposedly is introduc
On 11/29/2014 07:55 AM, Jonas Maebe wrote:
Exactly the same goes for converting strings with code page CP_NONE to
a different code page: your program is broken when it tries to do that,
While accessing an array beyond its bounds is not detectable at compile
time and accessing an array beyond i
On 12/02/2014 01:05 PM, Michael Schnell wrote:
But why do you say "would be appreciated" ? Is it not possible to use
"RawByteString" in a way the name suggests, by never bringing it
together with any String variable of a different encoding brand and
hence avoid any conversion - be same intentio
On 11/28/2014 09:15 PM, Hans-Peter Diettrich wrote:
You suggested to use "string" as UTF-16 on Windows, and UTF-8 on
Linux. That's what I understand as a unique program-wide string
representation (not sourcecode-wide, instead program as *compiled*).
Then I cannot see any need or use for anoth
Jonas Maebe wrote:
On 28/11/14 21:30, Hans-Peter Diettrich wrote:
I prefer to specify and document everything *before* coding, so that
everybody can expect that the code will behave as specified.
If certain behaviour is explicitly undefined, it *is* specified and
documented. It means that yo
On 28/11/14 21:30, Hans-Peter Diettrich wrote:
> I prefer to specify and document everything *before* coding, so that
> everybody can expect that the code will behave as specified.
If certain behaviour is explicitly undefined, it *is* specified and
documented. It means that your program is buggy i
Michael Schnell wrote:
I fear that there will be code that relies on the "flawed" behavior of
RawByteString ("it's a feature, not a bug") and using the same name with
different behavior would break it. And a really usable DynamicString
would not adhere to that description.
How can somebo
Jonas Maebe wrote:
I'm sorry, but I simply cannot discuss with people that, when I
literally state "the result is undefined", think that I may actually
have meant "the result is defined and if you change the
implementation and/or keep it stable across compiler releases, then
it will also confo
Michael Schnell wrote:
On 11/27/2014 03:44 PM, Hans-Peter Diettrich wrote:
An *efficient* implementation would be based on a single program-wide
string representation, with different encodings being handled only in
an exchange with external data sources.
Yep. But it would result in severe u
On 11/27/2014 07:29 PM, Hans-Peter Diettrich wrote:
Michael Schnell wrote:
E.g. there are (at least) two "Code pages" for UTF-16 ("LE" and
"BE") that would be worth supporting.
You are confusing codepages and encodings :-(
That is why I put "goose-feet" around "Code pages". I used this wo
On 11/27/2014 03:44 PM, Hans-Peter Diettrich wrote:
The "universal paradigm" would allow for extensions (e.g. UTF-32,
multiple 16 Bit Code pages, an additional fully dynamic String type,
n-byte "un-encoded" string types), as I described in the Wiki page.
Even if feasible, such arbitrary string
On 27 Nov 2014, at 17:11, Hans-Peter Diettrich wrote:
> Such statements come only from writers that do not believe that their words
> can be understood in various ways ;-)
I'm sorry, but I simply cannot discuss with people that, when I literally state
"the result is undefined", think that I
Michael Schnell wrote:
On 11/26/2014 06:37 PM, Hans-Peter Diettrich wrote:
An AnsiString consists of AnsiChar's. The *meaning* of these char's
(bytes) depends on their encoding, regardless of whether the used
encoding is or is not stored with the string.
I understand that the implementation
Jonas Maebe wrote:
On 26/11/14 23:41, Hans-Peter Diettrich wrote:
In this case the implementation is "compiler specific", somewhat
different from "undefined" (in a RawByteString):
"CP_NONE: this value indicates that no code page information has been
associated with the string data. The result
Michael Schnell wrote:
On 11/26/2014 07:13 PM, Hans-Peter Diettrich wrote:
Not all codepages have a fixed number of bytes per character.
The string preamble contains the *element size* (1 for AnsiString),
just like with every dynamic array.
Sorry for sloppy wording. Of course I did mean "ele
Michael Schnell wrote:
I now understand that the "Element Size" field in the String header is
quite dummy, as under the hood there are two completely separate
concepts for one-byte-Strings and 2-Byte Strings and none for other
Element sizes.
After a code review I realized that the element
Sven Barth wrote:
Just a little remark: please don't throw in WideString, which is a
completely different type and only there for easy compatibility with COM
and other Windows APIs.
I mentioned it for completeness, and because (at least in Delphi) the
elements of a UnicodeString are WideCh
On 11/26/2014 06:37 PM, Hans-Peter Diettrich wrote:
An AnsiString consists of AnsiChar's. The *meaning* of these char's
(bytes) depends on their encoding, regardless of whether the used
encoding is or is not stored with the string.
I understand that the implementation (in Delphi) seems to be d
On 26/11/14 23:41, Hans-Peter Diettrich wrote:
> In this case the implementation is "compiler specific", somewhat
> different from "undefined" (in a RawByteString):
> "CP_NONE: this value indicates that no code page information has been
> associated with the string data. The result of any explicit
On 11/26/2014 07:13 PM, Hans-Peter Diettrich wrote:
Not all codepages have a fixed number of bytes per character.
The string preamble contains the *element size* (1 for AnsiString),
just like with every dynamic array.
Sorry for sloppy wording. Of course I did mean "element size"
("Character" h
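
The point quoted above, that not all code pages use a fixed number of bytes per character while the element size of an AnsiString stays at 1, can be illustrated with a minimal sketch. It is written in Python rather than Pascal and is not FPC or Delphi code; the sample characters are arbitrary choices and do not come from the thread.

```python
# Illustration only: how many bytes (1-byte elements) one character needs
# in a single-byte code page versus a variable-width encoding.
# The sample characters are arbitrary and not taken from the thread.
samples = ["A", "ä", "€"]


def bytes_per_char(ch: str, encoding: str) -> int:
    """Return the number of bytes one character occupies in `encoding`."""
    return len(ch.encode(encoding))


byte_counts = {
    enc: [bytes_per_char(ch, enc) for ch in samples]
    for enc in ("cp1252", "utf-8")
}

for enc, counts in byte_counts.items():
    print(enc, counts)
```

In CP1252 every sample takes one byte, while in UTF-8 the same characters take one, two and three bytes. The element size is 1 in both cases; only the number of elements per character differs.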
On 11/26/2014 07:54 PM, Hans-Peter Diettrich wrote:
Delphi XE does not properly support UTF-8.
That is what I supposed. Of course the developers at Embarcadero did not
need to think about portability to other OSes than Windows when crafting
the concept.
But obviously fpc needs proper support
On 11/26/2014 09:30 PM, Hans-Peter Diettrich wrote:
So seemingly you could do MyStringType = type
AnsiString(CP_UTF16), and seemingly the size information is set
according to this.
Not in Delphi XE.
Thanks for the clarification.
I did have some hope that fpc would be (or could be extend
On 11/26/2014 05:37 PM, Jonas Maebe wrote:
invalid (in the meaning of "undefined") in both FPC and Delphi.
Sorry (I am not a native speaker). But to me "undefined" and "invalid"
have completely different meanings (in this context). An "Invalid" use
of the language would result in an error (com
On 11/26/2014 05:25 PM, Sven Barth wrote:
>
> So seemingly you could do MyStringType = type
> AnsiString(CP_UTF16), and seemingly the size information is set
> according to this.
No, you can't, because the RTL does not handle that. For AnsiString
the element size is *always* 1. It's hard
In our previous episode, Hans-Peter Diettrich said:
> > concatenated without data loss and that the result is then converted to
> > the target string's encoding (except in case the target is
> > RawByteString). How that is implemented exactly is undefined; again in
> > the meaning of "undefined", n
On 26.11.2014 19:54, Hans-Peter Diettrich wrote:
UTF-16 is not a valid value for CP_ACP in Delphi, because it's a 2-byte
encoding. Even if the Delphi architects may have thought about a common
string type, with a variable element size (1,2,4), this certainly soon
turned out to be a stupid idea, so
Jonas Maebe wrote:
Technically, that section literally states that they will be
concatenated without data loss and that the result is then converted to
the target string's encoding (except in case the target is
RawByteString). How that is implemented exactly is undefined; again in
the meaning
Michael Schnell wrote:
So seemingly you could do MyStringType = type
AnsiString(CP_UTF16), and seemingly the size information is set
according to this.
Not in Delphi XE.
DoDi
Mattias Gaertner wrote:
For example:
CP_ACP=0, DefaultSystemCodePage=1252
That means static code page is always 0, while dynamic code page can be
0 or 1252. Both describe the same encoding.
A *dynamic* encoding *never* can be CP_ACP nor CP_NONE (in Delphi).
These values are allowed only for
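
Mattias's example (CP_ACP=0, DefaultSystemCodePage=1252) can be sketched as a toy model. This is Python for illustration only, not the FPC or Delphi RTL. The helper names `effective_code_page` and `same_encoding` are hypothetical, and the value used for CP_NONE is an assumption about FPC's constant.

```python
# Toy model of "CP_ACP is a placeholder for the default system code page".
# Not RTL code: the function names are hypothetical helpers for illustration.
CP_ACP = 0                        # placeholder: "use the default system code page"
CP_NONE = 0xFFFF                  # assumed value of FPC's CP_NONE constant
DEFAULT_SYSTEM_CODE_PAGE = 1252   # value taken from the example in the thread


def effective_code_page(code_page: int,
                        default: int = DEFAULT_SYSTEM_CODE_PAGE) -> int:
    """Resolve the CP_ACP placeholder; every other value is returned unchanged."""
    return default if code_page == CP_ACP else code_page


def same_encoding(a: int, b: int) -> bool:
    """Two code page values describe the same encoding if they resolve equally."""
    return effective_code_page(a) == effective_code_page(b)


print(same_encoding(CP_ACP, 1252))   # both resolve to 1252
print(same_encoding(1252, 65001))    # CP1252 versus UTF-8
```

In this model a value of 0 and a value of 1252 compare as the same encoding, which is the situation the example describes. CP_NONE is deliberately left unresolved, since the thread treats conversions involving it as undefined.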
Michael Schnell wrote:
I fail to understand some of the text.
It seems to be unavoidable to use the name "ANSIString" even though I
always throw up when seeing a thing called "ANSI" containing Unicode
(e. g. "UTF8String = type AnsiString(CP_UTF8)" ).
Seemingly here the "bytes per chara
Michael Schnell wrote:
On 11/26/2014 11:40 AM, Mattias Gaertner wrote:
Ansistring supports only one byte per character code pages.
Even more confused. Am I wrong thinking that with code aware Strings,
for Delphi XE compatibility, in Windows CP_ACP needs to be UTF16 (if not
right, than due
Michael Schnell wrote:
On 11/26/2014 12:09 PM, Sven Barth wrote:
In Delphi (and FPC) CP_ACP corresponds by default with the current
system codepage (e.g. CP1252 on a German Windows).
OK. So in Delphi XE (in Germany) String(CP_ACP) is the same as
String(CP1252) but different from String wi
Mattias Gaertner wrote:
On Wed, 26 Nov 2014 11:23:17 +0100
Michael Schnell wrote:
Seemingly here the "bytes per character" setting implicitly is thought
of as a part of the "code-page" definition. Correct?
Code pages define bytes per character.
Huh?
Not all codepages have a fixed numbe
On Wed, 26 Nov 2014 17:50:31 +0100
Mattias Gaertner wrote:
> On Wed, 26 Nov 2014 17:23:48 +0100
> Jonas Maebe wrote:
>
> > On 26/11/14 17:21, Sven Barth wrote:
> > > Yes, nevertheless the header record is the same for UnicodeString and
> > > AnsiString and thus it also has a codepage field whic
On Wed, 26 Nov 2014 17:23:48 +0100
Jonas Maebe wrote:
> On 26/11/14 17:21, Sven Barth wrote:
> > Yes, nevertheless the header record is the same for UnicodeString and
> > AnsiString and thus it also has a codepage field which is always
> > initialized to CP_UTF16 however.
>
> It can also be CP_U
On 26/11/14 16:19, Michael Schnell wrote:
> So seemingly you could do MyStringType = type
> AnsiString(CP_UTF16), and seemingly the size information is set
> according to this.
As several people have told you several times, that is invalid (in the
meaning of "undefined") in both FPC and Delp
On 26/11/14 17:21, Sven Barth wrote:
> Yes, nevertheless the header record is the same for UnicodeString and
> AnsiString and thus it also has a codepage field which is always
> initialized to CP_UTF16 however.
It can also be CP_UTF16BE (which it is on big endian FPC targets right now).
Jonas
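
The difference between the little-endian and big-endian UTF-16 variants mentioned above can be shown with a minimal sketch. It is Python for illustration only and says nothing about how the FPC RTL stores or tags such strings; the sample text is an arbitrary choice.

```python
# Illustration only: the same text in UTF-16 little-endian and big-endian.
text = "A"  # arbitrary sample, U+0041

le = text.encode("utf-16-le")  # low byte first
be = text.encode("utf-16-be")  # high byte first

print(le.hex(), be.hex())

# Reading little-endian data as big-endian gives a different code point,
# which is why the byte order has to be known to whoever interprets the data.
swapped = le.decode("utf-16-be")
print(hex(ord(swapped)))
```

Both byte sequences have the same length and the same 2-byte element size; only the order of the bytes within each element differs.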
On 26.11.2014 15:30, "Mattias Gaertner" wrote:
>
> On Wed, 26 Nov 2014 15:05:16 +0100
> Sven Barth wrote:
>
> >[...]
> > While both AnsiString and UnicodeString have the current codepage and the
> > character size in their header record
>
> AFAIK UnicodeString has only a static (fixed) code page
On 26/11/14 13:11, Michael Schnell wrote:
> In section "String concatenations" there is no mention of
> auto-conversion.
There is.
> For statically typed Strings it's rather obvious that
> they will be auto-converted if appropriate.
It's probably rather obvious because it is literally ment
On 11/26/2014 03:05 PM, Sven Barth wrote:
>
> OK. So in Delphi XE (in Germany) String(CP_ACP) is the same as
> String(CP1252) but different from String without brackets which in
> turn is the same as String(CP_UTF16) ? Correct ?
There is no "String with brackets". You can only use "AnsiString"
On 26/11/14 12:53, Michael Schnell wrote:
[CP_NONE]
> Is this "undefined" in the meaning of "not predictable by the user" in
> the "current" version of fpc, or in the meaning of "due to change" when
> updating fpc.
This "undefined" literally means "undefined". It does not mean
"undefined in a mean
On Wed, 26 Nov 2014 15:05:16 +0100
Sven Barth wrote:
>[...]
> While both AnsiString and UnicodeString have the current codepage and the
> character size in their header record
AFAIK UnicodeString has only a static (fixed) code page.
Mattias
On 26.11.2014 12:37, "Michael Schnell" wrote:
>
> On 11/26/2014 12:09 PM, Sven Barth wrote:
>>
>> In Delphi (and FPC) CP_ACP corresponds by default with the current
>> system codepage (e.g. CP1252 on a German Windows).
>
>
> OK. So in Delphi XE (in Germany) String(CP_ACP) is the same as
> String(CP12
After re-reading yet another question:
In section "String concatenations" there is no mention of
auto-conversion. For statically typed Strings it's rather obvious that
they will be auto-converted if appropriate. Technically - if differently
encoded - they seem to be converted to Unicode a
On 11/26/2014 12:10 PM, Mattias Gaertner wrote:
"the results of conversions from/to the CP_NONE code page are undefined."
... because CP_NONE is not a real code page.
So you understand "result" as what you would get when printing.
In the context of this wiki page I would understand "result" a
On 11/26/2014 12:13 PM, Mattias Gaertner wrote:
In mode delphiunicode String=UnicodeString.
I see.
So even in Delphi XE where "UnicodeString" is denoted by "CP_UTF16", the
value of the constant CP_UTF16 is not the same as the value of the
(constant or) variable CP_ACP, (while OTOH using the v
On 11/26/2014 12:09 PM, Sven Barth wrote:
In Delphi (and FPC) CP_ACP corresponds by default with the current
system codepage (e.g. CP1252 on a German Windows).
OK. So in Delphi XE (in Germany) String(CP_ACP) is the same as
String(CP1252) but different from String without brackets which in tu
On Wed, 26 Nov 2014 11:52:50 +0100
Michael Schnell wrote:
> On 11/26/2014 11:40 AM, Mattias Gaertner wrote:
> > Ansistring supports only one byte per character code pages.
>
> Even more confused. Am I wrong thinking that with code aware Strings,
> for Delphi XE compatibility, in Windows CP_AC
On Wed, 26 Nov 2014 11:23:17 +0100
Michael Schnell wrote:
>[...]
> 2) I fail to understand how with this explanation that seems to force
> auto conversion for assignments between types with different "code page"
> settings (also for CP_ACP) the "static code page can differ from the
> dynamic c
On 26.11.2014 11:53, "Michael Schnell" wrote:
>
> On 11/26/2014 11:40 AM, Mattias Gaertner wrote:
>>
>> Ansistring supports only one byte per character code pages.
>
>
> Even more confused. Am I wrong thinking that with code aware Strings,
> for Delphi XE compatibility, in Windows CP_ACP needs to b
On 11/26/2014 11:40 AM, Mattias Gaertner wrote:
Ansistring supports only one byte per character code pages.
Even more confused. Am I wrong thinking that with code aware Strings,
for Delphi XE compatibility, in Windows CP_ACP needs to be UTF16 (if not
right, than due later) ?
What is a "Co
On Wed, 26 Nov 2014 11:23:17 +0100
Michael Schnell wrote:
>[...]
> It seems to be unavoidable to use the name "ANSIString" even though I
> always throw up when seeing a thing called "ANSI" containing Unicode
> (e. g. "UTF8String = type AnsiString(CP_UTF8)" ).
Is there a question?
> Seemi