2) Nothing is copied on an assignment to a string variable, except the
reference to the memory object.
Sorry, I erroneously thought about the variable itself being ref
counted, while in fact the variable is a pointer to the (hidden) String
management record, which is the ref counted entity
On Thu, 27 Jun 2013, Michael Schnell wrote:
2) Nothing is copied on an assignment to a string variable, except the
reference to the memory object.
Sorry, I erroneously thought about the variable itself being ref counted,
while in fact the variable is a pointer to the (hidden) String
On 06/27/2013 09:51 AM, Michael Van Canneyt wrote:
There is no content pointer. The string array is appended to the record
I see. Thus the pointer is relative and implicit :-P . Silly me.
-Michael
___
fpc-devel maillist -
On 06/26/2013 05:09 PM, Marco van de Voort wrote:
Should is a complex thing here, since there is no implementation to
test with (and see if it has other consequences). I assume a
conversion should be inserted, so at least for non rawbytestrings the
runtime encoding always matches the
In our previous episode, Michael Schnell said:
Should is a complex thing here, since there is no implementation to
test with (and see if it has other consequences). I assume a
conversion should be inserted, so at least for non rawbytestrings the
runtime encoding always matches the
On 06/26/2013 06:29 PM, Hans-Peter Diettrich wrote:
Then you have two choices:
1) convert the string as required
2) copy the content unconverted, but update the encoding
What do you mean by you have two choices ?
In fact the compiler designer has the choice to implement some behavior:
1)
On 06/27/2013 12:26 PM, Marco van de Voort wrote:
That already has been decided, everything Delphi compatible.
I was just speaking of a hypothetical case.
The starting point of the discussion was the possibility to improve the
compiler/library and potentially introduce mode settings that introduce
On 06/26/2013 08:01 PM, Sven Barth wrote:
The RTL already uses RawByteString for the concatenation helpers.
Does this code do an assignment of RawByteString to normal String with
not already matching Types (and thus create erroneous Strings) ?
I would not suppose so.
Otherwise it would be
On 06/26/2013 10:28 PM, Hans-Peter Diettrich wrote:
Please note that I invited Michael Schnell to provide his version of
such RTL routines, compatible with *his* ideas about better string
handling.
I would be happy to do this, but unfortunately the modified behavior
would need to be
On 27.06.2013 13:12, Michael Schnell wrote:
A prominent example is TStringList. I have no idea how it is
implemented in DXE, but using decent RawByteStrings it can be
implemented in a way that can be used with all strings without a
severe performance hit.
Delphi uses String as type for the
On 06/26/2013 06:19 PM, Hans-Peter Diettrich wrote:
A string variable has no encoding type stored. Only non-empty strings
have an encoding.
Sorry for the bad wording. Not the String variable itself (as it is just a
pointer to the String record) but the String record it points to has the
On 06/27/2013 01:24 PM, Sven Barth wrote:
Delphi uses String as type for the TStringList and thus with
Delphi 2009 and newer this is UnicodeString.
I did assume this.
As I don't have a new Delphi I in turn don't know what exactly
UnicodeString means.
From what I read I assume this means
In our previous episode, Michael Schnell said:
As I don't have a new Delphi I in turn don't know what exactly
UnicodeString means.
utf16 as has been said hundreds of times, and can be seen in thousands of
locations on the web. If you don't get these essential features, then all
this discussion
On 06/27/2013 01:48 PM, Marco van de Voort wrote:
when storing a - say UTF-8 - String in a
stringlist and retrieving it later to a String variable with encoding
type UTF-8 a dual conversion is done.
Yes.
To me this seems absolutely silly.
Correct. Using UTF8 on Windows is silly, as it is
Michael Schnell wrote:
2) Nothing is copied on an assignment to a string variable, except the
reference to the memory object.
Sorry, I erroneously thought about the variable itself being ref
counted, while in fact the variable is a pointer to the (hidden) String
management record,
On 06/27/2013 12:54 PM, Hans-Peter Diettrich wrote:
Now you also should understand that a string variable points directly
to the string content, it's usable as PChar(str) without any
conversion. The other information about the string resides *before*
that address.
I did do the testing
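
A minimal sketch of the behaviour described in this message, assuming FPC 3.0 or later (for `StringRefCount`) and a plain `AnsiString`. It is not taken from the thread, and the program and variable names are made up for illustration. It prints whether the variable and `PChar(s1)` hold the same address, whether the two variables share one block after the assignment, and whether they still share it after a write:

```pascal
program StringSharing;
{$mode objfpc}{$H+}

var
  s1, s2: AnsiString;
begin
  s1 := 'hello';
  UniqueString(s1);  // give s1 its own heap block instead of the constant literal
  s2 := s1;          // copies the pointer only; the characters are not copied

  // For a non-empty string the variable holds the address of the first character.
  WriteLn('variable equals PChar: ', Pointer(s1) = Pointer(PChar(s1)));
  WriteLn('shared after assign:   ', Pointer(s1) = Pointer(s2));
  WriteLn('reference count:       ', StringRefCount(s1));

  s2[1] := 'J';      // writing to a shared string makes s2 unique first (copy-on-write)
  WriteLn('shared after write:    ', Pointer(s1) = Pointer(s2));
  WriteLn(s1, ' / ', s2);
end.
```

The `UniqueString` call is there only because a string literal is stored as a constant without a normal reference count; without it the count printed would not reflect the two variables.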
In our previous episode, Michael Schnell said:
Yep. But fpc is not windows-centric,
These are all discussions that have raged for years, and an implementation
was made. Basta.
___
fpc-devel maillist - fpc-devel@lists.freepascal.org
On 06/27/2013 02:52 PM, Marco van de Voort wrote:
These are all discussions that have raged for years, and an implementation
was made. Basta.
As I can't do any patch for the compiler myself, I can't comment on that.
-Michael
On 27.06.2013 13:37, Michael Schnell mschn...@lumino.de wrote:
As I don't have a new Delphi I in turn don't know what exactly
UnicodeString means.
But you do remember that I sent you a list of string types a few days ago?
Regards,
Sven
On 06/27/2013 05:22 PM, Sven Barth wrote:
But you do remember that I sent you a list of string types a few days ago?
I just wanted to avoid stating something that might be wrong :-[
-Michael
On 06/25/2013 01:25 PM, Hans-Peter Diettrich wrote:
8 or 16 bit codes ? In Delphi XE this seems to be 16 bit, in Delphi 7
and the currently released Lazarus this seems to be 8 Bits.
Please read before confusing everything.
Sorry that I maybe did not phrase my question/ request appropriately:
On 06/25/2013 01:19 PM, Hans-Peter Diettrich wrote:
This is not the case :-(
A variable can not force a conversion, when a RawByteString is
assigned to it :-(
I suppose you decently tested this with the newest version of Delphi XE.
I can't comment, as I don't have DXE. :-( .
But you and the
On 06/26/2013 09:41 AM, Michael Schnell wrote:
It shows ... how it is done.
Hi DoDi,
You might be inclined to enhance the test program for me and compile it
with DXE:
As far as I understand the encoding type and as I see in
http://wiki.freepascal.org/FPC_Unicode_support :
On 26.06.2013 09:41, Michael Schnell wrote:
On 06/25/2013 01:20 PM, Hans-Peter Diettrich wrote:
Michael Schnell wrote:
Supposedly the length and encoding number and code-bytecount is
copied, too.
Please understand reference counted memory objects :-]
Please check this program I tested
Michael Schnell wrote:
On 06/25/2013 01:20 PM, Hans-Peter Diettrich wrote:
Michael Schnell wrote:
Supposedly the length and encoding number and code-bytecount is
copied, too.
Please understand reference counted memory objects :-]
Please check this program I tested with a pre-Unicode
On 06/26/2013 12:13 PM, Sven Barth wrote:
You do know that s2 will point to the same record of s1 after the
assignment? The contents of the string record are not copied, only the
pointer of s2 will change. See this example:
You are right (my testing program in pre-Unicode-Delphi does show
On 06/26/2013 12:05 PM, Hans-Peter Diettrich wrote:
After an assignment both strings refer to the same memory, i.e.
pchar(s1)=pchar(s2). Everything else indicates an error, somewhere.
This is exactly what I wanted to show: it results in ContentPointer,
StringLength, ReferenceCount (plus - if
On 26.06.2013 12:38, Michael Schnell wrote:
On 06/26/2013 12:13 PM, Sven Barth wrote:
You do know that s2 will point to the same record of s1 after the
assignment? The contents of the string record are not copied, only
the pointer of s2 will change. See this example:
You are right (my
BTW.
I think the implementation would be quite easy, straightforward, fast
and compatible.
- The compiler knows the static encoding type of each string variable.
- The dynamic encoding type of a String is preset to the static
encoding type when the string is allocated
- only
On 06/26/2013 01:40 PM, Sven Barth wrote:
It's the whole use of RawByteString that the encoding is kept. For all
other string types the content will be converted
That is what I did assume, but I understood dodi in a way that he
suggested that it (with normal means such as assigning to another
On 26.06.2013 13:59, Michael Schnell wrote:
BTW.
I think the implementation would be quite easy, straightforward, fast
and compatible.
- The compiler knows the static encoding type of each string variable.
- The dynamic encoding type of a String is preset to the static
encoding type
On 26.06.2013 14:02, Michael Schnell wrote:
On 06/26/2013 01:40 PM, Sven Barth wrote:
It's the whole use of RawByteString that the encoding is kept. For
all other string types the content will be converted
That is what I did assume, but I understood dodi in a way that he
suggested that it
On 06/26/2013 02:08 PM, Sven Barth wrote:
On 26.06.2013 14:02, Michael Schnell wrote:
That is what I did assume, but I understood dodi in a way that he
suggested that it (with normal means such as assigning to another
String) is not possible to make use of the encoding type of a String
On 06/26/2013 02:59 PM, Sven Barth wrote:
It's a counterargument to "it is not possible to make use of the
encoding type of a String information that had been assigned to a
RawByteString". This function returns the current code page of the
string. And using SetCodePage you can force a
In our previous episode, Michael Schnell said:
If the RawByteString Variable already has a dynamic encoding type other
than $ a conversion might or might not be necessary.
There never is a conversion when assigning to/from rawbytestring, so this is
a strange line.
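
A small sketch of the point made here, assuming FPC 3.0 or later with code-page-aware strings. It is not from the thread, and the names are illustrative. `StringCodePage` and `SetCodePage` are the RTL routines mentioned elsewhere in the discussion. The program assigns a `UTF8String` to a `RawByteString`, prints whether both variables refer to the same data, and prints the code page each one reports:

```pascal
program RawByteKeepsEncoding;
{$mode objfpc}{$H+}

var
  u: UTF8String;
  r: RawByteString;
begin
  u := 'abc';
  r := u;  // no conversion: r refers to the same data and header as u

  WriteLn('same data:      ', Pointer(u) = Pointer(r));
  WriteLn('code page of u: ', StringCodePage(u));
  WriteLn('code page of r: ', StringCodePage(r));

  // Convert = False only changes the code page stored with the string;
  // the bytes are left as they are.
  SetCodePage(r, 1252, False);
  WriteLn('code page of r after SetCodePage: ', StringCodePage(r));
  WriteLn('code page of u afterwards:        ', StringCodePage(u));
end.
```

The example uses pure ASCII text on purpose, so it needs no widestring manager and the result does not depend on the source file encoding.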
On 06/26/2013 03:44 PM, Marco van de Voort wrote:
There never is a conversion when assigning to/from rawbytestring, so
this is a strange line.
Sven replied to my contribution that suggested an implementation that in
fact does a conversion when doing an assignment from a RawByteString to
a
On 06/26/2013 03:44 PM, Marco van de Voort wrote:
There never is a conversion when assigning to/from rawbytestring,
So what do you suggest should happen when assigning a RawByteString to a
normal String ? The result could be a strange thing that is encoded
other than the type requires. To me
In our previous episode, Michael Schnell said:
On 06/26/2013 03:44 PM, Marco van de Voort wrote:
There never is a conversion when assigning to/from rawbytestring, so
this is a strange line.
Sven replied to my contribution that suggested an implementation that in
fact does a conversion
In our previous episode, Michael Schnell said:
There never is a conversion when assigning to/from rawbytestring,
So what do you suggest should happen when assigning a RawByteString to a
normal String ?
Should is a complex thing here, since there is no implementation to test
with (and see
Sven Barth wrote:
On 26.06.2013 14:02, Michael Schnell wrote:
On 06/26/2013 01:40 PM, Sven Barth wrote:
It's the whole use of RawByteString that the encoding is kept. For
all other string types the content will be converted
That is what I did assume, but I understood dodi in a way that
Michael Schnell wrote:
On 06/26/2013 12:05 PM, Hans-Peter Diettrich wrote:
After an assignment both strings refer to the same memory, i.e.
pchar(s1)=pchar(s2). Everything else indicates an error, somewhere.
This is exactly what I wanted to show: it results in ContentPointer,
StringLength,
Michael Schnell wrote:
On 06/26/2013 03:44 PM, Marco van de Voort wrote:
There never is a conversion when assigning to/from rawbytestring,
So what do you suggest should happen when assigning a RawByteString to a
normal String ? The result could be a strange thing that is encoded
other
On 26.06.2013 18:30, Hans-Peter Diettrich drdiettri...@aol.com wrote:
Michael Schnell wrote:
On 06/26/2013 03:44 PM, Marco van de Voort wrote:
There never is a conversion when assigning to/from rawbytestring,
So what do you suggest should happen when assigning a RawByteString to a
Sven Barth wrote:
IMO a reasonable decision should take into account the use of the
RawByteString type in RTL code, e.g. for concatenation.
The RTL already uses RawByteString for the concatenation helpers.
This means that the assumptions implied by that code have to be matched
by the
2013/6/21 Sergei Gorelkin sergei_gorel...@mail.ru:
I've profiled the code and found no conversions taking place. All the
slowdown appears to be caused by other reasons, hard to tell the topmost
contributor. What catches the eye is the large amount of calls to
UniqueString, and the fact that
On 06/25/2013 01:05 AM, Hans-Peter Diettrich wrote:
A RawByteString can obtain any encoding, so no conversions are required.
But when assigned back to an UnicodeString, the obtained encoding is
used to convert the string.
That sounds good. The name RAW just misled me to think it would not
On 06/25/2013 01:05 AM, Hans-Peter Diettrich wrote:
In fact it looks like only the string pointers are copied between
AnsiString and RawByteString, with the refcount changed accordingly.
Supposedly the length and encoding number and code-bytecount is copied,
too.
-Michael
On 06/24/2013 08:21 PM, Sven Barth wrote:
AnsiString:
up to 2^23-1 characters, reference counted, system encoding
(determined by the code page at compilation time AFAIK)
8 or 16 bit codes ? In Delphi XE this seems to be 16 bit, in Delphi 7
and the currently released Lazarus this seems to
In our previous episode, Sven Barth said:
AnsiString:
up to 2^23-1 characters, reference counted, system encoding
(determined by the code page at compilation time AFAIK)
(2^31-1 obviously, since it is 32-bit variable, but many operations
use signed types)
WideString
- on
Michael Schnell wrote:
On 06/25/2013 01:05 AM, Hans-Peter Diettrich wrote:
In fact it looks like only the string pointers are copied between
AnsiString and RawByteString, with the refcount changed accordingly.
Supposedly the length and encoding number and code-bytecount is copied,
too.
Michael Schnell wrote:
On 06/25/2013 01:05 AM, Hans-Peter Diettrich wrote:
A RawByteString can obtain any encoding, so no conversions are required.
But when assigned back to an UnicodeString, the obtained encoding is
used to convert the string.
That sounds good. The name RAW just misled
Michael Schnell wrote:
On 06/24/2013 08:21 PM, Sven Barth wrote:
AnsiString:
up to 2^23-1 characters, reference counted, system encoding
(determined by the code page at compilation time AFAIK)
8 or 16 bit codes ? In Delphi XE this seems to be 16 bit, in Delphi 7
and the currently
On 06/25/2013 01:19 PM, Hans-Peter Diettrich wrote:
Efficient code must be based on a single encoding, with conversions
only from and to the outer world (OS, files...).
That does not rule out temporarily storing a string in
something that can hold any encoding type.
-Michael
On 06/21/2013 07:43 PM, Sven Barth wrote:
Just to clear up the names: UnicodeString is *not* the code page aware
string type (although they share the metadata record). It is a
dynamic length 2 byte string. The code page aware string type is
AnsiString.
Thanks for making this clear.
Michael Schnell wrote:
Could you give us a list of the different - legacy and to be supported
- string types we might be seeing including their official names to
make the discussion less ambiguous.
This should have been clear for a long time. I e.g. remember your strange
(Delphi incompatible)
On 06/24/2013 12:43 PM, Hans-Peter Diettrich wrote:
This should have been clear for a long time.
Sorry, but e.g. I don't know the official names of the Delphi 7
compatible String and the Delphi XE compatible String in fpc/Lazarus.
I suppose in DXE the Delphi 7 compatible String is not available
In our previous episode, Michael Schnell said:
I do know that the Delphi 7 compatible String in fpc sometimes has
been called ANSIString, while Lazarus funnily stores UTF8 in the type
ANSIString, even in spite of the naming.
You can funnily store utf8 in type ansistring under Delphi 7
On 06/24/2013 03:11 PM, Marco van de Voort wrote:
You can funnily store utf8 in type ansistring under Delphi 7 too.
Yep. But D7 does not rely on some string to be encoded in UTF8 (but in
the ANSI table the System configuration defines), while the LCL API
wants to see the strings in UTF8 code.
Michael Schnell wrote:
On 06/24/2013 12:43 PM, Hans-Peter Diettrich wrote:
I e.g. remember your strange (Delphi incompatible) opinions about
RawByteString and encodings in a startup discussion.
Yep. As I did not have DTX to try it, I only read what I could find in
the internet and
On 24-6-2013 17:13, Michael Schnell wrote:
On 06/24/2013 04:44 PM, Hans-Peter Diettrich wrote:
Not in Delphi. For binary data TBytes has been added.
Which (AFAIK) is not reference counted, can't do +, and is thus much
less versatile.
It is also highly controversial since XE4:
For example a
On 24.06.2013 11:36, Michael Schnell wrote:
On 06/21/2013 07:43 PM, Sven Barth wrote:
Just to clear up the names: UnicodeString is *not* the code page aware
string type (although they share the metadata record). It is a
dynamic length 2 byte string. The code page aware string type is
On 24.06.2013 16:44, Hans-Peter Diettrich wrote:
I hope, now I understand that the type RawByteString ( = String
($) ) means codesize = 1 Byte, never to be auto-converted to any
differently encoded String type variable.
No. Even if I would like such an encoding, too, Delphi doesn't
Sven Barth wrote:
On 24.06.2013 16:44, Hans-Peter Diettrich wrote:
I hope, now I understand that the type RawByteString ( = String
($) ) means codesize = 1 Byte, never to be auto-converted to any
differently encoded String type variable.
No. Even if I would like such an encoding, too,
On 21.06.2013 16:29, Sergei Gorelkin wrote:
and the fact that SetCodePage goes through implicit
try..finally block even if it does not need to convert the string.
I've fixed this one on r24942
On 06/20/2013 05:31 PM, luiz americo pereira camara wrote:
The point is that I would expect a smaller performance hit when
there's no conversion going on. Something around 10% slower. In the
cited case it is more than 50% slower.
As the dynamic types of (most) String Variables are already defined
On 06/20/2013 05:31 PM, luiz americo pereira camara wrote:
Maybe in that example there's an (unneeded) conversion going on?
If you use the same string type all over the place it would be a severe
bug if _any_ conversion is done.
Please check.
-Michael
Michael Schnell wrote:
On 06/20/2013 05:31 PM, luiz americo pereira camara wrote:
The point is that I would expect a smaller performance hit when
there's no conversion going on. Something around 10% slower. In the
cited case it is more than 50% slower.
As the dynamic types of (most) String
On 06/21/2013 09:54 AM, Hans-Peter Diettrich wrote:
Again I'd assume that the memory allocation for the result is the most
expensive operation with UnicodeString operands, independent from
string lengths.
Do you suggest that with UnicodeString - even when using 1 Byte encoding
types such
Michael Schnell wrote:
On 06/21/2013 09:54 AM, Hans-Peter Diettrich wrote:
Again I'd assume that the memory allocation for the result is the most
expensive operation with UnicodeString operands, independent from
string lengths.
Do you suggest that with UnicodeString - even when using 1
2013/6/21 Michael Schnell mschn...@lumino.de:
On 06/20/2013 05:31 PM, luiz americo pereira camara wrote:
Maybe in that example there's an (unneeded) conversion going on?
If you use the same string type all over the place it would be a severe bug
if _any_ conversion is done.
Please check.
On 06/21/2013 02:20 PM, Hans-Peter Diettrich wrote:
Please note that I was *not* talking about AnsiStrings.
Sorry I don't understand.
I reckon the OP, asking about a performance hit, meant a degradation
regarding the new (Delphi XE compatible) vs the old (Delphi 7
compatible) String
On 21.06.2013 17:11, luiz americo pereira camara wrote:
2013/6/21 Michael Schnell mschn...@lumino.de:
On 06/20/2013 05:31 PM, luiz americo pereira camara wrote:
Maybe in that example there's an (unneeded) conversion going on?
If you use the same string type all over the place it would be a
On 06/21/2013 04:29 PM, Sergei Gorelkin wrote:
What catches the eye is the large amount of calls to UniqueString,
It would be interesting to see whether the old (not new Unicode
library) project does the same amount of UniqueString. I don't see why
the new library should do more of these
Michael Schnell wrote:
On 06/21/2013 02:20 PM, Hans-Peter Diettrich wrote:
Please note that I was *not* talking about AnsiStrings.
Sorry I don't understand.
You snipped the context, which was UnicodeString (second case). The
AnsiString case was covered before.
I reckon the OP asking
On 21.06.2013 10:36, Michael Schnell mschn...@lumino.de wrote:
On 06/21/2013 09:54 AM, Hans-Peter Diettrich wrote:
Again I'd assume that the memory allocation for the result is the most
expensive operation with UnicodeString operands, independent from string
lengths.
Do you suggest that
On 20.06.2013 16:15, luiz americo pereira camara wrote:
I looked at http://forum.lazarus.freepascal.org/index.php/topic,21223.0.html
There's a significant performance drop in fpc trunk
The difference of generated code is a call to fpc_ansistr_assign and a
different implementation of
2013/6/20 Sergei Gorelkin sergei_gorel...@mail.ru:
On 20.06.2013 16:15, luiz americo pereira camara wrote:
I looked at
http://forum.lazarus.freepascal.org/index.php/topic,21223.0.html
There's a significant performance drop in fpc trunk
Is there anything wrong or this is the expected result?
On 20.06.2013 19:31, luiz americo pereira camara wrote:
Maybe in that example there's an (unneeded) conversion going on?
This is possible. One needs to profile the example to tell for sure.
Regards,
Sergei