Re: [fpc-devel] Performance of string handling in trunk
2) Nothing is copied on an assignment to a string variable, except the reference to the memory object. Sorry, I erroneously thought about the variable itself being ref counted, while in fact the variable is a pointer to the (hidden) String management record, which is the ref counted entity and holds the content pointer to the String array. -Michael ___ fpc-devel maillist - fpc-devel@lists.freepascal.org http://lists.freepascal.org/mailman/listinfo/fpc-devel
Re: [fpc-devel] Performance of string handling in trunk
On Thu, 27 Jun 2013, Michael Schnell wrote:

> 2) Nothing is copied on an assignment to a string variable, except the reference to the memory object.
>
> Sorry, I erroneously thought about the variable itself being ref counted, while in fact the variable is a pointer to the (hidden) String management record, which is the ref counted entity and holds the content pointer to the String array.

There is no content pointer. The string array is appended to the record.

Michael.
Re: [fpc-devel] Performance of string handling in trunk
On 06/27/2013 09:51 AM, Michael Van Canneyt wrote:

> There is no content pointer. The string array is appended to the record.

I see. Thus the pointer is relative and implicit :-P . Silly me.

-Michael
Re: [fpc-devel] Performance of string handling in trunk
On 06/26/2013 05:09 PM, Marco van de Voort wrote:

> Should is a complex thing here, since there is no implementation to test with (and see if it has other consequences). I assume a conversion should be inserted, so at least for non rawbytestrings the runtime encoding always matches the compiletime one.

I feel such implementation details are up to the fpc developers to decide.

If everybody agrees that doing a conversion when a RawByteString is assigned to a normal String and the dynamic encoding does not match is the better alternative to the potentially unpredictable behavior in DXE??, I think this should be the default behavior in fpc.

It might be nice to add Delphi quirks modes that either issue an error message in that case or just do the assignment, even if the result is an inconsistent String (the static encoding that the compiler sees does not match the dynamic encoding) with unpredictable consequences. The result could be a strange thing that is encoded other than the type requires. To me this behavior is a quirk and should not be kept just for compatibility.

> The whole concept is about compatibility, and that is a race that has already been run.

The incompatibility only arises when doing something that in Delphi XE is deprecated according to the docs (assigning a RawByteString to a normal String) anyway. Thus I don't see any problem with implementing a sensible behavior in that case.

Another option would be to invent yet another String type that basically is a RawByteString but is treated differently at compile time in just one respect: when it is assigned to a normal String whose encoding does not match the dynamic encoding, the conversion library call is done. (In fact this always was my initial idea: giving the name RawByteString back the meaning the name suggests.) When doing so, assigning a RawByteString to a normal String could be strictly forbidden (unless some Delphi quirks mode is set).

But I do see that the additional complexity of defining yet another String type might not be nice.

-Michael
Re: [fpc-devel] Performance of string handling in trunk
In our previous episode, Michael Schnell said:

>> Should is a complex thing here, since there is no implementation to test with (and see if it has other consequences). I assume a conversion should be inserted, so at least for non rawbytestrings the runtime encoding always matches the compiletime one.
> I feel such implementation details are up to the fpc developers to decide.

That already has been decided: everything Delphi compatible. I was just speaking of a hypothetical case.

> If everybody agrees that doing a conversion when a RawByteString is assigned to a normal String and the dynamic encoding does not match is the better alternative to the potentially unpredictable behavior in DXE??, I think this should be the default behavior in fpc.

In general, it has been proven time and time again that deviating from Delphi is nearly always the worse choice. It just creates two cases to check for code that must still compile under Delphi, instead of just one. So even if the extra case is better, it still produces more heartbreak for many people. Unless Delphi changes it in some later version (since then, Delphi codebases will be adapted anyway).

> It might be nice to add Delphi quirks modes that either issue an error message in that case or just do the assignment, even if the result is an inconsistent String (the static encoding that the compiler sees does not match the dynamic encoding) with unpredictable consequences.

I don't think a quirks mode would be useful. It is not just syntax: the resulting encoding in the string type is different between quirks and non-quirks mode. IOW passing strings between a quirks and a non-quirks unit would get very opaque.

>> The whole concept is about compatibility, and that is a race that has already been run.
> The incompatibility only arises when doing something that in Delphi XE is deprecated according to the docs (assigning a RawByteString to a normal String) anyway. Thus I don't see any problem with implementing a sensible behavior in that case.

In general the average code doesn't really honour such fine differences, so in practice this doesn't matter.
Re: [fpc-devel] Performance of string handling in trunk
On 06/26/2013 06:29 PM, Hans-Peter Diettrich wrote:

> Then you have two choices:
> 1) convert the string as required
> 2) copy the content unconverted, but update the encoding

What do you mean by "you have two choices"? In fact the compiler designer has the choice to implement some behavior:

1) convert the string as required (seems most sensible)
2) copy the content unconverted, but update the encoding (does not seem sensible at all, as with that the static encoding type of the normal target String does not match the dynamic encoding type any more). At other locations the code that the compiler creates will implicitly use the static encoding type (e.g. to decide whether or not a conversion is necessary) and the content will be interpreted wrongly.
3) issue a warning or (better) an error at compile time for any assignment of a RawByteString to a normal String (as conversion is not implemented and not converting leads to unpredictable behavior)
4) raise an exception at runtime when the types don't match (not nice but consistent)

Of course appropriate Delphi quirks modes could influence the compiler in that respect.

> IMO a reasonable decision should take into account the use of the RawByteString type in RTL code, e.g. for concatenation.

The RTL of course needs to perfectly match the compiler. But as both are under construction right now (regarding the behavior with this kind of Strings, however they are called) I think that is easily doable.

> Can you show us your intended code for these functions?

What functions? We are talking compiler behavior. I think I already did write down what I meant (the version with just RawByteString, and not the one with an additional String type of another name, which might be even more attractive).

I can do this again as a matrix instead of the text version I wrote. When assigning such Strings (I hope the monospace is visible in the list):

(The compiler does the test for encoding using the static (compile time) encoding type with normal Strings and the dynamic (in the string record) encoding type value for RawByteString.)

                                              | Source:         |
Target:                                       | normal String   | RawByteString
----------------------------------------------+-----------------+------------------------------------------------------
normal String with the same static encoding   | set pointer     | set pointer (after checking dynamic encoding)
normal String with different static encoding  | call conversion | call conversion (after checking dynamic encoding)
RawByteString (dynamic type ignored)          | set pointer     | set pointer (checking dynamic encoding not necessary)

Note:
- If the static types match (be it Raw or not), just set the pointer.
- The compiler only needs to emit code to check the dynamic type if the source is a RawByteString.
- The dynamic type of the target is ignored by the compiler. Only the conversion function will use it.
- The static type of source and target is not used by the conversion library function. It can work according to the dynamic types and thus just needs to be given the two string variables (pointers) in the call the compiler creates (in assembler object code).
- For a normal String, a mismatch between static and dynamic type (that would be erroneous in DXE as well) can't happen.
- For RawByteString, a normal dynamic type means: this is printable information; and a dynamic type $ (that had been assigned to the string when instantiating) means: this String holds just bytes with no encoding assumed.

-Michael
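The matrix above can be read as a small decision function. The following is only a sketch of the proposal as described in this message, not of what FPC or Delphi actually do. The names `TAssignAction` and `ProposedAction` are hypothetical, and using the System constant `CP_NONE` to mark a variable that is statically a RawByteString is an assumption made for the illustration.

```pascal
program AssignRuleSketch;
{$mode objfpc}{$H+}

type
  // Hypothetical names, introduced only for this illustration.
  TAssignAction = (aaSetPointer, aaCallConversion);

// SourceStaticCP / TargetStaticCP: encoding the compiler sees at compile time.
// SourceDynamicCP: encoding stored in the source's string record.
// Assumption: CP_NONE as a static code page means "declared as RawByteString".
function ProposedAction(SourceStaticCP, SourceDynamicCP,
  TargetStaticCP: TSystemCodePage): TAssignAction;
begin
  if TargetStaticCP = CP_NONE then
    // Target is a RawByteString: it simply takes over the source record.
    Result := aaSetPointer
  else if SourceStaticCP = TargetStaticCP then
    // Same static type on both sides: plain reference assignment.
    Result := aaSetPointer
  else if (SourceStaticCP = CP_NONE) and (SourceDynamicCP = TargetStaticCP) then
    // RawByteString source whose run-time encoding already fits the target.
    Result := aaSetPointer
  else
    // Everything else goes through the conversion routine.
    Result := aaCallConversion;
end;

begin
  // UTF-8 string into a UTF-8 variable.
  WriteLn(ProposedAction(CP_UTF8, CP_UTF8, CP_UTF8));
  // UTF-8 string into a code page 1252 variable.
  WriteLn(ProposedAction(CP_UTF8, CP_UTF8, 1252));
  // RawByteString currently holding UTF-8 into a UTF-8 variable.
  WriteLn(ProposedAction(CP_NONE, CP_UTF8, CP_UTF8));
  // RawByteString currently holding UTF-8 into a code page 1252 variable.
  WriteLn(ProposedAction(CP_NONE, CP_UTF8, 1252));
end.
```

Only the third branch needs a run-time check, which matches the note that the compiler has to inspect the dynamic type only when the source is a RawByteString.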
Re: [fpc-devel] Performance of string handling in trunk
On 06/27/2013 12:26 PM, Marco van de Voort wrote:

> That already has been decided: everything Delphi compatible. I was just speaking of a hypothetical case.

The starting point of the discussion was the possibility to improve the compiler/library and potentially introduce mode settings that introduce a different (better) behavior. I don't suppose there is any documentation on fpc yet that exactly describes the behavior when assigning a RawByteString to a normal String (in DXE I did not see such documentation either).

> In general, it has been proven time and time again that deviating from Delphi is nearly always the worse choice.

Here the deviation is just doing a decent implementation for a case that is deprecated and not decently defined in the Delphi docs, and for which a decent behavior can easily be defined and implemented.

> I don't think a quirks mode would be useful. It is not just syntax,

Yes, it is just syntax, as what I suggest to implement concerns a statement that is deprecated in DXE (e.g. myString := myRawByteString).

> In general the average code doesn't really honour such fine differences, so in practice this doesn't matter.

This is very wrong IMHO. It offers the possibility to create functions that can be fed with any (normal) type of String and act on this without doing a conversion.

A prominent example is TStringList. I have no idea how it is implemented in DXE, but using decent RawByteStrings it can be implemented in a way that can be used with all strings without a severe performance hit.

Another example is the Lazarus LCL, which could provide a user interface done with RawByteStrings and thus allow the user to use any encoding that is optimal for his application, while the LCL internally can use the string type that best matches the underlying OS. The conversion, if necessary, would automatically be done at a decent point in the work.

The performance hit would be minimal, as the compiler only needs to generate additional code (compared to using the same string type everywhere) when a normal String and a RawByteString meet in a single statement. Given that LCL calls are not done in tight loops, this would be close to zero.

-Michael
Re: [fpc-devel] Performance of string handling in trunk
On 06/26/2013 08:01 PM, Sven Barth wrote:

> The RTL already uses RawByteString for the concatenation helpers.

Does this code do an assignment of a RawByteString to a normal String whose type does not already match (and thus create erroneous Strings)? I would not suppose so. If it does not, it would be compatible with the suggested modification.

-Michael
Re: [fpc-devel] Performance of string handling in trunk
On 06/26/2013 10:28 PM, Hans-Peter Diettrich wrote:

> Please note that I invited Michael Schnell to provide his version of such RTL routines, compatible with *his* ideas about better string handling.

I would be happy to do this, but unfortunately the modified behavior would need to be implemented in the compiler and I don't dare to touch that code. The conversion library function would not be affected.

-Michael
Re: [fpc-devel] Performance of string handling in trunk
On 27.06.2013 13:12, Michael Schnell wrote:

> A prominent example is TStringList. I have no idea how it is implemented in DXE, but using decent RawByteStrings it can be implemented in a way that can be used with all strings without a severe performance hit.

Delphi uses String as the type for TStringList, and thus with Delphi 2009 and newer this is UnicodeString.

Regards,
Sven
Re: [fpc-devel] Performance of string handling in trunk
On 06/26/2013 06:19 PM, Hans-Peter Diettrich wrote:

> A string variable has no encoding type stored. Only non-empty strings have an encoding.

Sorry for the bad wording. Not the String variable itself (as it is just a pointer to the String record) but the String record it points to has the field for storing the dynamic encoding type. The string variable of course only has its static encoding type at compile time.

Nonetheless you need to know that a string variable with one (normal) encoding type that points to a String record that holds a different encoding type should never happen and might trigger unpredictable behavior.

> No string can have an encoding of $.

Why then is it defined as a constant?

I assume that when creating a RawByteString variable and not assigning a normal string to it, it will point to a String record holding an encoding type $, but I might be wrong regarding this implementation detail. You might use DXE and test:

  var s: RawByteString;
  ...
  SetLength(s, 10); // force allocation of a String record
  ...

This only gets interesting when using RawByteString not in the way we discussed right now but according to what the name suggests, but this is yet another issue.

-Michael
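The suggested test can be written without poking at the hidden record, assuming a compiler that provides the System function `StringCodePage` (code-page-aware FPC versions and Delphi 2009 and newer do). This is a minimal probe, not a statement of what the result will be; what it prints depends on the compiler and platform.

```pascal
program RawCodePageProbe;
{$mode objfpc}{$H+}

var
  s: RawByteString;
begin
  // An empty string is a nil pointer, so there is no string record
  // (and hence no stored encoding) to look at yet.
  WriteLn('empty string is nil: ', Pointer(s) = nil);

  SetLength(s, 10); // force allocation of a string record

  // Report the dynamic encoding stored in the freshly allocated record.
  WriteLn('code page after SetLength: ', StringCodePage(s));
end.
```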
Re: [fpc-devel] Performance of string handling in trunk
On 06/27/2013 01:24 PM, Sven Barth wrote:

> Delphi uses String as the type for TStringList, and thus with Delphi 2009 and newer this is UnicodeString.

I did assume this.

As I don't have a new Delphi, I in turn don't know what exactly UnicodeString means. From what I read I assume this means System Encoding, and this again means UTF-16 or UCS2.

And if all this is true, when storing a (say) UTF-8 String in a stringlist and later retrieving it into a String variable with encoding type UTF-8, a dual conversion is done.

To me this seems absolutely silly.

It might be acceptable with Delphi, which is Windows-centric and in fact deprecates the use of encodings other than the System Encoding.

With a cross-platform tool such as fpc, a smarter (though compatible) implementation should be provided.

-Michael
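The "dual conversion" can be made visible with a short sketch. It does not use TStringList itself; a plain `UnicodeString` variable stands in for whatever a UnicodeString-based list would store, which is an assumption for the sake of a self-contained example. `UTF8Encode` and `UTF8Decode` are existing System routines.

```pascal
program DualConversionSketch;
{$mode objfpc}{$H+}

var
  u8In, u8Out: UTF8String;
  stored: UnicodeString; // stand-in for the list's internal storage
begin
  // Build a UTF-8 string at run time.
  u8In := UTF8Encode(UnicodeString('Test'));

  // Conversion 1: UTF-8 -> UTF-16 when the string is stored.
  stored := UTF8Decode(u8In);

  // Conversion 2: UTF-16 -> UTF-8 when the string is retrieved.
  u8Out := UTF8Encode(stored);

  WriteLn('content preserved:  ', u8In = u8Out);
  WriteLn('same memory block:  ', Pointer(u8In) = Pointer(u8Out));
end.
```

The content survives the round trip, but the retrieved string is a newly built copy rather than a shared reference, which is the cost being objected to here.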
Re: [fpc-devel] Performance of string handling in trunk
In our previous episode, Michael Schnell said:

> As I don't have a new Delphi, I in turn don't know what exactly UnicodeString means.

utf16, as has been said hundreds of times, and can be seen in thousands of locations on the web. If you don't get these essential features, then all this discussion is useless.

> From what I read I assume this means System Encoding, and this again means UTF-16 or UCS2.

That's a bad term. Windows has THREE system encodings: OEM for the legacy console, one (called ANSI) for 1-byte types, and UNICODE, which means UTF16 (UCS2 on NT4 and Win2000).

> And if all this is true, when storing a (say) UTF-8 String in a stringlist and later retrieving it into a String variable with encoding type UTF-8, a dual conversion is done.

Yes.

> To me this seems absolutely silly.

Correct. Using UTF8 on Windows is silly, as it is not a native string type, and is never used by default.
Re: [fpc-devel] Performance of string handling in trunk
On 06/27/2013 01:48 PM, Marco van de Voort wrote:

>> when storing a (say) UTF-8 String in a stringlist and later retrieving it into a String variable with encoding type UTF-8, a dual conversion is done.
> Yes.
>> To me this seems absolutely silly.
> Correct. Using UTF8 on Windows is silly, as it is not a native string type, and is never used by default.

Yep. But fpc is not Windows-centric, thus it should not force the user to an encoding that is suggested by Windows.

And (at least the definitions in the interface of) TStringList should not be OS or arch-dependent. Thus using a String type that imposes a fixed encoding, or (even worse) one that might change according to the arch/OS setting when compiling, is a rather bad idea, as is imposing an unnecessary dual conversion or forcing the user to use a certain encoding when working with TStringList.

Thus IMHO we do need an appropriately versatile String type and a (decently fast) implementation.

-Michael
Re: [fpc-devel] Performance of string handling in trunk
Michael Schnell wrote:

>> 2) Nothing is copied on an assignment to a string variable, except the reference to the memory object.
> Sorry, I erroneously thought about the variable itself being ref counted, while in fact the variable is a pointer to the (hidden) String management record,

Fine that you finally start familiarizing yourself with the reality of reference counted objects :-)

> which is the ref counted entity and holds the content pointer to the String array.

Now you also should understand that a string variable points directly to the string content; it's usable as PChar(str) without any conversion. The other information about the string resides *before* that address.

DoDi
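A short probe can illustrate this point, assuming an FPC version that offers `StringRefCount` in the System unit. It deliberately avoids reading the hidden header through pointer arithmetic, since its layout differs between compiler versions.

```pascal
program StringPointerSketch;
{$mode objfpc}{$H+}

var
  s1, s2: AnsiString;
begin
  // Build the string at run time so that it lives on the heap
  // and carries a real reference count.
  s1 := 'Te';
  s1 := s1 + 'st';

  s2 := s1; // copies only the reference

  // Both variables hold the same address ...
  WriteLn('same content pointer:       ', Pointer(s1) = Pointer(s2));
  // ... and that address is the first character of the content.
  WriteLn('PChar is the variable value: ', Pointer(PChar(s1)) = Pointer(s1));
  // Reference count and length are read from the data stored before it.
  WriteLn('reference count: ', StringRefCount(s1));
  WriteLn('length:          ', Length(s2));
end.
```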
Re: [fpc-devel] Performance of string handling in trunk
On 06/27/2013 12:54 PM, Hans-Peter Diettrich wrote:

> Now you also should understand that a string variable points directly to the string content; it's usable as PChar(str) without any conversion. The other information about the string resides *before* that address.

I already wrote the testing program I provided you with accordingly some days ago :-)

-Michael
Re: [fpc-devel] Performance of string handling in trunk
In our previous episode, Michael Schnell said:

> Yep. But fpc is not Windows-centric,

These are all discussions that have raged for years, and an implementation was made. Basta.
Re: [fpc-devel] Performance of string handling in trunk
On 06/27/2013 02:52 PM, Marco van de Voort wrote:

> These are all discussions that have raged for years, and an implementation was made. Basta.

As I can't do any patch for the compiler myself, I can't comment on that.

-Michael
Re: [fpc-devel] Performance of string handling in trunk
On 27.06.2013 13:37, Michael Schnell mschn...@lumino.de wrote:

> As I don't have a new Delphi, I in turn don't know what exactly UnicodeString means.

But you do remember that I sent you a list of string types a few days ago?

Regards,
Sven
Re: [fpc-devel] Performance of string handling in trunk
On 06/27/2013 05:22 PM, Sven Barth wrote:

> But you do remember that I sent you a list of string types a few days ago?

I just wanted to avoid stating something that might be wrong :-[

-Michael
Re: [fpc-devel] Performance of string handling in trunk
On 06/25/2013 01:25 PM, Hans-Peter Diettrich wrote:

>> 8 or 16 bit codes? In Delphi XE this seems to be 16 bit, in Delphi 7 and the currently released Lazarus this seems to be 8 bits.
> Please read before confusing everything.

Sorry that I maybe did not phrase my question/request appropriately: I am interested in clear terms for use in a discussion. To demonstrate that, I just wanted to show that the same language keywords are used with different meanings in different versions of the compilers/libraries.

> Your recent messages still indicate that you never understood even string basics. Why don't you start adjusting your weird mind to the facts, as have been given repeatedly since years? :-(

Sorry again. We are (still) discussing how an implementation in fpc can potentially be better than that in DXE. Thus the basics and the facts are not really of interest.

-Michael
Re: [fpc-devel] Performance of string handling in trunk
On 06/25/2013 01:19 PM, Hans-Peter Diettrich wrote:

> This is not the case :-( A variable can not force a conversion, when a RawByteString is assigned to it :-(

I suppose you decently tested this with the newest version of Delphi XE. I can't comment, as I don't have DXE. :-(

But you and the docs state that RawByteString is not intended to hold a string of unencoded raw bytes that never are supposed to represent printable characters, but, despite its name, is (e.g.) supposed to be used as a formal parameter to which a String holding printable characters with a known encoding is assigned. Thus I in fact fail to see the sense of its existence if the information about the encoding type of the string assigned (without conversion) to it really is lost.

OTOH it seems easy to understand and to implement that the dynamic EncodingType tag (which according to the docs exists in the string management record together with the ContentPointer, the StringLength and the ReferenceCounter) is updated with that information during the assignment (in the same way as ContentPointer and StringLength).

This would allow for decent use of such a type variant and IMHO should be the way to go for fpc. This would be perfectly compatible, even if Delphi does not allow for such usage of the RawByteString type. It would not slow down anything if you don't use that feature, and IMHO the performance hit would be close to zero (and still rather compatible) if stuff like TStringList were implemented using this feature.

Effect:
- Such a TStringList would be able to work with any String type without ever forcing an auto-conversion (unless you check out a string to a variable of a different (static) type).
- Lazarus could use such a string type as its interface to the user code. This would allow for using largely the same code for multiple archs, independent of the user code.

IMHO, the performance hit for this should be small, as these interface functions mostly are not used in very long tight loops.

-Michael
Re: [fpc-devel] Performance of string handling in trunk
On 06/26/2013 09:41 AM, Michael Schnell wrote:

> It shows ... how it is done.

Hi DoDi,

You might be inclined to enhance the test program for me and compile it with DXE.

As far as I understand the encoding type, and as I see in http://wiki.freepascal.org/FPC_Unicode_support, the layout is:

  type
    TRefStringRec = packed record
      Encoding: word;      // encoding of string
      ElementSize: byte;   // size in bytes of string's element (1-4)
      Ref: SizeInt;        // number of references
      Len: SizeInt;        // number of elements in string
    end;

(In fact I suppose that a dummy byte is inserted to prevent the SizeInt fields from being misaligned.)

The encoding type information should be just before the ref counter, and thus adding something like

  v1 := PInteger(j1-12);
  v2 := PInteger(j2-12);

and printing this in hex should show this information.

Now you could test in the newest DXE version what happens when assigning a normal string to a RawByteString and vice versa.

Thanks for helping out...

-Michael
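The same experiment can be sketched without hard-coded offsets such as -12, which depend on pointer size and compiler version. The sketch assumes a compiler with the System function `StringCodePage` and support for `AnsiString(codepage)` declarations; the type name `TLatin1String` is made up for this example. It only reports what the compiler in use does and makes no claim about the outcome.

```pascal
program RawAssignProbe;
{$mode objfpc}{$H+}

type
  // Hypothetical name for a string type with a fixed ISO-8859-1 encoding.
  TLatin1String = type AnsiString(28591);

var
  u8: UTF8String;
  raw: RawByteString;
  l1: TLatin1String;
begin
  // Build a UTF-8 string at run time.
  u8 := UTF8Encode(UnicodeString('Test'));

  // Normal String -> RawByteString.
  raw := u8;
  WriteLn('raw: code page ', StringCodePage(raw),
          ', shares memory with u8: ', Pointer(raw) = Pointer(u8));

  // RawByteString -> normal String with a different static encoding.
  l1 := raw;
  WriteLn('l1:  code page ', StringCodePage(l1),
          ', shares memory with raw: ', Pointer(l1) = Pointer(raw));
end.
```

If the second line reports a code page that differs from 28591, the variable holds exactly the kind of static/dynamic mismatch this thread is arguing about.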
Re: [fpc-devel] Performance of string handling in trunk
On 26.06.2013 09:41, Michael Schnell wrote:

> On 06/25/2013 01:20 PM, Hans-Peter Diettrich wrote:
>> Michael Schnell wrote:
>>> Supposedly the length and encoding number and code-bytecount is copied, too.
>> Please understand reference counted memory objects :-]
> Please check this program I tested with a pre-Unicode Delphi. It shows that (of course) the string length gets copied when assigning a string variable to another and how it is done.

You do know that s2 will point to the same record as s1 after the assignment? The contents of the string record are not copied, only the pointer of s2 will change. See this example:

=== code begin ===

program tstrassign;

{$apptype console}

{$ifdef fpc}
  {$H+}
{$endif}

{$ifndef fpc}
uses
  SysUtils;

function hexstr(ptr: Pointer): String;
begin
  Result := IntToHex(Integer(ptr), 8);
end;
{$endif}

var
  s1, s2: String;
begin
  s1 := 'Test';
  Writeln(hexstr(Pointer(s1)), ' ', hexstr(Pointer(s2)));
  s2 := s1;
  Writeln(hexstr(Pointer(s1)), ' ', hexstr(Pointer(s2)));
end.

=== code end ===

Regards,
Sven
Re: [fpc-devel] Performance of string handling in trunk
Michael Schnell wrote:

> On 06/25/2013 01:20 PM, Hans-Peter Diettrich wrote:
>> Michael Schnell wrote:
>>> Supposedly the length and encoding number and code-bytecount is copied, too.
>> Please understand reference counted memory objects :-]
> Please check this program I tested with a pre-Unicode Delphi. It shows that (of course) the string length gets copied when assigning a string variable to another and how it is done.

I don't see how this is checked by your code.

After an assignment both strings refer to the same memory, i.e. pchar(s1)=pchar(s2). Everything else indicates an error, somewhere.

DoDi
Re: [fpc-devel] Performance of string handling in trunk
On 06/26/2013 12:13 PM, Sven Barth wrote:

> You do know that s2 will point to the same record as s1 after the assignment? The contents of the string record are not copied, only the pointer of s2 will change. See this example:

You are right (my testing program in pre-Unicode Delphi does show exactly this). But what I wanted to show is that more is done here than just managing the reference count.

Regarding the dispute I had with DoDi, he thus is right that the length is not exactly _copied_over_, but the pointer is managed in a way that the same length (and content) is shown. (I admit that he is correct calling this just reference counting.)

Regarding the underlying discussion about RawByteString: if exactly this is done when assigning a normal String variable to a RawByteString variable, exactly what I suppose (and DoDi seems to deny) happens: the dynamic encoding type of the RawByteString (target) will be set to the encoding type of the normal String (source). Thus the encoding type is _not_ lost, and (in principle) when assigning a RawByteString to a normal String, the library would be able to check the actual dynamic encoding type of the source against the (static=dynamic) encoding type of the target and do a conversion if appropriate.

IMHO this would be a very sensible behavior.

-Michael
Re: [fpc-devel] Performance of string handling in trunk
On 06/26/2013 12:05 PM, Hans-Peter Diettrich wrote:

> After an assignment both strings refer to the same memory, i.e. pchar(s1)=pchar(s2). Everything else indicates an error, somewhere.

This is exactly what I wanted to show: it results in ContentPointer, StringLength and ReferenceCount (plus, if no auto-conversion is done, supposedly EncodingType and ElementSize in DXE) being identical for both strings after the assignment. Thus a RawByteString supposedly will in fact get the source's encoding type (see my mail to Sven).

-Michael
Re: [fpc-devel] Performance of string handling in trunk
On 26.06.2013 12:38, Michael Schnell wrote:

> On 06/26/2013 12:13 PM, Sven Barth wrote:
>> You do know that s2 will point to the same record as s1 after the assignment? The contents of the string record are not copied, only the pointer of s2 will change. See this example:
> You are right (my testing program in pre-Unicode Delphi does show exactly this). But what I wanted to show is that more is done here than just managing the reference count.
>
> Regarding the dispute I had with DoDi, he thus is right that the length is not exactly _copied_over_, but the pointer is managed in a way that the same length (and content) is shown. (I admit that he is correct calling this just reference counting.)
>
> Regarding the underlying discussion about RawByteString: if exactly this is done when assigning a normal String variable to a RawByteString variable, exactly what I suppose (and DoDi seems to deny) happens: the dynamic encoding type of the RawByteString (target) will be set to the encoding type of the normal String (source). Thus the encoding type is _not_ lost, and (in principle) when assigning a RawByteString to a normal String, the library would be able to check the actual dynamic encoding type of the source against the (static=dynamic) encoding type of the target and do a conversion if appropriate.
>
> IMHO this would be a very sensible behavior.

The whole point of RawByteString is that the encoding is kept. For all other string types the content will be converted. See also compare_defs_ext in compiler/defcmp.pas around line 445 (look for "don't convert ansistrings").

Regards,
Sven
Re: [fpc-devel] Performance of string handling in trunk
BTW. I think the implementation would be quite easy, straightforward, fast and compatible.

- The compiler knows the static encoding type of each string variable.
- The dynamic encoding type of a String is preset to the static encoding type when the string is allocated.
- Only RawByteStrings (EncodingType $FFFF) are allowed to change their dynamic encoding type; with other Strings this will lead to unpredictable results.

When Strings are assigned:
- If the static encoding type of source and target is identical (be it normal or RAW; already checked by the compiler), the same happens as with the pre-Unicode compiler (setting the pointer to the StringRecord and managing the RefCount).
Otherwise:
- If the target is statically defined as RawByteString (already checked by the compiler), the same happens.
- If the source is statically defined as RawByteString (already checked by the compiler), code is generated that checks whether the dynamic encoding of the source is identical to the (known to the compiler) static encoding type of the target. If it is, the same happens.
- Otherwise the conversion library is called. It checks the _dynamic_ encoding type of source and target (thus it only needs to be provided with the Strings themselves and no additional information generated by the compiler) and does the conversion appropriately.

When doing an operation on two Strings (such as + and compare), one of the operands is (virtually) copied to a String with the same encoding type as the other. Here:
- If one operand is a RawByteString, use the (static or dynamic) encoding of the other.
- If both are RawByteStrings, use the dynamic encoding of one of them (supposedly this is not a different case from the one before).

If the conversion library sees a dynamic encoding type of $FFFF for either source or target, it will fail and raise an exception.

IMHO it makes much more sense to implement things like TStringList on the basis of RawByteString, as when doing it based on the default System encoding, there will be a double conversion when using it with any other encoding type. IMHO big, commonly used, arch-independent, non-super-high-performance libraries (like LCL) should use RawByteString in their user interface and internally as widely as possible, so that conversions are avoided whenever possible (e.g. when the user's call provides a string and during the work in the library it is decided that it is not actually used).

-Michael (the weird one)
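The rule proposed above for "RawByteString assigned to a normal String" can be sketched in user code. This is only an illustration of the proposal, not of what the compiler or RTL does: RawToUTF8 is a hypothetical helper, UTF8String merely stands in for "a string type with a fixed static encoding", and the exception for code page $FFFF follows the last point of the proposal.

```pascal
program CheckedAssign;
{$mode objfpc}{$H+}
uses
  {$ifdef unix}cwstring,{$endif}  // widestring manager for real conversions on Unix
  SysUtils;

// Hypothetical helper: take over the reference when the dynamic encoding
// already matches the target's static encoding, convert otherwise.
function RawToUTF8(const Source: RawByteString): UTF8String;
var
  Tmp: RawByteString;
begin
  Result := '';
  if Length(Source) = 0 then
    Exit;
  if StringCodePage(Source) = CP_NONE then
    raise Exception.Create('source has no usable dynamic encoding');
  Tmp := Source;                       // reference only
  if StringCodePage(Tmp) <> CP_UTF8 then
    SetCodePage(Tmp, CP_UTF8, True);   // convert the content to UTF-8
  Result := Tmp;                       // RawByteString -> UTF8String: no conversion
end;

var
  R: RawByteString;
  U: UTF8String;
begin
  SetLength(R, 1);
  R[1] := #$E9;                        // e-acute in code page 1252
  SetCodePage(R, 1252, False);         // label the byte, do not convert it
  U := RawToUTF8(R);
  WriteLn('code page: ', StringCodePage(U), ', length in bytes: ', Length(U));
end.
```

The check costs one comparison in the matching case, which is the argument made above for doing it inline.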
Re: [fpc-devel] Performance of string handling in trunk
On 06/26/2013 01:40 PM, Sven Barth wrote:
> It's the whole use of RawByteString that the encoding is kept. For all other string types the content will be converted

That is what I assumed, but I understood dodi as suggesting that it is not possible (with normal means such as assigning to another String) to make use of the encoding type of a String that had been assigned to a RawByteString.

Thanks for the confirmation.

-Michael
Re: [fpc-devel] Performance of string handling in trunk
On 26.06.2013 13:59, Michael Schnell wrote:
> BTW. I think the implementation would be quite easy, straight forward, fast and compatible. [...]

See my previously sent answer...

Regards, Sven
Re: [fpc-devel] Performance of string handling in trunk
On 26.06.2013 14:02, Michael Schnell wrote:
> On 06/26/2013 01:40 PM, Sven Barth wrote:
>> It's the whole use of RawByteString that the encoding is kept. For all other string types the content will be converted
>
> That is what I did assume, but I understood dodi in a way that he suggested that it (with normal means such as assigning to another String) is not possible to make use of the encoding type of a String information that had been assigned to a RawByteString.

*sigh* See here: http://docwiki.embarcadero.com/VCL/XE/de/System.StringCodePage

Regards, Sven
Re: [fpc-devel] Performance of string handling in trunk
On 06/26/2013 02:08 PM, Sven Barth wrote:
> On 26.06.2013 14:02, Michael Schnell wrote:
>> That is what I did assume, but I understood dodi in a way that he suggested that it (with normal means such as assigning to another String) is not possible to make use of the encoding type of a String information that had been assigned to a RawByteString.
>
> *sigh* See here: http://docwiki.embarcadero.com/VCL/XE/de/System.StringCodePage

Sorry, I don't see what this (very sloppily worded) page (which I of course knew) should tell me about the points in question:
- static (known to the compiler) vs. dynamic (stored with the string) encoding type
- how the compiler and library handle RawByteString as source and/or target of an assignment.

-Michael
Re: [fpc-devel] Performance of string handling in trunk
On 06/26/2013 02:59 PM, Sven Barth wrote:
> It's a counter argument to "it is not possible to make use of the encoding type of a String information that had been assigned to a RawByteString". This function returns the current code page of the string. And using SetCodePage you can force a conversion. I don't see the problem.

(1) fpc does not need to closely follow Delphi with stuff that is only seldom used by average application programmers, if there are decent reasons to do another (better) decently compatible implementation.

(2) According to the description, StringCodePage returns the code page. According to your wording it returns the current code page. With normal strings the _current_ (aka dynamic) code page is always identical to the code page the string was given (by the compiler) when instantiated. (Otherwise the string would be mismatched or erroneous and would behave erratically.) That is why the (sloppy) wording of the description omits this difference.

As you stated before, the RawByteString _does_ preserve the encoding type of the information that is assigned to it. It can only do this using its dynamic EncodingType field. Thus it makes sense that the function returns the dynamic EncodingType for RawByteStrings. Thus it might simply always return the dynamic EncodingType.

And this is exactly the information that (IMHO) should be used when auto-converting, with the only exception being an assignment _to_ a RawByteString (_static_ encoding type $FFFF). That can easily be decided by the compiler at compile time (which here, and in many other cases, does not even need to call the library, as assigning is simply done by setting the pointer and increasing the RefCount), which IMHO should be done inline.

-Michael
Re: [fpc-devel] Performance of string handling in trunk
In our previous episode, Michael Schnell said:
> If the RawByteString variable already has a dynamic encoding type other than $FFFF a conversion might or might not be necessary.

There never is a conversion when assigning to/from rawbytestring, so this is a strange line.
Re: [fpc-devel] Performance of string handling in trunk
On 06/26/2013 03:44 PM, Marco van de Voort wrote:
> There never is a conversion when assigning to/from rawbytestring, so this is a strange line.

Sven replied to my contribution, which suggested an implementation that does in fact do a conversion, when appropriate, on an assignment from a RawByteString to a normal String. What _is_ done (in DXE) is not discussed here (other than as a subject of comparison).

-Michael
Re: [fpc-devel] Performance of string handling in trunk
On 06/26/2013 03:44 PM, Marco van de Voort wrote:
> There never is a conversion when assigning to/from rawbytestring,

So what do you suggest should happen when assigning a RawByteString to a normal String? The result could be a strange thing that is encoded other than the type requires. To me this behavior is a quirk and should not be kept just for compatibility.

-Michael
Re: [fpc-devel] Performance of string handling in trunk
In our previous episode, Michael Schnell said:
> On 06/26/2013 03:44 PM, Marco van de Voort wrote:
>> There never is a conversion when assigning to/from rawbytestring, so this is a strange line.
>
> Sven replied to my contribution that suggested an implementation that in fact does a conversion when doing an assignment from a RawByteString to a normal String when appropriate. The what _is_ (in DXE) is not discussed here (other than as a subject of comparing).

I was thinking about the below code that returns 1 in Windows and 0 in Linux. Especially the Windows answer is interesting. The Linux result can probably be explained by non-implementation of the Windows-specific OEM codepage concept.

    {$mode delphiunicode}
    const cp_oemcp=1;
    type oemstring = type ansistring(cp_OEMCP);

    function xx:ansistring;
    var nn:rawbytestring;
    begin
      setlength(nn,1);
      nn[1]:=#121;
      setcodepage(nn,cp_oemcp);
      result:=nn;
    end;

    var v : ansistring;
    begin
      v:=xx;
      writeln(stringcodepage(v));
    end.
Re: [fpc-devel] Performance of string handling in trunk
In our previous episode, Michael Schnell said:
>> There never is a conversion when assigning to/from rawbytestring,
>
> So what do you suggest should happen when assigning a RawByteString to a normal String ?

Should is a complex thing here, since there is no implementation to test with (and see if it has other consequences). I assume a conversion should be inserted, so at least for non rawbytestrings the runtime encoding always matches the compiletime one.

> The result could be a strange thing that is encoded other than the type requires. To me this behavior is a quirk and should not be kept just for compatibility.

The whole concept is about compatibility, and that is a race that has already been run.
Re: [fpc-devel] Performance of string handling in trunk
Sven Barth wrote:
> On 26.06.2013 14:02, Michael Schnell wrote:
>> On 06/26/2013 01:40 PM, Sven Barth wrote:
>>> It's the whole use of RawByteString that the encoding is kept. For all other string types the content will be converted
>>
>> That is what I did assume, but I understood dodi in a way that he suggested that it (with normal means such as assigning to another String) is not possible to make use of the encoding type of a String information that had been assigned to a RawByteString.
>
> *sigh*

+1 ;-)

> See here: http://docwiki.embarcadero.com/VCL/XE/de/System.StringCodePage

The documentation is not complete. Empty strings have no associated string record, thus no encoding; StringCodePage always returns CP_ACP for empty strings. This means that Delphi offers no means to determine the static (declared) type of an AnsiString variable (except by RTTI?).

This also requires compiler magic on string assignments, so that the static encoding of the target variable can be determined and used to force a conversion if required, even if the target is an empty string. This magic seems to be buggy, or inconsistent at least, as observed in my test programs.

When a RawByteString is assigned to an AnsiString variable, both variables refer to the same string memory. Afterwards the AnsiString can show strange behaviour, as long as it retains a foreign encoding :-(

From ms-help://embarcadero.rs_xe/rad/String_Types.html:
"The RawByteString type is type AnsiString($FFFF). RawByteString enables the passing of string data of any code page without doing any code page conversions. RawByteString should only be used as a const or value type parameter or a return type from a function. It should never be passed by reference (passed by var), and should never be instantiated as a variable."

I'd extend this warning: a RawByteString should never be assigned to an AnsiString variable, because the behaviour of that variable becomes almost unpredictable then. [Unless some newer Delphi version fixes this flaw]

WRT performance, FPC can make use of that undefined behaviour, and create the most performant code by not checking and handling the aforementioned situations. Or FPC can implement more consistent behaviour (to be defined).

DoDi
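The point about empty strings can be checked with a few lines. This is a sketch for FPC, not a statement about Delphi: an empty string is a nil pointer, so there is no record to read a code page from, and StringCodePage has to fall back to some default. No particular output is claimed here; the printed values depend on the RTL version and the system.

```pascal
program EmptyCodePage;
{$mode objfpc}{$H+}
var
  U: UTF8String;
begin
  U := '';
  WriteLn('empty string is a nil pointer:  ', Pointer(U) = nil);
  WriteLn('code page reported while empty: ', StringCodePage(U));
  WriteLn('DefaultSystemCodePage:          ', DefaultSystemCodePage);
  U := 'x';
  WriteLn('code page reported when filled: ', StringCodePage(U));
end.
```

If the value reported for the empty UTF8String differs from the one reported after filling it, the declared encoding of the variable is indeed not recoverable at run time from an empty string.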
Re: [fpc-devel] Performance of string handling in trunk
Michael Schnell wrote:
> On 06/26/2013 12:05 PM, Hans-Peter Diettrich wrote:
>> After an assignment both strings refer to the same memory, i.e. pchar(s1)=pchar(s2). Everything else indicates an error, somewhere.
>
> This is exactly what I wanted to show: it results in ContentPointer, StringLength, ReferenceCount (plus, if no auto-conversion is done, supposedly EncodingType and ElementSize in DXE) being identical for both strings after the assignment. (Thus a RawByteString supposedly will in fact get the source's encoding type.)

1) AnsiString has no ContentPointer.
2) Nothing is copied on an assignment to a string variable, except the reference to the memory object.

DoDi
Re: [fpc-devel] Performance of string handling in trunk
Michael Schnell wrote:
> On 06/26/2013 03:44 PM, Marco van de Voort wrote:
>> There never is a conversion when assigning to/from rawbytestring,
>
> So what do you suggest should happen when assigning a RawByteString to a normal String ? The result could be a strange thing that is encoded other than the type requires. To me this behavior is a quirk and should not be kept just for compatibility.

Then you have two choices:
1) convert the string as required
2) copy the content unconverted, but update the encoding

IMO a reasonable decision should take into account the use of the RawByteString type in RTL code, e.g. for concatenation. Can you show us your intended code for these functions?

DoDi
Re: [fpc-devel] Performance of string handling in trunk
On 26.06.2013 18:30, Hans-Peter Diettrich drdiettri...@aol.com wrote:
> Michael Schnell wrote:
>> On 06/26/2013 03:44 PM, Marco van de Voort wrote:
>>> There never is a conversion when assigning to/from rawbytestring,
>>
>> So what do you suggest should happen when assigning a RawByteString to a normal String ? The result could be a strange thing that is encoded other than the type requires. To me this behavior is a quirk and should not be kept just for compatibility.
>
> Then you have two choices:
> 1) convert the string as required
> 2) copy the content unconverted, but update the encoding
>
> IMO a reasonable decision should take into account the use of the RawByteString type in RTL code, e.g. for concatenation.

The RTL already uses RawByteString for the concatenation helpers.

Regards, Sven
Re: [fpc-devel] Performance of string handling in trunk
Sven Barth wrote:
>> IMO a reasonable decision should take into account the use of the RawByteString type in RTL code, e.g. for concatenation.
>
> The RTL already uses RawByteString for the concatenation helpers.

This means that the assumptions implied by that code have to be matched by the RawByteString implementation, or the code must be changed when the RawByteString implementation is changed.

Please note that I invited Michael Schnell to provide his version of such RTL routines, compatible with *his* ideas about better string handling. Any suggestion for deviating from the Delphi implementation/behaviour deserves a proof that it is useful in the required low-level string manipulation functions. Only after this step can we decide for what *other* purposes RawByteString can be used in user code.

DoDi
Re: [fpc-devel] Performance of string handling in trunk
2013/6/21 Sergei Gorelkin sergei_gorel...@mail.ru:
> I've profiled the code and found no conversions taking place. All the slowdown appears to be caused by other reasons, hard to tell the topmost contributor. What catches the eye is the large amount of calls to UniqueString, and the fact that SetCodePage goes through implicit try..finally block even if it does not need to convert the string.

It seems that Florian changed SetCodePage to avoid the implicit try..finally. It improved the performance slightly, but it is still a lot slower than 2.6.X. See: http://forum.lazarus.freepascal.org/index.php/topic,21223.msg124551.html#msg124551

Luiz
Re: [fpc-devel] Performance of string handling in trunk
On 06/25/2013 01:05 AM, Hans-Peter Diettrich wrote:
> A RawByteString can obtain any encoding, so no conversions are required. But when assigned back to an UnicodeString, the obtained encoding is used to convert the string.

That sounds good. The name RAW just misled me to think it would not hold a character encoding.

If this in fact is a completely dynamically encoded string type, things like TStringList should use it in their interface, thus preventing all conversions when a string of any encoding type is stored there and retrieved to a variable of the appropriate dedicated encoding type (while being auto-converted if retrieved to a variable forcing a different encoding).

Only the documentation http://docwiki.embarcadero.com/Libraries/XE4/en/System.RawByteString shows that they seemingly are not convinced that all this works decently :-( .

So a decent system should _additionally_ provide completely unencoded 8, 16, 32 and 64 bit entity Strings for technical usage (similar to pipes etc.), now not using the RAW naming :-) .

-Michael
Re: [fpc-devel] Performance of string handling in trunk
On 06/25/2013 01:05 AM, Hans-Peter Diettrich wrote:
> In fact it looks like only the string pointers are copied between AnsiString and RawByteString, with the refcount changed accordingly.

Supposedly the length and encoding number and code-bytecount are copied, too.

-Michael
Re: [fpc-devel] Performance of string handling in trunk
On 06/24/2013 08:21 PM, Sven Barth wrote:
> AnsiString: up to 2^23-1 characters, reference counted, system encoding (determined by the code page at compilation time AFAIK)

8 or 16 bit codes? In Delphi XE this seems to be 16 bit, in Delphi 7 and the currently released Lazarus this seems to be 8 bits.

In fact I did ask for a way to distinguish all this verbally (not by the keywords in a source file) to allow for an unambiguous discussion. This needs names that denote the version of the library used.

-Michael
Re: [fpc-devel] Performance of string handling in trunk
In our previous episode, Sven Barth said:
> AnsiString: up to 2^23-1 characters, reference counted, system encoding (determined by the code page at compilation time AFAIK)

(2^31-1 obviously, since it is a 32-bit variable, but many operations use signed types)

> WideString
> - on non-Windows: same as UnicodeString
> - on Windows: up to 2^23-1 characters (?), non reference counted (but managed by OS), UTF-16 encoding

(before Win2000, UCS2)

> String:
> - in all modes besides mode delphiunicode or modeswitch unicodestrings with H-: ShortString
> - in all modes besides mode delphiunicode or modeswitch unicodestrings with H+: AnsiString
> - in mode delphiunicode or modeswitch unicodestrings with H+: UnicodeString
> (- don't know whether this is correct: in mode delphiunicode or modeswitch unicodestrings with H-: ShortString)

{$mode delphiunicode}{$H-} results in shortstring, yes (checked by sizeof).

Note that {$mode delphi} and {$mode delphiunicode} also enable {$H+} while e.g. mode objfpc doesn't.
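The sizeof check mentioned above can be repeated like this. It is a minimal sketch, assuming an FPC version that knows {$mode delphiunicode}; the program name is made up. A ShortString variable occupies 256 bytes, while AnsiString and UnicodeString variables are plain pointers, so comparing the first value against SizeOf(Pointer) tells the two cases apart.

```pascal
{$mode delphiunicode}{$H-}
program StringKind;
var
  S: string;
begin
  // 256 would indicate ShortString; a value equal to the pointer size
  // would indicate one of the reference counted string types.
  WriteLn('SizeOf(S):       ', SizeOf(S));
  WriteLn('SizeOf(Pointer): ', SizeOf(Pointer));
end.
```

Removing {$H-} or switching the mode line and recompiling shows how the meaning of "string" changes with the settings listed above.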
Re: [fpc-devel] Performance of string handling in trunk
Michael Schnell wrote:
> On 06/25/2013 01:05 AM, Hans-Peter Diettrich wrote:
>> In fact it looks like only the string pointers are copied between AnsiString and RawByteString, with the refcount changed accordingly.
>
> Supposedly the length and encoding number and code-bytecount is copied, too.

Please understand reference counted memory objects :-]

DoDi
Re: [fpc-devel] Performance of string handling in trunk
Michael Schnell wrote:
> On 06/25/2013 01:05 AM, Hans-Peter Diettrich wrote:
>> A RawByteString can obtain any encoding, so no conversions are required. But when assigned back to an UnicodeString, the obtained encoding is used to convert the string.
>
> That sounds good. The name RAW just misled me to think it would not hold a character encoding. If this in fact is a completely dynamically encoded string type, things like TStringList should use same in their interface, thus preventing all conversions, when a string of any encoding type is stored there and retrieved to a variable of the appropriate dedicated encoding type (while being auto-converted if retrieved to variable forcing a different encoding).

This is not the case :-( A variable cannot force a conversion when a RawByteString is assigned to it :-(

> Only the documentation http://docwiki.embarcadero.com/Libraries/XE4/en/System.RawByteString shows that they seemingly are not convinced that all this decently works :-( .

At least it doesn't work as you expected.

> So a decent system should _additionally_ provide completely unencoded 8, 16, 32 and 64 Bit entity Strings for technical usage (similar to pipes etc) (now not using the RAW naming :-) .

It's *only* the use of strings of different encodings that makes conversions necessary. Efficient code must be based on a single encoding, with conversions only from and to the outer world (OS, files...).

DoDi
Re: [fpc-devel] Performance of string handling in trunk
Michael Schnell wrote:
> On 06/24/2013 08:21 PM, Sven Barth wrote:
>> AnsiString: up to 2^23-1 characters, reference counted, system encoding (determined by the code page at compilation time AFAIK)
>
> 8 or 16 bit codes ? In Delphi XE this seems to be 16 bit, in Delphi 7 and the currently released Lazarus this seems to be 8 Bits.

Please read before confusing everything.

> In fact I did ask for a way to distinguish all this verbally (not the keywords in a source file) to allow for doing a non ambiguous discussion. This needs Names that denote the version of the library used.

Your recent messages still indicate that you never understood even string basics. Why don't you start adjusting your weird mind to the facts, as have been given repeatedly for years? :-(

DoDi
Re: [fpc-devel] Performance of string handling in trunk
On 06/25/2013 01:19 PM, Hans-Peter Diettrich wrote:
> Efficient code must be based on a single encoding, with conversions only from and to the outer world (OS, files...).

That does not rule out intermediately storing a string in something that can hold any encoding type.

-Michael
Re: [fpc-devel] Performance of string handling in trunk
On 06/21/2013 07:43 PM, Sven Barth wrote:
> Just to clear up the names: UnicodeString is *not* the code page aware string type (although they share the metadata record). It is a dynamic length 2 byte string. The code page aware string type is AnsiString.

Thanks for making this clear. Could you give us a list of the different string types (legacy and to be supported) that we might be seeing, including their official names, to make the discussion less ambiguous?

Thanks,
-Michael
Re: [fpc-devel] Performance of string handling in trunk
Michael Schnell wrote:
> Could you give us a list of the different - legacy and to be supported - string types we might be seeing including their official names to make the discussion less ambiguous.

This should have been clear for a long time. I e.g. remember your strange (Delphi incompatible) opinions about RawByteString and encodings in a startup discussion.

DoDi
Re: [fpc-devel] Performance of string handling in trunk
On 06/24/2013 12:43 PM, Hans-Peter Diettrich wrote:
> This should be clear since a long time.

Sorry, but e.g. I don't know the official names of the Delphi 7 compatible String and the Delphi XE compatible String in fpc/Lazarus. I suppose in DXE the Delphi 7 compatible String is not available at all, while I suppose in fpc this String type will still be available when setting appropriate compiler options.

I do know that the Delphi 7 compatible String in fpc has sometimes been called ANSIString, while Lazarus funnily stores UTF8 in the type ANSIString, in spite of the naming. I seem to have read that in Delphi XE the strings are also called ANSIString, even if they work differently from what (the currently released) fpc calls by that name.

So a decent grid would be very helpful.

-Michael
Re: [fpc-devel] Performance of string handling in trunk
In our previous episode, Michael Schnell said:
> I do know that the Delphi 7 compatible String in fpc sometimes has been called ANSIString, while Lazarus funnily stores UTF8 in the type ANSIString, even in spite of the naming.

You can funnily store utf8 in type ansistring under Delphi 7 too.
Re: [fpc-devel] Performance of string handling in trunk
On 06/24/2013 03:11 PM, Marco van de Voort wrote:
> You can funnily store utf8 in type ansistring under Delphi 7 too.

Yep. But D7 does not rely on any string being encoded in UTF8 (it relies on the ANSI table that the system configuration defines), while the LCL API wants to see the strings in UTF8 code. _This_ is funny IMHO.

-Michael
Re: [fpc-devel] Performance of string handling in trunk
Michael Schnell wrote:
> On 06/24/2013 12:43 PM, Hans-Peter Diettrich wrote:
>> I e.g. remember your strange (Delphi incompatible) opinions about RawByteString and encodings in a startup discussion.
>
> Yep. As I did not have DTX to try it, I only read what I could find in the internet and supposedly got it wrong.

Yes, still wrong despite earlier explanations :-(

> I hope, now I understand that the type RawByteString ( = String ($FFFF) ) means codesize = 1 Byte, never to be auto-converted to any differently encoded String type variable.

No. Even if I would like such an encoding, too, Delphi doesn't implement it.

> I seem to understand that DXE does not provide a fully dynamic string type (e.g. to be used as a function parameter taking any String(x) type without auto-conversion). I still do hope that fpc will provide this one day.

This is what RawByteString is for. A RawByteString can have *any* encoding, it's kind of a generic AnsiString. Other AnsiStrings have a *fixed* encoding, which determines any conversions that may be required.

> Moreover I do hope for RawWordString, RawDwordString and RawQWordString.

Not in Delphi. For binary data TBytes has been added.

DoDi
Re: [fpc-devel] Performance of string handling in trunk
On 24-6-2013 17:13, Michael Schnell wrote:
> On 06/24/2013 04:44 PM, Hans-Peter Diettrich wrote:
>> Not in Delphi. For binary data TBytes has been added.
>
> Which (AFAIK) is not reference counted, can't do + and is thus much less versatile.

It is also highly controversial since XE4. For example, there is a good breakdown at http://blog.synopse.info/post/2013/05/11/Delphi-XE4-NextGen-compiler-is-disapointing

This is by no means the only complaint about the latest string, whatever it is supposed to be. ;)

Thaddy
Re: [fpc-devel] Performance of string handling in trunk
On 24.06.2013 11:36, Michael Schnell wrote:
> On 06/21/2013 07:43 PM, Sven Barth wrote:
>> Just to clear up the names: UnicodeString is *not* the code page aware string type (although they share the metadata record). It is a dynamic length 2 byte string. The code page aware string type is AnsiString.
>
> Thanks for making this clear. Could you give us a list of the different - legacy and to be supported - string types we might be seeing including their official names to make the discussion less ambiguous.

ShortString: 255 characters, non reference counted, system encoding

String[X]: same as ShortString with a maximum of X characters

AnsiString: up to 2^23-1 characters, reference counted, system encoding (determined by the code page at compilation time AFAIK)

AnsiString(X): same as AnsiString, but with the specified code page (UTF-16 code pages are not allowed)

RawByteString: basically AnsiString($FFFF) (AFAIK); no code page conversions are done when another AnsiString is assigned (UnicodeString is converted to the currently active code page) and the other way round

UnicodeString: up to 2^23-1 characters, reference counted, UTF-16 encoding

WideString
- on non-Windows: same as UnicodeString
- on Windows: up to 2^23-1 characters (?), non reference counted (but managed by OS), UTF-16 encoding

String:
- in all modes besides mode delphiunicode or modeswitch unicodestrings with H-: ShortString
- in all modes besides mode delphiunicode or modeswitch unicodestrings with H+: AnsiString
- in mode delphiunicode or modeswitch unicodestrings with H+: UnicodeString
(- don't know whether this is correct: in mode delphiunicode or modeswitch unicodestrings with H-: ShortString)

Regards, Sven
Re: [fpc-devel] Performance of string handling in trunk
On 24.06.2013 16:44, Hans-Peter Diettrich wrote:
>> I hope, now I understand that the type RawByteString ( = String ($FFFF) ) means codesize = 1 Byte, never to be auto-converted to any differently encoded String type variable.
>
> No. Even if I would like such an encoding, too, Delphi doesn't implement it.

But he is right. RawByteString is defined in unit system as AnsiString(CP_NONE) where CP_NONE is defined as $FFFF. This means that no conversions to or from a variable of this type are done (or any other AnsiString type that has code page $FFFF).

Regards, Sven
Re: [fpc-devel] Performance of string handling in trunk
Sven Barth wrote:
> On 24.06.2013 16:44, Hans-Peter Diettrich wrote:
>>> I hope, now I understand that the type RawByteString ( = String ($FFFF) ) means codesize = 1 Byte, never to be auto-converted to any differently encoded String type variable.
>>
>> No. Even if I would like such an encoding, too, Delphi doesn't implement it.
>
> But he is right. RawByteString is defined in unit system as AnsiString(CP_NONE), where CP_NONE is defined as $FFFF. This means that no conversions to or from a variable of this type are done (or any other AnsiString type that has code page $FFFF).

Well, after some tests it looks more complicated to me. A RawByteString can take on any encoding, so no conversions are required. But when it is assigned back to a UnicodeString, the encoding it has taken on is used to convert the string.

In fact it looks like only the string pointers are copied between AnsiString and RawByteString, with the refcount changed accordingly. This can lead to strange results (in XE). As soon as an AnsiString has taken on a different encoding, no further conversions seem to occur. Once I copy an OEMString (cp 437) into a RawByteString, and from there into an AnsiString, the AnsiString has OEM encoding. Adding further strings of different code pages to it only results in a concatenation of the strings, without any conversions; the encoding is still reported as OEM.

This means that the encoding of an AnsiString is not guaranteed to be the declared one, not even a unique one! Can somebody test this with a newer Delphi version?

Resetting such an ill-behaved AnsiString seems to require a direct assignment of another AnsiString variable, whereupon the AnsiString returns to its *declared* encoding and resumes any conversions required for that encoding.

DoDi
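The experiment described above (OEM string into RawByteString, then into AnsiString) could be repeated along these lines. This is a sketch under assumptions, not the original test program: TOemString is a made-up type name, and the output is deliberately not predicted here, because the thread itself reports that the observed code pages are the open question and may differ between Delphi XE and FPC.

```pascal
program RawByteSketch;
{$mode objfpc}{$H+}

type
  // Hypothetical name for an AnsiString declared with the OEM code page 437.
  TOemString = type AnsiString(437);

var
  oem:  TOemString;
  raw:  RawByteString;
  ansi: AnsiString;

begin
  oem  := 'abc';
  raw  := oem;    // per the discussion: no conversion, the reference is shared
  ansi := raw;    // does ansi now carry the OEM code page or its declared one?

  // StringCodePage reports the dynamic (run time) code page of each string.
  WriteLn('oem  : ', StringCodePage(oem));
  WriteLn('raw  : ', StringCodePage(raw));
  WriteLn('ansi : ', StringCodePage(ansi));
  WriteLn('default system code page: ', DefaultSystemCodePage);

  // Concatenating a string of another code page, then checking again.
  ansi := ansi + UTF8String('def');
  WriteLn('ansi after concat: ', StringCodePage(ansi));
end.
```

If the last two code pages printed for ansi differ from DefaultSystemCodePage, the string carries a dynamic encoding other than its declared one, which is the behaviour reported for XE.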
Re: [fpc-devel] Performance of string handling in trunk
On 21.06.2013 16:29, Sergei Gorelkin wrote:
> and the fact that SetCodePage goes through an implicit try..finally block even if it does not need to convert the string.

I've fixed this one in r24942.
Re: [fpc-devel] Performance of string handling in trunk
On 06/20/2013 05:31 PM, luiz americo pereira camara wrote:
> The point is that I would expect a smaller performance hit when there's no conversion going on. Something around 10% slower. In the cited case it is more than 50% slower.

As the dynamic types of (most) String variables are already defined and known at compile time, and thus (usually) the library does not need to detect the encoding at run time, the performance hit should be close to zero, as long as the same String encoding is used as in the non-(DXE-compatible)-Unicode project with the same source code.

OTOH, if the former version used 1-byte Strings (ANSI or UTF-8) and the new version used 16 or 32 bit Strings (UTF-16 or UTF-32), I would expect a severe performance hit as well, because more bytes need to be moved and because the cache gets a lot tighter due to the doubled memory usage.

-Michael
Re: [fpc-devel] Performance of string handling in trunk
On 06/20/2013 05:31 PM, luiz americo pereira camara wrote:
> Maybe in that example there's an (unneeded) conversion going on?

If you use the same string type all over the place it would be a severe bug if _any_ conversion is done. Please check.

-Michael
Re: [fpc-devel] Performance of string handling in trunk
Michael Schnell wrote:
> On 06/20/2013 05:31 PM, luiz americo pereira camara wrote:
>> The point is that I would expect a smaller performance hit when there's no conversion going on. Something around 10% slower. In the cited case it is more than 50% slower.
>
> As the dynamic types of (most) String variables are already defined and known at compile time, and thus (usually) the library does not need to detect the encoding at run time, the performance hit should be close to zero, as long as the same String encoding is used as in the non-(DXE-compatible)-Unicode project with the same source code.

Right. Even with RawByteString the test for the same encoding should not take considerable time (compared with memory allocation...). Different encodings require converting both arguments into Unicode, and the result back into the target encoding.

> OTOH, if the former version used 1-byte Strings (ANSI or UTF-8) and the new version used 16 or 32 bit Strings (UTF-16 or UTF-32), I would expect a severe performance hit as well, because more bytes need to be moved and because the cache gets a lot tighter due to the doubled memory usage.

Again I'd assume that the memory allocation for the result is the most expensive operation with UnicodeString operands, independent of string lengths.

DoDi
Re: [fpc-devel] Performance of string handling in trunk
On 06/21/2013 09:54 AM, Hans-Peter Diettrich wrote:
> Again I'd assume that the memory allocation for the result is the most expensive operation with UnicodeString operands, independent of string lengths.

Do you suggest that with UnicodeString - even when using 1-byte encoding types such as ANSIxxx or UTF-8 - the memory allocation is more expensive than with the older String handling implementation? Why? In fact, the roughly 8 additional bytes for the Code-Element-Length and Code-Type definition (in addition to the already existing String-Length, Content-Address and Ref-Count DWords) should not matter at all.

-Michael
Re: [fpc-devel] Performance of string handling in trunk
Michael Schnell wrote:
> On 06/21/2013 09:54 AM, Hans-Peter Diettrich wrote:
>> Again I'd assume that the memory allocation for the result is the most expensive operation with UnicodeString operands, independent of string lengths.
>
> Do you suggest that with UnicodeString - even when using 1-byte encoding types such as ANSIxxx or UTF-8 - the memory allocation is more expensive than with the older String handling implementation?

Please note that I was *not* talking about AnsiStrings.

DoDi
Re: [fpc-devel] Performance of string handling in trunk
2013/6/21 Michael Schnell mschn...@lumino.de:
> On 06/20/2013 05:31 PM, luiz americo pereira camara wrote:
>> Maybe in that example there's an (unneeded) conversion going on?
>
> If you use the same string type all over the place it would be a severe bug if _any_ conversion is done. Please check.

The affected code can be seen here: http://forum.lazarus.freepascal.org/index.php/topic,21223.0.html

I don't have a 2.7.1 setup, so I can't debug it myself. I'm just reporting what the user found.

Luiz
Re: [fpc-devel] Performance of string handling in trunk
On 06/21/2013 02:20 PM, Hans-Peter Diettrich wrote:
> Please note that I was *not* talking about AnsiStrings.

Sorry, I don't understand. I reckon the OP, asking about a performance hit, meant a degradation of the new (Delphi XE compatible) vs the old (Delphi 7 compatible) String library.

-Michael
Re: [fpc-devel] Performance of string handling in trunk
On 21.06.2013 17:11, luiz americo pereira camara wrote:
> 2013/6/21 Michael Schnell mschn...@lumino.de:
>> On 06/20/2013 05:31 PM, luiz americo pereira camara wrote:
>>> Maybe in that example there's an (unneeded) conversion going on?
>>
>> If you use the same string type all over the place it would be a severe bug if _any_ conversion is done. Please check.
>
> The affected code can be seen here: http://forum.lazarus.freepascal.org/index.php/topic,21223.0.html
>
> I don't have a 2.7.1 setup, so I can't debug it myself. I'm just reporting what the user found.

I've profiled the code and found no conversions taking place. All of the slowdown appears to have other causes, and it is hard to tell which is the topmost contributor. What catches the eye is the large number of calls to UniqueString, and the fact that SetCodePage goes through an implicit try..finally block even if it does not need to convert the string.

Regards,
Sergei
Re: [fpc-devel] Performance of string handling in trunk
On 06/21/2013 04:29 PM, Sergei Gorelkin wrote:
> What catches the eye is the large number of calls to UniqueString,

It would be interesting to see whether the old project (without the new Unicode library) makes the same number of UniqueString calls. I don't see why the new library should make more of these calls or why they should be slower (while using the same encoding).

-Michael
Re: [fpc-devel] Performance of string handling in trunk
Michael Schnell wrote:
> On 06/21/2013 02:20 PM, Hans-Peter Diettrich wrote:
>> Please note that I was *not* talking about AnsiStrings.
>
> Sorry, I don't understand.

You snipped the context, which was UnicodeString (second case). The AnsiString case was covered before.

> I reckon the OP, asking about a performance hit, meant a degradation of the new (Delphi XE compatible) vs the old (Delphi 7 compatible) String library.

Right, and there seem to be more issues with the current implementation. E.g. I don't understand the many tests in the RawByteString concatenation, and others found excess try-finally blocks and UniqueString calls. Another reason may be the (old?) TStringList in the test program, possibly using AnsiStrings, which will cause overhead when used with UnicodeStrings.

I haven't done my own research yet; my statements are based on general considerations and assume a perfect implementation.

DoDi
Re: [fpc-devel] Performance of string handling in trunk
On 21.06.2013 10:36, Michael Schnell mschn...@lumino.de wrote:
> On 06/21/2013 09:54 AM, Hans-Peter Diettrich wrote:
>> Again I'd assume that the memory allocation for the result is the most expensive operation with UnicodeString operands, independent of string lengths.
>
> Do you suggest that with UnicodeString - even when using 1-byte encoding types such as ANSIxxx or UTF-8 - the memory allocation is more expensive than with the older String handling implementation?

Just to clear up the names: UnicodeString is *not* the code page aware string type (although they share the metadata record). It is a dynamic length 2 byte string. The code page aware string type is AnsiString.

Regards,
Sven
Re: [fpc-devel] Performance of string handling in trunk
On 20.06.2013 16:15, luiz americo pereira camara wrote:
> I looked at http://forum.lazarus.freepascal.org/index.php/topic,21223.0.html
>
> There's a significant performance drop in fpc trunk. The difference in the generated code is a call to fpc_ansistr_assign and a different implementation of fpc_AnsiStr_Concat.
>
> AFAIK there should be a significant performance hit only when assigning strings with different code pages. That does not seem to be the case here. Is there anything wrong or is this the expected result?

Some slowdown is of course the expected result: it is impossible to add all the code page stuff without a performance impact. Even though conversions happen only when code pages differ, the code which checks the code pages is executed anyway on every operation.

The question is which part of the observed slowdown is unavoidable and which can be eliminated by a more careful implementation.

Regards,
Sergei
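One way to put a number on the overhead discussed above is to time a same-code-page assign and concatenate loop and build it with both compiler versions. The program below is only a rough sketch and is not the code from the forum topic; the iteration count and variable names are arbitrary, and wall-clock timing via Now is coarse, so results should be read as an indication only.

```pascal
program ConcatBench;
{$mode objfpc}{$H+}

uses
  SysUtils, DateUtils;

const
  Iterations = 1000000;  // arbitrary; raise it if the run is too short to measure

var
  s, part: AnsiString;
  i: Integer;
  t0: TDateTime;

begin
  part := 'abc';
  t0 := Now;
  for i := 1 to Iterations do
  begin
    s := '';               // assignment of a constant
    s := s + part + part;  // concatenation of strings with the same code page
  end;
  // Print the length as well so the result of the loop is actually used.
  WriteLn('final length: ', Length(s));
  WriteLn('elapsed ms  : ', MilliSecondsBetween(Now, t0));
end.
```

Since all operands share one code page, no conversion should take place, so any difference between the two builds would come from the per-operation checks and other implementation details rather than from converting data.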
Re: [fpc-devel] Performance of string handling in trunk
2013/6/20 Sergei Gorelkin sergei_gorel...@mail.ru:
> On 20.06.2013 16:15, luiz americo pereira camara wrote:
>> I looked at http://forum.lazarus.freepascal.org/index.php/topic,21223.0.html
>>
>> There's a significant performance drop in fpc trunk. Is there anything wrong or is this the expected result?
>
> Some slowdown is of course the expected result: it is impossible to add all the code page stuff without a performance impact. Even though conversions happen only when code pages differ, the code which checks the code pages is executed anyway on every operation.

I know that. The point is that I would expect a smaller performance hit when there's no conversion going on. Something around 10% slower. In the cited case it is more than 50% slower.

> The question is which part of the observed slowdown is unavoidable and which can be eliminated by a more careful implementation.

Maybe in that example there's an (unneeded) conversion going on?

Luiz
Re: [fpc-devel] Performance of string handling in trunk
On 20.06.2013 19:31, luiz americo pereira camara wrote:
> Maybe in that example there's an (unneeded) conversion going on?

This is possible. One needs to profile the example to tell for sure.

Regards,
Sergei