Re: [fpc-devel] Closures -- debug warning @ ttgobj.FreeTemp
On 09.03.2015 at 14:36, bla...@blaise.ru wrote:
> FPC trunk r30150, compiled with EXTDEBUG, emits a debug warning for the following program:
> --8--
> type T = interface
>   procedure Bar;
> end;
> function Foo: T;
> begin
>   result := nil
> end;
> begin
>   Foo().Bar()
>   // ^-- Warning: tgobj: (FreeTemp) temp at pos -44 is already free !
> end.
> --8--
> Does this indicate a problem in the compiler, or is this warning bogus?

I'd assume that the warning refers to "result := nil" in Foo(). Assign something different and try again to find out more.

DoDi
___
fpc-devel maillist - fpc-devel@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
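The suggestion above ("assign something different") could be tried with a variation like the following. This is only a sketch: TImpl is a hypothetical helper class that is not part of the original report, and whether the EXTDEBUG warning still appears with it is not known here.

```pascal
program tempcheck;
{$mode objfpc}

type
  T = interface
    procedure Bar;
  end;

  // Hypothetical implementation, introduced only so that Foo can
  // return something other than nil.
  TImpl = class(TInterfacedObject, T)
    procedure Bar;
  end;

procedure TImpl.Bar;
begin
  WriteLn('Bar called');
end;

function Foo: T;
begin
  // The original program assigned nil here.
  Result := TImpl.Create;
end;

begin
  // Same call pattern as in the report: the interface result of Foo
  // lives in a compiler temp while Bar is called.
  Foo().Bar();
end.
```

Compiling both variants with an EXTDEBUG compiler and comparing the messages should show whether the warning depends on the nil assignment.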
Re: [fpc-devel] BOOL
On 14.12.2014 at 16:51, Marco van de Voort wrote:
> In our previous episode, Adriaan van Os said:
>> reveals 0 for False and -1 for True, where I had expected 0 for False and 1 f according to http://msdn.microsoft.com/en-us/library/eke1xt9y.aspx the same respectively in Visual Studio 2013.
> There is a C (99?) bool type, and a winapi (and much older) BOOL type. Two different libraries, two different headers, two different cases :-)

AFAIR Delphi defines some xxxBOOL types for the interpretation of *WinAPI function results*. If so, these types and values should be restricted to Windows platforms and not be used in general (cross-platform) code.

DoDi
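To see which ordinal values a given compiler stores for True in the Pascal Boolean type and in the WinAPI-style LongBool type, a small probe like this can be used. It is a sketch; the printed values are deliberately not stated here, since they are what the thread is arguing about.

```pascal
program booltest;
{$mode objfpc}

var
  b: Boolean;
  lb: LongBool;
begin
  b := True;
  lb := True;
  // Print the ordinal values actually stored for True.
  WriteLn('Boolean  True: ', Ord(b));
  WriteLn('LongBool True: ', Ord(lb));
  // Portable test for API results: anything but zero counts as True,
  // so never compare against 1 or -1.
  if LongInt(lb) <> 0 then
    WriteLn('LongBool value is non-zero, i.e. True');
end.
```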
[fpc-devel] cp1252 problems
Using FPC trunk, Lazarus on WinXP, file and $codepage UTF-8, DefaultSystemCodePage is 1252. Then AnsiString variables can contain either UTF-8 or cp1252 strings (inconsistent), but that's an already known problem :-(

Now I found another bug with AnsiString(1252), which IMO should behave like AnsiString(CP_ACP). Unfortunately this is not true; the same assignments of literals to both variables lead to different strings:

  type
    WinAnsiString = type AnsiString(1252);
  const
    cACP: AnsiString = 'ä';    //encoded UTF-8 = 'ä'
    cWin: WinAnsiString = 'ä'; //encoded 1252 = 'ä?'
  var
    strA: AnsiString;
    strW: WinAnsiString;
  begin
    strA := 'ä'; //encoded UTF-8 = 'ä'
    strW := 'ä'; //encoded 1252 = 'ä?'
    WriteLn('equal ',strA=strW); //FALSE!
    strW := cACP; //1252 'ä' okay
    strA := cWin; //1252 'ä?' wrong as above
  end;

It looks to me as if the cp1252 strings (both const and var) are converted from a UTF-16 char (2 bytes into 2 chars), with the first char being the letter and the second one being the UTF-16 high byte (0), shown as '?' (#63).

Longer literals, like 'äöü', are converted properly, but to encoding UTF-8 for AnsiString and encoding 1252 for WinAnsiString.

Should I submit a bug report?

DoDi
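The byte-level claims above can be checked with a small dump helper. This is a sketch: Dump is a hypothetical helper, and the output depends on the source file encoding and the $codepage setting, which is exactly the effect under discussion, so no expected output is given.

```pascal
program dumpcp;
{$mode objfpc}{$H+}

type
  WinAnsiString = type AnsiString(1252);

// Hypothetical helper: prints the dynamic code page and the raw bytes
// of S. A RawByteString parameter is expected to accept any AnsiString
// without conversion.
procedure Dump(const Name: string; const S: RawByteString);
var
  i: Integer;
begin
  Write(Name, ': CP=', StringCodePage(S), ' bytes=');
  for i := 1 to Length(S) do
    Write(HexStr(Ord(S[i]), 2), ' ');
  WriteLn;
end;

var
  strA: AnsiString;
  strW: WinAnsiString;
begin
  strA := 'ä';
  strW := 'ä';
  Dump('strA', strA);
  Dump('strW', strW);
end.
```

A trailing 3F byte in the strW line would correspond to the '?' (#63) described above.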
[fpc-devel] RawByteString Insert etc.
IMO the Insert procedure should change the encoding of the string-to-insert into the CP of the target string. Otherwise the target string can become unusable, containing a mix of characters from different codepages. While a RawByteString can have any encoding, it cannot have two encodings at the same time.

BTW, the documentation should be updated to RawByteString arguments.

More candidates:
  Concat (implemented where? operator +=?)
  Pos (make SubStr CP match Source CP)

To be converted to RawByteString at all (overload?):
  Format (?)
  StringReplace
  LastDelimiter, IsDelimiter (in case of non-ASCII delimiters?)
  ...

Should I supply patches?

DoDi
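The proposed behaviour can be approximated in user code today. The following is a minimal sketch, not the RTL implementation: InsertConverted is a hypothetical wrapper, and on Unix-like targets a widestring manager (for example the cwstring unit) may be needed for the conversion to work.

```pascal
program insconv;
{$mode objfpc}{$H+}

// Hypothetical wrapper: converts a copy of Source to the dynamic code
// page of Target before inserting, so that Target never ends up with
// bytes from two different code pages.
procedure InsertConverted(const Source: RawByteString;
  var Target: RawByteString; Index: SizeInt);
var
  Tmp: RawByteString;
begin
  Tmp := Source;
  if (Tmp <> '') and (Target <> '') and
     (StringCodePage(Tmp) <> StringCodePage(Target)) then
    SetCodePage(Tmp, StringCodePage(Target), True); // real conversion
  Insert(Tmp, Target, Index);
end;

var
  U, W: RawByteString;
begin
  U := #$C3#$A4;                  // UTF-8 bytes of 'ä'
  SetCodePage(U, CP_UTF8, False); // only label the bytes, no conversion
  W := 'abc';
  SetCodePage(W, 1252, False);    // label as cp1252
  InsertConverted(U, W, 2);
  WriteLn('CP=', StringCodePage(W), ' len=', Length(W));
end.
```

The same pattern (convert the secondary operand to the code page of the primary one) would apply to the Pos candidate listed above.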
[fpc-devel] AnsiUpperCase problems
The following console program demonstrates various problems with the new (encoded) AnsiStrings (FPC trunk):

  program litTest2;
  {.$codepage UTF8} //off for now
  uses Classes,SysUtils;
  var A: AnsiString;
  begin
    a := 'äöü';
    //a := a+' '; //uncomment later
    WriteLn(a,'äöü');
    WriteLn(AnsiUpperCase(a),AnsiUpperCase('äöü'));
  end.

The output varies depending on (at least) the file encoding and target platform (tested only on Windows, using Lazarus).

With an Ansi source file the last line shows as 'ÄÖÜÄÖÜ', as expected. The variable also shows as 'äöü', but not the literal (3 graphical characters). In all other (tested) cases something different is shown, with no uppercase letters at all.

With a UTF-8 source file (with BOM) both the variable and literal show as 'äöü', but unfortunately never in upper case.

Adding {$codepage UTF8} requires a UTF-8 source file. That's compatible with the Lazarus defaults, so further tests (here) will use this combination. Please note that (currently) Lazarus sets or leaves DefaultSystemCodePage according to the actual OS, i.e. 1252 for my installation, regardless of $codepage. Now all items are shown as 'äöü', but again never in uppercase. How come?

AnsiUpperCase finally calls Win32AnsiUpperCase (on Windows), declared as

  function Win32AnsiUpperCase(const s: string): string;

which in turn calls CharUpperBuffA. This explains why no uppercase conversion is performed when S has a dynamic encoding different from the (WinAPI) CP_ACP, which is expected by CharUpperBuffA. Actually I found the *dynamic* encoding of A and S to be CP_UTF8, even if its static encoding is CP_ACP (or 1252).

Consequently AnsiUpperCase should convert S to the WinAPI CP_ACP (GetACP) before passing it to CharUpperBuffA. The same applies to all other functions with AnsiString arguments that call external (OS API...) routines expecting a specific encoding, on all platforms. And it applies to user code which relies on the encoding of all strings being the declared one, like in:

  str1[1]:=str2[1]; //both strings of same type

IMO such additional checks and conversions should be avoided; they bloat the library code and consume runtime.

Note that SetCodePage requires a RawByteString (var parameter), and thus cannot be used immediately to adjust the dynamic codepage of an AnsiString.

Now let's add (uncomment) the line

  a := a+' ';

and voilà, AnsiUpperCase works, because now the string has the expected CP_ACP instead of UTF-8. The same effect occurs when A is assigned from a UnicodeString variable. Is it really intended that AnsiString behaviour depends on such details?

The simplest solution would disallow a different static and dynamic encoding of AnsiStrings, except for RawByteString. Then no additional checks and conversions are required, except the one in the assignment of a RawByteString to an AnsiString of a different type, and everything else can be determined by the compiler from the known static=dynamic encoding of strings.

More checks and conversions can be avoided when the dynamic encoding of string literals is the actual encoding, as used by the compiler for the stored literal, not Delphi-incompatible placeholders like CP_ACP. Then TranslatePlaceholderCP is required only for explicitly given encoding values, but no longer for the dynamic encoding of strings.

DoDi
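The diagnosis above can be reproduced, and the proposed conversion tried out, with a sketch like the following. UpperViaSystemCP is a hypothetical helper, not an RTL function; it only illustrates the idea of converting to the system code page before the call, and its result still depends on source encoding and platform.

```pascal
program cpprobe;
{$mode objfpc}{$H+}
uses SysUtils;

// Hypothetical helper: converts a copy of S to DefaultSystemCodePage
// and only then calls AnsiUpperCase, as suggested in the message.
function UpperViaSystemCP(const S: AnsiString): AnsiString;
var
  R: RawByteString;
begin
  R := S;
  SetCodePage(R, DefaultSystemCodePage, True);
  Result := AnsiUpperCase(R);
end;

var
  a: AnsiString;
begin
  a := 'äöü';
  // Compare the dynamic code page with the system default.
  WriteLn('dynamic CP of a:       ', StringCodePage(a));
  WriteLn('DefaultSystemCodePage: ', DefaultSystemCodePage);
  WriteLn(UpperViaSystemCP(a));
end.
```

If the two printed code pages differ, that is the situation in which the message reports AnsiUpperCase doing nothing.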
Re: [fpc-devel] Delphi incompatible encoding
Tomas Hajny wrote:
> On Tue, December 2, 2014 08:31, Hans-Peter Diettrich wrote:
>> When I compile a console program on the commandline, most strings are readable in the console (see previous answer). But when I compile using Lazarus, all strings (including UnicodeString!) are shown in unreadable UTF-8 encoding, regardless of $mode :-(
> Probably best to ask about the wrong behaviour with Lazarus on a Lazarus list?

It really seems to be a Lazarus problem. Compiled from a PAS file, the behaviour is equal to FPC. The bad encoding is used when compiled from an LPR file (LPI project).

Thanks
DoDi
Re: [fpc-devel] Delphi incompatible encoding
Mattias Gaertner wrote:
> On Tue, 02 Dec 2014 04:05:59 +0100 Hans-Peter Diettrich drdiettri...@aol.com wrote:
> Many things affect string literals. Source codepage, system codepage, string type, defaultsystemcodepage, library, compiler version. I started a table for UTF-8 literals:
> http://wiki.lazarus.freepascal.org/Character_and_string_types#String_constants

Thanks, after some reading I changed the source file encoding, and both UTF8bom and Ansi provide correct results. The Lazarus default (UTF-8 without BOM) is not usable on Windows :-(

DoDi
Re: [fpc-devel] Trying to understand the wiki-Page FPC Unicode support
Michael Schnell wrote:
> On 11/28/2014 09:15 PM, Hans-Peter Diettrich wrote:
>> Apart from that, every encoding-tolerant code will execute much slower than code without a need for checks and conversions everywhere.
> As I pointed out I don't agree at all.
> - The check is only two ASM instructions
> - It does not result in additional conversions.

It does, e.g. in searching or sorting of a StringList, when it can contain strings of different encodings. The choice of a unique encoding for application strings (maybe CP_ACP, UTF-8 or UTF-16) eliminates such conversions.

> So the Checking Overhead is nothing but a rumor. (Remember, I don't suggest dropping the standard statically typed paradigm, altogether, as close loops of course work best in that way.

The rumor is the unimportant Conversion Overhead, i.e. how often a check leads to a conversion. When no check is required, conversions consequently cannot occur at all.

>>> RawXxxString can be used for really uncoded data as done with old-style strings in a lot of applications.
>> Such a feature would be appreciated by many users, indeed :-)
> But why do you say would be appreciated ? Is it not possible to use RawByteString in a way the name suggests, by never bringing it together with any String variable of a different encoding brand and hence avoid any conversion - be same intentional/documented/useful or not.

RawByteString cannot serve two different purposes :-(

In *Delphi* it is used as a polymorphic string, capable of *holding* actual strings of any encoding. But when assigned to a variable of a different encoding, a conversion may occur that converts the string into the declared (static) encoding of the target variable.

In *FPC* it currently is used somewhat close to your idea, i.e. no conversion occurs in an assignment either *to or from* a RawByteString and some other AnsiString. We only can *hope* that *all* AnsiString operations are based on the dynamic encoding of every operand, with according checks and conversions inserted everywhere. This actually is not true, because the compiler relies on the static encoding of AnsiString variables, and inserts checks and conversions only when that encoding is different. Actually a single AnsiString type would be sufficient, because it already can hold data of any encoding :-(

I understand the FPC attempt to allow *at the same time* for the new (encoded) and old (unencoded) AnsiString behaviour, where no automatic conversions are allowed. But this would require at the same time that e.g. all string literals *also* are stored in that (immutable) encoding, and that this encoding can *not* be changed at runtime, while DefaultSystemCodePage *can* be changed.

When the result of a conversion of a string of encoding CP_NONE is undefined, which of course is correct for the *dynamic* encoding, this simply could be changed into "conversions of CP_NONE strings do nothing". Then CP_NONE would be the perfect encoding for old-style AnsiStrings, with the only remaining problem being string expressions and assignments when the operands have a different dynamic encoding. In these cases all operands would have to be converted into the CP_NONE encoding, as specified in another DefaultNoneEncoding constant (not variable!); the same encoding would apply in assignments *to* variables of a different encoding. Then also all type aliases for AnsiStrings must have unique names, which allow one to distinguish e.g.

  type UTF8String = AnsiString;

from

  type NewUTF8String = type AnsiString(CP_UTF8);

DoDi
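Whether an assignment from a RawByteString to a string of a different static encoding converts or not can be probed directly. This is a sketch with no expected output given, since the result is exactly what differs between the Delphi and FPC behaviour described above.

```pascal
program rawassign;
{$mode objfpc}{$H+}

var
  R: RawByteString;
  U: UTF8String;
begin
  R := 'abc';
  SetCodePage(R, 1252, False); // label as cp1252 without touching bytes
  U := R;                      // the static encoding of U is CP_UTF8
  WriteLn('R: ', StringCodePage(R));
  // If this prints the code page of R instead of 65001 (CP_UTF8), no
  // conversion took place and the static and dynamic encodings of U
  // now differ.
  WriteLn('U: ', StringCodePage(U));
end.
```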
Re: [fpc-devel] Delphi incompatible encoding
Sven Barth wrote:
> On 01.12.2014 at 10:33, Hans-Peter Diettrich drdiettri...@aol.com wrote:
>> Another one: Delphi XE does not export the CP_xxx encoding constants from System.pas. This means that the encoding constants are not available in (compatible) user code.
> It's not the first and likely not the last we export from a different unit than Delphi. There will *always* be differences which already starts with Integer that is declared as an alias to LongInt in the ObjPas unit in FPC.

Well, Integer (and String...) are generic types, adjustable to the best overall performance on every target, whatever "best" will mean.

CP_NONE is declared in Windows.pas for clipping, as:

  CP_NONE = 0; { No clipping of output }

which is different from the CP_NONE encoding ($).

> Do they really not have a CP_NONE constant in System?

No, not even in the definition of RawByteString, and not in any other standard (RTL...) source file (except Windows.pas, see above).

DoDi
Re: [fpc-devel] Delphi incompatible encoding
Jonas Maebe wrote:
> Hans-Peter Diettrich wrote on Mon, 01 Dec 2014:
> To get behaviour that is compatible with Delphi2009+, compile with -Mdelphiunicode or {$modeswitch delphiunicode}.

The compiler option (-M) works, but the $modeswitch is not accepted by the compiler (2.7.1): Illegal compiler switch DELPHIUNICODE. The same for {$mode ObjPas}. What else did I miss? When I use Lazarus and set the compiler to the new 2.7.1, the modeswitch does not cause an error.

> This particular difference is also documented in http://wiki.freepascal.org/FPC_Unicode_support#String_constants (search for delphiunicode or systemcodepage)

Thanks, that explains at least the FPC handling of literals. But where can I find information about all the differences caused by the above compiler option/modeswitch? Does it affect implicit AnsiString encoding conversions?

BTW it's nice that FPC console Write/Ln (mostly) converts AnsiStrings to the console codepage, while Delphi (XE) doesn't convert :-)

But I found a somewhat strange result with generic String variables, tested with:

  var
    A: AnsiString;
    S: String;
  begin
    S := ' äöü';
    A := S;
    //S := A; //changes nothing
    WriteLn('A CP: ',StringCodePage(A), A); //always shows ' äöü'
    WriteLn('S CP: ',StringCodePage(S), S); //letters differ
  end.

When String is UnicodeString (DelphiUnicode), the output is correctly converted for both strings (CP 1200,1252). But when String is not UnicodeString, AnsiString and String should be the same type, no? The console however shows different letters for the generic String and AnsiString variable (both CP 1252). The output doesn't change when A is assigned back to S. How so?

When A and S are exchanged:

  A := ' äöü';
  S := A;
  //A := S; //CP changes

the encoding of A is shown as zero. Now it makes a difference when S is assigned back to A, but only in that the codepage of A then is also shown as 1252, while the letters still differ.

Obviously String is not equivalent to AnsiString now, and string literals should be assigned to AnsiString variables only, not to String variables?

Very confused
DoDi
Re: [fpc-devel] Trying to understand the wiki-Page FPC Unicode support
Jonas Maebe wrote:
> On 28/11/14 21:30, Hans-Peter Diettrich wrote:
>> I prefer to specify and document everything *before* coding, so that everybody can expect that the code will behave as specified.
> If certain behaviour is explicitly undefined, it *is* specified and documented. It means that your program is buggy if it triggers such behaviour, and that the effect of triggering it could be anything. [...] An example from FPC itself is accessing an array beyond its bounds when range checking is switched off.

After this hint I reviewed the "Code page identifiers" section again, and probably could find the source of the misunderstandings.

  CP_NONE: this value indicates that no code page information has been associated with the string data. The result of any explicit or implicit operation that converts this data to another code page is undefined.

Does this mean CP_NONE is not an allowed *dynamic* (string *data*) encoding, just like any other undefined encoding value? In this case the description is correct, but it describes a special case of some *undefined* general rule about valid and invalid dynamic encodings in general. Then this general rule should be documented first, not only for CP_NONE.

Then also documentation of the *intended* purpose of CP_NONE, for the *static* encoding of the RawByteString type, is missing entirely. As Delphi doesn't allow for a dynamic encoding of CP_NONE, I don't understand the purpose of the FPC description.

Now in turn some FPC developer might have misunderstood the (Delphi) handling of RawByteStrings, assuming that it were okay to omit a conversion in an assignment of a RawByteString to an AnsiString of a different encoding. That's why I think that the incorrect handling of such RawByteString assignments in FPC should be fixed, according to the general rule of assignments to a string of a different (static) encoding. CP_NONE definitely *is* different from any other encoding, and Delphi does not define an exception for RawByteStrings.

> Exactly the same goes for converting strings with code page CP_NONE to a different code page: your program is broken when it tries to do that, and we cannot guarantee any outcome. This is exactly what the behaviour is undefined means.

When a string *really* has a *dynamic* encoding of CP_NONE, this of course is illegal and thus will produce an undefined result. ACK, so far.

But since Delphi (quietly) changes a SetCodePage to CP_NONE into the current CP_ACP, the undefined situation (invalid dynamic encoding) must have been forced by some illegal *hack* before, or in the FPC case by some erroneous (not Delphi conforming) RTL code.

DoDi
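What a given compiler and RTL actually do when a string is explicitly labelled with CP_NONE can be checked with a few lines. This is only a probe: it assumes that CP_NONE is exported by the System unit, as it is in FPC, and no expected output is stated.

```pascal
program cpnone;
{$mode objfpc}{$H+}

var
  R: RawByteString;
begin
  R := 'abc';
  WriteLn('before: ', StringCodePage(R));
  // Request CP_NONE as the dynamic code page without converting data.
  SetCodePage(R, CP_NONE, False);
  // Shows whether the RTL stored CP_NONE or quietly replaced it by a
  // concrete code page, as the message above describes for Delphi.
  WriteLn('after:  ', StringCodePage(R));
end.
```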
[fpc-devel] RFC: proper interpretation and implementation of Unicode Support
In response to another thread (this should start a new thread):

  CP_NONE: this value indicates that no code page information has been associated with the string data. The result of any explicit or implicit operation that converts this data to another code page is undefined.

After rereading I found this definition incorrect; the entire section (and more) deserves a correction/clarification. The implementation may have to be changed accordingly.

This is my interpretation of the Delphi API around encoded AnsiStrings, as documented and implemented there, with added clarifications and notes on omissions and possible problems on non-Windows platforms. I do not expect that the FPC developers fully agree with this interpretation, but I expect that all items of a revised version of the following draft become part of the FPC documentation, somehow.

Draft

1) CP_ACP, CP_OEM and CP_NONE are generic encodings (placeholders), applicable as *static* string encodings inside a program only; they never can denote a dynamic string encoding.

Note: "codepage" here means byte-based ANSI/ISO codepages, applicable to AnsiStrings, not Unicode codepages (BMP...). While CP_UTF16 (and BE/LE variations) can be used to specify a concrete (string, textfile...) *encoding*, they do not describe codepages (neither Ansi nor Unicode).

Note: these identifiers (names) should be used with extreme care in documentation/discussions. In most cases CP_ACP stands for the *actual* default encoding, equivalent to the value of a hypothetical *variable* named CP_ACP, i.e. currently (see below) it should be understood as DefaultSystemCodePage. It should be made clear that the value of the CP_ACP *constant* identifier (=0) is meant and usable only in few cases, like in the declaration of a string type; it may also be acceptable in explicit conversion requests, and to denote the encoding to use in file/stream I/O, where the functions replace CP_ACP by the actual (DefaultSystemCodePage) value internally.

Note: in compiler, library and application code a value of CP_ACP should be considered equal to (be mapped into) the actual (DefaultSystemCodePage) encoding.

2) A platform (or Unicode library) may or may not provide its own *generic* values (constants) for the application (CP_ACP) and console (CP_OEM) encoding, as well as further constants for e.g. filenames.

Note: CP_ACP is zero on Windows, possibly different on other platforms or libraries. Thus AnsiString(0) may be different from AnsiString(CP_ACP). It may be required to distinguish between a named Pascal constant CP_ACP=0, and the value of the generic application/default encoding in API calls (CP_SYS?).

3) The *actually* associated codepages are defined by the platform and possibly can be changed by the user (admin). A program may or may not be allowed to change the associated codepages, either locally (process wide) or globally (system wide).

Note: the name DefaultSystemCodePage should be reserved for the *system* defined codepage. When this setting can be different from an application-wide setting, another DefaultApplicationCodePage variable should be added. See the comments on "Modifications" and "Notes on DefaultSystemCodePage" in the Wiki page!

Note: a process should determine (retrieve) the platform settings *before* any attempt to interpret system-provided strings (commandline, environment variables...). Depending on the platform, more generic settings may apply to specific strings, like for filenames. In all external API calls, the RTL is responsible for the correct encoding of all string arguments, as expected by the called function. This applies in particular to CP_ACP, when this encoding can be changed inside a program to something different from the external (platform...) setting.

4) A RawByteString variable, of the static encoding CP_NONE, can hold strings of *any* dynamic encoding. No conversion is performed when a string is assigned to such a variable. In the opposite direction the standard handling should apply, i.e. different static encodings require a conversion into the static target encoding.

Note: It's known that Delphi does not always convert a RawByteString in an assignment to a variable of a different type. This flaw should be fixed in FPC. Is the according Delphi behaviour *defined* anywhere?

5) Use StringCodePage to get an actual (dynamic) string encoding. StringCodePage never returns one of the generic values. The dynamic codepage of an unassigned (empty) string is assumed (by Delphi) to be the actually selected CP_ACP codepage for AnsiString arguments, and CP_UTF16 (or whatever is applicable) for UnicodeString arguments.

Note: while an unassigned (empty) string variable has a static encoding, known to the compiler, this encoding is unknown to StringCodePage. The overloaded Ansi/Unicode versions of StringCodePage only know about the basic string type (Ansi/Unicode) of their arguments, but cannot determine a
Re: [fpc-devel] Trying to understand the wiki-Page FPC Unicode support
Jonas Maebe wrote:
> I'm sorry, but I simply cannot discuss with people that, when I literally state the result is undefined, think that I may actually have meant the result is defined and if you change the implementation and/or keep it stable across compiler releases, then it will also conform to whatever you think that this defined behaviour should be. I don't have the energy nor the patience for that. I also have no use for continuing such discussions.

I prefer to specify and document everything *before* coding, so that everybody can expect that the code will behave as specified.

DoDi
Re: [fpc-devel] Trying to understand the wiki-Page FPC Unicode support
Michael Schnell wrote:
> I fear that there will be code that relies on the flawed behavior of RawByteString (it's a feature, not a bug) and using the same name with different behavior would break same. And a really usable DynamicString would not adhere to that description.

How can somebody rely on behaviour *stated* as undefined, or not working as defined?

DoDi
Re: [fpc-devel] Trying to understand the wiki-Page FPC Unicode support
Michael Schnell wrote:
> On 11/27/2014 03:44 PM, Hans-Peter Diettrich wrote:
>> An *efficient* implementation would be based on a single program-wide string representation, with different encodings being handled only in an exchange with external data sources.
> Yep. But it would result in severe user code portability issues (see above). IMHO using DynamicString at the correct locations would not be (noticeably) less efficient but a lot more versatile.

You suggested to use string as UTF-16 on Windows, and UTF-8 on Linux. That's what I understand as a unique program-wide string representation (not sourcecode-wide, instead program as *compiled*). Then I cannot see any need or use for another DynamicString type.

> I also don't think we will ever see a fix for the poor implementation of RawByteString (avoiding the word flaw and the suggestion of a bad purpose), because it would break existing user code.

Nothing can be broken, as long as the Delphi behaviour is undefined. Code relying on specific compiler/library bugs is bound to that compiler, not portable in any way.

> Regarding fpc, correcting the flaws and keeping the name RawByteString would result in incompatibility issues vs Delphi and breaking code that will be ported from Delphi.

Same as above. When application code works properly with strings of *sometimes* different static and dynamic encoding, it will not stop working with strings of *never* different encodings.

Of course the opposite is not true. When some code works properly (only) with strings of the same static and dynamic encoding, it will stop working when compiled with Delphi. Then the coder has to insert explicit checks for the dynamic encoding of *all* strings, all over his code.

Applied to FPC/Lazarus code (compiler, libraries, IDE...) this means that it's obviously easier to *prevent* possibly different static/dynamic encodings, instead of *checking and reacting* on such flaws throughout the entire codebase. Apart from that, every encoding-tolerant code will execute much slower than code without a need for checks and conversions everywhere. I seriously doubt that the FPC developers ever realized these consequences, and the amount of time required for finding, reporting and fixing the bugs in all affected pieces of their code :-(

> That is why fpc would need to define an additional type name (e.g DynamicString) and encoding brand number (e.g. CP_ANY = $FF00) for a decently usable type for intermediately holding a String content.

This again would make *FPC* programs incompatible with Delphi. While fixing the RawByteString flaw would at least allow one to *compile* FPC code with Delphi, the use of a different encoding value would definitely prevent compilation of such code with Delphi. What's the more serious incompatibility?

> RawXxxString can be used for really uncoded data as done with old-style strings in a lot of applications.

Such a feature would be appreciated by many users, indeed :-)

DoDi
Re: [fpc-devel] Trying to understand the wiki-Page FPC Unicodesupport
Marco van de Voort wrote:
> In our previous episode, Hans-Peter Diettrich said:
>> While it certainly is a stupid (Microsoft) idea to use UTF-16 for file storage, we'll have to take that into account.
> (16-bit codepages were designed into OS/2 and Windows NT before utf-8 even existed)

Right, both systems were developed by Microsoft :-]

No problem, as long as proper host/network byteorder conversion is applied in reading/writing such files.

But in former times every computer manufacturer was proud of *his* clever text processing features, with characters stored in 6 up to 9 bit registers. In those times it was an essential *marketing* feature when files could *not* be read by competing systems, due to different bytesize, bit-/byteorder, character sets, file formats etc.

But times have changed; nowadays the Internet requires certain common standards (e.g. 8-bit bytes = octets, HTML, Unicode and more), which allow for data exchange across machine and country boundaries. The lack of far-east support already forced the Japanese to invent their own BIOS, codepages etc. Nowadays continued use of UCS2 would have forced the Chinese to invent their own character encoding, which then would be used by more people than UCS2. Guess what would happen to the rest of the world, then...

<OT>
Or will the Chinese government enforce such a development soon, to eliminate the need for continued censorship of foreign web pages, because legal equipment then only could present genuine Chinese pages, but no more HTML, JavaScript and Unicode? What would the official Chinese programming language look like?
</OT>

DoDi
Re: [fpc-devel] ThousandSeparator
Sven Barth wrote:
> At my old company our Delphi application handled runtime changes to these settings rather well. For display the normal XToY (e.g. DateToStr) functions are used which use the DefaultFormatSettings which are updated automatically (the VCL's message loop triggers a repaint when format settings were changed in the system).

A repaint by itself doesn't change the strings. How do the new strings come into all the edit boxes, of all open forms?

Similarly, when the user changes the system language, can he expect that every running application updates itself, with changed menus etc., up to possibly open help viewers? What if the program is not prepared for a different language, because e.g. a tax assistant is bound to a specific country?

DoDi
Re: [fpc-devel] Trying to understand the wiki-Page FPC Unicode support
Michael Schnell wrote:
> I now understand that the Element Size field in the String header is quite dummy, as under the hood there are two completely separate concepts for one-byte-Strings and 2-Byte Strings and none for other Element sizes.

After a code review I realized that the element size field is specific to dynamic strings, not present in dynamic arrays. Since the element size is bound to the string type, it could be omitted in the FPC implementation. [With little win, when the record alignment is preserved]

> This to me is not obvious at all, as the language syntax and the String header data structure suggest a more universal paradigm for multiple string type brands, that each have an element-size and code-ID-number setting, handled by a common infrastructure.

This may have been envisaged by the Delphi architects, but was not continued later.

> The universal paradigm would allow for extensions (e.g. UTF-32, multiple 16 Bit Code pages, an additional fully dynamic String type, n-byte un-encoded string types), as I described in the Wiki page.

Even if feasible, such arbitrary string storage can dramatically increase the number of implicit string conversions. An *efficient* implementation would be based on a single program-wide string representation, with different encodings being handled only in an exchange with external data sources. That standard encoding may be Ansi or Unicode; even Delphi allows for both models, where Ansi again suggests the use of one specific codepage (CP_ACP) for best performance.

<Cassandra>
After all I have the impression that the known RawByteString flaws will never be fixed in Delphi, in order to encourage the users to take the step to UnicodeString. Now the question is whether these flaws are fixed in FPC, or whether Lazarus will become the first project that definitely requires a complete move to UnicodeString for reliable operation. For best support of non-UTF-16 platforms I'd suggest to fix the flaws...
</Cassandra>

DoDi
Re: [fpc-devel] ThousandSeparator
Frederic Da Vitoria wrote:
> 2014-11-26 16:54 GMT+01:00 Hans-Peter Diettrich drdiettri...@aol.com:
>> 2) Formatted numbers, as entered by the user (maybe by copy-paste from other applications), can have various encodings. Before a conversion into binary values I'd remove all unexpected characters, except for the last (rightmost) '.' or ',', which then becomes the decimal separator as expected by the decoding function (RTL provided).
> You mean that the first string to be converted to binary would automatically set the decimal separator?

No, my code would make no assumption about the format of strings edited by the user.

> That would seem dangerous to me. What if the first string to be converted contained something like 11,000, does this mean 11000 with thousand separator = comma (which would be true in at least USA), or 11 with decimal separator = comma (which would be true at least in France)? I can't think of any way to choose automatically.

Okay, that would require more knowledge about the value kept in a specific input field, for range checks or the like. As long as thousands separators occur in the string, different from '.' or ',', they are quite easy to identify.

> AFAICS, the code needs either to use the system settings or to be told explicitly by the developer. Even relying on the system settings may not be enough, because one may need to import data formatted with different national settings from the system's settings.

Right. When e.g. a CAD program is fed with sizes from an external data sheet, it cannot be expected that the figures in that file change together with the system language, and are converted between inch and meter, temperatures between F and C ;-)

So it looks to me like a stupid idea, when the user changes such system settings *while* such a program is running. Furthermore the use of national formatting conventions for the exchange of values across applications looks to me like another stupid idea.
Would somebody expect or even like it, when e.g. constant declarations are converted when copied into a Lazarus editor, and the compiler would require that all constants in source code conform to the current settings?

As mentioned in other contributions, the number formatting seems to be a Windows specific problem. How to deal with imported numbers on other platforms, with arbitrary settings per application?

After all a program could, when notified of such changes, ask the user whether to continue or restart, or force a restart. Restart should be safe, but when the user decides to continue, he must be aware of possible problems. When the restart takes considerable time, the user may learn that his behaviour is not very clever ;-)

DoDi
Re: [fpc-devel] Trying to understand the wiki-Page FPC Unicode support
Michael Schnell wrote:
> On 11/26/2014 07:13 PM, Hans-Peter Diettrich wrote:
>> Not all codepages have a fixed number of bytes per character. The string preamble contains the *element size* (1 for AnsiString), just like with every dynamic array.
> Sorry for sloppy wording. Of course I did mean element size (Character here obviously is not printable item).

I'd restrict the use of "character" to physical Char types, just to avoid any misinterpretation. Printable items (glyphs) are independent from the storage format. Ligatures or umlauts can consist of multiple codepoints, and several Unicode codepoints are not even printable. A single printable character, as selectable by a single cursor step, can consist of multiple codepoints, even (or just) in Unicode.

That's why I'd expect that the FPC documentation includes a glossary and definition of the terms, which should be used in the documentation and discussions.

DoDi
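To make the distinction between storage items and codepoints in the message above concrete, here is a minimal sketch, not taken from the thread. The helper name Utf8CodePointCount is hypothetical, and the code assumes well-formed UTF-8. It stores a decomposed umlaut ('a' followed by U+0308) and compares the number of bytes with the number of codepoints; on screen both codepoints would form a single printable character.

```pascal
program CountCodePoints;
{$mode objfpc}{$H+}

{ Hypothetical helper: counts Unicode codepoints in a UTF-8 byte
  sequence by skipping continuation bytes (bit pattern 10xxxxxx).
  Assumes well-formed UTF-8; no validation is done. }
function Utf8CodePointCount(const S: RawByteString): Integer;
var
  i: Integer;
begin
  Result := 0;
  for i := 1 to Length(S) do
    if (Ord(S[i]) and $C0) <> $80 then
      Inc(Result);
end;

var
  Decomposed: RawByteString;
begin
  // 'a' followed by U+0308 COMBINING DIAERESIS, written as raw UTF-8
  // bytes so that no literal conversion is involved.
  SetLength(Decomposed, 3);
  Decomposed[1] := 'a';
  Decomposed[2] := #$CC;
  Decomposed[3] := #$88;

  WriteLn('storage items (bytes): ', Length(Decomposed));             // 3
  WriteLn('codepoints:            ', Utf8CodePointCount(Decomposed)); // 2
end.
```

Three storage items, two codepoints, one cursor step: the three levels of "character" that a glossary would have to name separately.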
Re: [fpc-devel] Trying to understand the wiki-Page FPC Unicode support
Jonas Maebe wrote:
> On 26/11/14 23:41, Hans-Peter Diettrich wrote:
>> In this case the implementation is compiler specific, somewhat different from undefined (in a RawByteString):
>>> CP_NONE: this value indicates that no code page information has been associated with the string data. The result of any explicit or implicit operation that converts this data to another code page is undefined.
>> IMO the result is well defined: it's the string with the encoding of that other codepage.
> Unless you actually tested this on all platforms and noted that is the case, you cannot state this. And if you would actually test it, you would discover that it is wrong (http://bugs.freepascal.org/view.php?id=22501#c61238 ).

Bugs obviously violate some specification/definition, else it's not a bug, it's a feature ;-)

> As mentioned in a previous discussion: don't use IMO (in my opinion) when talking about testable facts. A testable fact is either true or false, opinions do not enter the picture.

We're just talking about interpretations, not facts.

>> An undefined result, as I understand it, would mean the result can be anything, unrelated to the function input.
> Which is 100% correct.

Do you see any use for such function definitions, except in random generators?

>> IMO a better wording should be found, that does not cause the current obvious confusion of some readers.
> The confusion only occurs for readers that do not believe what is written.

Such statements come only from writers that do not believe that their words can be understood in various ways ;-)

DoDi
Re: [fpc-devel] Regionalisation (Was ThousandSeparator)
Michael Thompson wrote:
> I hear you, but this issue is so much wider than separators. I know one software package that will only successfully export data to excel if the system regional is one of the English (xxx) variations (Australian guaranteed to work, not really played with the rest...). In this case, the client (in Denmark) has one PC in a corner, set to Australian settings, just for exports...

This may be a relic from the time when Microsoft found it a good idea to nationalize VBA (VB, Word, Excel...). I appreciated that insofar as no English macro virus could become active in my German Word (with only German keywords). The same language barrier may prevent proper data export, maybe starting with slightly different keyword spellings like for color/colour.

Similar problems exist(ed) in RTF export, so that MS had to ship another WinHelp compiler (HC) for every new WinWord version, that worked around the new errors in the RTF sources exported from Word, even after VBA was reverted to unique English-only keywords and function names.

> As an Australian developer, this is just embarrassing...

I never felt a need or reason for considering Microsoft products as anything but buggy toys, hardly usable outside the USA :-(

To come back to the current discussions, the introduction of Unicode (as UCS-2 and UTF-16) was a similar (typically American) mistake, totally ignoring e.g. any Chinese character set (in favor of Klingon?). Apple, as another US company, invented the decomposed Unicode filenames - for lack of oversight, or to establish artificial platform barriers?

The step from strictly national Ansi applications to Unicode is a very tiny one, compared to the leaps that have to be taken afterwards, in order to make the program really work in foreign countries. I wonder how e.g. Belgian, Canadian or Swiss software has been written in such multi-lingual countries before, and how it is written nowadays.
DoDi
Re: [fpc-devel] Trying to understand the wiki-Page FPC Unicode support
Michael Schnell wrote:
> On 11/26/2014 06:37 PM, Hans-Peter Diettrich wrote:
>> An AnsiString consists of AnsiChar's. The *meaning* of these char's (bytes) depends on their encoding, regardless of whether the used encoding is or is not stored with the string.
> I understand that the implementation (in Delphi) seems to be driven more by the wording (ANSI) than by the logical paradigm the language syntax suggests.
> The language syntax and the string header fields suggest that both the element-size and the code-ID-number need to be adhered to (be it statically or dynamically - depending on the usage instance).
> E.g. there are (at least) two code pages for UTF-16 (LE and BE), that would be worth supporting.

You are confusing codepages and encodings :-(

UTF-7, UTF-8, UTF-16 and UTF-16BE describe different representations of the same values (Unicode codepoints). And I agree, all commonly used encodings should be implemented, at least for data import/export.

>> It's essential to distinguish between low-level (physical) AnsiChar values, and *logical* characters possibly consisting of multiple AnsiChars.
> I now do see that the implementation is done following this concept. But the language syntax and the string header field suggest a more versatile paradigm, providing a universal reference counting element string type.

See it as a multi-level protocol for text processing. The bottom (physical) level deals with physical storage items (AnsiChar, WideChar...), and how they are stored in memory or files. Like it doesn't make sense to deal with individual bytes of real numbers in computations, it doesn't make sense to deal with individual bytes (AnsiChars) of logical characters - except in type/encoding conversions.

Higher levels deal with logical values, which can consist of multiple physical items, and may need different interpretations (in case of Ansi codepages).
This level is partially covered now by AnsiString encodings and UTF-16 surrogate pairs, which allow to map the values into full Unicode (UCS-4) codepoints. But these codepoints still are not sufficient for a correct interpretation and manipulation of logical characters, which again can consist of multiple codepoints (decomposed umlauts, ligatures...).

In a next level another (mostly language specific) interpretation may be required, like which logical characters have to be treated together (ligatures, non-breaking characters...). Some natural languages (Hebrew, Arabic...) require another special handling of (mixed) LTR/RTL reading, and of paths, influencing the graphical representation of character sequences; but that's nothing an application or library writer should have to deal with, such functionality should be provided by the target platform.

There must be a boundary between the standard (RTL) handling of the physical items and encodings, and higher text processing levels, up to language specific processing (how to break words, when to apply capitalization, syntax checks...), so that such special handling can be implemented in dedicated extensions (libraries, classes), by developers familiar with the rules and conventions of the natural languages.

For now we are talking only about the handling up to individual Unicode codepoints, and related string manipulation. For this, at least one string representation must exist that covers the full Unicode range of codepoints (UTF-8 or UTF-16 for now). When such an implementation claims undefined behaviour, then this can only mean implementation flaws, resulting in something different from what can be expected from proper Unicode handling. This includes invalid parameter values in subroutine calls, which should result in proper (defined) runtime error reporting (AV, error result...).
WRT AnsiString encodings, the only acceptable (expected) differences can result from lossy conversions, when converting proper Unicode into a non-UTF encoding. Even then the results should be consistent, even if the concrete results depend on some external (platform...) convention or settings. IMO.

>> That's why I wonder *when* exactly the result of such an expression *is* converted (implicitly) into the static encoding of the target variable, and when *not*.
> I understand that the idea is, to use the static encoding information provided by the type definition whenever possible.

Right, but here "whenever possible" depends on the correspondence of static and dynamic encoding. When the dynamic encoding can *ever* be different from the static encoding, except for RawByteString, I consider it NOT possible to derive the need for a conversion from the static encoding.

In the handling of floating-point values we may have to expect invalid operations (division by zero, overflow...) or values (NaN...), but NOT that a Double variable ever contains two Integer values - unless forced by dirty hacks out of compiler control. Why should this be different and acceptable
Re: [fpc-devel] Trying to understand the wiki-Page FPC Unicode support
Mattias Gaertner wrote:
> On Wed, 26 Nov 2014 11:23:17 +0100 Michael Schnell mschn...@lumino.de wrote:
>> Seemingly here the bytes per character setting implicitly is thought of as a part of the code-page definition. correct ?
> Code pages define bytes per character.

Huh? Not all codepages have a fixed number of bytes per character. The string preamble contains the *element size* (1 for AnsiString), just like with every dynamic array.

> As you know: Don't confuse character with glyph and codepoint.

Right, but what is what? I feel a need for an exact (official) definition of such (and more) terms, in order to prevent further misunderstandings of the documentation and in discussions.

E.g. "code page" has different meanings, when used with ANSI/ISO and Unicode character sets. While ANSI/ISO codepages describe different mappings of bytes into characters, Unicode codepages define subsets of the whole Unicode range.

My understanding of "character" is a *logical* unit (letter), with possibly different encodings, values and sizes in different codepages (character sets). What's the term for the *physical* unit (AnsiChar, WideChar)?

> Ansistring supports only one byte per character code pages.

Huh? What's your definition of character? AnsiString supports MBCS codepages as well. The restriction is the physical storage unit (1 byte per string item), as imposed by AnsiChar.

DoDi
Re: [fpc-devel] Trying to understand the wiki-Page FPC Unicode support
Michael Schnell wrote:
> On 11/26/2014 11:40 AM, Mattias Gaertner wrote:
>> Ansistring supports only one byte per character code pages.
> Even more confused. Am I wrong thinking that with code aware Strings, for Delphi XE compatibility, in Windows CP_ACP needs to be UTF16 (if not right, then due later) ?

Delphi XE does not properly support UTF-8. CP_ACP seems to depend on western/far-eastern versions, where the western version assumes and allows for any SBCS; I don't know of the same in far-east versions. The SBCS restriction allows to simplify standard string handling and conversions, because every character (=byte) can be exchanged in place. UTF-8 doesn't fit into this picture, because it's an MBCS. UTF-16 is not a valid value for CP_ACP in Delphi, because it's a 2-byte encoding.

Even if the Delphi architects may have thought about a common string type, with a variable element size (1,2,4), this certainly soon turned out to be a stupid idea, so that AnsiString and WideString/UnicodeString still are strictly distinct types.

WideString and UnicodeString imply UTF-16, with platform specific byte order (endianness). The latter becomes important almost only to compiler and library coders, in host/network byte order conversions. For the sake of completeness, PDP-11 processors use yet another byte order, maybe more word-based processors (DG...) as well.

DoDi
Re: [fpc-devel] Trying to understand the wiki-Page FPC Unicode support
Michael Schnell wrote:
> On 11/26/2014 12:09 PM, Sven Barth wrote:
>> In Delphi (and FPC) CP_ACP corresponds by default with the current system codepage (e.g. CP1252 on a German Windows).
> OK. So in Delphi XE (in Germany) String(CP_ACP) is the same as String(CP1252) but different from String without brackets which in turn is the same as String(CP_UTF16) ? Correct ?

CP_ACP (and CP_NONE) describes a *static* encoding, and has a fixed value (CP_ACP=0, CP_NONE=$). The dynamic encoding of strings, kept in AnsiString(0) or RawByteString variables, must be obtained from the string itself. When the string is empty, StringCodepage returns DefaultSystemCodePage (for CP_ACP).

>> CP_UTF16 is not supported, because AnsiString only supports 1-Byte character strings (and UTF-8 as the odd one) and not 2-Byte character strings.
> I still don't understand. The wiki article seems to suggest that it is about a type called ANSIString that features a dynamically settable code page information. From discussions about Delphi and FPC, I only know a String type with a dynamically settable code page information that also features a dynamically settable Bytes per Character information and hence does support 1, 2 and 4 Bytes per Character. (e.g. UTF-8, UTF-16, and UTF-32).

You should have noticed that there exists no String or Char type that would allow for arbitrary bytes/char counts (see my other answer for details).

>> The difference to Delphi currently is that for FPC String=AnsiString(CP_ACP) and for Delphi String=UnicodeString (aka 2-Byte string).
> I understand that you mean (e.g.) Delphi XE. But what version of FPC is currently. Am I wrong assuming that in the svn we do have the NewStrings library that supports dynamic code-page *and* byte-per-character settings and hence supports e.g. CP1251, UTF-8, UTF-16, and UTF-32 ?

The byte-per-character field is read-only, just like for any dynamic array.
> So I seem to understand the meaning of String(CP1252), String(CP_UTF8), and String(CP_UTF16) (which seems to be the Delphi notation), but I seemingly don't get the exact meaning of AnsiString(CP_ACP) or AnsiString(CP1251)

The Delphi notation is the same, e.g. AnsiString(CP_ACP).

> In the end, what the definition of String without brackets is, might be due to a settable compiler option and/or the OS the compiler is set to create code for.

Right, the *generic* String type can be mapped to either ShortString, AnsiString(0) or UnicodeString, depending on compiler versions and switches. A raw guess can be derived from sizeof(Char).

DoDi
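The difference between static and dynamic encoding discussed in this message can be inspected at run time. The following is only a sketch, assuming an FPC version with code page aware strings (as on trunk at the time of this thread); the type name Win1252String is made up for illustration. No output is listed, because what StringCodePage reports for literals depends on the compiler version and settings.

```pascal
program ShowCodePages;
{$mode objfpc}{$H+}

type
  // Illustrative type name, not part of the RTL
  Win1252String = type AnsiString(1252);

var
  A: AnsiString;       // static code page CP_ACP
  U: UTF8String;       // static code page CP_UTF8
  W: Win1252String;    // static code page 1252
  R: RawByteString;    // static code page CP_NONE
begin
  A := 'abc';
  U := 'abc';
  W := 'abc';
  R := U;              // assignment to RawByteString: no conversion

  WriteLn('DefaultSystemCodePage: ', DefaultSystemCodePage);
  // StringCodePage reports the *dynamic* code page stored with the data
  WriteLn('AnsiString:            ', StringCodePage(A));
  WriteLn('UTF8String:            ', StringCodePage(U));
  WriteLn('AnsiString(1252):      ', StringCodePage(W));
  WriteLn('RawByteString:         ', StringCodePage(R));
  // The "raw guess" about the generic String type mentioned above
  WriteLn('SizeOf(Char):          ', SizeOf(Char));
end.
```

Comparing the RawByteString line with the UTF8String line shows whether the dynamic encoding travelled along with the data.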
Re: [fpc-devel] Codepage aware RTL
Mattias Gaertner wrote:
> Hi,
> The page about FPC Unicode support mentions what has already been updated to preserve character data.
> http://wiki.freepascal.org/FPC_Unicode_support#RTL_changes
> Is there already a page about what has not (yet) been updated aka does not work with all code pages?

You mean this section?
http://wiki.freepascal.org/FPC_Unicode_support#RTL_todos

DoDi
Re: [fpc-devel] Trying to understand the wiki-Page FPC Unicode support
Michael Schnell wrote:
> I fail to understand some of the text. It seems to be unavoidable to use the name ANSIString even though I always throw up when seeing a thing called ANSI containing Unicode (e. g. UTF8String = type AnsiString(CP_UTF8) ).
> Seemingly here the bytes per character setting implicitly is thought of as a part of the code-page definition. correct ?

An AnsiString consists of AnsiChar's. The *meaning* of these char's (bytes) depends on their encoding, regardless of whether the used encoding is or is not stored with the string. It's essential to distinguish between low-level (physical) AnsiChar values, and *logical* characters possibly consisting of multiple AnsiChars.

> In section Dynamic code page:
>> When assigning a string to a plain AnsiString (= AnsiString(CP_ACP)) or ShortString, the string data will however be converted to DefaultSystemCodePage. The dynamic code page of that AnsiString(CP_ACP) will then be the current value of DefaultSystemCodePage (e.g. 1250 for the Windows-1250 code page), even though its static code page is CP_ACP (which is a constant 1250). This is one example of how the static code page can differ from the dynamic code page. Subsequent sections will describe more such scenarios.
> 1) A short String does not have a Code page notification so for this "static code page can differ from the dynamic code page" does not seem to make much sense.

The text correctly states "dynamic code page of that AnsiString". ShortString (and AnsiChar) has no encoding indicator, they are assumed to be encoded in CP_ACP.

> 2) I fail to understand how with this explanation that seems to force auto conversion for assignments between types with different code page settings (also for CP_ACP) "the static code page can differ from the dynamic code page" can happen.

Continue reading until you have understood the special handling of string literals and RawByteString.
> In fact this disaster seems to be able to happen (see section RawByteString) if assigning a string with a static code page X1 to a RawByteString (hence no conversion) and then assigning that RawByteString to a string with a static code page X2 (no conversion again).
> In fact I assume that without abusing RawByteString such intersexual strings can't be produced, otherwise this would be rather disastrous for normal users.

*All* intermediate strings, generated during the evaluation of string expressions, only have a dynamic encoding, thus can be considered as being RawByteStrings. That's why I wonder *when* exactly the result of such an expression *is* converted (implicitly) into the static encoding of the target variable, and when *not*.

Obviously the compiler inserts a conversion request for the *direct* assignment of one string variable to another one, of a different *static* encoding. But what happens when a string expression doesn't have such a known static encoding???

> In section RawByteString: the results of conversions from/to the CP_NONE code page are undefined. In effect the behavior is exactly defined in this section
> As a first approximation.

Right, the result *is* well defined, but has no *predetermined* dynamic encoding.

The entire mess results from the bad interpretation of RawByteString assignments, which IMO was well thought out by the Delphi language architects, but not understood by the Delphi compiler coders. This interpretation also found its way into FPC:

>> Less intuitive is probably that when a RawByteString is assigned to an AnsiString(X), the same happens: no code page conversion[...]

It's clear that a conversion *can* be omitted for every assignment *to* a RawByteString. That's one of the purposes of that type - to avoid excess conversions into CP_ACP or UnicodeString. But it's unclear why the heck the conversion should be omitted in the assignment to any *other* AnsiString type, as soon as the source string is a RawByteString???
Therefore I'd suggest a compiler switch, implementing the lame Delphi compatible behaviour only on *demand*, while the FPC default would force any required conversions with *every* assignment to any other (non-CP_NONE) AnsiString type. This simple change will safely prevent strings of different static and dynamic encoding, so that the corresponding tests can be removed safely from library *and* user code.

The proper use of RawByteStrings deserves further documentation, for users who want/need their own (generic) string handling routines. Topics should be:
- how to determine the dynamic encoding of strings (StringCodePage)
- how to force required conversions (SetCodePage)
- how to deal with strings of different encodings
- how to minimize the number of string conversions

DoDi
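The round trip through RawByteString criticized in this message, and the first two documentation topics (StringCodePage, SetCodePage), can be tried out with a small test program. This is a hedged sketch, not a statement about what any particular FPC release does: the type name Win1252String is illustrative, the claim that both assignments skip the conversion is the wiki text quoted above, and on Unix the cwstring unit is assumed to be needed for real conversions.

```pascal
program RawByteRoundTrip;
{$mode objfpc}{$H+}

{$ifdef unix}
uses
  cwstring;   // widestring manager, assumed necessary for conversions on Unix
{$endif}

type
  // Illustrative type name, not part of the RTL
  Win1252String = type AnsiString(1252);

var
  U: UTF8String;
  R: RawByteString;
  W: Win1252String;
begin
  // Build the UTF-8 bytes of U+00E4 by hand, so that no literal
  // conversion gets in the way.
  SetLength(U, 2);
  U[1] := #$C3;
  U[2] := #$A4;

  R := U;   // to RawByteString: no conversion, dynamic encoding travels along
  W := R;   // from RawByteString: per the quoted wiki text, again no conversion

  WriteLn('U: ', StringCodePage(U), ', W: ', StringCodePage(W));

  // Explicit repair: convert the data to the static code page of W
  if StringCodePage(W) <> 1252 then
    SetCodePage(RawByteString(W), 1252, True);
  WriteLn('W after SetCodePage: ', StringCodePage(W), ', length ', Length(W));
end.
```

If the first line reports a code page other than 1252 for W, the variable holds exactly the kind of string with differing static and dynamic encoding that the proposed compiler switch is meant to rule out.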
Re: [fpc-devel] ThousandSeparator
Ewald wrote:
> Of the OS/window manager actually. You are of course right in that there are a certain set of separators that can be used, but the exact separator to use is dependent on the system.

Sounds easy, but just yesterday I ran into a bunch of related problems. Even if the following is somewhat OT, my observations may be helpful to somebody else:

I just encountered a really problematic case, with the Ez-Builder IDE. That program aborts when the decimal separator is not a period '.', asking the user to adjust his national/language settings in the system. So what can/should a user do, in order to run this program on my German Windows with a comma ',' as the decimal separator?

A developer might ask the author to add proper handling of the system-wide national settings. But when that author spends time on presenting instructions on how to change the inconvenient setting, instead of correcting his code, I doubt that such a wish will be heard :-(

The dumb user might follow the instruction, causing problems in all other programs :-(

When the system does not notify all other (running) programs of such a global change, or when some other stupid program doesn't know how to deal with changed settings, the user better shuts down and restarts his system, before and after using that ill-behaved program.

But exactly *what* should a clever program do, when it receives such a change notification? What should happen with the formatted numbers, shown in the forms of the program? Which code (app/OS?) puts the separators into formatted number strings?

I don't know if it's worth discussing such problems in detail, so let me only present my preferred handling:

1) The actual settings are determined at program start, and remain unchanged until program termination.

2) Formatted numbers, as entered by the user (maybe by copy-paste from other applications), can have various encodings.
Before a conversion into binary values I'd remove all unexpected characters, except for the last (rightmost) '.' or ',', which then becomes the decimal separator as expected by the decoding function (RTL provided).

3) For all other (non-GUI) purposes a unique string format is used, according to the conversion functions used by the compiler. This means no thousands separator, and a '.' decimal separator.

But back to the original problem: I managed to create another user, whose number format settings match the expectations of the Ez-Builder, while using my German keyboard. For Linux users this may sound like an easy job, but adding and configuring users in Win8 turned out to be kind of a nightmare :-(

Win8 requires an eMail address for every new user, but entering a fake address only allows to create the account, without any chance to log in subsequently. Probably the requested password has to be established by mail, at least I found no way to disable or specify or reset the password for the new account. Fortunately I had retained a Guest account, could log in and adjust the format settings as prescribed, and then could successfully start Ez-Builder.

After all I hope that these problems are due to the cheap (Premium?) version of my Win8, that is *intentionally* crippled in several ways.

Conclusion: Proper handling of separators in formatted numbers is essential, or else users may run into such big trouble that they will drop your program as unusable.

DoDi
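Steps 2) and 3) of the preferred handling described in this message could look roughly like the sketch below. The function name NormalizeNumberText is hypothetical, and the choice to keep only digits, a leading sign and the rightmost '.' or ',' is one possible reading of "remove all unexpected characters". As the follow-up discussion in this thread points out, an input such as 11,000 stays ambiguous under this rule.

```pascal
program NormalizeNumber;
{$mode objfpc}{$H+}

uses
  SysUtils;

{ Hypothetical helper: keeps digits and a leading sign, drops everything
  else, and turns the rightmost '.' or ',' into a '.' decimal separator. }
function NormalizeNumberText(const S: string): string;
var
  i, LastSep: Integer;
begin
  Result := '';
  LastSep := 0;
  // Locate the rightmost '.' or ','
  for i := Length(S) downto 1 do
    if S[i] in ['.', ','] then
    begin
      LastSep := i;
      Break;
    end;
  for i := 1 to Length(S) do
    if S[i] in ['0'..'9'] then
      Result := Result + S[i]
    else if (S[i] in ['+', '-']) and (Result = '') then
      Result := Result + S[i]
    else if i = LastSep then
      Result := Result + '.';
end;

var
  FS: TFormatSettings;
begin
  // Step 3: a fixed, settings-independent format with '.' as separator
  FS := DefaultFormatSettings;
  FS.DecimalSeparator := '.';

  WriteLn(NormalizeNumberText('1.234,56'));                      // 1234.56
  WriteLn(StrToFloat(NormalizeNumberText('1 234,56'), FS):0:2);  // 1234.56
  // Ambiguous input: the rule reads the comma as decimal separator
  WriteLn(NormalizeNumberText('11,000'));                        // 11.000
end.
```

Passing an explicit TFormatSettings to StrToFloat keeps the decoding independent of whatever the user changes in the system settings while the program runs.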
Re: [fpc-devel] Trying to understand the wiki-Page FPC Unicode support
Mattias Gaertner wrote:
> For example: CP_ACP=0, DefaultSystemCodePage=1252
> That means static code page is always 0, while dynamic code page can be 0 or 1252. Both describe the same encoding.

A *dynamic* encoding *never* can be CP_ACP nor CP_NONE (in Delphi). These values are allowed only for *static* types in type declarations. CP_UTF16 is also not allowed.

Delphi StringCodePage reports the current default codepage (DefaultSystemCodePage) for empty AnsiStrings, CP_UTF16 for all UnicodeStrings.

>> In section RawByteString: the results of conversions from/to the CP_NONE code page are undefined.
> ... because CP_NONE is not a real code page.

The same for CP_ACP.

DoDi
Re: [fpc-devel] Trying to understand the wiki-Page FPC Unicode support
Michael Schnell wrote:
> So seemingly you could do MyStringType = type AnsiString(CP_UTF16), and seemingly the size information is set according to this.

Not in Delphi XE.

DoDi
Re: [fpc-devel] Trying to understand the wiki-Page FPC Unicodesupport
Jonas Maebe wrote:
> On 26/11/14 17:41, Tomas Hajny wrote:
>> BTW, in this context - can users choose UTF16BE on little endian platforms (and vice versa)?
> No, because we do not have any routines that allow a user to set/change the codepage of a unicodestring (either at run time or at compile time).

What about file I/O? It should be possible to read (and write) files of either endianness.

DoDi
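For the file I/O case raised in this message, a reader can at least normalize the byte order itself after loading the data. The sketch below is an assumption about how user code might do that, not an RTL facility: SwapUtf16ByteOrder is a made-up name, and the data is assumed to start with a byte order mark that shows up as $FFFE when read in the wrong order.

```pascal
program SwapUtf16;
{$mode objfpc}{$H+}

{ Hypothetical helper: swaps the two bytes of every UTF-16 code unit. }
procedure SwapUtf16ByteOrder(var S: UnicodeString);
var
  i: Integer;
begin
  for i := 1 to Length(S) do
    S[i] := WideChar(SwapEndian(Word(S[i])));
end;

var
  S: UnicodeString;
begin
  // Simulated file content read with the wrong byte order:
  // byte order mark U+FEFF and the letter 'A' (U+0041), both swapped.
  SetLength(S, 2);
  S[1] := WideChar($FFFE);
  S[2] := WideChar($4100);

  // A swapped BOM signals foreign endianness
  if S[1] = WideChar($FFFE) then
    SwapUtf16ByteOrder(S);

  WriteLn(Word(S[1]), ' ', Word(S[2]));   // 65279 65
end.
```

After the swap the string is ordinary native-order UTF-16, so no second code page value for UnicodeString is needed inside the program.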
Re: [fpc-devel] Trying to understand the wiki-Page FPC Unicode support
Jonas Maebe wrote:
> Technically, that section literally states that they will be concatenated without data loss and that the result is then converted to the target string's encoding (except in case the target is RawByteString). How that is implemented exactly is undefined; again in the meaning of undefined, not in the meaning of undefined when defined as meaning X.

In this case the implementation is compiler specific, somewhat different from undefined (in a RawByteString):

>> CP_NONE: this value indicates that no code page information has been associated with the string data. The result of any explicit or implicit operation that converts this data to another code page is undefined.

IMO the result is well defined: it's the string with the encoding of that other codepage. An undefined result, as I understand it, would mean the result can be anything, unrelated to the function input.

The branch taken in execution of an IF statement also is not undefined, only because it depends on the actual condition value. The value of a local variable initially is undefined, i.e. can be any value. But after an assignment it *is* defined, even if that value still may be *unpredictable* by static code analysis.

IMO a better wording should be found, that does not cause the current obvious confusion of some readers.

>> Regarding RawByteStrings there has been the definition "a RawByteString has exactly the same behavior as assigning that AnsiString(X) to another AnsiString(X) variable with the same value of X: no code page conversion or copying occurs". Seemingly this is not true for the intermediate results of concatenations.
> That paragraph only specifies that code page-aware strings are concatenated without data loss, and then defines to which code page the result will be converted before assigning it to the target.

What's the meaning of "no copying occurs"? Of course the reference to the string is copied into the target variable!
What's "the same value of X", in case of AnsiString(CP_ACP) and AnsiString(DefaultSystemCodePage)?

> Even if the intermediary result of a concatenation would be a RawByteString (which is not stated nor necessarily ever the case), then the above would apply and hence the (dynamic) code page of that RawByteString would be the one as defined by the above-mentioned rules before it would be assigned to the target.

Please note that the other statements refer to *static* encodings, therefore my question about the (assumed) static encoding of an intermediate result. When the compiler inserts a conversion request based on *static* encodings, will it or will it not insert such a request, before an intermediate result is assigned to the target variable?

Suggestion:
During string operations the source strings are converted [to CP_ACP?] when they have a different [dynamic?] encoding. When the result is stored in a variable, it is converted as required by the static encoding of the target. Where "as required" means that a static target encoding of CP_ACP is replaced by the DefaultSystemCodePage, while CP_NONE does not require a conversion.

The CP_ACP case should be clarified as well, because it's unclear whether CP_ACP(=0) is *considered* equal to the current DefaultSystemCodePage, even if both values are *always* different (see above). The use of CP_ACP instead of DefaultSystemCodePage can be confusing and should be avoided or clarified before.

Perhaps it would help to concentrate on the following steps:
1) (string) operand fetch
2) (string) operations
3) (string) assignment

1) Fetching an operand removes any information about the static encoding of the source, only its dynamic encoding persists.
[Now the handling of non-AnsiString sources can be explained, like for literals, ShortString etc. RawByteString is not special here, it's only a static encoding.]

2) String operations take into account the dynamic encoding of their operands, with lossless conversions inserted as required.
3) When a string is assigned to a variable, it is eventually converted as required by the static encoding of the target, with possible data loss. [about required see above. Special case: when the source is a variable, no conversion occurs when the *static* source and target types are compatible. What exactly is compatible with CP_ACP? ] DoDi ___ fpc-devel maillist - fpc-devel@lists.freepascal.org http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
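The difference between static and dynamic encoding discussed above can be observed with a small test program. This is only a sketch, assuming a recent FPC with code-page-aware strings; the type name Win1252String and the variable names are made up for illustration, and the printed values depend on the platform's DefaultSystemCodePage.

```pascal
program EncodingSketch;
{$mode objfpc}{$H+}

type
  // Hypothetical name: an AnsiString with a static encoding of 1252.
  Win1252String = type AnsiString(1252);

var
  a: AnsiString;      // static encoding CP_ACP
  w: Win1252String;   // static encoding 1252
  r: RawByteString;   // static encoding CP_NONE

begin
  a := 'abc';
  w := a;  // step 3: converted as required by the static encoding of w
  r := w;  // CP_NONE target: no conversion, the dynamic encoding is kept

  // StringCodePage reports the *dynamic* encoding stored with the string data.
  WriteLn('a: ', StringCodePage(a));
  WriteLn('w: ', StringCodePage(w));
  WriteLn('r: ', StringCodePage(r));
  WriteLn('DefaultSystemCodePage: ', DefaultSystemCodePage);
end.
```

Comparing the first and last line of the output shows whether a CP_ACP string actually carries the value of DefaultSystemCodePage as its dynamic encoding on a given system.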
Re: [fpc-devel] ThousandSeparator
Michael Van Canneyt wrote: The ThousandSeparator is char and supports only 1 byte characters. For example French and Russian need more. Are there any plans to extend it? Plans: yes. Time: no. Maybe a widechar is sufficient? Making it a string is more invasive than making it a widechar. Are all possible separators members of the Unicode BMP? What if a properly decorated string has to be converted to a specific (AnsiChar) codepage? I'd assume that national separators are part of the corresponding codepage, but is that always true? DoDi
Re: [fpc-devel] ThousandSeparator
Mattias Gaertner wrote: Does concatenating a string and a WideChar create a UnicodeString? Can this become a problem? Concatenation requires 2 strings, so everything depends on the concrete code. Regardless of any compiler magic, something like this will happen:
  var c: WideChar; s, cs: string;
  cs := c; //dunno if accepted by the compiler
  s := s + cs;
The WideChar can be converted into a Unicode (UTF-8 or UTF-16) string. Afterwards this string may need another conversion, when the other string has a different encoding. In the worst case *both* strings are converted to the default Unicode representation (Delphi: UTF-16, Lazarus: UTF-8?) before they are concatenated. Another conversion may occur when the resulting string is assigned to a variable. All this may become simpler when CP_ACP is used (at least in Delphi), and the separator is given in that encoding, as a single byte/AnsiChar in case of an SBCS CP_ACP. When Lazarus instead uses UTF-8 (MBCS) for CP_ACP, the character occupies more than one byte, so that this simplification is impossible. This suggests storing the delimiter as a string instead of a WideChar, whereupon a concatenation of the strings may not require any further conversion. Finally, when the expression (s+cs) is of type RawByteString (depending on the involved function declarations), the result will be stored in the target variable *without* another conversion. Then the static and dynamic encoding of s may be different afterwards. DoDi
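The conversions described above can be made visible with a small test. This is a sketch only, assuming a recent FPC; the variable names and the choice of U+00A0 (no-break space) as separator are illustrative, and the result of the final conversion depends on the platform and the installed widestring manager.

```pascal
program SeparatorSketch;
{$mode objfpc}{$H+}

var
  sep: WideChar;
  u: UnicodeString;
  a: AnsiString;

begin
  sep := WideChar($00A0);   // no-break space, chosen only as an example

  // Concatenating into a UnicodeString keeps everything in UTF-16.
  u := '1';
  u := u + sep + '000';

  // Explicit conversion to the default ANSI code page; this is where
  // a separator outside that code page could get lost.
  a := AnsiString(u);

  WriteLn('UTF-16 code units: ', Length(u));
  WriteLn('ANSI bytes:        ', Length(a));
  WriteLn('ANSI code page:    ', StringCodePage(a));
end.
```

With a single-byte code page both lengths match; with UTF-8 as the default code page the byte count is larger, which is the case where a single AnsiChar separator is no longer sufficient.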
Re: [fpc-devel] Clarify expression grammar
Vsevolod Alekseyev wrote: Hi all, in the FPC reference at http://www.freepascal.org/docs-html/ref/refse68.html#x127-13700012.1 , the formal grammar spec only goes down as far as factor. Can I please see the grammar for variable reference? A variable reference can be an arbitrarily complex thing; for example, MyStructArray[MyFunction(I)*10+1].StructMember[Ord(J)] is a perfectly valid variable reference. ACK. I'm also missing the ^, . and [...] operators/selectors from the list of operators. [This is a second post, the first one didn't show up yet] DoDi
Re: [fpc-devel] Small virtual machine to cross compile FPC
Paul Breneman wrote: I think 100Mb is a bit small. You'll need cross-binutils, X, cross-dev libs and whatnot. 650Mb would be feasible, I guess. Thanks for that info, but couldn't most of that be downloaded into the VM *after* it is running? Seems to me I'd like the *smallest* VM and then have a way to load things into that standard PC. But maybe I'm thinking wrongly? If so please help me get it right. I don't understand why the VM *size* should matter, unless it's 30GB as for current Windows versions. My goal would be a *simple* OS, easy to configure and manage, and then install into it whatever is required. Why download and configure all the required tools whenever the VM is run? It may take half a day to get the VM up for cross-development, and the downloads end up on the virtual disk as well. For cross-development I'd install a network of dedicated target VMs, one of which can host the project files, and then build the project in every target VM. This would allow for parallel builds, and every created executable can be tested immediately on its platform, also in parallel for comparison of the GUI and operation. With a single development VM you would need another VM or emulator to perform the final checks, for every single target platform. I've looked at (or tried) laz4android and fpcup. Seems that such an approach would work much better on a standard PC? Virtual machines work well on the same hardware (CPU), but for other targets (ARM instead of x86) an emulator is required. Wikipedia says that a LiveCD and AndroVM with Android for x86 is available, where it might be possible to develop Android applications somewhat natively on an x86 machine. But in the end an emulator or physical device is required, where the cross-compiled programs can run on their target CPU, using the corresponding libraries (RTL, VCL... for ARM). Please don't ask me about Android, my experience is limited to FPC/Lazarus development on various Windows and Linux VMs, and I never tried to cross-compile myself. Why cross-compile when I cannot check the results? DoDi
Re: [fpc-devel] Proof of Concept ARC implementation
Sven Barth wrote: On 28.10.2014 10:15, Michael Schnell wrote: On 10/27/2014 05:17 PM, Hans-Peter Diettrich wrote: Something like ShortString and AnsiString? Only that ShortStrings can easily be avoided (AFAIK, no great performance advantage to use them) and hence are seldom used right now. ShortStrings don't have implicit initialization/finalization, thus no implicit try/finally blocks, which at least with FPC's platform-independent exception handling mechanism or SEH on i386-win32 have quite some performance impact even if no error occurred (SEH on x86_64-win64 (and in theory arm-wince) only has an impact in case of an error (and an impact on binary size for the exception tables)). Also there is no reference counting for ShortString. So: basically the performance of reference counted objects compared to normal objects is more or less similar to the performance of AnsiStrings compared to ShortStrings (it's not completely equal, because AnsiStrings are allocated on the heap while ShortStrings are on the stack, but it's good enough...). And did anyone yet complain about the performance of AnsiStrings? ;) Right, this entire discussion is somewhat fruitless without benchmarks. I wonder how difficult it would be to implement the existing Interface refcounting model for TObject, so that this runtime variation could be tested and benchmarked as well, in addition to the current compile-time approach. Given the problems of the compile-time approach revealed in this thread, it does not look viable to me at all. Just an idea about type incompatibilities: When a TArcObject cannot be assigned to a TObject variable, because a conversion (as between ShortString and AnsiString) is impossible, then a delegate could be created that turns the TArcObject into something compatible with TObject. I have no idea how this could be accomplished[1], but as long as it only affects refcounted objects, the overhead has to be accepted when using ARC objects at all. Some overhead is inevitable for ARC, and everybody should be free to decide whether such overhead is acceptable for their projects or targets. Not using ARC at all should always be an option. [1] Perhaps the same (possibly simplified) mechanism could be used for TArcObject/TObject conversion as for Interface/TObject conversion. DoDi
Re: [fpc-devel] Proof of Concept ARC implementation
Sven Barth wrote: On 28.10.2014 10:19, Michael Schnell wrote: On 10/27/2014 07:59 PM, Sven Barth wrote: - in code that does not use ARC (modeswitch arc off - the default; or maybe better a local directive) all instance variables are considered weak While I do have a vision of what weak means here, can you give an exact description? - no change in reference count when assigning a refcounted object variable to it - no change in reference count when assigning it to a refcounted object variable I suspect that this can cause premature destruction of the object, when either - another value is assigned to the refcounted object variable - all other (counted) references to the object disappear But I don't have a solution for these problems, as long as the compiler inlines the refcounting code at compile time. DoDi
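The premature destruction suspected above already exists today when plain object variables are mixed with COM-style interface references. The following sketch uses that existing mechanism as an analogy only; it is not the ARC branch under discussion, and TFoo, Weak and Strong are hypothetical names.

```pascal
program WeakSketch;
{$mode objfpc}{$H+}

type
  TFoo = class(TInterfacedObject)
  public
    destructor Destroy; override;
  end;

destructor TFoo.Destroy;
begin
  WriteLn('TFoo destroyed');
  inherited Destroy;
end;

var
  Weak: TFoo;          // plain object variable: assignments are not counted
  Strong: IInterface;  // interface variable: assignments are counted

begin
  Weak := TFoo.Create; // reference count stays 0
  Strong := Weak;      // reference count becomes 1
  Strong := nil;       // count drops to 0, the destructor runs here
  // Weak still holds the old address and must not be used any more.
end.
```

The last counted reference disappears while the uncounted one is still in scope, which is the second of the two cases listed in the message.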
Re: [fpc-devel] Proof of Concept ARC implementation
Sven Barth wrote: On 28.10.2014 09:57, Hans-Peter Diettrich wrote: Something like ShortString and AnsiString? Take unit Typinfo for example where quite some methods take a TObject instance. The TypInfo methods can determine the exact type of their arguments, and act accordingly inside. If you have a method X that takes a TObject parameter how do you plan to pass it a reference counted object when these and TObject are not compatible *at compile time*? That's intentionally impossible in general. For TypInfo, a dedicated method (overload) can be added, or an untyped parameter can be used like in FreeAndNil. DoDi
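The untyped-parameter technique mentioned above can be sketched as follows. Describe is a hypothetical routine written for this illustration; like FreeAndNil, it accepts any variable and relies on the caller to pass an object reference, because the compiler performs no type check here.

```pascal
program UntypedSketch;
{$mode objfpc}{$H+}

uses
  Classes;

// Hypothetical helper: takes an untyped var parameter, as FreeAndNil does.
procedure Describe(var Obj);
begin
  // The cast is unchecked; passing a non-object variable would be an error
  // that the compiler cannot detect.
  WriteLn(TObject(Obj).ClassName);
end;

var
  L: TStringList;

begin
  L := TStringList.Create;
  try
    Describe(L);   // the exact type is determined at run time
  finally
    L.Free;
  end;
end.
```

The price of this flexibility is the loss of compile-time type safety, which is why it only suits a few carefully written library routines.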
Re: [fpc-devel] Proof of Concept ARC implementation
Sven Barth wrote: On 30.10.2014 04:14, Hans-Peter Diettrich wrote: I wonder how difficult it would be to implement the existing Interface refcounting model for TObject, so that this runtime variation could be tested and benchmarked as well, in addition to the current compile-time approach. Given the problems of the compile-time approach revealed in this thread, it does not look viable to me at all. The code would mostly be the same as the one I already implemented. Add virtual to the ARCDecRef, ARCIncRef and ARCRefCount methods of TObject, adjust the RTL helper functions to not expect a refcount field at a specific offset, remove the restrictions that reference counting is only done for classes marked as refcounted and you're done... Looks quite easy :-) Could you introduce this feature into your branch, by conditional compilation? Mark just pointed me to another problem, possibly unhandled yet. Interface refcounting must be updated as soon as the underlying object becomes refcounted as well. Do you already have an idea how to handle refcounting for classes with interfaces? DoDi
Re: [fpc-devel] Proof of Concept ARC implementation
Sven Barth wrote: On 27.10.2014 21:00, Hans-Peter Diettrich wrote: Sven Barth wrote: On 27.10.2014 17:20, Hans-Peter Diettrich (drdiettri...@aol.com) wrote: Something like ShortString and AnsiString? With the difference that Short- and AnsiString are assignable to each other while Jonas does not want that for reference counted and ordinary classes. Where would this matter? When TObject and TManagedObject are different (base) types, a direct assignment of references is impossible. Take unit Typinfo for example where quite some methods take a TObject instance. The TypInfo methods can determine the exact type of their arguments, and act accordingly inside. Or all those classes (TStrings, TObjectList, TComponent, etc.) that somewhere take a TObject as parameter. IMO containers play a different role in managed and unmanaged environments. E.g. a TObjectList.OwnsObjects property is useless with managed objects, and the circular owner/child and parent/child references in several persistent classes deserve special attention and handling when used with managed objects. Similar considerations apply to strings: should TStrings contain AnsiStrings or UnicodeStrings, where despite their assignment compatibility the implicit conversions between both can consume much runtime? For such reasons I'd prefer separate environments (RTL...) for managed and for unmanaged objects, just like for AnsiString and UnicodeString. But in combination such options would end up in many different library versions, so that I do not really suggest such an implementation. My dream is distinct FPC/Lazarus versions, designed for compatibility with D7, D2009, Unicode, Mobile and whatever versions may show up in the future. Then it should be possible to freeze the old versions with all bugs fixed, and new features will be added only to newer versions; this would eliminate all the aforementioned problems resulting from mixing features of different Delphi versions. IMO Delphi versions don't offer backwards compatibility for good reasons; instead a purchased license allows one to *also* use all older versions, down to D7. What I'm missing here are bugfixes, because the development of older versions almost stops as soon as a new version is distributed. Known bugs are mostly fixed only in newer versions, which introduce new bugs and features at the same time: good for sales but bad for the customers. Since FPC/Lazarus are open source, user groups may offer continued support for their preferred version(s), by backporting bugfixes into these versions. What do you mean with virtual counting methods? Overriding these methods can enable/disable refcounting for a class, and all classes derived from it. The default then can be to do nothing (no counting). But then it's the same reference counting as the COM one, because without the COM reference counting those virtual methods would not be called. So you prefer to inline these methods? Now I understand why you want refcounting fully handled at compile time... DoDi
Re: [fpc-devel] Proof of Concept ARC implementation
Boian Mitov wrote: In general the C/C++ notion of doing as little in the language as possible, and as much in library has worked very well for it over the years. Yes, pluggable languages concept has existed at least since C ;-) . I agree, and as I said has worked well. AFAIR such languages lose compatibility with themselves as soon as projects start using their private extensions. Then, in the worst case, no project can borrow parts (libraries...) from other projects. DoDi
Re: [fpc-devel] Proof of Concept ARC implementation
Sven Barth wrote: On 27.10.2014 23:41, Boian Mitov (mi...@mitov.com) wrote: Well... we may differ on this one. I absolutely love attributes, but I guess that is just me :-D . I think attributes are the greatest thing that has happened to Delphi ever, I just wish they were not so limited. Attributes allowed us to cut 3/4 of our code base. You can't beat that easily. Let me clarify: I have nothing against the concept of attributes, I just dislike the syntax and the introduction of attributes that influence compiler behavior. +1 DoDi
Re: [fpc-devel] Small virtual machine to cross compile FPC
Paul Breneman wrote: I've spent a bit of time during the past 7 years trying to figure out how to simplify things by avoiding cross-compiling. This page has many of the details: http://turbocontrol.com/monitor.htm I think there is a way to simplify cross-compiling. Levinux is a small (~20 MB) QEMU download for x86 PCs (Windows, OS X, Linux) that provides a small Tiny Core Linux VM. I'd like to see something similar but with all the files and tools needed to pull the latest source code and cross-compile FPC (also with Debian instead of Tiny Core?). http://mikelev.in/ux/ It seems to me that such a small VM should allow a nice standard method that will make it easy to test and see things work. I look forward to your thoughts and comments! I wonder why you need or use cross-compilation at all? The biggest part of a cross compiler are the target-specific libraries and tools, which allow creating executables for use on a specific target. IMO it will be easier to create a dedicated VM for every target, and install FPC there, instead of adding cross-compilation features for many targets to whatever machine. Mobile devices often require their own emulator, or a physical device, for program development; a single VM is of little use for that purpose, IMO. DoDi
Re: [fpc-devel] Proof of Concept ARC implementation
Jonas Maebe wrote: Additionally, as mentioned before, I still believe it's a very bad idea to be able to inherit from a regular class and turn it into a reference counted class. Reference counted and non-reference-counted classes are different language entities with different behaviour and different code generation requirements, and hence should be completely unrelated. Something like ShortString and AnsiString? I agree that a *compiler-based* implementation, of a single TObject base class, would require two sets of libraries, starting with the RTL, else a mix of units with different object types cannot be avoided in an executable. And it would almost rule out the use of DLLs built for a possibly different model. Even if you completely forbid typecasting a reference counted class into a non-reference-counted parent class, simply calling an inherited method from a non-reference-counted parent class can easily completely mess up the reference counting (e.g. suppose that inherited method inserts self into a linked list). ACK. The only way out I can see is adding the *possibility* of refcounting to TObject, meaning Add/ReleaseRef methods and a RefCount field. Then the compiler can safely generate refcounting code for *all* objects and non-weak references, and the counting methods take care of the required operations. Delphi offers two means for specialized refcounting: the virtual counting methods, and the (COM compatible?) refcount value of -1 for unmanaged objects. DoDi
Re: [fpc-devel] Proof of Concept ARC implementation
Sven Barth wrote: On 27.10.2014 17:20, Hans-Peter Diettrich (drdiettri...@aol.com) wrote: Something like ShortString and AnsiString? With the difference that Short- and AnsiString are assignable to each other while Jonas does not want that for reference counted and ordinary classes. Where would this matter? When TObject and TManagedObject are different (base) types, a direct assignment of references is impossible. What do you mean with virtual counting methods? Overriding these methods can enable/disable refcounting for a class, and all classes derived from it. The default then can be to do nothing (no counting). The main reason I decided not to introduce reference counting for every class was that some people feared the performance impact of the reference counting. Though Florian said that it shouldn't be that bad on today's CPUs... Did you ever benchmark your model? That said: if someone wants to test it one could add refcounted to TObject (my code should(!) handle that correctly) and see what happens... (of course there will be problems with circular references then) Fine :-) DoDi
Re: [fpc-devel] Proof of Concept ARC implementation
Kostas Michalopoulos wrote: On Mon, Oct 27, 2014 at 5:17 PM, Hans-Peter Diettrich (drdiettri...@aol.com) wrote: Then the compiler can safely generate refcounting code for *all* objects and non-weak references, and the counting methods take care of required operations Wouldn't that cause all objects to pay (both in terms of performance and memory) for something that they don't use? Right, that's why I suggest keeping both models separate. But the real runtime impact has to be benchmarked; it may be as low as with other managed types (AnsiString...), where nobody complains. Memory usage (4 bytes per object) should not matter, Delphi accepts it on mobile devices, of all places! IMO it is better to fully disallow subclasses from introducing reference counting than force functionality on objects that they don't need to use. It *looks* better, but has several issues. DoDi
Re: [fpc-devel] Proof of Concept ARC implementation
Sven Barth wrote: A semicolon has the problem that you need to distinguish between it being a modifier and a normal following identifier as not every keyword is a keyword in every context (like for example read and write for properties). What I miss in this discussion is the elementary distinction between keywords (reserved words) and directives. Unlike keywords, directives are context sensitive and can be used as identifiers in all other places. That's why directives should *follow* identifiers, never precede them. The semicolon usage is not well designed in Delphi; additional (intermediate) semicolons are not required and should be banned. Then the parser can continue to search for directives until the end of an applicable construct (declaration...) is found, which may be a semicolon or something else (comma, parenthesis...), depending on the construct syntax. DoDi
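The distinction between reserved words and directives can be shown with the property directive read, which is still a legal identifier elsewhere. A small sketch with hypothetical names (TBox, FValue):

```pascal
program DirectiveSketch;
{$mode objfpc}{$H+}

type
  TBox = class
  private
    FValue: Integer;
  public
    // Here "read" and "write" act as directives of the property declaration.
    property Value: Integer read FValue write FValue;
  end;

var
  read: Integer;   // here "read" is an ordinary identifier (it hides System.Read)
  b: TBox;

begin
  read := 42;
  b := TBox.Create;
  try
    b.Value := read;
    WriteLn(b.Value);
  finally
    b.Free;
  end;
end.
```

The same would not compile with a reserved word such as begin or class in place of read, because reserved words are never available as identifiers.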
Re: [fpc-devel] Proof of Concept ARC implementation
Sven Barth wrote: On 25.10.2014 03:17, Hans-Peter Diettrich (drdiettri...@aol.com) wrote: - a class instance is destroyed once the reference count reaches zero (and Free does not work for them) Shouldn't Free be usable as a finalizer, clearing all references to other objects within this instance? One could do that (for now I've chosen the simple way). One would however need to check how this would be implemented best (e.g. it should be marked somehow so that the destructor later on does not try to work with already finalized fields; also all fields (Strings, arrays, interfaces, records, etc.) should be finalized so that it is consistent). A finalizer must clear all managed fields, otherwise memory management would be corrupted. Doing so may destroy other managed objects, so the possible consequences must be considered. I wonder whether the order in which fields are cleared may cause trouble? Also it needs to be observed how other reference holders might react to that zombie instance. Right, this should be considered by the developer. A further problem might be legacy code which gets passed a reference counted instance (on which ARCIncRef was called to keep it alive) and which then calls Free. Might not be the result intended by either piece of code... This might be the reason why Embarcadero implemented Free as a no-op and added a new DisposeOf which does what you suggested. Then Delphi compatibility has to be maintained. Is DisposeOf fully automatic, or can it be overridden or otherwise influenced (field sequence...)? DoDi
Re: [fpc-devel] Proof of Concept ARC implementation
Sven Barth wrote: Hello all! I've now finished my Proof of Concept ARC implementation which is based on the RFC I published a few weeks back: http://lists.freepascal.org/fpc-devel/2014-September/034263.html Fine :-) To recap: [...] - a class instance is destroyed once the reference count reaches zero (and Free does not work for them) Shouldn't Free be usable as a finalizer, clearing all references to other objects within this instance? DoDi
Re: [fpc-devel] suggestion: virtual method co-variance
Sven Barth wrote: At least at first sight there don't seem to be any real (technical) reasons not to allow covariance for return values. Parameters would be a different topic though... Just so I get the idea right:
=== code begin ===
type
  TBar = class
    function Test: TObject; virtual;
  end;
  TFooBar = class(TBar)
    function Test: TStrings; override;
  end;
//...
I just wonder about the purpose and use of such refinement. Should the compiler rely on a more specific return type, based on the *static* type of an object reference? I'd use different names for specialized methods/properties instead. OTOH it would be nice to have specialized lists without much coding, i.e. without writing getters (and setters) which only do typecasts of their results. Something like
type
  TBar = class
    property Test: TObject ...; virtual;
  end;
  TFooBar = class(TBar)
    property Test: TStrings; override;
  end;
Please note that this kind of override should not require overriding the getters/setters; it would only enforce (static) type checks/casts, as doable at compile time. But that smells like Generics, which already have their place in OPL... Parameters would require different handling, because a single type mismatch would defer to the base class implementation, with possibly strange effects when this results in a bypass of the modified code in the overridden methods. That's another argument for my above suggestion, which should not only eliminate the need for overriding related methods, but should *disallow* explicit getter/setter overrides. So we would be back at generics again? DoDi
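For comparison, this is roughly the boilerplate that the suggested property override is meant to avoid: a descendant that redeclares the property with a narrower type and supplies getters and setters which only cast. A sketch with hypothetical names (FItem, GetTest, SetTest), not existing library code:

```pascal
program CovarianceSketch;
{$mode objfpc}{$H+}

uses
  Classes;

type
  TBar = class
  private
    FItem: TObject;
  public
    property Test: TObject read FItem write FItem;
  end;

  TFooBar = class(TBar)
  private
    function GetTest: TStrings;
    procedure SetTest(AValue: TStrings);
  public
    // Redeclared with a narrower type; the accessors only typecast.
    property Test: TStrings read GetTest write SetTest;
  end;

function TFooBar.GetTest: TStrings;
begin
  Result := FItem as TStrings;   // checked cast of the inherited storage
end;

procedure TFooBar.SetTest(AValue: TStrings);
begin
  FItem := AValue;
end;

var
  fb: TFooBar;

begin
  fb := TFooBar.Create;
  fb.Test := TStringList.Create;
  fb.Test.Add('x');              // no cast needed at the call site
  WriteLn(fb.Test.Count);
  fb.Test.Free;
  fb.Free;
end.
```

Note that through a TBar reference the property still accepts any TObject, so the narrowing is only enforced for the static type TFooBar; that gap is what the compile-time checks proposed in the message would have to cover.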
Re: [fpc-devel] Suggestion: reference counted objects
Mark Morgan Lloyd wrote: Boian Mitov wrote: I think parallel processing belongs in library implementations. I have reservations, based in part on the fact that other language implementations are prepared to assume responsibility for parallelisation, in part on experience with e.g. APL which at the very least specifies that the user should assume that operations are parallelised, and in part on the fact that FPC already vectorises on e.g. SSE2 hardware. What do you (both) mean by parallel processing? Streaming (SSE...) does *vectorization*, i.e. multiple (floating point) operands of the same *array* are processed in parallel. Such cases can be handled by the compiler, no libraries are involved, no threads, no risk of side-effects. When instead a general loop, possibly containing multiple statements, is broken into multiple loops which are processed in parallel, side-effects can occur depending on the operations (virtual methods...) in the loop. Then IMO it is up to the developer to check which loop can be parallelized without side-effects, and to indicate that to the compiler. In this case the compiler could turn the body of the loop into a TThread, and insert an RTL call to execute this thread split into multiple instances. The RTL then creates the threads at runtime (depending on what? cores, already active threads...?), assigns to each a subrange of the entire loop interval, starts them and waits until all of them have terminated. This would limit the types of loops to FOR loops with a known interval, excluding REPEAT, WHILE and FOREACH loops. Right? DoDi
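Written by hand, the transformation described above could look like the following sketch. It assumes that the loop body (here a simple square computation) has no side-effects across iterations; TRangeThread, Workers and the chunking scheme are hypothetical choices for this illustration, not something the compiler or RTL does today.

```pascal
program ParallelForSketch;
{$mode objfpc}{$H+}

uses
  {$ifdef unix}cthreads,{$endif}
  Classes;

const
  N = 1000;      // loop interval 0..N-1, known before the loop starts
  Workers = 4;   // fixed here; an RTL could derive it from the core count

var
  Data: array[0..N - 1] of Integer;

type
  // Each instance runs the loop body for one subrange of the interval.
  TRangeThread = class(TThread)
  private
    FLo, FHi: Integer;
  protected
    procedure Execute; override;
  public
    constructor Create(ALo, AHi: Integer);
  end;

constructor TRangeThread.Create(ALo, AHi: Integer);
begin
  inherited Create(True);   // created suspended, started by the caller
  FLo := ALo;
  FHi := AHi;
end;

procedure TRangeThread.Execute;
var
  i: Integer;
begin
  for i := FLo to FHi do
    Data[i] := i * i;       // the original loop body
end;

var
  Threads: array[0..Workers - 1] of TRangeThread;
  w: Integer;

begin
  // Assign a subrange to each thread and start it.
  for w := 0 to Workers - 1 do
  begin
    Threads[w] := TRangeThread.Create(w * N div Workers,
                                      (w + 1) * N div Workers - 1);
    Threads[w].Start;
  end;

  // Wait until all threads have terminated.
  for w := 0 to Workers - 1 do
  begin
    Threads[w].WaitFor;
    Threads[w].Free;
  end;

  WriteLn(Data[N - 1]);
end.
```

The sketch only works because the bounds are known before the loop starts and every iteration writes to its own array element, which matches the restriction to FOR loops mentioned in the message.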
Re: [fpc-devel] Suggestion: reference counted objects
Florian Klämpfl wrote: If the time spent in this thread had been spent in coding, FPC would have already ARC. The list has approx. 600 members, 200 messages were written. If each of the 600 members spent on average 1 min reading each message, this is 2000 man-hours, i.e. approx. 1 MY :) First, understanding the compiler code will take more time, until one knows where to start coding. Second, I already supplied two proposals, which could be implemented in a few lines: 1) Use virtual _AddRef and _Release, override for ARC classes 2) Ditto non-virtual, add _RefCount to TObject, init to -1 for no ARC In either case the compiler inserts calls to _AddRef/_Release wherever it already does for interfaces. Last but not least somebody must be entitled to implement ARC, against all objections of the users ;-) DoDi
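The two proposals can be compared in a hand-written sketch. All class and method names below (TBase1, TCounted1, TBase2, AddRef, Release) are hypothetical; in the actual proposals the methods would live in TObject and the calls would be inserted by the compiler rather than written by hand.

```pascal
program RefCountSketch;
{$mode objfpc}{$H+}

type
  // Proposal 1: virtual counting methods that do nothing by default.
  TBase1 = class
  public
    procedure AddRef; virtual;
    procedure Release; virtual;
  end;

  // An ARC class overrides them and carries its own counter.
  TCounted1 = class(TBase1)
  private
    FRefCount: LongInt;
  public
    procedure AddRef; override;
    procedure Release; override;
  end;

  // Proposal 2: non-virtual methods, counter in the base class,
  // -1 marks an instance that is not reference counted.
  TBase2 = class
  private
    FRefCount: LongInt;
  public
    constructor Create(Counted: Boolean);
    procedure AddRef;
    procedure Release;
  end;

procedure TBase1.AddRef;
begin
end;

procedure TBase1.Release;
begin
end;

procedure TCounted1.AddRef;
begin
  InterlockedIncrement(FRefCount);
end;

procedure TCounted1.Release;
begin
  if InterlockedDecrement(FRefCount) = 0 then
    Destroy;
end;

constructor TBase2.Create(Counted: Boolean);
begin
  inherited Create;
  if Counted then
    FRefCount := 0
  else
    FRefCount := -1;
end;

procedure TBase2.AddRef;
begin
  if FRefCount <> -1 then
    InterlockedIncrement(FRefCount);
end;

procedure TBase2.Release;
begin
  if (FRefCount <> -1) and (InterlockedDecrement(FRefCount) = 0) then
    Destroy;
end;

var
  a: TBase1;
  b: TBase2;

begin
  a := TCounted1.Create;
  a.AddRef;
  a.Release;                  // counter reaches 0, instance is destroyed

  b := TBase2.Create(False);  // unmanaged instance
  b.AddRef;
  b.Release;                  // both calls are no-ops
  b.Free;                     // still freed manually
end.
```

Proposal 1 costs a virtual call per assignment even for unmanaged classes, while proposal 2 costs four bytes per instance plus a compare; which is cheaper in practice would have to be benchmarked.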
Re: [fpc-devel] Suggestion: reference counted objects
Sven Barth wrote: There are however some nasty problems inside constructors and destructors, because Self is reference counted as well (and should be after all as we don't want the instance to be destroyed behind our backs suddenly). IMO before the end of a constructor, and before the start of a destructor, no references to the object exist at all, so that code outside these methods has no reference that could cause trouble. It looks to me like inside methods Self doesn't deserve refcounting, because a method can be invoked only with an existing instance, which will stay alive at least until the call returns. DoDi
Re: [fpc-devel] Suggestion: reference counted objects
Marco van de Voort wrote: In our previous episode, Sven Barth said: It looks to me like inside methods Self doesn't deserve refcounting, because a method can be invoked only with an existing instance, which will stay alive at least until the call returns. That's the thing I'm not yet entirely sure about. Though disabling it for Self would definitely simplify things. I'll simply give it a try and then my constructor problems should hopefully be solved as well... Methods might return SELF as function result? Yes, and this might be a problem during and immediately following construction. When the constructor passes Self to another procedure, that procedure must not do anything that increases and later decreases the refcount. Increasing the refcount during construction might help, but then the refcount cannot be decremented at the end of the constructor. And what should happen to the result of Create? MyObj := TMyObj.Create; is fine when the refcount is increased to 1 when the instance is assigned to MyObj. In contrast TMyObj.Create; looks somewhat useless, and how to destroy this zombie later, without refcounting? This construct might be legal with TThread.Create? Can somebody test what Delphi does in this case? Suggestion: The constructor inits the refcount to 1, and decrements it on exit, *not* using _Release! This will prevent destruction by refcounting during construction. The compiler uses (always, or only with the latter syntax?) a hidden local variable, where a new reference is stored and refcounted. This will destroy a zombie on exit from the subroutine (maybe unit initialization!). Another one, weak references: When the compiler only handles immediate assignments to weak references, such a variable cannot be passed as VAR, because the called code cannot know about that special handling. DoDi
Re: [fpc-devel] Suggestion: reference counted objects
Sven Barth wrote:
> On 21.09.2014 21:09, Hans-Peter Diettrich wrote:
> [...]
> I'd add a _RefCount field to TObject, regardless of whether it's really used later; this will fix the field offset - just like the VMT reference is fixed in TObject, but not in Object types. This will eliminate problems with class helpers.
>
> I've especially written that it's not part of *every* object, because people complained about the size increase for all instances.

That's why I also suggested a compiler option, useful for all people who care about program size or ARC at all. In general I agree with your thoughts, I only wanted to add a few remarks.

> [...]
> Here the compiler would always insert _AddRef, just like with interfaces, eventually optimized (inlined?) like:
> if Result._RefCounter <> -1 then Result._AddRef; //or InterlockedIncrement(Result._RefCounter);
>
> And that's another thing: people complained about having that reference count overhead for *all* assignments.

See above :-)

> [...]
> It. Was. Just. An. Example. To. Illustrate. The. Problem! I would implement that differently as well, but there's *no* point to do that inside an example that's supposed to be as simple as possible!

Is it me, or what else makes you so angry today? Sorry for that :-(

Regards
DoDi
Re: [fpc-devel] Suggestion: reference counted objects
Sven Barth wrote:
> On 22.09.2014 09:47, Michael Schnell mschn...@lumino.de wrote:
> Why not use interface to add ref-counting to an object ? This seems to work nicely even though the name interface is not speaking on that behalf.
>
> Because you'll need to declare an interface for each class you want to have reference counted so that you can access its methods, properties, etc.

This overhead could be eliminated by another syntax extension, like

TMyARCclass = interface(TObject)

where the compiler could allow for implementations of the declared methods just as for

TMyARCclass = class(TObject)

bridging the gap between traditional (strictly declarative) interfaces and classes (including implementations), with or without ARC.

DoDi
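For readers who want to see the overhead Sven describes, here is a minimal sketch of interface-based reference counting as it already works in FPC. The names ICounter, TCounter and UseCounter, as well as the GUID, are invented for this illustration; only TInterfacedObject and the interface mechanism itself are real.

```pascal
program IntfRefCountDemo;
{$mode objfpc}{$H+}

type
  { Every method a caller needs must be repeated here: this is the
    declaration overhead mentioned in the message above. }
  ICounter = interface
    ['{6F0B2C1E-3A5D-4E7B-9C21-8D4F5A6B7C90}']
    procedure Increment;
    function Value: Integer;
  end;

  { Hypothetical class; TInterfacedObject supplies _AddRef/_Release. }
  TCounter = class(TInterfacedObject, ICounter)
  private
    FValue: Integer;
  public
    procedure Increment;
    function Value: Integer;
    destructor Destroy; override;
  end;

procedure TCounter.Increment;
begin
  Inc(FValue);
end;

function TCounter.Value: Integer;
begin
  Result := FValue;
end;

destructor TCounter.Destroy;
begin
  WriteLn('TCounter destroyed');
  inherited Destroy;
end;

procedure UseCounter;
var
  C: ICounter;
begin
  C := TCounter.Create;          // the interface variable holds the only reference
  C.Increment;
  WriteLn('Value = ', C.Value);
end;                             // C leaves scope here and the instance is released

begin
  UseCounter;
  WriteLn('done');
end.
```

The instance is never freed explicitly; it goes away when the last interface reference does. The price is the separate ICounter declaration, which is what the proposed syntax extension tries to avoid.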
Re: [fpc-devel] weak referencing (was Suggestion:.....)
Marco van de Voort wrote:
> (to Sven) So the cycle break mechanism is going to be marking potential cycle cases as weak. Do you still plan to at least detect cycles for debugging purposes? Or is the cycle detection itself already too hard? IOW I'm wondering what will happen (and what to do) if there is a cycle in a sufficiently complex program.

I could imagine a tool for that purpose, instead of burdening the compiler with such rarely used functionality. More diagnostics could be removed from the compiler, like the detection of unused local variables or units - if that helps to speed up compilation. Separate diagnostic tools could immediately offer means to solve the detected problems interactively, which is not the purpose of a compiler.

DoDi
Re: [fpc-devel] Suggestion: reference counted objects
Sven Barth wrote:
> On 22.09.2014 12:59, Hans-Peter Diettrich drdiettri...@aol.com wrote:
> That's why I also suggested a compiler option, useful for all people who care about program size or ARC at all.
>
> The problem with the compiler option is that you'd need to rebuild the complete RTL, packages and your application so that every unit has these changes. Otherwise e.g. the RTL would still contain reference counting code and TObject would still contain a reference count field.

Right, it should be a compiler *build* option, so that everybody can create his favored flavor(s) of the compiler, and decide which one to use with which project. Every compiler then should provide a predefined constant for every such option, in case specific handling is required by conditional compilation of user code.

> Is it me, or what else makes you so angry today? Sorry for that :-(
>
> I'm sorry that I got aggressive, but it's quite frustrating when one writes a simple example to illustrate something and the first complaint is how to make it use a better design, which is completely beside the point of the example -.-

Then I missed the point of the example(s). It looked to me like something final, well thought out like the rest of your message.

DoDi
Re: [fpc-devel] Suggestion: reference counted objects
Boian Mitov wrote:
> In general, records and classes are inherently the same thing (and in C++ are indeed practically interchangeable).

This model might have been the reason for introducing Object at all, for compatibility with CBuilder.

> The only real difference in Delphi/FPC is that records are instantiated in the stack, the objects in the heap,

Like records, Objects can reside in either the stack, heap or even static memory.

> and the artificial restriction on record inheritance.

Why inherit when you can't override virtual methods? I convert my Records into Objects when I want to extend a record.

DoDi
Re: [fpc-devel] Suggestion: reference counted objects
Marco van de Voort wrote:
> In our previous episode, Hans-Peter Diettrich said:
> IMO Weak references should be reserved for users who accept possible consequential problems, but should never be used in standard libraries. At least I'd suggest to make weak references subject to a compiler switch, so that every user has a chance to disable them in case of trouble.
>
> IMHO weak references trade one manual memory system in for a different manual memory system.

Weak references steer/guide *automatic* memory management.

> The hard part of manual memory systems, figuring out how a complex dynamic structure deallocates (that is usually tackled by having a bit of design and thought go into it), remains.

And this is what the user of such a structure (standard libraries...) does not always know. He may be unable to determine the reason for some runtime error in his code, when an object was destroyed automagically where it should still be alive. The user can debug (and fix) ordinary (owner/owned) patterns, implemented in high level code, but not table-driven (RTTI) or otherwise hidden (intrinsic) management procedures.

While the user can change the owner of an object at runtime, he cannot change a weak reference into a strong one without recompilation of at least the unit containing that declaration, and without figuring out the consequences of such a change.

DoDi
Re: [fpc-devel] Suggestion: reference counted objects
Sven Barth wrote:
> On 20.09.2014 13:42, Sven Barth wrote:
> On 20.09.2014 13:11, Peter Popov wrote:
> - to remedy this TObject is extended with non-virtual methods that allow manual reference counting and would rely on the RTTI data I mentioned (let's call the methods AddRef, Release, IsReferenceCounted and RefCount for now, which can also be used to hook up the reference counting of IUnknown interfaces);

I'd add a _RefCount field to TObject, regardless of whether it's really used later; this will fix the field offset - just like the VMT reference is fixed in TObject, but not in Object types. This will eliminate problems with class helpers.

This approach also would allow to switch any object from managed to unmanaged on the fly, by setting the counter to -1, because the special value -1 already indicates an unmanaged/const memory object (like with string literals).

In my first draft I considered virtual _AddRef/_Release methods, but calling a virtual method is more expensive than calling or inlining a static method.

> the code from above would then look like this to make it safe:
>
> === code begin ===
> function CreateObject: TObject;
> begin
>   Result := TARCObject.Create;
>   Result.AddRef;
> end;
> === code end ===

Here the compiler would always insert _AddRef, just like with interfaces, eventually optimized (inlined?) like:

if Result._RefCounter <> -1 then Result._AddRef; //or InterlockedIncrement(Result._RefCounter);

> - TObject.Free would be extended to take reference counting into account as well. If the object is reference counted (IsReferenceCounted returns true) it will call Release and otherwise it will continue to Destroy.
> - there would be a TARCObject declared in System which is a direct descendant of TObject, but with reference counting enabled; same maybe also for TInterfacedObject

The convention of -1 meaning unmanaged favors managed objects by default, when InitInstance zeroes all fields of the instance just created. But when the VMT reference must be excluded or inserted afterwards, then _RefCount can be initialized at the same time (to -1 for the unmanaged default). Later on a TARCObject base class constructor/initializer will reset _RefCount to zero again.

> - all classes can now have operator overloads as well though it should be warned in the documentation that non-reference counted objects might result in memory leaks there

...unless operators also test _RefCount

> - this now only leaves the problems of cycles; take this code:
>
> === code begin ===
> type
>   TSomeClass = class(TARCObject)
>     Children: specialize TList<TSomeClass>;
>     Owner: TSomeClass;
>     constructor Create(aOwner: TSomeClass);
>   end;
>
> constructor TSomeClass.Create(aOwner: TSomeClass);
> begin
>   Children := specialize TList<TSomeClass>.Create;
>   Owner := aOwner;
>   if Assigned(Owner) then
>     Owner.Children.Add(Self);
> end;

Here I'd prefer Owner.AddChild(Self); so that the Owner can implement any decent/appropriate child management under the hood.

> procedure Test;
> var
>   t1, t2: TSomeClass;
> begin
>   t1 := TSomeClass.Create(Nil);
>   t2 := TSomeClass.Create(t1);
>   // do something
> end;
> === code end ===
>
> Now once Test is left it would leave the instances which were assigned to t1 and t2 hanging, because they have references to each other.

This depends on the implementation of TOwner.Children[] and TChild.Owner. Is a stored TChild.Owner reference really required in a managed environment? IMO a (strong) unidirectional reference from Owner to Child will do it all. Then no child will be destroyed as long as its owner holds a reference to it. That's the intended purpose of both owner/child and automatic memory management.

When it's desirable to definitely destroy an owned object at will, then its owner must be known, of course. In this case two different management approaches conflict with each other. In this case I'd accept a weak Owner reference, because the referenced Owner will stay alive longer than its listed children.

More problematic are circular references without a dedicated owner/child relationship.

> There are (as far as I see) three ways to solve this:
> * provide a way to break the circle (in this example e.g. setting Owner to Nil before leaving Test; this is what Delphi provides with the DisposeOf virtual method)
> * introduce weak references which would disable reference counting, e.g.:
>
> === code begin ===
> type
>   TSomeClass = class(TARCObject)
>     // ...
>     Owner: TSomeClass weak;
>     // ...
>   end;
> === code end ===
>
> Now the TSomeClass.Create(t1) line in Test wouldn't increase the reference count of t1 further and thus both class instances would be destroyed after Test is left.

This IMO is the preferable way to go, in a definite owner/child relationship. The lifetime of an owner cannot depend on the existence of owned children, so that the owner will survive until it has destroyed/released all his children himself. A child-to-owner
Re: [fpc-devel] Suggestion: reference counted objects
Florian Klämpfl wrote:
> On 21.09.2014 at 07:22, Hans-Peter Diettrich wrote:
> Boian Mitov wrote:
> That is easy. It gets incremented when it gets assigned. The running threads have no way of accessing it if there is no reference (assignment) already in place.
>
> The problem arises when an object is destroyed, or even elected for destruction in _Release, while another thread starts using the same instance.
>
> This is not possible for a correctly written program: if two threads have references to the same instance, the ref. count is 2. So no destruction is possible.

What happens if such a reference (to A) is part of another object (B), known to the thread? I suspect a chance for a race condition here, when one thread clears B.A while another thread tries to acquire a reference from B.A.

But this may be an excursion into threadsafe coding, where any modification to a shared resource requires a lock in a multi-threaded environment. Then the above situation should never occur...

Is it really sufficient to protect refcounter changes by Interlocked Inc/Dec, to prevent race conditions while obtaining object references?

DoDi
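The B.A scenario can be made concrete with a small sketch. It assumes that the shared field is guarded by a lock, so that reading the reference (which increments the count) can never interleave with clearing it. THolder and its methods are hypothetical names for this illustration; TCriticalSection comes from the SyncObjs unit. The demo runs in a single thread and only shows the pattern, it does not prove thread safety.

```pascal
program SharedRefDemo;
{$mode objfpc}{$H+}

uses
  {$ifdef unix}cthreads,{$endif}
  SyncObjs;

type
  { Hypothetical container "B" holding the shared reference "B.A". }
  THolder = class
  private
    FLock: TCriticalSection;
    FItem: IInterface;
  public
    constructor Create(const AItem: IInterface);
    destructor Destroy; override;
    function Acquire: IInterface;   // copy the reference under the lock
    procedure Clear;                // drop the reference under the lock
  end;

constructor THolder.Create(const AItem: IInterface);
begin
  inherited Create;
  FLock := TCriticalSection.Create;
  FItem := AItem;
end;

destructor THolder.Destroy;
begin
  FItem := nil;
  FLock.Free;
  inherited Destroy;
end;

function THolder.Acquire: IInterface;
begin
  FLock.Enter;
  try
    Result := FItem;   // read and refcount increment happen while Clear is locked out
  finally
    FLock.Leave;
  end;
end;

procedure THolder.Clear;
begin
  FLock.Enter;
  try
    FItem := nil;      // may drop the holder's (possibly last) reference
  finally
    FLock.Leave;
  end;
end;

var
  B: THolder;
  Item, Mine: IInterface;
begin
  Item := TInterfacedObject.Create;
  B := THolder.Create(Item);
  Item := nil;                 // from now on only B holds the instance
  Mine := B.Acquire;           // a "thread" takes its own reference
  B.Clear;                     // the holder lets go
  WriteLn('still valid: ', Assigned(Mine));
  Mine := nil;                 // last reference gone, instance is destroyed
  B.Free;
end.
```

Without the lock, Acquire could read the pointer just before Clear releases the last reference. The interlocked counter update alone does not close that window, which is the doubt raised in the message.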
Re: [fpc-devel] Suggestion: reference counted objects
Fabrício Srdic wrote:
> Hello,
> In platforms with managed code (.NET, Java), objects are automatically freed by the memory manager / garbage collector. Would not it be interesting to have a similar feature in FPC?

AFAIK some Delphi XE made TObject itself managed, by reference counting. It would be easy to introduce the same feature in FPC, so that no special base class would be required. Like with extended RTTI a decision should be made, whether managed objects should be enabled or disabled by default. Afterwards automatic management can be turned on or off for every single class or object individually.

> For example, through a root class where its objects are counted by reference, like the TInterfacedObjects. Thus, the programmer would be free from having to manually release objects.

In practice it turned out that the automatic destruction of objects still requires assistance of the coder, in many cases, in all languages with garbage collection. I.e. a destructor (or finalizer) still is required to prepare an object for subsequent destruction.

IMO it's sufficient to use Interfaces for all objects that should be subject to garbage collection.

DoDi
Re: [fpc-devel] RTTI generating
Sven Barth wrote:
> On 20.09.2014 01:52, Hans-Peter Diettrich drdiettri...@aol.com wrote:
> It's up to the coder to make all properties etc. published, when he *intends* to ever use RTTI on them. That's the way to tell the compiler what to do.
>
> The extended RTTI introduced with Delphi 2010 allows you to even query private fields if the class developer decided to enable data generation for that.

AFAIK it works in the opposite direction: the developer must *exclude* explicitly, in *every* unit, what should *not* be subject to extended RTTI. That's my strongest point against (Delphi) extended RTTI, while RTTI by itself is okay for me.

I think that it's time to resume the work on my Delphi decompiler. I never published it before, but now it looks like it's time to wake up the XE coders, like the VB3 coders decades ago.

DoDi
Re: [fpc-devel] Suggestion: reference counted objects
Sven Barth wrote:
> On 20.09.2014 12:36, Hans-Peter Diettrich wrote:
> AFAIK some Delphi XE made TObject itself managed, by reference counting. It would be easy to introduce the same feature in FPC, so that no special base class would be required. Like with extended RTTI a decision should be made, whether managed objects should be enabled or disabled by default. Afterwards automatic management can be turned on or off for every single class or object individually.
>
> It's basically easy, yes, but then one has to deal with code like this:
> [...]
> Which could lead to some unintended side effects if o is passed to some other code which keeps the instance around and .Free merely decreases the reference count. Of course that would have been a memory leak before and now it's not, but nevertheless it changes behavior.

I already mentioned that destructors still are required, but will have a different purpose and usage than before. This would discourage continued use of Destroy(), BeforeDestruction() etc., which should at least be renamed to prevent unconverted legacy code from compiling. This change already will break compatibility, so that consequently all libraries (in detail when dealing with lists containing objects) have to be updated.

I was aware of such consequences, but I'm no longer sure of the consequences of my idea of simply turning refcounting on or off for specific objects or classes. The mere implementation of refcounting for TObject is easy, but the consequences are hell :-(

DoDi
Re: [fpc-devel] Suggestion: reference counted objects
Boian Mitov wrote:
> The short story is that any approach has issues. The component container approach has the issue of a single ownership with the easy loss of pointers. The ref counting has the danger of circular references. The GC has the non deterministic behavior (Actually I proposed a deterministic/semideterministic GC algorithm ~8 years ago or so, but that is a different story).

I don't like the use of GC as a synonym for *mark-sweep* garbage collection only. Wikipedia also states Reference Counting as just another form of garbage collection.

> The point is that even with GC the developer is still required to carefully manage resources, and GC tends to make it even more complex. From all the above approaches the ARC with optional Weak pointers is the easiest to manage and the one that tends to lead to the least problems IMHO.

ACK - except for Weak references. Weak references turn the conservative memory management into an aggressive/optimistic one, with unpredictable consequences.

IMO Weak references should be reserved for users who accept possible consequential problems, but should never be used in standard libraries. At least I'd suggest to make weak references subject to a compiler switch, so that every user has a chance to disable them in case of trouble.

DoDi
Re: [fpc-devel] Suggestion: reference counted objects
Sven Barth wrote:
> On 20.09.2014 20:34, Giuliano Colla wrote:
> A general mechanism to be reliable should take into account all possibilities. If it does, it will block threads even when unnecessary. If it doesn't, it will be unsafe.
>
> That would work the same way as it does in interfaces, arrays and strings: using Interlocked*-functions.

As I understand Interlocked Inc/Dec functionality, it only protects the update of the reference counter against interrupts, but not the tests required before/after this update.

As a precaution a RefCount should at least be incremented as soon as there exists a *chance* that the reference is used/copied in some piece of currently active code (thread...).

DoDi
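A small sketch of what the Interlocked functions do and do not cover. TCounted, AddRef and Release are hypothetical names; InterLockedIncrement and InterLockedDecrement are the RTL functions from the System unit. Note that the test against zero uses the value returned by the atomic operation. Reading FRefCount in a separate statement afterwards would reintroduce the gap described in the message.

```pascal
program InterlockedDemo;
{$mode objfpc}{$H+}

type
  { Hypothetical manually counted class, for illustration only. }
  TCounted = class
  private
    FRefCount: LongInt;
  public
    procedure AddRef;
    function Release: Boolean;   // True when the last reference was dropped
  end;

procedure TCounted.AddRef;
begin
  InterLockedIncrement(FRefCount);
end;

function TCounted.Release: Boolean;
begin
  // Decrement and test are tied together through the returned value;
  // no second read of FRefCount takes place.
  Result := InterLockedDecrement(FRefCount) = 0;
end;

var
  Obj: TCounted;
begin
  Obj := TCounted.Create;
  Obj.AddRef;                 // first owner
  Obj.AddRef;                 // second owner
  WriteLn(Obj.Release);       // FALSE: one reference is left
  if Obj.Release then         // last reference dropped
    Obj.Free;
end.
```

This makes the counter itself consistent. It does not protect the step of obtaining the pointer in the first place, so a caller must already own a reference before calling AddRef.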
Re: [fpc-devel] Suggestion: reference counted objects
Giuliano Colla wrote:
> Hi Boian,
> I'm easily convinced that you've developed a lot of things using reference counting. Design is the art of compromise, and possibly in your class of application that's the best compromise. But we should never forget that our class of applications isn't the only possible one in the world. What is a bonus for you might be either useless or extremely harmful for someone else. I might, for example, tell you that my company has been successfully implementing for more than 30 years a class of applications for the control of industrial processes, with hundreds of threads running simultaneously in a multi-CPU environment, [...]

IMO realtime applications require a realtime OS, providing all required means of process synchronization and communication. Ordinary systems and developers should be happy with primitive threads, doing their work in the background and exiting when done.

It's good to know that FPC allows to implement and manage more complex parallel processing, if I understand you and Boian correctly?

DoDi
Re: [fpc-devel] Suggestion: reference counted objects
Boian Mitov wrote:
> That is easy. It gets incremented when it gets assigned. The running threads have no way of accessing it if there is no reference (assignment) already in place.

The problem arises when an object is destroyed, or even elected for destruction in _Release, while another thread starts using the same instance.

> Indeed that is how it works in Delphi, and BTW: that is how Strings work in Delphi and FPC the last time I checked ;-) .

With strings it's possible to create another (unique) copy when a string is modified, and it does no harm when that copy is destroyed later - every user will find a valid (empty) string. Not so with objects, which can neither be copied nor reused after destruction.

DoDi
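The copy-on-write behaviour of strings mentioned above can be observed with a few lines. This is only an illustration of the visible semantics; the variable names are arbitrary.

```pascal
program StringCowDemo;
{$mode objfpc}{$H+}

var
  A, B: AnsiString;
begin
  A := 'shared';
  B := A;          // no character data is copied, both refer to the same text
  B[1] := 'S';     // writing makes B unique first, A is left untouched
  WriteLn(A);      // shared
  WriteLn(B);      // Shared
end.
```

An object reference has no such escape: there is no generic way to clone the instance on write, so every holder depends on the one shared instance staying alive.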
Re: [fpc-devel] RTTI generating
Boian Mitov wrote:
> On Fri, 19 Sep 2014, Adriaan van Os wrote:
> Your remarks seem to imply that you think RTTI can be used to inspect any aspect of an object. It was/is not meant for that.
>
> Quite incorrect. All languages with modern RTTI allow for full object inspection, and that includes Delphi 2010 and higher, C#, and even VB has it.

It's up to the coder to make all properties etc. published, when he *intends* to ever use RTTI on them. That's the way to tell the compiler what to do.

Inside a program there exists no distinction between a well-behaved object inspector and an unauthorized object garbler - both can be implemented by using RTTI. If you don't like safe types and other restrictions, which exist in Pascal for good reasons, then choose any unsafe language to implement whatever mess you like :-]

DoDi
Re: [fpc-devel] Method for write string into TStreamt
Dmitry Boyarintsev wrote:
> How about introducing a default parameter? The parameter keeps the method backward compatible, allowing to write a string without the prefix.
>
> public
>   procedure TStream.WriteAnsiString(const S: string; withLength: Boolean = true);

How should a string without a length be read back?

DoDi
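The question is easier to judge with the existing pair of methods in view. The sketch below uses TStream.WriteAnsiString and TStream.ReadAnsiString from the Classes unit, which store and expect a length prefix; the proposed withLength parameter is not part of the RTL and is not used here.

```pascal
program StreamStringDemo;
{$mode objfpc}{$H+}

uses
  Classes;

var
  S: TMemoryStream;
begin
  S := TMemoryStream.Create;
  try
    S.WriteAnsiString('first');    // length prefix followed by the characters
    S.WriteAnsiString('second');
    S.Position := 0;
    WriteLn(S.ReadAnsiString);     // first
    WriteLn(S.ReadAnsiString);     // second
  finally
    S.Free;
  end;
end.
```

The reader knows where 'first' ends only because of the stored length. Without the prefix, the two strings would be indistinguishable in the stream unless the caller tracks the boundaries elsewhere.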
Re: [fpc-devel] TTimers and TThreads. Attn Michael Schnell
Giuliano Colla wrote:
> If you're using relative times and not absolute ones, then you may avoid the search, without need to resort, using a slightly different scheme, i.e. entering in a sorted list the times *relative to the previous one*.

Then your queue can run out of sync with the absolute time. I don't see an advantage with using relative times, or unsorted lists. On insertion a binary search over the list can be made, when the entries are sorted by absolute time. Removal of entries occurs always from the list head.

DoDi
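A minimal sketch of the scheme described above: entries sorted by absolute due time, insertion position found by binary search, expired entries taken from the head. All names (TTimerEntry, AddTimer, PopDue) and the millisecond tick values are assumptions made for this example, not part of any existing timer implementation.

```pascal
program TimerQueueDemo;
{$mode objfpc}{$H+}

type
  TTimerEntry = record
    DueTime: QWord;    // absolute due time, e.g. a millisecond tick (assumed unit)
    Id: Integer;
  end;

var
  Queue: array of TTimerEntry;   // always kept sorted by DueTime

{ Binary search for the first index whose DueTime is greater than ADue. }
function FindInsertIndex(ADue: QWord): Integer;
var
  Lo, Hi, Mid: Integer;
begin
  Lo := 0;
  Hi := Length(Queue);
  while Lo < Hi do
  begin
    Mid := (Lo + Hi) div 2;
    if Queue[Mid].DueTime <= ADue then
      Lo := Mid + 1
    else
      Hi := Mid;
  end;
  Result := Lo;
end;

procedure AddTimer(ADue: QWord; AId: Integer);
var
  Idx, I: Integer;
begin
  Idx := FindInsertIndex(ADue);
  SetLength(Queue, Length(Queue) + 1);
  for I := High(Queue) downto Idx + 1 do   // shift the tail up by one slot
    Queue[I] := Queue[I - 1];
  Queue[Idx].DueTime := ADue;
  Queue[Idx].Id := AId;
end;

{ Removes the head entry if it is due at ANow; returns False otherwise. }
function PopDue(ANow: QWord; out AId: Integer): Boolean;
var
  I: Integer;
begin
  AId := -1;
  Result := (Length(Queue) > 0) and (Queue[0].DueTime <= ANow);
  if Result then
  begin
    AId := Queue[0].Id;
    for I := 1 to High(Queue) do
      Queue[I - 1] := Queue[I];
    SetLength(Queue, Length(Queue) - 1);
  end;
end;

var
  Id: Integer;
begin
  AddTimer(300, 1);
  AddTimer(100, 2);
  AddTimer(200, 3);
  while PopDue(250, Id) do
    WriteLn('fired timer ', Id);   // timers 2 and 3, in that order
end.
```

A dynamic array keeps the sketch short, at the cost of shifting elements on insert and removal. A linked list or heap would avoid the shifting, but the binary search only works on an indexable structure.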
Re: [fpc-devel] Extended($FFFFFFFFFFFFFFFF) = -1?
Ewald wrote:
> On 03 Mar 2014, at 00:29, Hans-Peter Diettrich wrote:
> `-1` would then be $1 FFFFFFFFFFFFFFFF, whereas $FFFFFFFFFFFFFFFF would be $0 FFFFFFFFFFFFFFFF. It really is quite easy to store it like that and `fix` things [picking a fitting datatype] afterwards.
>
> The datatype has to be constructed/determined first, and *if* there exists a type with more than 64 bits, then it will be a signed type, with a 65 bit (unsigned) subrange matching your needs. But if no such type exists, you are lost.
>
> Yes, that is true, but there always is a 64 signed/unsigned type (perhaps not native). On machines where, for example, only 32 bit wide datatypes are allowed, the virtual subrange should be 33 instead of 65 bytes.

Subranges are expressed in low..hi notation, not in bits, meaning that the hi value must be expressible as a valid signed positive number.

> Anyway, that is the way how I parse constants. The important rule here is that you don't need the full 65 bit in the final representation. The signedness of the type can fix this loss of the one bit.

How (which data type) does *your* parser store untyped numerical constants?

IMO your problem arises from the fact that a bitpattern, with the highest of 64 bits set, cannot be stored in a larger (signed) type, as required. All such untyped constants will cause problems when assigned to typed variables. Test yourself what happens when you convert such a QWORD value into Extended.

> Anyway, then you have got backwards compatibility to take care of, since there will be someone out there whose code actually depends on this behaviour.
>
> When we agree that a bitpattern of $F...F can be interpreted differently on different 32 bit machines, as -1 or -MaxInt,
>
> Why `on 32 bit machines`?

You're right, my guess of the number of bits was wrong.

> I'm fairly confident that this particular constant on this particular compiler version will generate the same outcome on every possible architecture out there (just change the `extended` to `single` in the original example, because extended tends to vary).

That's my expectation, too.

> then it's obvious that such a textual representation should cause a compilation error "not portable". We know that such an error message has not yet been implemented, but if you insist on writing unportable code... :-]
>
> I insist on using a constant that is:
> - 64 bit wide
> - Only contains 1's
> - Is interpreted as an unsigned number wherever mathematical operations are performed.

Then you have to choose a different language. What will C++, C# or Java do in these cases?

> Those demands are quite portable, no?

No.

> My original problem was easily solved with a typecast QWord(gargantuan constant goes here), so that was no longer an issue. What baffled me though was the fact that this (in my opinion) mis-parsing of certain constants is by design.

What you observed was related to the argument passed to WriteLn. When WriteLn includes code to output a QWord, then the output should reflect the bitpattern (unsigned number). The output of an Extended value reflects the value converted from integral to floating point, and that conversion assumes signed values. IIRC the x87 FPU doesn't have an instruction to load unsigned integral values, so that no compiler has a chance to make it load an unsigned value.

DoDi
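The workaround Ewald mentions, a QWord typecast, can be tried in isolation. The sketch below only shows how one and the same 64-bit pattern prints when viewed as QWord and as Int64; it makes no claim about how a given compiler version types the untyped hex literal itself, which is the point under dispute in the thread.

```pascal
program QWordConstDemo;
{$mode objfpc}{$H+}

var
  Q: QWord;
  I: Int64;
begin
  Q := QWord($FFFFFFFFFFFFFFFF);   // explicit typecast, as suggested in the thread
  I := Int64(Q);                   // same bits, reinterpreted as signed
  WriteLn(Q);                      // 18446744073709551615
  WriteLn(I);                      // -1
end.
```

Which of the two views ends up in a floating point conversion decides between 1.8E19 and -1, so giving the constant an explicit type before converting it removes the ambiguity.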
Re: [fpc-devel] Extended($FFFFFFFFFFFFFFFF) = -1?
Ewald wrote:
> On 03 Mar 2014, at 12:49, Hans-Peter Diettrich wrote:
> How (which data type) does *your* parser store untyped numerical constants?
>
> Roughly like this (syntax may be a bit awry, but you get the point):
>
> TIntegerNumber = Record
>   Case SignedNess: TSignedNess of
>     snPositive: UValue: QWord;
>     snNegative: SValue: Int64;
> End;
>
> The parser detects whether there is a `-` in front of the constant and stores the right sign in the SignedNess field.

A parser doesn't work like that - too many possible cases with unary minus. If you need a datatype for integers with more bits than provided by the compiler, you must roll your own datatype.

> Alright, let me rephrase my demands:
> - I want to store the value 18446744073709551615 in any kind of variable without loss of precision.

See above.

DoDi
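Ewald notes that the syntax of his record may be a bit awry. A version that compiles, under the assumption that TSignedNess is a two-value enumeration, could look like the following. It only demonstrates the storage idea and does not address the objection about unary minus in a real parser.

```pascal
program ConstRecordDemo;
{$mode objfpc}{$H+}

type
  TSignedNess = (snPositive, snNegative);   // assumed definition

  { Variant record: the tag field selects which view of the 64 bits is valid. }
  TIntegerNumber = record
    case SignedNess: TSignedNess of
      snPositive: (UValue: QWord);
      snNegative: (SValue: Int64);
  end;

var
  N: TIntegerNumber;
begin
  N.SignedNess := snPositive;
  N.UValue := High(QWord);
  WriteLn(N.UValue);          // 18446744073709551615

  N.SignedNess := snNegative;
  N.SValue := -1;
  WriteLn(N.SValue);          // -1
end.
```

Both variants share the same 8 bytes, so the tag is the only thing that tells the two values apart. That is effectively the 65th bit discussed in the thread.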
Re: [fpc-devel] Extended($FFFFFFFFFFFFFFFF) = -1?
Ewald wrote:
> Talking about principles: If hexadecimal is actually used to represent bit patterns (as Hans-Peter Diettrich wrote), then the decision to use a signed type here seems to violate this (represent bitpatterns) principle, since the highest bit in a signed number has a different meaning than the other bits, where in a bitpattern all bits have equal meaning.

That's correct.

> It seems like sticking to one principle (signed integer as much as possible) actually breaks another principle (bitpattern).

Wirth and his Pascal language are well designed with signed types above all, and unsigned types being subranges. In so far one could consider hex constants with the sign bit set as syntactical errors.

> You do care about the signedness, because the only way to represent int64(-1) in hexadecimal is as $FFFFFFFFFFFFFFFF.

Negative numbers never should be expressed in hex.

> And what about -$1? Or is that too far fetched?

That's correct, because -$1 is -1, a valid integral expression without signedness problems.

> Numbers in two's complement do not consist of a single sign bit followed by a magnitude. Those top 63 '1' bits together form the - sign in this number.
>
> Yes, but this can all be solved by parsing the string and storing it with one extra MSBit (if there is a `-` in front of the constant it must be negative, otherwise it should be positive).

This is why Wirth considers all types being signed, without such problems.

> This highest bit then reflects the sign.

The sign representation is machine specific, as you know. On 1's complement machines there exist two representations of zero, as +0 and -0, and you cannot express both as hexadecimal constants in a portable way. That's why high level languages, like Pascal, forbid hex representations of (possibly) negative numerical values.

> `-1` would then be $1 FFFFFFFFFFFFFFFF, whereas $FFFFFFFFFFFFFFFF would be $0 FFFFFFFFFFFFFFFF. It really is quite easy to store it like that and `fix` things [picking a fitting datatype] afterwards.

The datatype has to be constructed/determined first, and *if* there exists a type with more than 64 bits, then it will be a signed type, with a 65 bit (unsigned) subrange matching your needs. But if no such type exists, you are lost.

> Anyway, then you have got backwards compatibility to take care of, since there will be someone out there whose code actually depends on this behaviour.

When we agree that a bitpattern of $F...F can be interpreted differently on different 32 bit machines, as -1 or -MaxInt, then it's obvious that such a textual representation should cause a compilation error "not portable". We know that such an error message has not yet been implemented, but if you insist on writing unportable code... :-]

DoDi
Re: [fpc-devel] Class property and virtual getter
Michael Schnell wrote:
> On 02/28/2014 02:18 AM, Hans-Peter Diettrich wrote:
> So the lack of Self seems to apply to static; methods, not to class methods. I'll ask in an EMBT group for a description of static;, the OH seems to reflect the C++ meaning only,
>
> In ANSI C static with functions just means unreachable from outside the current source file (i.e. by the linker) (I always thought this is a silly name for that meaning.) Is this different with C++ ?

Yes. Some C keywords have multiple different meanings in C++, depending on where they occur in source code.

DoDi
Re: [fpc-devel] Extended($FFFFFFFFFFFFFFFF) = -1?
Ewald wrote:
> On 28 Feb 2014, at 20:39, Jonas Maebe wrote:
> All hexadecimal constants are (conceptually) parsed as int64, so this is by design.
>
> int64($FFFFFFFFFFFFFFFF) is not -1. By the way, what do you do when you want to port fpc to a one's complement machine (if they still exist)?

Numerical constants, where the sign matters, should only be encoded in decimal. The other formats (hex, oct, bin...) are intended for use with binary values, where the bit pattern is important. Then the code compiles correctly on any kind of machine.

Assumptions about type sizes and encodings can make *application* code unportable. E.g. the Extended type doesn't have a guaranteed size and binary representation, IIRC it's equivalent to Double on x64.

DoDi
Re: [fpc-devel] Class property and virtual getter
Michael Van Canneyt wrote: So what's the special use of a *class* property? If it exists for Delphi compatibility only, why then is it handled differently from property? The reason is explained in the upcoming docs. Namely: a static method cannot be overridden. Sure, but virtual methods (including class methods) can be overridden. The class property is part of this particular class, and descendent classes should not be able to override its behaviour. A static class method can call another virtual class method, so this protection looks very artificial to me. BTW Delphi XE allows calling a virtual class method, but when called from a static class method it calls it like a static method, overrides are simply ignored. Calling the same method directly honors overrides. Also self is no longer known inside class methods in XE. In D7 it was the class type instead of the instance pointer. Thus a too restrictive compiler, geared towards compatibility with *new* Delphi versions, may break existing code. DoDi
Re: [fpc-devel] Class property and virtual getter
Michael Van Canneyt wrote: The reason is explained in the upcoming docs. Namely: a static method cannot be overridden. Sure, but virtual methods (including class methods) can be overridden. The class property is part of this particular class, and descendent classes should not be able to override its behaviour. A static class method can call another virtual class method, No, it cannot. Try it. It was explained to me using this exact example. Then I missed the static; directive in the posted example, added to the getter/setter methods. Delphi introduced that directive after D7, and I found no useful description for it yet. Now it looks to me as if we have to distinguish ordinary static (non-virtual) methods from explicit static; methods. DoDi
Re: [fpc-devel] Class property and virtual getter
Sven Barth wrote: On 27.02.2014 15:35, Hans-Peter Diettrich wrote: Also self is no longer known inside class methods in XE. In D7 it was the class type instead of the instance pointer. Thus a too restrictive compiler, geared towards compatibility with *new* Delphi versions, may break existing code. Source please. This compiles and runs without problems in XE: === source begin === type TTest = class class procedure Test; - end; When you add static;, as required for class property getters/setters, the following won't compile: class procedure TTest.Test; begin Writeln(Self.ClassName); end; So the lack of Self seems to apply to static; methods, not to class methods. I'll ask in an EMBT group for a description of static;, the OH seems to reflect the C++ meaning only, without mentioning the impact on OPL classes. DoDi
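The difference between an ordinary class method and a static; class method can be sketched as below. This assumes FPC in objfpc mode; the method name TestStatic is hypothetical and only added for the comparison.

```pascal
program StaticSelf;
{$mode objfpc}
type
  TTest = class
    // Ordinary class method: Self is the class reference it was called on.
    class procedure Test;
    // Static class method: called without a hidden Self parameter.
    class procedure TestStatic; static;
  end;

class procedure TTest.Test;
begin
  WriteLn(Self.ClassName);
end;

class procedure TTest.TestStatic;
begin
  // Referring to Self here is what the message above reports as
  // not compiling once static; is added.
  WriteLn('static class method, no Self');
end;

begin
  TTest.Test;
  TTest.TestStatic;
end.
```

Moving the Self.ClassName line into TestStatic is a quick way to check how a given compiler version reacts.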
Re: [fpc-devel] Class property and virtual getter
Jonas Maebe wrote: Error: Only class methods, class properties and class variables can be referred with class references You have to declare an instance and then call its property. You don't have to instantiate the instance if the property maps to a class method. Is there some technical obstacle to allowing such a construct? As long as a method doesn't use Self, directly or implicitly, the absence of an object reference does not cause problems. Class properties should be accessible from within static class methods. Having them accessible depending on the getter/setter they use (static or not) would break orthogonality (the visibility/usability must depend on the interface, not on the implementation of the interface). This would mean that in legacy code the non-virtual methods have to be separated now, into non-virtual, static, class and static class methods, in order to keep the code compiling? Non-static class methods cannot be called from static class methods because you don't know the original class type that was used to call it (and hence this could have unexpected results). Does this mean that the new static class methods don't have a Self parameter? DoDi
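For readers who have not used class properties yet, a minimal sketch of the construct under discussion follows. It assumes FPC in objfpc mode; TCounter, FCount and Count are invented names, not taken from the thread.

```pascal
program ClassPropDemo;
{$mode objfpc}
type
  TCounter = class
  private
    class var FCount: Integer;
    // Getter and setter of a class property are declared static here,
    // so they receive no Self and cannot be overridden.
    class function GetCount: Integer; static;
    class procedure SetCount(AValue: Integer); static;
  public
    class property Count: Integer read GetCount write SetCount;
  end;

class function TCounter.GetCount: Integer;
begin
  Result := FCount;
end;

class procedure TCounter.SetCount(AValue: Integer);
begin
  FCount := AValue;
end;

begin
  // The property is used through the class reference, no instance needed.
  TCounter.Count := 3;
  WriteLn(TCounter.Count);
end.
```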
Re: [fpc-devel] About typecasts and the documentation
Martin Frb wrote: http://www.freepascal.org/docs-html/ref/refse67.html#x124-13400012.4 In general, the type size of the expression and the size of the type cast must be the same. However, for ordinal types (byte, char, word, boolean, enumerates) this is not so, they can be used interchangeably. That is, the following will work, although the sizes do not match. http://www.freepascal.org/docs-html/ref/refse68.html#x125-13500012.5 A variable can be considered a single factor in an expression. It can therefore be typecast as well. A variable can be typecast to any type, provided the type has the same size as the original variable. IMO type*cast* and type*conversion* should be kept separate. A cast then requires that the *size* is the same, while in a conversion the *value* stays the same. The compiler messages should reflect this difference (see your example below). Usually typecast can have both meanings, in particular for value typecasts. Further terms are type coercion, type promotion. Typecasts can be further restricted to *compatible* types. Here numeric types seem to be compatible with other numeric types, but not with structured types (records...). With classes sometimes a distinction between upcast and downcast is made (type *inclusion*), where up and down reflect more basic (ancestors) and more derived classes. Possibly this also applies to conversions between Char and numeric types, for which standard conversions Ord(c) and Chr(i) are defined. It's not clear to me why a TStrings.Objects[i]:TObject can be compatible with e.g. integer: MyStringList.AddObject('1',TObject(1)); This may be due to some underlying implementation detail, where the list (array) contains pointers instead of objects, and these pointers then are compatible with numbers.
Often also multiple casts can be accepted, in something (untested) like MyObject := TObject(pointer(1)); or pointer(MyObject) := pointer(1); IMO the detailed rules, as implemented in the compiler, are too complex for a simple description. That's why the docs only explain the syntax, not the full semantics behind the syntax. foo := TFoo(longint(1)); // project1.lpr(9,8) Error: Illegal type conversion: LongInt to TFoo foo := TFoo(1); // project1.lpr(10,8) Error: Illegal type conversion: LongInt to TFoo end. Obviously the types (record and numeric) are considered incompatible by the compiler. DoDi
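The TStrings.Objects case mentioned above can be tried out with a sketch like the following. It assumes FPC in objfpc mode and uses PtrInt so that the integer has the same size as an object reference; the variable names are illustrative only.

```pascal
program CastDemo;
{$mode objfpc}
uses
  Classes;
var
  L: TStringList;
  N: PtrInt;
begin
  L := TStringList.Create;
  try
    // A class-typed value is a pointer-sized reference, so a pointer-sized
    // integer can be cast to TObject and stored in the list.
    L.AddObject('one', TObject(PtrInt(1)));
    // Casting back recovers the number; the "object" must never be
    // dereferenced or freed, since it is not a real instance.
    N := PtrInt(L.Objects[0]);
    WriteLn(N);
  finally
    L.Free;
  end;
end.
```

This only works because both sides of the cast have the same size, which matches the size rule quoted from the docs; the record case (TFoo) fails for exactly that kind of incompatibility.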
Re: [fpc-devel] Incomplete docs on operator precedence / Question about actual precedence
Martin Frb wrote: Further, it appears that ^ has a higher precedence than unary - IMO pointer/address arithmetic (should) follow its own rules. Unary - and @ should not be applicable to addresses. @ also is restricted to arguments which *do* have an address, i.e. not applicable to arithmetic expressions or properties. Applicable binary operators depend on the type of both arguments, e.g. it's valid to subtract addresses (yielding an ordinal value), but adding addresses should be disallowed, while adding an ordinal value to a pointer is okay (yielding another address). // p:= -@i; // if enabled, next line will crash This should not compile, unary - is not applicable to addresses. writeln( -p^ ); // writes -99 Here ^ must take precedence, applied to a pointer/address. p+i^ is questionable, the only valid interpretation is (p+i)^. IOW applicable operators and precedence depend on the type of the argument(s). DoDi
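The operator combinations named above can be tried with a small sketch. It assumes FPC in objfpc mode, where arithmetic on typed pointers such as PInteger is accepted; the array is only there to give the pointer two valid elements to address.

```pascal
program PtrPrecedence;
{$mode objfpc}
var
  a: array[0..1] of Integer;
  p: PInteger;
begin
  a[0] := 99;
  a[1] := 7;
  p := @a[0];
  // ^ is applied first, then unary minus negates the Integer value.
  WriteLn(-p^);
  // Pointer plus ordinal yields another address, which is then dereferenced.
  WriteLn((p + 1)^);
  // Subtracting two addresses yields an ordinal value.
  WriteLn((p + 1) - p);
end.
```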
Re: [fpc-devel] Black List of examples that FPC won't compile.
Maciej Izak wrote: Following the current logic, this example should not compile: --begin code-- TA = class procedure Foo; end; TB = class(TA) procedure Foo(A: Integer = 0); overload; end; var b: TB; begin b := TB.Create; b.Foo; // should raise Error: Can't determine which overloaded function to call end; --end code-- Delphi (XE) has no problem with this code, even if TA.Foo also is declared overload, or both are declared Foo(). The static type (b:TB) determines which static method to call. Returning to the example from bugtracker (http://bugs.freepascal.org/view.php?id=25607): FPC doesn't recognize at TB level that the TObject.Create was hidden on the TA level by constructor Create(A: Integer = 0); virtual; overload; Delphi here requires constructor Create(A: Integer = 0); overload; virtual; Delphi seems to search for overloaded methods only in the same class. Otherwise an error Previous declaration ... not marked 'overload' would occur since TObject.Create was not marked 'overload'. Tested with declarations in the same class: constructor Create; constructor Create(A: integer); where *both* must be marked 'overload'. Only if there is no matching method in a class, the ancestors are searched as well. I.e. Delphi does not *hide* inherited methods, it only extends the search into ancestors *when required*, regardless of 'overload' directives. DoDi
Re: [fpc-devel] DOS GUI
Thaddy wrote: Well, I have a statement from their legal department dating from 2005 amounting to: we use it as you intended (sic) and see no reason to quote that this sourcecode is yours. Furthermore, the two units that contain said sourcecode you refer to are protected under U.S. copyright law and are our intellectual property. (It blahblah's a lot more, this is the essence and not verbatim) In other words: closed source. Well, such companies and lawyers can claim a lot. This is not different from other countries, but it may be much more expensive to defend against such piracy in the U.S. :-( At least you know now that your license has been too generous. And your case also explains why the open source licenses are so complicated, in order to prevent Copyright addicts from hijacking open source code. Now you can be right and probably you are right but to be legally right in the U.S. this will cost a lot of funds that I can better use elsewhere. This type of answer is not unique to my case. I believe Henri Gourvest has a rather unique addition to some of his open-licenced sourcecode explicitly excluding said company from using it after a similarly bad experience. Did you contact e.g. the FSF, asking for advice or assistance in your case? When that company is known for such illegal practices, they may be interested in defending open source principles. DoDi
Re: [fpc-devel] DOS GUI
Thaddy wrote: It happened to me once or twice ;) that a certain company with ever changing names used my sourcecode and licensed it under their own closed terms because I included the term: use as you like. Better: free for private use. If the owner wants that not to happen, choose any of these licenses mentioned. This is really important. Without huge legal fees I can't get my intellectual property back Sorry, that's nonsense. You still have all rights on your own software, no need to get anything back. Even in outdated Copyright terms a use as you like should not mean take ownership. DoDi
Re: [fpc-devel] DOS GUI
Mark Morgan Lloyd wrote: Hans-Peter Diettrich wrote: If the owner wants that not to happen, choose any of these licenses mentioned. This is really important. Without huge legal fees I can't get my intellectual property back Sorry, that's nonsense. You still have all rights on your own software, no need to get anything back. Even in outdated Copyright terms a use as you like should not mean take ownership. I don't think that's necessarily the case. If you don't make a clear statement of ownership in every accessible file then it's difficult to claim that it's not in the public domain (or res nullius), On the contrary, nobody can state then that it *is* in the public domain. that's why classic IBM operating systems and HP calculator firmware are now being distributed freely. Not legally in the EU, at least not with consent of the rights holder. Ownership expires after some time, perhaps the old Copyright protection has expired now? Otherwise ownership expires 70 years after the *death* of the author, which is unlikely to have happened for any software yet :-] In current international law (Droit d'Auteur) *only* the author has rights on his work. Everybody else must be allowed by the author to use it. That's why an author note is useful: it identifies the person from whom one can obtain the right to use the work. When the author can not be identified, then the work is *not* in the public domain, nobody is allowed to use it. DoDi
Re: [fpc-devel] Explanation about code page-aware AnsiStrings
Jonas Maebe wrote: http://wiki.freepascal.org/FPC_Unicode_support (only Sections 1 to 3; 4 and later are older and mostly either incomplete or wishful thinking). Just a note on RawByteString concatenation: Delphi concatenates RawByteStrings to the dynamic encoding of the *first* string, the appended strings are converted, if necessary, before concatenation. Special handling of strings with the same encoding is not required. I.e. the result is *not* always a CP_ACP string, as documented in the wiki. Please adjust the implementation accordingly, it makes the RawByteStrings much more useful. The handling of automatic conversions may be unified in general, when concatenated strings are assigned to a target of a known encoding; in this case the target encoding can be used for the result, instead of the encoding of the first string, the remaining concatenation process can be the same. On OEMString: CP_OEM (=1) works differently from CP_ACP (=0). Variables of type AnsiString(CP_OEM) will always have dynamic encoding CP_OEM, no substitution to a specific OEM codepage. CP_ACP strings instead have a dynamic encoding of the current DefaultSystemCodepage. DoDi
Re: [fpc-devel] Explanation about code page-aware AnsiStrings
Sven Barth wrote: On 08.01.2014 15:58, Hans-Peter Diettrich drdiettri...@aol.com wrote: Delphi concatenates RawByteStrings to the dynamic encoding of the *first* string, the appended strings are converted, if necessary, before concatenation. Special handling of strings with the same encoding is not required. I.e. the result is *not* always a CP_ACP string, as documented in the wiki. Would you be so kind to provide a simple test case for this? :) function test(a,b: RawByteString): RawByteString; begin Result := a+b; WriteLn(StringCodePage(Result)); end; var u: UTF8String; a: AnsiString; begin a := 'äöü'; u := 'üöä'; test(a,u); //CP_ACP test(u,a); //UTF-8 end; It looks to me, however, that no conversion occurs at all! The strings are only concatenated as they are. Same for a concatenation of (global) RawByteString variables. This of course would not be a desirable implementation :-( DoDi
Re: [fpc-devel] Explanation about code page-aware AnsiStrings
Sven Barth wrote: On 08.01.2014 19:57, Hans-Peter Diettrich wrote: It looks to me, however, that no conversion occurs at all! The strings are only concatenated as they are. Same for a concatenation of (global) RawByteString variables. This of course would not be a desirable implementation :-( I'm inclined to say of course, because in your test function you are concatenating two RawByteStrings which - by definition - don't do any conversion. I cited what I've been told in the EMBT groups - that a conversion is made when required. Everything else doesn't make sense. DoDi
Re: [fpc-devel] Encoded AnsiString
Michael Van Canneyt wrote: If you want a TStrings that can hold strings which may differ in their encoding (i.e. strings[0] has a different encoding from strings[1]) then you'll be left in the cold. Just an idea: What if FPC adds another encoding, similar to RawByteString ($), but without the Delphi quirks? Or simply fix the RawByteString flaws in the *Ansi* compiler and RTL? 1) In a discussion in the Embarcadero groups it turned out that, in an assignment of a RawByteString to another AnsiString type, the Delphi compiler should (but does not) check and, if necessary, convert the string to the static encoding of the target. This is (almost) the only way to create strings with a different static and dynamic encoding. 2) The stupid conversion to CP_ACP in an assignment *to* a RawByteString should be dropped. This applies in particular to the assignment to *function results*. 3) The function result type should be honored, in functions accepting RawByteString parameters. The Delphi compiler seems to *assume* that the result of such functions is RawByteString, so that (including aforementioned flaws) the outcome is a CP_ACP string, even if the declared function result is e.g. a UTF8String. Test case: function conc(a,b: RawByteString): UTF8String; begin Result := a+b; end; The same result as for function conc(a,b: RawByteString): RawByteString; begin Result := a+b; end; the returned string has CP_ACP encoding :-( When these flaws are fixed in the FPC compiler, the AnsiString types will always have the same static and dynamic encoding, as it should be. Then TStrings could be based on such RawByteStrings, without excess conversions or losses. Sorting (TStringList) possibly should ignore the dynamic encoding, i.e. work on a strictly binary (byte-by-byte) base. DoDi
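The test case above can be turned into a complete program that simply prints what a given compiler does. This is a sketch only, assuming an FPC version that provides StringCodePage and UTF8String (trunk at the time of this thread, or later); plain ASCII literals are used so that the source file encoding does not influence the result.

```pascal
program ConcTest;
{$mode objfpc}{$H+}

// Same shape as the test case in the message: RawByteString parameters,
// UTF8String result.
function Conc(const a, b: RawByteString): UTF8String;
begin
  Result := a + b;
end;

var
  u: UTF8String;
  a: AnsiString;
begin
  a := 'abc';
  u := 'def';
  // Print the dynamic code page of the result for both argument orders.
  WriteLn(StringCodePage(Conc(a, u)));
  WriteLn(StringCodePage(Conc(u, a)));
end.
```

No expected output is given here on purpose: the point of the discussion is that Delphi and FPC may disagree, so the numbers are best checked on the compiler in question.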
Re: [fpc-devel] Encoded AnsiString
Jy V wrote: A quick note: the new LLVM Delphi compiler forbids the use of AnsiString and AnsiChar, (declared in the unit AnsiString.pas, you cannot use this unit anyway), The compiler supports AnsiStrings, but these are hidden for *mobile* targets. There exists a hack to enable AnsiString support also for such targets, though. DoDi
Re: [fpc-devel] Explanation about code page-aware AnsiStrings
Jonas Maebe wrote: Large parts of the returning discussions about code page-aware AnsiStrings are related to the fact that many people don't know how they work. For this reason I've created an overview that explains the rules that are followed by the RTL/compiler at http://wiki.freepascal.org/FPC_Unicode_support (only Sections 1 to 3; 4 and later are older and mostly either incomplete or wishful thinking). Thanks :-) The chapter numbers are missing from the headings? On my Win98 VM this page is not accessible: Error 403 We're sorry, but we could not fulfill your request for /FPC_Unicode_support on this server. You do not have permission to access this server. Your technical support key is: 02f1-94ac-17f4-e8c8 What's wrong? On my Win8 machine the page and server is accessible, of course. DoDi
Re: [fpc-devel] Encoded AnsiString
Jonas Maebe wrote: On 07 Jan 2014, at 15:35, Hans-Peter Diettrich wrote: 2) The stupid conversion to CP_ACP in an assignment *to* a RawByteString should be dropped. This applies in particular to the assignment to *function results*. The conversion does not happen for all assignments, it only happens for concatenations that are assigned to RawByteString. And even then it doesn't always happen. Please read the wiki page I wrote (trying to prevent exactly this kind of wrong statements from being further repeated, and obviously failing). I've tested the behaviour, and it appears not only in assignments to RawByteStrings. See test case below. Test case: function conc(a,b: RawByteString): UTF8String; begin Result := a+b; end; This will always return CP_UTF8 on FPC. Does it really return CP_ACP on Delphi? Even if it does, I doubt we will change that. This leads me back to my previous statement: it will be simpler to do things right, than trying to achieve compatibility with *all* Delphi flaws. In particular when the Delphi flaws never have been documented... We even couldn't easily do that, because we don't know the static code pages of the strings that are concatenated inside the RTL routine that handles this. Right! Only the compiler can do that, and therefore the compiler should do it right. Then TStrings could be based on such RawByteStrings, without excess conversions or losses. The problem with changing TStrings from AnsiString to RawByteString is not so much related to the behaviour of RawByteString, but more regarding descendent classes in existing third party (= user) source code that override methods using AnsiString parameters. We don't want to force everyone to rewrite their code so it uses RawByteString (if anything, RawByteString should probably be used as little as possible in user code, because always correctly dealing with all possible code pages is very hard). Right *sigh* Sorting (TStringList) possibly should ignore the dynamic encoding, i.e. work on a strictly binary (byte-by-byte) base. Looking for just one second at the definition of the Sort methods of TStringList (and TStrings) would have prevented you from writing the above statement, which does not make any sense whatsoever (unless you want the compiler to start changing all code where a programmer passes a comparison function that does take code pages into account to the Sort methods of TStrings/TStringList). Fine that you took the bait ;-) DoDi
Re: [fpc-devel] Encoded AnsiString
Paul Ishenin wrote: On 30.12.2013 9:07, Hans-Peter Diettrich wrote: Do you think that FPC should really reproduce all this inconsistent behaviour? Who would test or even specify the compatible behaviour, when every new variation will result in more unexpected results? IMO it's much easier to do it right, and fix the Delphi flaws in FPC. The work is already done by FPC team. AnsiString(codepage) works and works compatible with Delphi (whether someone likes this or not) and the behavior is covered by tests. Trunk version is very close to 2.8 release. This means that UTF-8 won't work properly when it's not CP_ACP :-( DoDi
Re: [fpc-devel] Encoded AnsiString
Michael Van Canneyt wrote: On Sun, 29 Dec 2013, Hans-Peter Diettrich wrote: Inspired by the current Lazarus discussion I'd like to learn more about the current state of the implementation of the new AnsiStrings. In case nothing has been done yet, I'd suggest to extend TAnsiRec by the new codePage and elemSize fields (words). These can be zero for now, so that the remaining codebase is not affected. Then it will be possible to play around with encoded strings, using the codePage field. All this is done already a long time ago in trunk. We're way past that stage. I'm very confused, didn't use FPC for a long time. Have to refresh memory of all related procedures... How do I instruct fpcup to checkout the trunk version? (Windows) I tried to add a parameter fpcURL=trunk to the shortcut, is this correct? How do I proceed (build, use in Lazarus...)? Any links appreciated :-) Current stage is the creation of a unicode RTL, where all base file/string operations accept unicode strings. This is done too. Next step is creation of the unicode RTL, where string = widestring. This will be combined with the dotted unit filenames, to be Delphi 2010+ compatible. sigh.sigh How do I create source files for use with both versions? To allow people to choose, 2 RTLs will be created: one unicode (string=widestring), one non-unicode (string=ansistring). This will result (probably) in 2 paths: units/os-cpu units/os-cpu-unicode This is not decided yet. I planned the work in February/March. Thanks :-) Where can I jump in? A related question: Why is the string length set to zero in NewAnsiString, when the allocated Length is already known? Because the allocated memory length is not necessarily equal to the string length. If you have a string of length 50, setting the length to 25 will not discard and reallocate the memory block, but merely set the character length to 25. This means that the allocated length is stored somewhere else, in the memory block descriptor? How can a user request a string of a specific allocation size? Another one: I've heard that a mix of encodings converts the (concatenated) output (RawByteString?) to CP_ACP, with possible losses. Is this correct? DoDi
Re: [fpc-devel] Encoded AnsiString
Michael Van Canneyt wrote: On Sun, 29 Dec 2013, Hans-Peter Diettrich wrote: This will be combined with the dotted unit filenames, to be Delphi 2010+ compatible. sigh.sigh How do I create source files for use with both versions? What do you mean by this statement? I'm not familiar with dotted unit names, they seem not to be used in XE. So I only can imagine something like conditionals around the different items in un/dotted environment, to keep Classes separate from System.Classes? Are directories involved? If so, does the Delphi structure match the FPC tree structure? Where can I jump in? When I'm done I will release a version for testing to the public. Fine :-) How can a user request a string of a specific allocation size? You should not. Okay. Another one: I've heard that a mix of encodings converts the (concatenated) output (RawByteString?) to CP_ACP, with possible losses. Is this correct? Define output? s := SomeACPstr+SomeUTF8str+'äöü'; In XE I can concatenate ACP and UTF-8 strings and assign it to an OEM string without losses. Somebody said this will fail in FPC, on e.g. FindFirst(myPath+allfiles,faAnyFile,sr); due to an (intermediate?) conversion of myPath+allfiles to CP_ACP. Of course the string must be converted to CP_ACP if FindFirst expects exactly an AnsiString(0) argument, otherwise something is broken. DoDi
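The "conditionals around the different items" idea for dotted and undotted unit names could look like this. It is only a sketch: it assumes that the Delphi side is a version whose RTL units carry System.* names, and that the FPC side still uses the plain names; neither is confirmed in this thread.

```pascal
program DottedUses;
{$IFDEF FPC}
  {$mode objfpc}
{$ELSE}
  {$APPTYPE CONSOLE}
{$ENDIF}
uses
{$IFDEF FPC}
  // Undotted unit names, as used by the FPC RTL discussed here.
  Classes, SysUtils;
{$ELSE}
  // Dotted unit names, assumed for the Delphi side.
  System.Classes, System.SysUtils;
{$ENDIF}
begin
  // The rest of the source is shared between both compilers.
  WriteLn(IntToStr(42));
end.
```

Only the uses clause differs; whether directories also have to match is the open question raised in the message above.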