Re: [DUG] Upgrading to XE - Unicode strings questions
Iterating over a string is for the purpose of doing something with each individual character, whether it is an 'A' or an 'A' with a ^ (caret) on top of it. When I said the number of bytes in a character varies, I did not mean the number of bytes in a Char - I meant that the total number of bytes in one resulting character or letter might vary. For instance the word fiancee (with an acute on the last e) has 7 characters, the last of which might be 2 code units.

When I iterate over a string I ideally want to get one character in the word each time. Could I build a string like this?

  SetLength(String1, 7);
  String1[1] := 'f';
  String1[2] := 'i';
  String1[3] := 'a';
  String1[4] := 'n';
  String1[5] := 'c';
  String1[6] := 'e';
  String1[7] := 'e'; // I would want the full e acute here

Hence I want to be able to go

  for i := 1 to Length(String1) do
  begin
    ThisChar := String1[i]; // get each character one at a time
    ListBox1.Items.Add('i=' + IntToStr(i) + ' character at position i = ' + ThisChar);
  end;

I would be expecting to see 7 characters, 7 lines in the list box, and length = 7, with the last being e acute.

Now everything Jolyon is saying, and Cary also, implies that this is not going to work. This looks to be a real nuisance!

Now I think the e acute could be one Unicode character (as there is likely to be a representation using one character, one code point and one code unit) or one character, two code units, 2*2 bytes - a surrogate pair - where e.g. one supplies the e and one the acute. So it looks like what I see might vary according to how the e acute is encoded in the string?

As I read further this gets murkier, as some of the things Cary Jensen says are not the same as what you say, even if you say it emphatically! This is why I am thinking we have to clearly understand Unicode, and the Windows implementation of it, and I don't really yet.
Here is what Cary Jensen says about a similar example with 7 characters, one of which is a surrogate pair:

> Although there are 7 characters in the printed string, the UnicodeString contains 8 code units, as returned by the Length function. Inspection of the 6th and 7th elements of the UnicodeString reveal the high and low surrogate values, each of which are code units. And, though the size of the UnicodeString is 16 bytes, ElementToCharLen accurately returns that there were a total of 7 code points in the string.
>
> While these answers suffice for surrogate pairs, unfortunately, things are not exactly the same when it comes to composite characters. Specifically, when a UnicodeString contains at least one composite character, that composite character may occupy two or more code units, though only one actual character will appear in the displayed string. Furthermore, ElementToCharLen is designed specifically to handle surrogate pairs, and not composite characters.
>
> Actually, composite characters introduce an issue of string normalization, which is not currently handled by Delphi's RTL (runtime library). When I asked Seppy Bloom about this, he replied that Microsoft has recently added normalization APIs (application programming interfaces) to some of the latest versions of Windows®, including Windows® Vista, Windows® Server 2008, and Windows® 7.
>
> Seppy was also kind enough to offer a code sample of how you might count the number of characters in a UnicodeString that includes at least one composite character. I am including this code here for your benefit, but I must offer these cautions. First, this code has not been thoroughly tested, and has not been certified. If you use it, you do so at your own risk. Second, be aware that this code will not work on pre-Windows XP installations, and will only work with Windows XP if you have installed the Microsoft Internationalized Domain Names (IDN) Mitigation APIs 1.1.
http://www.embarcadero.com/images/dm/technical-papers/delphi-unicode-migration.pdf

Elsewhere he implies that Delphi can handle normalised strings for comparisons if one is careful, as in

  var
    s1, s2: String;
  begin
    ListBox1.Items.Clear;
    // each o is followed by a combining diaeresis (U+0308)
    s1 := 'Hell'#$006F + #$0308' W'#$006F + #$0308'rld';
    s2 := 'Hellö Wörld';
    ListBox1.Items.Add(s1);
    ListBox1.Items.Add(s2);
    ListBox1.Items.Add(BoolToStr(s1 = s2, True));
    ListBox1.Items.Add(BoolToStr(AnsiCompareStr(s1, s2) = 0, True));
  end;

> The contents of ListBox1 are shown in the following figure.

  Hellö Wörld
  Hellö Wörld
  False
  True

Now I am not sure if the above example will show properly in email, because email text is generally limited to the ASCII characters, and lists like this usually also restrict posts to text and not HTML emails. So as a related exercise I am curious whether the above example prints OK on the list. The words hello and world should have an umlaut (..) over each o, in case it doesn't arrive like that on the list.

John

As I understand it iterating over a string with Chars
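To see why `s1 = s2` reports False in the example above, it can help to list the UTF-16 code units of both strings. The following is a minimal console sketch, assuming Delphi 2009 or later (where `string` is `UnicodeString`); `DumpCodeUnits` is a hypothetical helper name, not an RTL routine, and s2 is written with `#$00F6` escapes so the source file encoding does not matter.

```delphi
program CodeUnitDump;
{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper: returns the UTF-16 code units of S as hex values.
function DumpCodeUnits(const S: string): string;
var
  I: Integer;
begin
  Result := '';
  for I := 1 to Length(S) do
    Result := Result + IntToHex(Ord(S[I]), 4) + ' ';
end;

var
  s1, s2: string;
begin
  // o followed by a combining diaeresis (two code units per visible letter)
  s1 := 'Hell'#$006F#$0308' W'#$006F#$0308'rld';
  // precomposed o-umlaut (one code unit per visible letter)
  s2 := 'Hell'#$00F6' W'#$00F6'rld';

  Writeln(Length(s1), ': ', DumpCodeUnits(s1));
  Writeln(Length(s2), ': ', DumpCodeUnits(s2));
end.
```

Length reports 13 for s1 and 11 for s2, so a plain code-unit comparison cannot treat them as equal even though both display as the same text.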
Re: [DUG] Upgrading to XE - Unicode strings questions
John,

I think you are confusing Canonical Normalized versions of the same Unicode string (in the example s1 is canonical, s2 is normalized) and the effect of local codepage conversion.

The Windows-1252 codepage (Latin, close to ISO 8859-1) has support for characters like ö (character code #246) and é (character code #233). Converting to AnsiString/AnsiChar on your system will take care of the canonical Unicode representation and hence return true if you compare those strings. Please note that this only works because your system is set to a Latin-based codepage ... do the same on a Japanese version of Windows and you'll get a very different result, as there is no support for ö in AnsiString under the Japanese codepage!

Because your system is Latin, your first testcase/example of building the word fiancee should actually work without problems - Jolyon/Cary are probably wrong if they indeed implied that this wouldn't work.

The ö can be written as a compound #$006F + #$0308 in canonical format ... and as #$00f6 in the normalized format. For most normal applications it just doesn't really matter either way because a user that is inputting text under his local codepage will always do it the same way, and hence the chances of you encountering a mix between canonical/normalized versions will be close to zero.

You only ever get issues if you cross codepage boundaries (like for example if you have users in different countries storing data in a database - which is why international databases often use UTF-8 to store data instead of their native charactersets). Most of the better databases (like for example Oracle) have built-in support for sorting and handling canonical format and do the conversion automatically for you ... for someone writing desktop applications it usually just isn't an issue either way.
Kind Regards,
Stefan Mueller
RD Manager
ORCL Toolbox LLP, Japan
http://www.orcl-toolbox.com

-Original Message- From: delphi-boun...@delphi.org.nz [mailto:delphi-boun...@delphi.org.nz] On Behalf Of John Bird Sent: Tuesday, November 23, 2010 7:33 PM To: NZ Borland Developers Group - Delphi List Subject: Re: [DUG] Upgrading to XE - Unicode strings questions
Re: [DUG] Upgrading to XE - Unicode strings questions
> I think you are confusing Canonical Normalized versions of the same Unicode string (in the example s1 is canonical, s2 is normalized) and the effect of local codepage conversion.

Yep, and for the record I think this is a big problem with the way Embarcadero implemented Unicode. By pursuing the "Unicode is a no-brainer" approach (facilitating easy migration for ASCII apps) they have obfuscated the fact that Unicode is far from simple. Or at least doing it right is.

Danny Thorpe opined years ago that it made a lot of sense to do 64-bit and Unicode in one go as a big-bang breaking change, leaving the 32-bit, ANSI VCL product behind as a legacy platform. Danny Thorpe always was a clever guy! ;)

> The ö can be written as a compound #$006F + #$0308 in canonical format ... and as #$00f6 in the normalized format. For most normal applications it just doesn't really matter either way because a user that is inputting text under his local codepage will always do it the same way

A user could specifically choose to enter that character in either form - this is unlikely, yes. Or two users using the same codepage could choose to enter the character differently. Or your data could be coming from two separate external sources. The *only* way to be sure is to normalise before processing.

> You only ever get issues if you cross codepage boundaries (like for example if you have users in different countries storing data in a database - which is why international databases often use UTF-8 to store data instead of their native charactersets).

This makes no sense at all to me. ö is still encoded as the codepoints #$006F + #$0308 **OR** #$00f6, even in UTF-8. Whether you encode using UTF-8, UTF-16 or UTF-32, a single accented character codepoint vs a character followed by a diacritic are still two distinct character sequences.
___ NZ Borland Developers Group - Delphi mailing list Post: delphi@delphi.org.nz Admin: http://delphi.org.nz/mailman/listinfo/delphi Unsubscribe: send an email to delphi-requ...@delphi.org.nz with Subject: unsubscribe
Re: [DUG] Upgrading to XE - Unicode strings questions
John, the problem is that in Unicode "single character" is meaningless unless you have performed some pre-processing to GIVE that term some meaning. There are some standard forms for such processing, called Normalisations.

The problem is that a single character to your eyes, e.g. an accented a, could be represented in a Unicode string in at least two ways:

1. A single codepoint representing that accented a
2. TWO codepoints - the first representing a and the second a diacritic codepoint for the accent

> Iterating over a string is for the purpose of doing something with each individual character

That's fine, but in Unicode what you have is a string not of characters but of codepoints. The concept of a character is not synonymous with codepoint in Unicode in the same way that it is with ASCII or even ANSI.

So you have compounded complications:

a. Depending on encoding, a single codepoint (32-bit value) may be encoded in 1, 2, or more bytes. Each byte may represent a whole codepoint or only part of a codepoint encoding.
b. Each codepoint may represent a whole character or only PART of a character encoding.

Complication 'a' can be avoided by adopting UTF-32 encoding - 4 bytes for EVERY codepoint. That is hugely wasteful in terms of memory/storage for most applications.

UTF-16 - the encoding used by Delphi and indeed by Windows natively itself - is a compromise. It is less efficient than ANSI for ASCII, but more efficient than UTF-32 for ANSI character sets represented in the BMP. For applications working entirely in the BMP, UTF-16 is also relatively easy to process - for NORMALISED strings, each codepoint IS a character (in the BMP). But for non-normalised data that is still not necessarily the case.

> could I build a string like this?
>
>   SetLength(String1, 7);
>   String1[1] := 'f';
>   String1[2] := 'i';
>   String1[3] := 'a';
>   String1[4] := 'n';
>   String1[5] := 'c';
>   String1[6] := 'e';
>   String1[7] := 'e'; // I would want the full e acute here

Yes, you can.

But you might also *receive* from another source a string that is apparently the same at the visual representation level, but different at the data level, where:

  string1[1] = 'f';
  string1[2] = 'i';
  string1[3] = 'a';
  string1[4] = 'n';
  string1[5] = 'c';
  string1[6] = 'e';
  string1[7] = 'e';    // Normal 'e' character, i.e. identical to string1[6]
  string1[8] = U+0301; // Combining acute diacritic

When displayed on screen this string will appear identical to your string, but it is represented in the data in a different way.

> hence I want to be able to go
>
>   for i := 1 to length(string1) do begin .. end
>
> Now everything Jolyon are saying and Cary also implies that this is not going to work. This looks to be a real nuisance!

I don't know what gave you that impression from what I said. Yes, Unicode is/can be a real nuisance - *properly* supporting it is a lot more work than people think - but what you want to do here can be done.

> Now I think the e acute could be one unicode character (as there is likely to be a representation using one character, one code point and one code unit) or as one character, two code units, 2*2 bytes - a surrogate pair - where eg one supplies the e and one the acute.

NO!!! This is NOT what a surrogate pair is.

A surrogate pair is encountered ONLY in UTF-16, and is found when you have a codepoint that is not in the BMP, i.e. a value > 65535 that cannot be encoded in a 16-bit value. These are typically CJKV characters (Chinese/Japanese/Korean/Vietnamese), sometimes called Han or Kanji character sets.

The first 16-bit value indicates a page in the non-BMP. The following 16-bit value then identifies an entry in that page. To obtain the codepoint that the PAIR of VALUES represents, you have to apply a transform, combining the page selector with the page entry. But what you get is a single codepoint. (You don't have to do this yourself - there are routines to do it for you, but you have to invoke them as appropriate.)

A Surrogate Pair is a representation of a single codepoint, NOT a relationship between TWO codepoints.

When you have a visual character encoded as a codepoint + a following, combining codepoint, that is simply TWO Unicode codepoints that are combined to form one VISUAL character. That is NOT a surrogate pair however. It is merely two codepoints that have to be combined.
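The "transform" that turns a surrogate pair into a single codepoint is plain arithmetic defined by UTF-16. The sketch below spells it out by hand for illustration, assuming Delphi 2009 or later; the function names are hypothetical local helpers rather than RTL routines, and real code would normally call the library routines instead.

```delphi
program SurrogateDecode;
{$APPTYPE CONSOLE}

uses
  SysUtils;

// High (lead) surrogates occupy $D800..$DBFF.
function IsHighSurrogate(W: Word): Boolean;
begin
  Result := (W >= $D800) and (W <= $DBFF);
end;

// Low (trail) surrogates occupy $DC00..$DFFF.
function IsLowSurrogate(W: Word): Boolean;
begin
  Result := (W >= $DC00) and (W <= $DFFF);
end;

// Combines the two 16-bit values into the single codepoint they encode.
function SurrogatesToCodePoint(Hi, Lo: Word): Cardinal;
begin
  Result := $10000 + ((Cardinal(Hi) - $D800) shl 10) + (Cardinal(Lo) - $DC00);
end;

var
  S: string;
  I: Integer;
begin
  // 'a', then U+1D11E (musical G clef) as a surrogate pair, then 'b'
  S := 'a'#$D834#$DD1E'b';
  I := 1;
  while I <= Length(S) do
  begin
    if IsHighSurrogate(Ord(S[I])) and (I < Length(S)) and
       IsLowSurrogate(Ord(S[I + 1])) then
    begin
      // Two code units, one codepoint
      Writeln('U+', IntToHex(Int64(SurrogatesToCodePoint(Ord(S[I]), Ord(S[I + 1]))), 4));
      Inc(I, 2);
    end
    else
    begin
      // One code unit, one codepoint
      Writeln('U+', IntToHex(Ord(S[I]), 4));
      Inc(I);
    end;
  end;
end.
```

The string has Length 4 but the loop reports three codepoints (U+0061, U+1D11E, U+0062). Note that this handles surrogate pairs only; a base letter followed by a combining diacritic would still be reported as two separate codepoints.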
Re: [DUG] Icon creation
As people know, I always recommend IconWorkshop for icons and think it is the ants pants. Tried Gimp (no thanks); IcoFx is OK for a free product.

Anyway, it is on special at the moment, almost half off. It is a lifetime license as well. It is an offer via SWREG, so you can't just go to the website (unless the discount is given in the checkout screen).

Link from email:
http://dr.bluehornet.com/ct/3992300:4569202619:m:1:294663869:22ADE41AE5315E436E0B22FC0E8007EF
Becomes:
https://usd.swreg.org/cgi-bin/s.cgi?s=47156p=471561v=0d=0q=1rc=45K2D464EWa=swreg_Q4_2010linkid=IWP_2

* No, I don't work for them - just like the product.

On Wed, Nov 10, 2010 at 7:52 AM, Alister Christie alis...@salespartner.co.nz wrote:

> GiMP is pretty awesome, however not overly suited to icon creation. I also use it for all my button images. Generally I use a really old version of Corel Draw (7 I think) for creating a vectored image, then copy and paste it into GiMP to sort out the transparency and shadows and stuff.
>
> What do other people do for button images? Draw their own? Get pre-built packs? Have an in-house graphic artist?
>
> Alister Christie
> Computers for People
> Ph: 04 471 1849 Fax: 04 471 1266
> http://www.salespartner.co.nz
> PO Box 13085 Johnsonville Wellington
>
> On 10/11/2010 9:26 a.m., Nick Fauchelle wrote:
>
>> After seeing Alister's video a few years ago, I have been using gimp for my icons! (Thanks Alister!)
>>
>> On Tue, 2010-11-09 at 18:11 +1300, Alister Christie wrote:
>>
>>> I made this video a few years ago http://codegearguru.com/index.php?option=com_contenttask=viewid=12Itemid=27 making icons with GiMP. However there are some quite good free standalone icon editors.
>>>
>>> Alister Christie
>>> Computers for People
>>> Ph: 04 471 1849 Fax: 04 471 1266
>>> http://www.salespartner.co.nz
>>> PO Box 13085 Johnsonville Wellington
>>>
>>> On 9/11/2010 5:32 p.m., John Bird wrote:
>>>
>>>> I have never read up on best practices/sizes etc for creating program icons and BMP files for buttons/images etc. Anyone got any good references to read up further?
>>>>
>>>> q1 - I have figured out how to create a BMP with a transparent background for bit buttons etc so the image is not square. How do I create the same for program ICO files? All my program icons in the task bar are square as a result.
>>>>
>>>> q2 - Where does everyone else store their image files? Really they should be considered part of the project. Only the IDE does not store what the original filename was, so to find the image again (for a bitbutton/program icon etc) I have to know what it was and where. I haven't made up my mind between storing them in the project source folder or a separate images folder...
>>>>
>>>> Any Embarcadero programmers out there reading this - how about the IDE stores the names of the files when it loads them, so they can be found again?
>>>>
>>>> John
Re: [DUG] Upgrading to XE - Unicode strings questions
Hi John

You can find out whether a Unicode string is inside the BMP by converting it to UTF-32 and checking that the new string is twice the length in bytes of the original (UTF-16) string, i.e. that both have the same number of elements.

> A user could specifically choose to enter that character in either form - this is unlikely, yes. Or, two users using the same codepage could choose to enter the character differently. Or if your data is coming from two separate external sources. The *only* way to be sure is to normalise before processing.

Agreed. That will eliminate any issues with composite codepoints.

> > You only ever get issues if you cross codepage boundaries (like for example if you have users in different countries storing data in a database - which is why international databases often use UTF-8 to store data instead of their native charactersets).
>
> This makes no sense at all to me. ö encoded as #$006F + #$0308 **OR** #$00f6 even in UTF-8. Whether you encode using UTF-8, UTF-16 or UTF-32, a single accented character codepoint vs a character followed by a diacritic are still two distinct character sequences.

True. I think the point is that UTF-8 is the most compact format without data loss, regardless of whether the codepoints are composite or not.

Todd.
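An equivalent check that avoids the conversion: a UTF-16 string lies entirely inside the BMP exactly when it contains no surrogate code units ($D800..$DFFF). The sketch below assumes Delphi 2009 or later; `IsInsideBMP` is a hypothetical name, not an RTL routine.

```delphi
program BmpCheck;
{$APPTYPE CONSOLE}

// Returns True when S contains no surrogate code units, i.e. every
// codepoint in S is encoded as exactly one 16-bit value.
function IsInsideBMP(const S: string): Boolean;
var
  I: Integer;
begin
  Result := True;
  for I := 1 to Length(S) do
    if (Ord(S[I]) >= $D800) and (Ord(S[I]) <= $DFFF) then
    begin
      Result := False;
      Exit;
    end;
end;

begin
  // Precomposed o-umlaut (U+00F6) is inside the BMP
  Writeln(IsInsideBMP('Hell'#$00F6' W'#$00F6'rld'));
  // U+1D11E needs a surrogate pair, so this string leaves the BMP
  Writeln(IsInsideBMP('clef '#$D834#$DD1E));
end.
```

This only answers the surrogate question. A string can be entirely inside the BMP and still contain combining sequences, so it says nothing about whether each code unit is a whole visible character.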
Re: [DUG] Upgrading to XE - Unicode strings questions
I read in one of the references that UTF-32 was a more common standard on Unix systems - which means I guess they have chosen the simplest format at the trade off of using more space? I think linux/Windows/MacOS use UTF-16 more commonly...

Anyway for the time being, as long as the data in strings is unicode, but is still Latin 8859 (ie ASCII characters) I can without worrying too much iterate over a string one character at a time...using length. That was the main thing I wanted to know

John
Re: [DUG] Upgrading to XE - Unicode strings questions
> Anyway for the time being, as long as the data in strings is unicode, but is still Latin 8859 (ie ASCII characters) I can without worrying too much iterate over a string one character at a time...using length.

Yep. But you are building an app that now supports Unicode. If your users are able to enter data into your app, your app will now *potentially* find itself handling Unicode data for which it was not designed, unless you take additional steps to prevent a user from entering non-ASCII data in the first place.

Previously you may not have taken these steps, so theoretically you could have found a user entering non-ASCII, ANSI characters too, except that in the past you would not have been using Unicode support as an advertised (or even unadvertised) feature of your app and could legitimately have told such users not to be so dumb (in not so many words, of course :D)

This again is the danger of the "no brainer" approach with the Unicode migration in Delphi. By selling the idea that switching to Unicode was easy, they have just made it more confusing in many cases, imho.

"If I can just recompile and patch up a few warnings with some boilerplate, how come there's all this other stuff that I need to do too? I thought Unicode was supposed to make supporting this stuff easier."

Answer: It does. It makes supporting Unicode easier, but supporting Unicode is not, itself, easy.

imho
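One possible form of the "additional steps" is a guard that rejects (or flags) input containing anything outside 7-bit ASCII before the character-by-character code runs. This is a minimal sketch assuming Delphi 2009 or later; `IsPlainAscii` is a hypothetical name, and where the check is called from (an OnExit handler, a validation routine, an import filter) is left to the application.

```delphi
program AsciiCheck;
{$APPTYPE CONSOLE}

// Returns True when every code unit in S is 7-bit ASCII. For such strings
// Length(S) equals the number of visible characters and S[I] is always a
// whole character, so a simple for-loop over the string is safe.
function IsPlainAscii(const S: string): Boolean;
var
  I: Integer;
begin
  for I := 1 to Length(S) do
    if Ord(S[I]) > 127 then
    begin
      Result := False;
      Exit;
    end;
  Result := True;
end;

begin
  Writeln(IsPlainAscii('fiancee'));          // plain ASCII
  Writeln(IsPlainAscii('fianc'#$00E9'e'));   // contains e acute (U+00E9)
end.
```

Refusing non-ASCII input is a blunt instrument; it keeps old iteration code valid but gives up the benefit of Unicode support, which is the trade-off described above.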
Re: [DUG] Upgrading to XE - Unicode strings questions
It's a shame UTF-8 wasn't made the standard in Delphi. It's commonly used in audio file tags, for example, which I have to deal with. My software needs to search for songs with specific artists or titles, and it sounds like I'm going to have problems where the information is visually the same but entered differently in different parts of the world, using all sorts of 3rd party software.

Ross.

-Original Message- From: delphi-boun...@delphi.org.nz [mailto:delphi-boun...@delphi.org.nz] On Behalf Of Todd Sent: Wednesday, 24 November 2010 11:27 AM To: NZ Borland Developers Group - Delphi List Subject: Re: [DUG] Upgrading to XE - Unicode strings questions
Re: [DUG] Upgrading to XE - Unicode strings questions
You should be fine - you just have to ensure you normalise the strings. You're going to have to convert from UTF-8 to UTF-16 to bring them in to your Delphi app anyway, for processing, so you may as well normalise them in the process.

UTF-16 was chosen in Delphi because it is also the native encoding in Windows itself.

-Original Message- From: delphi-boun...@delphi.org.nz [mailto:delphi-boun...@delphi.org.nz] On Behalf Of Ross Levis Sent: Wednesday, 24 November 2010 16:00 To: 'NZ Borland Developers Group - Delphi List' Subject: Re: [DUG] Upgrading to XE - Unicode strings questions
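As a rough illustration of "normalise before comparing", the sketch below calls the Windows normalisation API mentioned earlier in the thread (`NormalizeString` in Normaliz.dll, available on Vista and later). The assumptions are: Delphi 2009 or later; the import is declared locally under the hypothetical name `WinNormalizeString` in case the RTL in use does not declare it; `ToNFC` is a hypothetical helper; and the program links statically to Normaliz.dll, so it will not start on systems without that DLL. It is a sketch, not tested production code.

```delphi
program NormaliseDemo;
{$APPTYPE CONSOLE}

uses
  SysUtils;

const
  NormalizationC = 1; // NORM_FORM value for Unicode form C (precomposed)

// Local import of the Win32 NormalizeString function (hypothetical local name).
function WinNormalizeString(NormForm: Integer; lpSrcString: PWideChar;
  cwSrcLength: Integer; lpDstString: PWideChar;
  cwDstLength: Integer): Integer; stdcall;
  external 'Normaliz.dll' name 'NormalizeString';

// Hypothetical helper: returns S converted to normalisation form C.
function ToNFC(const S: string): string;
var
  Needed, Written: Integer;
begin
  Result := '';
  if S = '' then
    Exit;
  // First call with no buffer asks for an estimate of the required size.
  Needed := WinNormalizeString(NormalizationC, PWideChar(S), Length(S), nil, 0);
  if Needed <= 0 then
    RaiseLastOSError;
  SetLength(Result, Needed);
  Written := WinNormalizeString(NormalizationC, PWideChar(S), Length(S),
    PWideChar(Result), Needed);
  if Written <= 0 then
    RaiseLastOSError; // sketch only: no retry if the estimate was too small
  SetLength(Result, Written); // trim to the length actually written
end;

var
  s1, s2: string;
begin
  s1 := 'Hell'#$006F#$0308' W'#$006F#$0308'rld'; // o + combining diaeresis
  s2 := 'Hell'#$00F6' W'#$00F6'rld';             // precomposed o-umlaut
  Writeln(s1 = s2);               // raw code units differ
  Writeln(ToNFC(s1) = ToNFC(s2)); // compared after normalising both sides
end.
```

For a search over song tags, the same idea would apply to both the stored value and the search term: normalise each once on the way in, then compare or index the normalised forms.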
Re: [DUG] How's this for inconsistent
Hi Jolyon

> I spotted that they fixed that a while ago -- I remember having to fix the issue myself many years ago so was quite pleased to see that it was now taken care of in TInterfacedObject as a matter of course. For some reason I never noticed the omission of the same facility in the destructor. And yes, it's a [potentially] big problem.
>
> I need to think about this tho... setting a fake ref count during execution of the constructor is safe enough as you know precisely when construction is done and when to restore the ref count back to zero. Setting a fake ref count during destruction strikes me as more problematic and makes me nervous, but I can't quite put my finger on why. It might be nothing.
>
> That doesn't mean it can't be fixed, only that the solution put in place for construction might not work for destruction and it wasn't felt necessary to do any extra work for a more comprehensive solution.

I fixed it like this

  procedure TamObject.BeforeDestruction;
  begin
    // add a reference count, in case an interface is acquired and released during destruction
    InterlockedIncrement(FCount);
    inherited BeforeDestruction;
  end;

  procedure TamObject.FreeInstance;
  begin
    // remove the reference count added in BeforeDestruction
    InterlockedDecrement(FCount);
    Assert(FCount = 0, 'Destroying object with non-zero reference count');
    inherited FreeInstance;
  end;

Of course, an interface can still not be referenced in descendant BeforeDestruction methods (since the inherited method is usually called at the end), but can be safely referenced in the destructor.

Todd.

> Certainly in the case of my code where I fixed this I had specific constructing / destructing state markers (it wasn't a general purpose interfacedobject class but a base class in a far richer framework that happened to also implement its own version of IUnknown) -- I know I didn't rely on side effects of a faked ref count.

From: delphi-boun...@delphi.org.nz [mailto:delphi-boun...@delphi.org.nz] On Behalf Of Todd
Sent: Wednesday, 24 November 2010 16:55
To: NZ Borland Developers Group - Delphi List
Subject: [DUG] How's this for inconsistent

The Delphi developer who implemented TInterfacedObject obviously considered the case when an interface reference is grabbed during construction..

  // Set an implicit refcount so that refcounting
  // during construction won't destroy the object.
  class function TInterfacedObject.NewInstance: TObject;
  begin
    Result := inherited NewInstance;
    TInterfacedObject(Result).FRefCount := 1;
  end;

  procedure TInterfacedObject.AfterConstruction;
  begin
    // Release the constructor's implicit refcount
    InterlockedDecrement(FRefCount);
  end;

but didn't consider applying the same logic during destruction. So grabbing an interface reference during destruction causes all hell to break loose, as the _Release method tries to free the object again and again recursively.

  procedure TInterfacedObject.BeforeDestruction;
  begin
    if RefCount <> 0 then
      Error(reInvalidPtr);
  end;

  function TInterfacedObject._Release: Integer;
  begin
    Result := InterlockedDecrement(FRefCount);
    if Result = 0 then
      Destroy;
  end;

Todd.
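To make the BeforeDestruction/FreeInstance fix easier to try out, here is a self-contained console sketch built around it. The assumptions are: Delphi XE-era unit names (`Windows` for the Interlocked functions); `TGuardedObject` is a hypothetical stand-in for the TamObject class mentioned in the thread, whose real declaration is not shown there; and the sketch deliberately leaves out the construction-time guard and any handling of exceptions raised in constructors.

```delphi
program DestructionGuardDemo;
{$APPTYPE CONSOLE}

uses
  Windows, SysUtils;

type
  // Hypothetical class implementing IInterface with its own reference count.
  TGuardedObject = class(TObject, IInterface)
  private
    FCount: Integer;
  protected
    function QueryInterface(const IID: TGUID; out Obj): HResult; stdcall;
    function _AddRef: Integer; stdcall;
    function _Release: Integer; stdcall;
  public
    procedure BeforeDestruction; override;
    procedure FreeInstance; override;
    destructor Destroy; override;
  end;

function TGuardedObject.QueryInterface(const IID: TGUID; out Obj): HResult;
begin
  if GetInterface(IID, Obj) then
    Result := S_OK
  else
    Result := E_NOINTERFACE;
end;

function TGuardedObject._AddRef: Integer;
begin
  Result := InterlockedIncrement(FCount);
end;

function TGuardedObject._Release: Integer;
begin
  Result := InterlockedDecrement(FCount);
  if Result = 0 then
    Destroy;
end;

procedure TGuardedObject.BeforeDestruction;
begin
  // Guard count: an interface taken and released inside Destroy can no
  // longer bring the count to zero and re-enter Destroy.
  InterlockedIncrement(FCount);
  inherited BeforeDestruction;
end;

procedure TGuardedObject.FreeInstance;
begin
  // Remove the guard count added in BeforeDestruction.
  InterlockedDecrement(FCount);
  Assert(FCount = 0, 'Destroying object with non-zero reference count');
  inherited FreeInstance;
end;

destructor TGuardedObject.Destroy;
var
  Intf: IInterface;
begin
  Intf := Self; // count 1 -> 2 (the 1 is the guard count)
  Intf := nil;  // count 2 -> 1, so _Release does not call Destroy again
  Writeln('Destroy executed');
  inherited Destroy;
end;

var
  Ref: IInterface;
begin
  Ref := TGuardedObject.Create; // count 0 -> 1
  Ref := nil;                   // count 1 -> 0, object is destroyed
end.
```

The intent is that the message is written once. Without the two overrides, the temporary reference inside Destroy would drop the count to zero and _Release would call Destroy again, which is the recursion described above.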
Re: [DUG] How's this for inconsistent
Actually, this would be better:

  function TamObject._Release: Integer;
  begin
    Result := InterlockedDecrement(FCount);
    if (FCount = 0) then
    begin
      // add a reference count, in case an interface is acquired and released during destruction
      InterlockedIncrement(FCount);
      Self.Destroy;
    end;
  end;

  procedure TamObject.FreeInstance;
  begin
    // remove the reference count added in _Release
    InterlockedDecrement(FCount);
    Assert(FCount = 0, 'Destroying object with non-zero reference count');
    inherited FreeInstance;
  end;
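For anyone wanting to try the _Release/FreeInstance pairing from the message above in isolation, here is a minimal sketch of it as a complete console program. It rests on several assumptions: TCountedObject and TouchSelf are made-up names standing in for TamObject, the counter uses plain Inc/Dec rather than the Interlocked calls (so it is single-threaded only), and the stdcall declarations assume Delphi on Windows.

```pascal
program ReleaseGuardDemo;
{$APPTYPE CONSOLE}

type
  // TCountedObject is an illustrative stand-in for TamObject.
  TCountedObject = class(TObject, IInterface)
  private
    FCount: Integer;
    procedure TouchSelf;
  protected
    function QueryInterface(const IID: TGUID; out Obj): HResult; stdcall;
    function _AddRef: Integer; stdcall;
    function _Release: Integer; stdcall;
  public
    destructor Destroy; override;
    procedure FreeInstance; override;
  end;

function TCountedObject.QueryInterface(const IID: TGUID; out Obj): HResult;
begin
  Result := E_NOINTERFACE;
  if GetInterface(IID, Obj) then
    Result := 0;
end;

function TCountedObject._AddRef: Integer;
begin
  Inc(FCount);   // plain Inc/Dec: single-threaded simplification
  Result := FCount;
end;

function TCountedObject._Release: Integer;
begin
  Dec(FCount);
  Result := FCount;
  if Result = 0 then
  begin
    Inc(FCount); // guard count held while Destroy runs
    Destroy;
  end;
end;

procedure TCountedObject.TouchSelf;
var
  Intf: IInterface;
begin
  Intf := Self;  // _AddRef: 1 -> 2
end;             // implicit _Release: 2 -> 1, so Destroy is not re-entered

destructor TCountedObject.Destroy;
begin
  TouchSelf;     // acquire and release an interface during destruction
  Writeln('Destroy body ran');
  inherited Destroy;
end;

procedure TCountedObject.FreeInstance;
begin
  Dec(FCount);   // undo the guard count added in _Release
  Assert(FCount = 0, 'Destroying object with non-zero reference count');
  inherited FreeInstance;
end;

var
  Ref: IInterface;
begin
  Ref := TCountedObject.Create;
  Ref := nil;    // last reference released; Destroy is entered once
  Writeln('done');
end.
```

One limitation of this sketch: an instance that is freed directly, without ever having been referenced through an interface, reaches FreeInstance with a count of zero, so the decrement there takes it to -1 and the assertion fires.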
Re: [DUG] How's this for inconsistent
Yep - I remember that my fix was to set the destructing state indicator in a BeforeDestruction() override. This was then tested in _Release() to render it a NO-OP during execution of the destructor chain (incomplete, obviously, just to give the idea):

  Procedure BeforeDestruction;
    SetState(csDestroying);

  Function _Release;
    If csDestroying in State then EXIT;

Nothing else needs to be done, as long as any further BeforeDestruction overrides call inherited before doing their work, which they should do (in my framework I introduced another virtual to be overridden in my descendants, in case there were occasions when work was done to generate references during the destructor execution).

Even in your case, the FreeInstance() override is redundant I think, other than as a sanity/safety check, and so could be made subject to some conditional compilation flag.
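The state-marker idea in the message above can be fleshed out into a small runnable program. This is only a sketch under stated assumptions: TFlaggedObject, FDestroying and TouchSelf are invented names (a Boolean field replaces the framework's State set, which is not shown in the thread), the counting is single-threaded, and the stdcall declarations assume Delphi on Windows.

```pascal
program DestroyingFlagDemo;
{$APPTYPE CONSOLE}

type
  // TFlaggedObject and FDestroying are illustrative names only.
  TFlaggedObject = class(TObject, IInterface)
  private
    FRefCount: Integer;
    FDestroying: Boolean;
    procedure TouchSelf;
  protected
    function QueryInterface(const IID: TGUID; out Obj): HResult; stdcall;
    function _AddRef: Integer; stdcall;
    function _Release: Integer; stdcall;
  public
    procedure BeforeDestruction; override;
    destructor Destroy; override;
  end;

function TFlaggedObject.QueryInterface(const IID: TGUID; out Obj): HResult;
begin
  Result := E_NOINTERFACE;
  if GetInterface(IID, Obj) then
    Result := 0;
end;

function TFlaggedObject._AddRef: Integer;
begin
  Inc(FRefCount);          // single-threaded simplification
  Result := FRefCount;
end;

function TFlaggedObject._Release: Integer;
begin
  if FDestroying then
  begin
    Result := FRefCount;   // no-op while the destructor chain runs
    Exit;
  end;
  Dec(FRefCount);
  Result := FRefCount;
  if Result = 0 then
    Destroy;
end;

procedure TFlaggedObject.BeforeDestruction;
begin
  FDestroying := True;     // set before any destructor body executes
  inherited BeforeDestruction;
end;

procedure TFlaggedObject.TouchSelf;
var
  Intf: IInterface;
begin
  Intf := Self;            // _AddRef still counts
end;                       // implicit _Release is ignored (FDestroying)

destructor TFlaggedObject.Destroy;
begin
  TouchSelf;               // would re-enter Destroy without the flag
  Writeln('Destroy body ran');
  inherited Destroy;
end;

var
  Ref: IInterface;
begin
  Ref := TFlaggedObject.Create;
  Ref := nil;              // count reaches zero; Destroy is entered once
  Writeln('done');
end.
```

Because the flag is set in BeforeDestruction, it is already in place before any destructor body runs, which is why no matching FreeInstance override is needed in this variant.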
Re: [DUG] How's this for inconsistent
> [...] even in your case, the FreeInstance() override is redundant I think, other than as a sanity/safety check and so could be made subject to some conditional compilation flag.

True. I used an assert, since it can be eliminated by the compiler, but the decrement still remains. There's always more than one way to skin the cat. I like the do/undo pattern.

The whole problem could be eliminated entirely if TInterfacedObject just broadcast a message when its reference count hit zero, rather than destroying itself. Then some other object (a garbage collector) listening for that message could decide what to do with it. I'm wondering whether TObject.Dispatch could achieve this.
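To make the "notify instead of destroy" idea above concrete, here is a rough sketch. It deliberately swaps in a plain method-pointer callback rather than TObject.Dispatch, and everything named here (TNotifyingObject, TCollector, OnZeroRefs, ZeroRefs, Collect) is hypothetical. It is single-threaded, and the stdcall declarations assume Delphi on Windows.

```pascal
program ZeroRefNotifyDemo;
{$APPTYPE CONSOLE}
type
  TZeroRefsEvent = procedure(Sender: TObject) of object;

  // Hypothetical: reports "count hit zero" instead of destroying itself.
  TNotifyingObject = class(TObject, IInterface)
  private
    FRefCount: Integer;
    FOnZeroRefs: TZeroRefsEvent;
    procedure TouchSelf;
  protected
    function QueryInterface(const IID: TGUID; out Obj): HResult; stdcall;
    function _AddRef: Integer; stdcall;
    function _Release: Integer; stdcall;
  public
    destructor Destroy; override;
    property OnZeroRefs: TZeroRefsEvent read FOnZeroRefs write FOnZeroRefs;
  end;

  // Hypothetical listener standing in for the "garbage collector".
  TCollector = class
  private
    FPending: array of TObject;
    FCollecting: Boolean;
  public
    procedure ZeroRefs(Sender: TObject);
    procedure Collect;
  end;

function TNotifyingObject.QueryInterface(const IID: TGUID; out Obj): HResult;
begin
  Result := E_NOINTERFACE;
  if GetInterface(IID, Obj) then
    Result := 0;
end;

function TNotifyingObject._AddRef: Integer;
begin
  Inc(FRefCount);  // plain Inc/Dec: single-threaded simplification
  Result := FRefCount;
end;

function TNotifyingObject._Release: Integer;
begin
  Dec(FRefCount);
  Result := FRefCount;
  // Report the event rather than calling Destroy; the listener decides.
  if (Result = 0) and Assigned(FOnZeroRefs) then
    FOnZeroRefs(Self);
end;

procedure TNotifyingObject.TouchSelf;
var
  Intf: IInterface;
begin
  Intf := Self;    // released again when this procedure exits
end;

destructor TNotifyingObject.Destroy;
begin
  TouchSelf;       // acquire and release an interface during destruction
  Writeln('destroyed');
  inherited Destroy;
end;

procedure TCollector.ZeroRefs(Sender: TObject);
begin
  if FCollecting then
    Exit;          // raised from inside a destructor: already being freed
  SetLength(FPending, Length(FPending) + 1);
  FPending[High(FPending)] := Sender;
end;

procedure TCollector.Collect;
var
  I: Integer;
begin
  FCollecting := True;
  for I := 0 to High(FPending) do
    FPending[I].Free;
  SetLength(FPending, 0);
  FCollecting := False;
end;

var
  Collector: TCollector;
  Obj: TNotifyingObject;
  Ref: IInterface;
begin
  Collector := TCollector.Create;
  Obj := TNotifyingObject.Create;
  Obj.OnZeroRefs := Collector.ZeroRefs;
  Ref := Obj;
  Ref := nil;      // count hits zero: queued, not destroyed
  Collector.Collect;
  Collector.Free;
end.
```

The sketch leaves out things a real collector would need: it does not check whether a queued object has picked up new references before Collect runs, it does not guard against the same object being queued twice, and an object with no listener attached is simply never freed.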
Re: [DUG] How's this for inconsistent
> Procedure BeforeDestruction;
>   SetState(csDestroying);
>
> Function _Release;
>   If csDestroying in State then EXIT;

Upon reflection, that would only work for a TComponent descendant.
Re: [DUG] How's this for inconsistent
No, the State and csDestroying elements were part of my framework, not the mechanism that is part of TComponent (though of course there were obvious parallels in some cases - note however that TComponent uses ComponentState, not just State, and ControlState is introduced by the controls part of the VCL hierarchy also).

My changes in the case that I am dragging up from the dim and distant past were not to create a replacement general purpose TInterfacedObject as such, but part of a much wider framework that happened to have at its core a class that implemented IUnknown along with a whole host of other services as part of that framework, since the entire thing was interface based.

On a more general point, I'd say that whilst referencing Self as an interface during construction is an almost unavoidable part of any interface based framework, the same is not true in the destructor. Anything referenced or notified during destruction of an object that requires being passed a reference TO that object could, and most likely should, have been passed a reference to that object during construction or during some later operation. However, if those other objects were holding on to those references, that would prevent the object from being destroyed in the first place.

Another thing that is missing is a safe, common implementation of a weak reference (an interface reference that does not hold on to - i.e. does not contribute +1 toward - the reference count of the referenced object). You can use an interface reference cast as a Pointer, but the catch is that the reference must be sure that it will be notified if the referenced object is destroyed, so that it can NIL itself. Using a plain Pointer won't do... you have to wrap it inside some object that can register for those notifications (and be sure that the object referenced will implement the required notification for the wrapper to respond to).
Coincidentally, just this weak [sic] I found myself implementing exactly that, leveraging my TMultiCastNotify work and IOn_Destroy multicast destroy notification framework to take care of all that. But that is not a general purpose solution, as it requires that objects exposing interfaces that may be encapsulated in my weak reference implementation also implement IOn_Destroy, which is not built into the VCL.

In fact, the VCL doesn't have any such general purpose system (and neither should it have imho). The FreeNotification() mechanism is not supported on TObject, and neither imho should it be. Not every application - or indeed every object - needs these things, nor the overhead that it would incur, adding housekeeping code to the tear-down of every object.

Most applications simply don't need these sorts of exotica (as evidenced by the fact that these sorts of facilities either don't exist at all or took so long to be introduced into the core VCL, following the introduction of interfaces waay back in Delphi 3 - Delphi 2 did things slightly differently you may recall).

The very beauty of Delphi and the VCL is that we can introduce these things into the applications (or indeed the small parts of our applications) that need them, without imposing them on everything else. When we want the convenience of having everything to hand whether we want it or not, that's what managed code is for.

J
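The weak-reference wrapper described in the message above might look roughly like the following. This is not the TMultiCastNotify / IOn_Destroy framework, whose code does not appear in the thread; IDestroyNotifier, TTarget and TWeakRef are hypothetical names, the GUID is arbitrary, and the listener list is a bare dynamic array with no thread safety.

```pascal
program WeakRefDemo;
{$APPTYPE CONSOLE}
type
  TDestroyEvent = procedure(Sender: TObject) of object;

  IDestroyNotifier = interface  // hypothetical; GUID is arbitrary
    ['{6E2C1F5A-3B7D-4C8E-9A1F-2D4B6C8E0A13}']
    procedure AddDestroyListener(Listener: TDestroyEvent);
    procedure RemoveDestroyListener(Listener: TDestroyEvent);
  end;

  TTarget = class(TInterfacedObject, IDestroyNotifier)
  private
    FListeners: array of TDestroyEvent;
  public
    destructor Destroy; override;
    procedure AddDestroyListener(Listener: TDestroyEvent);
    procedure RemoveDestroyListener(Listener: TDestroyEvent);
  end;

  TWeakRef = class
  private
    FTarget: Pointer;  // interface stored uncounted
    procedure TargetDestroyed(Sender: TObject);
  public
    constructor Create(const Target: IDestroyNotifier);
    destructor Destroy; override;
    function Get: IDestroyNotifier;
    function IsAlive: Boolean;
  end;

destructor TTarget.Destroy;
var
  I: Integer;
begin
  for I := 0 to High(FListeners) do
    if Assigned(FListeners[I]) then
      FListeners[I](Self);  // tell each wrapper to nil its pointer
  inherited Destroy;
end;

procedure TTarget.AddDestroyListener(Listener: TDestroyEvent);
begin
  SetLength(FListeners, Length(FListeners) + 1);
  FListeners[High(FListeners)] := Listener;
end;

procedure TTarget.RemoveDestroyListener(Listener: TDestroyEvent);
var
  I: Integer;
begin
  for I := 0 to High(FListeners) do
    if (TMethod(FListeners[I]).Code = TMethod(Listener).Code) and
       (TMethod(FListeners[I]).Data = TMethod(Listener).Data) then
      FListeners[I] := nil;  // leave an empty slot; skipped on notify
end;

constructor TWeakRef.Create(const Target: IDestroyNotifier);
begin
  inherited Create;
  FTarget := Pointer(Target);  // no _AddRef: does not keep Target alive
  if FTarget <> nil then
    Target.AddDestroyListener(TargetDestroyed);
end;

destructor TWeakRef.Destroy;
begin
  if FTarget <> nil then
    IDestroyNotifier(FTarget).RemoveDestroyListener(TargetDestroyed);
  inherited Destroy;
end;

procedure TWeakRef.TargetDestroyed(Sender: TObject);
begin
  FTarget := nil;
end;

function TWeakRef.Get: IDestroyNotifier;
begin
  Result := IDestroyNotifier(FTarget);  // nil once the target is gone
end;

function TWeakRef.IsAlive: Boolean;
begin
  Result := FTarget <> nil;
end;

var
  Strong: IDestroyNotifier;
  Weak: TWeakRef;
begin
  Strong := TTarget.Create;
  Weak := TWeakRef.Create(Strong);
  Writeln('alive while held: ', Weak.IsAlive);
  Strong := nil;  // last counted reference released; target is destroyed
  Writeln('alive after release: ', Weak.IsAlive);
  Weak.Free;
end.
```

The demo checks IsAlive rather than calling Get because Get hands back a counted reference, and an unnamed temporary holding that reference could keep the target alive longer than intended. The listener only nils a pointer and never takes an interface reference, so nothing is acquired on the target while it is being destroyed.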