Re: [HACKERS] Garbage pad bytes within datums are bad news
The alternative seems to be to forbid uninitialized pad bytes within Datums. That's not very pleasant to contemplate either, since it'll forever be vulnerable to sins of omission. Hmm, we can add to palloc random filling of allocated memory with --enable-cassert. It'll allow to catch such bugs in most cases. -- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Garbage pad bytes within datums are bad news
Teodor Sigaev [EMAIL PROTECTED] writes: The alternative seems to be to forbid uninitialized pad bytes within Datums. That's not very pleasant to contemplate either, since it'll forever be vulnerable to sins of omission. Hmm, we can add to palloc random filling of allocated memory with --enable-cassert. It'll allow to catch such bugs in most cases. Based on the fact that this problem went undetected for so long, I'm not real convinced of that. The difficulty is exactly that not very many system behaviors will fail obviously if nodetrees that should be seen as equal are not. Perhaps we could add some additional tests to help exercise it? But how do you get them applied to all datatypes? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Garbage pad bytes within datums are bad news
Tom Lane [EMAIL PROTECTED] writes: The alternative seems to be to forbid uninitialized pad bytes within Datums. That's not very pleasant to contemplate either, since it'll forever be vulnerable to sins of omission. Just brainstorming here, I don't think this is a good solution but perhaps it could lead somewhere interesting... We could have actual equal operators include an assertion that the datums are also datumIsEqual? That isn't guaranteed to catch every case but it would be good for complex data types like arrays. I suppose if all we want to do is assert that constructors don't create this situation we could have an assertion that calls the constructor a second time (with palloc generating garbage data) and compares the results with datumEqual. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL training! -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Garbage pad bytes within datums are bad news
Gregory Stark [EMAIL PROTECTED] writes: Tom Lane [EMAIL PROTECTED] writes: The alternative seems to be to forbid uninitialized pad bytes within Datums. That's not very pleasant to contemplate either, since it'll forever be vulnerable to sins of omission. Just brainstorming here, I don't think this is a good solution but perhaps it could lead somewhere interesting... We could have actual equal operators include an assertion that the datums are also datumIsEqual? That isn't guaranteed to catch every case but it would be good for complex data types like arrays. That still puts the responsibility on the individual datatype author to get it right. The case I'm most worried about is user-written datatypes that are never going to magically acquire such asserts. There is another path we could try to take, which is to fix things so that there aren't any cases where a false not-equal report will break critical behavior. I seem to recall that the original justification for implementing equal() this way was exactly the argument that it didn't matter if you sometimes got a false not-equal report, so long as copyObject() was guaranteed to generate equal() trees (which it is). This also seems prone to somebody breaking the assumption in future, though, even assuming that we can make it work like that today. (I'm not too sure how to deal with the ORDER-BY/DISTINCT semantics without it.) I guess we could have a test mode in which datumIsEqual always returned false... regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Garbage pad bytes within datums are bad news
That still puts the responsibility on the individual datatype author to get it right. The case I'm most worried about is user-written datatypes that are never going to magically acquire such asserts. It seems to me that working with two assumption (binary equal and catalog-defined equal function) in the same time is a wrong way. If we decide to use binary equal criteria, then why we need catalog-defined equal at all? If we use catalog-defined one, why we should require binary equality? Using both way in the same time is an error prone. It's possible to say that two value is equal if they are binary the same, if not - we should find catalog-defined equal operation and call it. Binary comparison is only an optimization. -- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Garbage pad bytes within datums are bad news
Teodor Sigaev [EMAIL PROTECTED] writes: That still puts the responsibility on the individual datatype author to get it right. The case I'm most worried about is user-written datatypes that are never going to magically acquire such asserts. It seems to me that working with two assumption (binary equal and catalog-defined equal function) in the same time is a wrong way. If we decide to use binary equal criteria, then why we need catalog-defined equal at all? If we use catalog-defined one, why we should require binary equality? Using both way in the same time is an error prone. It's possible to say that two value is equal if they are binary the same, if not - we should find catalog-defined equal operation and call it. Binary comparison is only an optimization. That only works if there is a unique function that we can say is THE equal operation for the datatype (with a true result guaranteeing that every operation the datatype has will produce the same results from the two values). There is no such concept in PG at the moment, and if memory serves there are at least three built-in types for which the default btree equality function in fact doesn't guarantee that. So even if we wanted to pursue that path, it wouldn't produce a fix that we could back-patch. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Garbage pad bytes within datums are bad news
Tom Lane [EMAIL PROTECTED] writes: Gregory Stark [EMAIL PROTECTED] writes: Tom Lane [EMAIL PROTECTED] writes: The alternative seems to be to forbid uninitialized pad bytes within Datums. That's not very pleasant to contemplate either, since it'll forever be vulnerable to sins of omission. Just brainstorming here, I don't think this is a good solution but perhaps it could lead somewhere interesting... Another thought. Perhaps every data type should define an operator which is a true equals. Ie, it guarantees that *no* internal state that any function could expose is different between two datums. Most data types could implement it just by calling memcmp (or postgres could provide such a definition if it's left undefined). That gives arrays the option of either providing such an operator or guaranteeing no padding bytes and using memcmp. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL training! -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Garbage pad bytes within datums are bad news
Gregory Stark [EMAIL PROTECTED] writes: I suppose if all we want to do is assert that constructors don't create this situation we could have an assertion that calls the constructor a second time (with palloc generating garbage data) and compares the results with datumEqual. After further reflection I think it might indeed be the case that we only have to worry about the datatype input routines, since those are what are invoked to create Consts that the user has a right to expect are equal(). I cons'd up a quick hack along the above lines and ran the regression tests with it. The failures suggest that aside from array_in, we'll need to fix these types: path tsquery lquery (in contrib) ltree (in contrib) That seems like a short enough list that the right fix for the back branches is just to fix these input routines (palloc0 instead of palloc should do it). I'm not sure yet if we want some less-klugy solution for the future. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers