Re: [HACKERS] Garbage pad bytes within datums are bad news

2008-04-04 Thread Teodor Sigaev



The alternative seems to be to forbid uninitialized pad bytes within
Datums.  That's not very pleasant to contemplate either, since it'll
forever be vulnerable to sins of omission.


Hmm, we can add to palloc random filling of allocated memory with 
--enable-cassert.  It'll allow to catch such bugs in most cases.


--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Garbage pad bytes within datums are bad news

2008-04-04 Thread Tom Lane
Teodor Sigaev [EMAIL PROTECTED] writes:
 The alternative seems to be to forbid uninitialized pad bytes within
 Datums.  That's not very pleasant to contemplate either, since it'll
 forever be vulnerable to sins of omission.

 Hmm, we can add to palloc random filling of allocated memory with 
 --enable-cassert.  It'll allow to catch such bugs in most cases.

Based on the fact that this problem went undetected for so long,
I'm not real convinced of that.  The difficulty is exactly that
not very many system behaviors will fail obviously if nodetrees
that should be seen as equal are not.

Perhaps we could add some additional tests to help exercise it?
But how do you get them applied to all datatypes?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Garbage pad bytes within datums are bad news

2008-04-04 Thread Gregory Stark
Tom Lane [EMAIL PROTECTED] writes:

 The alternative seems to be to forbid uninitialized pad bytes within
 Datums.  That's not very pleasant to contemplate either, since it'll
 forever be vulnerable to sins of omission.

Just brainstorming here, I don't think this is a good solution but perhaps it
could lead somewhere interesting...

We could have actual equal operators include an assertion that the datums are
also datumIsEqual? That isn't guaranteed to catch every case but it would be
good for complex data types like arrays.

I suppose if all we want to do is assert that constructors don't create this
situation we could have an assertion that calls the constructor a second time
(with palloc generating garbage data) and compares the results with
datumEqual.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL 
training!

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Garbage pad bytes within datums are bad news

2008-04-04 Thread Tom Lane
Gregory Stark [EMAIL PROTECTED] writes:
 Tom Lane [EMAIL PROTECTED] writes:
 The alternative seems to be to forbid uninitialized pad bytes within
 Datums.  That's not very pleasant to contemplate either, since it'll
 forever be vulnerable to sins of omission.

 Just brainstorming here, I don't think this is a good solution but perhaps it
 could lead somewhere interesting...

 We could have actual equal operators include an assertion that the datums are
 also datumIsEqual? That isn't guaranteed to catch every case but it would be
 good for complex data types like arrays.

That still puts the responsibility on the individual datatype author to
get it right.  The case I'm most worried about is user-written datatypes
that are never going to magically acquire such asserts.

There is another path we could try to take, which is to fix things so
that there aren't any cases where a false not-equal report will break
critical behavior.  I seem to recall that the original justification for
implementing equal() this way was exactly the argument that it didn't
matter if you sometimes got a false not-equal report, so long as
copyObject() was guaranteed to generate equal() trees (which it is).
This also seems prone to somebody breaking the assumption in future,
though, even assuming that we can make it work like that today.
(I'm not too sure how to deal with the ORDER-BY/DISTINCT semantics
without it.)  I guess we could have a test mode in which datumIsEqual
always returned false...

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Garbage pad bytes within datums are bad news

2008-04-04 Thread Teodor Sigaev




That still puts the responsibility on the individual datatype author to
get it right.  The case I'm most worried about is user-written datatypes
that are never going to magically acquire such asserts.


It seems to me that working with two assumption (binary equal and 
catalog-defined equal function) in the same time is a wrong way. If we decide to 
use binary equal criteria, then why we need catalog-defined equal at all? If we 
use catalog-defined one, why we should require binary equality? Using both way 
in the same time is an error prone. It's possible to say that two value is equal 
 if they are binary the same, if not - we should find catalog-defined equal 
operation and call it. Binary comparison is only an optimization.

--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Garbage pad bytes within datums are bad news

2008-04-04 Thread Tom Lane
Teodor Sigaev [EMAIL PROTECTED] writes:
 That still puts the responsibility on the individual datatype author to
 get it right.  The case I'm most worried about is user-written datatypes
 that are never going to magically acquire such asserts.

 It seems to me that working with two assumption (binary equal and 
 catalog-defined equal function) in the same time is a wrong way. If we decide 
 to 
 use binary equal criteria, then why we need catalog-defined equal at all? If 
 we 
 use catalog-defined one, why we should require binary equality? Using both 
 way 
 in the same time is an error prone. It's possible to say that two value is 
 equal 
   if they are binary the same, if not - we should find catalog-defined equal 
 operation and call it. Binary comparison is only an optimization.

That only works if there is a unique function that we can say is THE
equal operation for the datatype (with a true result guaranteeing that
every operation the datatype has will produce the same results from the
two values).  There is no such concept in PG at the moment, and if
memory serves there are at least three built-in types for which the
default btree equality function in fact doesn't guarantee that.
So even if we wanted to pursue that path, it wouldn't produce a fix
that we could back-patch.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Garbage pad bytes within datums are bad news

2008-04-04 Thread Gregory Stark

Tom Lane [EMAIL PROTECTED] writes:

 Gregory Stark [EMAIL PROTECTED] writes:
 Tom Lane [EMAIL PROTECTED] writes:
 The alternative seems to be to forbid uninitialized pad bytes within
 Datums.  That's not very pleasant to contemplate either, since it'll
 forever be vulnerable to sins of omission.

 Just brainstorming here, I don't think this is a good solution but perhaps it
 could lead somewhere interesting...

Another thought. Perhaps every data type should define an operator which is a
true equals. Ie, it guarantees that *no* internal state that any function
could expose is different between two datums. Most data types could implement
it just by calling memcmp (or postgres could provide such a definition if it's
left undefined).

That gives arrays the option of either providing such an operator or
guaranteeing no padding bytes and using memcmp.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL 
training!

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Garbage pad bytes within datums are bad news

2008-04-04 Thread Tom Lane
Gregory Stark [EMAIL PROTECTED] writes:
 I suppose if all we want to do is assert that constructors don't create this
 situation we could have an assertion that calls the constructor a second time
 (with palloc generating garbage data) and compares the results with
 datumEqual.

After further reflection I think it might indeed be the case that we
only have to worry about the datatype input routines, since those are
what are invoked to create Consts that the user has a right to expect
are equal().

I cons'd up a quick hack along the above lines and ran the regression
tests with it.  The failures suggest that aside from array_in, we'll
need to fix these types:
path
tsquery
lquery  (in contrib)
ltree   (in contrib)

That seems like a short enough list that the right fix for the back
branches is just to fix these input routines (palloc0 instead of palloc
should do it).  I'm not sure yet if we want some less-klugy solution
for the future.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers