Re: clarify equalTupleDescs()

2024-03-17 Thread Peter Eisentraut

On 13.03.24 19:43, Tomas Vondra wrote:

I looked at this patch today. I went through all the calls switched to
equalRowTypes, and AFAIK all of them are correct - all the places
switched to equalRowTypes() only need the weaker checks.

There's only two places still calling equalTupleDescs() - relcache
certainly needs that, and so does the assert in execReplication().

As for attndims, I agree equalRowTypes() should not check that. We're
not really checking that anywhere, it'd be quite weird to start with it
here. Especially if the plan is to remove it entirely.


Thanks for checking this again.  I have committed the patch as it was 
presented then.






Re: clarify equalTupleDescs()

2024-03-13 Thread Tomas Vondra



On 2/27/24 12:13, jian he wrote:
> On Mon, Feb 12, 2024 at 7:47 PM Peter Eisentraut  wrote:
>>
>>
>> In principle, hashRowType() could process all the fields that
>> equalRowTypes() does.  But since it's only a hash function, it doesn't
>> have to be perfect.  (This is also the case for the current
>> hashTupleDesc().)  I'm not sure where the best tradeoff is.
> 
> That's where my confusion comes from.
> hashRowType is used in record_type_typmod_hash.
> record_type_typmod_hash is within assign_record_type_typmod.
>> in assign_record_type_typmod:
> 
> if (RecordCacheHash == NULL)
> {
> /* First time through: initialize the hash table */
> HASHCTL ctl;
> ctl.keysize = sizeof(TupleDesc); /* just the pointer */
> ctl.entrysize = sizeof(RecordCacheEntry);
> ctl.hash = record_type_typmod_hash;
> ctl.match = record_type_typmod_compare;
> RecordCacheHash = hash_create("Record information cache", 64,
>   ,
>   HASH_ELEM | HASH_FUNCTION | HASH_COMPARE);
> /* Also make sure CacheMemoryContext exists */
> if (!CacheMemoryContext)
> CreateCacheMemoryContext();
> }
> /*
> * Find a hashtable entry for this tuple descriptor. We don't use
> * HASH_ENTER yet, because if it's missing, we need to make sure that all
> * the allocations succeed before we create the new entry.
> */
> recentry = (RecordCacheEntry *) hash_search(RecordCacheHash,
> ,
> HASH_FIND, );
> 
> based on the comments in hash_create. The above hash_search function
> would first use
> record_type_typmod_hash to find out candidate entries in a hash table
> then use record_type_typmod_compare to compare the given tupDesc with
> candidate entries.
> 
> Is this how the hash_search in assign_record_type_typmod works?
> 

Yes.

> equalRowTypes processed more fields than hashRowType,
> hashRowType comments mentioned equalRowTypes,
> maybe we should have some comments in hashRowType explaining why only
> hashing natts, tdtypeid, atttypid will be fine.
> 

Not sure I understand what the confusion is - omitting fields with
little entropy is not uncommon, and collisions are inherent to hash
tables, and need to be handled anyway.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: clarify equalTupleDescs()

2024-03-13 Thread Tomas Vondra
Hi,

I looked at this patch today. I went through all the calls switched to
equalRowTypes, and AFAIK all of them are correct - all the places
switched to equalRowTypes() only need the weaker checks.

There's only two places still calling equalTupleDescs() - relcache
certainly needs that, and so does the assert in execReplication().


As for attndims, I agree equalRowTypes() should not check that. We're
not really checking that anywhere, it'd be quite weird to start with it
here. Especially if the plan is to remove it entirely.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: clarify equalTupleDescs()

2024-02-27 Thread jian he
On Mon, Feb 12, 2024 at 7:47 PM Peter Eisentraut  wrote:
>
>
> In principle, hashRowType() could process all the fields that
> equalRowTypes() does.  But since it's only a hash function, it doesn't
> have to be perfect.  (This is also the case for the current
> hashTupleDesc().)  I'm not sure where the best tradeoff is.

That's where my confusion comes from.
hashRowType is used in record_type_typmod_hash.
record_type_typmod_hash is within assign_record_type_typmod.

in assign_record_type_typmod:

if (RecordCacheHash == NULL)
{
/* First time through: initialize the hash table */
HASHCTL ctl;
ctl.keysize = sizeof(TupleDesc); /* just the pointer */
ctl.entrysize = sizeof(RecordCacheEntry);
ctl.hash = record_type_typmod_hash;
ctl.match = record_type_typmod_compare;
RecordCacheHash = hash_create("Record information cache", 64,
  ,
  HASH_ELEM | HASH_FUNCTION | HASH_COMPARE);
/* Also make sure CacheMemoryContext exists */
if (!CacheMemoryContext)
CreateCacheMemoryContext();
}
/*
* Find a hashtable entry for this tuple descriptor. We don't use
* HASH_ENTER yet, because if it's missing, we need to make sure that all
* the allocations succeed before we create the new entry.
*/
recentry = (RecordCacheEntry *) hash_search(RecordCacheHash,
,
HASH_FIND, );

based on the comments in hash_create. The above hash_search function
would first use
record_type_typmod_hash to find out candidate entries in a hash table
then use record_type_typmod_compare to compare the given tupDesc with
candidate entries.

Is this how the hash_search in assign_record_type_typmod works?

equalRowTypes processed more fields than hashRowType,
hashRowType comments mentioned equalRowTypes,
maybe we should have some comments in hashRowType explaining why only
hashing natts, tdtypeid, atttypid will be fine.




Re: clarify equalTupleDescs()

2024-02-12 Thread Tom Lane
Peter Eisentraut  writes:
> On 06.02.24 16:14, Tom Lane wrote:
>> +1 for the general idea, but it seems like "row type equality"
>> might still be a slightly fuzzy concept.

> I did another pass across the callers to check what pg_attribute fields 
> might be relevant.

> Collation definitely needs to be added, certainly for plancache.c, maybe 
> for typcache.c, the other callers don't care.

+1

> Record types can have attisdropped fields, so it's probably good to 
> check those.

Yeah, good idea.  (In most cases the attname comparison would catch
that, but we shouldn't rely on it.)  In a perfect world maybe a
dropped column should be invisible to this comparison, but we're
a very long way from being able to treat it that way.

> I'm suspicious about attndims.  Maybe one could create a test case where 
> record types differ only in that.  Support for attndims throughout the 
> system is weak, but maybe there is something to check there.

There was a discussion last year[1] about removing attndims
altogether, which still seems to me like possibly a good idea.
So I doubt we want to consider it as a core semantic field.

> On a conceptual level, I figured pg_attribute rows can be divided up 
> into three categories:

> 1. "row type" stuff: attname, atttypid, atttypmod, attndims, 
> attisdropped, attcollation

> 2. physical layout stuff: attlen, attcacheoff, attbyval, attalign

I recall some discussion about taking attcacheoff out of this data
structure too ...

> 3. table metadata stuff (everything else)

> It's not perfect, and sometimes it's not clear whether these categories 
> inform the implementation or the other way around, but I think it helps 
> conceptualize it.

Sure.

regards, tom lane

[1] 
https://www.postgresql.org/message-id/flat/ZD%2B14YZ4IUue8Rhi%40gendo.asyd.net




Re: clarify equalTupleDescs()

2024-02-12 Thread Peter Eisentraut

On 06.02.24 16:14, Tom Lane wrote:

Peter Eisentraut  writes:

The first want to compare what I call row-type equality, which means
they want to check specifically for equal number of attributes, and the
same attribute names, types, and typmods for each attribute.  Most
callers actually want that behavior.


Should compare attcollation too, no?

+1 for the general idea, but it seems like "row type equality"
might still be a slightly fuzzy concept.


I did another pass across the callers to check what pg_attribute fields 
might be relevant.


Collation definitely needs to be added, certainly for plancache.c, maybe 
for typcache.c, the other callers don't care.


Record types can have attisdropped fields, so it's probably good to 
check those.


I'm suspicious about attndims.  Maybe one could create a test case where 
record types differ only in that.  Support for attndims throughout the 
system is weak, but maybe there is something to check there.


On a conceptual level, I figured pg_attribute rows can be divided up 
into three categories:


1. "row type" stuff: attname, atttypid, atttypmod, attndims, 
attisdropped, attcollation


2. physical layout stuff: attlen, attcacheoff, attbyval, attalign

3. table metadata stuff (everything else)

It's not perfect, and sometimes it's not clear whether these categories 
inform the implementation or the other way around, but I think it helps 
conceptualize it.
From 7798fb0c9b57e94f8edc9c8c808395097e541954 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Mon, 12 Feb 2024 12:42:11 +0100
Subject: [PATCH v1] Separate equalRowTypes() from equalTupleDescs()

This introduces a new function equalRowTypes() that is effectively a
subset of equalTupleDescs() but only compares the number of attributes
and attribute name, type, and typmod.  This is enough for most
existing uses of equalTupleDescs(), which are changed to use the new
function.  The only remaining callers of equalTupleDescs() are those
that really want to check the full tuple descriptor as such, without
concern about record or row or record type semantics.

The existing function hashTupleDesc() is renamed to hashRowType(),
because it now corresponds more to equalRowTypes().

The purpose of this change is to be clearer about the semantics of
equality asked for by each caller.  (At least one caller had a comment
that questioned whether equalTupleDescs() was too restrictive.)  For
example, 4f622503d6d removed attstattarget from the tuple descriptor
structure.  It was not fully clear at the time how this should affect
equalTupleDescs().  Now the answer is clear: By their own definitions,
equalRowTypes() does not care, and equalTupleDescs() just compares
whatever is in the tuple descriptor but does not care why it is in
there.

Discussion: 
https://www.postgresql.org/message-id/flat/f656d6d9-6660-4518-a006-2f65cafbebd1%40eisentraut.org
---
 src/backend/access/common/tupdesc.c | 71 -
 src/backend/catalog/pg_proc.c   |  2 +-
 src/backend/commands/analyze.c  |  4 +-
 src/backend/commands/view.c | 13 +++---
 src/backend/utils/cache/plancache.c |  5 +-
 src/backend/utils/cache/typcache.c  | 10 ++--
 src/include/access/tupdesc.h|  4 +-
 7 files changed, 77 insertions(+), 32 deletions(-)

diff --git a/src/backend/access/common/tupdesc.c 
b/src/backend/access/common/tupdesc.c
index bc80caee117..64761f050f5 100644
--- a/src/backend/access/common/tupdesc.c
+++ b/src/backend/access/common/tupdesc.c
@@ -414,11 +414,6 @@ DecrTupleDescRefCount(TupleDesc tupdesc)
 
 /*
  * Compare two TupleDesc structures for logical equality
- *
- * Note: we deliberately do not check the attrelid and tdtypmod fields.
- * This allows typcache.c to use this routine to see if a cached record type
- * matches a requested type, and is harmless for relcache.c's uses.
- * We don't compare tdrefcount, either.
  */
 bool
 equalTupleDescs(TupleDesc tupdesc1, TupleDesc tupdesc2)
@@ -431,6 +426,8 @@ equalTupleDescs(TupleDesc tupdesc1, TupleDesc tupdesc2)
if (tupdesc1->tdtypeid != tupdesc2->tdtypeid)
return false;
 
+   /* tdtypmod and tdrefcount are not checked */
+
for (i = 0; i < tupdesc1->natts; i++)
{
Form_pg_attribute attr1 = TupleDescAttr(tupdesc1, i);
@@ -561,17 +558,67 @@ equalTupleDescs(TupleDesc tupdesc1, TupleDesc tupdesc2)
 }
 
 /*
- * hashTupleDesc
- * Compute a hash value for a tuple descriptor.
+ * equalRowTypes
  *
- * If two tuple descriptors would be considered equal by equalTupleDescs()
- * then their hash value will be equal according to this function.
+ * This determines whether two tuple descriptors have equal row types.  This
+ * only checks those fields in pg_attribute that are applicable for row types,
+ * while ignoring those fields that define the physical row storage or those
+ * that define table column metadata.
+ *
+ * Specifically, this checks:
+ *
+ * - same number of attributes
+ * - same composite 

Re: clarify equalTupleDescs()

2024-02-12 Thread Peter Eisentraut

On 07.02.24 04:06, jian he wrote:

/*
  * hashRowType
  *
  * If two tuple descriptors would be considered equal by equalRowTypes()
  * then their hash value will be equal according to this function.
  */
uint32
hashRowType(TupleDesc desc)
{
uint32 s;
int i;

s = hash_combine(0, hash_uint32(desc->natts));
s = hash_combine(s, hash_uint32(desc->tdtypeid));
for (i = 0; i < desc->natts; ++i)
s = hash_combine(s, hash_uint32(TupleDescAttr(desc, i)->atttypid));

return s;
}

from the hashRowType comment, should we also hash attname and atttypmod?


In principle, hashRowType() could process all the fields that 
equalRowTypes() does.  But since it's only a hash function, it doesn't 
have to be perfect.  (This is also the case for the current 
hashTupleDesc().)  I'm not sure where the best tradeoff is.





Re: clarify equalTupleDescs()

2024-02-06 Thread jian he
On Tue, Feb 6, 2024 at 8:59 PM Peter Eisentraut  wrote:
>
> In a recent patch thread it was discussed[0] which fields should be
> compared by equalTupleDescs() and whether it is ok to remove a field
> from tuple descriptors and how that should affect their equality
> (attstattarget in that case).
>
> After analyzing all the callers, I have noticed that there are two
> classes of callers of equalTupleDescs():
>
> The first want to compare what I call row-type equality, which means
> they want to check specifically for equal number of attributes, and the
> same attribute names, types, and typmods for each attribute.  Most
> callers actually want that behavior.  The remaining callers just want to
> compare the tuple descriptors as they are, they don't care why the
> fields are in there, they just want to compare all of them.
>
> In the attached patch, I add a new function equalRowTypes() that is
> effectively a subset of equalTupleDescs() and switch most callers to that.
>
> The purpose of this patch is to make the semantics less uncertain.
> Questions like the one in [0] about attstattarget now have a clear
> answer for both functions.  I think this would be useful to have, as we
> are thinking about more changes in pg_attribute and tuple descriptors.
>
>
> [0]:
> https://www.postgresql.org/message-id/202401101316.k4s3fomwjx52@alvherre.pgsql

function name record_type_typmod_hash imply that
hashRowType should also hash atttypmod field?

also:

bool
equalRowTypes(TupleDesc tupdesc1, TupleDesc tupdesc2)
{
if (tupdesc1->natts != tupdesc2->natts)
return false;
if (tupdesc1->tdtypeid != tupdesc2->tdtypeid)
return false;

for (int i = 0; i < tupdesc1->natts; i++)
{
Form_pg_attribute attr1 = TupleDescAttr(tupdesc1, i);
Form_pg_attribute attr2 = TupleDescAttr(tupdesc2, i);

if (strcmp(NameStr(attr1->attname), NameStr(attr2->attname)) != 0)
return false;
if (attr1->atttypid != attr2->atttypid)
return false;
if (attr1->atttypmod != attr2->atttypmod)
return false;
}

return true;
}

/*
 * hashRowType
 *
 * If two tuple descriptors would be considered equal by equalRowTypes()
 * then their hash value will be equal according to this function.
 */
uint32
hashRowType(TupleDesc desc)
{
uint32 s;
int i;

s = hash_combine(0, hash_uint32(desc->natts));
s = hash_combine(s, hash_uint32(desc->tdtypeid));
for (i = 0; i < desc->natts; ++i)
s = hash_combine(s, hash_uint32(TupleDescAttr(desc, i)->atttypid));

return s;
}

from the hashRowType comment, should we also hash attname and atttypmod?




Re: clarify equalTupleDescs()

2024-02-06 Thread Tom Lane
Peter Eisentraut  writes:
> The first want to compare what I call row-type equality, which means 
> they want to check specifically for equal number of attributes, and the 
> same attribute names, types, and typmods for each attribute.  Most 
> callers actually want that behavior.

Should compare attcollation too, no?

+1 for the general idea, but it seems like "row type equality"
might still be a slightly fuzzy concept.

regards, tom lane




clarify equalTupleDescs()

2024-02-06 Thread Peter Eisentraut
In a recent patch thread it was discussed[0] which fields should be 
compared by equalTupleDescs() and whether it is ok to remove a field 
from tuple descriptors and how that should affect their equality 
(attstattarget in that case).


After analyzing all the callers, I have noticed that there are two 
classes of callers of equalTupleDescs():


The first want to compare what I call row-type equality, which means 
they want to check specifically for equal number of attributes, and the 
same attribute names, types, and typmods for each attribute.  Most 
callers actually want that behavior.  The remaining callers just want to 
compare the tuple descriptors as they are, they don't care why the 
fields are in there, they just want to compare all of them.


In the attached patch, I add a new function equalRowTypes() that is 
effectively a subset of equalTupleDescs() and switch most callers to that.


The purpose of this patch is to make the semantics less uncertain. 
Questions like the one in [0] about attstattarget now have a clear 
answer for both functions.  I think this would be useful to have, as we 
are thinking about more changes in pg_attribute and tuple descriptors.



[0]: 
https://www.postgresql.org/message-id/202401101316.k4s3fomwjx52@alvherre.pgsqlFrom 7212a249b56eb5ac339dcd029526e7fb380932fe Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Tue, 6 Feb 2024 13:02:41 +0100
Subject: [PATCH v0] Separate equalRowTypes() from equalTupleDescs()

This introduces a new function equalRowTypes() that is effectively a
subset of equalTupleDescs() but only compares the number of attributes
and attribute name, type, and typmod.  This is enough for most
existing uses of equalTupleDescs(), which are changed to use the new
function.  The only remaining callers of equalTupleDescs() are those
that really want to check the tuple descriptor as such, without
concern about record to row or record type semantics.

The existing function hashTupleDesc() is renamed to hashRowType(),
because it now corresponds more to equalRowTypes().

The purpose of this change is to be clearer about the semantics of
equality asked for by each caller.  (At least one caller had a comment
that questioned whether equalTupleDescs() was too restrictive.)  For
example, 4f622503d6d removed attstattarget from the tuple descriptor
structure.  It was not fully clear at the time how this should affect
equalTupleDescs().  Now the answer is clear: By their own definitions,
equalRowTypes() does not care, and equalTupleDescs() just compares
whatever is in the tuple descriptor but does not care why it is in
there.
---
 src/backend/access/common/tupdesc.c | 58 +++--
 src/backend/catalog/pg_proc.c   |  2 +-
 src/backend/commands/analyze.c  |  4 +-
 src/backend/commands/view.c | 13 +++
 src/backend/utils/cache/plancache.c |  5 +--
 src/backend/utils/cache/typcache.c  | 10 ++---
 src/include/access/tupdesc.h|  4 +-
 7 files changed, 64 insertions(+), 32 deletions(-)

diff --git a/src/backend/access/common/tupdesc.c 
b/src/backend/access/common/tupdesc.c
index bc80caee117..22d6708708e 100644
--- a/src/backend/access/common/tupdesc.c
+++ b/src/backend/access/common/tupdesc.c
@@ -414,11 +414,6 @@ DecrTupleDescRefCount(TupleDesc tupdesc)
 
 /*
  * Compare two TupleDesc structures for logical equality
- *
- * Note: we deliberately do not check the attrelid and tdtypmod fields.
- * This allows typcache.c to use this routine to see if a cached record type
- * matches a requested type, and is harmless for relcache.c's uses.
- * We don't compare tdrefcount, either.
  */
 bool
 equalTupleDescs(TupleDesc tupdesc1, TupleDesc tupdesc2)
@@ -431,6 +426,8 @@ equalTupleDescs(TupleDesc tupdesc1, TupleDesc tupdesc2)
if (tupdesc1->tdtypeid != tupdesc2->tdtypeid)
return false;
 
+   /* tdtypmod and tdrefcount are not checked */
+
for (i = 0; i < tupdesc1->natts; i++)
{
Form_pg_attribute attr1 = TupleDescAttr(tupdesc1, i);
@@ -561,17 +558,54 @@ equalTupleDescs(TupleDesc tupdesc1, TupleDesc tupdesc2)
 }
 
 /*
- * hashTupleDesc
- * Compute a hash value for a tuple descriptor.
+ * equalRowTypes
  *
- * If two tuple descriptors would be considered equal by equalTupleDescs()
- * then their hash value will be equal according to this function.
+ * This determines whether two tuple descriptors have equal row types.  This
+ * means specifically:
+ *
+ * - same number of attributes
+ * - same composite type ID (but could both be zero)
+ * - corresponding attributes (in order) have the same name, type, and typmod
  *
- * Note that currently contents of constraint are not hashed - it'd be a bit
- * painful to do so, and conflicts just due to constraints are unlikely.
+ * This is used to check whether two record types are compatible, whether
+ * function return row types are the same, and other similar situations.
+ *
+ * Note: We deliberately do not check the