Re: Incorrect comment in get_partition_dispatch_recurse

2018-05-18 Thread Robert Haas
On Thu, May 17, 2018 at 10:13 AM, Tom Lane  wrote:
> Maybe what you need is a redesign.  This convention seems impossibly
> confusing and hence error-prone.  What about using a separate bool to
> indicate which list the index refers to?

I really don't think it's a big deal.  The comments may need some
improvement (which is what we're doing here), but it's not as if there
are no other places in the PostgreSQL code where positive and negative
values indicate different kinds of things, user and system columns
being the most obvious such distinction. What we're doing here can't
possibly be more error-prone than adding and subtracting
FirstLowInvalidHeapAttributeNumber all over the place.

I'm biased, of course.  I invented this particular convention.  If
somebody else feels like redesigning it for a future release, and the
result is better than what we have now, cool.  But I do not think that
it would be a clear improvement to have a boolean indicating whether
or not the index is an index into subpartitions or leaf nodes and have
the index value always be positive.  In the current convention, if you
fail to handle negative values, you will probably crash.  With that
convention, if you forget to check the boolean and just assume you
have a leaf partition, it's quite likely that it will look like it
works, but do the wrong thing.  And as David points out, it also uses
less memory (which also makes it faster).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Incorrect comment in get_partition_dispatch_recurse

2018-05-18 Thread Robert Haas
On Thu, May 17, 2018 at 9:56 PM, Amit Langote
 wrote:
> On 2018/05/18 5:56, David Rowley wrote:
>> On 18 May 2018 at 06:21, Robert Haas  wrote:
>>> All right, so let's just say that explicitly.  Maybe something like
>>> the attached.
>>
>> That looks fine to me.
>
> Me too, except:
>
> +* *pds list is the root partition, so 0 always means the first leaf. 
> When
>
> I prefer "root partitioned table" over "root partition", and git grep
> suggests that that's actually what we use elsewhere in the source code.

I don't know that it's necessary to be that rigid about this, so I
chose to commit without changing it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Incorrect comment in get_partition_dispatch_recurse

2018-05-17 Thread Amit Langote
On 2018/05/18 5:56, David Rowley wrote:
> On 18 May 2018 at 06:21, Robert Haas  wrote:
>> All right, so let's just say that explicitly.  Maybe something like
>> the attached.
> 
> That looks fine to me.

Me too, except:

+* *pds list is the root partition, so 0 always means the first leaf. 
When

I prefer "root partitioned table" over "root partition", and git grep
suggests that that's actually what we use elsewhere in the source code.

Thanks,
Amit




Re: Incorrect comment in get_partition_dispatch_recurse

2018-05-17 Thread Amit Langote
On 2018/05/18 6:14, David Rowley wrote:
> On 18 May 2018 at 02:13, Tom Lane  wrote:
>> Maybe what you need is a redesign.  This convention seems impossibly
>> confusing and hence error-prone.  What about using a separate bool to
>> indicate which list the index refers to?
> 
> While I agree that the coding is a bit unusual, I think it's also good
> that we can get away without allocating yet another array nparts in
> size. ExecSetupPartitionTupleRouting is already a huge bottleneck with
> single-row INSERT into a partitioned table with a large number of
> partitions. Allocating yet another array nparts in size will just slow
> it down further.

I recall having considered the idea of adding an array of bools, but went
with the negative-indexes-for-partitioned-tables idea anyway, which I
remember was suggested by Robert back then [1].  I admit it's a bit
confusing, but it's nice not have one more array allocation in that path
as you say.

> I have patches locally that I'll be submitting during the v12 cycle to
> improve on this. Among other things, the patches go to lengths to not
> allocate these arrays when we don't have to.

That would be nice.

Thanks,
Amit

[1]
https://www.postgresql.org/message-id/CA%2BTgmobF2r%3Df-crrE-k7WM8iFpBKLz3dtBtEc%3DKmkudYViYcyQ%40mail.gmail.com




Re: Incorrect comment in get_partition_dispatch_recurse

2018-05-17 Thread David Rowley
On 18 May 2018 at 02:13, Tom Lane  wrote:
> Maybe what you need is a redesign.  This convention seems impossibly
> confusing and hence error-prone.  What about using a separate bool to
> indicate which list the index refers to?

While I agree that the coding is a bit unusual, I think it's also good
that we can get away without allocating yet another array nparts in
size. ExecSetupPartitionTupleRouting is already a huge bottleneck with
single-row INSERT into a partitioned table with a large number of
partitions. Allocating yet another array nparts in size will just slow
it down further.

I have patches locally that I'll be submitting during the v12 cycle to
improve on this. Among other things, the patches go to lengths to not
allocate these arrays when we don't have to.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Incorrect comment in get_partition_dispatch_recurse

2018-05-17 Thread David Rowley
On 18 May 2018 at 06:21, Robert Haas  wrote:
> All right, so let's just say that explicitly.  Maybe something like
> the attached.

That looks fine to me.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Incorrect comment in get_partition_dispatch_recurse

2018-05-17 Thread Alvaro Herrera
On 2018-May-17, Tom Lane wrote:

> Robert Haas  writes:
> > Hang on, I can't be wrong (famous last words).  If the negative
> > indexes were 0-based, that would mean that the first element of the
> > list was referenced by -0, which obviously can't be true, because 0 =
> > -0.  In other words, we can't be using 0-based indexing for both the
> > positive and the negative values, because then 0 itself would be
> > ambiguous.  It's got to be that -1 is the first element of the *pds
> > list, which means -- AFAICS, anyway -- that the way I phrased it is
> > correct.

> Maybe what you need is a redesign.  This convention seems impossibly
> confusing and hence error-prone.  What about using a separate bool to
> indicate which list the index refers to?

That was my impression I first came across this, FWIW, and I confess I
didn't try hard enough to understand it fully.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Incorrect comment in get_partition_dispatch_recurse

2018-05-17 Thread Robert Haas
On Thu, May 17, 2018 at 10:36 AM, Amit Langote  wrote:
> On Thu, May 17, 2018 at 10:29 PM, Robert Haas  wrote:
>> Unless the indexing system actually can't reference the first element
>> of *pds, and -1 means the second element.  But then I think we need a
>> more verbose explanation here.
>
> First element in *pds list (and the array subsequently created from
> it) contains the root table's entry.  So, a -1 does mean the 2nd entry
> in that list/array.  A 0 in the indexes array always refers to a leaf
> partition and hence an index into the array for leaf partitions.

All right, so let's just say that explicitly.  Maybe something like
the attached.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


another-try.patch
Description: Binary data


Re: Incorrect comment in get_partition_dispatch_recurse

2018-05-17 Thread Amit Langote
On Thu, May 17, 2018 at 10:29 PM, Robert Haas  wrote:
> Unless the indexing system actually can't reference the first element
> of *pds, and -1 means the second element.  But then I think we need a
> more verbose explanation here.

First element in *pds list (and the array subsequently created from
it) contains the root table's entry.  So, a -1 does mean the 2nd entry
in that list/array.  A 0 in the indexes array always refers to a leaf
partition and hence an index into the array for leaf partitions.

Thanks,
Amit



Re: Incorrect comment in get_partition_dispatch_recurse

2018-05-17 Thread Tom Lane
Robert Haas  writes:
> Hang on, I can't be wrong (famous last words).  If the negative
> indexes were 0-based, that would mean that the first element of the
> list was referenced by -0, which obviously can't be true, because 0 =
> -0.  In other words, we can't be using 0-based indexing for both the
> positive and the negative values, because then 0 itself would be
> ambiguous.  It's got to be that -1 is the first element of the *pds
> list, which means -- AFAICS, anyway -- that the way I phrased it is
> correct.

> Unless the indexing system actually can't reference the first element
> of *pds, and -1 means the second element.  But then I think we need a
> more verbose explanation here.

Maybe what you need is a redesign.  This convention seems impossibly
confusing and hence error-prone.  What about using a separate bool to
indicate which list the index refers to?

regards, tom lane



Re: Incorrect comment in get_partition_dispatch_recurse

2018-05-17 Thread Robert Haas
On Wed, May 16, 2018 at 3:20 PM, Robert Haas  wrote:
> On Wed, May 16, 2018 at 2:28 PM, David Rowley
>  wrote:
>> Thanks for committing. Although, I disagree with your tweak:
>>
>> +* 1-based index into the *pds list.
>>
>> I think that's making the same mistake as the last comment did. You
>> think it's 1-based because the index is being set with list_length
>> rather than list_length - 1, but it can do that simply because the
>> item has not been added to the list yet.
>
> Uh, maybe I've got that wrong.  We can say 0-based instead if that's
> right.  I just didn't want to say that in one case it was 0-based and
> in the other case make no mention.

Hang on, I can't be wrong (famous last words).  If the negative
indexes were 0-based, that would mean that the first element of the
list was referenced by -0, which obviously can't be true, because 0 =
-0.  In other words, we can't be using 0-based indexing for both the
positive and the negative values, because then 0 itself would be
ambiguous.  It's got to be that -1 is the first element of the *pds
list, which means -- AFAICS, anyway -- that the way I phrased it is
correct.

Unless the indexing system actually can't reference the first element
of *pds, and -1 means the second element.  But then I think we need a
more verbose explanation here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Incorrect comment in get_partition_dispatch_recurse

2018-05-16 Thread David Rowley
On 17 May 2018 at 13:17, Amit Langote  wrote:
> Or maybe, change the comment to say that even the negative indexes are
> 0-based like you pointed out, *but* instead of updating the comment like
> you suggest above, change the other index value assignment statement to
> not subtract 1 from the list_length by switching order with the
> accompanying lappend; like this:
>
>  if (get_rel_relkind(partrelid) != RELKIND_PARTITIONED_TABLE)
>  {
> +pd->indexes[i] = list_length(*leaf_part_oids);
>  *leaf_part_oids = lappend_oid(*leaf_part_oids, partrelid);
> -pd->indexes[i] = list_length(*leaf_part_oids) - 1;
>  }
>  else
>  {

That makes sense.  It's probably less confusing that way.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Incorrect comment in get_partition_dispatch_recurse

2018-05-16 Thread Amit Langote
On 2018/05/17 3:28, David Rowley wrote:
> On 17 May 2018 at 02:51, Robert Haas  wrote:
>> I think that's clearer.  Committed with a few tweaks that are
>> hopefully improvements.
> 
> Thanks for committing. Although, I disagree with your tweak:
> 
> +* 1-based index into the *pds list.
> 
> I think that's making the same mistake as the last comment did. You
> think it's 1-based because the index is being set with list_length
> rather than list_length - 1, but it can do that simply because the
> item has not been added to the list yet.
> 
> Nothing converts this index back to 0-based;
>
> RelationGetPartitionDispatchInfo builds the array from the list with:
> 
> i = 0;
> foreach(lc, pdlist)
> {
> pd[i++] = lfirst(lc);
> }
> 
> ExecFindPartition uses the pd array with:
> 
> parent = pd[-parent->indexes[cur_index]];
> 
> So if it was 1-based then we'd be off by one here.

That's right.  Even those negative values in the pd->indexes are still
0-based, with the 0th entry being for the root table.

> Maybe we can clear up that confusion with
> 
> + /*
> +  * No need to subtract 1 to get the 0-based index as the item for this
> +  * partitioned table has not been added to the list yet.
> +  */
> pd->indexes[i] = -list_length(*pds);
> 
> and just switch 1-based to 0-based in the new comment.

Or maybe, change the comment to say that even the negative indexes are
0-based like you pointed out, *but* instead of updating the comment like
you suggest above, change the other index value assignment statement to
not subtract 1 from the list_length by switching order with the
accompanying lappend; like this:

 if (get_rel_relkind(partrelid) != RELKIND_PARTITIONED_TABLE)
 {
+pd->indexes[i] = list_length(*leaf_part_oids);
 *leaf_part_oids = lappend_oid(*leaf_part_oids, partrelid);
-pd->indexes[i] = list_length(*leaf_part_oids) - 1;
 }
 else
 {

Attached a patch.

Thanks,
Amit
diff --git a/src/backend/executor/execPartition.c 
b/src/backend/executor/execPartition.c
index 75329b3624..da962c4375 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -975,7 +975,7 @@ get_partition_dispatch_recurse(Relation rel, Relation 
parent,
 * array element belongs to a leaf partition or a subpartitioned table.
 * For leaf partitions we store the 0-based index into *leaf_part_oids,
 * and for sub-partitioned tables we store a negative version of the
-* 1-based index into the *pds list.  When searching, if we see a 
negative
+* 0-based index into the *pds list.  When searching, if we see a 
negative
 * value, the search must continue in the corresponding sub-partition;
 * otherwise, we've identified the correct partition.
 */
@@ -986,8 +986,8 @@ get_partition_dispatch_recurse(Relation rel, Relation 
parent,
 
if (get_rel_relkind(partrelid) != RELKIND_PARTITIONED_TABLE)
{
+   pd->indexes[i] = list_length(*leaf_part_oids);
*leaf_part_oids = lappend_oid(*leaf_part_oids, 
partrelid);
-   pd->indexes[i] = list_length(*leaf_part_oids) - 1;
}
else
{


Re: Incorrect comment in get_partition_dispatch_recurse

2018-05-16 Thread Robert Haas
On Wed, May 16, 2018 at 2:28 PM, David Rowley
 wrote:
> Thanks for committing. Although, I disagree with your tweak:
>
> +* 1-based index into the *pds list.
>
> I think that's making the same mistake as the last comment did. You
> think it's 1-based because the index is being set with list_length
> rather than list_length - 1, but it can do that simply because the
> item has not been added to the list yet.

Uh, maybe I've got that wrong.  We can say 0-based instead if that's
right.  I just didn't want to say that in one case it was 0-based and
in the other case make no mention.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Incorrect comment in get_partition_dispatch_recurse

2018-05-16 Thread David Rowley
On 17 May 2018 at 02:51, Robert Haas  wrote:
> On Mon, May 14, 2018 at 12:57 AM, David Rowley
>  wrote:
>> The "minus 1" part is incorrect. It simply just stores the 0-based
>> index of the item in the list. I was going to fix it by removing just
>> that part, but instead, I ended up rewriting the whole thing.
>
> I think that's clearer.  Committed with a few tweaks that are
> hopefully improvements.

Thanks for committing. Although, I disagree with your tweak:

+* 1-based index into the *pds list.

I think that's making the same mistake as the last comment did. You
think it's 1-based because the index is being set with list_length
rather than list_length - 1, but it can do that simply because the
item has not been added to the list yet.

Nothing converts this index back to 0-based;

RelationGetPartitionDispatchInfo builds the array from the list with:

i = 0;
foreach(lc, pdlist)
{
pd[i++] = lfirst(lc);
}

ExecFindPartition uses the pd array with:

parent = pd[-parent->indexes[cur_index]];

So if it was 1-based then we'd be off by one here.

Maybe we can clear up that confusion with

+ /*
+  * No need to subtract 1 to get the 0-based index as the item for this
+  * partitioned table has not been added to the list yet.
+  */
pd->indexes[i] = -list_length(*pds);

and just switch 1-based to 0-based in the new comment.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Incorrect comment in get_partition_dispatch_recurse

2018-05-16 Thread Robert Haas
On Mon, May 14, 2018 at 12:57 AM, David Rowley
 wrote:
> I noticed that a comment in get_partition_dispatch_recurse claims that:
>
> "it contains the
> * leaf partition's position in the global list *leaf_part_oids minus 1"
>
> The "minus 1" part is incorrect. It simply just stores the 0-based
> index of the item in the list. I was going to fix it by removing just
> that part, but instead, I ended up rewriting the whole thing.

I think that's clearer.  Committed with a few tweaks that are
hopefully improvements.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Incorrect comment in get_partition_dispatch_recurse

2018-05-14 Thread Amit Langote
On 2018/05/14 20:29, David Rowley wrote:
> On 14 May 2018 at 17:29, Amit Langote  wrote:
>> Hmm, while I agree that simply calling it "0-based index" might be better
>> for readers, what's there now doesn't sound incorrect to me; in the
>> adjacent code:
>>
>> if (get_rel_relkind(partrelid) != RELKIND_PARTITIONED_TABLE)
>> {
>> *leaf_part_oids = lappend_oid(*leaf_part_oids, partrelid);
>> pd->indexes[i] = list_length(*leaf_part_oids) - 1;
>> }
>>
>> If I call the value of list_length after adding an OID to the list the
>> OID's position in the list, we're storing into the indexes array exactly
>> what the existing comment says it is.  Now, literally describing the code
>> in the adjacent comment is not a great documentation style, so I'm open to
>> revising it like your patch does. :)
> 
> Thanks for looking.
> 
> I wouldn't have complained if list_nth() accepted a 1-based index, but
> it does not. So, indexes[] does not store the "position in the global
> list *leaf_part_oids minus 1", it just stores the position in the
> list.
> 
> I imagine it's only written this way due to the way you're obtaining
> the index using list_length(*leaf_part_oids) - 1, but the fact you had
> to subtract 1 there does not make it "position minus 1". That's just
> how you get the position of the list item in a List.

Hmm, yes.  I guess I had not bothered back then to confirm that the
pg_list.h Lists are in fact 0-based.  That said, I wasn't thinking of the
physical implementation of lists when writing the comment, but probably of
a logical sequence, IOW, I had it all mixed up.  Anyway, it's good that
we're getting rid of that misguided terminology and thank you for that.

Regards,
Amit




Re: Incorrect comment in get_partition_dispatch_recurse

2018-05-14 Thread David Rowley
On 14 May 2018 at 17:29, Amit Langote  wrote:
> On 2018/05/14 13:57, David Rowley wrote:
>> I noticed that a comment in get_partition_dispatch_recurse claims that:
>>
>> "it contains the
>> * leaf partition's position in the global list *leaf_part_oids minus 1"
>>
>> The "minus 1" part is incorrect. It simply just stores the 0-based
>> index of the item in the list.
>
> Hmm, while I agree that simply calling it "0-based index" might be better
> for readers, what's there now doesn't sound incorrect to me; in the
> adjacent code:
>
> if (get_rel_relkind(partrelid) != RELKIND_PARTITIONED_TABLE)
> {
> *leaf_part_oids = lappend_oid(*leaf_part_oids, partrelid);
> pd->indexes[i] = list_length(*leaf_part_oids) - 1;
> }
>
> If I call the value of list_length after adding an OID to the list the
> OID's position in the list, we're storing into the indexes array exactly
> what the existing comment says it is.  Now, literally describing the code
> in the adjacent comment is not a great documentation style, so I'm open to
> revising it like your patch does. :)

Thanks for looking.

I wouldn't have complained if list_nth() accepted a 1-based index, but
it does not. So, indexes[] does not store the "position in the global
list *leaf_part_oids minus 1", it just stores the position in the
list.

I imagine it's only written this way due to the way you're obtaining
the index using list_length(*leaf_part_oids) - 1, but the fact you had
to subtract 1 there does not make it "position minus 1". That's just
how you get the position of the list item in a List.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Incorrect comment in get_partition_dispatch_recurse

2018-05-13 Thread Amit Langote
Hi David.

On 2018/05/14 13:57, David Rowley wrote:
> I noticed that a comment in get_partition_dispatch_recurse claims that:
> 
> "it contains the
> * leaf partition's position in the global list *leaf_part_oids minus 1"
> 
> The "minus 1" part is incorrect. It simply just stores the 0-based
> index of the item in the list.

Hmm, while I agree that simply calling it "0-based index" might be better
for readers, what's there now doesn't sound incorrect to me; in the
adjacent code:

if (get_rel_relkind(partrelid) != RELKIND_PARTITIONED_TABLE)
{
*leaf_part_oids = lappend_oid(*leaf_part_oids, partrelid);
pd->indexes[i] = list_length(*leaf_part_oids) - 1;
}

If I call the value of list_length after adding an OID to the list the
OID's position in the list, we're storing into the indexes array exactly
what the existing comment says it is.  Now, literally describing the code
in the adjacent comment is not a great documentation style, so I'm open to
revising it like your patch does. :)

> I was going to fix it by removing just
> that part, but instead, I ended up rewriting the whole thing.
> 
> Patch attached.

Looks good to me, thanks.

Regards,
Amit