Re: HashAgg's batching counter starts at 0, but Hash's starts at 1. (now: incremental sort)

2020-07-30 Thread Peter Geoghegan
On Thu, Jul 30, 2020 at 6:40 PM Justin Pryzby  wrote:
> Feel free to close it out.  I'm satisfied that we've had a discussion about 
> it.

Closed it out.

-- 
Peter Geoghegan




Re: HashAgg's batching counter starts at 0, but Hash's starts at 1. (now: incremental sort)

2020-07-30 Thread Peter Geoghegan
On Thu, Jul 30, 2020 at 6:39 PM James Coleman  wrote:
> I very much do not like this approach, and I think it's actually 
> fundamentally wrong, at least for the memory check. Quicksort is not the only 
> option that uses memory. For now, there's only one option that spills to disk 
> (external merge sort), but there's no reason it has to remain that way.

I wouldn't be surprised if it was possible to get
SORT_TYPE_EXTERNAL_SORT even today (though I'm not sure if that's
truly possible). That will happen for a regular sort node if we
require randomAccess to the sort, and it happens to spill -- we can
randomly access the final tape, but cannot do a final on-the-fly
merge. Say for a merge join.

-- 
Peter Geoghegan




Re: HashAgg's batching counter starts at 0, but Hash's starts at 1. (now: incremental sort)

2020-07-30 Thread Justin Pryzby
On Thu, Jul 30, 2020 at 06:33:32PM -0700, Peter Geoghegan wrote:
> On Thu, Jul 30, 2020 at 5:22 PM Justin Pryzby  wrote:
> > Because filtering out zero values is exactly what's intended to be avoided 
> > for
> > nontext output.
> >
> > I think checking whether the method was used should result in the same 
> > output,
> > without the literal check for zero value (which itself sets a bad example).
> 
> It seems fine to me as-is. What about SORT_TYPE_TOP_N_HEAPSORT? Or any
> other sort methods we add in the future?

Feel free to close it out.  I'm satisfied that we've had a discussion about it.

-- 
Justin




Re: HashAgg's batching counter starts at 0, but Hash's starts at 1. (now: incremental sort)

2020-07-30 Thread James Coleman
On Thu, Jul 30, 2020 at 8:22 PM Justin Pryzby  wrote:

> On Wed, Jul 29, 2020 at 09:18:44PM -0700, Peter Geoghegan wrote:
> > On Wed, Jul 29, 2020 at 9:05 PM Justin Pryzby 
> wrote:
> > > So my 2ndary suggestion is to conditionalize based on the method
> rather than
> > > value of space used.
> >
> > What's the advantage of doing it that way?
>
> Because filtering out zero values is exactly what's intended to be avoided
> for
> nontext output.
>
> I think checking whether the method was used should result in the same
> output,
> without the literal check for zero value (which itself sets a bad example).
>
> --- a/src/backend/commands/explain.c
> +++ b/src/backend/commands/explain.c
> @@ -2824,13 +2824,13 @@
> show_incremental_sort_group_info(IncrementalSortGroupInfo *groupInfo,
> appendStringInfo(, "%s Groups", groupLabel);
> ExplainOpenGroup("Incremental Sort Groups",
> groupName.data, true, es);
> ExplainPropertyInteger("Group Count", NULL,
> groupInfo->groupCount, es);
>
> ExplainPropertyList("Sort Methods Used", methodNames, es);
>
> -   if (groupInfo->maxMemorySpaceUsed > 0)
> +   if (groupInfo->sortMethods & SORT_TYPE_QUICKSORT)
> {
> longavgSpace =
> groupInfo->totalMemorySpaceUsed / groupInfo->groupCount;
> const char *spaceTypeName;
> StringInfoData memoryName;
>
> spaceTypeName =
> tuplesort_space_type_name(SORT_SPACE_TYPE_MEMORY);
> @@ -2841,13 +2841,13 @@
> show_incremental_sort_group_info(IncrementalSortGroupInfo *groupInfo,
> ExplainPropertyInteger("Average Sort Space Used",
> "kB", avgSpace, es);
> ExplainPropertyInteger("Peak Sort Space Used",
> "kB",
>
>  groupInfo->maxMemorySpaceUsed, es);
>
> ExplainCloseGroup("Sort Spaces", memoryName.data,
> true, es);
> }
> -   if (groupInfo->maxDiskSpaceUsed > 0)
> +   if (groupInfo->sortMethods & SORT_TYPE_EXTERNAL_SORT)
> {
> longavgSpace =
> groupInfo->totalDiskSpaceUsed / groupInfo->groupCount;
> const char *spaceTypeName;
> StringInfoData diskName;
>
> spaceTypeName =
> tuplesort_space_type_name(SORT_SPACE_TYPE_DISK);
>

I very much do not like this approach, and I think it's actually
fundamentally wrong, at least for the memory check. Quicksort is not the
only option that uses memory. For now, there's only one option that spills
to disk (external merge sort), but there's no reason it has to remain that
way. And in the future we might accurately report memory consumed even when
we've eventually spilled to disk also, so memory used would be relevant
potentially even if no in-memory sort was ever performed.

So I'm pretty confident checking the space used is the correct way to do
this.

James


Re: HashAgg's batching counter starts at 0, but Hash's starts at 1. (now: incremental sort)

2020-07-30 Thread Peter Geoghegan
On Thu, Jul 30, 2020 at 5:22 PM Justin Pryzby  wrote:
> Because filtering out zero values is exactly what's intended to be avoided for
> nontext output.
>
> I think checking whether the method was used should result in the same output,
> without the literal check for zero value (which itself sets a bad example).

It seems fine to me as-is. What about SORT_TYPE_TOP_N_HEAPSORT? Or any
other sort methods we add in the future?

The way that we flatten maxDiskSpaceUsed and maxMemorySpaceUsed into
"space used" on output might be kind of questionable, but it's
something that we have to live with for the foreseeable future. I
don't think that this is a bad example -- we don't output
maxDiskSpaceUsed or maxMemorySpaceUsed at the conceptual level.

-- 
Peter Geoghegan




Re: HashAgg's batching counter starts at 0, but Hash's starts at 1. (now: incremental sort)

2020-07-30 Thread Justin Pryzby
On Wed, Jul 29, 2020 at 09:18:44PM -0700, Peter Geoghegan wrote:
> On Wed, Jul 29, 2020 at 9:05 PM Justin Pryzby  wrote:
> > So my 2ndary suggestion is to conditionalize based on the method rather than
> > value of space used.
> 
> What's the advantage of doing it that way?

Because filtering out zero values is exactly what's intended to be avoided for
nontext output.

I think checking whether the method was used should result in the same output,
without the literal check for zero value (which itself sets a bad example).

--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -2824,13 +2824,13 @@ 
show_incremental_sort_group_info(IncrementalSortGroupInfo *groupInfo,
appendStringInfo(, "%s Groups", groupLabel);
ExplainOpenGroup("Incremental Sort Groups", groupName.data, 
true, es);
ExplainPropertyInteger("Group Count", NULL, 
groupInfo->groupCount, es);
 
ExplainPropertyList("Sort Methods Used", methodNames, es);
 
-   if (groupInfo->maxMemorySpaceUsed > 0)
+   if (groupInfo->sortMethods & SORT_TYPE_QUICKSORT)
{
longavgSpace = 
groupInfo->totalMemorySpaceUsed / groupInfo->groupCount;
const char *spaceTypeName;
StringInfoData memoryName;
 
spaceTypeName = 
tuplesort_space_type_name(SORT_SPACE_TYPE_MEMORY);
@@ -2841,13 +2841,13 @@ 
show_incremental_sort_group_info(IncrementalSortGroupInfo *groupInfo,
ExplainPropertyInteger("Average Sort Space Used", "kB", 
avgSpace, es);
ExplainPropertyInteger("Peak Sort Space Used", "kB",
   
groupInfo->maxMemorySpaceUsed, es);
 
ExplainCloseGroup("Sort Spaces", memoryName.data, true, 
es);
}
-   if (groupInfo->maxDiskSpaceUsed > 0)
+   if (groupInfo->sortMethods & SORT_TYPE_EXTERNAL_SORT)
{
longavgSpace = 
groupInfo->totalDiskSpaceUsed / groupInfo->groupCount;
const char *spaceTypeName;
StringInfoData diskName;
 
spaceTypeName = 
tuplesort_space_type_name(SORT_SPACE_TYPE_DISK);