On 2017-12-04 12:28:56 +1300, Thomas Munro wrote:
> From 43eeea0bc35204440d262224b56efca958b33428 Mon Sep 17 00:00:00 2001
> From: Thomas Munro <[email protected]>
> Date: Mon, 4 Dec 2017 11:52:11 +1300
> Subject: [PATCH] Fix EXPLAIN ANALYZE of hash join when the leader doesn't
>  execute.
> 
> If a hash join appears in a parallel query, there may be no hash table
> available for explain.c to inspect even though a hash table may have been
> built in other processes.  This could happen either because
> parallel_leader_participation was set to off or because the leader happened to
> hit the end of the outer relation immediately (even though the complete
> relation is not empty) and decided not to build the hash table.
> 
> Commit bf11e7ee introduced a way for workers to exchange instrumentation via
> the DSM segment for Sort nodes even though they are not parallel-aware.  This
> commit does the same for Hash nodes, so that explain.c has a way to find
> instrumentation data from an arbitrary participant that actually built the
> hash table.

Makes sense. Seems ok to only fix this inv11, even if present earlier.

> diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
> index 447f69d044e..1ffe635d66c 100644
> --- a/src/backend/commands/explain.c
> +++ b/src/backend/commands/explain.c
> @@ -2379,34 +2379,62 @@ show_sort_info(SortState *sortstate, ExplainState *es)
>  static void
>  show_hash_info(HashState *hashstate, ExplainState *es)
>  {
> -     HashJoinTable hashtable;
> +     HashInstrumentation *hinstrument = NULL;
>  
> -     hashtable = hashstate->hashtable;
> +     /*
> +      * In a parallel query, the leader process may or may not have run the
> +      * hash join, and even if it did it may not have built a hash table due 
> to
> +      * timing (if it started late it might have seen no tuples in the outer
> +      * relation and skipped building the hash table).  Therefore we have to 
> be
> +      * prepared to get instrumentation data from a worker if there is no 
> hash
> +      * table.
> +      */
> +     if (hashstate->hashtable)
> +     {
> +             hinstrument = (HashInstrumentation *)
> +                     palloc(sizeof(HashInstrumentation));
> +             ExecHashGetInstrumentation(hinstrument, hashstate->hashtable);
> +     }
> +     else if (hashstate->shared_info)
> +     {
> +             SharedHashInfo *shared_info = hashstate->shared_info;
> +             int             i;
> +
> +             /* Find the first worker that built a hash table. */
> +             for (i = 0; i < shared_info->num_workers; ++i)
> +             {
> +                     if (shared_info->hinstrument[i].nbatch > 0)
> +                     {
> +                             hinstrument = &shared_info->hinstrument[i];
> +                             break;
> +                     }
> +             }
> +     }

I can't think of a way that, before the parallel HJ patch, results in a
difference between the size of the hashtable between workers. Short of
some participant not having a hashtable at all, that is. Wwe also don't
currently display lookup information etc. So just more or less randomly
choosing one worker's seems ok.


> +             case T_HashState:
> +                     /* even when not parallel-aware */

"..., for stats" or such maybe?

> +                     ExecHashEstimate((HashState *) planstate, e->pcxt);
> +                     break;
>               case T_SortState:
>                       /* even when not parallel-aware */

I'd just update that comment as well.


>  #endif                                                       /* NODEHASH_H */
> diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
> index e05bc04f525..6c322e57c00 100644
> --- a/src/include/nodes/execnodes.h
> +++ b/src/include/nodes/execnodes.h
> @@ -1980,6 +1980,29 @@ typedef struct GatherMergeState
>       struct binaryheap *gm_heap; /* binary heap of slot indices */
>  } GatherMergeState;
>  
> +/* ----------------
> + *    Values displayed by EXPLAIN ANALYZE
> + * ----------------
> + */
> +typedef struct HashInstrumentation
> +{
> +     int                     nbuckets;
> +     int                     nbuckets_original;
> +     int                     nbatch;
> +     int                     nbatch_original;
> +     size_t          space_peak;
> +} HashInstrumentation;

Maybe I'm being pedantic here, but I think it'd not hurt to explain what
these mean. It's obviously pretty clear what nbuckets/batch mean, but
the difference between original and not is far less clea.r


Looks reasonable these nitpicks aside.

Greetings,

Andres Freund

Reply via email to