Re: [HACKERS] Expression Pruning in postgress

HarmeekSingh Bedi Wed, 13 Jul 2011 08:45:45 -0700

Hi Tom,

Thanks for your input. I appreciate you taking the time to respond. Just
some comments.

   1. Maybe I am mistaken; kindly help me understand a bit more. I do agree
   that passing by-reference Datums up the node chain helps, but consider the
   case when either a Sort or a Hash Join spills to disk: large columns that
   get written to disk will still cause a lot of performance issues {as sort
   spills will detoast}, and a lot of unnecessary columns will cause a lot of
   I/O. With 1024 varchars and a lot of rows, you can see that the serial
   case deteriorates because of this.
   2. The parallel case works: the parallel nodes inherit the target list
   of the underlying nodes. But in my case the issue of unpruned columns
   becomes worse, as they also add to the network payload.
   3. Now, coming to your point about detoasting: I have to do that at
   parallel node boundaries, as the data flow operators {delimited by
   parallel operators} run on different machines and hence tuples have to be
   passed by value.

I did make a fix in the optimizer to at least alleviate this case. But I am
going to work on a more general approach of expression pruning based on the
lifetime of an expression. Basically, each node will either reference or
generate an expression. Any expression that is generated and is not
referenced by any node above it will be eliminated.
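
To make the lifetime idea concrete, here is a rough sketch of it in Python
rather than in the actual planner code. Everything in it is hypothetical and
only for illustration: the PlanNode class, the prune_target_lists function,
and the node and column names are not PostgreSQL structures, and the plan is
simplified to a single chain of nodes ordered from the scan up to the top.

```python
from dataclasses import dataclass, field


@dataclass
class PlanNode:
    """Hypothetical plan node; not a PostgreSQL structure."""
    name: str
    generates: frozenset = frozenset()   # expressions first produced by this node
    references: frozenset = frozenset()  # expressions this node reads from its input
    output: list = field(default_factory=list)  # pruned target list, set by pruning


def prune_target_lists(chain, final_output):
    """Prune each node's output to what some node above it still references.

    chain is ordered bottom (scan) to top; final_output lists the
    expressions the query itself returns.
    """
    needed = set(final_output)
    # Walk from the top node down, tracking which expressions are still live.
    for node in reversed(chain):
        # A node only has to emit what is referenced above it.
        node.output = sorted(needed)
        # Below this node: drop what it generates, add what it reads.
        needed = (needed - node.generates) | node.references
    if needed:
        raise ValueError(f"expressions never generated: {sorted(needed)}")
    return {node.name: node.output for node in chain}


# Assumed example: a placeholder ph1 is computed from big_col below a Sort.
chain = [
    PlanNode("SeqScan", generates=frozenset({"a", "big_col", "unused_col"})),
    PlanNode("Project", generates=frozenset({"ph1"}),
             references=frozenset({"big_col"})),
    PlanNode("Sort", references=frozenset({"a"})),
]
pruned = prune_target_lists(chain, ["a", "ph1"])
for name, target_list in pruned.items():
    print(name, target_list)
```

In this example big_col stops at the Project node, because nothing above it
references the raw column once ph1 has been computed, so the Sort would not
carry it along if it spills.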

 Regards
 Harmeek

On Sun, Jul 10, 2011 at 10:28 AM, Tom Lane <[email protected]> wrote:

> HarmeekSingh Bedi <[email protected]> writes:
> > Thanks Tom. Here is an example. Just some background: I have made
> > changes in the Postgres execution and storage engine to make it an
> > MPP-style engine, keeping the optimizer intact. Basically, take the
> > Postgres serial plan and construct a parallel plan. The query I am
> > running is below.
>
> The output lists for the parallel nodes look pretty broken, but I guess
> you weren't asking about those.  As near as I can tell, what you're
> unhappy about is that it's passing up both raw column values and
> pre-evaluated placeholder expressions using those values, when only the
> placeholders are really going to be needed.  Yeah, that's probably true,
> because the placeholder mechanism isn't (yet) taken into account by the
> code that determines how far up a column value will be needed.
>
> In standard Postgres this isn't much of an issue because passing up
> by-reference Datums is really quite cheap ... it's only a pointer copy
> in many cases, and even where it's not, it's probably just a
> toast-pointer copy.  I suspect it's costing you more because your
> "parallel" nodes have to instantiate the tuples instead of just passing
> virtual slots around ... but it's still not clear to me why you're
> passing more than a toast pointer for big values.  Maybe you're being
> too enthusiastic about detoasting pointers early?
>
>                        regards, tom lane
>
