Hi Tom,

Thanks for your input, and I appreciate you taking the time to respond. Just a few comments.
1. Maybe I am mistaken, so kindly help me understand a bit more. I do agree that passing Datums up the node chain helps, but consider the case where a Sort or a Hash Join spills to disk. Large columns written to disk will still cause significant performance problems (as sort spills will detoast), and many unnecessary columns mean a lot of extra I/O. With 1024 varchars and a lot of rows, you can see the serial case deteriorate because of this.

2. The parallel case works: the parallel nodes inherit the target list of the underlying nodes. In my case, though, the problem of unpruned columns becomes worse, because they also add to the network payload.

3. Now, coming to your point about detoasting: I have to detoast at parallel node boundaries, because the data-flow operators (delimited by parallel operators) run on different machines and hence have to pass data by value. I did make a fix in the optimizer to at least alleviate this case, but I am going to work on a more general approach: expression pruning based on the lifetime of an expression. Basically, each node either references or generates an expression, and any expression that is generated but not referenced by any node above it will be eliminated.

Regards,
Harmeek

On Sun, Jul 10, 2011 at 10:28 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> HarmeekSingh Bedi <harmeeksi...@gmail.com> writes:
> > Thanks Tom. Here is a example. Just a background of things . I have made
> > changes in postgress execution and storage engine to make it a MPP style
> > engine - keeping all optimizer intact. Basically take pgress serial plan and
> > construct a parallel plan. The query I am running is below.
>
> The output lists for the parallel nodes look pretty broken, but I guess
> you weren't asking about those. As near as I can tell, what you're
> unhappy about is that it's passing up both raw column values and
> pre-evaluated placeholder expressions using those values, when only the
> placeholders are really going to be needed.
> Yeah, that's probably true, because the placeholder mechanism isn't (yet)
> taken into account by the code that determines how far up a column value
> will be needed.
>
> In standard Postgres this isn't much of an issue because passing up
> by-reference Datums is really quite cheap ... it's only a pointer copy
> in many cases, and even where it's not, it's probably just a
> toast-pointer copy. I suspect it's costing you more because your
> "parallel" nodes have to instantiate the tuples instead of just passing
> virtual slots around ... but it's still not clear to me why you're
> passing more than a toast pointer for big values. Maybe you're being
> too enthusiastic about detoasting pointers early?
>
> regards, tom lane
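The lifetime-based expression pruning described in point 3 can be sketched roughly as follows. This is a minimal illustration under simplifying assumptions: the plan is a linear chain of nodes (no joins), expressions are tracked as plain strings, and `PlanNode`, `prune_plan`, `EXPR_INPUTS` and the example expressions are hypothetical names, not PostgreSQL planner APIs. Walking top-down, each node's output list is cut to what some node above still references. Anything a node generates that nobody above uses is dropped, and that in turn stops its inputs from being pulled up from below.

```python
from dataclasses import dataclass, field

# Hypothetical expressions: base columns of a table t plus computed ones.
# EXPR_INPUTS maps each computed expression to the expressions it is built from.
EXPR_INPUTS = {
    "ph(t.b)": {"t.b"},
    "length(t.c)": {"t.c"},
}


@dataclass
class PlanNode:
    name: str
    generates: set[str]                                # expressions this node produces
    refs: set[str] = field(default_factory=set)        # used by its own quals/sort keys
    tlist: set[str] = field(default_factory=set)       # pruned output list
    eliminated: set[str] = field(default_factory=set)  # generated but never referenced


def prune_plan(nodes: list[PlanNode], query_output: set[str]) -> set[str]:
    """Prune a linear chain of nodes in place (nodes[0] is the bottom scan).

    'needed' holds everything some node above the current one references.
    Returns what would still be needed from below the bottom node.
    """
    needed = set(query_output)
    for node in reversed(nodes):
        node.tlist = set(needed)
        # Used here = needed above or by this node itself, expanded through
        # the inputs of anything this node computes (transitively).
        used = needed | node.refs
        frontier = list(used & node.generates)
        while frontier:
            expr = frontier.pop()
            for inp in EXPR_INPUTS.get(expr, set()):
                if inp not in used:
                    used.add(inp)
                    if inp in node.generates:
                        frontier.append(inp)
        node.eliminated = node.generates - used
        # Whatever is used here but not generated here must come from below.
        needed = used - node.generates
    return needed


plan = [  # bottom (scan) to top
    PlanNode("Scan t", {"t.a", "t.b", "t.c"}),
    PlanNode("Project", {"ph(t.b)", "length(t.c)"}),
    PlanNode("Sort", set(), refs={"t.a"}),
]
prune_plan(plan, {"ph(t.b)"})
for node in reversed(plan):
    print(node.name, sorted(node.tlist), "eliminated:", sorted(node.eliminated))
# Sort ['ph(t.b)'] eliminated: []
# Project ['ph(t.b)', 't.a'] eliminated: ['length(t.c)']
# Scan t ['t.a', 't.b'] eliminated: ['t.c']
```

In this toy plan the large `t.b` column stops at the projection, so a Sort above it that spills would only write `t.a` and `ph(t.b)`. Because the only consumer of `t.c` (`length(t.c)`) is eliminated, `t.c` is dropped at the scan as well. A real implementation would also have to handle join trees and resjunk sort columns, which this sketch ignores.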