"Lawrence, Ramon" <[EMAIL PROTECTED]> writes: > To keep the changes simple, the update simply calls > ExecChooseHashTableSize() in create_hashjoin_plan() to re-calculate the > expected number of batches. This is more efficient and results in less > code changes than modifying the HashPath struct to store the number of > batches and updating that variable when costing (as cost_hashjoin() will > be called many times during costing).
I was intending to do it the other way, actually.  An extra field in
HashPath hardly costs anything.  The other reason for it is that there
are other possible uses for knowing whether a hash will be multi-batch.
(For example, if we were prepared to tell the executor that it *must*
keep the hash to one batch, we could assume that the sort order of the
left input is preserved.  I haven't looked into the risks/benefits of
that too much, but it's been in the back of the mind for a long time.)

> An ideal solution would detect at execution time if the inner relation
> remained in memory (one batch) and decide to disable/enable the
> physical-tlist optimization on the outer relation accordingly.  At this
> time, we are uncertain if this would be desirable or possible.

That seems pretty infeasible really.  Aside from changing plan node
output tuple types on-the-fly, it would mean renumbering Vars in the
join node to reference the outer relation's new output columns.  The
overhead of supporting that would be paid across-the-board in the
executor whether or not anyone got any real benefit from it.

I'd be more inclined to deal with the issue by trying to establish a
"safety margin" in the estimate of whether the hash will go multi-batch.
IOW we should disuse_physical_tlist if the hash is estimated to be close
to but still within one batch.

			regards, tom lane
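[A hedged sketch of the alternative described above: stash the batch estimate
in HashPath while costing, then be conservative about the physical-tlist
decision when the estimate sits near the one-batch boundary.  The num_batches
field, the safety-factor macro, the helper function, and the crude size
estimate are all assumptions made for illustration, not actual PostgreSQL
source.]

```c
/*
 * Hypothetical: HashPath carries the batch estimate computed during
 * costing.  Field and macro names are invented for illustration.
 */
typedef struct HashPath
{
	JoinPath	jpath;
	List	   *path_hashclauses;	/* hash quals, as in the real struct */
	int			num_batches;		/* would be filled in by cost_hashjoin() */
} HashPath;

#define ONE_BATCH_SAFETY_FACTOR	2.0		/* invented safety margin */

/*
 * Would be called from create_hashjoin_plan(): keep the outer side's
 * physical tlist only if the hash is confidently single-batch.
 */
static bool
hash_confidently_single_batch(HashPath *best_path)
{
	Path	   *inner_path = best_path->jpath.innerjoinpath;
	double		inner_bytes;

	if (best_path->num_batches > 1)
		return false;

	/* crude size estimate: rows * width, ignoring per-tuple overhead */
	inner_bytes = inner_path->parent->rows * inner_path->parent->width;

	/* require headroom below work_mem (a GUC measured in kilobytes) */
	return inner_bytes * ONE_BATCH_SAFETY_FACTOR <= work_mem * 1024.0;
}
```

[When this returns false, create_hashjoin_plan() would call
disuse_physical_tlist() on the outer plan, as in the "safety margin" idea
above; the exact call site and arguments are left open here.]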