Re: [PERFORM] Hash Join performance

Vamsidhar Thummala Fri, 13 Mar 2009 19:11:07 -0700

On Fri, Mar 13, 2009 at 7:08 PM, Tom Lane wrote:

> Vamsidhar Thummala writes:
> > I am wondering why are we subtracting the entire Seq Scan time of
> Lineitem
> > from the total time to calculate the HashJoin time.
>
> Well, if you're trying to identify the speed of the join itself and not
> how long it takes to provide the input for it, that seems like a
> sensible calculation to make.



I am still not clear on this. I am thinking the output is produced in a
pipelined fashion i.e., as soon as the record of outer child is read
(sequentially here) and if HashJoin finds a match by probing the inner hash
table (in memory), we have an output record. Please correct if I am wrong
here.


>
>
> > Here is another plan I have for the same TPC-H 18 query with different
> > configuration parameters (shared_buffers set to 400MB, just for
> experimental
> > purposes) and HashJoin seems to take longer time (at least 155.58s based
> on
> > above calculation):
>
> Yeah, that seems to work out to about 25us per row instead of 3us, which
> is a lot slower.  Maybe the hash got split up into multiple batches ...
> what have you got work_mem set to?  Try turning on log_temp_files and
> see if it records any temp files as getting created.


Unfortunately, I am working with Postgres 8.2 which doesn't have
log_temp_files. The work_mem is still at 1MB (all other parameters were kept
constant apart from shared_buffers w.r.t previous configuration). The hash
is build on 57 records (~20kb, customer row length is 179 bytes and orders
row length is 104 bytes) produced by inner subplan and so I will be
surprised if multiple batches are created.

Thank you.

Regards,
-Vamsi

Re: [PERFORM] Hash Join performance

Reply via email to