On Fri, Mar 13, 2009 at 7:08 PM, Tom Lane wrote: > Vamsidhar Thummala writes: > > I am wondering why are we subtracting the entire Seq Scan time of > Lineitem > > from the total time to calculate the HashJoin time. > > Well, if you're trying to identify the speed of the join itself and not > how long it takes to provide the input for it, that seems like a > sensible calculation to make.
I am still not clear on this. I am thinking the output is produced in a pipelined fashion i.e., as soon as the record of outer child is read (sequentially here) and if HashJoin finds a match by probing the inner hash table (in memory), we have an output record. Please correct if I am wrong here. > > > > Here is another plan I have for the same TPC-H 18 query with different > > configuration parameters (shared_buffers set to 400MB, just for > experimental > > purposes) and HashJoin seems to take longer time (at least 155.58s based > on > > above calculation): > > Yeah, that seems to work out to about 25us per row instead of 3us, which > is a lot slower. Maybe the hash got split up into multiple batches ... > what have you got work_mem set to? Try turning on log_temp_files and > see if it records any temp files as getting created. Unfortunately, I am working with Postgres 8.2 which doesn't have log_temp_files. The work_mem is still at 1MB (all other parameters were kept constant apart from shared_buffers w.r.t previous configuration). The hash is build on 57 records (~20kb, customer row length is 179 bytes and orders row length is 104 bytes) produced by inner subplan and so I will be surprised if multiple batches are created. Thank you. Regards, -Vamsi