From: Chris Travers 
Sent: Tuesday, September 12, 2017 3:36 PM
To: Frank Millman 
Cc: Postgres General 
Subject: Re: [GENERAL] Joining 16 tables seems slow

Chris Travers wrote:

On Tue, Sep 12, 2017 at 3:15 PM, Frank Millman <fr...@chagford.com> wrote:


  2017-09-12 12:39 GMT+02:00 Pavel Stehule <pavel.steh...@gmail.com>:


    > please use https://explain.depesz.com/ for both plans (slow, fast)

      Here are the results - 

      sql_slow - https://explain.depesz.com/s/9vn3

      sql_fast - https://explain.depesz.com/s/oW0F

> 
> I am not convinced that the nested loop is a problem here.  I cannot think of 
> a faster join plan than a nested loop when you only have one iteration of the 
> loop (and looking through I did not see any loop counts above 1).
> 
> If you read and count ms carefully you will find that ar_tran_inv is scanned 
> six times, and each scan takes about 25ms.  25 x 6 is half of your query time 
> right there, and then you have the overhead of the joins on top of that.  A 
> quick eyeball estimate is that this is where approximately 200ms of your 
> query time comes from.  Looking at this in more detail it doesn't look 
> 
> This is not a problem of too many tables in the join, but of joining the same 
> tables multiple times in ways that require you to repeatedly sequentially 
> scan them.
> 
> I also don't think an index is going to help unless you have accounting data 
> going way back (since you are looking for about a year's worth of data) or 
> unless 90% of your transactions get marked as deleted.  So I think you are 
> stuck with the sequential scans on this table and optimizing will probably 
> mean reducing the number of times you scan that table.
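(For what it's worth, a partial index along the lines Chris describes might look like the sketch below. The boolean `deleted` column is an assumption on my part; I don't know the actual schema.)

```sql
-- Sketch only: assumes a boolean "deleted" column, which may not match
-- the real table definition.  A partial index stores only the live rows,
-- so it stays small and useful even if most rows end up marked deleted.
CREATE INDEX ar_tran_inv_live_date_idx
    ON ccc.ar_tran_inv (tran_date)
    WHERE NOT deleted;
```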

Ok, I have a bit more information.

A couple of general comments first.

1. This is now purely an academic exercise. The SQL query that triggered this 
thread is unnecessarily complex, and I have a better solution. However, I think 
it is still worth the effort to understand what is going on.

2. explain.depesz.com is a brilliant tool – thanks for suggesting it.

As Pavel and Chris have pointed out, the problem seems to be that ar_tran_inv 
is scanned six times. The question is why? I have an idea, but I will need some 
assistance.

I have split my transaction table into three separate tables – ar_tran_inv, 
ar_tran_crn, ar_tran_rec. I then have a VIEW called ar_trans to view the 
transactions in total.

Each physical table has a primary key called 'row_id', and an index on 
'tran_date'.

The view is created like this -

CREATE VIEW ccc.ar_trans AS
  SELECT 'ar_inv' AS tran_type, row_id AS tran_row_id, tran_number, tran_date
    ... FROM ccc.ar_tran_inv
  UNION ALL
  SELECT 'ar_crn' AS tran_type, row_id AS tran_row_id, tran_number, tran_date
    ... FROM ccc.ar_tran_crn
  UNION ALL
  SELECT 'ar_rec' AS tran_type, row_id AS tran_row_id, tran_number, tran_date
    ... FROM ccc.ar_tran_rec
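As a quick sanity check, a plain date restriction on the view can show whether the planner pushes the predicate into each branch of the UNION ALL (I have not verified the plan output; this is just a way to test it):

```sql
-- If the tran_date indexes are usable, each UNION ALL arm should show an
-- index scan (or bitmap scan) rather than a sequential scan of the
-- underlying table.
EXPLAIN
SELECT *
FROM ccc.ar_trans
WHERE tran_date <= '2015-09-30';
```

The join case below behaves differently, because the join values are not known at plan time in the same way a literal is.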

In my sql_slow query, this pattern appears five times, each with a different date -

LEFT JOIN ccc.ar_trans trans_alloc_curr ON
    trans_alloc_curr.tran_type = alloc_curr.tran_type AND
    trans_alloc_curr.tran_row_id = alloc_curr.tran_row_id AND
    trans_alloc_curr.tran_date <= '2015-09-30'

Is it possible that it has to perform a full scan of each of the underlying 
tables to make the join?

If so, is it possible to speed this up?
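One idea that might avoid scanning the underlying tables once per join would be to aggregate ar_trans in a single pass, turning the per-date joins into FILTER clauses. This is only a sketch: the `amount` column, the cut-off dates, and the output names are placeholders, not my real schema.

```sql
-- Single pass over ar_trans: each date cut-off becomes a FILTER clause
-- instead of a separate join against the view.  "amount" and the dates
-- are illustrative placeholders.
SELECT tran_type,
       tran_row_id,
       SUM(amount) FILTER (WHERE tran_date <= '2015-09-30') AS alloc_curr,
       SUM(amount) FILTER (WHERE tran_date <= '2015-06-30') AS alloc_prev
FROM ccc.ar_trans
GROUP BY tran_type, tran_row_id;
```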

Frank
