From: Chris Travers Sent: Tuesday, September 12, 2017 3:36 PM To: Frank Millman Cc: Postgres General Subject: Re: [GENERAL] Joining 16 tables seems slow
Chris Travers wrote: On Tue, Sep 12, 2017 at 3:15 PM, Frank Millman <fr...@chagford.com> wrote: 2017-09-12 12:39 GMT+02:00 Pavel Stehule <pavel.steh...@gmail.com>: > please use https://explain.depesz.com/ for both plans (slow, fast) Here are the results - sql_slow - https://explain.depesz.com/s/9vn3 sql_fast - https://explain.depesz.com/s/oW0F > > I am not convinced that the nested loop is a problem here. I cannot think of > a faster join plan than a nested loop when you only have one iteration of the > loop (and looking through I did not see any loop counts above 1). > > If you read and count ms carefully you will find that ar_tran_inv is scanned > 6 times and each of these times is taking about 25ms. 25x6 is half of your > query time right there and then you have the overhead in the joins on top of > that. Quick eyeball estimates is that this is where approx 200ms of your > query time comes from. Looking at this in more detail it doesn't look > > This is not a problem with too many tables in the join but the fact that you > are joining the same tables in multiple times in ways you end up needing to > repeatedly sequentially scan them. > > I also don't think an index is going to help unless you have accounting data > going way back (since you are looking for about a year's worth of data) or > unless 90% of your transactions get marked as deleted. So I think you are > stuck with the sequential scans on this table and optimizing will probably > mean reducing the number of times you scan that table. Ok, I have a bit more information. A couple of general comments first. 1. This is now purely an academic exercise. The SQL query that triggered this thread is unnecessarily complex, and I have a better solution. However, I think it is still worth the effort to understand what is going on. 2. explain.depesz.com is a brilliant tool – thanks for suggesting it. As Pavel and Chris have pointed out, the problem seems to be that ar_tran_inv is scanned six times. The question is why? I have an idea, but I will need some assistance. I have split my transaction table into three separate tables – ar_tran_inv, ar_tran_crn, ar_tran_rec. I then have a VIEW called ar_trans to view the transactions in total. Each physical table has a primary key called ‘row_id’, and an index on ‘tran_date’. The view is created like this - CREATE VIEW ccc.ar_trans AS SELECT ‘ar_inv’ AS tran_type, row_id as tran_row_id, tran_number, tran_date ... FROM ccc.ar_tran_inv UNION ALL SELECT ‘ar_crn’ AS tran_type, row_id as tran_row_id, tran_number, tran_date ... FROM ccc.ar_tran_crn UNION ALL SELECT ‘ar_rec’ AS tran_type, row_id as tran_row_id, tran_number, tran_date ... FROM ccc.ar_tran_rec In my sql_slow query, I have this 5 times, using different dates - LEFT JOIN ccc.ar_trans trans_alloc_curr ON trans_alloc_curr.tran_type = alloc_curr.tran_type AND trans_alloc_curr.tran_row_id = alloc_curr.tran_row_id AND trans_alloc_curr.tran_date <= '2015-09-30' Is it possible that it has to perform a full scan of each of the underlying tables to make the join? If so, is it possible to speed this up? Frank