On Sun, Aug 25, 2019 at 5:51 PM Jeff Janes <jeff.ja...@gmail.com> wrote:
>
> Yes, it certainly looks like it is due to cold caches.  But you say it is
> slow at first, and then say it varies greatly during a run.  Is being slow at
> first the only way it varies greatly, or is there large variation even beyond
> that?
There is a great variation in run times (hundreds of ms to several seconds) even beyond the start of the server. The query runs several times with a different device_id, object_id and another list of attribute_ids, and the run time varies from one run to another.

> You can use pg_rewarm to overcome the cold cache issue when you first start
> up the server.

I cannot find anything related to pg_rewarm other than some dead ends from 2013, from which I gather it only works on Linux. Anyway, I have problems even beyond the start of the database; it's just easier to reproduce the problem at the start, otherwise I have to leave the application running for a while (to invalidate part of the cache, I think).

> If you query only on "results" with only the conditions that apply to
> "results", what is the expected number of rows, and what is the actual number
> of rows?

Explain for the query on results only: https://explain.depesz.com/s/Csau

EXPLAIN (ANALYZE, BUFFERS)
SELECT DISTINCT ON (results.attribute_id)
    results.timestamp,
    results.data
FROM results
WHERE
    results.data <> '<NullData/>'
    AND results.data IS NOT NULL
    AND results.object_id = 1955
    AND results.attribute_id IN (4, 5) -- possibly a longer list here
    AND results.data_access_result = 'SUCCESS'
ORDER BY results.attribute_id, results.timestamp DESC
LIMIT 2 -- limit by the length of the attributes list

Limit  (cost=166793.28..167335.52 rows=2 width=54) (actual time=134783.510..134816.941 rows=2 loops=1)
  Buffers: shared hit=19086 read=46836, temp read=1522 written=3311
  ->  Unique  (cost=166793.28..168420.01 rows=6 width=54) (actual time=134783.507..134816.850 rows=2 loops=1)
        Buffers: shared hit=19086 read=46836, temp read=1522 written=3311
        ->  Sort  (cost=166793.28..167606.64 rows=325346 width=54) (actual time=134783.505..134802.602 rows=205380 loops=1)
              Sort Key: attribute_id, "timestamp" DESC
              Sort Method: external merge  Disk: 26456kB
              Buffers: shared hit=19086 read=46836, temp read=1522 written=3311
              ->  Gather  (cost=1000.00..125882.23 rows=325346 width=54) (actual time=32.325..133815.793 rows=410749 loops=1)
                    Workers Planned: 2
                    Workers Launched: 2
                    Buffers: shared hit=19086 read=46836
                    ->  Parallel Seq Scan on results  (cost=0.00..92347.63 rows=135561 width=54) (actual time=18.496..133871.888 rows=136916 loops=3)
                          Filter: ((data IS NOT NULL) AND (data <> '<NullData/>'::text) AND (attribute_id = ANY ('{4,5}'::integer[])) AND (object_id = 1955) AND (data_access_result = 'SUCCESS'::text))
                          Rows Removed by Filter: 920123
                          Buffers: shared hit=19086 read=46836
Planning Time: 5.071 ms
Execution Time: 134874.687 ms

As far as I can see, the estimates are close to the rows actually returned by the "Parallel Seq Scan on results". The numbers are similar (of course) if I turn off query parallelism. Or should I VACUUM ANALYZE again? I'm sure I ran it enough.

>> How can I improve it to be consistently fast (is it possible to get to
>> several milliseconds?)?
>
> Maybe.  Depends on the answer to my previous question.
>
>> What I don't really understand is why the nested loop has 3 loops
>> (three joined tables)?
>
> Each parallel execution counts as a loop.  There are 2 parallel workers, plus
> the leader also participates, making three.
>
>> And why does the first index scan indicate ~60k loops? And does it
>> really work? It doesn't seem to filter out any rows.
>
> The parallel hash join returns about 20,000 rows, but I think that that is
> just for one of the three parallel executions, making about 60,000 in total.
> I don't know why one of the nodes reports combined execution and the other
> just a single worker.  Parallel queries are hard to understand.  When I want
> to optimize a query that does parallel execution, I just turn off parallelism
> ("set max_parallel_workers_per_gather TO 0;") at first to make it simpler to
> understand.
>
> Apparently everything with device_id = 97 just happens to pass all the rest
> of your filters.
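[On the prewarm point earlier: I suspect the extension meant is pg_prewarm, a contrib module shipped with PostgreSQL since 9.4; it is not Linux-specific. A minimal sketch, assuming the table names from the query above (the index name is a placeholder):]

```sql
-- Assumption: the extension intended is pg_prewarm (contrib, PostgreSQL 9.4+),
-- which loads a relation's blocks into shared_buffers on any platform.
CREATE EXTENSION IF NOT EXISTS pg_prewarm;

-- Warm the "results" heap; returns the number of blocks read.
SELECT pg_prewarm('results');

-- Indexes can be prewarmed the same way (index name is a placeholder).
SELECT pg_prewarm('results_object_id_attribute_id_idx');
```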
> If you need those filters to make sure you get the right
> answer in all cases, then you need them.  A lifeboat isn't useless just
> because your ship didn't happen to sink today.

The part with the workers makes sense, thanks.

For the condition, I thought there was something more contrived going on in the planner, but I failed to see it was that simple: there is no need to remove anything, since everything matches the condition.

>> Should I add an index only on (attribute_id, object_id)? And maybe
>> data_access_result?
>> Does it make sense to add it on a text column (results.data)?
>
> Which parts of the query you give are going to change from execution to
> execution?
>
> Assuming the parts for object_id and attribute_id are variable and the rest
> are static, I think the optimal index would be "create index on results
> (object_id, attribute_id) where data IS NOT NULL and data <> '<NullData/>'
> and data_access_result = 'SUCCESS'"

That's right, object_id and attribute_id are variable, and the device_id is variable as well.

If I create that index, it is not chosen by the planner when executing the full "joined" query. It is, though, if I run the query with only the "results"-related conditions, without joining, which makes sense corroborated with what you said below.

> Why does results.data have two different "spellings" for null data?

One where the device couldn't be contacted (NULL), and one where the device was contacted and, for my (object_id, attribute_id) combinations, actually returned null data ('<NullData/>').

> However, if the number of rows from "results" that meet all your criteria is
> high, the index won't make much of a difference.  The planner has a
> fundamental choice to make: should it seek things with device_id = 97, and
> then check each of those to see if they satisfy your conditions on the
> "results" fields?  Or should it seek things that satisfy the "results"
> fields conditions, and then check each of those to see if they satisfy
> device_id = 97?
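[As an aside, the index suggested above, spelled out as a runnable statement; table and column names are taken from the query earlier in this thread:]

```sql
-- Partial index as suggested: the static predicates move into the WHERE
-- clause, so the index only covers rows the query can actually use.
CREATE INDEX ON results (object_id, attribute_id)
WHERE data IS NOT NULL
  AND data <> '<NullData/>'
  AND data_access_result = 'SUCCESS';
```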
> It is currently doing the first of those.  Whether it should
> be doing the second, and whether creating the index will cause it to switch
> to using the second, are two (separate) questions which can't be answered
> with the data given.

So maybe I should de-normalize and place the device_id column in the "results" table, and add it to the index in your suggestion above?

> Cheers,
>
> Jeff

Thank you for the detailed response.

-- 
Barbu Paul - Gheorghe
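[For reference, a sketch of what that de-normalization could look like. The "objects" table and its column names are assumptions here, since the full schema was not shown in this thread:]

```sql
-- Sketch only: assumes a table "objects"(id, device_id) links results to devices.
ALTER TABLE results ADD COLUMN device_id integer;

UPDATE results r
SET device_id = o.device_id
FROM objects o
WHERE o.id = r.object_id;

-- Extend the suggested partial index with the denormalized column,
-- so lookups by (device_id, object_id, attribute_id) can use it directly.
CREATE INDEX ON results (device_id, object_id, attribute_id)
WHERE data IS NOT NULL
  AND data <> '<NullData/>'
  AND data_access_result = 'SUCCESS';
```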