On Sun, Aug 25, 2019 at 5:51 PM Jeff Janes <jeff.ja...@gmail.com> wrote:
>
> Yes, it certainly looks like it is due to cold caches.  But you say it is 
> slow at first, and then say it varies greatly during a run.  Is being slow at 
> first the only way it varies greatly, or is there large variation even beyond 
> that?

There is great variation in run times (from hundreds of milliseconds to
several seconds) even beyond server start-up.
The query runs several times with a different device_id, object_id and a
different list of attribute_ids, and the timing varies from one
execution to another.

> You can use pg_prewarm to overcome the cold cache issue when you first start
> up the server.

I cannot find much about pg_prewarm beyond some old threads from 2013,
but it appears to be a contrib extension shipped with PostgreSQL since
9.4, not something Linux-specific.
Anyway, I have problems even beyond the start of the database; it's just
easier to reproduce the problem at start-up, otherwise I have to leave
the application running for a while (to invalidate part of the cache, I
think).
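
A minimal sketch of what warming the caches at start-up could look like,
assuming the pg_prewarm contrib extension is available (the index name
below is hypothetical):

CREATE EXTENSION IF NOT EXISTS pg_prewarm;
-- load the "results" heap into shared buffers
SELECT pg_prewarm('results');
-- optionally warm an index too (hypothetical name)
SELECT pg_prewarm('results_object_id_attribute_id_idx');

That would only address the cold-start case, though, not the variation
later on.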

> If you query only on "results" with only the conditions that apply to 
> "results", what is the expected number of rows, and what is the actual number 
> of rows?

Explain for the query on results only: https://explain.depesz.com/s/Csau

EXPLAIN (ANALYZE,BUFFERS)
 SELECT DISTINCT ON (results.attribute_id) results.timestamp,
results.data FROM results
 WHERE
     results.data <> '<NullData/>'
     AND results.data IS NOT NULL
     AND results.object_id = 1955
     AND results.attribute_id IN (4, 5) -- possibly a longer list here
     AND results.data_access_result = 'SUCCESS'
 ORDER BY results.attribute_id, results.timestamp DESC
 LIMIT 2 -- limit by the length of the attributes list

Limit  (cost=166793.28..167335.52 rows=2 width=54) (actual time=134783.510..134816.941 rows=2 loops=1)
  Buffers: shared hit=19086 read=46836, temp read=1522 written=3311
  ->  Unique  (cost=166793.28..168420.01 rows=6 width=54) (actual time=134783.507..134816.850 rows=2 loops=1)
        Buffers: shared hit=19086 read=46836, temp read=1522 written=3311
        ->  Sort  (cost=166793.28..167606.64 rows=325346 width=54) (actual time=134783.505..134802.602 rows=205380 loops=1)
              Sort Key: attribute_id, "timestamp" DESC
              Sort Method: external merge  Disk: 26456kB
              Buffers: shared hit=19086 read=46836, temp read=1522 written=3311
              ->  Gather  (cost=1000.00..125882.23 rows=325346 width=54) (actual time=32.325..133815.793 rows=410749 loops=1)
                    Workers Planned: 2
                    Workers Launched: 2
                    Buffers: shared hit=19086 read=46836
                    ->  Parallel Seq Scan on results  (cost=0.00..92347.63 rows=135561 width=54) (actual time=18.496..133871.888 rows=136916 loops=3)
                          Filter: ((data IS NOT NULL) AND (data <> '<NullData/>'::text) AND (attribute_id = ANY ('{4,5}'::integer[])) AND (object_id = 1955) AND (data_access_result = 'SUCCESS'::text))
                          Rows Removed by Filter: 920123
                          Buffers: shared hit=19086 read=46836
Planning Time: 5.071 ms
Execution Time: 134874.687 ms

As far as I can see, the estimates are close to the actual returned rows
in the "Parallel Seq Scan on results" node.
The numbers are similar (of course) if I turn off query parallelism.
Or should I VACUUM ANALYZE again?
I'm fairly sure I've run it enough times already.
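
One thing the plan does show is the sort spilling to disk ("Sort Method:
external merge  Disk: 26456kB").  A sketch of keeping that sort in
memory for the session, assuming I can spare the memory (the in-memory
footprint is usually larger than the on-disk figure, so the exact value
may need tuning):

SET work_mem = '64MB';  -- session-level; the default is typically 4MB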

>> How can I improve it to be consistently fast (is it possible to get to
>> several milliseconds?)?
>
>
> Maybe.  Depends on the answer to my previous question.
>
>>
>> What I don't really understand is why the nested loop has 3 loops
>> (three joined tables)?
>
>
> Each parallel execution counts as a loop.  There are 2 parallel workers, plus 
> the leader also participates, making three.
>
>>
>> And why does the first index scan indicate ~60k loops? And does it
>> really work? It doesn't seem to filter out any rows.
>
>
> The parallel hash join returns about 20,000 rows, but I think that is just
> for one of the three parallel executions, making about 60,000 in total.  I
> don't know why one of the nodes reports combined execution and the other
> just a single worker.  Parallel queries are hard to understand.  When I want
> to optimize a query that does parallel execution, I just turn off parallelism
> ("set max_parallel_workers_per_gather TO 0;") at first to make it simpler to
> understand.
>
> Apparently everything with device_id = 97 just happens to pass all the rest 
> of your filters.  If you need those filters to make sure you get the right 
> answer in all cases, then you need them.  A lifeboat isn't useless just 
> because your ship didn't happen to sink today.

The part about the workers makes sense, thanks.
As for the condition, I thought there was something more contrived going
on in the planner, but I failed to see it was that simple: there is
nothing to filter out, since every row matches the condition.

>> Should I add an index only on (attribute_id, object_id)? And maybe
>> data_access_result?
>> Does it make sense to add it on a text column (results.data)?
>
>
> Which parts of query you give are going to change from execution to execution?
>
> Assuming the parts for object_id and attribute_id are variable and the rest 
> are static, I think the optimal index would be "create index on results 
> (object_id, attribute_id) where data IS NOT NULL and data <> '<NullData/>' 
> and data_access_result = 'SUCCESS'"

That's right: object_id and attribute_id are variable, and device_id is
variable as well.
If I create that index, it is not chosen by the planner when executing
the full "joined" query; it is chosen if I run the query with only the
"results"-related conditions, without the joins, which makes sense in
light of what you said below.
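
Written out in full, the index from your suggestion is:

CREATE INDEX ON results (object_id, attribute_id)
    WHERE data IS NOT NULL
      AND data <> '<NullData/>'
      AND data_access_result = 'SUCCESS';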

> Why does results.data have two different "spellings" for null data?

One where the device couldn't be contacted at all (NULL), and one where
the device was contacted but, for my (object_id, attribute_id)
combinations, actually returned null data ('<NullData/>').

> However, if the number of rows from "results" that meet all your criteria is
> high, the index won't make much of a difference.  The planner has a
> fundamental choice to make: should it seek things with device_id = 97, and
> then check each of those to see if they satisfy your conditions on the
> "results" fields?  Or should it seek things that satisfy the "results" field
> conditions, and then check each of those to see if they satisfy
> device_id = 97?  It is currently doing the first of those.  Whether it should
> be doing the second, and whether creating the index will cause it to switch
> to the second, are two (separate) questions which can't be answered with the
> data given.

So maybe I should denormalize, adding a device_id column to the
"results" table and including it in the index you suggested above?

> Cheers,
>
> Jeff

Thank you for the detailed response.

-- 

Barbu Paul - Gheorghe

