Re: [PERFORM] Fwd: Slow query from ~7M rows, joined to two tables of ~100 rows each

2017-06-26 Thread Jeff Janes
On Fri, Jun 23, 2017 at 1:09 PM, Chris Wilson wrote:

>
> The records can already be read in order from idx_metric_value. If this
> was selected as the primary table, and metric_pos was joined to it, then
> the output would also be in order, and no sort would be needed.
>
> We should be able to use a merge join to metric_pos, because it can be
> read in order of id_metric (its primary key, and the first column in
> idx_metric_value...). If not, a hash join should be faster than a nested
> loop, if we only have to hash ~100 records.
>

Hash joins do not preserve order.  They could preserve the order of their
"first" input, but only if the hash join runs entirely in one batch and
doesn't spill to disk.  But the hash join code can never guarantee in advance
that it won't spill to disk, and so it is never treated as order-preserving.
It expects to hash only ~100 rows, but it is never absolutely certain of that
until it actually executes.

If I set enable_sort to false, then I do get the merge join you want (but
with asset_pos joined by a nested loop index scan, not a hash join, for the
reason just stated above), but that is slower than the plan with the sort in
it, just as PostgreSQL predicts.

If I vacuum your fact table, then it can switch to using index-only scans.  I
then get a different plan, still using a sort, which runs in 1.6 seconds.
Sorting is not the slow step you think it is.
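
The vacuum step itself is a one-liner; it refreshes the visibility map that
index-only scans depend on (ANALYZE is bundled in to refresh statistics at the
same time):

VACUUM ANALYZE metric_value;   -- update visibility map and planner statistics on the fact table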

Be warned that "explain (analyze)" can substantially slow down and distort
this type of query, especially when sorting.  You should run "explain
(analyze, timing off)" first, and then only trust "explain (analyze)" if
the overall execution times between them are similar.
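
A sketch of that comparison, using the query from the original post:

-- Low-overhead run first: per-node timing calls are disabled.
EXPLAIN (ANALYZE, TIMING OFF, BUFFERS)
SELECT metric_pos.pos AS pos_metric, asset_pos.pos AS pos_asset, date, value
FROM metric_value
INNER JOIN asset_pos ON asset_pos.id = metric_value.id_asset
INNER JOIN metric_pos ON metric_pos.id = metric_value.id_metric
WHERE date >= '2016-01-01' AND date < '2016-06-01'
  AND timerange_transaction @> current_timestamp
ORDER BY metric_value.id_metric, metric_value.id_asset, date;

-- Then repeat the same statement as EXPLAIN (ANALYZE, BUFFERS), and only trust
-- its per-node timings if its total runtime is close to the run above.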



> If I remove one of the joins (asset_pos) then I get a merge join between
> two indexes, as expected, but it has a materialize just before it which
> makes no sense to me. Why do we need to materialize here? And why
> materialise 100 rows into 1.5 million rows? (explain.depesz.com)
>


>    ->  Materialize  (cost=0.14..4.89 rows=100 width=8) (actual time=0.018..228.265 rows=1504801 loops=1)
>          Buffers: shared hit=2
>          ->  Index Only Scan using idx_metric_pos_id_pos on metric_pos  (cost=0.14..4.64 rows=100 width=8) (actual time=0.013..0.133 rows=100 loops=1)
>                Heap Fetches: 100
>                Buffers: shared hit=2
>
>
It doesn't need to materialize; it does so simply because it thinks it will
be faster (which it is, slightly).  You can prevent it from doing so by
setting enable_material to off.  The reason it is faster is that with the
materialize, it can check all the visibility filters at once, rather than
having to do it repeatedly.  It is only materializing 100 rows; the 1504801
comes from the number of rows projected out of the Materialize node (one for
each row on the other side of the join, in this case), rather than the number
of rows contained within it.
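
The toggle is session-local if you want to try it; the query below is a
reconstruction of the reduced query (asset_pos join removed), not the exact
statement from the original post:

SET enable_material = off;      -- diagnostic only: suppresses Materialize nodes

EXPLAIN (ANALYZE, BUFFERS)
SELECT metric_pos.pos AS pos_metric, date, value
FROM metric_value
INNER JOIN metric_pos ON metric_pos.id = metric_value.id_metric
WHERE date >= '2016-01-01' AND date < '2016-06-01'
  AND timerange_transaction @> current_timestamp
ORDER BY metric_value.id_metric, metric_value.id_asset, date;

RESET enable_material;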

And again, vacuum your tables.  Heap fetches aren't cheap.


> The size of the result set is approximately 91 MB (measured with psql -c |
> wc -c). Why does it take 4 seconds to transfer this much data over a UNIX
> socket on the same box?
>

It has to convert the data to a format used for the wire protocol
(hardware-independent, and able to support user-defined and composite types),
and then back again.

> work_mem = 100MB

Can you give it more than that?  How many simultaneous connections do you
expect?
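
work_mem can be raised per session for experiments; the 256MB below is only a
placeholder, and since each sort or hash node in each backend may use up to
work_mem, it should be sized against the expected number of concurrent
connections:

SET work_mem = '256MB';                  -- session-level experiment only
-- ALTER SYSTEM SET work_mem = '256MB';  -- cluster-wide alternative
-- SELECT pg_reload_conf();              -- then reload the configuration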

Cheers,

Jeff


[PERFORM] Re: Fwd: Slow query from ~7M rows, joined to two tables of ~100 rows each

2017-06-26 Thread Karl Czajkowski
On Jun 26, Chris Wilson modulated:

> I created the index starting with date and it did make a big
> difference: down to 10.3 seconds using a bitmap index scan and bitmap
> heap scan (and then two hash joins as before).
> 

By the way, what kind of machine are you using?  CPU, RAM, backing
storage?

I tried running your original test code and the query completed in
about 8 seconds, and adding the index changes and analyze statement
brought it down to around 2.3 seconds on my workstation with Postgres
9.5.7.  On an unrelated development VM with Postgres 9.6.3, the final
form took around 4 seconds.


Karl




[PERFORM] Re: Fwd: Slow query from ~7M rows, joined to two tables of ~100 rows each

2017-06-26 Thread Karl Czajkowski
On Jun 26, Chris Wilson modulated:
> ...
> In your case, the equivalent hack would be to compile the small
> dimension tables into big CASE statements I suppose...
> 
> 
> Nice idea! I tried this but unfortunately it made the query 16 seconds
> slower (up to 22 seconds) instead of faster.

Other possible rewrites to try instead of joins:

  -- replace the case statement with a scalar subquery

  -- replace the case statement with a stored procedure wrapping that scalar
     subquery, and declare the procedure as STABLE or even IMMUTABLE

These are shots in the dark, but seem easy enough to experiment with and might
behave differently if the query planner realizes it can cache results for 
repeated use of the same ~100 input values.
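
A rough sketch of both rewrites against the schema from this thread (column
types are assumed to be integer, and the function name is made up):

-- (1) scalar subqueries in place of the joins to the dimension tables:
SELECT (SELECT mp.pos FROM metric_pos mp WHERE mp.id = mv.id_metric) AS pos_metric,
       (SELECT ap.pos FROM asset_pos  ap WHERE ap.id = mv.id_asset)  AS pos_asset,
       mv.date, mv.value
FROM metric_value mv
WHERE mv.date >= '2016-01-01' AND mv.date < '2016-06-01'
  AND mv.timerange_transaction @> current_timestamp
ORDER BY mv.id_metric, mv.id_asset, mv.date;

-- (2) the same lookup wrapped in a STABLE SQL function:
CREATE FUNCTION metric_pos_of(p_id integer) RETURNS integer
LANGUAGE sql STABLE
AS $$ SELECT pos FROM metric_pos WHERE id = p_id $$;

SELECT metric_pos_of(mv.id_metric) AS pos_metric, mv.date, mv.value
FROM metric_value mv
WHERE mv.date >= '2016-01-01' AND mv.date < '2016-06-01'
  AND mv.timerange_transaction @> current_timestamp
ORDER BY mv.id_metric, mv.id_asset, mv.date;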


Karl





Re: [PERFORM] Fwd: Slow query from ~7M rows, joined to two tables of ~100 rows each

2017-06-26 Thread Chris Wilson
Hi Karl,

Thanks for the quick reply! Answers inline.

My starting point, having executed exactly the preparation query in my
email, was that the sample EXPLAIN (ANALYZE, BUFFERS) SELECT query ran in
15.3 seconds (best of 5), and did two nested loops.

On 24 June 2017 at 03:01, Karl Czajkowski wrote:

> Also, did you include an ANALYZE step between your table creation
> statements and your query benchmarks?  Since you are dropping and
> recreating test data, you have no stats on anything.


I tried this suggestion first, as it's the hardest to undo, and could also
be done automatically by a background ANALYZE while I wasn't looking. It did
result in a switch to using hash joins (instead of nested loops), and to
starting with the metric_value table (the fact table), which are both changes
that I thought would help, and the EXPLAIN ... SELECT sped up to 13.2 seconds
(2 seconds faster; best of 5 again).
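
For completeness, the step in question is just:

ANALYZE metric_value;   -- collect planner statistics on the fact table
ANALYZE metric_pos;
ANALYZE asset_pos;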

Did you only omit a CREATE INDEX statement on asset_pos (id, pos) from
> your problem statement or also from your actual tests?  Without any
> index, you are forcing the query planner to do that join the hard way.
>

I omitted it from my previous tests and the preparation script because I
didn't expect it to make much difference. There was already a primary key
on ID, so this would only enable an index scan to be changed into an
index-only scan, but the query plan wasn't doing an index scan.

It didn't appear to change the query plan or performance.
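
For reference, the index Karl asked about would be roughly this (the index
name is made up):

CREATE INDEX idx_asset_pos_id_pos ON asset_pos (id, pos);
-- A covering index on (id, pos): at best it lets the join to asset_pos become
-- an index-only scan instead of a heap lookup.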

Have you tried adding a foreign key constraint on the id_asset and
> id_metric columns?  I wonder if you'd get a better query plan if the
> DB knew that the inner join would not change the number of result
> rows.  I think it's doing the join inside the filter step because
> it assumes that the inner join may drop rows.
>

This didn't appear to change the query plan or performance either.
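
For reference, the suggested constraints would be roughly (constraint names
are made up):

ALTER TABLE metric_value
    ADD CONSTRAINT fk_metric_value_metric FOREIGN KEY (id_metric) REFERENCES metric_pos (id);
ALTER TABLE metric_value
    ADD CONSTRAINT fk_metric_value_asset FOREIGN KEY (id_asset) REFERENCES asset_pos (id);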


> > This is an example of the kind of query we would like to speed up:
> >
> >
> > SELECT metric_pos.pos AS pos_metric, asset_pos.pos AS pos_asset,
> > date, value
> > FROM metric_value
> > INNER JOIN asset_pos ON asset_pos.id = metric_value.id_asset
> > INNER JOIN metric_pos ON metric_pos.id = metric_value.id_metric
> > WHERE
> > date >= '2016-01-01' and date < '2016-06-01'
> > AND timerange_transaction @> current_timestamp
> > ORDER BY metric_value.id_metric, metric_value.id_asset, date
> >
>
> How sparse is the typical result set selected by these date and
> timerange predicates?  If it is sparse, I'd think you want your
> compound index to start with those two columns.
>

I'm not sure what "sparse" means? The date range covers a significant
fraction (25%) of the total table contents in this test example, although
we're flexible
about date ranges (if it improves performance per day) since we'll end up
processing a big chunk of the entire table anyway, batched by date. Almost
no rows will be removed by the timerange_transaction filter (none in our
test example). We expect to have rows in this table for most metric and
asset combinations (in the test example we populate metric_value using the
cartesian product of these tables to simulate this).

I created the index starting with date and it did make a big difference:
down to 10.3 seconds using a bitmap index scan and bitmap heap scan (and then
two hash joins as before).
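
The new index was along these lines (the name and the order of the trailing
columns are a reconstruction, not copied from the test script):

CREATE INDEX idx_metric_value_date_metric_asset
    ON metric_value (date, id_metric, id_asset);
-- Leading with date lets the half-year range predicate drive a bitmap index scan.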

I was also able to shave another 1.1 seconds off (down to 9.2 seconds) by
materialising the cartesian product of id_asset and id_metric, and joining to
metric_value, but I don't really understand why this helps. It's unfortunate
that this requires materialisation (using a subquery isn't enough) and takes
more time than it saves from the query (6 seconds), although it might be
partially reusable in our case.

CREATE TABLE cartesian AS
SELECT DISTINCT id_metric, id_asset FROM metric_value;

SELECT metric_pos.pos AS pos_metric, asset_pos.pos AS pos_asset, date, value
FROM cartesian
INNER JOIN metric_value ON metric_value.id_metric = cartesian.id_metric
                       AND metric_value.id_asset = cartesian.id_asset
INNER JOIN asset_pos ON asset_pos.id = metric_value.id_asset
INNER JOIN metric_pos ON metric_pos.id = metric_value.id_metric
WHERE date >= '2016-01-01' and date < '2016-06-01'
  AND timerange_transaction @> current_timestamp
ORDER BY metric_value.id_metric, metric_value.id_asset, date;


And I was able to shave another 3.7 seconds off (down to 5.6 seconds) by
making the only two columns of the cartesian table into its primary key,
although again I don't understand why:

alter table cartesian add primary key (id_metric, id_asset);



This uses merge joins instead, which supports the hypothesis that merge
joins could be faster than hash joins if only we can 

Re: [PERFORM] Inappropriate inner table for nested loop join

2017-06-26 Thread Albe Laurenz
Akihiko Odaki wrote:
> On 2017-06-23 20:20, Albe Laurenz wrote:
>> You could either try to do something like
>>
>> SELECT *
>> FROM (SELECT "posts".*
>>       FROM "posts"
>>       JOIN "follows" ON "follows"."target_account" = "posts"."account"
>>       WHERE "follows"."owner_account" = $1
>>       OFFSET 0) q
>> ORDER BY q."timestamp"
>> LIMIT 100;
> 
> Now I wonder whether it actually sorted or not. As you said, I want to
> "find rows with the greatest 'timestamp', match with rows from 'posts'
> in a nested loop and stop as soon as it has found 100 matches".
> 
> However, it seems to query 100 records without any consideration for
> "timestamp", and then sorts them. That is not expected. Here is an
> abstract query plan:
> 
>   Limit
>     ->  Sort
>           Sort Key: posts.id DESC
>           ->  Nested Loop
>                 ->  Seq Scan on follows
>                       Filter: (owner_account = $1)
>                 ->  Index Scan using index_posts_on_account on posts
>                       Index Cond: (account_id = follows.target_account)
> 
> index_posts_on_account is an obsolete index on "posts" covering only
> "account", so it does nothing to help with sorting by "timestamp".

That should be fine.

It fetches all rows from "follows" that match the condition,
then joins them with all matching rows from "posts", sorts the
result descending by "id" and returns the 100 rows with the largest
value for "id".

So you will get those 100 rows from "posts" with the largest "id"
that have a match in "follows" where the condition is fulfilled.

It is just a different plan for doing the same thing, and it happens to be
more efficient in your case.
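
If you want to check the plan yourself, one way to EXPLAIN a parameterized
query like this is to PREPARE it first (a sketch; the statement name is made
up and the parameter type is assumed to be bigint):

PREPARE feed(bigint) AS
SELECT *
FROM (SELECT "posts".*
      FROM "posts"
      JOIN "follows" ON "follows"."target_account" = "posts"."account"
      WHERE "follows"."owner_account" = $1
      OFFSET 0) q
ORDER BY q."timestamp"
LIMIT 100;

EXPLAIN (ANALYZE, BUFFERS) EXECUTE feed(1);

DEALLOCATE feed;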

Yours,
Laurenz Albe
