Re: Improving Performance of Query ~ Filter by A, Sort by B

2018-07-17 Thread Jeff Janes
On Mon, Jul 16, 2018 at 5:29 PM, Lincoln Swaine-Moore <
lswainemo...@gmail.com> wrote:

> Tom and Jeff,
>
> Thanks very much for the suggestions!
>
> Here's what I've found so far after playing around for a few more days:
>
> What is your default_statistics_target?  What can you tell us about the
>> distribution of parent_id?  (exponential, power law, etc?).  Can you show
>> the results for select * from pg_stats where tablename='a' and
>> attname='parent_id' \x\g\x  ?
>
>
> The default_statistics_target is 500, which I agree seems quite
> insufficient for these purposes. I bumped this up to 2000, and saw some
> improvement in the row count estimation, but pretty much the same query
> plans. Unfortunately the distribution of counts is not intended to be
> correlated to parent_id, which is one reason I imagine the histograms might
> not be particularly effective unless there's one bucket for every value.
> Here is the output you requested:
>
> select * from pg_stats where tablename='a' and attname='parent_id';
>
> schemaname | public
> tablename  | a
> attname| parent_id
> inherited  | t
> null_frac  | 0
> avg_width  | 4
> n_distinct | 18871
> most_common_vals   | {15503,49787,49786,24595,49784,17549, ...} (2000
> values)
> most_common_freqs  | {0.0252983,0.02435,0.0241317,
> 0.02329,0.019095,0.0103967,0.00758833,0.004245, ...} (2000 values)
>

You showed the 8 most common frequencies.  But could you also show the last
couple of them?  When your queried parent_id value is not on the MCV list,
it is the frequency of the least frequent one on the list which puts an
upper limit on how frequent the one you queried for can be.
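For reference, one quick way to pull just the tail of that list (a sketch; `most_common_freqs` is a `real[]` in pg_stats, so it can be indexed directly):

```sql
-- Show the least-common frequency on the MCV list for parent_id.
-- When a queried value is absent from the list, this frequency is the
-- planner's upper bound on how common that value can be.
SELECT most_common_freqs[array_length(most_common_freqs, 1)] AS least_mcv_freq
FROM pg_stats
WHERE tablename = 'a' AND attname = 'parent_id';
```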



> A few questions re: statistics:
>  1) would it be helpful to bump column statistics to, say, 20k (the number
> of distinct values of parent_id)?
>

Only one way to find out...
However, you can only go up to 10k, not 20k; 10000 is the maximum allowed
statistics target.
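For the record, the per-column knob looks like this (a sketch against the table names from this thread):

```sql
-- Raise the per-column statistics target (10000 is the hard maximum),
-- then re-analyze so the new target takes effect.
ALTER TABLE a ALTER COLUMN parent_id SET STATISTICS 10000;
ANALYZE a;
```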



>  2) is the discrepancy between the statistics on the parent and child
> table expected? Certainly I would think that the statistics would be
> different, but I would've imagined they would have histograms of the same
> size given the settings being the same.
>

Is the n_distinct estimate accurate for the partition?  There is an
algorithm (which will change in v11) to stop the MCV from filling the
entire statistics target size if it thinks adding more won't be useful.
I don't know why the histogram boundary list would be short, but I doubt
that is very important here: the histogram is only used for
inequality/range tests, not for equality/set membership.



>  3) is there a way to manually specify the distribution of rows to be
> even? That is, set the frequency of each value to be ~ n_rows/n_distinct.
> This isn't quite accurate, but is a reasonable assumption about the
> distribution, and might generate better query plans.
>


This would be going in the wrong direction.  Your queries seem to
preferentially use rare parent_ids, not typical parent_ids.  In fact, it
seems like many of your hard-coded parent_ids don't exist in the table at
all.  That certainly isn't going to help the planner any.  Could you
somehow remove those before constructing the query?

You might also take a step back, where is that list of parent_ids coming
from in the first place, and why couldn't you convert the list of literals
into a query that returns that list naturally?
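For example, if the ids come from some other table, the literal list could become a subquery (purely illustrative; `parents` and its `active` flag are made-up names):

```sql
-- Sketch: let the planner see where the ids come from, instead of
-- handing it a long literal IN-list.  "parents" and "active" are
-- hypothetical stand-ins for wherever the list originates.
SELECT a.id
FROM a
WHERE a.parent_id IN (SELECT p.id FROM parents p WHERE p.active)
ORDER BY a.tmstmp DESC
LIMIT 20;
```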


> You could try reversing the order and adding a column to be (tmstmp,
>> parent_id, id) and keeping the table well vacuumed.  This would allow the
>> slow plan to still walk the indexes in tmstmp order but do it as an
>> index-only scan, so it could omit the extra trip to the table. That trip to
>> the table must be awfully slow to explain the numbers you show later in the
>> thread.
>
>
> Just to clarify, do you mean building indexes like:
> CREATE INDEX "a_tmstmp_parent_id_id_idx_[PART_KEY]" on
> "a_partition[PART_KEY]" USING btree("tmstmp", "parent_id", "id")
> That seems promising! Is the intuition here that we want the first key of
> the index to be the one we are ultimately ordering by? Sounds like I may
> have had that flipped initially. My understanding of this whole situation
> (and please do correct me if this doesn't make sense) is that the big bottleneck
> here is reading pages from disk (when looking at stopped up queries, the
> wait_event is DataFileRead), and so anything that can be done to minimize
> the pages read will be valuable. Which is why I would love to get the query
> plan to use the tmstmp index without having to filter thereafter by
> parent_id.
>

Yes, that is the index.

You really want it to filter by parent_id in the index, rather than going
to the table to do the filter on parent_id.  The index pages with tmstmp as
the leading column are going to be more tightly packed with potentially
relevant rows, while the table pages are less likely to be densely packed.
So filtering in the index should mean reading far fewer pages.

Re: Improving Performance of Query ~ Filter by A, Sort by B

2018-07-16 Thread Lincoln Swaine-Moore
Tom and Jeff,

Thanks very much for the suggestions!

Here's what I've found so far after playing around for a few more days:

What is your default_statistics_target?  What can you tell us about the
> distribution of parent_id?  (exponential, power law, etc?).  Can you show
> the results for select * from pg_stats where tablename='a' and
> attname='parent_id' \x\g\x  ?


The default_statistics_target is 500, which I agree seems quite
insufficient for these purposes. I bumped this up to 2000, and saw some
improvement in the row count estimation, but pretty much the same query
plans. Unfortunately the distribution of counts is not intended to be
correlated to parent_id, which is one reason I imagine the histograms might
not be particularly effective unless there's one bucket for every value.
Here is the output you requested:

select * from pg_stats where tablename='a' and attname='parent_id';

schemaname | public
tablename  | a
attname| parent_id
inherited  | t
null_frac  | 0
avg_width  | 4
n_distinct | 18871
most_common_vals   | {15503,49787,49786,24595,49784,17549, ...} (2000
values)
most_common_freqs  |
{0.0252983,0.02435,0.0241317,0.02329,0.019095,0.0103967,0.00758833,0.004245,
...} (2000 values)
histogram_bounds   |
{2,12,17,24,28,36,47,59,74,80,86,98,108,121,135,141,147,160,169,177,190,204,
...} (2001 values)
correlation| -0.161576
most_common_elems  |
most_common_elem_freqs |
elem_count_histogram   |


Interestingly, the number of elements in these most_common_vals is as
expected (2000) for the parent table, but it's lower for the partition
tables, despite the statistics level being the same.

SELECT attstattarget
FROM   pg_attribute
WHERE  attrelid in ('a_partition1'::regclass, 'a'::regclass)
AND    attname = 'parent_id';
-[ RECORD 1 ]-+-
attstattarget | 2000
-[ RECORD 2 ]-+-
attstattarget | 2000


select * from pg_stats where tablename='a_partition1' and
attname='parent_id';

schemaname | public
tablename  | a_partition1
attname| parent_id
inherited  | f
null_frac  | 0
avg_width  | 4
n_distinct | 3969
most_common_vals   |
{15503,49787,49786,24595,49784,10451,20136,17604,9683, ...} (400-ish values)
most_common_freqs  |
{0.0807067,0.0769483,0.0749433,0.073565,0.0606433,0.0127917,0.011265,0.0112367,
...} (400-ish values)
histogram_bounds   | {5,24,27,27,33,38,41,69,74,74, ...} (1500-ish
values)
correlation| 0.402414
most_common_elems  |
most_common_elem_freqs |
elem_count_histogram   |

A few questions re: statistics:
 1) would it be helpful to bump column statistics to, say, 20k (the number
of distinct values of parent_id)?
 2) is the discrepancy between the statistics on the parent and child table
expected? Certainly I would think that the statistics would be
different, but I would've imagined they would have histograms of the same
size given the settings being the same.
 3) is there a way to manually specify the distribution of rows to be
even? That is, set the frequency of each value to be ~ n_rows/n_distinct.
This isn't quite accurate, but is a reasonable assumption about the
distribution, and might generate better query plans.

The same thing could be obtained by doing a dummy operation, such as ORDER
> BY tmstmp + '0 seconds' DESC.  I prefer that method, as it is more
> obviously a tuning trick.  Adding in "id" looks more like a legitimate
> desire to break any ties that might occasionally occur in tmstmp.


I 100% agree that that is more clear. Thanks for the suggestion!

Finally, could you rewrite it as a join to a VALUES list, rather than as an
> in-list?


I should've mentioned this in my initial post, but I tried doing this
without much success.

You could try reversing the order and adding a column to be (tmstmp,
> parent_id, id) and keeping the table well vacuumed.  This would allow the
> slow plan to still walk the indexes in tmstmp order but do it as an
> index-only scan, so it could omit the extra trip to the table. That trip to
> the table must be awfully slow to explain the numbers you show later in the
> thread.


Just to clarify, do you mean building indexes like:
CREATE INDEX "a_tmstmp_parent_id_id_idx_[PART_KEY]" on
"a_partition[PART_KEY]" USING btree("tmstmp", "parent_id", "id")
That seems promising! Is the intuition here that we want the first key of
the index to be the one we are ultimately ordering by? Sounds like I may
have had that flipped initially. My understanding of this whole situation
(and please do correct me if this doesn't make sense) is that the big bottleneck
here is reading pages from disk (when looking at stopped up queries, the
wait_event is DataFileRead), and so anything that can be done to minimize
the pages read will be valuable. Which is why I would love to get the query
plan to use the tmstmp index without having to filter thereafter by
parent_id.

Re: Improving Performance of Query ~ Filter by A, Sort by B

2018-07-14 Thread Jeff Janes
On Tue, Jul 10, 2018 at 11:07 AM, Lincoln Swaine-Moore <
lswainemo...@gmail.com> wrote:

>
>
>
> Something about the estimated row counts (this problem persisted after I
> tried ANALYZEing)
>

What is your default_statistics_target?  What can you tell us about the
distribution of parent_id?  (exponential, power law, etc?).  Can you show
the results for select * from pg_stats where tablename='a' and
attname='parent_id' \x\g\x  ?


> forces usage of the tmstmp index and Merge Append (which seems wise) but
> also a filter condition on parent_id over an index condition, which is
> apparently prohibitively slow.
>
> I tried creating a multicolumn index like:
>
> CREATE INDEX "a_partition1_parent_and_tmstmp_idx" on "a_partition2" USING
> btree ("parent_id", "tmstmp" DESC);
>
> But this didn't help (it wasn't used).
>

You could try reversing the order and adding a column to be (tmstmp,
parent_id, id) and keeping the table well vacuumed.  This would allow the
slow plan to still walk the indexes in tmstmp order but do it as an
index-only scan, so it could omit the extra trip to the table. That trip to
the table must be awfully slow to explain the numbers you show later in the
thread.

...


> This query plan (which is the same as when LIMIT is removed) has been a
> good short term solution when the number of "parent_id"s I'm using is still
> relatively small, but unfortunately queries grow untenably slow as the
> number of "parent_id"s involved increases:
>

What happens when you remove that extra order by phrase that you added?
The original slow plan should become much faster when the number of
parent_ids is large (it has to dig through fewer index entries before
accumulating 20 good ones), so you should try going back to that.

...


> I'd be very grateful for help with one or both of these questions:
> 1) Why is adding an unnecessary (from the perspective of result
> correctness) ORDER BY valuable for forcing the parent_id index usage, and
> does that indicate that there is something wrong with my
> table/indexes/statistics/etc.?
>

It finds the indexes on tmstmp to be falsely attractive, as it can walk in
tmstmp order and so avoid the sort. (Really not the sort itself, but the
fact that sort has to first read every row to be sorted, while walking an
index can abort once the LIMIT is satisfied).  Adding an extra phrase to
the ORDER BY means the index is no longer capable of delivering rows in the
needed order, so it no longer looks falsely attractive.  The same thing
could be obtained by doing a dummy operation, such as ORDER BY tmstmp + '0
seconds' DESC.  I prefer that method, as it is more obviously a tuning
trick.  Adding in "id" looks more like a legitimate desire to break any
ties that might occasionally occur in tmstmp.
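Applied to the query shape from this thread, that trick looks something like:

```sql
-- The expression tmstmp + '0 seconds' means the tmstmp index can no
-- longer deliver rows in the needed order, so it stops looking
-- falsely attractive to the planner.
SELECT a.id
FROM a
WHERE a.parent_id IN (49188, 14816, 14758, 8402)
ORDER BY a.tmstmp + '0 seconds' DESC
LIMIT 20;
```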

As Tom pointed out, there clearly is something wrong with your statistics,
although we don't know what is causing it to go wrong.  Fixing the
statistics isn't guaranteed to fix the problem, but it would be a good
start.




> 2) Is there any way I can improve my query time when there are many
> "parent_id"s involved? I seem to only be able to get the query plan to use
> at most one of the parent_id index and the tmstmp index at a time. Perhaps
> the correct multicolumn index would help?
>

A few things mentioned above might help.

But if they don't, is there any chance you could redesign your partitioning
so that all parent_id queries together will always be in the same
partition?  And if not, could you just get rid of the partitioning
altogether?  1e7 rows is not all that many and doesn't generally need
partitioning.  Unless it is serving a specific purpose, it is probably
costing you more than you are getting.

Finally, could you rewrite it as a join to a VALUES list, rather than as an
in-list?
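That is, something along these lines (a sketch using the ids quoted earlier in the thread):

```sql
-- Join against a VALUES list instead of an in-list; this gives the
-- planner a different set of plan shapes (and row estimates) to
-- consider for the same logical query.
SELECT a.id
FROM a
JOIN (VALUES (49188), (14816), (14758), (8402)) AS v(parent_id)
  ON a.parent_id = v.parent_id
ORDER BY a.tmstmp DESC
LIMIT 20;
```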

Cheers,

Jeff


Re: Improving Performance of Query ~ Filter by A, Sort by B

2018-07-11 Thread Tom Lane
Lincoln Swaine-Moore  writes:
> Here's the result (I turned off the timeout and got it to finish):
> ...

I think the core of the problem here is bad rowcount estimation.  We can't
tell from your output how many rows really match

> WHERE "a"."parent_id" IN (
> 49188,14816,14758,8402
> )

but the planner is guessing there are 5823 of them.  In the case with
only three IN items, we have

>  ->  Index Scan using a_parent_id_idx1 on a_partition1 a
>      (cost=0.43..4888.37 rows=4367 width=12) (actual time=5.581..36.270 rows=50 loops=1)
>      Index Cond: (parent_id = ANY ('{19948,21436,41220}'::integer[]))

so the planner thinks there are 4367 matching rows but there are only 50.
Anytime you've got a factor-of-100 estimation error, you're going to be
really lucky if you get a decent plan.

I suggest increasing the statistics target for the parent_id column
in hopes of getting better estimates for the number of matches.

regards, tom lane



Re: Improving Performance of Query ~ Filter by A, Sort by B

2018-07-11 Thread legrand legrand
Hello,

I have tested it with release 11 and limit 20 is pushed to each partition
when using index on tmstmp.

Could you tell us the result of your query applied to one partition?

EXPLAIN ANALYZE
SELECT "a"."id"
FROM  a_partition1 "a"
WHERE "a"."parent_id" IN (
34226,24506,40987,27162
)
ORDER BY "a"."tmstmp" DESC
LIMIT 20;

Maybe limit 20 is not pushed to partitions in your version?
Regards
PAscal




