Hi all,

Tomas is absolutely right. The distribution I generated synthetically had 6M records for driver_id = 100, but they are all very old, with 9M newer records after them. As you can see, the scan had to skip 9M records before finding a suitable one using the time index.

EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM updates WHERE driver_id = 100 ORDER BY "time" DESC LIMIT 1;

                                   QUERY PLAN
--------------------------------------------------------------------------------
 Limit  (cost=0.44..0.65 rows=1 width=36) (actual time=3827.807..3827.807 rows=1 loops=1)
   Buffers: shared hit=24592 read=99594 written=659
   ->  Index Scan Backward using updates_time_idx on updates  (cost=0.44..1284780.53 rows=6064800 width=36) (actual time=3827.805..3827.805 rows=1 loops=1)
         Filter: (driver_id = 100)
         Rows Removed by Filter: 9000000
         Buffers: shared hit=24592 read=99594 written=659
 Planning time: 0.159 ms
 Execution time: 3827.846 ms
(8 rows)

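To make the mismatch in this plan concrete, here is a rough sketch in Python of the arithmetic behind it. All numbers are copied from the EXPLAIN output. The interpolation formula reflects my understanding of how the planner charges a LIMIT node a fraction of its child's run cost, and the function name limit_total_cost is just an illustrative helper, not a PostgreSQL API. The table size used for the "expected rows" figure is an assumption (the 9M skipped rows plus the estimated matching rows).

```python
# Figures copied from the EXPLAIN output.
index_scan_startup_cost = 0.44
index_scan_total_cost = 1284780.53
estimated_matching_rows = 6_064_800   # planner's row estimate for driver_id = 100
rows_removed_by_filter = 9_000_000    # rows actually skipped before the first match
limit_rows = 1


def limit_total_cost(startup, total, est_rows, limit):
    """Hypothetical helper: charge LIMIT the share of the child's run cost
    it expects to consume, assuming matches are spread evenly."""
    fraction = min(1.0, limit / est_rows)
    return startup + (total - startup) * fraction


estimated_limit_cost = limit_total_cost(
    index_scan_startup_cost,
    index_scan_total_cost,
    estimated_matching_rows,
    limit_rows,
)

# Assumption: the table holds only the estimated matching rows plus the
# 9M newer rows that the scan had to skip.
assumed_table_rows = estimated_matching_rows + rows_removed_by_filter

# With evenly spread matches, one match is expected every
# (table rows / matching rows) index entries.
expected_rows_scanned = assumed_table_rows / estimated_matching_rows * limit_rows
actual_rows_scanned = rows_removed_by_filter + limit_rows

print(f"estimated Limit cost: {estimated_limit_cost:.2f}")
print(f"rows expected to be read under uniformity: {expected_rows_scanned:.1f}")
print(f"rows actually read: {actual_rows_scanned}")
```

The estimated cost rounds to the 0.65 shown on the Limit node, which suggests the planner expected to read only a handful of index entries, while the scan actually read about nine million.
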
I ran my tests with default settings on 9.6, 9.5 and 9.3 and was able to reproduce the problem: 9.6 and 9.5 chose the wrong index, while 9.3 didn't. (Update: 9.5 didn't fail on the last run.)

Attachments:
  test_bad_index_choice.sql
  bad_idx_choice.9.6.out
  bad_idx_choice.9.5.out
  bad_idx_choice.9.3.out

However, when I tried adding more than one value with this strange distribution (~30% of the distribution assigned to one value), the bad index choice did not happen again in any of the versions.

I hope this helps.

Regards,
Daniel Blanch.

> On 10 Dec 2016, at 21:34, Tomas Vondra <[email protected]> wrote:
>
> Hi,
>
> On 12/10/2016 12:51 AM, Tom Lane wrote:
>> Eric Jiang <[email protected]> writes:
>>> I have a query that I *think* should use a multicolumn index, but
>>> sometimes isn't, resulting in slow queries.
>>
>> I tried to duplicate this behavior, without success. Are you running
>> with nondefault planner parameters?
>>
>
> My guess is this is a case of LIMIT the matching rows are uniformly
> distributed in the input data. The planner likely concludes that for a driver
> with a lot of data we'll find the first row using ix_updates_time very
> quickly, and that it will be cheaper than inspecting the larger multi-column
> index. But imagine a driver with a lots of data long time ago. That breaks
> the LIMIT fairly quickly.
>
> regards
>
> --
> Tomas Vondra http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>
>
> --
> Sent via pgsql-performance mailing list ([email protected])
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
