Re: [PERFORM] Querying with multicolumn index

Daniel Blanch Bataller Sat, 10 Dec 2016 22:05:35 -0800

Hi all,

Thomas is absolutely right, the distribution I synthetically made, had 6M 
records but very old, 9M old, as you can see it had to skip 9M records before 
finding a suitable record using time index.


EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM updates WHERE driver_id = 100 ORDER BY 
"time" DESC LIMIT 1;
                                                                         QUERY 
PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.44..0.65 rows=1 width=36) (actual time=3827.807..3827.807 
rows=1 loops=1)
   Buffers: shared hit=24592 read=99594 written=659
   ->  Index Scan Backward using updates_time_idx on updates  
(cost=0.44..1284780.53 rows=6064800 width=36) (actual time=3827.805..3827.805 
rows=1 loops=1)
         Filter: (driver_id = 100)
         Rows Removed by Filter: 9000000
         Buffers: shared hit=24592 read=99594 written=659
 Planning time: 0.159 ms
 Execution time: 3827.846 ms
(8 rows)


Here you have my tests where I was able to reproduce the problem using default 
settings on 9.6, 9.5 and 9.3. 9.6 and 9.5 choose the wrong index, while 9.3 
didn’t. (update: 9.5 didn’t fail last time)

test_bad_index_choice.sql
Description: Binary data

bad_idx_choice.9.6.out
Description: Binary data

bad_idx_choice.9.5.out
Description: Binary data

bad_idx_choice.9.3.out
Description: Binary data


However when I tried to add more than one value with this strange distribution 
~ 30% of distribution to one value the index bad choice problem didn’t happen 
again in none of the different versions.

I Hope this helps. Regards,

Daniel Blanch.


> El 10 dic 2016, a las 21:34, Tomas Vondra <[email protected]> 
> escribió:
> 
> Hi,
> 
> On 12/10/2016 12:51 AM, Tom Lane wrote:
>> Eric Jiang <[email protected]> writes:
>>> I have a query that I *think* should use a multicolumn index, but
>>> sometimes isn't, resulting in slow queries.
>> 
>> I tried to duplicate this behavior, without success.  Are you running
>> with nondefault planner parameters?
>> 
> 
> My guess is this is a case of LIMIT the matching rows are uniformly 
> distributed in the input data. The planner likely concludes that for a driver 
> with a lot of data we'll find the first row using ix_updates_time very 
> quickly, and that it will be cheaper than inspecting the larger multi-column 
> index. But imagine a driver with a lots of data long time ago. That breaks 
> the LIMIT fairly quickly.
> 
> regards
> 
> -- 
> Tomas Vondra                  http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
> 
> 
> -- 
> Sent via pgsql-performance mailing list ([email protected])
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance

-- 
Sent via pgsql-performance mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] Querying with multicolumn index

Reply via email to