On Friday, December 9, 2016 at 18:00:16, Eric Jiang <e...@doublemap.com> wrote:
Hi all,
 I have a query that I *think* should use a multicolumn index, but
 sometimes doesn't, resulting in slow queries.

 We have a DB that records GPS coordinates for vehicles:

                                Table "public.updates"
    Column   |           Type           |                      Modifiers
 
------------+--------------------------+------------------------------------------------------
  id         | integer                  | not null default
 nextval('updates_id_seq'::regclass)
  driver_id  | integer                  | not null
  latitude   | double precision         | not null
  longitude  | double precision         | not null
  time       | timestamp with time zone | not null default now()
  vehicle_id | integer                  |
 Indexes:
     "updates_pkey" PRIMARY KEY, btree (id)
     "ix_updates_time" btree ("time")
     "updates_driver_id_time_idx" btree (driver_id, "time")
     "updates_vehicle_id_time_idx" btree (vehicle_id, "time")

 The table has about 15M records across about 100 distinct driver_id values.

 I want to get the last record for a specific driver:

 SELECT * FROM updates WHERE driver_id=123 ORDER BY "time" DESC LIMIT 1;

 For some values of driver_id, it does what I expect and uses
 updates_driver_id_time_idx to fetch the records in 2 ms or less. For
 other values of driver_id, it does an index scan backwards on
 ix_updates_time, taking upwards of 2 minutes.

 Good plan:

  Limit  (cost=0.11..1.38 rows=1 width=56) (actual time=2.710..2.710 rows=1 loops=1)
    ->  Index Scan Backward using updates_driver_id_time_idx on updates  (cost=0.11..139278.28 rows=110051 width=56) (actual time=2.709..2.709 rows=1 loops=1)
          Index Cond: (driver_id = 17127)
  Total runtime: 2.732 ms
 (4 rows)

 Bad plan:

  Limit  (cost=0.09..0.69 rows=1 width=56) (actual time=216769.111..216769.112 rows=1 loops=1)
    ->  Index Scan Backward using ix_updates_time on updates  (cost=0.09..272339.04 rows=448679 width=56) (actual time=216769.110..216769.110 rows=1 loops=1)
          Filter: (driver_id = 30132)
          Rows Removed by Filter: 5132087
  Total runtime: 216769.174 ms
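
 For reference, the planner's row estimate for this driver (448679 rows) is
 what makes the backward scan on ix_updates_time look cheap; the statistics
 behind that estimate can be inspected with something like:

 -- Inspect the planner's per-value frequency estimates for driver_id;
 -- most_common_vals/most_common_freqs drive the row-count estimate above.
 SELECT n_distinct, most_common_vals, most_common_freqs
 FROM pg_stats
 WHERE tablename = 'updates' AND attname = 'driver_id';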


 From cursory testing, the choice of plan seems to depend on how many
 total rows there are for a particular driver. The query above uses
 updates_driver_id_time_idx for drivers with fewer than about 300K rows,
 but switches to ix_updates_time for drivers with more than about 300K rows.
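
 One possible workaround (a sketch, untested against this data set): since
 driver_id is pinned by the equality condition, it can be added to the
 ORDER BY without changing the result, which lets the planner satisfy the
 ordering directly from updates_driver_id_time_idx:

 -- Ordering by both index columns steers the planner toward the
 -- composite index and away from a backward scan of ix_updates_time.
 SELECT *
 FROM updates
 WHERE driver_id = 123
 ORDER BY driver_id DESC, "time" DESC
 LIMIT 1;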

 Anything we can do to make it do the "right" thing? We are also
 considering denormalizing the data and keeping a "cache" of the same
 data in another table.
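
 A minimal sketch of such a cache table (names are illustrative, and the
 ON CONFLICT upsert requires 9.5+, so it fits the 9.5.3 instance only):

 -- One row per driver, holding the most recent position.
 CREATE TABLE latest_updates (
     driver_id  integer PRIMARY KEY,
     latitude   double precision NOT NULL,
     longitude  double precision NOT NULL,
     "time"     timestamp with time zone NOT NULL
 );

 -- Keep the cache current on every insert into updates.
 CREATE FUNCTION refresh_latest_update() RETURNS trigger AS $$
 BEGIN
     INSERT INTO latest_updates AS lu (driver_id, latitude, longitude, "time")
     VALUES (NEW.driver_id, NEW.latitude, NEW.longitude, NEW."time")
     ON CONFLICT (driver_id) DO UPDATE
         SET latitude = EXCLUDED.latitude,
             longitude = EXCLUDED.longitude,
             "time"    = EXCLUDED."time"
         WHERE lu."time" <= EXCLUDED."time";  -- ignore out-of-order points
     RETURN NEW;
 END;
 $$ LANGUAGE plpgsql;

 CREATE TRIGGER updates_refresh_latest
     AFTER INSERT ON updates
     FOR EACH ROW EXECUTE PROCEDURE refresh_latest_update();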

 PostgreSQL versions: 9.3.14 and 9.5.3; we have already tried VACUUM ANALYZE.

 Thanks,
 Eric
 
You should have an index like this:
 
CREATE INDEX updates_driver_time_idx ON updates (driver_id, "time" DESC);
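
Then check that the planner actually picks it for one of the slow drivers, e.g.:

-- driver_id 30132 was the slow case in the report above.
EXPLAIN ANALYZE
SELECT * FROM updates WHERE driver_id = 30132 ORDER BY "time" DESC LIMIT 1;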
 
-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com
www.visena.com