Herp, forgot to include the query:

SELECT * FROM SyncerEvent
WHERE ID > 12468
  AND propogatorId NOT IN ('"d8130ab9!-66d0!-4f13!-acec!-a9556362f0ad"')
  AND conflicted != 1
  AND userId = '57dc984f1c87461c0967e228'
ORDER BY ID
LIMIT 4000;
On Tue, Sep 27, 2016 at 5:02 PM, Jake Nielsen <jake.k.niel...@gmail.com> wrote:
> I've got a query that takes a surprisingly long time to run, and I'm
> having a really rough time trying to figure it out.
>
> Before I get started, here are the specifics of the situation.
>
> Here is the table that I'm working with (apologies for the spammy indices,
> I've been throwing shit at the wall):
>
>                            Table "public.syncerevent"
>     Column    |  Type   |                         Modifiers
>  --------------+---------+-----------------------------------------------------------
>  id           | bigint  | not null default nextval('syncerevent_id_seq'::regclass)
>  userid       | text    |
>  event        | text    |
>  eventid      | text    |
>  originatorid | text    |
>  propogatorid | text    |
>  kwargs       | text    |
>  conflicted   | integer |
> Indexes:
>     "syncerevent_pkey" PRIMARY KEY, btree (id)
>     "syncereventidindex" UNIQUE, btree (eventid)
>     "anothersyncereventidindex" btree (userid)
>     "anothersyncereventidindexwithascending" btree (userid, id)
>     "asdfasdgasdf" btree (userid, id DESC)
>     "syncereventuseridhashindex" hash (userid)
>
> To provide some context, as per the wiki: there are 3,290,600 rows in this
> table. It gets added to frequently, but never deleted from. The "kwargs"
> column often contains mid-size JSON strings (roughly 30K characters on
> average). As of right now, the table has 53 users in it. About 20% of those
> have a negligible number of events, but the rest of the users have a fairly
> even smattering.
>
> EXPLAIN (ANALYZE, BUFFERS) says:
>
>                                                            QUERY PLAN
> ------------------------------------------------------------------------------------------------------------------------------------------
>  Limit  (cost=0.43..1218.57 rows=4000 width=615) (actual time=3352.390..3403.572 rows=4000 loops=1)
>    Buffers: shared hit=120244 read=160198
>    ->  Index Scan using syncerevent_pkey on syncerevent  (cost=0.43..388147.29 rows=1274560 width=615) (actual time=3352.386..3383.100 rows=4000 loops=1)
>          Index Cond: (id > 12468)
>          Filter: ((propogatorid <> '"d8130ab9!-66d0!-4f13!-acec!-a9556362f0ad"'::text) AND (conflicted <> 1) AND (userid = '57dc984f1c87461c0967e228'::text))
>          Rows Removed by Filter: 1685801
>          Buffers: shared hit=120244 read=160198
>  Planning time: 0.833 ms
>  Execution time: 3407.633 ms
> (9 rows)
>
> The Postgres version is: PostgreSQL 9.5.2 on x86_64-pc-linux-gnu, compiled
> by gcc (GCC) 4.8.2 20140120 (Red Hat 4.8.2-16), 64-bit.
>
> This query has gotten slower over time.
>
> The Postgres server is running on a db.m3.medium RDS instance on Amazon
> (3.75 GB of RAM, ~3 GHz processor, single core).
>
> I ran VACUUM and ANALYZEd this table just prior to running the EXPLAIN
> command.
> Here are the server settings:
>
>               name               |               current_setting                |       source
> ---------------------------------+----------------------------------------------+--------------------
>  application_name                | psql                                         | client
>  archive_command                 | /etc/rds/dbbin/pgscripts/rds_wal_archive %p  | configuration file
>  archive_mode                    | on                                           | configuration file
>  archive_timeout                 | 5min                                         | configuration file
>  autovacuum_analyze_scale_factor | 0.05                                         | configuration file
>  autovacuum_naptime              | 30s                                          | configuration file
>  autovacuum_vacuum_scale_factor  | 0.1                                          | configuration file
>  checkpoint_completion_target    | 0.9                                          | configuration file
>  client_encoding                 | UTF8                                         | client
>  effective_cache_size            | 1818912kB                                    | configuration file
>  fsync                           | on                                           | configuration file
>  full_page_writes                | on                                           | configuration file
>  hot_standby                     | off                                          | configuration file
>  listen_addresses                | *                                            | command line
>  lo_compat_privileges            | off                                          | configuration file
>  log_checkpoints                 | on                                           | configuration file
>  log_directory                   | /rdsdbdata/log/error                         |
>
> Sorry for the formatting, I'm not sure of the best way to format this data
> on a mailing list.
>
> If it matters/interests you, here is my underlying confusion:
>
> From some internet sleuthing, I've decided that having a table per user
> (which would totally make this problem a non-issue) isn't a great idea,
> because there is a file per table, so having a table per user would not
> scale. My next thought was partial indexes (which would also totally
> help), but since there is also a file per index, this really doesn't
> side-step the problem. My rough mental model says: if there exists a way
> that a table-per-user scheme would make this more efficient, then there
> should also exist an index that could achieve the same effect (or close
> enough not to matter). I would think that "userid =
> '57dc984f1c87461c0967e228'" could utilize at least one of the two indexes
> on the userid column, but clearly I'm not understanding something.
>
> Any help in making this query more efficient would be greatly appreciated,
> and any conceptual insights would be extra awesome.
>
> Thanks for reading.
>
> -Jake
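To make the partial-index idea from the quoted message concrete, the sort of per-user partial index I was picturing would look roughly like the sketch below. This is only a sketch: the index name is made up, it covers only the one user from the query above, and needing one of these per user is exactly the file-per-index concern mentioned in the quote.

-- Sketch only: a per-user partial index ordered by id, so a scan could
-- start right after id = 12468 for this user and stop once it has 4000 rows.
CREATE INDEX syncerevent_57dc984f_id_idx
    ON syncerevent (id)
    WHERE userid = '57dc984f1c87461c0967e228';

In principle the existing btree on (userid, id) should give the same access path without needing an index per user, which is the part I don't understand.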