Herp, forgot to include the query:

SELECT * FROM SyncerEvent
WHERE ID > 12468
  AND propogatorId NOT IN ('"d8130ab9!-66d0!-4f13!-acec!-a9556362f0ad"')
  AND conflicted != 1
  AND userId = '57dc984f1c87461c0967e228'
ORDER BY ID
LIMIT 4000;
On Tue, Sep 27, 2016 at 5:02 PM, Jake Nielsen <jake.k.niel...@gmail.com> wrote:
> I've got a query that takes a surprisingly long time to run, and I'm
> having a really rough time trying to figure it out.
>
> Before I get started, here are the specifics of the situation.
>
> Here is the table that I'm working with (apologies for the spammy indices,
> I've been throwing shit at the wall):
>
>                            Table "public.syncerevent"
>     Column    |  Type   |                         Modifiers
>  --------------+---------+-----------------------------------------------------------
>  id           | bigint  | not null default nextval('syncerevent_id_seq'::regclass)
>  userid       | text    |
>  event        | text    |
>  eventid      | text    |
>  originatorid | text    |
>  propogatorid | text    |
>  kwargs       | text    |
>  conflicted   | integer |
> Indexes:
>     "syncerevent_pkey" PRIMARY KEY, btree (id)
>     "syncereventidindex" UNIQUE, btree (eventid)
>     "anothersyncereventidindex" btree (userid)
>     "anothersyncereventidindexwithascending" btree (userid, id)
>     "asdfasdgasdf" btree (userid, id DESC)
>     "syncereventuseridhashindex" hash (userid)
>
> To provide some context, as per the wiki: there are 3,290,600 rows in this
> table. It gets added to frequently, but never deleted from. The "kwargs"
> column often contains mid-size JSON strings (roughly 30K characters on
> average). As of right now, the table has 53 users in it. About 20% of those
> have a negligible number of events, but the rest of the users have a fairly
> even smattering.
>
> EXPLAIN (ANALYZE, BUFFERS) says:
>
>                                                            QUERY PLAN
> ------------------------------------------------------------------------------------------------------------------------------------------
>  Limit  (cost=0.43..1218.57 rows=4000 width=615) (actual time=3352.390..3403.572 rows=4000 loops=1)
>    Buffers: shared hit=120244 read=160198
>    ->  Index Scan using syncerevent_pkey on syncerevent  (cost=0.43..388147.29 rows=1274560 width=615) (actual time=3352.386..3383.100 rows=4000 loops=1)
>          Index Cond: (id > 12468)
>          Filter: ((propogatorid <> '"d8130ab9!-66d0!-4f13!-acec!-a9556362f0ad"'::text) AND (conflicted <> 1) AND (userid = '57dc984f1c87461c0967e228'::text))
>          Rows Removed by Filter: 1685801
>          Buffers: shared hit=120244 read=160198
>  Planning time: 0.833 ms
>  Execution time: 3407.633 ms
> (9 rows)
>
> The Postgres version is: PostgreSQL 9.5.2 on x86_64-pc-linux-gnu, compiled
> by gcc (GCC) 4.8.2 20140120 (Red Hat 4.8.2-16), 64-bit.
>
> This query has gotten slower over time.
>
> The Postgres server is running on a db.m3.medium RDS instance on Amazon
> (3.75 GB of RAM, ~3 GHz processor, single core).
>
> I ran VACUUM and ANALYZEd this table just prior to running the EXPLAIN
> command.
> Here are the server settings:
>
>               name               |               current_setting                |       source
> ---------------------------------+----------------------------------------------+--------------------
>  application_name                | psql                                         | client
>  archive_command                 | /etc/rds/dbbin/pgscripts/rds_wal_archive %p  | configuration file
>  archive_mode                    | on                                           | configuration file
>  archive_timeout                 | 5min                                         | configuration file
>  autovacuum_analyze_scale_factor | 0.05                                         | configuration file
>  autovacuum_naptime              | 30s                                          | configuration file
>  autovacuum_vacuum_scale_factor  | 0.1                                          | configuration file
>  checkpoint_completion_target    | 0.9                                          | configuration file
>  client_encoding                 | UTF8                                         | client
>  effective_cache_size            | 1818912kB                                    | configuration file
>  fsync                           | on                                           | configuration file
>  full_page_writes                | on                                           | configuration file
>  hot_standby                     | off                                          | configuration file
>  listen_addresses                | *                                            | command line
>  lo_compat_privileges            | off                                          | configuration file
>  log_checkpoints                 | on                                           | configuration file
>  log_directory                   | /rdsdbdata/log/error                         |
>
> Sorry for the formatting, I'm not sure of the best way to format this data
> on a mailing list.
>
> If it matters/interests you, here is my underlying confusion:
>
> From some internet sleuthing, I've decided that having a table per user
> (which would totally make this problem a non-issue) isn't a great idea,
> because there is a file per table, so having a table per user would not
> scale. My next thought was partial indexes (which would also totally
> help), but since there is also a file per index, this really doesn't
> side-step the problem. My rough mental model says: if there exists a way
> that a table-per-user scheme would make this more efficient, then there
> should also exist an index that could achieve the same effect (or close
> enough not to matter). I would think that "userid =
> '57dc984f1c87461c0967e228'" could utilize at least one of the two indexes
> on the userid column, but clearly I'm not understanding something.
>
> Any help in making this query more efficient would be greatly appreciated,
> and any conceptual insights would be extra awesome.
>
> Thanks for reading.
>
> -Jake
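To make the partial-index idea from the quoted message concrete, the sort of per-user partial index I was picturing would look roughly like the sketch below. This is only a sketch: the index name is made up, it covers only the one user from the query above, and needing one of these per user is exactly the file-per-index concern mentioned in the quote.

-- Sketch only: a per-user partial index ordered by id, so a scan could
-- start right after id = 12468 for this user and stop once it has 4000 rows.
CREATE INDEX syncerevent_57dc984f_id_idx
    ON syncerevent (id)
    WHERE userid = '57dc984f1c87461c0967e228';

In principle the existing btree on (userid, id) should give the same access path without needing an index per user, which is the part I don't understand.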