[PERFORM] Merge joins on index scans
Dear psql-performance, I'm having issues with a certain query, and I was hoping you could help me out. The schema: (start with new database cluster, with either SQL_ASCII or en.us-UTF8 encoding, using the default server configuration available in the pgdg Jessie packages). CREATE TABLE a (id bigint primary key, nonce bigint); CREATE TABLE b (id bigint primary key, a_id bigint not null); CREATE INDEX a_idx ON b (a_id); The query: SELECT b.* FROM b JOIN a ON b.a_id = a.id WHERE a.nonce = ? ORDER BY b.id ASC; (skip down to [1] and [2] to see the query performance) What I know: If you force the query planner to use a merge join on the above query, it takes 10+ minutes to complete using the data as per below. If you force the query planner to use a hash join on the same data, it takes ~200 milliseconds. This behavior is the same both on Postgresql 9.2.15 and Postgresql 9.4.6 (at least as provided by the Debian Jessie repo hosted by postgresql.org), and happens both on i386 and x86_64 platforms. Note: the data for these queries is the same as described below. Let me know if you want me to provide the raw csv's or similar. Creating a new Postgresql 9.4 cluster (64-bit), creating the tables (a) and (b), importing the table data into the tables (a) and (b), and then running the above query using EXPLAIN results in a merge join query plan, as in [1]. Creating a new Postgresql 9.2 cluster (32-bit or 64-bit), creating the tables (a) and (b), importing the table data into (a) and (b), and then running the above query results in a hash join query plan, as in [2]. When running query [1], the postgres process on the machine consumes 100% CPU for a long time (it seems CPU-bound). What I expected: I expected both of the hash join and merge join implementations of this query to have comparable query times; perhaps within an order of magnitude. This was expected on my part mostly because the cost metrics for each query were very similar. Instead, the "optimal" query plan for the query takes more than 1000x longer. I also expected that the "Rows Removed by Filter: " for the index scan on (a) would not have such a high count, as the number of rows in table (a) (~500,000) is significantly less than the count (2,201,063,696). What I want to know: - Is this expected behavior? Can you describe how the merge join algorithm achieves these results? - Can I avoid this issue by disabling merge joins in the server configuration? Configuration: The configuration of the database is the sample configuration as per the Debian Jessie packages of Postgresql available at http://apt.postgresql.org/pub/repos/apt/ with the exception that the data directory was explicitly specified. Information about the data: Here are some queries that help describe the data I'm working with: postgres=# select distinct a_id, count(*) from b group by a_id; a_id | count +--- 49872 | 320 47994 | 5 19084 | 82977 53251 | 100 109804 |10 51738 | 5 49077 |10 postgres=# select count(*) from b; count --- 83427 postgres=# select count(distinct nonce) from a; count --- 198 postgres=# select count(*) from a; count 490166 postgres=# select count(*) from a where nonce = 64; count --- 395 Hardware: 2015-era Intel Xeon processors > 300 GB of ram (greater than the size of the database with a large margin) database on hardware raid 1 array on 2 SSDs Commentary: This is my first bug report to a major open source project, so I apologize in advance if I messed up this report. Let me know if I have left out key details -- I'm happy to provide them. Given that there are roughly 500k rows in table a, and given that the EXPLAIN output claims that the filter (nonce = 64) caused 2 billion rows to be skipped (refer to [1]) suggests that each row in table b is being compared to a non-negligible number of rows in (a). I can probably make this data available as a pg_dump file. Let me know if you think that's necessary, and where I should upload it. Regards, James [1] postgres=# explain (analyze,buffers) select b.* from b join a on b.a_id = a.id where a.nonce = 64 order by b.id asc; QUERY PLAN - Sort (cost=7855.25..7855.61 rows=143 width=16) (actual time=752058.415..752080.731 rows=83427 loops=1) Sort Key: b.id Sort Method: external merge Disk: 2096kB Buffers: shared hit=2151025721 read=479, temp read=267 written=267 I/O Timings: read=2.384 -> Merge Join (cost=869.07..7850.13 rows=143 width=16) (actual time=5.718..751760.637 rows=83427 loops=1) Merge Cond: (b.a_id = a.id) Buffers: shared hit=2151025721 read=479 I/O Timings: read=2.384 -> Index Scan using a_idx on b (cost=0.00..2953.35 rows=83427 width=16) (actual time=0.007..68.165
Re: [PERFORM] Odd behavior with indices
On Fri, Feb 26, 2016 at 1:38 PM, joe meiringwrote: > Here's the distribution of parameter_id's > > select count(parameter_id), parameter_id from datavalue group by parameter_id > 88169 142889171 815805 178570124257262 213947049 151225902 24091090 > 3103877 10633764 11994442 1849232 2014935 4563638 132955919 7 > > Ok...again its beyond my present experience but its what the planner thinks about the distribution, and not what actually is present, that matters. David J.
Re: [PERFORM] Odd behavior with indices
Here's the distribution of parameter_id's select count(parameter_id), parameter_id from datavalue group by parameter_id 88169 142889171 815805 178570124257262 213947049 151225902 24091090 3103877 10633764 11994442 1849232 2014935 4563638 132955919 7 On Fri, Feb 26, 2016 at 2:02 PM, David G. Johnston < david.g.johns...@gmail.com> wrote: > On Fri, Feb 26, 2016 at 12:43 PM, joe meiring> wrote: > >> Also available on S.O.: >> >> >> http://stackoverflow.com/questions/35658238/postgres-odd-behavior-with-indices >> >> I've got a datavalue table with ~200M rows or so, with indices on both >> site_id and parameter_id. I need to execute queries like "return all >> sites with data" and "return all parameters with data". The site table >> has only 200 rows or so, and the parameter table has only 100 or so rows. >> >> The site query is fast and uses the index: >> >> EXPLAIN ANALYZEselect *from sitewhere exists ( >> select 1 from datavalue >> where datavalue.site_id = site.id limit 1); >> >> Seq Scan on site (cost=0.00..64.47 rows=64 width=113) (actual >> time=0.046..1.106 rows=89 loops=1) >> Filter: (SubPlan 1) >> Rows Removed by Filter: 39 >> SubPlan 1 >> -> Limit (cost=0.44..0.47 rows=1 width=0) (actual time=0.008..0.008 >> rows=1 loops=128) >> -> Index Only Scan using ix_datavalue_site_id on datavalue >> (cost=0.44..8142.71 rows=248930 width=0) (actual time=0.008..0.008 rows=1 >> loops=128) >> Index Cond: (site_id = site.id) >> Heap Fetches: 0 >> Planning time: 0.361 ms >> Execution time: 1.149 ms >> >> The same query for parameters is rather slow and does NOT use the index: >> >> EXPLAIN ANALYZEselect *from parameterwhere exists ( >> select 1 from datavalue >> where datavalue.parameter_id = parameter.id limit 1); >> >> Seq Scan on parameter (cost=0.00..20.50 rows=15 width=2648) (actual >> time=2895.972..21331.701 rows=15 loops=1) >> Filter: (SubPlan 1) >> Rows Removed by Filter: 6 >> SubPlan 1 >> -> Limit (cost=0.00..0.34 rows=1 width=0) (actual >> time=1015.790..1015.790 rows=1 loops=21) >> -> Seq Scan on datavalue (cost=0.00..502127.10 rows=1476987 >> width=0) (actual time=1015.786..1015.786 rows=1 loops=21) >> Filter: (parameter_id = parameter.id) >> Rows Removed by Filter: 7739355 >> Planning time: 0.123 ms >> Execution time: 21331.736 ms >> >> What the deuce is going on here? Alternatively, whats a good way to do >> this? >> >> Any help/guidance appreciated! >> >> >> >> Some of the table description: >> >> \d datavalue >> >> id BIGINT DEFAULT nextval('datavalue_id_seq'::regclass) NOT NULL, >> value DOUBLE PRECISION NOT NULL, >> site_id INTEGER NOT NULL, >> parameter_id INTEGER NOT NULL, >> deployment_id INTEGER, >> instrument_id INTEGER, >> invalid BOOLEAN, >> Indexes: >> "datavalue_pkey" PRIMARY KEY, btree (id) >> "datavalue_datetime_utc_site_id_parameter_id_instrument_id_key" UNIQUE >> CONSTRAINT, btree (datetime_utc, site_id, parameter_id, instrument_id) >> "ix_datavalue_instrument_id" btree (instrument_id) >> "ix_datavalue_parameter_id" btree (parameter_id) >> "ix_datavalue_site_id" btree (site_id) >> "tmp_idx" btree (site_id, datetime_utc) >> Foreign-key constraints: >> "datavalue_instrument_id_fkey" FOREIGN KEY (instrument_id) REFERENCES >> instrument(id) ON UPDATE CASCADE ON DELETE CASCADE >> "datavalue_parameter_id_fkey" FOREIGN KEY (parameter_id) REFERENCES >> parameter(id) ON UPDATE CASCADE ON DELETE CASCADE >> "datavalue_site_id_fkey" FOREIGN KEY (site_id) REFERENCES >> coastal.site(id) ON UPDATE CASCADE ON DELETE CASCADE >> "datavalue_statistic_type_id_fkey" >> >> >> I'm not great with the details but the short answer - aside from the > fact that you should consider increasing the statistics on these columns - > is that at a certain point querying the index and then subsequently > checking the table for visibility is more expensive than simply scanning > and then discarding the extra rows. > > The fact that you could perform an INDEX ONLY scan in the first query > makes that cost go away since no subsequent heap check is required. In the > parameters query the planner thinks it needs 1.5 million of the rows and > will have to check each of them for visibility. It decided that scanning > the entire table was more efficient. > > The LIMIT 1 in both queries should not be necessary. The planner is smart > enough to stop once it finds what it is looking for. In fact the LIMIT's > presence may be a contributing factor...but I cannot say for sure. > > A better query seems like it would be: > > WITH active_sites AS ( > SELECT DISTINCT site_id FROM datavalues; > ) > SELECT * > FROM sites > JOIN active_sites USING (site_id); > > David J. >
Re: [PERFORM] Odd behavior with indices
On Fri, Feb 26, 2016 at 12:43 PM, joe meiringwrote: > Also available on S.O.: > > > http://stackoverflow.com/questions/35658238/postgres-odd-behavior-with-indices > > I've got a datavalue table with ~200M rows or so, with indices on both > site_id and parameter_id. I need to execute queries like "return all > sites with data" and "return all parameters with data". The site table > has only 200 rows or so, and the parameter table has only 100 or so rows. > > The site query is fast and uses the index: > > EXPLAIN ANALYZEselect *from sitewhere exists ( > select 1 from datavalue > where datavalue.site_id = site.id limit 1); > > Seq Scan on site (cost=0.00..64.47 rows=64 width=113) (actual > time=0.046..1.106 rows=89 loops=1) > Filter: (SubPlan 1) > Rows Removed by Filter: 39 > SubPlan 1 > -> Limit (cost=0.44..0.47 rows=1 width=0) (actual time=0.008..0.008 > rows=1 loops=128) > -> Index Only Scan using ix_datavalue_site_id on datavalue > (cost=0.44..8142.71 rows=248930 width=0) (actual time=0.008..0.008 rows=1 > loops=128) > Index Cond: (site_id = site.id) > Heap Fetches: 0 > Planning time: 0.361 ms > Execution time: 1.149 ms > > The same query for parameters is rather slow and does NOT use the index: > > EXPLAIN ANALYZEselect *from parameterwhere exists ( > select 1 from datavalue > where datavalue.parameter_id = parameter.id limit 1); > > Seq Scan on parameter (cost=0.00..20.50 rows=15 width=2648) (actual > time=2895.972..21331.701 rows=15 loops=1) > Filter: (SubPlan 1) > Rows Removed by Filter: 6 > SubPlan 1 > -> Limit (cost=0.00..0.34 rows=1 width=0) (actual > time=1015.790..1015.790 rows=1 loops=21) > -> Seq Scan on datavalue (cost=0.00..502127.10 rows=1476987 > width=0) (actual time=1015.786..1015.786 rows=1 loops=21) > Filter: (parameter_id = parameter.id) > Rows Removed by Filter: 7739355 > Planning time: 0.123 ms > Execution time: 21331.736 ms > > What the deuce is going on here? Alternatively, whats a good way to do > this? > > Any help/guidance appreciated! > > > > Some of the table description: > > \d datavalue > > id BIGINT DEFAULT nextval('datavalue_id_seq'::regclass) NOT NULL, > value DOUBLE PRECISION NOT NULL, > site_id INTEGER NOT NULL, > parameter_id INTEGER NOT NULL, > deployment_id INTEGER, > instrument_id INTEGER, > invalid BOOLEAN, > Indexes: > "datavalue_pkey" PRIMARY KEY, btree (id) > "datavalue_datetime_utc_site_id_parameter_id_instrument_id_key" UNIQUE > CONSTRAINT, btree (datetime_utc, site_id, parameter_id, instrument_id) > "ix_datavalue_instrument_id" btree (instrument_id) > "ix_datavalue_parameter_id" btree (parameter_id) > "ix_datavalue_site_id" btree (site_id) > "tmp_idx" btree (site_id, datetime_utc) > Foreign-key constraints: > "datavalue_instrument_id_fkey" FOREIGN KEY (instrument_id) REFERENCES > instrument(id) ON UPDATE CASCADE ON DELETE CASCADE > "datavalue_parameter_id_fkey" FOREIGN KEY (parameter_id) REFERENCES > parameter(id) ON UPDATE CASCADE ON DELETE CASCADE > "datavalue_site_id_fkey" FOREIGN KEY (site_id) REFERENCES > coastal.site(id) ON UPDATE CASCADE ON DELETE CASCADE > "datavalue_statistic_type_id_fkey" > > > I'm not great with the details but the short answer - aside from the fact that you should consider increasing the statistics on these columns - is that at a certain point querying the index and then subsequently checking the table for visibility is more expensive than simply scanning and then discarding the extra rows. The fact that you could perform an INDEX ONLY scan in the first query makes that cost go away since no subsequent heap check is required. In the parameters query the planner thinks it needs 1.5 million of the rows and will have to check each of them for visibility. It decided that scanning the entire table was more efficient. The LIMIT 1 in both queries should not be necessary. The planner is smart enough to stop once it finds what it is looking for. In fact the LIMIT's presence may be a contributing factor...but I cannot say for sure. A better query seems like it would be: WITH active_sites AS ( SELECT DISTINCT site_id FROM datavalues; ) SELECT * FROM sites JOIN active_sites USING (site_id); David J.
[PERFORM] Odd behavior with indices
Also available on S.O.: http://stackoverflow.com/questions/35658238/postgres-odd-behavior-with-indices I've got a datavalue table with ~200M rows or so, with indices on both site_id and parameter_id. I need to execute queries like "return all sites with data" and "return all parameters with data". The site table has only 200 rows or so, and the parameter table has only 100 or so rows. The site query is fast and uses the index: EXPLAIN ANALYZEselect *from sitewhere exists ( select 1 from datavalue where datavalue.site_id = site.id limit 1); Seq Scan on site (cost=0.00..64.47 rows=64 width=113) (actual time=0.046..1.106 rows=89 loops=1) Filter: (SubPlan 1) Rows Removed by Filter: 39 SubPlan 1 -> Limit (cost=0.44..0.47 rows=1 width=0) (actual time=0.008..0.008 rows=1 loops=128) -> Index Only Scan using ix_datavalue_site_id on datavalue (cost=0.44..8142.71 rows=248930 width=0) (actual time=0.008..0.008 rows=1 loops=128) Index Cond: (site_id = site.id) Heap Fetches: 0 Planning time: 0.361 ms Execution time: 1.149 ms The same query for parameters is rather slow and does NOT use the index: EXPLAIN ANALYZEselect *from parameterwhere exists ( select 1 from datavalue where datavalue.parameter_id = parameter.id limit 1); Seq Scan on parameter (cost=0.00..20.50 rows=15 width=2648) (actual time=2895.972..21331.701 rows=15 loops=1) Filter: (SubPlan 1) Rows Removed by Filter: 6 SubPlan 1 -> Limit (cost=0.00..0.34 rows=1 width=0) (actual time=1015.790..1015.790 rows=1 loops=21) -> Seq Scan on datavalue (cost=0.00..502127.10 rows=1476987 width=0) (actual time=1015.786..1015.786 rows=1 loops=21) Filter: (parameter_id = parameter.id) Rows Removed by Filter: 7739355 Planning time: 0.123 ms Execution time: 21331.736 ms What the deuce is going on here? Alternatively, whats a good way to do this? Any help/guidance appreciated! Some of the table description: \d datavalue id BIGINT DEFAULT nextval('datavalue_id_seq'::regclass) NOT NULL, value DOUBLE PRECISION NOT NULL, site_id INTEGER NOT NULL, parameter_id INTEGER NOT NULL, deployment_id INTEGER, instrument_id INTEGER, invalid BOOLEAN, Indexes: "datavalue_pkey" PRIMARY KEY, btree (id) "datavalue_datetime_utc_site_id_parameter_id_instrument_id_key" UNIQUE CONSTRAINT, btree (datetime_utc, site_id, parameter_id, instrument_id) "ix_datavalue_instrument_id" btree (instrument_id) "ix_datavalue_parameter_id" btree (parameter_id) "ix_datavalue_site_id" btree (site_id) "tmp_idx" btree (site_id, datetime_utc) Foreign-key constraints: "datavalue_instrument_id_fkey" FOREIGN KEY (instrument_id) REFERENCES instrument(id) ON UPDATE CASCADE ON DELETE CASCADE "datavalue_parameter_id_fkey" FOREIGN KEY (parameter_id) REFERENCES parameter(id) ON UPDATE CASCADE ON DELETE CASCADE "datavalue_site_id_fkey" FOREIGN KEY (site_id) REFERENCES coastal.site(id) ON UPDATE CASCADE ON DELETE CASCADE "datavalue_statistic_type_id_fkey"