Re: [PERFORM] oracle to psql migration - slow query in postgres
On Thu, Oct 14, 2010 at 8:59 PM, Mladen Gogala mladen.gog...@vmsinfo.comwrote: If working with partitioning, be very aware that PostgreSQL optimizer has certain problems with partitions, especially with group functions. If you want speed, everything must be prefixed with partitioning column: indexes, expressions, joins. There is no explicit star schema and creating hash indexes will not buy you much, as a matter of fact, Postgres community is extremely suspicious of the hash indexes and I don't see them widely used. Having said that, I was able to solve the problems with my speed and partitioning. Could you elaborate on this, please? What do you mean by 'everythin must be prefixed with partitioning column?' --sam
Re: [PERFORM] oracle to psql migration - slow query in postgres
Samuel Gendler wrote: On Thu, Oct 14, 2010 at 8:59 PM, Mladen Gogala mladen.gog...@vmsinfo.com mailto:mladen.gog...@vmsinfo.com wrote: If working with partitioning, be very aware that PostgreSQL optimizer has certain problems with partitions, especially with group functions. If you want speed, everything must be prefixed with partitioning column: indexes, expressions, joins. There is no explicit star schema and creating hash indexes will not buy you much, as a matter of fact, Postgres community is extremely suspicious of the hash indexes and I don't see them widely used. Having said that, I was able to solve the problems with my speed and partitioning. Could you elaborate on this, please? What do you mean by 'everythin must be prefixed with partitioning column?' --sam If you have partitioned table part_tab, partitioned on the column item_date and if there is a global primary key in Oracle, let's call it item_id, then queries like select * from part_tab where item_id=12345 will perform worse than queries with item_date select * from part_tab where item_id=12345 and item_date='2010-10-15' This also applies to inserts and updates. Strictly speaking, the item_date column in the query above is not necessary, after all, the item_id column is the primary key. However, with range scans you will get much better results if you include the item_date column than if you use combination of columns without. The term prefixed indexes is borrowed from Oracle RDBMS and means that the beginning column in the index is the column on which the table is partitioned. Oracle, as opposed to Postgres, has global indexes, the indexes that span all partitions. PostgreSQL only maintains indexes on each of the partitions separately. Oracle calls such indexes local indexes and defines them on the partitioned table level. Here is a brief and rather succinct explanation of the terminology: http://www.oracle-base.com/articles/8i/PartitionedTablesAndIndexes.php Of, course, there are other differences between Oracle partitioning and PostgreSQL partitioning. The main difference is $1/CPU. I am talking from experience: news= \d moreover_documents Table moreover.moreover_documents Column|Type | Modifiers --+-+--- document_id | bigint | not null dre_reference| bigint | not null headline | character varying(4000) | author | character varying(200) | url | character varying(1000) | rank | bigint | content | text| stories_like_this| character varying(1000) | internet_web_site_id | bigint | not null harvest_time | timestamp without time zone | valid_time | timestamp without time zone | keyword | character varying(200) | article_id | bigint | not null media_type | character varying(20) | source_type | character varying(20) | created_at | timestamp without time zone | autonomy_fed_at | timestamp without time zone | language | character varying(150) | Indexes: moreover_documents_pkey PRIMARY KEY, btree (document_id) Triggers: insert_moreover_trigger BEFORE INSERT ON moreover_documents FOR EACH ROW EXE CUTE PROCEDURE moreover_insert_trgfn() Number of child tables: 8 (Use \d+ to list them.) The child tables are, of course, partitions. Here is the original: Connected to: Oracle Database 10g Enterprise Edition Release 10.2.0.5.0 - Production With the Partitioning, Real Application Clusters, OLAP, Data Mining and Real Application Testing options SQL desc moreover_documents Name Null?Type - DOCUMENT# NOT NULL NUMBER DRE_REFERENCE NOT NULL NUMBER HEADLINEVARCHAR2(4000) AUTHOR VARCHAR2(200) URLVARCHAR2(1000) RANKNUMBER CONTENTCLOB STORIES_LIKE_THISVARCHAR2(1000) INTERNET_WEB_SITE# NOT NULL NUMBER HARVEST_TIMEDATE VALID_TIMEDATE KEYWORDVARCHAR2(200) ARTICLE_ID NOT NULL NUMBER MEDIA_TYPEVARCHAR2(20) CREATED_ATDATE SOURCE_TYPEVARCHAR2(50) PUBLISH_DATEDATE AUTONOMY_FED_ATDATE LANGUAGEVARCHAR2(150) SQL I must say that it took me some time to get things right. -- Mladen Gogala Sr. Oracle DBA 1500
Re: [PERFORM] oracle to psql migration - slow query in postgres
On Thu, Oct 14, 2010 at 3:43 PM, Tony Capobianco tcapobia...@prospectiv.com wrote: explain analyze create table tmp_srcmem_emws1 as select emailaddress, websiteid from members where emailok = 1 and emailbounced = 0; *) as others have noted, none of your indexes will back this expression. For an index to match properly the index must have all the fields matched in the 'where' clause in left to right order. you could rearrange indexes you already have and probably get things to work properly. *) If you want things to go really fast, and the combination of emailok, emailbounced is a small percentage (say, less than 5) in the table, and you are not interested in the schema level changes your table is screaming, and the (1,0) combination is what you want to frequently match and you should consider: create function email_interesting(ok numeric, bounced numeric) returns bool as $$ select $1 = 1 and $2 = 0; $$ language sql immutable; create function members_email_interesting_idx on members(email_interesting(emailok, emailbounced)) where email_interesting(); This will build a partial index which you can query via: select emailaddress, websiteid from members where email_interesting(emailok, emailbounced); merlin -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] oracle to psql migration - slow query in postgres
The recommendations on the numeric columns are fantastic. Thank you very much. We will revisit our methods of assigning datatypes when we migrate our data over from Oracle. Regarding the full table scans; it appears inevitable that full table scans are necessary for the volume of data involved and the present design of our indexes. Over time, indexes were added/removed to satisfy particular functionality. Considering this is our most important table, I will research exactly how this table is queried to better optimize/reorganize our indexes. Thanks for your help. Tony On Thu, 2010-10-14 at 23:59 -0400, Mladen Gogala wrote: On 10/14/2010 4:10 PM, Jon Nelson wrote: The first thing I'd do is think real hard about whether you really really want 'numeric' instead of boolean, smallint, or integer. The second thing is that none of your indices (which specify a whole bunch of fields, by the way) have only just emailok, emailbounced, or only the pair of them. Without knowing the needs of your app, I would reconsider your index choices and go with fewer columns per index. Also, make sure that the statistics is good, that histograms are large enough and that Geico (the genetic query optimizer) will really work hard to save you 15% or more on the query execution time. You can also make sure that any index existing index is used, by disabling the sequential scan and then activating and de-activating indexes with the dummy expressions, just as it was done with Oracle's rule based optimizer. I agree that a good data model is even more crucial for Postgres than is the case with Oracle. Oracle, because of its rich assortment of tweaking hacking tools and parameters, can be made to perform, even if the model is designed by someone who didn't apply the rules of good design. Postgres is much more susceptible to bad models and it is much harder to work around a badly designed model in Postgres than in Oracle. What people do not understand is that every application in the world will become badly designed after years of maintenance, adding columns, creating additional indexes, views, tables and triggers and than deploying various tools to design applications. As noted by Murphy, things develop from bad to worse. Keep Postgres models simple and separated, because it's much easier to keep clearly defined models simple and effective than to keep models with 700 tables and 350 views, frequently with conflicting names, different columns named the same and same columns named differently. And monitor, monitor, monitor. Use strace, ltrace, pgstatspack, auto_explain, pgfouine, pgadmin, top, sar, iostat and all tools you can get hold of. Without the event interface, it's frequently a guessing game. It is, however, possible to manage things. If working with partitioning, be very aware that PostgreSQL optimizer has certain problems with partitions, especially with group functions. If you want speed, everything must be prefixed with partitioning column: indexes, expressions, joins. There is no explicit star schema and creating hash indexes will not buy you much, as a matter of fact, Postgres community is extremely suspicious of the hash indexes and I don't see them widely used. Having said that, I was able to solve the problems with my speed and partitioning. -- Mladen Gogala Sr. Oracle DBA 1500 Broadway New York, NY 10036 (212) 329-5251 www.vmsinfo.com -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] oracle to psql migration - slow query in postgres
This table has approximately 300million rows. and your query grab rows=236 660 930 of them. An index might be useless in this situation. I want to point out that this is probably the most important comment here. A couple of people have noted out that the index won't work for this query, but more importantly, an index is (probably) not desirable for this query. As an analogy (since everyone loves half-baked programming analogies), if you want to find a couple of bakeries to sponsor your MySQL Data Integrity Issues Awareness Walk by donating scones, you use the yellow pages. If you want to hit up every business in the area to donate whatever they can, you're better off canvasing the neighborhood. --- Maciek Sakrejda | System Architect | Truviso 1065 E. Hillsdale Blvd., Suite 215 Foster City, CA 94404 (650) 242-3500 Main www.truviso.com -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] oracle to psql migration - slow query in postgres
-Original Message- From: Tony Capobianco [mailto:tcapobia...@prospectiv.com] Sent: Thursday, October 14, 2010 3:43 PM To: pgsql-performance@postgresql.org Subject: oracle to psql migration - slow query in postgres We are in the process of testing migration of our oracle data warehouse over to postgres. A potential showstopper are full table scans on our members table. We can't function on postgres effectively unless index scans are employed. I'm thinking I don't have something set correctly in my postgresql.conf file, but I'm not sure what. This table has approximately 300million rows. Version: SELECT version(); version -- PostgreSQL 8.4.2 on x86_64-redhat-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20071124 (Red Hat 4.1.2-42), 64-bit We have 4 quad-core processors and 32GB of RAM. The below query uses the members_sorted_idx_001 index in oracle, but in postgres, the optimizer chooses a sequential scan. explain analyze create table tmp_srcmem_emws1 as select emailaddress, websiteid from members where emailok = 1 and emailbounced = 0; QUERY PLAN -- Seq Scan on members (cost=0.00..14137154.64 rows=238177981 width=29) (actual time=0.052..685834.785 rows=236660930 loops=1) Filter: ((emailok = 1::numeric) AND (emailbounced = 0::numeric)) Total runtime: 850306.220 ms (3 rows) show shared_buffers ; shared_buffers 7680MB (1 row) show effective_cache_size ; effective_cache_size -- 22GB (1 row) show work_mem ; work_mem -- 768MB (1 row) show enable_seqscan ; enable_seqscan on (1 row) Below are the data definitions for the table/indexes in question: \d members Table members Column|Type | Modifiers -+-+--- memberid| numeric | not null firstname | character varying(50) | lastname| character varying(50) | emailaddress| character varying(50) | password| character varying(50) | address1| character varying(50) | address2| character varying(50) | city| character varying(50) | statecode | character varying(50) | zipcode | character varying(50) | birthdate | date| emailok | numeric(2,0)| gender | character varying(1)| addeddate | timestamp without time zone | emailbounced| numeric(2,0)| changedate | timestamp without time zone | optoutsource| character varying(100) | websiteid | numeric | promotionid | numeric | sourceid| numeric | siteid | character varying(64) | srcwebsiteid| numeric | homephone | character varying(20) | homeareacode| character varying(10) | campaignid | numeric | srcmemberid | numeric | optoutdate | date| regcomplete | numeric(1,0)| regcompletesourceid | numeric | ipaddress | character varying(25) | pageid | numeric | streetaddressstatus | numeric(1,0)| middlename | character varying(50) | optinprechecked | numeric(1,0)| optinposition | numeric | homephonestatus | numeric | addeddate_id| numeric | changedate_id | numeric | rpmindex| numeric | optmode | numeric(1,0)| countryid | numeric | confirmoptin| numeric(2,0)| bouncedate | date| memberageid | numeric | sourceid2 | numeric | remoteuserid| character varying(50) | goal| numeric(1,0)| flowdepth | numeric | pagetype
Re: [PERFORM] oracle to psql migration - slow query in postgres
Thanks for all your responses. What's interesting is that an index is used when this query is executed in Oracle. It appears to do some parallel processing: SQL set line 200 delete from plan_table; explain plan for select websiteid, emailaddress from members where emailok = 1 and emailbounced = 0; SELECT PLAN_TABLE_OUTPUT FROM TABLE(DBMS_XPLAN.DISPLAY()); SQL 3 rows deleted. SQL 2345 Explained. SQL SQL PLAN_TABLE_OUTPUT Plan hash value: 4247959398 --- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |TQ |IN-OUT| PQ Distrib | --- | 0 | SELECT STATEMENT|| 237M| 7248M| 469K (2)| 01:49:33 || || | 1 | PX COORDINATOR || | || || || | 2 | PX SEND QC (RANDOM) | :TQ1 | 237M| 7248M| 469K (2)| 01:49:33 | Q1,00 | P-S | QC (RAND) | | 3 |PX BLOCK ITERATOR|| 237M| 7248M| 469K (2)| 01:49:33 | Q1,00 | PCWC || |* 4 | INDEX FAST FULL SCAN| MEMBERS_SORTED_IDX_001 | 237M| 7248M| 469K (2)| 01:49:33 | Q1,00 | PCWP || --- PLAN_TABLE_OUTPUT Predicate Information (identified by operation id): --- 4 - filter(EMAILBOUNCED=0 AND EMAILOK=1) 16 rows selected. On Fri, 2010-10-15 at 13:43 -0400, Igor Neyman wrote: -Original Message- From: Tony Capobianco [mailto:tcapobia...@prospectiv.com] Sent: Thursday, October 14, 2010 3:43 PM To: pgsql-performance@postgresql.org Subject: oracle to psql migration - slow query in postgres We are in the process of testing migration of our oracle data warehouse over to postgres. A potential showstopper are full table scans on our members table. We can't function on postgres effectively unless index scans are employed. I'm thinking I don't have something set correctly in my postgresql.conf file, but I'm not sure what. This table has approximately 300million rows. Version: SELECT version(); version -- PostgreSQL 8.4.2 on x86_64-redhat-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20071124 (Red Hat 4.1.2-42), 64-bit We have 4 quad-core processors and 32GB of RAM. The below query uses the members_sorted_idx_001 index in oracle, but in postgres, the optimizer chooses a sequential scan. explain analyze create table tmp_srcmem_emws1 as select emailaddress, websiteid from members where emailok = 1 and emailbounced = 0; QUERY PLAN -- Seq Scan on members (cost=0.00..14137154.64 rows=238177981 width=29) (actual time=0.052..685834.785 rows=236660930 loops=1) Filter: ((emailok = 1::numeric) AND (emailbounced = 0::numeric)) Total runtime: 850306.220 ms (3 rows) show shared_buffers ; shared_buffers 7680MB (1 row) show effective_cache_size ; effective_cache_size -- 22GB (1 row) show work_mem ; work_mem -- 768MB (1 row) show enable_seqscan ; enable_seqscan on (1 row) Below are the data definitions for the table/indexes in question: \d members Table members Column|Type | Modifiers -+-+--- memberid| numeric | not null firstname | character varying(50) | lastname| character varying(50) | emailaddress| character varying(50) | password| character varying(50) | address1|
Re: [PERFORM] oracle to psql migration - slow query in postgres
-Original Message- From: Tony Capobianco [mailto:tcapobia...@prospectiv.com] Sent: Friday, October 15, 2010 2:14 PM To: pgsql-performance@postgresql.org Subject: Re: oracle to psql migration - slow query in postgres Thanks for all your responses. What's interesting is that an index is used when this query is executed in Oracle. It appears to do some parallel processing: SQL set line 200 delete from plan_table; explain plan for select websiteid, emailaddress from members where emailok = 1 and emailbounced = 0; SELECT PLAN_TABLE_OUTPUT FROM TABLE(DBMS_XPLAN.DISPLAY()); SQL 3 rows deleted. SQL 2345 Explained. SQL SQL PLAN_TABLE_OUTPUT -- -- -- -- Plan hash value: 4247959398 -- - | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |TQ |IN-OUT| PQ Distrib | -- - | 0 | SELECT STATEMENT|| 237M| 7248M| 469K (2)| 01:49:33 || || | 1 | PX COORDINATOR || | || || || | 2 | PX SEND QC (RANDOM) | :TQ1 | 237M| 7248M| 469K (2)| 01:49:33 | Q1,00 | P-S | QC (RAND) | | 3 |PX BLOCK ITERATOR|| 237M| 7248M| 469K (2)| 01:49:33 | Q1,00 | PCWC || |* 4 | INDEX FAST FULL SCAN| MEMBERS_SORTED_IDX_001 | 237M| 7248M| 469K (2)| 01:49:33 | Q1,00 | PCWP || -- - PLAN_TABLE_OUTPUT -- -- -- -- Predicate Information (identified by operation id): --- 4 - filter(EMAILBOUNCED=0 AND EMAILOK=1) 16 rows selected. 1. Postgres doesn't have FAST FULL SCAN because even if all the info is in the index, it need to visit the row in the table (visibility issue). 2. Postgres doesn't have parallel executions. BUT, it's free anf has greate community support, as you already saw. Regards, Igor Neyman -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] oracle to psql migration - slow query in postgres
Very true Igor! Free is my favorite price. I'll figure a way around this issue. Thanks for your help. Tony On Fri, 2010-10-15 at 14:54 -0400, Igor Neyman wrote: -Original Message- From: Tony Capobianco [mailto:tcapobia...@prospectiv.com] Sent: Friday, October 15, 2010 2:14 PM To: pgsql-performance@postgresql.org Subject: Re: oracle to psql migration - slow query in postgres Thanks for all your responses. What's interesting is that an index is used when this query is executed in Oracle. It appears to do some parallel processing: SQL set line 200 delete from plan_table; explain plan for select websiteid, emailaddress from members where emailok = 1 and emailbounced = 0; SELECT PLAN_TABLE_OUTPUT FROM TABLE(DBMS_XPLAN.DISPLAY()); SQL 3 rows deleted. SQL 2345 Explained. SQL SQL PLAN_TABLE_OUTPUT -- -- -- -- Plan hash value: 4247959398 -- - | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |TQ |IN-OUT| PQ Distrib | -- - | 0 | SELECT STATEMENT|| 237M| 7248M| 469K (2)| 01:49:33 || || | 1 | PX COORDINATOR || | || || || | 2 | PX SEND QC (RANDOM) | :TQ1 | 237M| 7248M| 469K (2)| 01:49:33 | Q1,00 | P-S | QC (RAND) | | 3 |PX BLOCK ITERATOR|| 237M| 7248M| 469K (2)| 01:49:33 | Q1,00 | PCWC || |* 4 | INDEX FAST FULL SCAN| MEMBERS_SORTED_IDX_001 | 237M| 7248M| 469K (2)| 01:49:33 | Q1,00 | PCWP || -- - PLAN_TABLE_OUTPUT -- -- -- -- Predicate Information (identified by operation id): --- 4 - filter(EMAILBOUNCED=0 AND EMAILOK=1) 16 rows selected. 1. Postgres doesn't have FAST FULL SCAN because even if all the info is in the index, it need to visit the row in the table (visibility issue). 2. Postgres doesn't have parallel executions. BUT, it's free anf has greate community support, as you already saw. Regards, Igor Neyman -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] oracle to psql migration - slow query in postgres
We are in the process of testing migration of our oracle data warehouse over to postgres. A potential showstopper are full table scans on our members table. We can't function on postgres effectively unless index scans are employed. I'm thinking I don't have something set correctly in my postgresql.conf file, but I'm not sure what. This table has approximately 300million rows. Version: SELECT version(); version -- PostgreSQL 8.4.2 on x86_64-redhat-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20071124 (Red Hat 4.1.2-42), 64-bit We have 4 quad-core processors and 32GB of RAM. The below query uses the members_sorted_idx_001 index in oracle, but in postgres, the optimizer chooses a sequential scan. explain analyze create table tmp_srcmem_emws1 as select emailaddress, websiteid from members where emailok = 1 and emailbounced = 0; QUERY PLAN -- Seq Scan on members (cost=0.00..14137154.64 rows=238177981 width=29) (actual time=0.052..685834.785 rows=236660930 loops=1) Filter: ((emailok = 1::numeric) AND (emailbounced = 0::numeric)) Total runtime: 850306.220 ms (3 rows) show shared_buffers ; shared_buffers 7680MB (1 row) show effective_cache_size ; effective_cache_size -- 22GB (1 row) show work_mem ; work_mem -- 768MB (1 row) show enable_seqscan ; enable_seqscan on (1 row) Below are the data definitions for the table/indexes in question: \d members Table members Column|Type | Modifiers -+-+--- memberid| numeric | not null firstname | character varying(50) | lastname| character varying(50) | emailaddress| character varying(50) | password| character varying(50) | address1| character varying(50) | address2| character varying(50) | city| character varying(50) | statecode | character varying(50) | zipcode | character varying(50) | birthdate | date| emailok | numeric(2,0)| gender | character varying(1)| addeddate | timestamp without time zone | emailbounced| numeric(2,0)| changedate | timestamp without time zone | optoutsource| character varying(100) | websiteid | numeric | promotionid | numeric | sourceid| numeric | siteid | character varying(64) | srcwebsiteid| numeric | homephone | character varying(20) | homeareacode| character varying(10) | campaignid | numeric | srcmemberid | numeric | optoutdate | date| regcomplete | numeric(1,0)| regcompletesourceid | numeric | ipaddress | character varying(25) | pageid | numeric | streetaddressstatus | numeric(1,0)| middlename | character varying(50) | optinprechecked | numeric(1,0)| optinposition | numeric | homephonestatus | numeric | addeddate_id| numeric | changedate_id | numeric | rpmindex| numeric | optmode | numeric(1,0)| countryid | numeric | confirmoptin| numeric(2,0)| bouncedate | date| memberageid | numeric | sourceid2 | numeric | remoteuserid| character varying(50) | goal| numeric(1,0)| flowdepth | numeric | pagetype| numeric | savepassword| character varying(50) | customerprofileid | numeric | Indexes: email_website_unq UNIQUE, btree (emailaddress, websiteid), tablespace members_idx member_addeddateid_idx btree (addeddate_id), tablespace members_idx member_changedateid_idx btree
Re: [PERFORM] oracle to psql migration - slow query in postgres
On Thu, Oct 14, 2010 at 12:43 PM, Tony Capobianco tcapobia...@prospectiv.com wrote: We have 4 quad-core processors and 32GB of RAM. The below query uses the members_sorted_idx_001 index in oracle, but in postgres, the optimizer chooses a sequential scan. explain analyze create table tmp_srcmem_emws1 as select emailaddress, websiteid from members where emailok = 1 and emailbounced = 0; Maybe a couple indexes to try: create index members_emailok_emailbounced_idx on members (emailok,emailbounced); or a functional index (will likely be smaller, depending on the contents of your table): create index members_emailok_emailbounced_idx on members (emailok,emailbounced) where emailok = 1 and emailbounced = 0; -- if you use that combination of 1 and 0 regularly -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] oracle to psql migration - slow query in postgres
On 10/14/10 21:43, Tony Capobianco wrote: We have 4 quad-core processors and 32GB of RAM. The below query uses the members_sorted_idx_001 index in oracle, but in postgres, the optimizer chooses a sequential scan. explain analyze create table tmp_srcmem_emws1 as select emailaddress, websiteid from members where emailok = 1 and emailbounced = 0; QUERY PLAN -- Seq Scan on members (cost=0.00..14137154.64 rows=238177981 width=29) (actual time=0.052..685834.785 rows=236660930 loops=1) Filter: ((emailok = 1::numeric) AND (emailbounced = 0::numeric)) Total runtime: 850306.220 ms (3 rows) Indexes: email_website_unq UNIQUE, btree (emailaddress, websiteid), tablespace members_idx member_addeddateid_idx btree (addeddate_id), tablespace members_idx member_changedateid_idx btree (changedate_id), tablespace members_idx members_fdate_idx btree (to_char_year_month(addeddate)), tablespace esave_idx members_memberid_idx btree (memberid), tablespace members_idx members_mid_emailok_idx btree (memberid, emailaddress, zipcode, firstname, emailok), tablespace members_idx members_sorted_idx_001 btree (websiteid, emailok, emailbounced, addeddate, memberid, zipcode, statecode, emailaddress), tablespace members_idx members_src_idx btree (websiteid, emailbounced, sourceid), tablespace members_idx members_wid_idx btree (websiteid), tablespace members_idx PostgreSQL doesn't fetch data directly from indexes, so there is no way for it to reasonably use an index declared like: members_sorted_idx_001 btree (websiteid, emailok, emailbounced, addeddate, memberid, zipcode, statecode, emailaddress) You need a direct index on the fields you are using in your query, i.e. an index on (emailok, emailbounced). OTOH, those columns look boolean-like. It depends on what your data set is, but if the majority of records contain (emailok=1 and emailbounced=0) an index may not help you much. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] oracle to psql migration - slow query in postgres
2010/10/14 Tony Capobianco tcapobia...@prospectiv.com: We are in the process of testing migration of our oracle data warehouse over to postgres. A potential showstopper are full table scans on our members table. We can't function on postgres effectively unless index scans are employed. I'm thinking I don't have something set correctly in my postgresql.conf file, but I'm not sure what. This table has approximately 300million rows. and your query grab rows=236 660 930 of them. An index might be useless in this situation. Version: SELECT version(); version -- PostgreSQL 8.4.2 on x86_64-redhat-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20071124 (Red Hat 4.1.2-42), 64-bit We have 4 quad-core processors and 32GB of RAM. The below query uses the members_sorted_idx_001 index in oracle, but in postgres, the optimizer chooses a sequential scan. explain analyze create table tmp_srcmem_emws1 as select emailaddress, websiteid from members where emailok = 1 and emailbounced = 0; QUERY PLAN -- Seq Scan on members (cost=0.00..14137154.64 rows=238177981 width=29) (actual time=0.052..685834.785 rows=236660930 loops=1) Filter: ((emailok = 1::numeric) AND (emailbounced = 0::numeric)) Total runtime: 850306.220 ms (3 rows) show shared_buffers ; shared_buffers 7680MB (1 row) show effective_cache_size ; effective_cache_size -- 22GB (1 row) show work_mem ; work_mem -- 768MB (1 row) show enable_seqscan ; enable_seqscan on (1 row) Below are the data definitions for the table/indexes in question: \d members Table members Column | Type | Modifiers -+-+--- memberid | numeric | not null firstname | character varying(50) | lastname | character varying(50) | emailaddress | character varying(50) | password | character varying(50) | address1 | character varying(50) | address2 | character varying(50) | city | character varying(50) | statecode | character varying(50) | zipcode | character varying(50) | birthdate | date | emailok | numeric(2,0) | gender | character varying(1) | addeddate | timestamp without time zone | emailbounced | numeric(2,0) | changedate | timestamp without time zone | optoutsource | character varying(100) | websiteid | numeric | promotionid | numeric | sourceid | numeric | siteid | character varying(64) | srcwebsiteid | numeric | homephone | character varying(20) | homeareacode | character varying(10) | campaignid | numeric | srcmemberid | numeric | optoutdate | date | regcomplete | numeric(1,0) | regcompletesourceid | numeric | ipaddress | character varying(25) | pageid | numeric | streetaddressstatus | numeric(1,0) | middlename | character varying(50) | optinprechecked | numeric(1,0) | optinposition | numeric | homephonestatus | numeric | addeddate_id | numeric | changedate_id | numeric | rpmindex | numeric | optmode | numeric(1,0) | countryid | numeric | confirmoptin | numeric(2,0) | bouncedate | date | memberageid | numeric | sourceid2 | numeric | remoteuserid | character varying(50) | goal | numeric(1,0) | flowdepth | numeric | pagetype | numeric | savepassword | character varying(50) | customerprofileid | numeric | Indexes: email_website_unq UNIQUE, btree (emailaddress, websiteid), tablespace members_idx member_addeddateid_idx
Re: [PERFORM] oracle to psql migration - slow query in postgres
emailok | numeric(2,0)| Note that NUMERIC is meant for - really large numbers with lots of digits - or controlled precision and rounding (ie, order total isn't 99. $) Accordingly, NUMERIC is a lot slower in all operations, and uses a lot more space, than all the other numeric types. I see many columns in your table that are declared as NUMERIC but should be BOOLs, or SMALLINTs, or INTs, or BIGINTs. Perhaps Oracle handles these differently, I dunno. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] oracle to psql migration - slow query in postgres
Just my take on this. The first thing I'd do is think real hard about whether you really really want 'numeric' instead of boolean, smallint, or integer. The second thing is that none of your indices (which specify a whole bunch of fields, by the way) have only just emailok, emailbounced, or only the pair of them. Without knowing the needs of your app, I would reconsider your index choices and go with fewer columns per index. For this particular query I would think either two indexes (depending on the cardinality of the data, one for each of emailok, emailbounced) or one index (containing both emailok, emailbounced) would make quite a bit of difference. Consider creating the indexes using a WITH clause, for example: CREATE INDEX members_just_an_example_idx ON members (emailok, emailbounced) WHERE emailok = 1 AND emailbounced = 0; Obviously that index is only useful in situations where both fields are specified with those values. Furthermore, if the result is such that a very high percentage of the table has those conditions a sequential scan is going to be cheaper, anyway. -- Jon -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] oracle to psql migration - slow query in postgres
On 10/14/2010 4:10 PM, Jon Nelson wrote: The first thing I'd do is think real hard about whether you really really want 'numeric' instead of boolean, smallint, or integer. The second thing is that none of your indices (which specify a whole bunch of fields, by the way) have only just emailok, emailbounced, or only the pair of them. Without knowing the needs of your app, I would reconsider your index choices and go with fewer columns per index. Also, make sure that the statistics is good, that histograms are large enough and that Geico (the genetic query optimizer) will really work hard to save you 15% or more on the query execution time. You can also make sure that any index existing index is used, by disabling the sequential scan and then activating and de-activating indexes with the dummy expressions, just as it was done with Oracle's rule based optimizer. I agree that a good data model is even more crucial for Postgres than is the case with Oracle. Oracle, because of its rich assortment of tweaking hacking tools and parameters, can be made to perform, even if the model is designed by someone who didn't apply the rules of good design. Postgres is much more susceptible to bad models and it is much harder to work around a badly designed model in Postgres than in Oracle. What people do not understand is that every application in the world will become badly designed after years of maintenance, adding columns, creating additional indexes, views, tables and triggers and than deploying various tools to design applications. As noted by Murphy, things develop from bad to worse. Keep Postgres models simple and separated, because it's much easier to keep clearly defined models simple and effective than to keep models with 700 tables and 350 views, frequently with conflicting names, different columns named the same and same columns named differently. And monitor, monitor, monitor. Use strace, ltrace, pgstatspack, auto_explain, pgfouine, pgadmin, top, sar, iostat and all tools you can get hold of. Without the event interface, it's frequently a guessing game. It is, however, possible to manage things. If working with partitioning, be very aware that PostgreSQL optimizer has certain problems with partitions, especially with group functions. If you want speed, everything must be prefixed with partitioning column: indexes, expressions, joins. There is no explicit star schema and creating hash indexes will not buy you much, as a matter of fact, Postgres community is extremely suspicious of the hash indexes and I don't see them widely used. Having said that, I was able to solve the problems with my speed and partitioning. -- Mladen Gogala Sr. Oracle DBA 1500 Broadway New York, NY 10036 (212) 329-5251 www.vmsinfo.com -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance