Re: [PERFORM] oracle to psql migration - slow query in postgres

2010-10-15 Thread Samuel Gendler
On Thu, Oct 14, 2010 at 8:59 PM, Mladen Gogala mladen.gog...@vmsinfo.comwrote:

  If working with partitioning, be very aware that PostgreSQL optimizer has
 certain problems with partitions, especially with group functions. If you
 want speed, everything must be prefixed with partitioning column: indexes,
 expressions, joins. There is no explicit star schema and creating hash
 indexes will not buy you much, as a matter of fact, Postgres community is
 extremely suspicious of the hash indexes and I don't see them widely used.
 Having said that, I was able to solve the problems with my speed and
 partitioning.


Could you elaborate on this, please? What do you mean by 'everythin must be
prefixed with partitioning column?'

--sam


Re: [PERFORM] oracle to psql migration - slow query in postgres

2010-10-15 Thread Mladen Gogala

Samuel Gendler wrote:



On Thu, Oct 14, 2010 at 8:59 PM, Mladen Gogala 
mladen.gog...@vmsinfo.com mailto:mladen.gog...@vmsinfo.com wrote:


 If working with partitioning, be very aware that PostgreSQL
optimizer has certain problems with partitions, especially with
group functions. If you want speed, everything must be prefixed
with partitioning column: indexes, expressions, joins. There is no
explicit star schema and creating hash indexes will not buy you
much, as a matter of fact, Postgres community is extremely
suspicious of the hash indexes and I don't see them widely used.
Having said that, I was able to solve the problems with my speed
and partitioning.


Could you elaborate on this, please? What do you mean by 'everythin 
must be prefixed with partitioning column?'


--sam
If you have partitioned table part_tab, partitioned on the column 
item_date and if there is a global primary key in Oracle, let's call it 
item_id, then queries like select * from part_tab where item_id=12345 
will perform worse than queries with item_date


select * from part_tab where item_id=12345 and item_date='2010-10-15'

This also applies to inserts and updates. Strictly speaking, the 
item_date column in the query above is not necessary, after all, the 
item_id column is the primary key. However, with range scans you will 
get much better results if you include the item_date column than if you 
use combination of columns without. The term prefixed indexes is 
borrowed from Oracle RDBMS and means that the beginning column in the 
index is the column on which the table is partitioned. Oracle, as 
opposed to Postgres, has global indexes, the indexes that span all 
partitions. PostgreSQL only maintains indexes on each of the partitions 
separately.  Oracle calls such indexes local indexes and defines them 
on the partitioned table level. Here is a brief and rather succinct  
explanation of the terminology:


http://www.oracle-base.com/articles/8i/PartitionedTablesAndIndexes.php


Of, course, there are other differences between Oracle partitioning and 
PostgreSQL partitioning. The main difference is $1/CPU.

I am talking from experience:

news= \d moreover_documents
 Table moreover.moreover_documents
   Column|Type | Modifiers
--+-+---
document_id  | bigint  | not null
dre_reference| bigint  | not null
headline | character varying(4000) |
author   | character varying(200)  |
url  | character varying(1000) |
rank | bigint  |
content  | text|
stories_like_this| character varying(1000) |
internet_web_site_id | bigint  | not null
harvest_time | timestamp without time zone |
valid_time   | timestamp without time zone |
keyword  | character varying(200)  |
article_id   | bigint  | not null
media_type   | character varying(20)   |
source_type  | character varying(20)   |
created_at   | timestamp without time zone |
autonomy_fed_at  | timestamp without time zone |
language | character varying(150)  |
Indexes:
   moreover_documents_pkey PRIMARY KEY, btree (document_id)
Triggers:
   insert_moreover_trigger BEFORE INSERT ON moreover_documents FOR EACH 
ROW EXE

CUTE PROCEDURE moreover_insert_trgfn()
Number of child tables: 8 (Use \d+ to list them.)

The child tables are, of course, partitions.

Here is the original:


Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.5.0 - Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options

SQL desc moreover_documents
Name   Null?Type
-  


DOCUMENT#   NOT NULL NUMBER
DRE_REFERENCE   NOT NULL NUMBER
HEADLINEVARCHAR2(4000)
AUTHOR VARCHAR2(200)
URLVARCHAR2(1000)
RANKNUMBER
CONTENTCLOB
STORIES_LIKE_THISVARCHAR2(1000)
INTERNET_WEB_SITE#   NOT NULL NUMBER
HARVEST_TIMEDATE
VALID_TIMEDATE
KEYWORDVARCHAR2(200)
ARTICLE_ID   NOT NULL NUMBER
MEDIA_TYPEVARCHAR2(20)
CREATED_ATDATE
SOURCE_TYPEVARCHAR2(50)
PUBLISH_DATEDATE
AUTONOMY_FED_ATDATE
LANGUAGEVARCHAR2(150)

SQL



I must say that it took me some time to get things right.

--

Mladen Gogala 
Sr. Oracle DBA

1500 

Re: [PERFORM] oracle to psql migration - slow query in postgres

2010-10-15 Thread Merlin Moncure
On Thu, Oct 14, 2010 at 3:43 PM, Tony Capobianco
tcapobia...@prospectiv.com wrote:
 explain analyze create table tmp_srcmem_emws1
 as
 select emailaddress, websiteid
  from members
  where emailok = 1
   and emailbounced = 0;

*) as others have noted, none of your indexes will back this
expression.  For an index to match properly the index must have all
the fields matched in the 'where' clause in left to right order.  you
could rearrange indexes you already have and probably get things to
work properly.

*) If you want things to go really fast, and the combination of
emailok, emailbounced is a small percentage (say, less than 5) in the
table, and you are not interested in the schema level changes your
table is screaming, and the (1,0) combination is what you want to
frequently match and you should consider:

create function email_interesting(ok numeric, bounced numeric) returns bool as
$$
  select $1 = 1 and $2 = 0;
$$ language sql immutable;

create function members_email_interesting_idx on
  members(email_interesting(emailok, emailbounced)) where email_interesting();

This will build a partial index which you can query via:
select emailaddress, websiteid
 from members
 where email_interesting(emailok, emailbounced);

merlin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] oracle to psql migration - slow query in postgres

2010-10-15 Thread Tony Capobianco
The recommendations on the numeric columns are fantastic.  Thank you
very much.  We will revisit our methods of assigning datatypes when we
migrate our data over from Oracle.
Regarding the full table scans; it appears inevitable that full table
scans are necessary for the volume of data involved and the present
design of our indexes.  Over time, indexes were added/removed to satisfy
particular functionality.  Considering this is our most important table,
I will research exactly how this table is queried to better
optimize/reorganize our indexes.

Thanks for your help.
Tony


On Thu, 2010-10-14 at 23:59 -0400, Mladen Gogala wrote:
 On 10/14/2010 4:10 PM, Jon Nelson wrote:
  The first thing I'd do is think real hard about whether you really
  really want 'numeric' instead of boolean, smallint, or integer.  The
  second thing is that none of your indices (which specify a whole bunch
  of fields, by the way) have only just emailok, emailbounced, or only
  the pair of them. Without knowing the needs of your app, I would
  reconsider your index choices and go with fewer columns per index.
 
 Also, make sure that the statistics is good, that histograms are large 
 enough and that Geico (the genetic query optimizer) will really work 
 hard to save you 15% or more on the query execution time. You can also 
 make sure that any index existing index is used,  by disabling the 
 sequential scan and then activating and de-activating indexes with the 
 dummy expressions,  just as it was done with Oracle's rule based optimizer.
 I agree that a good data model is even more crucial for Postgres than is 
 the case with Oracle. Oracle, because of its rich assortment of tweaking 
  hacking tools and parameters, can be made to perform, even if the 
 model is designed by someone who didn't apply the rules of good design. 
 Postgres is much more susceptible to bad models and it is much harder to 
 work around a badly designed model in Postgres than in Oracle. What 
 people do not understand is that every application in the world will 
 become badly designed after years of maintenance, adding columns, 
 creating additional indexes, views, tables and triggers and than 
 deploying various tools to design applications.  As noted by Murphy, 
 things develop from bad to worse. Keep Postgres models simple and 
 separated, because it's much easier to keep clearly defined models 
 simple and effective than to keep models with 700 tables and 350 views, 
 frequently with conflicting names, different columns named the same and 
 same columns named differently. And monitor, monitor, monitor. Use 
 strace, ltrace,  pgstatspack, auto_explain, pgfouine, pgadmin, top, sar, 
 iostat and all tools you can get hold of. Without the event interface, 
 it's frequently a guessing game.  It is, however, possible to manage 
 things.  If working with partitioning, be very aware that PostgreSQL 
 optimizer has certain problems with partitions, especially with group 
 functions. If you want speed, everything must be prefixed with 
 partitioning column: indexes, expressions, joins. There is no explicit 
 star schema and creating hash indexes will not buy you much, as a matter 
 of fact, Postgres community is extremely suspicious of the hash indexes 
 and I don't see them widely used.
 Having said that, I was able to solve the problems with my speed and 
 partitioning.
 
 -- 
 Mladen Gogala
 Sr. Oracle DBA
 1500 Broadway
 New York, NY 10036
 (212) 329-5251
 www.vmsinfo.com
 
 



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] oracle to psql migration - slow query in postgres

2010-10-15 Thread Maciek Sakrejda
 This table has approximately 300million rows.

 and your query grab rows=236 660 930 of them. An index might be
 useless in this situation.

I want to point out that this is probably the most important comment
here. A couple of people have noted out that the index won't work for
this query, but more importantly, an index is (probably) not desirable
for this query. As an analogy (since everyone loves half-baked
programming analogies), if you want to find a couple of bakeries to
sponsor your MySQL Data Integrity Issues Awareness Walk by donating
scones, you use the yellow pages. If you want to hit up every business
in the area to donate whatever they can, you're better off canvasing
the neighborhood.
---
Maciek Sakrejda | System Architect | Truviso

1065 E. Hillsdale Blvd., Suite 215
Foster City, CA 94404
(650) 242-3500 Main
www.truviso.com

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] oracle to psql migration - slow query in postgres

2010-10-15 Thread Igor Neyman
 

 -Original Message-
 From: Tony Capobianco [mailto:tcapobia...@prospectiv.com] 
 Sent: Thursday, October 14, 2010 3:43 PM
 To: pgsql-performance@postgresql.org
 Subject: oracle to psql migration - slow query in postgres
 
 We are in the process of testing migration of our oracle data 
 warehouse over to postgres.  A potential showstopper are full 
 table scans on our members table.  We can't function on 
 postgres effectively unless index scans are employed.  I'm 
 thinking I don't have something set correctly in my 
 postgresql.conf file, but I'm not sure what.
 
 This table has approximately 300million rows.
 
 Version:
 SELECT version();
 
 version  
 --
 
  PostgreSQL 8.4.2 on x86_64-redhat-linux-gnu, compiled by GCC 
 gcc (GCC)
 4.1.2 20071124 (Red Hat 4.1.2-42), 64-bit
 
 We have 4 quad-core processors and 32GB of RAM.  The below 
 query uses the members_sorted_idx_001 index in oracle, but in 
 postgres, the optimizer chooses a sequential scan.
 
 explain analyze create table tmp_srcmem_emws1 as select 
 emailaddress, websiteid
   from members
  where emailok = 1
and emailbounced = 0;
   QUERY
 PLAN  
 --
 
  Seq Scan on members  (cost=0.00..14137154.64 rows=238177981 
 width=29) (actual time=0.052..685834.785 rows=236660930 loops=1)
Filter: ((emailok = 1::numeric) AND (emailbounced = 
 0::numeric))  Total runtime: 850306.220 ms
 (3 rows)
 
 show shared_buffers ;
  shared_buffers
 
  7680MB
 (1 row)
 
 show effective_cache_size ;
  effective_cache_size
 --
  22GB
 (1 row)
 
 show work_mem ;
  work_mem
 --
  768MB
 (1 row)
 
 show enable_seqscan ;
  enable_seqscan
 
  on
 (1 row)
 
 Below are the data definitions for the table/indexes in question:
 
 \d members
  Table members
Column|Type | Modifiers 
 -+-+---
  memberid| numeric | not null
  firstname   | character varying(50)   | 
  lastname| character varying(50)   | 
  emailaddress| character varying(50)   | 
  password| character varying(50)   | 
  address1| character varying(50)   | 
  address2| character varying(50)   | 
  city| character varying(50)   | 
  statecode   | character varying(50)   | 
  zipcode | character varying(50)   | 
  birthdate   | date| 
  emailok | numeric(2,0)| 
  gender  | character varying(1)| 
  addeddate   | timestamp without time zone | 
  emailbounced| numeric(2,0)| 
  changedate  | timestamp without time zone | 
  optoutsource| character varying(100)  | 
  websiteid   | numeric | 
  promotionid | numeric | 
  sourceid| numeric | 
  siteid  | character varying(64)   | 
  srcwebsiteid| numeric | 
  homephone   | character varying(20)   | 
  homeareacode| character varying(10)   | 
  campaignid  | numeric | 
  srcmemberid | numeric | 
  optoutdate  | date| 
  regcomplete | numeric(1,0)| 
  regcompletesourceid | numeric | 
  ipaddress   | character varying(25)   | 
  pageid  | numeric | 
  streetaddressstatus | numeric(1,0)| 
  middlename  | character varying(50)   | 
  optinprechecked | numeric(1,0)| 
  optinposition   | numeric | 
  homephonestatus | numeric | 
  addeddate_id| numeric | 
  changedate_id   | numeric | 
  rpmindex| numeric | 
  optmode | numeric(1,0)| 
  countryid   | numeric | 
  confirmoptin| numeric(2,0)| 
  bouncedate  | date| 
  memberageid | numeric | 
  sourceid2   | numeric | 
  remoteuserid| character varying(50)   | 
  goal| numeric(1,0)| 
  flowdepth   | numeric | 
  pagetype 

Re: [PERFORM] oracle to psql migration - slow query in postgres

2010-10-15 Thread Tony Capobianco
Thanks for all your responses. What's interesting is that an index is
used when this query is executed in Oracle.  It appears to do some
parallel processing:

SQL set line 200
delete from plan_table;
explain plan for
select websiteid, emailaddress
  from members
 where emailok = 1
   and emailbounced = 0;

SELECT PLAN_TABLE_OUTPUT FROM TABLE(DBMS_XPLAN.DISPLAY());
SQL 
3 rows deleted.

SQL   2345  
Explained.

SQL SQL 
PLAN_TABLE_OUTPUT

Plan hash value: 4247959398

---
| Id  | Operation   | Name   | Rows  | Bytes
| Cost (%CPU)| Time |TQ  |IN-OUT| PQ Distrib |
---
|   0 | SELECT STATEMENT||   237M|
7248M|   469K  (2)| 01:49:33 ||  ||
|   1 |  PX COORDINATOR ||   |
||  ||  ||
|   2 |   PX SEND QC (RANDOM)   | :TQ1   |   237M|
7248M|   469K  (2)| 01:49:33 |  Q1,00 | P-S | QC (RAND)  |
|   3 |PX BLOCK ITERATOR||   237M|
7248M|   469K  (2)| 01:49:33 |  Q1,00 | PCWC ||
|*  4 | INDEX FAST FULL SCAN| MEMBERS_SORTED_IDX_001 |   237M|
7248M|   469K  (2)| 01:49:33 |  Q1,00 | PCWP ||
---

PLAN_TABLE_OUTPUT


Predicate Information (identified by operation id):
---

   4 - filter(EMAILBOUNCED=0 AND EMAILOK=1)

16 rows selected.


On Fri, 2010-10-15 at 13:43 -0400, Igor Neyman wrote:
 
  -Original Message-
  From: Tony Capobianco [mailto:tcapobia...@prospectiv.com] 
  Sent: Thursday, October 14, 2010 3:43 PM
  To: pgsql-performance@postgresql.org
  Subject: oracle to psql migration - slow query in postgres
  
  We are in the process of testing migration of our oracle data 
  warehouse over to postgres.  A potential showstopper are full 
  table scans on our members table.  We can't function on 
  postgres effectively unless index scans are employed.  I'm 
  thinking I don't have something set correctly in my 
  postgresql.conf file, but I'm not sure what.
  
  This table has approximately 300million rows.
  
  Version:
  SELECT version();
  
  version  
  --
  
   PostgreSQL 8.4.2 on x86_64-redhat-linux-gnu, compiled by GCC 
  gcc (GCC)
  4.1.2 20071124 (Red Hat 4.1.2-42), 64-bit
  
  We have 4 quad-core processors and 32GB of RAM.  The below 
  query uses the members_sorted_idx_001 index in oracle, but in 
  postgres, the optimizer chooses a sequential scan.
  
  explain analyze create table tmp_srcmem_emws1 as select 
  emailaddress, websiteid
from members
   where emailok = 1
 and emailbounced = 0;
QUERY
  PLAN  
  --
  
   Seq Scan on members  (cost=0.00..14137154.64 rows=238177981 
  width=29) (actual time=0.052..685834.785 rows=236660930 loops=1)
 Filter: ((emailok = 1::numeric) AND (emailbounced = 
  0::numeric))  Total runtime: 850306.220 ms
  (3 rows)
  
  show shared_buffers ;
   shared_buffers
  
   7680MB
  (1 row)
  
  show effective_cache_size ;
   effective_cache_size
  --
   22GB
  (1 row)
  
  show work_mem ;
   work_mem
  --
   768MB
  (1 row)
  
  show enable_seqscan ;
   enable_seqscan
  
   on
  (1 row)
  
  Below are the data definitions for the table/indexes in question:
  
  \d members
   Table members
 Column|Type | Modifiers 
  -+-+---
   memberid| numeric | not null
   firstname   | character varying(50)   | 
   lastname| character varying(50)   | 
   emailaddress| character varying(50)   | 
   password| character varying(50)   | 
   address1| 

Re: [PERFORM] oracle to psql migration - slow query in postgres

2010-10-15 Thread Igor Neyman

 -Original Message-
 From: Tony Capobianco [mailto:tcapobia...@prospectiv.com] 
 Sent: Friday, October 15, 2010 2:14 PM
 To: pgsql-performance@postgresql.org
 Subject: Re: oracle to psql migration - slow query in postgres
 
 Thanks for all your responses. What's interesting is that an 
 index is used when this query is executed in Oracle.  It 
 appears to do some parallel processing:
 
 SQL set line 200
 delete from plan_table;
 explain plan for
 select websiteid, emailaddress
   from members
  where emailok = 1
and emailbounced = 0;
 
 SELECT PLAN_TABLE_OUTPUT FROM TABLE(DBMS_XPLAN.DISPLAY());
 SQL 
 3 rows deleted.
 
 SQL   2345  
 Explained.
 
 SQL SQL 
 PLAN_TABLE_OUTPUT
 --
 --
 --
 --
 Plan hash value: 4247959398
 
 --
 -
 | Id  | Operation   | Name   | 
 Rows  | Bytes
 | Cost (%CPU)| Time |TQ  |IN-OUT| PQ Distrib |
 --
 -
 |   0 | SELECT STATEMENT||   237M|
 7248M|   469K  (2)| 01:49:33 ||  ||
 |   1 |  PX COORDINATOR ||   |
 ||  ||  ||
 |   2 |   PX SEND QC (RANDOM)   | :TQ1   |   237M|
 7248M|   469K  (2)| 01:49:33 |  Q1,00 | P-S | QC (RAND)  |
 |   3 |PX BLOCK ITERATOR||   237M|
 7248M|   469K  (2)| 01:49:33 |  Q1,00 | PCWC ||
 |*  4 | INDEX FAST FULL SCAN| MEMBERS_SORTED_IDX_001 |   237M|
 7248M|   469K  (2)| 01:49:33 |  Q1,00 | PCWP ||
 --
 -
 
 PLAN_TABLE_OUTPUT
 --
 --
 --
 --
 
 Predicate Information (identified by operation id):
 ---
 
4 - filter(EMAILBOUNCED=0 AND EMAILOK=1)
 
 16 rows selected.
 
 

1. Postgres doesn't have FAST FULL SCAN because even if all the info
is in the index, it need to visit the row in the table (visibility
issue).

2. Postgres doesn't have parallel executions.

BUT, it's free anf has greate community support, as you already saw.

Regards,
Igor Neyman

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] oracle to psql migration - slow query in postgres

2010-10-15 Thread Tony Capobianco
Very true Igor!  Free is my favorite price.  
I'll figure a way around this issue.

Thanks for your help.
Tony

On Fri, 2010-10-15 at 14:54 -0400, Igor Neyman wrote:
  -Original Message-
  From: Tony Capobianco [mailto:tcapobia...@prospectiv.com] 
  Sent: Friday, October 15, 2010 2:14 PM
  To: pgsql-performance@postgresql.org
  Subject: Re: oracle to psql migration - slow query in postgres
  
  Thanks for all your responses. What's interesting is that an 
  index is used when this query is executed in Oracle.  It 
  appears to do some parallel processing:
  
  SQL set line 200
  delete from plan_table;
  explain plan for
  select websiteid, emailaddress
from members
   where emailok = 1
 and emailbounced = 0;
  
  SELECT PLAN_TABLE_OUTPUT FROM TABLE(DBMS_XPLAN.DISPLAY());
  SQL 
  3 rows deleted.
  
  SQL   2345  
  Explained.
  
  SQL SQL 
  PLAN_TABLE_OUTPUT
  --
  --
  --
  --
  Plan hash value: 4247959398
  
  --
  -
  | Id  | Operation   | Name   | 
  Rows  | Bytes
  | Cost (%CPU)| Time |TQ  |IN-OUT| PQ Distrib |
  --
  -
  |   0 | SELECT STATEMENT||   237M|
  7248M|   469K  (2)| 01:49:33 ||  ||
  |   1 |  PX COORDINATOR ||   |
  ||  ||  ||
  |   2 |   PX SEND QC (RANDOM)   | :TQ1   |   237M|
  7248M|   469K  (2)| 01:49:33 |  Q1,00 | P-S | QC (RAND)  |
  |   3 |PX BLOCK ITERATOR||   237M|
  7248M|   469K  (2)| 01:49:33 |  Q1,00 | PCWC ||
  |*  4 | INDEX FAST FULL SCAN| MEMBERS_SORTED_IDX_001 |   237M|
  7248M|   469K  (2)| 01:49:33 |  Q1,00 | PCWP ||
  --
  -
  
  PLAN_TABLE_OUTPUT
  --
  --
  --
  --
  
  Predicate Information (identified by operation id):
  ---
  
 4 - filter(EMAILBOUNCED=0 AND EMAILOK=1)
  
  16 rows selected.
  
  
 
 1. Postgres doesn't have FAST FULL SCAN because even if all the info
 is in the index, it need to visit the row in the table (visibility
 issue).
 
 2. Postgres doesn't have parallel executions.
 
 BUT, it's free anf has greate community support, as you already saw.
 
 Regards,
 Igor Neyman
 



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] oracle to psql migration - slow query in postgres

2010-10-14 Thread Tony Capobianco
We are in the process of testing migration of our oracle data warehouse
over to postgres.  A potential showstopper are full table scans on our
members table.  We can't function on postgres effectively unless index
scans are employed.  I'm thinking I don't have something set correctly
in my postgresql.conf file, but I'm not sure what.

This table has approximately 300million rows.

Version:
SELECT version();

version  
--
 PostgreSQL 8.4.2 on x86_64-redhat-linux-gnu, compiled by GCC gcc (GCC)
4.1.2 20071124 (Red Hat 4.1.2-42), 64-bit

We have 4 quad-core processors and 32GB of RAM.  The below query uses
the members_sorted_idx_001 index in oracle, but in postgres, the
optimizer chooses a sequential scan.

explain analyze create table tmp_srcmem_emws1
as
select emailaddress, websiteid
  from members
 where emailok = 1
   and emailbounced = 0;
  QUERY
PLAN  
--
 Seq Scan on members  (cost=0.00..14137154.64 rows=238177981 width=29)
(actual time=0.052..685834.785 rows=236660930 loops=1)
   Filter: ((emailok = 1::numeric) AND (emailbounced = 0::numeric))
 Total runtime: 850306.220 ms
(3 rows)

show shared_buffers ;
 shared_buffers 

 7680MB
(1 row)

show effective_cache_size ;
 effective_cache_size 
--
 22GB
(1 row)

show work_mem ;
 work_mem 
--
 768MB
(1 row)

show enable_seqscan ;
 enable_seqscan 

 on
(1 row)

Below are the data definitions for the table/indexes in question:

\d members
 Table members
   Column|Type | Modifiers 
-+-+---
 memberid| numeric | not null
 firstname   | character varying(50)   | 
 lastname| character varying(50)   | 
 emailaddress| character varying(50)   | 
 password| character varying(50)   | 
 address1| character varying(50)   | 
 address2| character varying(50)   | 
 city| character varying(50)   | 
 statecode   | character varying(50)   | 
 zipcode | character varying(50)   | 
 birthdate   | date| 
 emailok | numeric(2,0)| 
 gender  | character varying(1)| 
 addeddate   | timestamp without time zone | 
 emailbounced| numeric(2,0)| 
 changedate  | timestamp without time zone | 
 optoutsource| character varying(100)  | 
 websiteid   | numeric | 
 promotionid | numeric | 
 sourceid| numeric | 
 siteid  | character varying(64)   | 
 srcwebsiteid| numeric | 
 homephone   | character varying(20)   | 
 homeareacode| character varying(10)   | 
 campaignid  | numeric | 
 srcmemberid | numeric | 
 optoutdate  | date| 
 regcomplete | numeric(1,0)| 
 regcompletesourceid | numeric | 
 ipaddress   | character varying(25)   | 
 pageid  | numeric | 
 streetaddressstatus | numeric(1,0)| 
 middlename  | character varying(50)   | 
 optinprechecked | numeric(1,0)| 
 optinposition   | numeric | 
 homephonestatus | numeric | 
 addeddate_id| numeric | 
 changedate_id   | numeric | 
 rpmindex| numeric | 
 optmode | numeric(1,0)| 
 countryid   | numeric | 
 confirmoptin| numeric(2,0)| 
 bouncedate  | date| 
 memberageid | numeric | 
 sourceid2   | numeric | 
 remoteuserid| character varying(50)   | 
 goal| numeric(1,0)| 
 flowdepth   | numeric | 
 pagetype| numeric | 
 savepassword| character varying(50)   | 
 customerprofileid   | numeric | 
Indexes:
email_website_unq UNIQUE, btree (emailaddress, websiteid),
tablespace members_idx
member_addeddateid_idx btree (addeddate_id), tablespace
members_idx
member_changedateid_idx btree 

Re: [PERFORM] oracle to psql migration - slow query in postgres

2010-10-14 Thread bricklen
On Thu, Oct 14, 2010 at 12:43 PM, Tony Capobianco
tcapobia...@prospectiv.com wrote:
 We have 4 quad-core processors and 32GB of RAM.  The below query uses
 the members_sorted_idx_001 index in oracle, but in postgres, the
 optimizer chooses a sequential scan.

 explain analyze create table tmp_srcmem_emws1
 as
 select emailaddress, websiteid
  from members
  where emailok = 1
   and emailbounced = 0;


Maybe a couple indexes to try:

create index members_emailok_emailbounced_idx on members (emailok,emailbounced);

or a functional index (will likely be smaller, depending on the
contents of your table):
create index members_emailok_emailbounced_idx on members
(emailok,emailbounced) where emailok = 1 and emailbounced = 0; -- if
you use that combination of 1 and 0 regularly

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] oracle to psql migration - slow query in postgres

2010-10-14 Thread Ivan Voras

On 10/14/10 21:43, Tony Capobianco wrote:



We have 4 quad-core processors and 32GB of RAM.  The below query uses
the members_sorted_idx_001 index in oracle, but in postgres, the
optimizer chooses a sequential scan.

explain analyze create table tmp_srcmem_emws1
as
select emailaddress, websiteid
   from members
  where emailok = 1
and emailbounced = 0;
   QUERY
PLAN
--
  Seq Scan on members  (cost=0.00..14137154.64 rows=238177981 width=29)
(actual time=0.052..685834.785 rows=236660930 loops=1)
Filter: ((emailok = 1::numeric) AND (emailbounced = 0::numeric))
  Total runtime: 850306.220 ms
(3 rows)



Indexes:
 email_website_unq UNIQUE, btree (emailaddress, websiteid),
tablespace members_idx
 member_addeddateid_idx btree (addeddate_id), tablespace
members_idx
 member_changedateid_idx btree (changedate_id), tablespace
members_idx
 members_fdate_idx btree (to_char_year_month(addeddate)),
tablespace esave_idx
 members_memberid_idx btree (memberid), tablespace members_idx
 members_mid_emailok_idx btree (memberid, emailaddress, zipcode,
firstname, emailok), tablespace members_idx
 members_sorted_idx_001 btree (websiteid, emailok, emailbounced,
addeddate, memberid, zipcode, statecode, emailaddress), tablespace
members_idx
 members_src_idx btree (websiteid, emailbounced, sourceid),
tablespace members_idx
 members_wid_idx btree (websiteid), tablespace members_idx


PostgreSQL doesn't fetch data directly from indexes, so there is no way 
for it to reasonably use an index declared like:


members_sorted_idx_001 btree (websiteid, emailok, emailbounced, 
addeddate, memberid, zipcode, statecode, emailaddress)


You need a direct index on the fields you are using in your query, i.e. 
an index on (emailok, emailbounced).


OTOH, those columns look boolean-like. It depends on what your data set 
is, but if the majority of records contain (emailok=1 and 
emailbounced=0) an index may not help you much.



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] oracle to psql migration - slow query in postgres

2010-10-14 Thread Cédric Villemain
2010/10/14 Tony Capobianco tcapobia...@prospectiv.com:
 We are in the process of testing migration of our oracle data warehouse
 over to postgres.  A potential showstopper are full table scans on our
 members table.  We can't function on postgres effectively unless index
 scans are employed.  I'm thinking I don't have something set correctly
 in my postgresql.conf file, but I'm not sure what.

 This table has approximately 300million rows.

and your query grab rows=236 660 930 of them. An index might be
useless in this situation.


 Version:
 SELECT version();

 version
 --
  PostgreSQL 8.4.2 on x86_64-redhat-linux-gnu, compiled by GCC gcc (GCC)
 4.1.2 20071124 (Red Hat 4.1.2-42), 64-bit

 We have 4 quad-core processors and 32GB of RAM.  The below query uses
 the members_sorted_idx_001 index in oracle, but in postgres, the
 optimizer chooses a sequential scan.

 explain analyze create table tmp_srcmem_emws1
 as
 select emailaddress, websiteid
  from members
  where emailok = 1
   and emailbounced = 0;
                                                          QUERY
 PLAN
 --
  Seq Scan on members  (cost=0.00..14137154.64 rows=238177981 width=29)
 (actual time=0.052..685834.785 rows=236660930 loops=1)
   Filter: ((emailok = 1::numeric) AND (emailbounced = 0::numeric))
  Total runtime: 850306.220 ms
 (3 rows)

 show shared_buffers ;
  shared_buffers
 
  7680MB
 (1 row)

 show effective_cache_size ;
  effective_cache_size
 --
  22GB
 (1 row)

 show work_mem ;
  work_mem
 --
  768MB
 (1 row)

 show enable_seqscan ;
  enable_seqscan
 
  on
 (1 row)

 Below are the data definitions for the table/indexes in question:

 \d members
                     Table members
       Column        |            Type             | Modifiers
 -+-+---
  memberid            | numeric                     | not null
  firstname           | character varying(50)       |
  lastname            | character varying(50)       |
  emailaddress        | character varying(50)       |
  password            | character varying(50)       |
  address1            | character varying(50)       |
  address2            | character varying(50)       |
  city                | character varying(50)       |
  statecode           | character varying(50)       |
  zipcode             | character varying(50)       |
  birthdate           | date                        |
  emailok             | numeric(2,0)                |
  gender              | character varying(1)        |
  addeddate           | timestamp without time zone |
  emailbounced        | numeric(2,0)                |
  changedate          | timestamp without time zone |
  optoutsource        | character varying(100)      |
  websiteid           | numeric                     |
  promotionid         | numeric                     |
  sourceid            | numeric                     |
  siteid              | character varying(64)       |
  srcwebsiteid        | numeric                     |
  homephone           | character varying(20)       |
  homeareacode        | character varying(10)       |
  campaignid          | numeric                     |
  srcmemberid         | numeric                     |
  optoutdate          | date                        |
  regcomplete         | numeric(1,0)                |
  regcompletesourceid | numeric                     |
  ipaddress           | character varying(25)       |
  pageid              | numeric                     |
  streetaddressstatus | numeric(1,0)                |
  middlename          | character varying(50)       |
  optinprechecked     | numeric(1,0)                |
  optinposition       | numeric                     |
  homephonestatus     | numeric                     |
  addeddate_id        | numeric                     |
  changedate_id       | numeric                     |
  rpmindex            | numeric                     |
  optmode             | numeric(1,0)                |
  countryid           | numeric                     |
  confirmoptin        | numeric(2,0)                |
  bouncedate          | date                        |
  memberageid         | numeric                     |
  sourceid2           | numeric                     |
  remoteuserid        | character varying(50)       |
  goal                | numeric(1,0)                |
  flowdepth           | numeric                     |
  pagetype            | numeric                     |
  savepassword        | character varying(50)       |
  customerprofileid   | numeric                     |
 Indexes:
    email_website_unq UNIQUE, btree (emailaddress, websiteid),
 tablespace members_idx
    member_addeddateid_idx 

Re: [PERFORM] oracle to psql migration - slow query in postgres

2010-10-14 Thread Pierre C



 emailok | numeric(2,0)|


Note that NUMERIC is meant for
- really large numbers with lots of digits
- or controlled precision and rounding (ie, order total isn't  
99. $)


Accordingly, NUMERIC is a lot slower in all operations, and uses a lot  
more space, than all the other numeric types.


I see many columns in your table that are declared as NUMERIC but should  
be BOOLs, or SMALLINTs, or INTs, or BIGINTs.


Perhaps Oracle handles these differently, I dunno.

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] oracle to psql migration - slow query in postgres

2010-10-14 Thread Jon Nelson
Just my take on this.

The first thing I'd do is think real hard about whether you really
really want 'numeric' instead of boolean, smallint, or integer.  The
second thing is that none of your indices (which specify a whole bunch
of fields, by the way) have only just emailok, emailbounced, or only
the pair of them. Without knowing the needs of your app, I would
reconsider your index choices and go with fewer columns per index.

For this particular query I would think either two indexes (depending
on the cardinality of the data, one for each of emailok, emailbounced)
or one index (containing both emailok, emailbounced) would make quite
a bit of difference. Consider creating the indexes using a WITH
clause, for example:

CREATE INDEX members_just_an_example_idx ON members (emailok,
emailbounced) WHERE emailok = 1 AND emailbounced = 0;

Obviously that index is only useful in situations where both fields
are specified with those values. Furthermore, if the result is such
that a very high percentage of the table has those conditions a
sequential scan is going to be cheaper, anyway.

-- 
Jon

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] oracle to psql migration - slow query in postgres

2010-10-14 Thread Mladen Gogala

 On 10/14/2010 4:10 PM, Jon Nelson wrote:

The first thing I'd do is think real hard about whether you really
really want 'numeric' instead of boolean, smallint, or integer.  The
second thing is that none of your indices (which specify a whole bunch
of fields, by the way) have only just emailok, emailbounced, or only
the pair of them. Without knowing the needs of your app, I would
reconsider your index choices and go with fewer columns per index.

Also, make sure that the statistics is good, that histograms are large 
enough and that Geico (the genetic query optimizer) will really work 
hard to save you 15% or more on the query execution time. You can also 
make sure that any index existing index is used,  by disabling the 
sequential scan and then activating and de-activating indexes with the 
dummy expressions,  just as it was done with Oracle's rule based optimizer.
I agree that a good data model is even more crucial for Postgres than is 
the case with Oracle. Oracle, because of its rich assortment of tweaking 
 hacking tools and parameters, can be made to perform, even if the 
model is designed by someone who didn't apply the rules of good design. 
Postgres is much more susceptible to bad models and it is much harder to 
work around a badly designed model in Postgres than in Oracle. What 
people do not understand is that every application in the world will 
become badly designed after years of maintenance, adding columns, 
creating additional indexes, views, tables and triggers and than 
deploying various tools to design applications.  As noted by Murphy, 
things develop from bad to worse. Keep Postgres models simple and 
separated, because it's much easier to keep clearly defined models 
simple and effective than to keep models with 700 tables and 350 views, 
frequently with conflicting names, different columns named the same and 
same columns named differently. And monitor, monitor, monitor. Use 
strace, ltrace,  pgstatspack, auto_explain, pgfouine, pgadmin, top, sar, 
iostat and all tools you can get hold of. Without the event interface, 
it's frequently a guessing game.  It is, however, possible to manage 
things.  If working with partitioning, be very aware that PostgreSQL 
optimizer has certain problems with partitions, especially with group 
functions. If you want speed, everything must be prefixed with 
partitioning column: indexes, expressions, joins. There is no explicit 
star schema and creating hash indexes will not buy you much, as a matter 
of fact, Postgres community is extremely suspicious of the hash indexes 
and I don't see them widely used.
Having said that, I was able to solve the problems with my speed and 
partitioning.


--
Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
www.vmsinfo.com


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance