Re: [PERFORM] Query plan excluding index on view

2008-04-04 Thread Tom Lane
"Matt Klinker" <[EMAIL PROTECTED]> writes:
> --Joined View:
> CREATE OR REPLACE VIEW directory_listing AS
>  SELECT school.id, school.name, school.description, 119075291 AS
> listing_type_fid
>FROM school
> UNION ALL
>  SELECT company.id, company.name, company.description, 119074833 AS
> listing_type_fid
>FROM company;

Ah, there's the problem :-(.  Can you get rid of the constants here?
The planner's currently not smart about UNION ALL subqueries unless
their SELECT lists contain just simple column references.

(Yes, fixing that is on the todo list, but don't hold your breath...
it'll be 8.4 material at the earliest.)

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Partitioned tables - planner wont use indexes

2008-04-04 Thread Gregory Stark

"kevin kempter" <[EMAIL PROTECTED]> writes:

> that the planner wants to do a sequential scan on each  partition. We do have
> "constraint_elimination = on" set in the  postgresql.conf file.

"constraint_exclusion" btw.


> myDB=# explain SELECT min(logdate) FROM part_master;

Er, yeah. Unfortunately this is just not a kind of query our planner knows how
to optimize when dealing with a partitioned table... yet. There are several
different pieces missing to make this work. There's some hope some of them
might show up for 8.4 but no guarantees.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's Slony Replication support!

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Partitioned tables - planner wont use indexes

2008-04-04 Thread paul rivers

kevin kempter wrote:

Hi List;

Sorry if this is a dupe, my first post never showed up...

I'm having some performance issues with a partitioned table. We have a 
VERY large table that we've partitioned by day.




Unfortunately, that is the defined behavior in this case. From 5.9.6 of 
the manual:


"Constraint exclusion only works when the query's WHERE clause contains 
constants."


[Where the constants are of course your partitioning column(s)]


The best way around this depends mostly on what you're up to. You can 
get the min tablename from the catalogs, or you can keep a table of 
active partitions that your script which drops off old partitions and 
generates new ones can keep updated on the oldest/newest partition 
dates. Or some number of other solutions, whatever you find cleanest for 
your purposes.


Paul



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Partitioned tables - planner wont use indexes

2008-04-04 Thread kevin kempter

Hi List;

Sorry if this is a dupe, my first post never showed up...

I'm having some performance issues with a partitioned table. We have a  
VERY large table that we've partitioned by day.


Currently we have 17 partitions - each partition table contains >  
700million rows.
One of the things we need to query is the min date from the master  
table - we may explore alternatives for this particular query, however  
even if we fix this query I think we have a fundamental issue with the  
use of indexes (actuallt the non-use) by the planner.


Below is a sample of the DDL used to create our tables and an explain  
showing that the planner wants to do a sequential scan on each  
partition. We do have "constraint_elimination = on" set in the  
postgresql.conf file.


I tried removing the index from the part_master table and got the same  
result


Likewise the costs associated with the seq scans seem to be way off  
(yes I've run analyze on the master and all partition tables) - I ran  
the actual SQL statement below and killed it after about 15min.


Thanks in advance for any help, advice, etc...




Tables:

--
-- Master Table
--
CREATE TABLE part_master (
   filename character varying(100),
   logdate date,
   ... -- about 50 more columns go here
loghour date,
   url character varying(500),
customer character varying(500)
);
CREATE INDEX master_logdate ON part_master USING btree (logdate);

--
-- Partitions:
--

--
-- part_20080319
--
CREATE TABLE part_20080319 (CONSTRAINT part_20080319_logdate_check
CHECK ((logdate = '2008-03-19'::date))
)
INHERITS (part_master);


CREATE INDEX idx_part_20080319_customer ON part_20080319 USING btree  
(customer);
CREATE INDEX idx_part_20080319_logdate ON part_20080319 USING btree  
(logdate);
CREATE INDEX idx_part_20080319_loghour ON part_20080319 USING btree  
(loghour);



--
-- part_20080320
--
CREATE TABLE part_20080320 (CONSTRAINT part_20080320_logdate_check
CHECK ((logdate = '2008-03-20'::date))
)
INHERITS (part_master);


CREATE INDEX idx_part_20080320_customer ON part_20080320 USING btree  
(customer);
CREATE INDEX idx_part_20080320_logdate ON part_20080320 USING btree  
(logdate);
CREATE INDEX idx_part_20080320_loghour ON part_20080320 USING btree  
(loghour);



-- And so on, thru part_20080404



--
-- explain plan
--

myDB=# explain SELECT min(logdate) FROM part_master;
  QUERY PLAN
-
Aggregate  (cost=117070810.10..117070810.11 rows=1 width=4)
  ->  Append  (cost=0.00..114866502.48 rows=881723048 width=4)
->  Seq Scan on part_master  (cost=0.00..85596244.18  
rows=679385718 width=4)
->  Seq Scan on part_20080319 part  (cost=0.00..212860.86  
rows=1674986 width=4)
->  Seq Scan on part_20080320 part  (cost=0.00..1753802.51  
rows=13782951 width=4)
->  Seq Scan on part_20080321 part  (cost=0.00..2061636.83  
rows=15881283 width=4)
->  Seq Scan on part_20080322 part  (cost=0.00..1965144.71  
rows=14936971 width=4)
->  Seq Scan on part_20080323 part  (cost=0.00..1614413.18  
rows=12345618 width=4)
->  Seq Scan on part_20080324 part  (cost=0.00..1926520.22  
rows=14741022 width=4)
->  Seq Scan on part_20080325 part  (cost=0.00..2356704.22  
rows=18477622 width=4)
->  Seq Scan on part_20080326 part  (cost=0.00..1889267.71  
rows=14512171 width=4)
->  Seq Scan on part_20080327 part  (cost=0.00..1622100.34  
rows=12445034 width=4)
->  Seq Scan on part_20080328 part  (cost=0.00..1711779.49  
rows=12885749 width=4)
->  Seq Scan on part_20080329 part  (cost=0.00..1568192.94  
rows=11958394 width=4)
->  Seq Scan on part_20080330 part  (cost=0.00..1521204.64  
rows=11676564 width=4)
->  Seq Scan on part_20080331 part  (cost=0.00..1587138.77  
rows=12180377 width=4)
->  Seq Scan on part_20080401 part  (cost=0.00..2324352.82  
rows=18211382 width=4)
->  Seq Scan on part_20080402 part  (cost=0.00..2891295.04  
rows=6693804 width=4)
->  Seq Scan on part_20080403 part  (cost=0.00..1707327.48  
rows=5748348 width=4)
->  Seq Scan on part_20080404 part  (cost=0.00..556516.54  
rows=4185054 width=4)

(20 rows)

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Query plan excluding index on view

2008-04-04 Thread Matt Klinker
I'm sorry for the "fan-dance", it was not my intention to make it difficult
but actually simpler in leaving out the finer details - lesson learned.
Below you'll find create scripts for all tables and views invlolved.  Also
I've included the explain text for both queries when ran on the 8.3 database
where what was included before was from 8.1  (I was incorrect in stating I
had tried version 8.2, as I thought the 8.1 install was 8.2 - my apologies).

--Table 1 - (Item A)  ~18M records
CREATE TABLE company
(
  id bigint NOT NULL DEFAULT nextval('global_sequence'::regclass),
  "name" character varying(65) NOT NULL,
  description character varying(100),
  recordid character varying(10),
  full_address character varying(45),
  street_number character varying(10),
  street_directional character(2),
  street_name character varying(20),
  unit_designator character varying(4),
  unit_number character varying(8),
  city_name character varying(20),
  state_code character(2),
  zip character(5),
  zip_extension character(4),
  phone character varying(10),
  phone_code character(1),
  publish_date character varying(6),
  solicitation_restrictions character(1),
  business_flag character(1),
  latitude character varying(11),
  longitude character varying(11),
  precision_code character(1),
  fips character varying(16),
  is_telco_unique boolean,
  vanity_city_name character varying(20),
  book_number character varying(6),
  web_address character varying(50),
  primary_bdc_flag character(1),
  msa character varying(4),
  is_amex_accepted boolean,
  is_mastercard_accepted boolean,
  is_visa_accepted boolean,
  is_discover_accepted boolean,
  is_diners_accepted boolean,
  is_other_cc_accepted boolean,
  fax character varying(10),
  free_eac character(1),
  hours_of_operation character(1),
  is_spanish_spoken boolean,
  is_french_spoken boolean,
  is_german_spoken boolean,
  is_japanese_spoken boolean,
  is_italian_spoken boolean,
  is_korean_spoken boolean,
  is_chinese_spoken boolean,
  senior_discount_key character(1),
  listing_type_fid bigint,
  CONSTRAINT pk_company_id PRIMARY KEY (id)
)
WITH (OIDS=FALSE);

--Table 2 - (Item B)  ~100k records
CREATE TABLE school
(
  id bigint NOT NULL DEFAULT nextval('global_sequence'::regclass),
  "name" character varying(65) NOT NULL,
  description character varying(100),
  address1 character varying(100),
  address2 character varying(100),
  city character varying(50),
  state character(2),
  CONSTRAINT pk_school_id PRIMARY KEY (id)
)
WITH (OIDS=FALSE);

--Joined View:
CREATE OR REPLACE VIEW directory_listing AS
 SELECT school.id, school.name, school.description, 119075291 AS
listing_type_fid
   FROM school
UNION ALL
 SELECT company.id, company.name, company.description, 119074833 AS
listing_type_fid
   FROM company;

--Listing-Classification  Xref:  ~26M records
CREATE TABLE listing_node_xref
(
  id bigint NOT NULL DEFAULT nextval('global_sequence'::regclass),
  listing_fid bigint NOT NULL,
  node_fid bigint NOT NULL,
  CONSTRAINT pk_listing_node_xref PRIMARY KEY (id),
  CONSTRAINT fk_listing_node_xref_node_fid FOREIGN KEY (node_fid)
  REFERENCES node (id) MATCH SIMPLE
  ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT uqe_listing_node_xref_listing_fid_node_fid UNIQUE (listing_fid,
node_fid)
)
WITH (OIDS=FALSE);
ALTER TABLE listing_node_xref OWNER TO vml;

CREATE INDEX idx_listing_node_xref_listing_fid
  ON listing_node_xref
  USING btree
  (listing_fid);

CREATE INDEX idx_listing_node_xref_node_fid
  ON listing_node_xref
  USING btree
  (node_fid);

Here is the version of Postgres:  PostgreSQL 8.3.1

Query:
SELECT l.id, l.name, l.description, l.listing_type_fid
FROM  directory_listing l
INNER JOIN listing_node_xref xref  ON  l.id = xref.listing_fid
WHERE xref.node_fid = 173204537

Explain:
Hash Join  (cost=48449.22..1223695.46 rows=11472 width=378)
  Hash Cond: (school.id = xref.listing_fid)
  ->  Append  (cost=0.00..945319.40 rows=18384970 width=378)
->  Seq Scan on school  (cost=0.00..10.80 rows=80 width=374)
->  Seq Scan on company  (cost=0.00..761458.90 rows=18384890
width=247)
  ->  Hash  (cost=48246.22..48246.22 rows=16240 width=8)
->  Bitmap Heap Scan on listing_node_xref xref
(cost=308.96..48246.22 rows=16240 width=8)
  Recheck Cond: (node_fid = 173204537)
  ->  Bitmap Index Scan on idx_listing_node_xref_node_fid
(cost=0.00..304.90 rows=16240 width=0)
Index Cond: (node_fid = 173204537)

Query:
select c.*
from company c
inner join listing_node_xref xref on c.id = xref.listing_fid
where xref.node_fid = 173204537

Explain:
Nested Loop  (cost=308.96..205552.40 rows=11471 width=424)
  ->  Bitmap Heap Scan on listing_node_xref xref  (cost=308.96..48246.22
rows=16240 width=8)
Recheck Cond: (node_fid = 173204537)
->  Bitmap Index Scan on idx_listing_node_xref_node_fid
(cost=0.00..304.90 rows=16240 width=0)
  Index Cond: (node_fid = 173204537)
  ->  Index Scan using p

Re: [PERFORM] Forcing more agressive index scans for BITMAP AND

2008-04-04 Thread PFC



On Fri, 4 Apr 2008, Ow Mun Heng wrote:

select * from table
where A=X
and B = Y
and C = Z
and D = AA
and E = BB


	With that kind of WHERE condition, Postgres will use a Bitmap Index Scan  
to combine your indices. If, however, postgres notices while looking at  
the statistics gathered during ANALYZE, that for one of your columns, you  
request a value that happens in a large percentage of the rows (like 20%),  
and this value has a rather random distribution, Postgres will not bother  
scanning the index, because it is very likely that all the pages would  
contain a row satisfying your condition anyway, so the time taken to scan  
this huge index and mark the bitmap would be lost because it would not  
allow a better selectivity, since all the pages would get selected for  
scan anyway.
	I would guess that Postgres uses Bitmap Index Scan only on your columns  
that have good selectivity (ie. lots of different values).


So :

	If you use conditions on (a,b) or (a,b,c) or (a,b,c,d) etc, you will  
benefit GREATLY from a multicolumn index on (a,b,c,d...).
	However, even if postgres can use some clever tricks, a multicolumn index  
on (a,b,c,d) will not be optimal for a condition on (b,c,d) for instance.


	So, if you mostly use conditions on a left-anchored subset of  
(a,b,c,d,e), the multicolumn index will be a great tool.
	A multicolumn index on (a,b,c,d,e) is always slightly slower than an  
index on (a) if you only use a condition on (a), but it is immensely  
faster when you use a multicolumn condition.


	Can you tell us more about what those columns mean and what you store in  
them, how many distinct values, etc ?


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Forcing more agressive index scans for BITMAP AND

2008-04-04 Thread Matthew

On Fri, 4 Apr 2008, Ow Mun Heng wrote:

select * from table
where A=X
and B = Y
and C = Z
and D = AA
and E = BB


This may not be the answer you're looking for, but if you create a 
multi-coloumn index, it should be able to make your query run fast:


CREATE INDEX foo ON table (A, B, C, D, E);

It'll certainly be faster than building a bitmap for the contents of five 
separate indexes.


Matthew

--
-. .-.   .-. .-.   .-. .-.   .-. .-.   .-. .-.   .-. .-.   .-.
||X|||\ /|||X|||\ /|||X|||\ /|||X|||\ /|||X|||\ /|||X|||\ /|||
|/ \|||X|||/ \|||X|||/ \|||X|||/ \|||X|||/ \|||X|||/ \|||X|||/
'   `-' `-'   `-' `-'   `-' `-'   `-' `-'   `-' `-'   `-' `-'

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance