Re: [PERFORM] same plan, add 1 condition, 1900x slower

2005-11-11 Thread Mitchell Skinner
On Fri, 2005-11-11 at 10:53 -0500, Tom Lane wrote:
 After re-reading your explanation of what you're doing with the data,
 I thought of a possible explanation.  Is the source value exactly
 correlated with the external_id_map primary key?

Sort of.  In this case, at the beginning of external_id_map, yes, though
further down the table they're not.  For example, if we got new subjects
from 'SCH' at this point, they'd get assigned external_id_map.target_id
(the primary key) values that are totally unrelated to what the current
set are (the values in the external_id_map primary key just come off of
a sequence that we use for everything).

Right now though, since the 'SCH' data came in a contiguous chunk right
at the beginning and hasn't changed or grown since then, the correlation
is pretty exact, I think.  It's true that there are no 'SCH' rows in the
table after the first contiguous set (when I get back to work I'll check
exactly what row that is).  It's interesting that there are these
correlations in the the data that didn't exist at all in my mental
model.

 what you need to do is incorporate the source value into the
 external_id_map index key somehow.  Then the index scan would be able to
 realize that there is no possibility of finding another row with source
 = 'SCH'.  The simplest way is just to make a 2-column index

I thought that's what I had done with the
external_id_map_source_target_id index:

statgen= \d util.external_id_map
 Table util.external_id_map
  Column   | Type  | Modifiers
---+---+---
 source_id | character varying(32) | not null
 source| character(3)  | not null
 target_id | bigint| not null
Indexes:
external_id_map_primary_key PRIMARY KEY, btree (target_id)
external_id_map_source_source_id_unique UNIQUE, btree (source,
source_id)
external_id_map_source btree (source)
external_id_map_source_target_id btree (source, target_id)
Foreign-key constraints:
external_id_map_source FOREIGN KEY (source) REFERENCES
util.source(id)

So if I understand your suggestion correctly, we're back to the why
isn't this query using index foo FAQ.  For the external_id_map table,
the statistics target for source is 200; the other two columns are at
the default level because I didn't think of them as being very
interesting statistics-wise.  I suppose I should probably go ahead and
raise the targets for every column of that table; I expect the planning
time is negligible, and our queries tend to be large data-wise.  Beyond
that, I'm not sure how else to encourage the use of that index.  If I
changed that index to be (target_id, source) would it make a difference?

Thanks for your help,
Mitch 

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Faster db architecture for a twisted table.

2005-12-06 Thread Mitchell Skinner
On Sat, 2005-12-03 at 23:00 +, Rodrigo Madera wrote:
 CREATE TABLE person(
id bigint PRIMARY KEY,
first_name TEXT,
age INT,
mother bigint REFERENCES person,
father biging REFERENCES person,
siblings array of bigints  (don't remember the syntax, but you get the 
 point)
 );
 
 Well, this is ok, but imagine a search for  brothers of person id
 34. We would have to search inside the record's 'siblings' array. Is
 this a bad design? is this going to be slow?

Well, I don't know how close this example is to your actual problem, but
the siblings array is redundant, AFAICS.  If you got rid of it, you
could query for full sibling brothers with something like (not tested):

select bro.* from
person p inner join person bro
on (p.mother = bro.mother)
   AND (p.father = bro.father)
where
bro.sex='M' and p.id=34

...assuming you added a sex field, which you would need in any case to
query for brothers.

You could query for half-siblings by changing the AND into an OR, I
think.

Mitch


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Simple Join

2005-12-20 Thread Mitchell Skinner
On Wed, 2005-12-14 at 17:47 -0500, Tom Lane wrote:
 That plan looks perfectly fine to me.  You could try forcing some other
 choices by fooling with the planner enable switches (eg set
 enable_seqscan = off) but I doubt you'll find much improvement.  There
 are too many rows being pulled from ordered_products to make an index
 nestloop a good idea.

Well, I'm no expert either, but if there was an index on
ordered_products (paid, suspended_sub, id) it should be mergejoinable
with the index on to_ship.ordered_product_id, right?  Given the
conditions on paid and suspended_sub.

If you (Kevin) try adding such an index, ideally it would get used given
that you're only pulling out a small fraction of the rows in to_ship.
If it doesn't get used, then I had a similar issue with 8.0.3 where an
index that was mergejoinable (only because of the restrictions in the
where clause) wasn't getting picked up.

Mitch

Kevin Brown wrote:
 CREATE TABLE to_ship
 (
   id int8 NOT NULL DEFAULT nextval(('to_ship_seq'::text)::regclass),
   ordered_product_id int8 NOT NULL,
   bounced int4 NOT NULL DEFAULT 0,
   operator_id varchar(20) NOT NULL,
   timestamp timestamptz NOT NULL DEFAULT ('now'::text)::timestamp(6)
 with 
 time zone,
   CONSTRAINT to_ship_pkey PRIMARY KEY (id),
   CONSTRAINT to_ship_ordered_product_id_fkey FOREIGN KEY
 (ordered_product_id) 
 REFERENCES ordered_products (id) ON UPDATE RESTRICT ON DELETE RESTRICT
 ) 
 WITHOUT OIDS;
 
 CREATE TABLE ordered_products
 (
   id int8 NOT NULL DEFAULT
 nextval(('ordered_products_seq'::text)::regclass),
   order_id int8 NOT NULL,
   product_id int8 NOT NULL,
   recipient_address_id int8 NOT NULL,
   hide bool NOT NULL DEFAULT false,
   renewal bool NOT NULL DEFAULT false,
   timestamp timestamptz NOT NULL DEFAULT ('now'::text)::timestamp(6)
 with 
 time zone,
   operator_id varchar(20) NOT NULL,
   suspended_sub bool NOT NULL DEFAULT false,
   quantity int4 NOT NULL DEFAULT 1,
   price_paid numeric NOT NULL,
   tax_paid numeric NOT NULL DEFAULT 0,
   shipping_paid numeric NOT NULL DEFAULT 0,
   remaining_issue_obligation int4 NOT NULL DEFAULT 0,
   parent_product_id int8,
   delivery_method_id int8 NOT NULL,
   paid bool NOT NULL DEFAULT false,
   CONSTRAINT ordered_products_pkey PRIMARY KEY (id),
   CONSTRAINT ordered_products_order_id_fkey FOREIGN KEY (order_id)
 REFERENCES 
 orders (id) ON UPDATE RESTRICT ON DELETE RESTRICT,
   CONSTRAINT ordered_products_parent_product_id_fkey FOREIGN KEY 
 (parent_product_id) REFERENCES ordered_products (id) ON UPDATE
 RESTRICT ON 
 DELETE RESTRICT,
   CONSTRAINT ordered_products_recipient_address_id_fkey FOREIGN KEY 
 (recipient_address_id) REFERENCES addresses (id) ON UPDATE RESTRICT ON
 DELETE 
 RESTRICT
 ) 
 WITHOUT OIDS;
 
 === The two indexes that should matter ===
 CREATE INDEX ordered_product_id_index
   ON to_ship
   USING btree
   (ordered_product_id);
 
 CREATE INDEX paid_index
   ON ordered_products
   USING btree
   (paid);
 
 ordered_products.id is a primary key, so it should have an implicit
 index.



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster