Re: [PERFORM] Deleting millions of rows

2009-02-03 Thread Andrew Lazarus
Hello All,


TL If you're deleting very many but not all rows, people tend
TL to drop the FK constraints and re-establish them afterwards.

I find

BEGIN;
CREATE TEMP TABLE remnant AS
  SELECT * FROM bigtable WHERE (very_restrictive_condition);
TRUNCATE TABLE bigtable;
INSERT INTO bigtable SELECT * FROM remnant;
COMMIT;
ANALYSE bigtable;

works well because there is no possibility of my forgetting FKs.


-- 
Sincerely,
 Andrew Lazarusmailto:and...@pillette.comBEGIN:VCARD
VERSION:2.1
N:Lazarus;Andrew;;;Ph.D.
FN:Andrew Lazarus, Ph.D.
EMAIL;PREF;INTERNET:and...@pillette.com
TITLE:Director of RD
ADR;WORK:;800-366-0688;3028 Fillmore Street;San Francisco;CA;94123;USA
LABEL;WORK;ENCODING=QUOTED-PRINTABLE:800-366-0688=0D=0A3028 Fillmore S=
 treet=0D=0ASan Francisco=0D=0ACA=0D=0A94123=0D=0AUSA
X-GENDER:Male
REV:18991230T08Z
END:VCARD

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] when is a DELETE FK trigger planned?

2008-02-27 Thread Andrew Lazarus
I figured out what appears to happen with cascading delete using a
seqscan. In this case, the foreign keys in the child table are not
equally distributed. A few parent values occur often. Most parent
values do not occur at all. So the planner, faced with an unknown
generic key, takes the safe route.

What I've done is remove the FK (maybe it would be better to leave it
albeit disabled for documentation) and written my own AFTER DELETE
trigger that uses EXECUTE to delay planning until the actual value is
known. This appears to work correctly.

-- 
Sincerely,
 Andrew Lazarusmailto:[EMAIL PROTECTED]BEGIN:VCARD
VERSION:2.1
N:Lazarus;Andrew;;;Ph.D.
FN:Andrew Lazarus, Ph.D.
EMAIL;PREF;INTERNET:[EMAIL PROTECTED]
TITLE:Director of RD
ADR;WORK:;800-366-0688;3028 Fillmore Street;San Francisco;CA;94123;USA
LABEL;WORK;ENCODING=QUOTED-PRINTABLE:800-366-0688=0D=0A3028 Fillmore S=
 treet=0D=0ASan Francisco=0D=0ACA=0D=0A94123=0D=0AUSA
X-GENDER:Male
REV:18991230T08Z
END:VCARD
---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


[PERFORM] when is a DELETE FK trigger planned?

2008-02-25 Thread Andrew Lazarus
I have a cascading delete trigger that is obviously using a seqscan.
(Explain analyze shows that trigger as taking over 1000s while all
other triggers are 1s. The value in test delete didn't even appear in
this child table, so an index scan would have been almost instant.)

If I do
DELETE FROM child_table WHERE fkey = value;
I get an index scan. Why doesn't the trigger do that, and how can I
force it to re-plan?


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] index structure for 114-dimension vector

2007-05-01 Thread Andrew Lazarus
Let me just thank the list, especially for the references. (I found
similar papers myself with Google: and to think I have a university
library alumni card and barely need it any more!)

I'll write again on the sorts of results I get.
BEGIN:VCARD
VERSION:2.1
N:Lazarus;Andrew;;;Ph.D.
FN:Andrew Lazarus, Ph.D.
EMAIL;PREF;INTERNET:[EMAIL PROTECTED]
TITLE:Director of RD
ADR;WORK:;800-366-0688;3028 Fillmore Street;San Francisco;CA;94123;USA
LABEL;WORK;ENCODING=QUOTED-PRINTABLE:800-366-0688=0D=0A3028 Fillmore S=
 treet=0D=0ASan Francisco=0D=0ACA=0D=0A94123=0D=0AUSA
X-GENDER:Male
REV:18991230T08Z
END:VCARD
---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


[PERFORM] index structure for 114-dimension vector

2007-04-20 Thread Andrew Lazarus
I have a table with 2.5 million real[] arrays. (They are points in a
time series.) Given a new array X, I'd like to find, say, the 25
closest to X in some sense--for simplification, let's just say in the
usual vector norm. Speed is critical here, and everything I have tried
has been too slow.

I imported the cube contrib package, and I tried creating an index on
a cube of the last 6 elements, which are the most important. Then I
tested the 2.5MM rows for being contained within a tolerance of the
last 6 elements of X, +/- 0.1 in each coordinate, figuring that would
be an indexed search (which I CLUSTERED on). I then ran the sort on
this smaller set. The index was used, but it was still too slow. I
also tried creating new columns with rounded int2 values of the last 6
coordinates and made a multicolumn index.

For each X the search is taking about 4-15 seconds which is above my
target at least one order of magnitude. Absolute numbers are dependent
on my hardware and settings, and some of this can be addressed with
configuration tweaks, etc., but first I think I need to know the
optimum data structure/indexing strategy.

Is anyone on the list experienced with this sort of issue?

Thanks.
Andrew Lazarus  [EMAIL PROTECTED]


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] index structure for 114-dimension vector

2007-04-20 Thread Andrew Lazarus
Because I know the 25 closest are going to be fairly close in each
coordinate, I did try a multicolumn index on the last 6 columns and
used a +/- 0.1 or 0.2 tolerance on each. (The 25 best are very probably inside
that hypercube on the distribution of data in question.)

This hypercube tended to have 10-20K records, and took at least 4
seconds to retrieve. I was a little surprised by how long that took.
So I'm wondering if my data representation is off the wall.

I should mention I also tried a cube index using gist on all 114
elements, but CREATE INDEX hadn't finished in 36 hours, when I killed
it, and I wasn't in retrospect sure an index that took something like
6GB by itself would be helpful on a 2GB of RAM box.

MK I don't think that will work for the vector norm i.e:

MK |x - y| = sqrt(sum over j ((x[j] - y[j])^2))


MK Cheers

MK Mark


-- 
Sincerely,
 Andrew Lazarusmailto:[EMAIL PROTECTED]BEGIN:VCARD
VERSION:2.1
N:Lazarus;Andrew;;;Ph.D.
FN:Andrew Lazarus, Ph.D.
EMAIL;PREF;INTERNET:[EMAIL PROTECTED]
TITLE:Director of RD
ADR;WORK:;800-366-0688;3028 Fillmore Street;San Francisco;CA;94123;USA
LABEL;WORK;ENCODING=QUOTED-PRINTABLE:800-366-0688=0D=0A3028 Fillmore S=
 treet=0D=0ASan Francisco=0D=0ACA=0D=0A94123=0D=0AUSA
X-GENDER:Male
REV:18991230T08Z
END:VCARD
---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Subselect query enhancement

2007-02-01 Thread Andrew Lazarus

 How about this option:

 SELECT distinct ip_info.* FROM ip_info RIGHT JOIN network_events USING
 (ip) RIGHT JOIN  host_events USING (ip) WHERE
 (network_events.name='blah' OR host_events.name = 'blah')  AND
 ip_info.ip IS NOT NULL;

MA Nah, that seems to be much much worse.  The other queries usually
MA return in 1-2 minutes, this one has been running for 30 minutes and
MA has still not returned

I find that an OR involving two different fields (in this case even
different tables) is faster when replaced by the equivalent UNION. In this
case---

SELECT distinct ip_info.* FROM ip_info RIGHT JOIN network_events USING
(ip) WHERE
network_events.name='blah' AND ip_info.ip IS NOT NULL
UNION
SELECT distinct ip_info.* FROM ip_info RIGHT JOIN host_events USING (ip) WHERE
host_events.name = 'blah'  AND ip_info.ip IS NOT NULL;

Moreover, at least through 8.1, GROUP BY is faster than DISTINCT.




---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] group by will not use an index?

2007-01-09 Thread Andrew Lazarus
For the reasons indicated (that is, MVCC), PG can not do a DISTINCT or the 
equivalent
GROUP BY from index values alone.

If this table is large, perhaps you could denormalize and maintain a
summary table with date (using truncation) and count, updated with
triggers on the original table. This table will presumably have a
small number of rows at the cost of doubling the times for updates,
inserts, and deletes.



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] execution plan : Oracle vs PostgreSQL

2006-02-06 Thread Andrew Lazarus
Indeed there is: you can use an ARRAY constructor with SELECT. Here's 
some PGPLSQL code I have (simplified and with the variable names shrouded).


 SELECT INTO m
   ARRAY(SELECT d FROM hp
   WHERE hp.ss=$1
ORDER BY 1);


FERREIRA, William (VALTECH) wrote:


maybe there is an other way to query children directly into an array and having 
query plan caching ?


begin:vcard
fn:Andrew Lazarus
n:Lazarus;Andrew
org:Pillette Investment Management;Research and Development
adr;dom:;;3028 Fillmore;San Francisco;CA;94123
email;internet:[EMAIL PROTECTED]
title:Director
tel;work:800-366-0688
tel;fax:415-440-4093
url:http://www.pillette.com
version:2.1
end:vcard


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Please Help: PostgreSQL performance Optimization

2006-01-12 Thread Andrew Lazarus

Jamal Ghaffour wrote:


CREATE TABLE cookies (
   domain varchar(50) NOT NULL,
   path varchar(50) NOT NULL,
   name varchar(50) NOT NULL,
   principalid varchar(50) NOT NULL,
   host text NOT NULL,
   value text NOT NULL,
   secure bool NOT NULL,
   timestamp timestamp with time zone NOT NULL DEFAULT 
CURRENT_TIMESTAMP+TIME '04:00:00',

   PRIMARY KEY  (domain,path,name,principalid)
)

[snip]
SELECT path, upper(name) AS name, value FROM cookies  WHERE timestampCURRENT_TIMESTAMP AND principalid='192.168.8.219' AND 
secure=FALSE AND (domain='ping.icap-elios.com' OR domain='.icap-elios.com')


I think the problem here is that the column order in the index doesn't 
match the columns used in the WHERE clause criteria. Try adding an index 
on (domain,principalid) or (domain,principalid,timestamp). If these are 
your only queries, you can get the same effect by re-ordering the 
columns in the table so that this is the column order used by the 
primary key and its implicit index.


You should check up on EXPLAIN and EXPLAIN ANALYZE to help you debug 
slow queries.
begin:vcard
fn:Andrew Lazarus
n:Lazarus;Andrew
org:Pillette Investment Management;Research and Development
adr;dom:;;3028 Fillmore;San Francisco;CA;94123
email;internet:[EMAIL PROTECTED]
title:Director
tel;work:800-366-0688
tel;fax:415-440-4093
url:http://www.pillette.com
version:2.1
end:vcard


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


[PERFORM] What *_mem to increase when running CLUSTER

2005-08-25 Thread Andrew Lazarus
Should I temporarily increase sort_mem, vacuum_mem, neither, or both 
when doing a CLUSTER on a large (100 million row) table where as many as 
half of the tuples are deadwood from UPDATEs or DELETEs? I have large 
batch (10 million row) inserts, updates, and deletes so I'm not sure 
frequent vacuuming would help.


begin:vcard
fn:Andrew Lazarus
n:Lazarus;Andrew
org:Pillette Investment Management;Research and Development
adr;dom:;;3028 Fillmore;San Francisco;CA;94123
email;internet:[EMAIL PROTECTED]
title:Director
tel;work:800-366-0688
tel;fax:415-440-4093
url:http://www.pillette.com
version:2.1
end:vcard


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Fwd: Index on table when using DESC clause

2005-05-23 Thread Andrew Lazarus

As far as I know, to use a straight index Postgres requires either

ORDER BY pages, description -- or --
ORDER BY pages DESC, description DESC.

If you want the results by pages DESC, description ASC, then you have to 
make an index on an expression or define your own operator or something 
esoteric like that. I would think the ability to have an index where the 
columns don't all collate in the same direction would be an easy feature 
to add.
begin:vcard
fn:Andrew Lazarus
n:Lazarus;Andrew
org:Pillette Investment Management;Research and Development
adr;dom:;;3028 Fillmore;San Francisco;CA;94123
email;internet:[EMAIL PROTECTED]
title:Director
tel;work:800-366-0688
tel;fax:415-440-4093
url:http://www.pillette.com
version:2.1
end:vcard


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] Query Optimization

2004-12-15 Thread Andrew Lazarus
sarlav kumar wrote:
Hi all,
 
Can someone please help me optimize this query? Is there a better way to 
write this query? I am generating a report of transactions ordered by 
time and with details of the sender and receiver etc.
 
SELECT distinct a.time::date ||'br'||substring(a.time::time::text,1,8) 
as Time,
CASE WHEN a.what = 0 THEN 'Money Transfer' WHEN a.what = 15 THEN 
'Purchase' WHEN a.what = 26 THEN 'Merchant Streamline' WHEN a.what = 13 
THEN 'Reversal' END  as Transaction_Type ,
c1.account_no as SenderAccount, c2.account_no as RecieverAccount,
b.country as SenderCountry, d.country as RecieverCountry,
b.firstname as SenderFirstName, b.lastname as SenderLastName,
d.firstname as ReceiverFirstName, d.lastname as ReceiverLastName,
a.status as status,
(select sum(td.amount * 0.01) from transaction_data td where td.data_id 
= a2.id and td.dir = 1 and td.uid = a.target_uid) as ReversedAmount,
(select sum(td.amount * 0.01) from transaction_data td where td.data_id 
= a2.id and td.dir = 0 and td.uid = a.uid ) as DepositedAmount, a.flags, 
(a.amount * 0.01) as Amount,
(a.fee * 0.01) as Fee
FROM data a, customerdata b, customerdata d, customer c1, customer c2 , 
participant p, data a2
WHERE p.id = a.partner_id AND (a.uid = c1.id) AND (a.target_uid = c2.id) 
and c1.id=b.uid and c2.id=d.uid
and a.confirmation is not null AND (a2.ref_id = a.id) and
((a2.what = 13) or (a2.what = 17) ) ORDER BY time desc ;
(query plan followed)
The expensive operation is the UNIQUE. Are you sure, in terms of 
business logic, that this is necessary? Is it actually possible to have 
duplicate transactions at the exact same time, and if so, would you 
really want to eliminate them?

As an aside, I prefer to have numeric constants like the 'what' field in 
a small lookup table of two columns (what_code, what_description); it's 
easier to extend and to document.
begin:vcard
fn:Andrew Lazarus
n:Lazarus;Andrew
org:Pillette Investment Management;Research and Development
adr;dom:;;3028 Fillmore;San Francisco;CA;94123
email;internet:[EMAIL PROTECTED]
title:Director
tel;work:800-366-0688
tel;fax:415-440-4093
url:http://www.pillette.com
version:2.1
end:vcard


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])