Re: [PERFORM] Deleting millions of rows
Hello All,

TL> If you're deleting very many but not all rows, people tend
TL> to drop the FK constraints and re-establish them afterwards.

I find that

    BEGIN;
    CREATE TEMP TABLE remnant AS
        SELECT * FROM bigtable WHERE (very_restrictive_condition);
    TRUNCATE TABLE bigtable;
    INSERT INTO bigtable SELECT * FROM remnant;
    COMMIT;
    ANALYZE bigtable;

works well because there is no possibility of my forgetting FKs.

-- 
Sincerely,
Andrew Lazarus  mailto:and...@pillette.com

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] when is a DELETE FK trigger planned?
I figured out what appears to happen with the cascading delete using a seqscan. In this case, the foreign-key values in the child table are not evenly distributed: a few parent values occur very often, while most parent values do not occur at all. So the planner, faced with an unknown generic key, takes the safe route and plans a seqscan.

What I've done is remove the FK (maybe it would be better to leave it in place, albeit disabled, for documentation) and write my own AFTER DELETE trigger that uses EXECUTE to delay planning until the actual key value is known. This appears to work correctly.

-- 
Sincerely,
Andrew Lazarus  mailto:[EMAIL PROTECTED]
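A minimal sketch of that trigger, with hypothetical parent/child table and column names; the point is that EXECUTE on a built-up string delays planning until the actual key value is visible to the planner:

```sql
CREATE OR REPLACE FUNCTION cascade_child_delete() RETURNS trigger AS $$
BEGIN
    -- Building the statement as a string forces a fresh plan per call,
    -- so rare key values get an index scan instead of the generic seqscan.
    EXECUTE 'DELETE FROM child_table WHERE fkey = '
            || quote_literal(OLD.id);
    RETURN OLD;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER parent_cascade_delete
    AFTER DELETE ON parent_table
    FOR EACH ROW EXECUTE PROCEDURE cascade_child_delete();
```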
[PERFORM] when is a DELETE FK trigger planned?
I have a cascading delete trigger that is obviously using a seqscan. (EXPLAIN ANALYZE shows that trigger taking over 1000 s while all the other triggers take about 1 s. The value in the test delete didn't even appear in this child table, so an index scan would have been almost instant.) If I do

    DELETE FROM child_table WHERE fkey = value;

by hand, I get an index scan. Why doesn't the trigger do that, and how can I force it to re-plan?
Re: [PERFORM] index structure for 114-dimension vector
Let me just thank the list, especially for the references. (I found similar papers myself with Google: and to think I have a university library alumni card and barely need it any more!) I'll write again on the sorts of results I get.
[PERFORM] index structure for 114-dimension vector
I have a table with 2.5 million real[] arrays. (They are points in a time series.) Given a new array X, I'd like to find, say, the 25 closest to X in some sense -- for simplification, let's just say in the usual vector norm. Speed is critical here, and everything I have tried has been too slow.

I imported the cube contrib package, and I tried creating an index on a cube of the last 6 elements, which are the most important. Then I tested the 2.5MM rows for being contained within a tolerance of the last 6 elements of X, +/- 0.1 in each coordinate, figuring that would be an indexed search (which I CLUSTERed on). I then ran the sort on this smaller set. The index was used, but it was still too slow. I also tried creating new columns with rounded int2 values of the last 6 coordinates and made a multicolumn index.

For each X the search is taking about 4-15 seconds, which is at least one order of magnitude above my target. Absolute numbers are dependent on my hardware and settings, and some of this can be addressed with configuration tweaks, etc., but first I think I need to know the optimum data structure/indexing strategy. Is anyone on the list experienced with this sort of issue?

Thanks.
Andrew Lazarus  [EMAIL PROTECTED]
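For reference, the cube-based attempt described above might look roughly like this. Table and column names are hypothetical, the cast is needed because contrib/cube's constructor takes float8[], and the containment operator spelling varies across contrib versions:

```sql
-- Index a cube built from the last 6 (most important) coordinates
-- of the real[] column vec.
CREATE INDEX series_tail_cube_idx ON series
    USING gist (cube(vec[109:114]::float8[]));

-- Candidate search: rows whose tail lies inside the query point's cube,
-- enlarged by 0.1 in each of the 6 dimensions; sort this small set by
-- true distance afterwards.
SELECT id
FROM series
WHERE cube(vec[109:114]::float8[]) <@
      cube_enlarge(cube(ARRAY[0.1, 0.2, 0.3, 0.4, 0.5, 0.6]), 0.1, 6);
```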
Re: [PERFORM] index structure for 114-dimension vector
Because I know the 25 closest are going to be fairly close in each coordinate, I did try a multicolumn index on the last 6 columns with a +/- 0.1 or 0.2 tolerance on each. (The 25 best are very probably inside that hypercube, given the distribution of the data in question.) This hypercube tended to hold 10-20K records, and took at least 4 seconds to retrieve. I was a little surprised by how long that took, so I'm wondering if my data representation is off the wall.

I should mention I also tried a cube index using GiST on all 114 elements, but CREATE INDEX hadn't finished after 36 hours, when I killed it; and in retrospect I wasn't sure an index that took something like 6GB by itself would be helpful on a box with 2GB of RAM.

MK> I don't think that will work for the vector norm, i.e.:
MK>   |x - y| = sqrt(sum over j ((x[j] - y[j])^2))
MK>
MK> Cheers
MK> Mark

-- 
Sincerely,
Andrew Lazarus  mailto:[EMAIL PROTECTED]
Re: [PERFORM] Subselect query enhancement
How about this option:

    SELECT DISTINCT ip_info.*
    FROM ip_info
        RIGHT JOIN network_events USING (ip)
        RIGHT JOIN host_events USING (ip)
    WHERE (network_events.name = 'blah' OR host_events.name = 'blah')
        AND ip_info.ip IS NOT NULL;

MA> Nah, that seems to be much much worse. The other queries usually
MA> return in 1-2 minutes; this one has been running for 30 minutes and
MA> has still not returned.

I find that an OR involving two different fields (in this case, even different tables) is faster when replaced by the equivalent UNION. In this case:

    SELECT DISTINCT ip_info.*
    FROM ip_info RIGHT JOIN network_events USING (ip)
    WHERE network_events.name = 'blah' AND ip_info.ip IS NOT NULL
    UNION
    SELECT DISTINCT ip_info.*
    FROM ip_info RIGHT JOIN host_events USING (ip)
    WHERE host_events.name = 'blah' AND ip_info.ip IS NOT NULL;

Moreover, at least through 8.1, GROUP BY is faster than DISTINCT.
Re: [PERFORM] group by will not use an index?
For the reasons indicated (that is, MVCC), PG cannot do a DISTINCT or the equivalent GROUP BY from index values alone. If this table is large, perhaps you could denormalize and maintain a summary table of date (truncated to day) and count, updated by triggers on the original table. The summary table will presumably have a small number of rows, at the cost of roughly doubling the time for updates, inserts, and deletes on the original.
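A sketch of that trigger-maintained summary table, under assumed names (a big_table with a timestamptz column ts):

```sql
CREATE TABLE daily_counts (
    day date PRIMARY KEY,
    n   bigint NOT NULL
);

CREATE OR REPLACE FUNCTION maintain_daily_counts() RETURNS trigger AS $$
BEGIN
    -- An UPDATE counts as one delete from the old day and one insert
    -- into the new day, so both branches fire for TG_OP = 'UPDATE'.
    IF TG_OP = 'INSERT' OR TG_OP = 'UPDATE' THEN
        UPDATE daily_counts SET n = n + 1
         WHERE day = date_trunc('day', NEW.ts)::date;
        IF NOT FOUND THEN
            INSERT INTO daily_counts VALUES (date_trunc('day', NEW.ts)::date, 1);
        END IF;
    END IF;
    IF TG_OP = 'DELETE' OR TG_OP = 'UPDATE' THEN
        UPDATE daily_counts SET n = n - 1
         WHERE day = date_trunc('day', OLD.ts)::date;
    END IF;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER big_table_daily_counts
    AFTER INSERT OR UPDATE OR DELETE ON big_table
    FOR EACH ROW EXECUTE PROCEDURE maintain_daily_counts();
```

The slow "GROUP BY date" report then becomes a trivial scan of daily_counts.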
Re: [PERFORM] execution plan : Oracle vs PostgreSQL
Indeed there is: you can use an ARRAY constructor with SELECT. Here's some PL/pgSQL code I have (simplified, and with the variable names shrouded):

    SELECT INTO m ARRAY(SELECT d FROM hp WHERE hp.ss = $1 ORDER BY 1);

FERREIRA, William (VALTECH) wrote:
> maybe there is an other way to query children directly into an array
> and having query plan caching ?
Re: [PERFORM] Please Help: PostgreSQL performance Optimization
Jamal Ghaffour wrote:

    CREATE TABLE cookies (
        domain      varchar(50) NOT NULL,
        path        varchar(50) NOT NULL,
        name        varchar(50) NOT NULL,
        principalid varchar(50) NOT NULL,
        host        text NOT NULL,
        value       text NOT NULL,
        secure      bool NOT NULL,
        timestamp   timestamp with time zone NOT NULL
                        DEFAULT CURRENT_TIMESTAMP + TIME '04:00:00',
        PRIMARY KEY (domain, path, name, principalid)
    )

[snip]

    SELECT path, upper(name) AS name, value FROM cookies
    WHERE timestamp > CURRENT_TIMESTAMP
      AND principalid = '192.168.8.219'
      AND secure = FALSE
      AND (domain = 'ping.icap-elios.com' OR domain = '.icap-elios.com')

I think the problem here is that the column order in the index doesn't match the columns used in the WHERE clause criteria. Try adding an index on (domain, principalid) or (domain, principalid, timestamp). If these are your only queries, you can get the same effect by re-ordering the columns of the primary key (and hence its implicit index) so that domain and principalid come first.

You should check up on EXPLAIN and EXPLAIN ANALYZE to help you debug slow queries.
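Concretely, the suggested index (the index name is mine) puts the two equality columns first and the range-tested timestamp last:

```sql
-- Equality conditions (domain, principalid) lead; the range condition
-- on timestamp comes last so the index can still be used for it.
CREATE INDEX cookies_dom_prin_ts_idx
    ON cookies (domain, principalid, "timestamp");
```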
[PERFORM] What *_mem to increase when running CLUSTER
Should I temporarily increase sort_mem, vacuum_mem, neither, or both when doing a CLUSTER on a large (100 million row) table where as many as half of the tuples are deadwood from UPDATEs or DELETEs? I do large batch (10 million row) inserts, updates, and deletes, so I'm not sure frequent vacuuming would help.
Re: [PERFORM] Fwd: Index on table when using DESC clause
As far as I know, to use a straight index PostgreSQL requires either ORDER BY pages, description -- or -- ORDER BY pages DESC, description DESC. If you want the results by pages DESC, description ASC, then you have to make an index on an expression, or define your own operator class, or something esoteric like that. I would think the ability to have an index whose columns don't all collate in the same direction would be an easy feature to add.
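For what it's worth, later PostgreSQL releases (8.3 onward) added exactly this: a per-column sort direction in CREATE INDEX. A sketch, with a hypothetical table name around the columns from the example above:

```sql
-- PostgreSQL 8.3+ only; earlier versions reject ASC/DESC here.
CREATE INDEX books_pages_desc_idx ON books (pages DESC, description ASC);

-- This index can then satisfy the mixed-direction sort directly:
--   SELECT * FROM books ORDER BY pages DESC, description ASC;
```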
Re: [PERFORM] Query Optimization
sarlav kumar wrote:
Hi all,
Can someone please help me optimize this query? Is there a better way to write this query? I am generating a report of transactions ordered by time and with details of the sender and receiver etc.

    SELECT DISTINCT
        a.time::date || '<br>' || substring(a.time::time::text, 1, 8) AS Time,
        CASE WHEN a.what = 0  THEN 'Money Transfer'
             WHEN a.what = 15 THEN 'Purchase'
             WHEN a.what = 26 THEN 'Merchant Streamline'
             WHEN a.what = 13 THEN 'Reversal'
        END AS Transaction_Type,
        c1.account_no AS SenderAccount,
        c2.account_no AS RecieverAccount,
        b.country AS SenderCountry,
        d.country AS RecieverCountry,
        b.firstname AS SenderFirstName,
        b.lastname AS SenderLastName,
        d.firstname AS ReceiverFirstName,
        d.lastname AS ReceiverLastName,
        a.status AS status,
        (SELECT sum(td.amount * 0.01) FROM transaction_data td
          WHERE td.data_id = a2.id AND td.dir = 1
            AND td.uid = a.target_uid) AS ReversedAmount,
        (SELECT sum(td.amount * 0.01) FROM transaction_data td
          WHERE td.data_id = a2.id AND td.dir = 0
            AND td.uid = a.uid) AS DepositedAmount,
        a.flags,
        (a.amount * 0.01) AS Amount,
        (a.fee * 0.01) AS Fee
    FROM data a, customerdata b, customerdata d, customer c1, customer c2,
         participant p, data a2
    WHERE p.id = a.partner_id
      AND a.uid = c1.id AND a.target_uid = c2.id
      AND c1.id = b.uid AND c2.id = d.uid
      AND a.confirmation IS NOT NULL
      AND a2.ref_id = a.id
      AND (a2.what = 13 OR a2.what = 17)
    ORDER BY time DESC;

(query plan followed)

The expensive operation is the UNIQUE. Are you sure, in terms of business logic, that it is necessary? Is it actually possible to have duplicate transactions at the exact same time, and if so, would you really want to eliminate them?

As an aside, I prefer to keep numeric constants like the 'what' field in a small lookup table of two columns (what_code, what_description); it's easier to extend and to document.
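That lookup table might look like this (the table and column names are my own suggestion; the descriptions come from the CASE expression above):

```sql
CREATE TABLE transaction_type (
    what_code        integer PRIMARY KEY,
    what_description text NOT NULL
);

INSERT INTO transaction_type VALUES (0,  'Money Transfer');
INSERT INTO transaction_type VALUES (13, 'Reversal');
INSERT INTO transaction_type VALUES (15, 'Purchase');
INSERT INTO transaction_type VALUES (26, 'Merchant Streamline');
```

The report query can then join transaction_type on what_code = a.what and select what_description, instead of repeating the CASE in every query that touches the codes.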