Re: [SQL] DELETE FROM takes forever
Even DB2 and Oracle will take hellishly long times to perform large scale deletes What I do for a database just under 300 gb in size is do deletes in groups of 10,000 So your where clause might look some like WHERE id NOT IN (SELECT id FROM unique_records fetch first 1 rows only) DB2 has a clause of "with ur" to specify its ok to use dirty reads. I am not sure if postgres has this, been awhile. The goal is to make sure postgres allows "dirty reads". It prevents row locking... In DB2 the query would like like: DELETE FROM records WHERE id NOT IN (SELECT id FROM unique_records fetch first 1 rows only) with ur Other tips that might enhance the performance is make sure the unique_records table is indexed... even if it has a primary key. In some cases the optimizer may choose an index to satisfy the select clause or it may do a table scan. Table scans are more costly than index scans. What's going to save you the real time is to break up your delete into chunks. All the rdbms log the transactions and each delete is a transaction. See where this is going? Some rdbms allow you to turn off "transactional logging" some don't. DB2 doesn't (( at least not without more effort than reasonably necessary )) so I write my delete queries to use chunks at a time. The most I would recommend is 100,000 records deleted at once. Play with timing and see what works for you Hope this helps -Rob -Original Message- From: [email protected] [mailto:[email protected]] On Behalf Of Josh Sent: Thursday, February 10, 2011 11:57 AM To: [email protected] Subject: [SQL] DELETE FROM takes forever Hi I'm trying to do a DELETE FROM on my large table (about 800 million rows) based on the contents of another, moderately large table (about 110 million rows). The command I'm using is: DELETE FROM records WHERE id NOT IN (SELECT id FROM unique_records); This process ran for about two weeks before I decided to stop it -- it was dragging down the DB server. I can understand long-running processes, but two weeks seems a bit much even for a big table. Is this the best way to approach the problem? Is there a better way? Some background: The server is version 8.3, running nothing but Pg. The 'records' table has 'id' as its primary key, and one other index on another column. The table is referenced by just about every other table in my DB (about 15 other tables) via foreign key constraints, which I don't want to break (which is why I'm not just recreating the table rather than deleting rows). Most of the dependent tables have ON DELETE CASCADE. The 'unique_records' table is a temp table I got via something like: SELECT DISTINCT (other_column) id INTO unique_records FROM records Thanks very much! Josh Leder -- Sent via pgsql-sql mailing list ([email protected]) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-sql PRIVILEGED AND CONFIDENTIAL This email transmission contains privileged and confidential information intended only for the use of the individual or entity named above. If the reader of the email is not the intended recipient or the employee or agent responsible for delivering it to the intended recipient, you are hereby notified that any use, dissemination or copying of this email transmission is strictly prohibited by the sender. If you have received this transmission in error, please delete the email and immediately notify the sender via the email return address or mailto:[email protected]. Thank you. -- Sent via pgsql-sql mailing list ([email protected]) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-sql
[SQL] Drawing a blank on some SQL
Hi, I'm just drawing a blank entirely today and would appreciate some help on this. The long and short; there are 12 distinct activities that need to be queried on a weekly basis: SELECT count(activity_id), activity_id FROM foo_activity WHERE created >= '01/01/2011' and created < '01/08/2011' GROUP BY 2 ORDER BY 2; It gives me this answer, which is correct: count | activity_id ---+- 1502 | 1 11 | 2 2 | 3 815 | 4 4331 | 7 30 | 9 1950 | 10 7 | 11 67 | 12 But what I need to see is if there are no activities for the particular activity_id that week, that it lists the count as 0 and lists the activity_id associated like this: count | activity_id ---+- 1502 | 1 11 | 2 2 | 3 815 | 4 0 | 5 0 | 6 4331 | 7 0 | 8 30 | 9 1950 | 10 7 | 11 67 | 12 Thanking you in advance for any help on this. The caffiene seems to be not working well today. Aaron -- Sent via pgsql-sql mailing list ([email protected]) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-sql
Re: [SQL] Drawing a blank on some SQL
Not tested. 1. select count(t2.activity_id), t1.activity_id from (select distinct activity_id from foo_activity) as t1, -- assumes all activities exist somewhere in table left join foo_activity t2 on (t1.activity_id = t2.activity_id) WHERE created >= '01/01/2011' and created < '01/08/2011' group by 2 order by 2; 2. -- use generate_series for simple numbering scheme select count(t2.activity_id) t1.x as 'activity_id" from generate_series(1,12) as t1(x), left join foo_activity t2 on (t1.x = t2.activity_id) WHERE created >= '01/01/2011' and created < '01/08/2011' group by 2 order by 2; On Fri, Feb 11, 2011 at 1:46 PM, Aaron Burnett wrote: > > Hi, > > I'm just drawing a blank entirely today and would appreciate some help on > this. > > The long and short; there are 12 distinct activities that need to be > queried > on a weekly basis: > > SELECT count(activity_id), activity_id > FROM foo_activity > WHERE created >= '01/01/2011' and created < '01/08/2011' > GROUP BY 2 > ORDER BY 2; > > It gives me this answer, which is correct: > > count | activity_id > ---+- > 1502 | 1 >11 | 2 > 2 | 3 > 815 | 4 > 4331 | 7 >30 | 9 > 1950 | 10 > 7 | 11 >67 | 12 > > But what I need to see is if there are no activities for the particular > activity_id that week, that it lists the count as 0 and lists the > activity_id associated like this: > > count | activity_id > ---+- > 1502 | 1 >11 | 2 > 2 | 3 > 815 | 4 > 0 | 5 > 0 | 6 > 4331 | 7 > 0 | 8 >30 | 9 > 1950 | 10 > 7 | 11 >67 | 12 > > Thanking you in advance for any help on this. The caffiene seems to be not > working well today. > > Aaron > > > -- > Sent via pgsql-sql mailing list ([email protected]) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-sql > -- Peter Steinheuser [email protected]
Re: [SQL] Drawing a blank on some SQL
On 02/11/2011 11:46 AM, Aaron Burnett wrote: > > Hi, > > I'm just drawing a blank entirely today and would appreciate some help on > this. > > The long and short; there are 12 distinct activities that need to be queried > on a weekly basis: > > SELECT count(activity_id), activity_id > FROM foo_activity > WHERE created >= '01/01/2011' and created < '01/08/2011' > GROUP BY 2 > ORDER BY 2; > > It gives me this answer, which is correct: > > count | activity_id > ---+- > 1502 | 1 > 11 | 2 > 2 | 3 >815 | 4 > 4331 | 7 > 30 | 9 > 1950 | 10 > 7 | 11 > 67 | 12 > > But what I need to see is if there are no activities for the particular > activity_id that week, that it lists the count as 0 and lists the > activity_id associated like this: > > count | activity_id > ---+- > 1502 | 1 > 11 | 2 > 2 | 3 >815 | 4 > 0 | 5 > 0 | 6 > 4331 | 7 > 0 | 8 > 30 | 9 > 1950 | 10 > 7 | 11 > 67 | 12 > > Thanking you in advance for any help on this. The caffiene seems to be not > working well today. > > Aaron > > Do you have a table which lists exhaustively the know activity_id values in the system. You may need to 'select distinct activity_id from foo_activity' to get the complete list, then left join against that (or the existing list) in your count. You need something to supply the 5,6 and 8. -- Sent via pgsql-sql mailing list ([email protected]) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-sql
Re: [SQL] Drawing a blank on some SQL
Assuming you have a table which lists all possible activities, with one activity per row and no duplicates, you need to do a left outer join between activities and your query result. That will generate a resultset that has at least one row for every row in activities, with nulls in all the columns coming from the query for rows that don't have a match. Then use coalesce to turn null into 0. Something like this: select a.activity_id, coalesce(q.total, 0) as total from activities a left outer join (select fa.activity_id, count(fa.activity_id) as total from foo_activity fa where fa.created between '01/01/2011' and '01/08/2011' group by 1) q on a.activity_id = q.activity_id order by a.activity_id If you don't have an activities table with one row per activity, just replace the activities table in that query with another query - select distinct activity_id from foo_activity On Fri, Feb 11, 2011 at 10:46 AM, Aaron Burnett wrote: > > Hi, > > I'm just drawing a blank entirely today and would appreciate some help on > this. > > The long and short; there are 12 distinct activities that need to be > queried > on a weekly basis: > > SELECT count(activity_id), activity_id > FROM foo_activity > WHERE created >= '01/01/2011' and created < '01/08/2011' > GROUP BY 2 > ORDER BY 2; > > It gives me this answer, which is correct: > > count | activity_id > ---+- > 1502 | 1 >11 | 2 > 2 | 3 > 815 | 4 > 4331 | 7 >30 | 9 > 1950 | 10 > 7 | 11 >67 | 12 > > But what I need to see is if there are no activities for the particular > activity_id that week, that it lists the count as 0 and lists the > activity_id associated like this: > > count | activity_id > ---+- > 1502 | 1 >11 | 2 > 2 | 3 > 815 | 4 > 0 | 5 > 0 | 6 > 4331 | 7 > 0 | 8 >30 | 9 > 1950 | 10 > 7 | 11 >67 | 12 > > Thanking you in advance for any help on this. The caffiene seems to be not > working well today. > > Aaron > > > -- > Sent via pgsql-sql mailing list ([email protected]) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-sql >
Re: [SQL] Drawing a blank on some SQL
2011/2/11, Aaron Burnett : > > Hi, > > I'm just drawing a blank entirely today and would appreciate some help on > this. > > The long and short; there are 12 distinct activities that need to be queried > on a weekly basis: > > SELECT count(activity_id), activity_id > FROM foo_activity > WHERE created >= '01/01/2011' and created < '01/08/2011' > GROUP BY 2 > ORDER BY 2; > > It gives me this answer, which is correct: > > count | activity_id > ---+- > 1502 | 1 > 11 | 2 > 2 | 3 >815 | 4 > 4331 | 7 > 30 | 9 > 1950 | 10 > 7 | 11 > 67 | 12 > > But what I need to see is if there are no activities for the particular > activity_id that week, that it lists the count as 0 and lists the > activity_id associated like this: > > count | activity_id > ---+- > 1502 | 1 > 11 | 2 > 2 | 3 >815 | 4 > 0 | 5 > 0 | 6 > 4331 | 7 > 0 | 8 > 30 | 9 > 1950 | 10 > 7 | 11 > 67 | 12 > > Thanking you in advance for any help on this. The caffiene seems to be not > working well today. > Try: SELECT sum(case when created >= '2011-01-01' and created < '2011-01-08' then 1 else 0 end), activity_id FROM foo_activity GROUP BY 2 ORDER BY 2; Osvaldo -- Sent via pgsql-sql mailing list ([email protected]) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-sql
Re: [SQL] Drawing a blank on some SQL
Thank you all very much for your help. The suggestion from Osvaldo below was the best for my situation (not having any soret of xref table to join)... Best Regards, Aaron On 2/11/11 1:09 PM, "Osvaldo Kussama" wrote: 2011/2/11, Aaron Burnett : > > Hi, > > I'm just drawing a blank entirely today and would appreciate some help on > this. > > The long and short; there are 12 distinct activities that need to be queried > on a weekly basis: > > SELECT count(activity_id), activity_id > FROM foo_activity > WHERE created >= '01/01/2011' and created < '01/08/2011' > GROUP BY 2 > ORDER BY 2; > > It gives me this answer, which is correct: > > count | activity_id > ---+- > 1502 | 1 > 11 | 2 > 2 | 3 >815 | 4 > 4331 | 7 > 30 | 9 > 1950 | 10 > 7 | 11 > 67 | 12 > > But what I need to see is if there are no activities for the particular > activity_id that week, that it lists the count as 0 and lists the > activity_id associated like this: > > count | activity_id > ---+- > 1502 | 1 > 11 | 2 > 2 | 3 >815 | 4 > 0 | 5 > 0 | 6 > 4331 | 7 > 0 | 8 > 30 | 9 > 1950 | 10 > 7 | 11 > 67 | 12 > > Thanking you in advance for any help on this. The caffiene seems to be not > working well today. > Try: SELECT sum(case when created >= '2011-01-01' and created < '2011-01-08' then 1 else 0 end), activity_id FROM foo_activity GROUP BY 2 ORDER BY 2; Osvaldo
Re: [SQL] Drawing a blank on some SQL
On Fri, Feb 11, 2011 at 12:47 PM, Aaron Burnett wrote: > > > Thank you all very much for your help. > > The suggestion from Osvaldo below was the best for my situation (not having > any soret of xref table to join)... > > It may work well for now, but if that foo_activity table has the potential to get large with only a relatively small percentage of rows fitting in the date range, it will get very slow compared to the left join, as it requires loading every row in the table, instead of being able to use an index to pull just the rows from the date range and then join to the list of valid activities. --sam
