Re: [SQL] DELETE FROM takes forever

2011-02-11 Thread Hiltibidal, Rob
Even DB2 and Oracle will take hellishly long times to perform large
scale deletes

What I do for a database just under 300 gb in size is do deletes in
groups of 10,000

So your where clause might look some like 

WHERE id NOT IN  (SELECT id FROM unique_records fetch first 1 rows
only)

DB2 has a clause of "with ur" to specify its ok to use dirty reads. I am
not sure if postgres has this, been awhile. The goal is to make sure
postgres allows "dirty reads". It prevents row locking... 

In DB2 the query would like like:
DELETE FROM records WHERE id NOT IN  (SELECT id FROM
unique_records fetch first 1 rows only) with ur

Other tips that might enhance the performance is make sure the
unique_records table is indexed... even if it has a primary key. In some
cases the optimizer may choose an index to satisfy the select clause or
it may do a table scan. Table scans are more costly than index scans. 

What's going to save you the real time is to break up your delete into
chunks. All the rdbms log the transactions and each delete is a
transaction. See where this is going?

Some rdbms allow you to turn off "transactional logging" some don't. DB2
doesn't (( at least not without more effort than reasonably necessary ))
so I write my delete queries to use chunks at a time. The most I would
recommend is 100,000 records deleted at once. Play with timing and see
what works for you

Hope this helps

-Rob



-Original Message-
From: [email protected]
[mailto:[email protected]] On Behalf Of Josh
Sent: Thursday, February 10, 2011 11:57 AM
To: [email protected]
Subject: [SQL] DELETE FROM takes forever

Hi

I'm trying to do a DELETE FROM on my large table (about 800 million
rows) based on the contents of another, moderately large table (about
110 million rows). The command I'm using is:

DELETE FROM records WHERE id NOT IN (SELECT id FROM unique_records);

This process ran for about two weeks before I decided to stop it -- it
was dragging down the DB server. I can understand long-running
processes, but two weeks seems a bit much even for a big table.

Is this the best way to approach the problem? Is there a better way?

Some background: The server is version 8.3, running nothing but Pg.
The 'records' table has 'id' as its primary key, and one other index
on another column. The table is referenced by just about every other
table in my DB (about 15 other tables) via foreign key constraints,
which I don't want to break (which is why I'm not just recreating the
table rather than deleting rows). Most of the dependent tables have ON
DELETE CASCADE. The 'unique_records' table is a temp table I got via
something like: SELECT DISTINCT (other_column) id INTO unique_records
FROM records


Thanks very much!

Josh Leder

-- 
Sent via pgsql-sql mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql


PRIVILEGED AND CONFIDENTIAL
This email transmission contains privileged and confidential information 
intended only for the use of the individual or entity named above.  If the 
reader of the email is not the intended recipient or the employee or agent 
responsible for delivering it to the intended recipient, you are hereby 
notified that any use, dissemination or copying of this email transmission is 
strictly prohibited by the sender.  If you have received this transmission in 
error, please delete the email and immediately notify the sender via the email 
return address or mailto:[email protected].  Thank you.





-- 
Sent via pgsql-sql mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql


[SQL] Drawing a blank on some SQL

2011-02-11 Thread Aaron Burnett

Hi,

I'm just drawing a blank entirely today and would appreciate some help on
this.

The long and short; there are 12 distinct activities that need to be queried
on a weekly basis:

SELECT count(activity_id), activity_id
FROM foo_activity 
WHERE created >= '01/01/2011' and created < '01/08/2011'
GROUP BY 2 
ORDER BY 2;

It gives me this answer, which is correct:

 count | activity_id
---+-
  1502 |   1
11 |   2
 2 |   3
   815 |   4
  4331 |   7
30 |   9
  1950 |  10
 7 |  11
67 |  12

But what I need to see is if there are no activities for the particular
activity_id that week, that it lists the count as 0 and lists the
activity_id associated like this:

 count | activity_id
---+-
  1502 |   1
11 |   2
 2 |   3
   815 |   4
 0 |   5
 0 |   6
  4331 |   7
 0 |   8
30 |   9
  1950 |  10
 7 |  11
67 |  12

Thanking you in advance for any help on this. The caffiene seems to be not
working well today.

Aaron


-- 
Sent via pgsql-sql mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql


Re: [SQL] Drawing a blank on some SQL

2011-02-11 Thread Peter Steinheuser
Not tested.

1. select count(t2.activity_id),
   t1.activity_id
from (select distinct activity_id from foo_activity) as t1, -- assumes
all activities exist somewhere in table
left join foo_activity t2 on (t1.activity_id = t2.activity_id)
WHERE created >= '01/01/2011' and created < '01/08/2011'
group by 2
order by 2;

2. -- use generate_series for simple numbering scheme
select count(t2.activity_id)
   t1.x as 'activity_id"
from generate_series(1,12) as t1(x),
left join foo_activity t2 on (t1.x = t2.activity_id)
WHERE created >= '01/01/2011' and created < '01/08/2011'
group by 2
order by 2;

On Fri, Feb 11, 2011 at 1:46 PM, Aaron Burnett wrote:

>
> Hi,
>
> I'm just drawing a blank entirely today and would appreciate some help on
> this.
>
> The long and short; there are 12 distinct activities that need to be
> queried
> on a weekly basis:
>
> SELECT count(activity_id), activity_id
> FROM foo_activity
> WHERE created >= '01/01/2011' and created < '01/08/2011'
> GROUP BY 2
> ORDER BY 2;
>
> It gives me this answer, which is correct:
>
>  count | activity_id
> ---+-
>  1502 |   1
>11 |   2
> 2 |   3
>   815 |   4
>  4331 |   7
>30 |   9
>  1950 |  10
> 7 |  11
>67 |  12
>
> But what I need to see is if there are no activities for the particular
> activity_id that week, that it lists the count as 0 and lists the
> activity_id associated like this:
>
>  count | activity_id
> ---+-
>  1502 |   1
>11 |   2
> 2 |   3
>   815 |   4
> 0 |   5
> 0 |   6
>  4331 |   7
> 0 |   8
>30 |   9
>  1950 |  10
> 7 |  11
>67 |  12
>
> Thanking you in advance for any help on this. The caffiene seems to be not
> working well today.
>
> Aaron
>
>
> --
> Sent via pgsql-sql mailing list ([email protected])
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-sql
>



-- 
Peter Steinheuser
[email protected]


Re: [SQL] Drawing a blank on some SQL

2011-02-11 Thread Rob Sargent


On 02/11/2011 11:46 AM, Aaron Burnett wrote:
> 
> Hi,
> 
> I'm just drawing a blank entirely today and would appreciate some help on
> this.
> 
> The long and short; there are 12 distinct activities that need to be queried
> on a weekly basis:
> 
> SELECT count(activity_id), activity_id
> FROM foo_activity 
> WHERE created >= '01/01/2011' and created < '01/08/2011'
> GROUP BY 2 
> ORDER BY 2;
> 
> It gives me this answer, which is correct:
> 
>  count | activity_id
> ---+-
>   1502 |   1
> 11 |   2
>  2 |   3
>815 |   4
>   4331 |   7
> 30 |   9
>   1950 |  10
>  7 |  11
> 67 |  12
> 
> But what I need to see is if there are no activities for the particular
> activity_id that week, that it lists the count as 0 and lists the
> activity_id associated like this:
> 
>  count | activity_id
> ---+-
>   1502 |   1
> 11 |   2
>  2 |   3
>815 |   4
>  0 |   5
>  0 |   6
>   4331 |   7
>  0 |   8
> 30 |   9
>   1950 |  10
>  7 |  11
> 67 |  12
> 
> Thanking you in advance for any help on this. The caffiene seems to be not
> working well today.
> 
> Aaron
> 
> 
Do you have a table which lists exhaustively the know activity_id values
in the system.  You may need to 'select distinct activity_id from
foo_activity' to get the complete list, then left join against that (or
the existing list) in your count.  You need something to supply the 5,6
and 8.

-- 
Sent via pgsql-sql mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql


Re: [SQL] Drawing a blank on some SQL

2011-02-11 Thread Samuel Gendler
Assuming you have a table which lists all possible activities, with one
activity per row and no duplicates, you need to do a left outer join between
activities and your query result.  That will generate a resultset that has
at least one row for every row in activities, with nulls in all the columns
coming from the query for rows that don't have a match.  Then use coalesce
to turn null into 0.  Something like this:

select a.activity_id, coalesce(q.total, 0) as total
from activities a left outer join
(select fa.activity_id, count(fa.activity_id) as total from foo_activity fa
where fa.created between '01/01/2011' and '01/08/2011'
group by 1) q on a.activity_id = q.activity_id
order by a.activity_id

If you don't have an activities table with one row per activity, just
replace the activities table in that query with another query - select
distinct activity_id from foo_activity


On Fri, Feb 11, 2011 at 10:46 AM, Aaron Burnett wrote:

>
> Hi,
>
> I'm just drawing a blank entirely today and would appreciate some help on
> this.
>
> The long and short; there are 12 distinct activities that need to be
> queried
> on a weekly basis:
>
> SELECT count(activity_id), activity_id
> FROM foo_activity
> WHERE created >= '01/01/2011' and created < '01/08/2011'
> GROUP BY 2
> ORDER BY 2;
>
> It gives me this answer, which is correct:
>
>  count | activity_id
> ---+-
>  1502 |   1
>11 |   2
> 2 |   3
>   815 |   4
>  4331 |   7
>30 |   9
>  1950 |  10
> 7 |  11
>67 |  12
>
> But what I need to see is if there are no activities for the particular
> activity_id that week, that it lists the count as 0 and lists the
> activity_id associated like this:
>
>  count | activity_id
> ---+-
>  1502 |   1
>11 |   2
> 2 |   3
>   815 |   4
> 0 |   5
> 0 |   6
>  4331 |   7
> 0 |   8
>30 |   9
>  1950 |  10
> 7 |  11
>67 |  12
>
> Thanking you in advance for any help on this. The caffiene seems to be not
> working well today.
>
> Aaron
>
>
> --
> Sent via pgsql-sql mailing list ([email protected])
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-sql
>


Re: [SQL] Drawing a blank on some SQL

2011-02-11 Thread Osvaldo Kussama
2011/2/11, Aaron Burnett :
>
> Hi,
>
> I'm just drawing a blank entirely today and would appreciate some help on
> this.
>
> The long and short; there are 12 distinct activities that need to be queried
> on a weekly basis:
>
> SELECT count(activity_id), activity_id
> FROM foo_activity
> WHERE created >= '01/01/2011' and created < '01/08/2011'
> GROUP BY 2
> ORDER BY 2;
>
> It gives me this answer, which is correct:
>
>  count | activity_id
> ---+-
>   1502 |   1
> 11 |   2
>  2 |   3
>815 |   4
>   4331 |   7
> 30 |   9
>   1950 |  10
>  7 |  11
> 67 |  12
>
> But what I need to see is if there are no activities for the particular
> activity_id that week, that it lists the count as 0 and lists the
> activity_id associated like this:
>
>  count | activity_id
> ---+-
>   1502 |   1
> 11 |   2
>  2 |   3
>815 |   4
>  0 |   5
>  0 |   6
>   4331 |   7
>  0 |   8
> 30 |   9
>   1950 |  10
>  7 |  11
> 67 |  12
>
> Thanking you in advance for any help on this. The caffiene seems to be not
> working well today.
>


Try:
SELECT sum(case when created >= '2011-01-01' and created <
'2011-01-08' then 1 else 0 end), activity_id
FROM foo_activity
GROUP BY 2
ORDER BY 2;

Osvaldo

-- 
Sent via pgsql-sql mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql


Re: [SQL] Drawing a blank on some SQL

2011-02-11 Thread Aaron Burnett


Thank you all very much for your help.

The suggestion from Osvaldo below was the best for my situation (not having any 
soret of xref table to join)...

Best Regards,

Aaron


On 2/11/11 1:09 PM, "Osvaldo Kussama"  wrote:

2011/2/11, Aaron Burnett :
>
> Hi,
>
> I'm just drawing a blank entirely today and would appreciate some help on
> this.
>
> The long and short; there are 12 distinct activities that need to be queried
> on a weekly basis:
>
> SELECT count(activity_id), activity_id
> FROM foo_activity
> WHERE created >= '01/01/2011' and created < '01/08/2011'
> GROUP BY 2
> ORDER BY 2;
>
> It gives me this answer, which is correct:
>
>  count | activity_id
> ---+-
>   1502 |   1
> 11 |   2
>  2 |   3
>815 |   4
>   4331 |   7
> 30 |   9
>   1950 |  10
>  7 |  11
> 67 |  12
>
> But what I need to see is if there are no activities for the particular
> activity_id that week, that it lists the count as 0 and lists the
> activity_id associated like this:
>
>  count | activity_id
> ---+-
>   1502 |   1
> 11 |   2
>  2 |   3
>815 |   4
>  0 |   5
>  0 |   6
>   4331 |   7
>  0 |   8
> 30 |   9
>   1950 |  10
>  7 |  11
> 67 |  12
>
> Thanking you in advance for any help on this. The caffiene seems to be not
> working well today.
>


Try:
SELECT sum(case when created >= '2011-01-01' and created <
'2011-01-08' then 1 else 0 end), activity_id
FROM foo_activity
GROUP BY 2
ORDER BY 2;

Osvaldo



Re: [SQL] Drawing a blank on some SQL

2011-02-11 Thread Samuel Gendler
On Fri, Feb 11, 2011 at 12:47 PM, Aaron Burnett wrote:

>
>
> Thank you all very much for your help.
>
> The suggestion from Osvaldo below was the best for my situation (not having
> any soret of xref table to join)...
>
>

It may work well for now, but if that foo_activity table has the potential
to get large with only a relatively small percentage of rows fitting in the
date range, it will get very slow compared to the left join, as it requires
loading every row in the table, instead of being able to use an index to
pull just the rows from the date range and then join to the list of valid
activities.

--sam