Re: [GENERAL] automated row deletion

2009-10-10 Thread Jasen Betts
On 2009-10-07, Dave Huber dhu...@letourneautechnologies.com wrote:
 --_000_7CDADB576E07AC4FA71E1B12566C9126540E0A0C34ltimb1LTIcom_
 Content-Type: text/plain; charset=us-ascii
 Content-Transfer-Encoding: quoted-printable

 A colleague gave me the following query to run:

 DELETE FROM data_log_20msec_table WHERE (log_id IN (SELECT log_id FROM data=
 _log_20msec_table ORDER BY log_id DESC OFFSET 1000))

looks slower than neccessary.

DELETE FROM data_log_20msec_table WHERE log_id  (SELECT log_id FROM
data= _log_20msec_table ORDER BY log_id DESC OFFSET 1000 LIMIT 1 )



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] automated row deletion

2009-10-07 Thread Dave Huber
A colleague gave me the following query to run:

DELETE FROM data_log_20msec_table WHERE (log_id IN (SELECT log_id FROM 
data_log_20msec_table ORDER BY log_id DESC OFFSET 1000))

log_id is the primary key (big serial)
data_log is the table described below

This query keeps the most recent 10 million rows and deletes the remaining 
ones. If I call this once a minute, it would be deleting 3000 rows each time. 
Is there a way to optimize this statement? Postgres was setup with default 
configuration. Is there anything we can change in the configuration to make 
this run more efficiently? The table is defined as below:

CREATE TABLE data_log_20msec_table
(
  log_id bigserial NOT NULL,
  timestamp_dbl double precision,
  data bytea,
  CONSTRAINT data_log_20msec_table_pkey PRIMARY KEY (log_id)
)
WITH (OIDS=FALSE);
ALTER TABLE data_log_20msec_table OWNER TO postgres;

-- Index: data_log_20msec_table_timestamp_index

-- DROP INDEX data_log_20msec_table_timestamp_index;

CREATE INDEX data_log_20msec_table_timestamp_index
  ON data_log_20msec_table
  USING btree
  (timestamp_dbl);

Is there anything we can do here that can optimize the deletion of rows?

Much thanks to anyone who can help us out.

Regards,
Dave


Original Post:
I am inserting 250 rows of data (~2kbytes/row) every 5 seconds into a table 
(the primary key is a big serial). I need to be able to limit the size of the 
table to prevent filling up the disk. Is there a way to setup the table to do 
this automatically or do I have to periodically figure out how many rows are in 
the table and delete the oldest rows manually?





This electronic mail message is intended exclusively for the individual(s) or 
entity to which it is addressed. This message, together with any attachment, is 
confidential and may contain privileged information. Any unauthorized review, 
use, printing, retaining, copying, disclosure or distribution is strictly 
prohibited. If you have received this message in error, please immediately 
advise the sender by reply email message to the sender and delete all copies of 
this message.
THIS E-MAIL IS NOT AN OFFER OR ACCEPTANCE: Notwithstanding the Uniform 
Electronic Transactions Act or any other law of similar import, absent an 
express statement to the contrary contained in this e-mail, neither this e-mail 
nor any attachments are an offer or acceptance to enter into a contract, and 
are not intended to bind the sender, LeTourneau Technologies, Inc., or any of 
its subsidiaries, affiliates, or any other person or entity.
WARNING: Although the company has taken reasonable precautions to ensure no 
viruses are present in this email, the company cannot accept responsibility for 
any loss or damage arising from the use of this email or attachments.



Re: [GENERAL] automated row deletion

2009-10-07 Thread John R Pierce

Dave Huber wrote:


A colleague gave me the following query to run:

 

DELETE FROM data_log_20msec_table WHERE (log_id IN (SELECT log_id FROM 
data_log_20msec_table ORDER BY log_id DESC OFFSET 1000))


...

This query keeps the most recent 10 million rows and deletes the 
remaining ones. If I call this once a minute, it would be deleting 
3000 rows each time. Is there a way to optimize this statement? 
Postgres was setup with default configuration. Is there anything we 
can change in the configuration to make this run more efficiently? The 
table is defined as below:


...

Is there anything we can do here that can optimize the deletion of rows?

 



as I previously wrote...


I think you'll find row deletes would kill your performance.   For time 
aged data like that, we use partitioned tables, we typically do it by 
the week (keeping 6 months of history), but you might end up doing it by 
N*1000 PK values or some such, so you can use your PK to determine the 
partition.   With a partitioning scheme, its much faster to add a new 
one and drop the oldest at whatever interval you need.   See 
http://www.postgresql.org/docs/current/static/ddl-partitioning.html




based on the numbers you give above, I think I'd do it by 10 log_id 
values, so you'd end up with 101 partition tables, and every half hour 
or so you'd truncate the oldest partition and start a new one (reusing 
the previously oldest in a round robin fashion).   truncate is 1000s of 
times faster than delete.








--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] automated row deletion

2009-10-07 Thread Dave Huber
John, I got your previous post, but I think I misunderstood something. You 
didn't mean a disk partition. I think I get what you're describing now. I had 
previously missed the link in your earlier post, too. Please accept my 
apologies for not being more diligent in my reading. I'll look into this 
partitioned table bit.

Thanks,
Dave

-Original Message-
From: John R Pierce [mailto:pie...@hogranch.com]
Sent: Wednesday, October 07, 2009 12:01 PM
To: Dave Huber
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] automated row deletion

Dave Huber wrote:

 A colleague gave me the following query to run:



 DELETE FROM data_log_20msec_table WHERE (log_id IN (SELECT log_id FROM
 data_log_20msec_table ORDER BY log_id DESC OFFSET 1000))

 ...

 This query keeps the most recent 10 million rows and deletes the
 remaining ones. If I call this once a minute, it would be deleting
 3000 rows each time. Is there a way to optimize this statement?
 Postgres was setup with default configuration. Is there anything we
 can change in the configuration to make this run more efficiently? The
 table is defined as below:

 ...

 Is there anything we can do here that can optimize the deletion of rows?




as I previously wrote...


I think you'll find row deletes would kill your performance.   For time
aged data like that, we use partitioned tables, we typically do it by
the week (keeping 6 months of history), but you might end up doing it by
N*1000 PK values or some such, so you can use your PK to determine the
partition.   With a partitioning scheme, its much faster to add a new
one and drop the oldest at whatever interval you need.   See
http://www.postgresql.org/docs/current/static/ddl-partitioning.html



based on the numbers you give above, I think I'd do it by 10 log_id
values, so you'd end up with 101 partition tables, and every half hour
or so you'd truncate the oldest partition and start a new one (reusing
the previously oldest in a round robin fashion).   truncate is 1000s of
times faster than delete.








This electronic mail message is intended exclusively for the individual(s) or 
entity to which it is addressed. This message, together with any attachment, is 
confidential and may contain privileged information. Any unauthorized review, 
use, printing, retaining, copying, disclosure or distribution is strictly 
prohibited. If you have received this message in error, please immediately 
advise the sender by reply email message to the sender and delete all copies of 
this message.
THIS E-MAIL IS NOT AN OFFER OR ACCEPTANCE: Notwithstanding the Uniform 
Electronic Transactions Act or any other law of similar import, absent an 
express statement to the contrary contained in this e-mail, neither this e-mail 
nor any attachments are an offer or acceptance to enter into a contract, and 
are not intended to bind the sender, LeTourneau Technologies, Inc., or any of 
its subsidiaries, affiliates, or any other person or entity.
WARNING: Although the company has taken reasonable precautions to ensure no 
viruses are present in this email, the company cannot accept responsibility for 
any loss or damage arising from the use of this email or attachments.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] automated row deletion

2009-09-30 Thread Dave Huber
I am inserting 250 rows of data (~2kbytes/row) every 5 seconds into a table 
(the primary key is a big serial). I need to be able to limit the size of the 
table to prevent filling up the disk. Is there a way to setup the table to do 
this automatically or do I have to periodically figure out how many rows are in 
the table and delete the oldest rows manually?

Thanks,
Dave


This electronic mail message is intended exclusively for the individual(s) or 
entity to which it is addressed. This message, together with any attachment, is 
confidential and may contain privileged information. Any unauthorized review, 
use, printing, retaining, copying, disclosure or distribution is strictly 
prohibited. If you have received this message in error, please immediately 
advise the sender by reply email message to the sender and delete all copies of 
this message.
THIS E-MAIL IS NOT AN OFFER OR ACCEPTANCE: Notwithstanding the Uniform 
Electronic Transactions Act or any other law of similar import, absent an 
express statement to the contrary contained in this e-mail, neither this e-mail 
nor any attachments are an offer or acceptance to enter into a contract, and 
are not intended to bind the sender, LeTourneau Technologies, Inc., or any of 
its subsidiaries, affiliates, or any other person or entity.
WARNING: Although the company has taken reasonable precautions to ensure no 
viruses are present in this email, the company cannot accept responsibility for 
any loss or damage arising from the use of this email or attachments.



Re: [GENERAL] automated row deletion

2009-09-30 Thread John R Pierce

Dave Huber wrote:


I am inserting 250 rows of data (~2kbytes/row) every 5 seconds into a 
table (the primary key is a big serial). I need to be able to limit 
the size of the table to prevent filling up the disk. Is there a way 
to setup the table to do this automatically or do I have to 
periodically figure out how many rows are in the table and delete the 
oldest rows manually?


 




I think you'll find row deletes would kill your performance.   For time 
aged data like that, we use partitioned tables, we typically do it by 
the week (keeping 6 months of history), but you might end up doing it by 
N*1000 PK values or some such, so you can use your PK to determine the 
partition.   With a partitioning scheme, its much faster to add a new 
one and drop the oldest at whatever interval you need.   See 
http://www.postgresql.org/docs/current/static/ddl-partitioning.html






--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] automated row deletion

2009-09-30 Thread 纪晓曦
I think there no better way you can get around this problem. You need to
check the disk periodically and it is not to hard.

2009/10/1 Dave Huber dhu...@letourneautechnologies.com

  I am inserting 250 rows of data (~2kbytes/row) every 5 seconds into a
 table (the primary key is a big serial). I need to be able to limit the size
 of the table to prevent filling up the disk. Is there a way to setup the
 table to do this automatically or do I have to periodically figure out how
 many rows are in the table and delete the oldest rows manually?



 Thanks,

 Dave

 --
 This electronic mail message is intended exclusively for the individual(s)
 or entity to which it is addressed. This message, together with any
 attachment, is confidential and may contain privileged information. Any
 unauthorized review, use, printing, retaining, copying, disclosure or
 distribution is strictly prohibited. If you have received this message in
 error, please immediately advise the sender by reply email message to the
 sender and delete all copies of this message.
 THIS E-MAIL IS NOT AN OFFER OR ACCEPTANCE: Notwithstanding the Uniform
 Electronic Transactions Act or any other law of similar import, absent an
 express statement to the contrary contained in this e-mail, neither this
 e-mail nor any attachments are an offer or acceptance to enter into a
 contract, and are not intended to bind the sender, LeTourneau Technologies,
 Inc., or any of its subsidiaries, affiliates, or any other person or entity.
 WARNING: Although the company has taken reasonable precautions to ensure no
 viruses are present in this email, the company cannot accept responsibility
 for any loss or damage arising from the use of this email or attachments.