Re: [PERFORM] PG optimization question

2010-01-11 Thread Pierre Frédéric Caillaud
On Sun, 10 Jan 2010 19:45:32 +0100, Robert Haas wrote:



2010/1/10 Pierre Frédéric Caillaud :



If you transfer (delete from staging, insert into archive) in one
transaction, then it will always be visible in exactly one of them,
and exactly once in a view over both staging and archive(s).


       Does the latest version implement this:

INSERT INTO archive (...) DELETE FROM staging WHERE ... RETURNING ...


No.  There are no plans to support that, though there are proposals to  
support:


WITH x AS (DELETE FROM staging WHERE ... RETURNING ...) INSERT INTO
archive (...) SELECT ... FROM x

I'm not sure how much that will help, though, since in the designs so
far discussed, the tuples won't be pipelined.

...Robert



	Yeah, but it's a lot more user-friendly than SELECT FOR UPDATE, INSERT  
SELECT, DELETE...
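Until such a combined statement exists, the atomic move discussed above can be written as two plain statements inside one transaction. A minimal sketch, assuming a timestamp column (here called msg_time, which is illustrative) and that no concurrent writer inserts rows matching the cutoff mid-transaction:

```sql
BEGIN;
-- Copy yesterday's (and older) rows into the archive...
INSERT INTO archive
SELECT * FROM staging WHERE msg_time < CURRENT_DATE;
-- ...then remove exactly the same rows from staging.
DELETE FROM staging WHERE msg_time < CURRENT_DATE;
COMMIT;
```

Because both statements commit atomically, any other transaction sees each row either in staging or in archive, never in both.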




--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] PG optimization question

2010-01-10 Thread Robert Haas
2010/1/10 Pierre Frédéric Caillaud :
>
>> If you transfer (delete from staging, insert into archive) in one
>> transaction, then it will always be visible in exactly one of them,
>> and exactly once in a view over both staging and archive(s).
>
>        Does the latest version implement this:
>
> INSERT INTO archive (...) DELETE FROM staging WHERE ... RETURNING ...

No.  There are no plans to support that, though there are proposals to support:

WITH x AS (DELETE FROM staging WHERE ... RETURNING ...) INSERT INTO
archive (...) SELECT ... FROM x

I'm not sure how much that will help, though, since in the designs so
far discussed, the tuples won't be pipelined.

...Robert



Re: [PERFORM] PG optimization question

2010-01-10 Thread Pierre Frédéric Caillaud



If you transfer (delete from staging, insert into archive) in one
transaction, then it will always be visible in exactly one of them,
and exactly once in a view over both staging and archive(s).


Does the latest version implement this:

INSERT INTO archive (...) DELETE FROM staging WHERE ... RETURNING ...




Re: [PERFORM] PG optimization question

2010-01-09 Thread Craig Ringer

On 9/01/2010 6:32 PM, Nickolay wrote:

Hi all,

Here is my typical configuration: 1 (2) GB of RAM, HP ML 350 (150) series
server, SATA RAID, Linux.

I have one big table (called "archive") which contains short text messages
with plenty of additional service info.
Currently this table contains more than 4M rows for a period of 4.5
months; at about 4 GB, each row has an average size of 1K.

I'm going to make our application work with partitions of this table
instead of one large table. The primary reason is that eventually we'd
need to remove old rows, and that would be pretty hard with one table
because of blocking (and rows are being added constantly).


DELETEs shouldn't block concurrent INSERTs.

That said, dropping a partition is a lot more convenient than DELETEing 
from a big table.
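With inheritance-style partitioning, removing a month of old data becomes a metadata operation instead of a row-by-row DELETE. A sketch, with an illustrative partition name:

```sql
-- Detach the oldest partition from the parent, then drop it;
-- both steps are near-instant compared to a bulk DELETE plus VACUUM.
ALTER TABLE archive_2009_06 NO INHERIT archive;
DROP TABLE archive_2009_06;
```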


--
Craig Ringer



Re: [PERFORM] PG optimization question

2010-01-09 Thread Ludwik Dylag
2010/1/9 Nickolay 

> Okay, I see your point with the staging table. That's a good idea!
> The only problem I see here is the transfer-to-archive-table process. As
> you've correctly noticed, the system is kind of real-time and there can be
> dozens of processes writing to the staging table; I cannot see how to make
> the transfer/flush process right and clear...
>
The simplest way to do this is to create a view and add/remove the
first/last day by recreating the view on a daily interval.
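A sketch of that view-based approach, using the staging/archive names from earlier in the thread (the column lists are illustrative; a real view would enumerate columns explicitly):

```sql
CREATE OR REPLACE VIEW messages AS
SELECT * FROM staging
UNION ALL
SELECT * FROM archive;
```

CREATE OR REPLACE VIEW swaps the definition in place, so the daily re-creation leaves no window in which readers see a missing relation.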

-- 
Ludwik Dyląg


Re: [PERFORM] PG optimization question

2010-01-09 Thread Nickolay

Okay, I see your point with the staging table. That's a good idea!
The only problem I see here is the transfer-to-archive-table process. As
you've correctly noticed, the system is kind of real-time and there can
be dozens of processes writing to the staging table; I cannot see how to
make the transfer/flush process right and clear...


Pierre Frédéric Caillaud wrote:
Oh, btw, 95% of queries are searching rows for current date (last 
24 hours).




You may want to use a daily staging table and then flush to the 
monthly archive tables at the end of the day.


If the rows in the archive tables are never updated, this strategy
means you never need to vacuum the big archive tables (and indexes),
which is good. Also, you can insert the rows into the archive table in
the order of your choice (by timestamp, for example), which makes them
nicely clustered without ever needing to run CLUSTER.


And with partitioning you can have lots of indexes on the staging
table (and the current month's partition) to speed up your most common
queries (which are likely to be more OLTP), while using fewer indexes on
the older partitions (saving disk space) if queries on old partitions
are likely to be reporting queries that will grind through a large part
of the table anyway.








Re: [PERFORM] PG optimization question

2010-01-09 Thread Pierre Frédéric Caillaud


That may help with query speed (not a problem now), but we'll then
have to add a UNION with the daily staging table for the other 5% of
requests, right? And there would be a moment when a day's message is in
the archive table AND in the daily table (while being transferred from
the daily table to the archive).
Our main problem is blocking when doing DELETE (the app sometimes freezes
for a long time), and we also have to VACUUM the live table, which is
not acceptable in our app.


Thanks for your reply; I was kinda worried about the number of partitions
and how this would affect PG query execution speed.


Kenneth Marshall wrote:
Oh, btw, 95% of queries are searching rows for current date (last 24  
hours).




You may want to use a daily staging table and then flush to the monthly  
archive tables at the end of the day.


If the rows in the archive tables are never updated, this strategy means
you never need to vacuum the big archive tables (and indexes), which is
good. Also, you can insert the rows into the archive table in the order of
your choice (by timestamp, for example), which makes them nicely clustered
without ever needing to run CLUSTER.


And with partitioning you can have lots of indexes on the staging table
(and the current month's partition) to speed up your most common queries
(which are likely to be more OLTP), while using fewer indexes on the older
partitions (saving disk space) if queries on old partitions are likely to
be reporting queries that will grind through a large part of the
table anyway.




Re: [PERFORM] PG optimization question

2010-01-09 Thread Nickolay
That may help with query speed (not a problem now), but we'll then
have to add a UNION with the daily staging table for the other 5% of
requests, right? And there would be a moment when a day's message is in
the archive table AND in the daily table (while being transferred from
the daily table to the archive).
Our main problem is blocking when doing DELETE (the app sometimes freezes
for a long time), and we also have to VACUUM the live table, which is
not acceptable in our app.


Thanks for your reply; I was kinda worried about the number of partitions
and how this would affect PG query execution speed.


Kenneth Marshall wrote:
Oh, btw, 95% of queries are searching rows for current date (last 24 
hours).



You may want to use a daily staging table and then flush to the 
monthly archive tables at the end of the day.


  




Re: [PERFORM] PG optimization question

2010-01-09 Thread Kenneth Marshall
On Sat, Jan 09, 2010 at 03:42:08PM +0300, Nickolay wrote:
> I do not see any way to normalize this table any more. Its size is 4 GB
> for ~4M rows, i.e. 1 KB per row; I think that's OK.
> Also there are 2 indexes: by date_time and by a couple of service fields
> (total index size is 250 MB now).
> I think I'll partition by month (approx. 1M rows or 1 GB per
> month), so it would be about 60 partitions for 5 years. Is that OK for
> Postgres?

Not a problem. We have a log server that has 64 daily partitions.

> Oh, btw, 95% of queries are searching rows for the current date (last 24
> hours).

You may want to use a daily staging table and then flush to the 
monthly archive tables at the end of the day.

Ken
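On the PostgreSQL releases current at the time (8.x), monthly partitions for a table like this would typically use inheritance plus CHECK constraints. A sketch, assuming an illustrative msg_time column:

```sql
-- One child table per month; the CHECK constraint lets the planner
-- skip non-matching partitions when constraint exclusion is enabled.
CREATE TABLE archive_2009_12 (
    CHECK (msg_time >= DATE '2009-12-01' AND msg_time < DATE '2010-01-01')
) INHERITS (archive);

SET constraint_exclusion = on;
```

New rows are routed to the right child either by inserting into it directly or via a trigger/rule on the parent.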



Re: [PERFORM] PG optimization question

2010-01-09 Thread Nickolay
I do not see any way to normalize this table any more. Its size is 4 GB
for ~4M rows, i.e. 1 KB per row; I think that's OK.
Also there are 2 indexes: by date_time and by a couple of service fields
(total index size is 250 MB now).
I think I'll partition by month (approx. 1M rows or 1 GB
per month), so it would be about 60 partitions for 5 years. Is that OK
for Postgres?

Oh, btw, 95% of queries are searching rows for the current date (last 24 hours).
Also we use SELECT...FOR UPDATE row-level locking for updating rows
in the archive (i.e. we INSERT a new row when starting an outgoing message
transmission, and then do SELECT...FOR UPDATE and UPDATE on the source
(incoming) message when the outgoing message has been sent), so I guess we
would have to explicitly write the name of the partition table (i.e.
"archive_2009_12" instead of "archive") for SELECT...FOR UPDATE and
UPDATE requests, as they may need to access a row in a previous partition
instead of the current one.
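A sketch of what such a partition-qualified lock-and-update might look like (the partition, column, and value names are all illustrative):

```sql
BEGIN;
-- Lock the source (incoming) message row in the partition that holds it...
SELECT * FROM archive_2009_12 WHERE msg_id = 12345 FOR UPDATE;
-- ...then record the outcome of the outgoing transmission.
UPDATE archive_2009_12 SET status = 'delivered' WHERE msg_id = 12345;
COMMIT;
```

If the application keeps the original message's timestamp, it can derive the partition name from it rather than probing each child table.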


Grzegorz Jaśkiewicz wrote:

Maybe that 'one big table' needs something called 'normalisation'
first. See how much that will shed off. You might be surprised.
The partitioning needs to be done by some constant interval of time,
in your case. Whatever suits you; I would suggest using a rate
that gives you both ease of archiving/removal of old data (so not
too wide), and also one that makes sure that most of the data
you'll be searching for in your queries will be in one or two
partitions per query.


  




Re: [PERFORM] PG optimization question

2010-01-09 Thread Thomas Kellerer

Nickolay wrote on 09.01.2010 11:24:

it would be pretty hard with one table because of blocking


What do you mean by "because of blocking"?

Thomas




Re: [PERFORM] PG optimization question

2010-01-09 Thread Grzegorz Jaśkiewicz
Maybe that 'one big table' needs something called 'normalisation'
first. See how much that will shed off. You might be surprised.
The partitioning needs to be done by some constant interval of time,
in your case. Whatever suits you; I would suggest using a rate
that gives you both ease of archiving/removal of old data (so not
too wide), and also one that makes sure that most of the data
you'll be searching for in your queries will be in one or two
partitions per query.



[PERFORM] PG optimization question

2010-01-09 Thread Nickolay

Hi all,

Here is my typical configuration: 1 (2) GB of RAM, HP ML 350 (150) series
server, SATA RAID, Linux.


I have one big table (called "archive") which contains short text messages
with plenty of additional service info.
Currently this table contains more than 4M rows for a period of 4.5
months; at about 4 GB, each row has an average size of 1K.


I'm going to make our application work with partitions of this table
instead of one large table. The primary reason is that eventually we'd
need to remove old rows, and that would be pretty hard with one table
because of blocking (and rows are being added constantly).


1. What would be your recommendations on how to partition this table (by
months, years or quarters)?
2. What are the recommended PG settings for such a configuration? Would it
be OK to set shared_buffers to, say, 512 MB (or, if RAM is 1 GB, maybe
shared_buffers should be 400 MB)? What other settings would you recommend?


Thanks in advance,
Nick.
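For reference on question 2, a common rule of thumb in that era was to give shared_buffers roughly a quarter of RAM on a dedicated box rather than half. A hedged postgresql.conf sketch (values are illustrative starting points, not tuned for this workload):

```
shared_buffers = 256MB        # ~25% of 1 GB RAM; 512MB would likely starve the OS cache
effective_cache_size = 512MB  # planner's estimate of shared_buffers + OS cache
checkpoint_segments = 16      # smooth checkpoint I/O under a constant insert load
```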


