Re: [PERFORM] Recommendations for partitioning?

2013-12-30 Thread Sergey Konoplev
On Fri, Dec 20, 2013 at 7:59 AM, Alvaro Herrera
alvhe...@2ndquadrant.com wrote:
 Dave Johansen wrote:
 On Thu, Dec 19, 2013 at 10:27 AM, Scott Marlowe 
 scott.marl...@gmail.com wrote:

  That's pretty much it. What I did was to create the new month table
  and day tables, alter my triggers to reflect this, then move the data
  with an insert into / select from query for each old day partition. Then
  once their data is moved you can just drop them. Since you changed the
  triggers first, those tables are no longer taking input, so it's usually
  safe to drop them now.

 It would be nice if there was just a move command, but that seems like
 the type of model that we want and we'll probably move to that.

 Eh.  Why can't you just do something like

 WITH moved AS (
 DELETE FROM src WHERE ..
 RETURNING *
 ) INSERT INTO dst SELECT * FROM moved;

Alvaro, I think it could be cheaper to do it as shown below, correct?

psql dbname -c 'copy src to stdout' | \
psql dbname -c 'copy dst from stdin; truncate src;'
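
For comparison, a minimal single-transaction sketch of the same move
(assuming src can tolerate a brief exclusive lock):

BEGIN;
-- Take an exclusive lock so no rows arrive between the copy and the truncate.
LOCK TABLE src IN ACCESS EXCLUSIVE MODE;
INSERT INTO dst SELECT * FROM src;
TRUNCATE src;
COMMIT;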

Dave, in case you need to archive old partitions to compressed
files outside of your database, you can use this tool [1]. Consult the
configuration example [2] and look at the ARCHIVE_* parameters.

[1] https://github.com/grayhemp/pgcookbook/blob/master/bin/archive_tables.sh
[2] https://github.com/grayhemp/pgcookbook/blob/master/bin/config.sh.example

-- 
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

http://www.linkedin.com/in/grayhemp
+1 (415) 867-9984, +7 (901) 903-0499, +7 (988) 888-1979
gray...@gmail.com




Re: [PERFORM] Recommendations for partitioning?

2013-12-30 Thread Alvaro Herrera
Sergey Konoplev wrote:
 On Fri, Dec 20, 2013 at 7:59 AM, Alvaro Herrera
 alvhe...@2ndquadrant.com wrote:

  Eh.  Why can't you just do something like
 
  WITH moved AS (
  DELETE FROM src WHERE ..
  RETURNING *
  ) INSERT INTO dst SELECT * FROM moved;
 
 Alvaro, I think it could be cheaper to do it as shown below, correct?
 
 psql dbname -c 'copy src to stdout' | \
 psql dbname -c 'copy dst from stdin; truncate src;'

Yes, if you can get rid of the old records by removing or emptying a
partition (or de-inheriting it, as suggested elsewhere in the thread),
that's better than DELETE because that way you don't create dead rows to
vacuum later.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [PERFORM] Recommendations for partitioning?

2013-12-20 Thread Dave Johansen
On Thu, Dec 19, 2013 at 10:27 AM, Scott Marlowe scott.marl...@gmail.com wrote:

 On Thu, Dec 19, 2013 at 9:53 AM, Dave Johansen davejohan...@gmail.com
 wrote:
 
  On Sat, Dec 7, 2013 at 1:37 PM, Scott Marlowe scott.marl...@gmail.com
  wrote:
  I'll add that you can use asymmetric partitioning if you tend to do a
  lot of more fine-grained queries on recent data and more big roll-up
  queries on older ones. I.e. partition by month except for the last 30
  days, do it by day etc. Then at the end of the month roll all the days
  into a month partition and delete them.
 
  This sounds like a great solution for us. Is there some trick to roll the
  records from one partition to another? Or is the only way just a SELECT
  INTO followed by a DELETE?

 That's pretty much it. What I did was to create the new month table
 and day tables, alter my triggers to reflect this, then move the data
 with an insert into / select from query for each old day partition. Then
 once their data is moved you can just drop them. Since you changed the
 triggers first, those tables are no longer taking input, so it's usually
 safe to drop them now.


It would be nice if there was just a move command, but that seems like
the type of model that we want and we'll probably move to that.

On a semi-related note, I was trying to move from the single large table to
the partitions, and doing INSERT INTO SELECT * FROM WHERE ... was running
very slow (I believe because of the same index issue that we've been
running into), so I then tried creating a BEFORE INSERT trigger (which was
working) and using pg_restore on an -Fc dump. The documentation says that
triggers are executed as part of a COPY FROM (
http://www.postgresql.org/docs/8.4/static/sql-copy.html ), but it doesn't
appear that the trigger was honored, because all of the data was put into
the base table and all of the partitions are empty.

Is there a way that I can run pg_restore that will properly honor the
trigger? Or do I just have to create a new dump that uses INSERTs?

Thanks,
Dave


Re: [PERFORM] Recommendations for partitioning?

2013-12-20 Thread bricklen
On Fri, Dec 20, 2013 at 7:52 AM, Dave Johansen davejohan...@gmail.com wrote:

 It would be nice if there was just a move command, but that seems like
 the type of model that we want and we'll probably move to that.


I haven't been following this thread, but this comment caught my eye. Are
you after the NO INHERIT command?
http://www.postgresql.org/docs/current/static/sql-altertable.html Search
for the NO INHERIT clause -- it will allow you to detach a child table
from an inherited parent, which you can then archive or copy into another
table. Inserting into the rolled-up partition was already mentioned
upthread, I see.
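
For example, a minimal sketch with hypothetical table names:

ALTER TABLE measurements_2013_11 NO INHERIT measurements;
-- measurements_2013_11 keeps its data but no longer participates in
-- queries against the parent, so it can now be archived, copied, or dropped.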


Re: [PERFORM] Recommendations for partitioning?

2013-12-20 Thread Alvaro Herrera
Dave Johansen wrote:
 On Thu, Dec 19, 2013 at 10:27 AM, Scott Marlowe 
 scott.marl...@gmail.com wrote:

  That's pretty much it. What I did was to create the new month table
  and day tables, alter my triggers to reflect this, then move the data
  with an insert into / select from query for each old day partition. Then
  once their data is moved you can just drop them. Since you changed the
  triggers first, those tables are no longer taking input, so it's usually
  safe to drop them now.
 
 It would be nice if there was just a move command, but that seems like
 the type of model that we want and we'll probably move to that.

Eh.  Why can't you just do something like

WITH moved AS (
DELETE FROM src WHERE ..
RETURNING *
) INSERT INTO dst SELECT * FROM moved;

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [PERFORM] Recommendations for partitioning?

2013-12-20 Thread Shaun Thomas

On 12/20/2013 09:59 AM, Alvaro Herrera wrote:


WITH moved AS (
DELETE FROM src WHERE ..
RETURNING *
) INSERT INTO dst SELECT * FROM moved;


I know that's effectively an atomic action, but I'd feel a lot more 
comfortable reversing that logic so the delete is based on the results 
of the insert.


WITH saved AS (
INSERT INTO dst
SELECT * FROM src WHERE ...
RETURNING *
)
DELETE FROM src
 WHERE ...;

I'll admit yours is cleaner, though. :)
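
For reference, a sketch that actually ties the DELETE to the rows the
INSERT returned (assuming src has a primary key id; the names and the
predicate are hypothetical):

WITH saved AS (
    INSERT INTO dst
    SELECT * FROM src WHERE ts < DATE '2013-12-01'
    RETURNING id
)
DELETE FROM src USING saved WHERE src.id = saved.id;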

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-676-8870
stho...@optionshouse.com





Re: [PERFORM] Recommendations for partitioning?

2013-12-20 Thread Dave Johansen
On Fri, Dec 20, 2013 at 9:18 AM, Shaun Thomas stho...@optionshouse.com wrote:

 On 12/20/2013 09:59 AM, Alvaro Herrera wrote:

  WITH moved AS (
 DELETE FROM src WHERE ..
 RETURNING *
 ) INSERT INTO dst SELECT * FROM moved;


 I know that's effectively an atomic action, but I'd feel a lot more
 comfortable reversing that logic so the delete is based on the results of
 the insert.

 WITH saved AS (
 INSERT INTO dst
 SELECT * FROM src WHERE ...
 RETURNING *
 )
 DELETE FROM src
  WHERE ...;

 I'll admit yours is cleaner, though. :)


That is a good idea. I didn't even realize that such commands existed,
so I'll definitely use them.


Re: [PERFORM] Recommendations for partitioning?

2013-12-20 Thread Dave Johansen
On Fri, Dec 20, 2013 at 8:52 AM, Dave Johansen davejohan...@gmail.com wrote:

 On Thu, Dec 19, 2013 at 10:27 AM, Scott Marlowe 
 scott.marl...@gmail.com wrote:

 On Thu, Dec 19, 2013 at 9:53 AM, Dave Johansen davejohan...@gmail.com
 wrote:
 
  On Sat, Dec 7, 2013 at 1:37 PM, Scott Marlowe scott.marl...@gmail.com
  wrote:
  I'll add that you can use asymmetric partitioning if you tend to do a
  lot of more fine-grained queries on recent data and more big roll-up
  queries on older ones. I.e. partition by month except for the last 30
  days, do it by day etc. Then at the end of the month roll all the days
  into a month partition and delete them.
 
  This sounds like a great solution for us. Is there some trick to roll the
  records from one partition to another? Or is the only way just a SELECT
  INTO followed by a DELETE?

 That's pretty much it. What I did was to create the new month table
 and day tables, alter my triggers to reflect this, then move the data
 with an insert into / select from query for each old day partition. Then
 once their data is moved you can just drop them. Since you changed the
 triggers first, those tables are no longer taking input, so it's usually
 safe to drop them now.


 It would be nice if there was just a move command, but that seems like
 the type of model that we want and we'll probably move to that.

 On a semi-related note, I was trying to move from the single large table to
 the partitions, and doing INSERT INTO SELECT * FROM WHERE ... was running
 very slow (I believe because of the same index issue that we've been
 running into), so I then tried creating a BEFORE INSERT trigger (which was
 working) and using pg_restore on an -Fc dump. The documentation says that
 triggers are executed as part of a COPY FROM (
 http://www.postgresql.org/docs/8.4/static/sql-copy.html ), but it doesn't
 appear that the trigger was honored, because all of the data was put into
 the base table and all of the partitions are empty.

 Is there a way that I can run pg_restore that will properly honor the
 trigger? Or do I just have to create a new dump that uses INSERTs?


It turns out that this was an error on my part. I was using an old script
to do the restore, and it had --disable-triggers to prevent the foreign keys
from being checked; that was the actual source of my problem.


Re: [PERFORM] Recommendations for partitioning?

2013-12-19 Thread Dave Johansen
Sorry for the delayed response. We had some hardware/configuration issues
that appear to be solved now, so we're now starting to actually play with
modifying the database.

On Sat, Dec 7, 2013 at 1:29 PM, Jeff Janes jeff.ja...@gmail.com wrote:

 On Thu, Dec 5, 2013 at 7:36 AM, Dave Johansen davejohan...@gmail.com wrote:

 I'm managing a database that is adding about 10-20M records per day to a
 table and time is a core part of most queries,



 What is the nature of how the time column is used in the queries?
 Depending on how it is used, you might not get much improvement at all, or
 you might get an N-fold improvement, or you might find that re-designing your
 indexes could get you the same query improvement that partitioning would,
 but with less headache.


The time column is usually used to calculate statistics, find/analyze
duplicates, analyze data contents, etc., on a specific time window. So there
will be queries with GROUP BY and WINDOW functions with a specific time
filter in the WHERE clause.



 so I've been looking into seeing if I need to start using partitioning
 based on the time column and I've found these general guidelines:

 Don't use more than about 50 partitions (
 http://www.postgresql.org/message-id/17147.1271696...@sss.pgh.pa.us )
  Use triggers to make the interface easier (
 https://wiki.postgresql.org/wiki/Table_partitioning#Trigger-based and
 http://stackoverflow.com/questions/16049396/postgres-partition-by-week )


 Using triggers slows INSERTs down by a lot (unless they were already slow
 due to the need to hit disk to maintain the indexes or something like
 that).  Are you sure you can handle that slowdown, given your insertion
 rate?  You could get the best of both worlds by having your bulk loaders
 target the correct partition directly, but also have the triggers on the
 parent table for any programs that don't get the message.


Inserting directly into the correct partition whenever possible and leaving
the trigger on the parent table seems like the best option.
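
A minimal sketch of such a routing trigger, with hypothetical table and
column names:

CREATE OR REPLACE FUNCTION events_insert_router() RETURNS trigger AS $$
BEGIN
    -- Route each row to the partition that matches its timestamp.
    IF NEW.ts >= DATE '2013-12-01' AND NEW.ts < DATE '2014-01-01' THEN
        INSERT INTO events_2013_12 VALUES (NEW.*);
    ELSE
        INSERT INTO events_overflow VALUES (NEW.*);
    END IF;
    RETURN NULL;  -- the row has been routed, so don't insert into the parent
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER events_insert_trigger
    BEFORE INSERT ON events
    FOR EACH ROW EXECUTE PROCEDURE events_insert_router();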



 The only data I found fell in line with what you'd expect (i.e. it speeds up
 selects but slows down inserts/updates
 http://www.if-not-true-then-false.com/2009/performance-testing-between-partitioned-and-non-partitioned-postgresql-tables-part-3/)



 One of the big benefits of partitioning can be to speed up insertions
 tremendously, by keeping the hot part of the indices that need to be
 maintained upon insertion together in shared_buffers.


We insert lots of new data, but rarely modify existing data once it's in
the database, so it sounds like this would be a big benefit for us.




 So I was thinking of partitioning based on month to keep the number of
 partitions low, which would mean about 0.5G records in each table. Does
 that seem like a reasonable number of records in each partition? Is there
 anything else that I should consider or be aware of?


 How will data be expired?  How does the size of one of your intended
 partitions compare to your RAM and shared_buffers?


We add about 10-20 million records per day to each table, with each record
being about 200 bytes in size (there's a bytea in there, with that being
the average size), and there's 64 GB of RAM on the machine.



 Cheers,

 Jeff


On Sat, Dec 7, 2013 at 1:37 PM, Scott Marlowe scott.marl...@gmail.com wrote:
 I'll add that you can use asymmetric partitioning if you tend to do a
 lot of more fine-grained queries on recent data and more big roll-up
 queries on older ones. I.e. partition by month except for the last 30
 days, do it by day etc. Then at the end of the month roll all the days
 into a month partition and delete them.

This sounds like a great solution for us. Is there some trick to roll the
records from one partition to another? Or is the only way just a SELECT
INTO followed by a DELETE?

Thanks,
Dave


Re: [PERFORM] Recommendations for partitioning?

2013-12-19 Thread Scott Marlowe
On Thu, Dec 19, 2013 at 9:53 AM, Dave Johansen davejohan...@gmail.com wrote:

 On Sat, Dec 7, 2013 at 1:37 PM, Scott Marlowe scott.marl...@gmail.com
 wrote:
 I'll add that you can use asymmetric partitioning if you tend to do a
 lot of more fine-grained queries on recent data and more big roll-up
 queries on older ones. I.e. partition by month except for the last 30
 days, do it by day etc. Then at the end of the month roll all the days
 into a month partition and delete them.

 This sounds like a great solution for us. Is there some trick to roll the
 records from one partition to another? Or is the only way just a SELECT INTO
 followed by a DELETE?

That's pretty much it. What I did was to create the new month table
and day tables, alter my triggers to reflect this, then move the data
with an insert into / select from query for each old day partition. Then
once their data is moved you can just drop them. Since you changed the
triggers first, those tables are no longer taking input, so it's usually
safe to drop them now.
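
A sketch of that monthly roll-up with hypothetical table names (run it
after the triggers have been re-pointed, so the day tables take no new
input):

BEGIN;
INSERT INTO events_2013_11 SELECT * FROM events_2013_11_01;
INSERT INTO events_2013_11 SELECT * FROM events_2013_11_02;
-- ...one INSERT per old day partition...
DROP TABLE events_2013_11_01;
DROP TABLE events_2013_11_02;
-- ...and so on for the remaining day tables.
COMMIT;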




Re: [PERFORM] Recommendations for partitioning?

2013-12-07 Thread desmodemone
Hi Dave,
  About the number of partitions: I haven't had many problems
with hundreds of partitions (like 360 days in a year).
Moreover, you can bypass the trigger overhead with direct inserts into
the partitions, which also allows parallel inserts without firing the
trigger so often. Remember to enable the check constraints.
In my opinion it's better to have fewer rows per partition. What is the
average row length in bytes? If you ever have to rebuild indexes, it's
possible, if the partition is too big, that maintenance_work_mem will
not be enough and you will sort on disk.
I think you should also evaluate dividing the partitions across
different tablespaces, to spread the I/O over different storage types
(and so on), and managing the indexes with different strategies (the
searches on historical partitions will likely differ from those on
live partitions).
Another strategy is not only to create partitions, but also to shard
data across more nodes.
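
For instance, a sketch of one day partition with its check constraint
(hypothetical names), so constraint exclusion can prune it and bulk
loads can target it directly:

CREATE TABLE events_2013_12_05 (
    CHECK (ts >= DATE '2013-12-05' AND ts < DATE '2013-12-06')
) INHERITS (events);

-- Loaders can write to the child directly, bypassing the parent's trigger:
INSERT INTO events_2013_12_05 SELECT * FROM staging_events;

-- And the planner can skip non-matching partitions at plan time:
SET constraint_exclusion = partition;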


Bye

Mat


2013/12/5 Dave Johansen davejohan...@gmail.com

 I'm managing a database that is adding about 10-20M records per day to a
 table and time is a core part of most queries, so I've been looking into
 seeing if I need to start using partitioning based on the time column and
 I've found these general guidelines:

 Don't use more than about 50 partitions (
 http://www.postgresql.org/message-id/17147.1271696...@sss.pgh.pa.us )
 Use triggers to make the interface easier (
 https://wiki.postgresql.org/wiki/Table_partitioning#Trigger-based and
 http://stackoverflow.com/questions/16049396/postgres-partition-by-week )

 The only data I found fell in line with what you'd expect (i.e. it speeds up
 selects but slows down inserts/updates
 http://www.if-not-true-then-false.com/2009/performance-testing-between-partitioned-and-non-partitioned-postgresql-tables-part-3/)

 So I was thinking of partitioning based on month to keep the number of
 partitions low, which would mean about 0.5G records in each table. Does
 that seem like a reasonable number of records in each partition? Is there
 anything else that I should consider or be aware of?

 Thanks,
 Dave



Re: [PERFORM] Recommendations for partitioning?

2013-12-07 Thread Jeff Janes
On Thu, Dec 5, 2013 at 7:36 AM, Dave Johansen davejohan...@gmail.com wrote:

 I'm managing a database that is adding about 10-20M records per day to a
 table and time is a core part of most queries,



What is the nature of how the time column is used in the queries?
Depending on how it is used, you might not get much improvement at all, or
you might get an N-fold improvement, or you might find that re-designing your
indexes could get you the same query improvement that partitioning would,
but with less headache.


 so I've been looking into seeing if I need to start using partitioning
 based on the time column and I've found these general guidelines:

 Don't use more than about 50 partitions (
 http://www.postgresql.org/message-id/17147.1271696...@sss.pgh.pa.us )
  Use triggers to make the interface easier (
 https://wiki.postgresql.org/wiki/Table_partitioning#Trigger-based and
 http://stackoverflow.com/questions/16049396/postgres-partition-by-week )


Using triggers slows INSERTs down by a lot (unless they were already slow
due to the need to hit disk to maintain the indexes or something like
that).  Are you sure you can handle that slowdown, given your insertion
rate?  You could get the best of both worlds by having your bulk loaders
target the correct partition directly, but also have the triggers on the
parent table for any programs that don't get the message.



 The only data I found fell in line with what you'd expect (i.e. it speeds up
 selects but slows down inserts/updates
 http://www.if-not-true-then-false.com/2009/performance-testing-between-partitioned-and-non-partitioned-postgresql-tables-part-3/)



One of the big benefits of partitioning can be to speed up insertions
tremendously, by keeping the hot part of the indices that need to be
maintained upon insertion together in shared_buffers.



 So I was thinking of partitioning based on month to keep the number of
 partitions low, which would mean about 0.5G records in each table. Does
 that seem like a reasonable number of records in each partition? Is there
 anything else that I should consider or be aware of?


How will data be expired?  How does the size of one of your intended
partitions compare to your RAM and shared_buffers?

Cheers,

Jeff


Re: [PERFORM] Recommendations for partitioning?

2013-12-07 Thread Scott Marlowe
On Sat, Dec 7, 2013 at 10:09 AM, desmodemone desmodem...@gmail.com wrote:
 Hi Dave,
   About the number of partitions: I haven't had many problems
 with hundreds of partitions (like 360 days in a year).
 Moreover, you can bypass the trigger overhead with direct inserts into
 the partitions, which also allows parallel inserts without firing the
 trigger so often. Remember to enable the check constraints.
 In my opinion it's better to have fewer rows per partition. What is the
 average row length in bytes? If you ever have to rebuild indexes, it's
 possible, if the partition is too big, that maintenance_work_mem will
 not be enough and you will sort on disk.
 I think you should also evaluate dividing the partitions across
 different tablespaces, to spread the I/O over different storage types
 (and so on), and managing the indexes with different strategies (the
 searches on historical partitions will likely differ from those on
 live partitions).
 Another strategy is not only to create partitions, but also to shard
 data across more nodes.

I agree on the number of partitions. I've run a stats db with daily
partitions and about 2 years of data in it with no real problems due to
the high number of partitions. Somewhere around 1,000 partitions, things
start to get slower.

I'll add that you can use asymmetric partitioning if you tend to do a
lot of more fine-grained queries on recent data and more big roll-up
queries on older ones. I.e. partition by month except for the last 30
days, do it by day etc. Then at the end of the month roll all the days
into a month partition and delete them.




[PERFORM] Recommendations for partitioning?

2013-12-05 Thread Dave Johansen
I'm managing a database that is adding about 10-20M records per day to a
table and time is a core part of most queries, so I've been looking into
seeing if I need to start using partitioning based on the time column and
I've found these general guidelines:

Don't use more than about 50 partitions (
http://www.postgresql.org/message-id/17147.1271696...@sss.pgh.pa.us )
Use triggers to make the interface easier (
https://wiki.postgresql.org/wiki/Table_partitioning#Trigger-based and
http://stackoverflow.com/questions/16049396/postgres-partition-by-week )

The only data I found fell in line with what you'd expect (i.e. it speeds up
selects but slows down inserts/updates
http://www.if-not-true-then-false.com/2009/performance-testing-between-partitioned-and-non-partitioned-postgresql-tables-part-3/)

So I was thinking of partitioning based on month to keep the number of
partitions low, which would mean about 0.5G records in each table. Does
that seem like a reasonable number of records in each partition? Is there
anything else that I should consider or be aware of?

Thanks,
Dave