Re: [HACKERS] WIP - Add ability to constrain backend temporary file space

2011-03-09 Thread Mark Kirkwood

New version:

- adds documentation
- adds category RESOURCES_DISK


temp-files-v2.patch.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP - Add ability to constrain backend temporary file space

2011-02-18 Thread Robert Haas
On Thu, Feb 17, 2011 at 10:17 PM, Mark Kirkwood
mark.kirkw...@catalyst.net.nz wrote:
> This is WIP, it does seem to work ok, but some areas/choices I'm not
> entirely clear about are mentioned in the patch itself. Mainly:
>
> - name of the guc... better suggestions welcome
> - datatype for the guc - real would be good, but at the moment the nice
> parse KB/MB/GB business only works for int

Please add this to the next CommitFest:

https://commitfest.postgresql.org/action/commitfest_view/open

With respect to the datatype of the GUC, int seems clearly correct.
Why would you want to use a float?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] WIP - Add ability to constrain backend temporary file space

2011-02-18 Thread Josh Berkus
Mark,

> I got to wonder how hard this would be to do in Postgres, and attached
> is my (WIP) attempt. It provides a guc (max_temp_files_size) to limit
> the size of all temp files for a backend and amends fd.c to cancel
> execution if the total size of temporary files exceeds this.

First, are we just talking about pgsql_tmp here, or the pg_temp
tablespace?  That is, just sort/hash files, or temporary tables as well?

Second, the main issue with these sorts of macro-counters has generally
been their locking effect on concurrent activity.  Have you been able to
run any tests which try to run lots of small externally-sorted queries
at once on a multi-core machine, and checked the effect on throughput?

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com



Re: [HACKERS] WIP - Add ability to constrain backend temporary file space

2011-02-18 Thread Robert Haas
On Fri, Feb 18, 2011 at 2:41 PM, Josh Berkus j...@agliodbs.com wrote:
> Second, the main issue with these sorts of macro-counters has generally
> been their locking effect on concurrent activity.  Have you been able to
> run any tests which try to run lots of small externally-sorted queries
> at once on a multi-core machine, and checked the effect on throughput?

Since it's apparently a per-backend limit, that doesn't seem relevant.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] WIP - Add ability to constrain backend temporary file space

2011-02-18 Thread Josh Berkus
On 2/18/11 11:44 AM, Robert Haas wrote:
> On Fri, Feb 18, 2011 at 2:41 PM, Josh Berkus j...@agliodbs.com wrote:
>> Second, the main issue with these sorts of macro-counters has generally
>> been their locking effect on concurrent activity.  Have you been able to
>> run any tests which try to run lots of small externally-sorted queries
>> at once on a multi-core machine, and checked the effect on throughput?
>
> Since it's apparently a per-backend limit, that doesn't seem relevant.

Oh!  I missed that.

What good would a per-backend limit do, though?

And what happens with queries which exceed the limit?  Error message?  Wait?


-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com



Re: [HACKERS] WIP - Add ability to constrain backend temporary file space

2011-02-18 Thread Robert Haas
On Fri, Feb 18, 2011 at 2:48 PM, Josh Berkus j...@agliodbs.com wrote:
> On 2/18/11 11:44 AM, Robert Haas wrote:
>> On Fri, Feb 18, 2011 at 2:41 PM, Josh Berkus j...@agliodbs.com wrote:
>>> Second, the main issue with these sorts of macro-counters has generally
>>> been their locking effect on concurrent activity.  Have you been able to
>>> run any tests which try to run lots of small externally-sorted queries
>>> at once on a multi-core machine, and checked the effect on throughput?
>>
>> Since it's apparently a per-backend limit, that doesn't seem relevant.
>
> Oh!  I missed that.
>
> What good would a per-backend limit do, though?
>
> And what happens with queries which exceed the limit?  Error message?  Wait?

Well I have not RTFP, but I assume it'd throw an error.  Waiting isn't
going to accomplish anything.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] WIP - Add ability to constrain backend temporary file space

2011-02-18 Thread Mark Kirkwood

On 19/02/11 02:34, Robert Haas wrote:


> Please add this to the next CommitFest:
>
> https://commitfest.postgresql.org/action/commitfest_view/open
>
> With respect to the datatype of the GUC, int seems clearly correct.
> Why would you want to use a float?



Added. With respect to the datatype, using int with KB units means the 
largest temp size is approx 2047GB - I know that seems like a lot now... 
but maybe someone out there wants (say) their temp files limited to 
4096GB :-)


Cheers

Mark



Re: [HACKERS] WIP - Add ability to constrain backend temporary file space

2011-02-18 Thread Mark Kirkwood

On 19/02/11 08:48, Josh Berkus wrote:

> On 2/18/11 11:44 AM, Robert Haas wrote:
>> On Fri, Feb 18, 2011 at 2:41 PM, Josh Berkus j...@agliodbs.com wrote:
>>> Second, the main issue with these sorts of macro-counters has generally
>>> been their locking effect on concurrent activity.  Have you been able to
>>> run any tests which try to run lots of small externally-sorted queries
>>> at once on a multi-core machine, and checked the effect on throughput?
>>
>> Since it's apparently a per-backend limit, that doesn't seem relevant.
>
> Oh!  I missed that.
>
> What good would a per-backend limit do, though?
>
> And what happens with queries which exceed the limit?  Error message?  Wait?




By temp files I mean those in pgsql_tmp. A per-backend limit will 
have the same sort of usefulness as work_mem does - i.e. stop a query 
eating all your filesystem space or bringing a server to its knees with 
IO load. We have had this happen twice - and I know of other folks who have too.


Obviously you need to do the same sort of arithmetic as you do with 
work_mem to decide on a reasonable limit to cope with multiple users 
creating temp files. Conservative DBAs might want to set it to (free 
disk)/max_connections etc. Obviously for ad-hoc systems it is a bit more 
challenging - but having a per-backend limit is way better than having 
what we have now, which is ... errr... nothing.


As an example, I'd find it useful to avoid badly written queries causing 
too much IO load on the db backend of (say) a web system (i.e. such a 
system should not *have* queries that want to use that much resource).


To answer the other question: what happens when the limit is exceeded is 
modeled on statement timeout, i.e. the query is cancelled and a message says 
why (exceeded temp file size).


Cheers

Mark



Re: [HACKERS] WIP - Add ability to constrain backend temporary file space

2011-02-18 Thread Josh Berkus

> Obviously you need to do the same sort of arithmetic as you do with
> work_mem to decide on a reasonable limit to cope with multiple users
> creating temp files. Conservative dbas might want to set it to (free
> disk)/max_connections etc. Obviously for ad-hoc systems it is a bit more
> challenging - but having a per-backend limit is way better than having
> what we have now, which is ... errr... nothing.

Agreed.

> To answer the other question, what happens when the limit is exceeded is
> modeled on statement timeout, i.e query is canceled and a message says
> why (exceeded temp files size).

When does this happen?  When you try to allocate the file, or when it
does the original tape sort estimate?

The disadvantage of the former is that the user waited for minutes in
order to have their query cancelled.  The disadvantage of the latter is
that the estimate isn't remotely accurate.

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com



Re: [HACKERS] WIP - Add ability to constrain backend temporary file space

2011-02-18 Thread Tom Lane
Mark Kirkwood mark.kirkw...@catalyst.net.nz writes:
> Added. With respect to the datatype, using int with KB units means the
> largest temp size is approx 2047GB - I know that seems like a lot now...
> but maybe someone out there wants (say) their temp files limited to
> 4096GB :-)

[ shrug... ]  Sorry, I can't imagine a use case for this parameter where
the value isn't a *lot* less than that.  Maybe if it were global, but
not if it's per-backend.

regards, tom lane



Re: [HACKERS] WIP - Add ability to constrain backend temporary file space

2011-02-18 Thread Mark Kirkwood

On 19/02/11 10:38, Josh Berkus wrote:



>> To answer the other question, what happens when the limit is exceeded is
>> modeled on statement timeout, i.e query is canceled and a message says
>> why (exceeded temp files size).
>
> When does this happen?  When you try to allocate the file, or when it
> does the original tape sort estimate?
>
> The disadvantage of the former is that the user waited for minutes in
> order to have their query cancelled.  The disadvantage of the latter is
> that the estimate isn't remotely accurate.



Neither - it checks on each write (I think this is pretty cheap - it adds 
a couple of int and double + operations plus a / and a > operation to 
FileWrite). If the check shows you've written more than the limit, you get 
cancelled. So you can exceed the limit by at most one buffer size.


Yeah, the disadvantage is that (like statement timeout) it is a 'bottom 
of the cliff' type of protection. The advantage is there are no false 
positives...


Cheers

Mark



Re: [HACKERS] WIP - Add ability to constrain backend temporary file space

2011-02-18 Thread Josh Berkus

> Yeah, the disadvantage is that (like statement timeout) it is a 'bottom
> of the cliff' type of protection. The advantage is there are no false
> positives...

Yeah, just trying to get a handle on the proposed feature.  I have no
objections; it seems like a harmless limit for most people, and useful
to a few.

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com



Re: [HACKERS] WIP - Add ability to constrain backend temporary file space

2011-02-18 Thread Mark Kirkwood

On 19/02/11 11:30, Josh Berkus wrote:

>> Yeah, the disadvantage is that (like statement timeout) it is a 'bottom
>> of the cliff' type of protection. The advantage is there are no false
>> positives...
>
> Yeah, just trying to get a handle on the proposed feature.  I have no
> objections; it seems like a harmless limit for most people, and useful
> to a few.

No worries - and sorry, I should have used the phrase "per backend" in the 
title to help clarify what was intended.


Cheers

Mark



[HACKERS] WIP - Add ability to constrain backend temporary file space

2011-02-17 Thread Mark Kirkwood
Recently two systems here have suffered severely with excessive 
temporary file creation during query execution. In one case it could 
have been avoided by more stringent QA before application code release, 
whereas the other is an ad-hoc system, and err... yes.


In both cases it would have been great to be able to constrain the 
amount of temporary file space a query could use. In theory you can sort 
of do this with the various ulimits, but it seems pretty impractical, as 
at that level all files look the same and you'd be just as likely to 
unexpectedly cripple the entire db a few weeks later when a table grows...


I got to wonder how hard this would be to do in Postgres, and attached 
is my (WIP) attempt. It provides a guc (max_temp_files_size) to limit 
the size of all temp files for a backend and amends fd.c to cancel 
execution if the total size of temporary files exceeds this.


This is WIP, it does seem to work ok, but some areas/choices I'm not 
entirely clear about are mentioned in the patch itself. Mainly:


- name of the guc... better suggestions welcome
- datatype for the guc - real would be good, but at the moment the nice 
parse KB/MB/GB business only works for int
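For illustration, a guc parsed with the int KB/MB/GB machinery might appear in postgresql.conf like this; the name follows the patch, but the default and unit handling shown are assumptions:

```
# hypothetical per-backend cap on pgsql_tmp usage (0 = unlimited, assumed)
max_temp_files_size = 2GB
```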


regards

Mark


temp-files-v1.patch.gz
Description: GNU Zip compressed data
