Re: [HACKERS] Parallel query and temp_file_limit
On Tue, Jul 5, 2016 at 3:59 PM, Peter Geoghegan wrote: > On Tue, Jul 5, 2016 at 12:58 PM, Tom Lane wrote: >> Perhaps we could change the wording of temp_file_limit's description >> from "space that a session can use" to "space that a process can use" >> to help clarify this? > > That's all that I was looking for, really. OK, done that way. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel query and temp_file_limit
Peter Geoghegan writes: > On Tue, Jul 5, 2016 at 12:00 PM, Robert Haas wrote: >> I think that it is not worth mentioning specifically for >> temp_file_limit; to me that seems to be a hole with no bottom. We'll >> end up arguing about which GUCs should mention it specifically and >> there will be no end to it. > I don't think that you need it for any other GUC, so I really don't > know why you're concerned about a slippery slope. FWIW, I agree with Robert on this. It seems just weird to call out temp_file_limit specifically. Also, I don't agree that that's the only interesting per-process resource consumption; max_files_per_process seems much more likely to cause trouble in practice. Perhaps we could change the wording of temp_file_limit's description from "space that a session can use" to "space that a process can use" to help clarify this? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel query and temp_file_limit
On Tue, Jul 5, 2016 at 12:58 PM, Tom Lane wrote: > Perhaps we could change the wording of temp_file_limit's description > from "space that a session can use" to "space that a process can use" > to help clarify this? That's all that I was looking for, really. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel query and temp_file_limit
On Tue, Jul 5, 2016 at 12:00 PM, Robert Haas wrote: > I think that it is not worth mentioning specifically for > temp_file_limit; to me that seems to be a hole with no bottom. We'll > end up arguing about which GUCs should mention it specifically and > there will be no end to it. I don't think that you need it for any other GUC, so I really don't know why you're concerned about a slippery slope. The only other resource GUC that is scoped per session that I can see is temp_buffers, but that doesn't need to change, since parallel workers cannot use temp_buffers directly in practice. max_files_per_process is already clearly per process, so no change needed there either. I don't see a case other than temp_file_limit that appears to be even marginally in need of a specific note. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel query and temp_file_limit
On Tue, Jul 5, 2016 at 1:58 PM, Peter Geoghegan wrote: > On Tue, Jul 5, 2016 at 7:45 AM, Robert Haas wrote: >> Since Peter doesn't seem in a hurry to produce a patch for this issue, >> I wrote one. It is attached. I'll commit this in a day or two if >> nobody objects. > > Sorry about the delay. > > Your patch seems reasonable, but I thought we'd also want to change > "per session" to "per session (with an additional temp_file_limit > allowance within each parallel worker)" for temp_file_limit. > > I think it's worthwhile noting this for temp_file_limit specifically, > since it's explicitly a per session limit, whereas users are quite > used to the idea that work_mem might be doled out multiple times for > multiple executor nodes. I think that it is not worth mentioning specifically for temp_file_limit; to me that seems to be a hole with no bottom. We'll end up arguing about which GUCs should mention it specifically and there will be no end to it. We can see what other people think, but that's my position. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel query and temp_file_limit
On Tue, Jul 5, 2016 at 7:45 AM, Robert Haas wrote: > Since Peter doesn't seem in a hurry to produce a patch for this issue, > I wrote one. It is attached. I'll commit this in a day or two if > nobody objects. Sorry about the delay. Your patch seems reasonable, but I thought we'd also want to change "per session" to "per session (with an additional temp_file_limit allowance within each parallel worker)" for temp_file_limit. I think it's worthwhile noting this for temp_file_limit specifically, since it's explicitly a per session limit, whereas users are quite used to the idea that work_mem might be doled out multiple times for multiple executor nodes. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel query and temp_file_limit
On Tue, Jun 21, 2016 at 8:15 AM, Robert Haas wrote: > On Mon, Jun 20, 2016 at 11:01 PM, Tom Lane wrote: >> Peter Geoghegan writes: >>> On Wed, May 18, 2016 at 3:40 AM, Robert Haas wrote: What I'm tempted to do is trying to document that, as a point of policy, parallel query in 9.6 uses up to (workers + 1) times the resources that a single session might use. That includes not only CPU but also things like work_mem and temp file space. This obviously isn't ideal, but it's what could be done by the ship date. >> >>> Where would that be documented, though? Would it need to be noted in >>> the case of each such GUC? >> >> Why can't we just note this in the number-of-workers GUCs? It's not like >> there even *is* a GUC for many of our per-process resource consumption >> behaviors. > > +1. Since Peter doesn't seem in a hurry to produce a patch for this issue, I wrote one. It is attached. I'll commit this in a day or two if nobody objects. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company parallel-workers-guc-doc.patch Description: invalid/octet-stream -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel query and temp_file_limit
On Mon, Jun 20, 2016 at 11:01 PM, Tom Lane wrote: > Peter Geoghegan writes: >> On Wed, May 18, 2016 at 3:40 AM, Robert Haas wrote: >>> What I'm tempted to do is trying to document that, as a point of >>> policy, parallel query in 9.6 uses up to (workers + 1) times the >>> resources that a single session might use. That includes not only CPU >>> but also things like work_mem and temp file space. This obviously >>> isn't ideal, but it's what could be done by the ship date. > >> Where would that be documented, though? Would it need to be noted in >> the case of each such GUC? > > Why can't we just note this in the number-of-workers GUCs? It's not like > there even *is* a GUC for many of our per-process resource consumption > behaviors. +1. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel query and temp_file_limit
Peter Geoghegan writes: > On Wed, May 18, 2016 at 3:40 AM, Robert Haas wrote: >> What I'm tempted to do is trying to document that, as a point of >> policy, parallel query in 9.6 uses up to (workers + 1) times the >> resources that a single session might use. That includes not only CPU >> but also things like work_mem and temp file space. This obviously >> isn't ideal, but it's what could be done by the ship date. > Where would that be documented, though? Would it need to be noted in > the case of each such GUC? Why can't we just note this in the number-of-workers GUCs? It's not like there even *is* a GUC for many of our per-process resource consumption behaviors. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel query and temp_file_limit
On Wed, May 18, 2016 at 3:40 AM, Robert Haas wrote: > What I'm tempted to do is trying to document that, as a point of > policy, parallel query in 9.6 uses up to (workers + 1) times the > resources that a single session might use. That includes not only CPU > but also things like work_mem and temp file space. This obviously > isn't ideal, but it's what could be done by the ship date. Where would that be documented, though? Would it need to be noted in the case of each such GUC? -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel query and temp_file_limit
On Tue, Jun 7, 2016 at 8:32 AM, Robert Haas wrote: > You previously offered to write a patch for this. Are you still > planning to do that? OK, I'll get to that in the next few days. I'm slightly concerned that I might have missed a real problem in the code. I'll need to examine the issue more closely. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel query and temp_file_limit
On Sun, Jun 5, 2016 at 4:32 PM, Peter Geoghegan wrote: > On Wed, May 18, 2016 at 12:01 PM, Peter Geoghegan wrote: >>> I think for 9.6 we just have to document this issue. In the next >>> release, we could (and might well want to) try to do something more >>> clever. >> >> Works for me. You may wish to update comments within fd.c at the same time. > > I've created a 9.6 open issue for this. You previously offered to write a patch for this. Are you still planning to do that? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel query and temp_file_limit
On Wed, May 18, 2016 at 12:01 PM, Peter Geoghegan wrote: >> I think for 9.6 we just have to document this issue. In the next >> release, we could (and might well want to) try to do something more >> clever. > > Works for me. You may wish to update comments within fd.c at the same time. I've created a 9.6 open issue for this. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel query and temp_file_limit
On Wed, May 18, 2016 at 3:40 AM, Robert Haas wrote: >> I'll write a patch to fix the issue, if there is a consensus on a solution. > > I think for 9.6 we just have to document this issue. In the next > release, we could (and might well want to) try to do something more > clever. Works for me. You may wish to update comments within fd.c at the same time. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel query and temp_file_limit
On 18 May 2016 at 22:40, Robert Haas wrote: > On Tue, May 17, 2016 at 6:40 PM, Peter Geoghegan wrote: > > On Tue, May 17, 2016 at 3:33 PM, Peter Geoghegan wrote: > >> Fundamentally, since temporary_files_size enforcement simply > >> piggy-backs on low-level fd.c file management, without any > >> consideration of what the temp files contain, it'll be hard to be sure > >> that parallel workers will not have issues. I think it'll be far > >> easier to fix the problem then it would be to figure out if it's > >> possible to get away with it. > > > > I'll write a patch to fix the issue, if there is a consensus on a > solution. > > I think for 9.6 we just have to document this issue. In the next > release, we could (and might well want to) try to do something more > clever. > > What I'm tempted to do is trying to document that, as a point of > policy, parallel query in 9.6 uses up to (workers + 1) times the > resources that a single session might use. That includes not only CPU > but also things like work_mem and temp file space. This obviously > isn't ideal, but it's what could be done by the ship date. > I was asked (internally I believe) about abuse of work_mem during my work on parallel aggregates, at the time I didn't really feel like I was abusing that any more than parallel hash join was. My thought was that one day it would be nice if work_mem could be granted to a query and we had some query marshal system which ensured that the total grants did not exceed the server wide memory dedicated to work_mem. Of course that's lots of work, as there's at least one node (HashAgg) which can still blow out work_mem for bad estimates. For this release, I assumed it wouldn't be too big an issue if we're shipping with max_parallel_degree = 0 as we could just decorate the docs with some warnings about work_mem is per node / per worker to caution users setting this setting any higher. That might be enough to give us wriggle from for the future where we can make improvements, so I agree with Robert, the docs seem like the best solution for 9.6. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Parallel query and temp_file_limit
On Tue, May 17, 2016 at 6:40 PM, Peter Geoghegan wrote: > On Tue, May 17, 2016 at 3:33 PM, Peter Geoghegan wrote: >> Fundamentally, since temporary_files_size enforcement simply >> piggy-backs on low-level fd.c file management, without any >> consideration of what the temp files contain, it'll be hard to be sure >> that parallel workers will not have issues. I think it'll be far >> easier to fix the problem then it would be to figure out if it's >> possible to get away with it. > > I'll write a patch to fix the issue, if there is a consensus on a solution. I think for 9.6 we just have to document this issue. In the next release, we could (and might well want to) try to do something more clever. What I'm tempted to do is trying to document that, as a point of policy, parallel query in 9.6 uses up to (workers + 1) times the resources that a single session might use. That includes not only CPU but also things like work_mem and temp file space. This obviously isn't ideal, but it's what could be done by the ship date. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel query and temp_file_limit
On Tue, May 17, 2016 at 3:33 PM, Peter Geoghegan wrote: > Fundamentally, since temporary_files_size enforcement simply > piggy-backs on low-level fd.c file management, without any > consideration of what the temp files contain, it'll be hard to be sure > that parallel workers will not have issues. I think it'll be far > easier to fix the problem then it would be to figure out if it's > possible to get away with it. I'll write a patch to fix the issue, if there is a consensus on a solution. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel query and temp_file_limit
On Tue, May 17, 2016 at 1:53 PM, Amit Kapila wrote: > What kind of special treatment are you expecting for temporary_files_size, > also why do you think it is required? Currently neither we build hash in > parallel nor there is any form of parallel sort work. I expect only that temporary_files_size be described accurately, and have new behavior for parallel query that is not surprising. There are probably several solutions that would meet that standard, and I am not attached to any particular one of them. I wrote a parallel sort patch already (CREATE INDEX for the B-Tree AM), and will post it at an opportune time. So, I think we can expect your observations about there not being parallel sort work to no longer apply in a future release, which we should get ahead of now. Also, won't parallel workers that build their own copy of the hash table (for a hash join) also use their own temp files, if there is a need for temp files? I think parallel query will end up sharing temp files fairly often, and not just out of convenience to implementers (that is, not just to avoid using shared memory extensively). Fundamentally, since temporary_files_size enforcement simply piggy-backs on low-level fd.c file management, without any consideration of what the temp files contain, it'll be hard to be sure that parallel workers will not have issues. I think it'll be far easier to fix the problem then it would be to figure out if it's possible to get away with it. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel query and temp_file_limit
On Wed, May 18, 2016 at 12:55 AM, Peter Geoghegan wrote: > > temp_file_limit "specifies the maximum amount of disk space that a > session can use for temporary files, such as sort and hash temporary > files", according to the documentation. That's not true when parallel > query is in use, since the global variable temporary_files_size > receives no special treatment for parallel query. > What kind of special treatment are you expecting for temporary_files_size, also why do you think it is required? Currently neither we build hash in parallel nor there is any form of parallel sort work. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com