Re: [HACKERS] Question about lazy_space_alloc() / linux over-commit
I'm okay with any of the proposed designs or with dropping the idea.  Closing
the loop on a few facts:

On Sat, Mar 07, 2015 at 04:34:41PM -0600, Jim Nasby wrote:
> If we go that route, does it still make sense to explicitly use
> repalloc_huge? It will just cut over to that at some point (128M?) anyway,
> and if you're vacuuming a small relation presumably it's not worth messing
> with.

repalloc_huge() differs from repalloc() only in the size ceiling beyond which
they raise errors.  repalloc() raises errors for requests larger than ~1 GiB,
while repalloc_huge() is practically unconstrained on 64-bit and permits up to
~2 GiB on 32-bit.

On Mon, Mar 09, 2015 at 05:12:22PM -0500, Jim Nasby wrote:
> Speaking of which... people have referenced allowing > 1GB of dead tuples,
> which means allowing maintenance_work_mem > MAX_KILOBYTES.  The comment for
> that says:
>
> /* upper limit for GUC variables measured in kilobytes of memory */
> /* note that various places assume the byte size fits in a long variable */
>
> So I'm not sure how well that will work.  I think that needs to be a
> separate patch.

On LP64 platforms, MAX_KILOBYTES already covers maintenance_work_mem values up
to ~2 TiB.  Raising the limit on ILP32 platforms is not worth the trouble.
Raising the limit on LLP64 platforms is a valid but separate project.

nm

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
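The ~2 TiB figure follows from the units: maintenance_work_mem is stored in
kilobytes, so a kilobyte count of INT_MAX corresponds to just under 2 TiB of
bytes, which still fits in a 64-bit long on LP64.  A minimal sketch of that
arithmetic (the function name is illustrative, not from the PostgreSQL
source):

```c
#include <limits.h>

/*
 * maintenance_work_mem is measured in kilobytes.  When MAX_KILOBYTES can be
 * as large as INT_MAX (feasible where "long" is 64-bit, i.e. LP64), the
 * equivalent byte count is INT_MAX * 1024: just under 2 TiB, and still
 * representable in a signed 64-bit value.
 */
static long long
kilobytes_ceiling_bytes(void)
{
    return (long long) INT_MAX * 1024;
}
```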
Re: [HACKERS] Question about lazy_space_alloc() / linux over-commit
Robert Haas wrote:
> On Sat, Mar 7, 2015 at 5:49 PM, Andres Freund <and...@2ndquadrant.com> wrote:
>> On 2015-03-05 15:28:12 -0600, Jim Nasby wrote:
>>> I was thinking the simpler route of just repalloc'ing... the memcpy would
>>> suck, but much less so than the extra index pass. 64M gets us 11M tuples,
>>> which probably isn't very common.
>>
>> That has the chance of considerably increasing the peak memory usage
>> though, as you obviously need both the old and new allocation during the
>> repalloc(). And in contrast to the unused memory at the tail of the array,
>> which will usually not be actually allocated by the OS at all, this is
>> memory that's actually read/written respectively.
>
> Yeah, I'm not sure why everybody wants to repalloc() that instead of making
> several separate allocations as needed.  That would avoid increasing peak
> memory usage, and would avoid any risk of needing to copy the whole array.
> Also, you could grow in smaller chunks, like 64MB at a time instead of 1GB
> or more at a time.  Doubling an allocation that's already 1GB or more gets
> big in a hurry.

Yeah, a chunk list rather than a single chunk seemed a good idea to me too.

Also, I think the idea of starting with an allocation assuming some small
number of dead tuples per page made sense -- and by the time that space has
run out, you have a better estimate of the actual density of dead tuples, so
you can do the second allocation based on that new estimate (but perhaps
clamp it at say 1 GB, just in case you just scanned a portion of the table
with an unusually high percentage of dead tuples.)

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training Services
Re: [HACKERS] Question about lazy_space_alloc() / linux over-commit
On Sat, Mar 7, 2015 at 5:49 PM, Andres Freund <and...@2ndquadrant.com> wrote:
> On 2015-03-05 15:28:12 -0600, Jim Nasby wrote:
>> I was thinking the simpler route of just repalloc'ing... the memcpy would
>> suck, but much less so than the extra index pass. 64M gets us 11M tuples,
>> which probably isn't very common.
>
> That has the chance of considerably increasing the peak memory usage
> though, as you obviously need both the old and new allocation during the
> repalloc(). And in contrast to the unused memory at the tail of the array,
> which will usually not be actually allocated by the OS at all, this is
> memory that's actually read/written respectively.

Yeah, I'm not sure why everybody wants to repalloc() that instead of making
several separate allocations as needed.  That would avoid increasing peak
memory usage, and would avoid any risk of needing to copy the whole array.
Also, you could grow in smaller chunks, like 64MB at a time instead of 1GB or
more at a time.  Doubling an allocation that's already 1GB or more gets big
in a hurry.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Question about lazy_space_alloc() / linux over-commit
On 3/9/15 12:28 PM, Alvaro Herrera wrote:
> Robert Haas wrote:
>> On Sat, Mar 7, 2015 at 5:49 PM, Andres Freund <and...@2ndquadrant.com> wrote:
>>> On 2015-03-05 15:28:12 -0600, Jim Nasby wrote:
>>>> I was thinking the simpler route of just repalloc'ing... the memcpy
>>>> would suck, but much less so than the extra index pass. 64M gets us 11M
>>>> tuples, which probably isn't very common.
>>>
>>> That has the chance of considerably increasing the peak memory usage
>>> though, as you obviously need both the old and new allocation during the
>>> repalloc(). And in contrast to the unused memory at the tail of the
>>> array, which will usually not be actually allocated by the OS at all,
>>> this is memory that's actually read/written respectively.
>>
>> Yeah, I'm not sure why everybody wants to repalloc() that instead of
>> making several separate allocations as needed.  That would avoid
>> increasing peak memory usage, and would avoid any risk of needing to copy
>> the whole array.  Also, you could grow in smaller chunks, like 64MB at a
>> time instead of 1GB or more at a time.  Doubling an allocation that's
>> already 1GB or more gets big in a hurry.
>
> Yeah, a chunk list rather than a single chunk seemed a good idea to me too.

That will be significantly more code than a simple repalloc, but as long as
people are OK with that I can go that route.

> Also, I think the idea of starting with an allocation assuming some small
> number of dead tuples per page made sense -- and by the time that space has
> run out, you have a better estimate of the actual density of dead tuples,
> so you can do the second allocation based on that new estimate (but perhaps
> clamp it at say 1 GB, just in case you just scanned a portion of the table
> with an unusually high percentage of dead tuples.)

I like the idea of using a fresh estimate of dead tuple density when we need
more space.  We would also clamp this at maintenance_work_mem, not a fixed
1GB.

Speaking of which... people have referenced allowing > 1GB of dead tuples,
which means allowing maintenance_work_mem > MAX_KILOBYTES.  The comment for
that says:

/* upper limit for GUC variables measured in kilobytes of memory */
/* note that various places assume the byte size fits in a long variable */

So I'm not sure how well that will work.  I think that needs to be a separate
patch.

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
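The chunk-list representation under discussion could look roughly like the
following sketch.  All names (dead_tid, CHUNK_TUPLES, tid_list_add) are
illustrative assumptions, not PostgreSQL's actual ItemPointerData code; the
point is that each chunk is a fixed-size block appended to a linked list, so
growing never copies the existing entries and peak memory never holds an old
and a new array at once:

```c
#include <stdlib.h>

/* A 6-byte tuple identifier, standing in for ItemPointerData. */
typedef struct { unsigned short blk_hi, blk_lo, off; } dead_tid;

#define CHUNK_TUPLES 1024   /* per-chunk capacity; ~64MB/6 in a real design */

typedef struct tid_chunk
{
    struct tid_chunk *next;
    int       ntuples;
    dead_tid  tids[CHUNK_TUPLES];
} tid_chunk;

typedef struct { tid_chunk *head, *tail; long total; } tid_list;

/*
 * Append one tid, allocating a fresh fixed-size chunk whenever the tail
 * chunk fills up.  No repalloc, hence no memcpy of prior chunks.
 */
static void
tid_list_add(tid_list *list, dead_tid tid)
{
    if (list->tail == NULL || list->tail->ntuples == CHUNK_TUPLES)
    {
        tid_chunk *c = calloc(1, sizeof(tid_chunk));
        if (list->tail)
            list->tail->next = c;
        else
            list->head = c;
        list->tail = c;
    }
    list->tail->tids[list->tail->ntuples++] = tid;
    list->total++;
}
```

A reader of the list (the index-cleanup pass) would walk the chunks in order;
since dead tids are accumulated in heap order, binary search within and
across chunks still works.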
Re: [HACKERS] Question about lazy_space_alloc() / linux over-commit
On 2015-03-09 17:12:22 -0500, Jim Nasby wrote:
> That will be significantly more code than a simple repalloc, but as long as
> people are OK with that I can go that route.

I still would like to see some actual evidence of need for change before we
invest more time/code here.

Greetings,

Andres Freund

--
Andres Freund                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Question about lazy_space_alloc() / linux over-commit
On 3/7/15 12:48 AM, Noah Misch wrote:
> On Sat, Mar 07, 2015 at 12:46:42AM -0500, Tom Lane wrote:
>> Noah Misch <n...@leadboat.com> writes:
>>> On Thu, Mar 05, 2015 at 03:28:12PM -0600, Jim Nasby wrote:
>>>> I was thinking the simpler route of just repalloc'ing... the memcpy
>>>> would suck, but much less so than the extra index pass. 64M gets us 11M
>>>> tuples, which probably isn't very common.
>>>
>>> +1.  Start far below 64 MiB; grow geometrically using repalloc_huge();
>>> cap growth at vac_work_mem.
>>
>> +1 for repalloc'ing at need, but I'm not sure about the "start far below
>> 64 MiB" part.  64MB is a pretty small amount on nearly any machine these
>> days (and for anybody who thinks it isn't, that's why
>> maintenance_work_mem is a tunable).
>
> True; nothing would explode, especially since the allocation would be
> strictly smaller than it is today.  However, I can't think of a place in
> PostgreSQL where a growable allocation begins so aggressively, nor a reason
> to break new ground in that respect.  For comparison, tuplestore/tuplesort
> start memtupsize at 1 KiB.  (One could make a separate case for that
> practice being wrong.)
>
>> A different line of thought is that it would seem to make sense to have
>> the initial allocation vary depending on the relation size.  For
>> instance, you could assume there might be 10 dead tuples per page, and
>> hence try to alloc that much if it fits in vac_work_mem.
>
> Sounds better than a fixed 64 MiB start, though I'm not sure it's better
> than a fixed 256 KiB start.

In the case of vacuum, I think we presumably have a pretty good indicator of
how much space we should need; namely reltuples * autovacuum_scale_factor.
There shouldn't be too much more space needed than that if autovac is keeping
up with things.

If we go that route, does it still make sense to explicitly use
repalloc_huge?  It will just cut over to that at some point (128M?) anyway,
and if you're vacuuming a small relation presumably it's not worth messing
with.

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
Re: [HACKERS] Question about lazy_space_alloc() / linux over-commit
On 2015-03-05 15:28:12 -0600, Jim Nasby wrote:
> I was thinking the simpler route of just repalloc'ing... the memcpy would
> suck, but much less so than the extra index pass. 64M gets us 11M tuples,
> which probably isn't very common.

That has the chance of considerably increasing the peak memory usage though,
as you obviously need both the old and new allocation during the repalloc().
And in contrast to the unused memory at the tail of the array, which will
usually not be actually allocated by the OS at all, this is memory that's
actually read/written respectively.

I have to say, I'm rather unconvinced that it's worth changing stuff around
here.  If overcommit is enabled, vacuum won't fail unless the memory is
actually used (= no problem).  If overcommit is disabled and you get memory
allocation failures, you're probably already running awfully close to the
maximum of your configuration and you're better off adjusting it.

I'm not aware of any field complaints about this and thus I'm not sure it's
worth fiddling with this.

Greetings,

Andres Freund

--
Andres Freund                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Question about lazy_space_alloc() / linux over-commit
On 3/7/15 4:49 PM, Andres Freund wrote:
> On 2015-03-05 15:28:12 -0600, Jim Nasby wrote:
>> I was thinking the simpler route of just repalloc'ing... the memcpy would
>> suck, but much less so than the extra index pass. 64M gets us 11M tuples,
>> which probably isn't very common.
>
> That has the chance of considerably increasing the peak memory usage
> though, as you obviously need both the old and new allocation during the
> repalloc(). And in contrast to the unused memory at the tail of the array,
> which will usually not be actually allocated by the OS at all, this is
> memory that's actually read/written respectively.

That leaves me wondering why we bother with dynamic resizing in other areas
(like sorts, for example) then?  Why not just palloc work_mem and be done
with it?  What makes those cases different?

> I have to say, I'm rather unconvinced that it's worth changing stuff around
> here.  If overcommit is enabled, vacuum won't fail unless the memory is
> actually used (= no problem).  If overcommit is disabled and you get memory
> allocation failures, you're probably already running awfully close to the
> maximum of your configuration and you're better off adjusting it.
>
> I'm not aware of any field complaints about this and thus I'm not sure it's
> worth fiddling with this.

Perhaps; Noah seems to be the only one who's seen this.

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
Re: [HACKERS] Question about lazy_space_alloc() / linux over-commit
On Thu, Mar 05, 2015 at 03:28:12PM -0600, Jim Nasby wrote:
> On 3/4/15 9:10 AM, Robert Haas wrote:
>> On Wed, Feb 25, 2015 at 5:06 PM, Jim Nasby <jim.na...@bluetreble.com> wrote:
>>> Could the large allocation[2] for the dead tuple array in
>>> lazy_space_alloc cause problems with linux OOM? [1] and some other things
>>> I've read indicate that a large mmap will count towards total system
>>> memory, including producing a failure if overcommit is disabled.
>>
>> I believe that this is possible.

I have seen that in the field, albeit on a server with a 10 GiB allocation
limit, years ago.

>>> Would it be worth avoiding the full size allocation when we can?
>>
>> Maybe.  I'm not aware of any evidence that this is an actual problem as
>> opposed to a theoretical one.  vacrelstats->dead_tuples is limited to a
>> 1GB allocation, which is not a trivial amount of memory, but it's not
>> huge, either.  But we could consider changing the representation from a
>> single flat array to a list of chunks, with each chunk capped at say
>> 64MB.  That would not only reduce the amount of memory that we
>
> I was thinking the simpler route of just repalloc'ing... the memcpy would
> suck, but much less so than the extra index pass. 64M gets us 11M tuples,
> which probably isn't very common.

+1.  Start far below 64 MiB; grow geometrically using repalloc_huge(); cap
growth at vac_work_mem.
Re: [HACKERS] Question about lazy_space_alloc() / linux over-commit
Noah Misch <n...@leadboat.com> writes:
> On Thu, Mar 05, 2015 at 03:28:12PM -0600, Jim Nasby wrote:
>> I was thinking the simpler route of just repalloc'ing... the memcpy would
>> suck, but much less so than the extra index pass. 64M gets us 11M tuples,
>> which probably isn't very common.
>
> +1.  Start far below 64 MiB; grow geometrically using repalloc_huge(); cap
> growth at vac_work_mem.

+1 for repalloc'ing at need, but I'm not sure about the "start far below 64
MiB" part.  64MB is a pretty small amount on nearly any machine these days
(and for anybody who thinks it isn't, that's why maintenance_work_mem is a
tunable).  I think min(64MB, vac_work_mem) might be a reasonable start point.

A different line of thought is that it would seem to make sense to have the
initial allocation vary depending on the relation size.  For instance, you
could assume there might be 10 dead tuples per page, and hence try to alloc
that much if it fits in vac_work_mem.

			regards, tom lane
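The relation-size heuristic could be sketched as below.  Function and
constant names are illustrative assumptions, not the actual lazy_space_alloc()
code: size for ~10 dead tuples per heap page, then clamp so the tid array
fits within the memory budget.

```c
/* sizeof(ItemPointerData): a heap tid is 6 bytes */
#define TID_BYTES 6

/*
 * Initial dead-tuple capacity: assume roughly 10 dead tuples per page,
 * clamped so the array fits within work_mem_bytes (a vac_work_mem-style
 * budget).  At least one slot is always allocated.
 */
static long
initial_dead_tuple_slots(long rel_pages, long work_mem_bytes)
{
    long slots = rel_pages * 10;
    long max_slots = work_mem_bytes / TID_BYTES;

    if (slots > max_slots)
        slots = max_slots;
    if (slots < 1)
        slots = 1;
    return slots;
}
```

For a small table the initial allocation stays tiny; only a large table with
a large budget approaches the clamp.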
Re: [HACKERS] Question about lazy_space_alloc() / linux over-commit
On Sat, Mar 07, 2015 at 12:46:42AM -0500, Tom Lane wrote:
> Noah Misch <n...@leadboat.com> writes:
>> On Thu, Mar 05, 2015 at 03:28:12PM -0600, Jim Nasby wrote:
>>> I was thinking the simpler route of just repalloc'ing... the memcpy would
>>> suck, but much less so than the extra index pass. 64M gets us 11M tuples,
>>> which probably isn't very common.
>>
>> +1.  Start far below 64 MiB; grow geometrically using repalloc_huge();
>> cap growth at vac_work_mem.
>
> +1 for repalloc'ing at need, but I'm not sure about the "start far below
> 64 MiB" part.  64MB is a pretty small amount on nearly any machine these
> days (and for anybody who thinks it isn't, that's why maintenance_work_mem
> is a tunable).

True; nothing would explode, especially since the allocation would be
strictly smaller than it is today.  However, I can't think of a place in
PostgreSQL where a growable allocation begins so aggressively, nor a reason
to break new ground in that respect.  For comparison, tuplestore/tuplesort
start memtupsize at 1 KiB.  (One could make a separate case for that practice
being wrong.)

> A different line of thought is that it would seem to make sense to have the
> initial allocation vary depending on the relation size.  For instance, you
> could assume there might be 10 dead tuples per page, and hence try to alloc
> that much if it fits in vac_work_mem.

Sounds better than a fixed 64 MiB start, though I'm not sure it's better than
a fixed 256 KiB start.
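The small-start, grow-geometrically, cap-at-budget pattern can be sketched
with plain realloc() standing in for repalloc_huge().  The growbuf type and
its 256-byte initial size are hypothetical illustrations, not the proposed
patch:

```c
#include <stdlib.h>

/* A byte buffer that starts tiny and doubles on demand, up to a hard cap. */
typedef struct { char *buf; size_t used, cap, limit; } growbuf;

/*
 * Ensure room for `need` more bytes.  Returns 1 on success, 0 if the
 * budget (limit) is exhausted or realloc fails.  The caller advances
 * g->used after writing.
 */
static int
growbuf_reserve(growbuf *g, size_t need)
{
    if (g->used + need <= g->cap)
        return 1;

    size_t newcap = g->cap ? g->cap : 256;   /* small initial size */
    while (newcap < g->used + need)
        newcap *= 2;                          /* geometric growth */
    if (newcap > g->limit)
        newcap = g->limit;                    /* cap at the budget */
    if (newcap < g->used + need)
        return 0;                             /* budget exhausted */

    char *p = realloc(g->buf, newcap);        /* repalloc_huge() stand-in */
    if (p == NULL)
        return 0;
    g->buf = p;
    g->cap = newcap;
    return 1;
}
```

Doubling keeps the amortized copy cost linear, which is the usual argument
for geometric growth despite the memcpy that worried the thread; the cap
keeps the final allocation within the configured budget.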
Re: [HACKERS] Question about lazy_space_alloc() / linux over-commit
On 3/4/15 9:10 AM, Robert Haas wrote:
> On Wed, Feb 25, 2015 at 5:06 PM, Jim Nasby <jim.na...@bluetreble.com> wrote:
>> Could the large allocation[2] for the dead tuple array in lazy_space_alloc
>> cause problems with linux OOM? [1] and some other things I've read
>> indicate that a large mmap will count towards total system memory,
>> including producing a failure if overcommit is disabled.
>
> I believe that this is possible.
>
>> Would it be worth avoiding the full size allocation when we can?
>
> Maybe.  I'm not aware of any evidence that this is an actual problem as
> opposed to a theoretical one.  vacrelstats->dead_tuples is limited to a 1GB
> allocation, which is not a trivial amount of memory, but it's not huge,
> either.  But we could consider changing the representation from a single
> flat array to a list of chunks, with each chunk capped at say 64MB.  That
> would not only reduce the amount of memory that we

I was thinking the simpler route of just repalloc'ing... the memcpy would
suck, but much less so than the extra index pass.  64M gets us 11M tuples,
which probably isn't very common.

> needlessly allocate, but would allow autovacuum to make use of more than
> 1GB of maintenance_work_mem, which it looks like it currently can't.  I'm
> not sure that's a huge problem right now either, because

I'm confused... how is autovacuum special in this regard?  Each worker can
use up to 1G, just like a regular vacuum, right?  Or are you just saying
getting rid of the 1G limit would be good?

> it's probably rare to vacuum a table with more than 1GB / 6 = 178,956,970
> dead tuples in it, but it would certainly suck if you did and if the
> current 1GB limit forced you to do multiple vacuum passes.

Yeah, with 100+ GB machines not that uncommon today perhaps it's worth
significantly upping this.

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
Re: [HACKERS] Question about lazy_space_alloc() / linux over-commit
On Wed, Feb 25, 2015 at 5:06 PM, Jim Nasby <jim.na...@bluetreble.com> wrote:
> Could the large allocation[2] for the dead tuple array in lazy_space_alloc
> cause problems with linux OOM? [1] and some other things I've read indicate
> that a large mmap will count towards total system memory, including
> producing a failure if overcommit is disabled.

I believe that this is possible.

> Would it be worth avoiding the full size allocation when we can?

Maybe.  I'm not aware of any evidence that this is an actual problem as
opposed to a theoretical one.  vacrelstats->dead_tuples is limited to a 1GB
allocation, which is not a trivial amount of memory, but it's not huge,
either.  But we could consider changing the representation from a single flat
array to a list of chunks, with each chunk capped at say 64MB.  That would
not only reduce the amount of memory that we needlessly allocate, but would
allow autovacuum to make use of more than 1GB of maintenance_work_mem, which
it looks like it currently can't.  I'm not sure that's a huge problem right
now either, because it's probably rare to vacuum a table with more than
1GB / 6 = 178,956,970 dead tuples in it, but it would certainly suck if you
did and if the current 1GB limit forced you to do multiple vacuum passes.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
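Robert's 178,956,970 figure is just the flat array's capacity: a 1 GiB
allocation divided by the 6 bytes of a heap tid (sizeof(ItemPointerData)).
A one-line check of that arithmetic, with an illustrative function name:

```c
/*
 * Capacity of the current flat dead-tuple array: a 1 GiB allocation holds
 * floor(2^30 / 6) 6-byte tids.
 */
static long
max_dead_tuples_in_1gb(void)
{
    return (1024L * 1024L * 1024L) / 6;
}
```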