Re: [HACKERS] Question about lazy_space_alloc() / linux over-commit

2015-03-10 Thread Noah Misch
I'm okay with any of the proposed designs or with dropping the idea.  Closing
the loop on a few facts:

On Sat, Mar 07, 2015 at 04:34:41PM -0600, Jim Nasby wrote:
 If we go that route, does it still make sense to explicitly use
 repalloc_huge? It will just cut over to that at some point (128M?) anyway,
 and if you're vacuuming a small relation presumably it's not worth messing
 with.

repalloc_huge() differs from repalloc() only in the size ceiling beyond which
they raise errors.  repalloc() raises errors for requests larger than ~1 GiB,
while repalloc_huge() is practically unconstrained on 64-bit and permits up to
~2 GiB on 32-bit.
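
To make that concrete, here is a minimal sketch of the
grow-geometrically-with-repalloc_huge() approach discussed below
(illustrative only; cap_dead_tuples is a hypothetical field holding the
slot budget derived from vac_work_mem):

    /*
     * Sketch, not committed code.  Grow the dead-tuple array geometrically
     * instead of allocating the maximum up front.  The caller is assumed to
     * fall back to an index-vacuuming pass once the array is full and can
     * no longer grow, as vacuum does today.
     */
    static void
    lazy_record_dead_tuple(LVRelStats *vacrelstats, ItemPointer itemptr)
    {
        if (vacrelstats->num_dead_tuples >= vacrelstats->max_dead_tuples &&
            vacrelstats->max_dead_tuples < vacrelstats->cap_dead_tuples)
        {
            long    newmax = Min(vacrelstats->max_dead_tuples * 2L,
                                 vacrelstats->cap_dead_tuples);

            vacrelstats->dead_tuples = (ItemPointer)
                repalloc_huge(vacrelstats->dead_tuples,
                              newmax * sizeof(ItemPointerData));
            vacrelstats->max_dead_tuples = newmax;
        }
        if (vacrelstats->num_dead_tuples < vacrelstats->max_dead_tuples)
            vacrelstats->dead_tuples[vacrelstats->num_dead_tuples++] = *itemptr;
    }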

On Mon, Mar 09, 2015 at 05:12:22PM -0500, Jim Nasby wrote:
 Speaking of which... people have referenced allowing > 1GB of dead tuples,
 which means allowing maintenance_work_mem > MAX_KILOBYTES. The comment for
 that says:
 
 /* upper limit for GUC variables measured in kilobytes of memory */
 /* note that various places assume the byte size fits in a long variable */
 
 So I'm not sure how well that will work. I think that needs to be a separate
 patch.

On LP64 platforms, MAX_KILOBYTES already covers maintenance_work_mem values up
to ~2 TiB.  Raising the limit on ILP32 platforms is not worth the trouble.
Raising the limit on LLP64 platforms is a valid but separate project.
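
For reference, the limit in question is defined in src/include/utils/guc.h,
roughly as follows:

    /* upper limit for GUC variables measured in kilobytes of memory */
    /* note that various places assume the byte size fits in a long variable */
    #if SIZEOF_SIZE_T > 4 && SIZEOF_LONG > 4
    #define MAX_KILOBYTES   INT_MAX
    #else
    #define MAX_KILOBYTES   (INT_MAX / 1024)
    #endif

With a 64-bit long (LP64), INT_MAX kilobytes is ~2 TiB; where long is 32 bits
(ILP32, and LLP64 Windows), the cap falls to INT_MAX/1024 kilobytes, ~2 GiB.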

nm


Re: [HACKERS] Question about lazy_space_alloc() / linux over-commit

2015-03-09 Thread Alvaro Herrera
Robert Haas wrote:
 On Sat, Mar 7, 2015 at 5:49 PM, Andres Freund and...@2ndquadrant.com wrote:
  On 2015-03-05 15:28:12 -0600, Jim Nasby wrote:
  I was thinking the simpler route of just repalloc'ing... the memcpy would
  suck, but much less so than the extra index pass. 64M gets us 11M tuples,
  which probably isn't very common.
 
  That has the chance of considerably increasing the peak memory usage
  though, as you obviously need both the old and new allocation during the
  repalloc().
 
  And in contrast to the unused memory at the tail of the array, which
  will usually not be actually allocated by the OS at all, this is memory
  that's actually read/written respectively.
 
 Yeah, I'm not sure why everybody wants to repalloc() that instead of
 making several separate allocations as needed.  That would avoid
 increasing peak memory usage, and would avoid any risk of needing to
 copy the whole array.  Also, you could grow in smaller chunks, like
 64MB at a time instead of 1GB or more at a time.  Doubling an
 allocation that's already 1GB or more gets big in a hurry.

Yeah, a chunk list rather than a single chunk seemed a good idea to me
too.

Also, I think the idea of starting with an allocation assuming some
small number of dead tuples per page made sense -- and by the time that
space has run out, you have a better estimate of actual density of dead
tuples, so you can do the second allocation based on that new estimate
(but perhaps clamp it at say 1 GB, just in case you just scanned a
portion of the table with an unusually high percentage of dead tuples.)
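
To make the chunk-list idea concrete, a minimal sketch (hypothetical names,
not a worked-out patch):

    /* Sketch only: dead TIDs kept in a linked list of fixed-size chunks. */
    #define DT_CHUNK_TUPLES \
        ((64 * 1024 * 1024) / sizeof(ItemPointerData))      /* ~64MB chunks */

    typedef struct DeadTupleChunk
    {
        struct DeadTupleChunk *next;
        int             ntuples;
        ItemPointerData tuples[FLEXIBLE_ARRAY_MEMBER];
    } DeadTupleChunk;

    /* Append one TID, starting a new chunk when the current one fills. */
    static DeadTupleChunk *
    dt_append(DeadTupleChunk *tail, ItemPointer itemptr)
    {
        if (tail == NULL || tail->ntuples >= (int) DT_CHUNK_TUPLES)
        {
            DeadTupleChunk *chunk = (DeadTupleChunk *)
                palloc(offsetof(DeadTupleChunk, tuples) +
                       DT_CHUNK_TUPLES * sizeof(ItemPointerData));

            chunk->next = NULL;
            chunk->ntuples = 0;
            if (tail != NULL)
                tail->next = chunk;
            tail = chunk;
        }
        tail->tuples[tail->ntuples++] = *itemptr;
        return tail;
    }

Since vacuum collects dead TIDs in physical order, each chunk stays internally
sorted, so a lazy_tid_reaped()-style lookup could still binary-search within
the relevant chunk at little extra cost.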

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Question about lazy_space_alloc() / linux over-commit

2015-03-09 Thread Robert Haas
On Sat, Mar 7, 2015 at 5:49 PM, Andres Freund and...@2ndquadrant.com wrote:
 On 2015-03-05 15:28:12 -0600, Jim Nasby wrote:
 I was thinking the simpler route of just repalloc'ing... the memcpy would
 suck, but much less so than the extra index pass. 64M gets us 11M tuples,
 which probably isn't very common.

 That has the chance of considerably increasing the peak memory usage
 though, as you obviously need both the old and new allocation during the
 repalloc().

 And in contrast to the unused memory at the tail of the array, which
 will usually not be actually allocated by the OS at all, this is memory
 that's actually read/written respectively.

Yeah, I'm not sure why everybody wants to repalloc() that instead of
making several separate allocations as needed.  That would avoid
increasing peak memory usage, and would avoid any risk of needing to
copy the whole array.  Also, you could grow in smaller chunks, like
64MB at a time instead of 1GB or more at a time.  Doubling an
allocation that's already 1GB or more gets big in a hurry.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] Question about lazy_space_alloc() / linux over-commit

2015-03-09 Thread Jim Nasby

On 3/9/15 12:28 PM, Alvaro Herrera wrote:
 Robert Haas wrote:
  On Sat, Mar 7, 2015 at 5:49 PM, Andres Freund and...@2ndquadrant.com wrote:
   On 2015-03-05 15:28:12 -0600, Jim Nasby wrote:
   I was thinking the simpler route of just repalloc'ing... the memcpy would
   suck, but much less so than the extra index pass. 64M gets us 11M tuples,
   which probably isn't very common.
  
   That has the chance of considerably increasing the peak memory usage
   though, as you obviously need both the old and new allocation during the
   repalloc().
  
   And in contrast to the unused memory at the tail of the array, which
   will usually not be actually allocated by the OS at all, this is memory
   that's actually read/written respectively.
  
  Yeah, I'm not sure why everybody wants to repalloc() that instead of
  making several separate allocations as needed.  That would avoid
  increasing peak memory usage, and would avoid any risk of needing to
  copy the whole array.  Also, you could grow in smaller chunks, like
  64MB at a time instead of 1GB or more at a time.  Doubling an
  allocation that's already 1GB or more gets big in a hurry.
 
 Yeah, a chunk list rather than a single chunk seemed a good idea to me
 too.

That will be significantly more code than a simple repalloc, but as long
as people are OK with that I can go that route.

 Also, I think the idea of starting with an allocation assuming some
 small number of dead tuples per page made sense -- and by the time that
 space has run out, you have a better estimate of actual density of dead
 tuples, so you can do the second allocation based on that new estimate
 (but perhaps clamp it at say 1 GB, just in case you just scanned a
 portion of the table with an unusually high percentage of dead tuples.)

I like the idea of using a fresh estimate of dead tuple density when we
need more space. We would also clamp this at maintenance_work_mem, not a
fixed 1GB.

Speaking of which... people have referenced allowing > 1GB of dead
tuples, which means allowing maintenance_work_mem > MAX_KILOBYTES. The
comment for that says:

/* upper limit for GUC variables measured in kilobytes of memory */
/* note that various places assume the byte size fits in a long variable */

So I'm not sure how well that will work. I think that needs to be a
separate patch.

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


Re: [HACKERS] Question about lazy_space_alloc() / linux over-commit

2015-03-09 Thread Andres Freund
On 2015-03-09 17:12:22 -0500, Jim Nasby wrote:
 That will be significantly more code than a simple repalloc, but as long as
 people are OK with that I can go that route.

I still would like to see some actual evidence of need for change before
we invest more time/code here.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Question about lazy_space_alloc() / linux over-commit

2015-03-07 Thread Jim Nasby

On 3/7/15 12:48 AM, Noah Misch wrote:
 On Sat, Mar 07, 2015 at 12:46:42AM -0500, Tom Lane wrote:
  Noah Misch n...@leadboat.com writes:
   On Thu, Mar 05, 2015 at 03:28:12PM -0600, Jim Nasby wrote:
    I was thinking the simpler route of just repalloc'ing... the memcpy would
    suck, but much less so than the extra index pass. 64M gets us 11M tuples,
    which probably isn't very common.
 
   +1.  Start far below 64 MiB; grow geometrically using repalloc_huge(); cap
   growth at vac_work_mem.
 
  +1 for repalloc'ing at need, but I'm not sure about the start far below
  64 MiB part.  64MB is a pretty small amount on nearly any machine these
  days (and for anybody who thinks it isn't, that's why maintenance_work_mem
  is a tunable).
 
 True; nothing would explode, especially since the allocation would be strictly
 smaller than it is today.  However, I can't think of a place in PostgreSQL
 where a growable allocation begins so aggressively, nor a reason to break new
 ground in that respect.  For comparison, tuplestore/tuplesort start memtupsize
 at 1 KiB.  (One could make a separate case for that practice being wrong.)
 
  A different line of thought is that it would seem to make sense to have
  the initial allocation vary depending on the relation size.  For instance,
  you could assume there might be 10 dead tuples per page, and hence try to
  alloc that much if it fits in vac_work_mem.
 
 Sounds better than a fixed 64 MiB start, though I'm not sure it's better than
 a fixed 256 KiB start.

In the case of vacuum, I think we presumably have a pretty good
indicator of how much space we should need; namely reltuples *
autovacuum_scale_factor. There shouldn't be too much more space needed
than that if autovac is keeping up with things.

If we go that route, does it still make sense to explicitly use
repalloc_huge? It will just cut over to that at some point (128M?)
anyway, and if you're vacuuming a small relation presumably it's not
worth messing with.

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


Re: [HACKERS] Question about lazy_space_alloc() / linux over-commit

2015-03-07 Thread Andres Freund
On 2015-03-05 15:28:12 -0600, Jim Nasby wrote:
 I was thinking the simpler route of just repalloc'ing... the memcpy would
 suck, but much less so than the extra index pass. 64M gets us 11M tuples,
 which probably isn't very common.

That has the chance of considerably increasing the peak memory usage
though, as you obviously need both the old and new allocation during the
repalloc().

And in contrast to the unused memory at the tail of the array, which
will usually not be actually allocated by the OS at all, this is memory
that's actually read/written respectively.

I have to say, I'm rather unconvinced that it's worth changing stuff
around here. If overcommit is enabled, vacuum won't fail unless the
memory is actually used (=> no problem). If overcommit is disabled and
you get memory allocation failures, you're probably already running
awfully close to the maximum of your configuration and you're better off
adjusting it.  I'm not aware of any field complaints about this and thus
I'm not sure it's worth fiddling with this.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Question about lazy_space_alloc() / linux over-commit

2015-03-07 Thread Jim Nasby

On 3/7/15 4:49 PM, Andres Freund wrote:
 On 2015-03-05 15:28:12 -0600, Jim Nasby wrote:
  I was thinking the simpler route of just repalloc'ing... the memcpy would
  suck, but much less so than the extra index pass. 64M gets us 11M tuples,
  which probably isn't very common.
 
 That has the chance of considerably increasing the peak memory usage
 though, as you obviously need both the old and new allocation during the
 repalloc().
 
 And in contrast to the unused memory at the tail of the array, which
 will usually not be actually allocated by the OS at all, this is memory
 that's actually read/written respectively.

That leaves me wondering why we bother with dynamic resizing in other
areas (like sorts, for example) then? Why not just palloc work_mem and
be done with it? What makes those cases different?

 I have to say, I'm rather unconvinced that it's worth changing stuff
 around here. If overcommit is enabled, vacuum won't fail unless the
 memory is actually used (=> no problem). If overcommit is disabled and
 you get memory allocation failures, you're probably already running
 awfully close to the maximum of your configuration and you're better off
 adjusting it.  I'm not aware of any field complaints about this and thus
 I'm not sure it's worth fiddling with this.

Perhaps; Noah seems to be the only one who's seen this.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


Re: [HACKERS] Question about lazy_space_alloc() / linux over-commit

2015-03-06 Thread Noah Misch
On Thu, Mar 05, 2015 at 03:28:12PM -0600, Jim Nasby wrote:
 On 3/4/15 9:10 AM, Robert Haas wrote:
 On Wed, Feb 25, 2015 at 5:06 PM, Jim Nasby jim.na...@bluetreble.com wrote:
 Could the large allocation[2] for the dead tuple array in lazy_space_alloc
 cause problems with linux OOM? [1] and some other things I've read indicate
 that a large mmap will count towards total system memory, including
 producing a failure if overcommit is disabled.
 
 I believe that this is possible.

I have seen that in the field, albeit on a server with a 10 GiB allocation
limit, years ago.

 Would it be worth avoiding the full size allocation when we can?
 
 Maybe.  I'm not aware of any evidence that this is an actual problem
 as opposed to a theoretical one.  vacrelstats->dead_tuples is limited
 to a 1GB allocation, which is not a trivial amount of memory, but it's
 not huge, either.  But we could consider changing the representation
 from a single flat array to a list of chunks, with each chunk capped
 at say 64MB.  That would not only reduce the amount of memory that we
 
 I was thinking the simpler route of just repalloc'ing... the memcpy would
 suck, but much less so than the extra index pass. 64M gets us 11M tuples,
 which probably isn't very common.

+1.  Start far below 64 MiB; grow geometrically using repalloc_huge(); cap
growth at vac_work_mem.


Re: [HACKERS] Question about lazy_space_alloc() / linux over-commit

2015-03-06 Thread Tom Lane
Noah Misch n...@leadboat.com writes:
 On Thu, Mar 05, 2015 at 03:28:12PM -0600, Jim Nasby wrote:
 I was thinking the simpler route of just repalloc'ing... the memcpy would
 suck, but much less so than the extra index pass. 64M gets us 11M tuples,
 which probably isn't very common.

 +1.  Start far below 64 MiB; grow geometrically using repalloc_huge(); cap
 growth at vac_work_mem.

+1 for repalloc'ing at need, but I'm not sure about the start far below
64 MiB part.  64MB is a pretty small amount on nearly any machine these
days (and for anybody who thinks it isn't, that's why maintenance_work_mem
is a tunable).  I think min(64MB, vac_work_mem) might be a reasonable
start point.

A different line of thought is that it would seem to make sense to have
the initial allocation vary depending on the relation size.  For instance,
you could assume there might be 10 dead tuples per page, and hence try to
alloc that much if it fits in vac_work_mem.
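
A back-of-the-envelope version of that sizing rule (hypothetical helper;
vac_work_mem in kilobytes, as elsewhere in vacuumlazy.c):

    /*
     * Sketch: size the initial dead-tuple array from the relation size,
     * assuming ~10 dead tuples per page, clamped to the memory budget and
     * to at least one page's worth of tuples for tiny relations.
     */
    static long
    initial_dead_tuple_slots(BlockNumber rel_pages, long vac_work_mem)
    {
        long    budget = (vac_work_mem * 1024L) / sizeof(ItemPointerData);
        long    estimate = (long) rel_pages * 10;

        return Max(MaxHeapTuplesPerPage, Min(estimate, budget));
    }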

regards, tom lane


Re: [HACKERS] Question about lazy_space_alloc() / linux over-commit

2015-03-06 Thread Noah Misch
On Sat, Mar 07, 2015 at 12:46:42AM -0500, Tom Lane wrote:
 Noah Misch n...@leadboat.com writes:
  On Thu, Mar 05, 2015 at 03:28:12PM -0600, Jim Nasby wrote:
  I was thinking the simpler route of just repalloc'ing... the memcpy would
  suck, but much less so than the extra index pass. 64M gets us 11M tuples,
  which probably isn't very common.
 
  +1.  Start far below 64 MiB; grow geometrically using repalloc_huge(); cap
  growth at vac_work_mem.
 
 +1 for repalloc'ing at need, but I'm not sure about the start far below
 64 MiB part.  64MB is a pretty small amount on nearly any machine these
 days (and for anybody who thinks it isn't, that's why maintenance_work_mem
 is a tunable).

True; nothing would explode, especially since the allocation would be strictly
smaller than it is today.  However, I can't think of a place in PostgreSQL
where a growable allocation begins so aggressively, nor a reason to break new
ground in that respect.  For comparison, tuplestore/tuplesort start memtupsize
at 1 KiB.  (One could make a separate case for that practice being wrong.)

 A different line of thought is that it would seem to make sense to have
 the initial allocation vary depending on the relation size.  For instance,
 you could assume there might be 10 dead tuples per page, and hence try to
 alloc that much if it fits in vac_work_mem.

Sounds better than a fixed 64 MiB start, though I'm not sure it's better than
a fixed 256 KiB start.


Re: [HACKERS] Question about lazy_space_alloc() / linux over-commit

2015-03-05 Thread Jim Nasby

On 3/4/15 9:10 AM, Robert Haas wrote:
 On Wed, Feb 25, 2015 at 5:06 PM, Jim Nasby jim.na...@bluetreble.com wrote:
  Could the large allocation[2] for the dead tuple array in lazy_space_alloc
  cause problems with linux OOM? [1] and some other things I've read indicate
  that a large mmap will count towards total system memory, including
  producing a failure if overcommit is disabled.
 
 I believe that this is possible.
 
  Would it be worth avoiding the full size allocation when we can?
 
 Maybe.  I'm not aware of any evidence that this is an actual problem
 as opposed to a theoretical one.  vacrelstats->dead_tuples is limited
 to a 1GB allocation, which is not a trivial amount of memory, but it's
 not huge, either.  But we could consider changing the representation
 from a single flat array to a list of chunks, with each chunk capped
 at say 64MB.  That would not only reduce the amount of memory that we

I was thinking the simpler route of just repalloc'ing... the memcpy
would suck, but much less so than the extra index pass. 64M gets us 11M
tuples, which probably isn't very common.

 needlessly allocate, but would allow autovacuum to make use of more
 than 1GB of maintenance_work_mem, which it looks like it currently
 can't.  I'm not sure that's a huge problem right now either, because

I'm confused... how is autovacuum special in this regard? Each worker
can use up to 1G, just like a regular vacuum, right? Or are you just
saying getting rid of the 1G limit would be good?

 it's probably rare to vacuum a table with more than 1GB / 6 =
 178,956,970 dead tuples in it, but it would certainly suck if you did
 and if the current 1GB limit forced you to do multiple vacuum passes.

Yeah, with 100+ GB machines not that uncommon today perhaps it's worth
significantly upping this.

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


Re: [HACKERS] Question about lazy_space_alloc() / linux over-commit

2015-03-04 Thread Robert Haas
On Wed, Feb 25, 2015 at 5:06 PM, Jim Nasby jim.na...@bluetreble.com wrote:
 Could the large allocation[2] for the dead tuple array in lazy_space_alloc
 cause problems with linux OOM? [1] and some other things I've read indicate
 that a large mmap will count towards total system memory, including
 producing a failure if overcommit is disabled.

I believe that this is possible.

 Would it be worth avoiding the full size allocation when we can?

Maybe.  I'm not aware of any evidence that this is an actual problem
as opposed to a theoretical one.  vacrelstats->dead_tuples is limited
to a 1GB allocation, which is not a trivial amount of memory, but it's
not huge, either.  But we could consider changing the representation
from a single flat array to a list of chunks, with each chunk capped
at say 64MB.  That would not only reduce the amount of memory that we
needlessly allocate, but would allow autovacuum to make use of more
than 1GB of maintenance_work_mem, which it looks like it currently
can't.  I'm not sure that's a huge problem right now either, because
it's probably rare to vacuum a table with more than 1GB / 6 =
178,956,970 dead tuples in it, but it would certainly suck if you did
and if the current 1GB limit forced you to do multiple vacuum passes.
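
For context, the cap described above comes from lazy_space_alloc(), which
sizes the array approximately like this (vacuumlazy.c, lightly abridged):

    maxtuples = (vac_work_mem * 1024L) / sizeof(ItemPointerData);
    maxtuples = Min(maxtuples, INT_MAX);
    maxtuples = Min(maxtuples, MaxAllocSize / sizeof(ItemPointerData));

    /* curious coding here to ensure the multiplication can't overflow */
    if ((BlockNumber) (maxtuples / LAZY_ALLOC_TUPLES) > relblocks)
        maxtuples = relblocks * LAZY_ALLOC_TUPLES;

    /* stay sane if small maintenance_work_mem */
    maxtuples = Max(maxtuples, MaxHeapTuplesPerPage);

MaxAllocSize is just under 1GB and an ItemPointerData is 6 bytes, which is
where 1GB / 6 = 178,956,970 comes from.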

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

