Re: [HACKERS] optimizing vacuum truncation scans

2015-09-28 Thread Robert Haas
On Mon, Sep 28, 2015 at 2:05 AM, Haribabu Kommi
 wrote:
> I found the following way to determine whether a data file resides on
> disk or SSD.
>
> 1. Get the device that the table's data file is mapped to, using the
> following or similar:
>
> df -P "filename" | awk 'NR==2{print $1}'
>
> 2. If the device is of the form /dev/sd*, treat it as a disk system and
> proceed with the prefetch optimization.
>
> 3. If we are not able to find the device details directly, we need to
> get the information from the device mapper.
>
> Usually the devices map like the following:
>
> /dev/mapper/v** points to ../dm-*
>
> 4. Get the name of the "dm-*" from the above details and check whether
> it is an SSD or not with the following file:
>
> /sys/block/dm-*/queue/rotational
>
> 5. If the value is 0 it is an SSD; 1 means a rotating disk.
>
> The procedure described above works only on Linux. I haven't checked
> other operating systems yet.
> Is it worth considering?

No.  If we need to have the behavior depend on the hardware, it should
be a GUC or tablespace option or reloption, not some kind of crazy
OS-dependent discovery.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] optimizing vacuum truncation scans

2015-09-28 Thread Haribabu Kommi
On Tue, Aug 4, 2015 at 2:18 AM, Jeff Janes  wrote:
> On Mon, Jul 27, 2015 at 1:40 PM, Simon Riggs  wrote:
>>
>> On 22 July 2015 at 17:11, Jeff Janes  wrote:
>>>
>>> On Wed, Jul 22, 2015 at 6:59 AM, Robert Haas 
>>> wrote:

 On Mon, Jun 29, 2015 at 1:54 AM, Jeff Janes 
 wrote:
 > Attached is a patch that implements the vm scan for truncation.  It
 > introduces a variable to hold the last blkno which was skipped during
 > the
 > forward portion.  Any blocks after both this blkno and after the last
 > inspected nonempty page (which the code is already tracking) must have
 > been
 > observed to be empty by the current vacuum.  Any other process
 > rendering the
 > page nonempty is required to clear the vm bit, and no other process
 > can set
 > the bit again during the vacuum's lifetime.  So if the bit is still
 > set, the
 > page is still empty without needing to inspect it.

 Urgh.  So if we do this, that forever precludes having HOT pruning set
 the all-visible bit.
>>>
>>>
>>> I wouldn't say forever, as it would be easy to revert the change if
>>> something more important came along that conflicted with it.
>>
>>
>> I think what is being said here is that someone is already using this
>> technique, or if not, then we actively want to encourage them to do so as an
>> extension or as a submission to core.
>>
>> In that case, I think the rely-on-VM technique sinks again, sorry Jim,
>> Jeff. Probably needs code comments added.
>
>
> Sure, that sounds like the consensus.  The VM method was very efficient, but
> I agree it is pretty fragile and restricting.
>
>>
>>
>> That does still leave the prefetch technique, so all is not lost.
>>
>> Can we see a patch with just prefetch, probably with a simple choice of
>> stride? Thanks.
>
>
> I probably won't get back to it this commit fest, so it can be set to
> returned with feedback.  But if anyone has good ideas for how to set the
> stride (or detect that it is on SSD and so is pointless to try) I'd love to
> hear about them anytime.

I found the following way to determine whether a data file resides on
disk or SSD.

1. Get the device that the table's data file is mapped to, using the
following or similar:

df -P "filename" | awk 'NR==2{print $1}'

2. If the device is of the form /dev/sd*, treat it as a disk system and
proceed with the prefetch optimization.

3. If we are not able to find the device details directly, we need to
get the information from the device mapper.

Usually the devices map like the following:

/dev/mapper/v** points to ../dm-*

4. Get the name of the "dm-*" from the above details and check whether
it is an SSD or not with the following file:

/sys/block/dm-*/queue/rotational

5. If the value is 0 it is an SSD; 1 means a rotating disk.

The procedure described above works only on Linux. I haven't checked
other operating systems yet.
Is it worth considering?
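
For illustration, a minimal standalone C sketch of the same check
(Linux only; file_on_rotational_media is a hypothetical helper, not
part of any attached patch). It resolves the owning device through
/sys/dev/block instead of parsing df output, and reports "unknown" for
cases such as partitions, whose sysfs node has no queue/ directory of
its own:

#include <stdio.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>

/*
 * Sketch only: guess whether the file at "path" lives on rotating
 * storage by reading the owning block device's "rotational" flag from
 * sysfs.  Returns 1 for a rotating disk, 0 for an SSD, and -1 if it
 * cannot tell.
 */
static int
file_on_rotational_media(const char *path)
{
    struct stat st;
    char        sysfs_path[128];
    FILE       *f;
    int         rotational = -1;

    if (stat(path, &st) != 0)
        return -1;

    /* /sys/dev/block/<major>:<minor> points at the owning device */
    snprintf(sysfs_path, sizeof(sysfs_path),
             "/sys/dev/block/%u:%u/queue/rotational",
             major(st.st_dev), minor(st.st_dev));

    f = fopen(sysfs_path, "r");
    if (f == NULL)
        return -1;
    if (fscanf(f, "%d", &rotational) != 1)
        rotational = -1;
    fclose(f);

    return rotational;
}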

Regards,
Hari Babu
Fujitsu Australia




Re: [HACKERS] optimizing vacuum truncation scans

2015-08-12 Thread Simon Riggs
On 3 August 2015 at 17:18, Jeff Janes jeff.ja...@gmail.com wrote:


 That does still leave the prefetch technique, so all is not lost.

 Can we see a patch with just prefetch, probably with a simple choice of
 stride? Thanks.


 I probably won't get back to it this commit fest, so it can be set to
 returned with feedback.  But if anyone has good ideas for how to set the
 stride (or detect that it is on SSD and so is pointless to try) I'd love to
 hear about them anytime.


I've Returned With Feedback, as you suggest.

Given your earlier numbers, I'd just pick a constant value of 128, to keep
it simple. That balances out the various factors of small/large tables and
the uncertainty that we will be able to truncate the whole chunk of blocks.
I'd like to see tests on SSD before commit, but I hope and expect the
slightly negative results with SSD won't be substantiated on larger tests.

Kept simple, it's a patch with a clear win in a restricted use case and I
would be happy to commit it sometime in the future.

-- 
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] optimizing vacuum truncation scans

2015-08-03 Thread Jeff Janes
On Mon, Jul 27, 2015 at 1:40 PM, Simon Riggs si...@2ndquadrant.com wrote:

 On 22 July 2015 at 17:11, Jeff Janes jeff.ja...@gmail.com wrote:

 On Wed, Jul 22, 2015 at 6:59 AM, Robert Haas robertmh...@gmail.com
 wrote:

 On Mon, Jun 29, 2015 at 1:54 AM, Jeff Janes jeff.ja...@gmail.com
 wrote:
  Attached is a patch that implements the vm scan for truncation.  It
  introduces a variable to hold the last blkno which was skipped during
 the
  forward portion.  Any blocks after both this blkno and after the last
  inspected nonempty page (which the code is already tracking) must have
 been
  observed to be empty by the current vacuum.  Any other process
 rendering the
  page nonempty is required to clear the vm bit, and no other process
 can set
  the bit again during the vacuum's lifetime.  So if the bit is still
 set, the
  page is still empty without needing to inspect it.

 Urgh.  So if we do this, that forever precludes having HOT pruning set
 the all-visible bit.


 I wouldn't say forever, as it would be easy to revert the change if
 something more important came along that conflicted with it.


 I think what is being said here is that someone is already using this
 technique, or if not, then we actively want to encourage them to do so as
 an extension or as a submission to core.

 In that case, I think the rely-on-VM technique sinks again, sorry Jim,
 Jeff. Probably needs code comments added.


Sure, that sounds like the consensus.  The VM method was very efficient,
but I agree it is pretty fragile and restricting.



 That does still leave the prefetch technique, so all is not lost.

 Can we see a patch with just prefetch, probably with a simple choice of
 stride? Thanks.


I probably won't get back to it this commit fest, so it can be set to
returned with feedback.  But if anyone has good ideas for how to set the
stride (or detect that it is on SSD and so is pointless to try) I'd love to
hear about them anytime.

Cheers,

Jeff


Re: [HACKERS] optimizing vacuum truncation scans

2015-07-27 Thread Robert Haas
On Sat, Jul 25, 2015 at 4:46 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On 22 July 2015 at 14:59, Robert Haas robertmh...@gmail.com wrote:
 Urgh.  So if we do this, that forever precludes having HOT pruning set
 the all-visible bit.

 What is the reason why we don't do that already? Surely it's a one-liner?

I wish!

It actually doesn't look simple to me to modify heap_page_prune to
know whether any not-all-visible items remain at the end.  It's
definitely true that, in order for marking the page all-visible to be
a possibility, we need to reach heap_page_prune_execute() with ndead
== 0.  But that's not enough: we also need to know that any tuple that
survived the prune operation (that is, it wasn't redirected or marked
unused) has a new-enough xmin, which isn't tracked anywhere.
vacuumlazy.c solves that problem by making a second pass over all the
tuples on the page to determine whether the page is all-visible, but
that seems pretty inefficient and I think heap_page_prune() is already
a CPU hog on some workloads.  (Didn't Heikki have a patch to improve
that?)

Another sticky wicket is that, as things stand today, we'd have to
insert a second WAL record to mark the page all-visible, which would
add probably-significant overhead.  We could probably modify the
format of the record we're already emitting for pruning to carry a
flag indicating whether to additionally set PD_ALL_VISIBLE.
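
To illustrate the missing bookkeeping, the per-tuple check that would
have to run inside the prune loop looks roughly like this (a sketch
only, not proposed code; the visibility call and enum values are the
existing ones, and the surrounding loop is elided):

#include "postgres.h"
#include "access/htup_details.h"
#include "access/transam.h"
#include "utils/tqual.h"

/*
 * Sketch: does this surviving tuple keep the page all-visible?  A real
 * change would fold this into heap_prune_chain()'s existing
 * HeapTupleSatisfiesVacuum() call rather than re-testing tuples.
 */
static bool
tuple_keeps_page_all_visible(HeapTuple tup, TransactionId OldestXmin,
                             Buffer buffer)
{
    switch (HeapTupleSatisfiesVacuum(tup, OldestXmin, buffer))
    {
        case HEAPTUPLE_LIVE:
            /* live is not enough; xmin must precede every snapshot */
            return TransactionIdPrecedes(HeapTupleHeaderGetXmin(tup->t_data),
                                         OldestXmin);
        default:
            /* dead, recently dead, or insert/delete in progress */
            return false;
    }
}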

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] optimizing vacuum truncation scans

2015-07-27 Thread Simon Riggs
On 22 July 2015 at 17:11, Jeff Janes jeff.ja...@gmail.com wrote:

 On Wed, Jul 22, 2015 at 6:59 AM, Robert Haas robertmh...@gmail.com
 wrote:

 On Mon, Jun 29, 2015 at 1:54 AM, Jeff Janes jeff.ja...@gmail.com wrote:
  Attached is a patch that implements the vm scan for truncation.  It
  introduces a variable to hold the last blkno which was skipped during
 the
  forward portion.  Any blocks after both this blkno and after the last
  inspected nonempty page (which the code is already tracking) must have
 been
  observed to be empty by the current vacuum.  Any other process
 rendering the
  page nonempty is required to clear the vm bit, and no other process
 can set
  the bit again during the vacuum's lifetime.  So if the bit is still
 set, the
  page is still empty without needing to inspect it.

 Urgh.  So if we do this, that forever precludes having HOT pruning set
 the all-visible bit.


 I wouldn't say forever, as it would be easy to revert the change if
 something more important came along that conflicted with it.


I think what is being said here is that someone is already using this
technique, or if not, then we actively want to encourage them to do so as
an extension or as a submission to core.

In that case, I think the rely-on-VM technique sinks again, sorry Jim,
Jeff. Probably needs code comments added.

That does still leave the prefetch technique, so all is not lost.

Can we see a patch with just prefetch, probably with a simple choice of
stride? Thanks.

-- 
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] optimizing vacuum truncation scans

2015-07-27 Thread Jim Nasby

On 7/27/15 10:39 AM, Robert Haas wrote:

But that's not enough: we also need to know that any tuple that
survived the prune operation (that is, it wasn't redirected or marked
unused) has a new-enough xmin, which isn't tracked anywhere.


Wouldn't that be old-enough xmin?

heap_prune_chain() is already calling HeapTupleSatisfiesVacuum, so it 
should be able to figure out if the page is all visible without a lot of 
extra work, and pass that info back to heap_page_prune (which would then 
pass it down to _execute()).


Definitely not a one-liner though.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] optimizing vacuum truncation scans

2015-07-25 Thread Simon Riggs
On 22 July 2015 at 14:59, Robert Haas robertmh...@gmail.com wrote:


 Urgh.  So if we do this, that forever precludes having HOT pruning set
 the all-visible bit.


What is the reason why we don't do that already? Surely it's a one-liner?

-- 
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] optimizing vacuum truncation scans

2015-07-24 Thread Jim Nasby

On 7/22/15 11:11 AM, Jeff Janes wrote:

I guess add an Assert(InRecovery || running_a_vacuum); to
the visibilitymap_set with a comment there, except that I don't know how
to implement running_a_vacuum so that it covers manual vacs as well as
autovac.


static bool RunningVacuum; ? It only needs to be per-backend, right?

If it is that simple then my $0.02 is we should have the assert, not 
just a comment.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] optimizing vacuum truncation scans

2015-07-23 Thread Robert Haas
On Wed, Jul 22, 2015 at 12:11 PM, Jeff Janes jeff.ja...@gmail.com wrote:
 Urgh.  So if we do this, that forever precludes having HOT pruning set
 the all-visible bit.

 I wouldn't say forever, as it would be easy to revert the change if
 something more important came along that conflicted with it.

Well, the people who want this change would likely object to reverting it later.

 I don't think
 this change would grow tentacles across the code that make it hard to
 revert, you would just have to take the performance hit (and by that time,
 maybe HDD will truly be dead anyway and so we don't care anymore). But yes,
 that is definitely a downside.  HOT pruning is one example, but also one
 could envision having someone (bgwriter?) set vm bits on unindexed tables.
 Or if we invent some efficient way to know that no expiring tids for a
 certain block range are stored in indexes, other jobs could also set the vm
 bit on indexed tables.  Or parallel vacuums in the same table, not that I
 really see a reason to have those.

Yes, good points.  (I think we will need parallel vacuum to cope with
TB-sized tables, but that's another story.)

 At the least we'd better document that carefully
 so that nobody breaks it later.  But I wonder if there isn't some
 better approach, because I would certainly rather that we didn't
 foreclose the possibility of doing something like that in the future.

 But where do we document it (other than in-place)?  README.HOT doesn't seem
 sufficient, and there is no README.vm.

visibilitymap_set?

 I guess add an Assert(InRecovery || running_a_vacuum); to the
 visibilitymap_set with a comment there, except that I don't know how to
 implement running_a_vacuum so that it covers manual vacs as well as autovac.
 Perhaps assert that we hold a SHARE UPDATE EXCLUSIVE on rel?

I care more about the comment than I do about the Assert().

 The advantage of the other approach, just force kernel read-ahead to work
 for us, is that it doesn't impose any of these restrictions on future
 development.  The disadvantage is that I don't know how to auto-tune it, or
 auto-disable it for SSD, and it will never be quite as efficient.

I have to say that I like the prefetch approach.  I don't know what to
do about the fact that it loses to the VM-bit based approach, but I
think it's a bad bet that we will never care about setting visibility
map bits anyplace other than VACUUM.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] optimizing vacuum truncation scans

2015-07-23 Thread Andres Freund
On 2015-07-23 11:06:25 -0400, Robert Haas wrote:
 I don't know what to do about the fact that it loses to the VM-bit
 based approach, but I think it's a bad bet that we will never care
 about setting visibility map bits anyplace other than VACUUM.

+1

A background process that tries to hint all pages shortly before they're
written out would be truly awesome. And would very likely set hint
bits...

Andres




Re: [HACKERS] optimizing vacuum truncation scans

2015-07-22 Thread Jeff Janes
On Wed, Jul 22, 2015 at 6:59 AM, Robert Haas robertmh...@gmail.com wrote:

 On Mon, Jun 29, 2015 at 1:54 AM, Jeff Janes jeff.ja...@gmail.com wrote:
  Attached is a patch that implements the vm scan for truncation.  It
  introduces a variable to hold the last blkno which was skipped during the
  forward portion.  Any blocks after both this blkno and after the last
  inspected nonempty page (which the code is already tracking) must have
 been
  observed to be empty by the current vacuum.  Any other process rendering
 the
  page nonempty is required to clear the vm bit, and no other process can
 set
  the bit again during the vacuum's lifetime.  So if the bit is still set,
 the
  page is still empty without needing to inspect it.

 Urgh.  So if we do this, that forever precludes having HOT pruning set
 the all-visible bit.


I wouldn't say forever, as it would be easy to revert the change if
something more important came along that conflicted with it.  I don't think
this change would grow tentacles across the code that make it hard to
revert, you would just have to take the performance hit (and by that time,
maybe HDD will truly be dead anyway and so we don't care anymore). But yes,
that is definitely a downside.  HOT pruning is one example, but also one
could envision having someone (bgwriter?) set vm bits on unindexed tables.
Or if we invent some efficient way to know that no expiring tids for a
certain block range are stored in indexes, other jobs could also set the vm
bit on indexed tables.  Or parallel vacuums in the same table, not that I
really see a reason to have those.


 At the least we'd better document that carefully
 so that nobody breaks it later.  But I wonder if there isn't some
 better approach, because I would certainly rather that we didn't
 foreclose the possibility of doing something like that in the future.


But where do we document it (other than in-place)?  README.HOT doesn't seem
sufficient, and there is no README.vm.

I guess add an Assert(InRecovery || running_a_vacuum); to
the visibilitymap_set with a comment there, except that I don't know how to
implement running_a_vacuum so that it covers manual vacs as well as
autovac.  Perhaps assert that we hold a SHARE UPDATE EXCLUSIVE on rel?

The advantage of the other approach, just force kernel read-ahead to work
for us, is that it doesn't impose any of these restrictions on future
development.  The disadvantage is that I don't know how to auto-tune it, or
auto-disable it for SSD, and it will never be quite as efficient.

Cheers,

Jeff


Re: [HACKERS] optimizing vacuum truncation scans

2015-07-22 Thread Robert Haas
On Mon, Jun 29, 2015 at 1:54 AM, Jeff Janes jeff.ja...@gmail.com wrote:
 Attached is a patch that implements the vm scan for truncation.  It
 introduces a variable to hold the last blkno which was skipped during the
 forward portion.  Any blocks after both this blkno and after the last
 inspected nonempty page (which the code is already tracking) must have been
 observed to be empty by the current vacuum.  Any other process rendering the
 page nonempty is required to clear the vm bit, and no other process can set
 the bit again during the vacuum's lifetime.  So if the bit is still set, the
 page is still empty without needing to inspect it.

Urgh.  So if we do this, that forever precludes having HOT pruning set
the all-visible bit.  At the least we'd better document that carefully
so that nobody breaks it later.  But I wonder if there isn't some
better approach, because I would certainly rather that we didn't
foreclose the possibility of doing something like that in the future.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] optimizing vacuum truncation scans

2015-07-19 Thread Simon Riggs
On 16 July 2015 at 06:51, Haribabu Kommi kommi.harib...@gmail.com wrote:


 I marked the patch as ready for committer.


The most recent results seem to indicate that the prefetch isn't worth
pursuing, but the vm test is. Can someone repeat the perf tests on
something larger, so we can see when/if there is a benefit?

Jeff, can you add detailed comments to explain the theory of operation of
these patches? The patches add the code, but don't say why. We will forget,
so let's put the detail in there now, please. Thanks.

-- 
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] optimizing vacuum truncation scans

2015-07-16 Thread Amit Kapila
On Thu, Jul 16, 2015 at 11:21 AM, Haribabu Kommi kommi.harib...@gmail.com
wrote:

 On Wed, Jul 15, 2015 at 11:19 PM, Amit Kapila amit.kapil...@gmail.com
wrote:
 
  One case where this patch can behave differently than the current
  code is when, before taking the Exclusive lock on the relation for
  truncation, some backend clears the vm bit and then
  inserts-deletes-prunes that page.  I think with the patch it will not
  consider truncating those pages, whereas the current code will allow
  truncating them, as it re-verifies each page.  I think such a case
  would be rare and we might not want to bother about it, but it is
  still worth thinking about once.

 Thanks for your review.

 Corrected the code: instead of returning the blkno after a visibility map
 check failure, the code now just continues to verify the page contents, as
 in the earlier approach.


Okay, I think this should work.


 Here I attached updated patches,
 1) without prefetch logic.
 2) with combined vm and prefetch logic.


I think it is better to just get the first patch in, as that in itself is a
clear win, and then we can evaluate the second one (prefetching during
truncation) separately.  I think after the first patch the use case for
doing prefetch seems less beneficial, and I think it could hurt by loading
unneeded pages into the OS buffer cache (say, you have prefetched 32 pages
and after inspecting the first page the loop quits because that page is
non-empty; or pages are prefetched just because one page's visibility map
bit was cleared while the bits for the others are set).

Having said that, I am not against prefetching in this scenario, as it
can help in more cases than it could hurt, but I think it will be better
to evaluate that separately.



With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] optimizing vacuum truncation scans

2015-07-16 Thread Haribabu Kommi
On Thu, Jul 16, 2015 at 10:17 PM, Amit Kapila amit.kapil...@gmail.com wrote:
 On Thu, Jul 16, 2015 at 11:21 AM, Haribabu Kommi kommi.harib...@gmail.com
 wrote:


 Here I attached updated patches,
 1) without prefetch logic.
 2) with combined vm and prefetch logic.


 I think it is better to just get the first patch in, as that in itself is a
 clear win, and then we can evaluate the second one (prefetching during
 truncation) separately.  I think after the first patch the use case for
 doing prefetch seems less beneficial, and I think it could hurt by loading
 unneeded pages into the OS buffer cache (say, you have prefetched 32 pages
 and after inspecting the first page the loop quits because that page is
 non-empty; or pages are prefetched just because one page's visibility map
 bit was cleared while the bits for the others are set).

Yes, in the above cases the prefetch is an overhead. I am not sure how
frequently such scenarios occur in real-world workloads.

 Having said that, I am not against prefetching in this scenario, as it
 can help in more cases than it could hurt, but I think it will be better
 to evaluate that separately.

Yes, the prefetch patch works better in the cases where the first vacuum
skips the truncation because of lock waiters. If such cases are common in
real-world workloads, then the prefetch patch is needed.

Regards,
Hari Babu
Fujitsu Australia




Re: [HACKERS] optimizing vacuum truncation scans

2015-07-15 Thread Haribabu Kommi
On Wed, Jul 15, 2015 at 11:19 PM, Amit Kapila amit.kapil...@gmail.com wrote:
 On Mon, Jun 29, 2015 at 11:24 AM, Jeff Janes jeff.ja...@gmail.com wrote:


 Attached is a patch that implements the vm scan for truncation.  It
 introduces a variable to hold the last blkno which was skipped during the
 forward portion.  Any blocks after both this blkno and after the last
 inspected nonempty page (which the code is already tracking) must have been
 observed to be empty by the current vacuum.  Any other process rendering the
  page nonempty is required to clear the vm bit, and no other process can set
 the bit again during the vacuum's lifetime.  So if the bit is still set, the
 page is still empty without needing to inspect it.


  One case where this patch can behave differently than the current
  code is when, before taking the Exclusive lock on the relation for
  truncation, some backend clears the vm bit and then
  inserts-deletes-prunes that page.  I think with the patch it will not
  consider truncating those pages, whereas the current code will allow
  truncating them, as it re-verifies each page.  I think such a case
  would be rare and we might not want to bother about it, but it is
  still worth thinking about once.

Thanks for your review.

Corrected the code: instead of returning the blkno after a visibility map
check failure, the code now just continues to verify the page contents, as
in the earlier approach.


 Some minor points about patch:

 count_nondeletable_pages()
 {
 ..
 if (PageIsNew(page) || PageIsEmpty(page))
 {
     /* PageIsNew probably shouldn't happen... */
     UnlockReleaseBuffer(buf);
     continue;
 }
 ..
 }

 Why is vmbuffer not freed in the above loop?


In the above check, we are just continuing to the next blkno; the vmbuffer
is freed at the end of the loop.


 count_nondeletable_pages()
 {
 ..
 + /*
 + * Pages that were inspected and found to be empty by this vacuum run
 + * must still be empty if the vm bit is still set.  Only vacuum sets
 + * vm bits, and only one vacuum can exist in a table at one time.
 + */
 + trust_vm=vacrelstats->nonempty_pages>vacrelstats->skipped_pages ?
 + vacrelstats->nonempty_pages : vacrelstats->skipped_pages;

 ..
 }

 I think it is better to have spaces in the above assignment statement
 (near '=' and near '>')


corrected.

Here I attached updated patches,
1) without prefetch logic.
2) with combined vm and prefetch logic.

I marked the patch as ready for committer.


Regards,
Hari Babu
Fujitsu Australia


vac_trunc_trust_vm_v2.patch
Description: Binary data


vac_trunc_trust_vm_and_prefetch_v2.patch
Description: Binary data



Re: [HACKERS] optimizing vacuum truncation scans

2015-07-15 Thread Amit Kapila
On Mon, Jun 29, 2015 at 11:24 AM, Jeff Janes jeff.ja...@gmail.com wrote:


 Attached is a patch that implements the vm scan for truncation.  It
introduces a variable to hold the last blkno which was skipped during the
forward portion.  Any blocks after both this blkno and after the last
inspected nonempty page (which the code is already tracking) must have been
observed to be empty by the current vacuum.  Any other process rendering
the page nonempty is required to clear the vm bit, and no other process
can set the bit again during the vacuum's lifetime.  So if the bit is still
set, the page is still empty without needing to inspect it.


One case where this patch can behave differently than the current code is
when, before taking the Exclusive lock on the relation for truncation,
some backend clears the vm bit and then inserts-deletes-prunes that page.
I think with the patch it will not consider truncating those pages,
whereas the current code will allow truncating them, as it re-verifies
each page.  I think such a case would be rare and we might not want to
bother about it, but it is still worth thinking about once.

Some minor points about patch:

count_nondeletable_pages()
{
..
if (PageIsNew(page) || PageIsEmpty(page))
{
    /* PageIsNew probably shouldn't happen... */
    UnlockReleaseBuffer(buf);
    continue;
}
..
}

Why is vmbuffer not freed in the above loop?


count_nondeletable_pages()
{
..
+ /*
+  * Pages that were inspected and found to be empty by this vacuum run
+  * must still be empty if the vm bit is still set.  Only vacuum sets
+  * vm bits, and only one vacuum can exist in a table at one time.
+  */
+ trust_vm=vacrelstats->nonempty_pages>vacrelstats->skipped_pages ?
+ vacrelstats->nonempty_pages : vacrelstats->skipped_pages;

..
}

I think it is better to have spaces in the above assignment statement
(near '=' and near '>')


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] optimizing vacuum truncation scans

2015-07-14 Thread Haribabu Kommi
On Mon, Jul 13, 2015 at 5:16 PM, Haribabu Kommi
kommi.harib...@gmail.com wrote:
 On Mon, Jul 13, 2015 at 12:06 PM, Haribabu Kommi
 kommi.harib...@gmail.com wrote:
 On Thu, Jul 9, 2015 at 5:36 PM, Haribabu Kommi kommi.harib...@gmail.com 
 wrote:

 I will do some performance tests and send you the results.

 Here are the performance results tested on my machine.


                    Head      vm patch   vm+prefetch patch

   First vacuum     120 sec   1 sec      1 sec
   Second vacuum    180 sec   180 sec    30 sec

 I made some modifications in the code so that the first vacuum command
 skips the truncation.
 This way I collected the time taken by the second vacuum.

 I just combined your vm and prefetch patches into a single vm+prefetch
 patch without a GUC.
 I kept the prefetch size as 32 and did the performance test. I chose the
 prefetch size based on the current buffer access strategy, which is 32
 for vacuum presently, instead of a user option.
 Here I attached the modified patch with the combined vm+prefetch logic.

 I will do some tests on a machine with SSD and let you know the results.
 Based on these results, we can decide whether we need a GUC or not, based
 on the impact of prefetch on SSD machines.

 Following are the performance readings on a machine with SSD.
 I increased the pgbench scale factor to 1000 in the test instead of 500
 to show better performance numbers.

                    Head       vm patch   vm+prefetch patch

   First vacuum     6.24 sec   2.91 sec   2.91 sec
   Second vacuum    6.66 sec   6.66 sec   7.19 sec

 There is a small performance impact on SSD with prefetch.

The above prefetch overhead was observed while prefetching 1639345 pages.
I feel this overhead is small.

Hi Jeff,

If you are fine with the earlier attached patch, then I will mark it as
ready for committer, to get a committer's view on the patch.


Regards,
Hari Babu
Fujitsu Australia




Re: [HACKERS] optimizing vacuum truncation scans

2015-07-13 Thread Haribabu Kommi
On Mon, Jul 13, 2015 at 12:06 PM, Haribabu Kommi
kommi.harib...@gmail.com wrote:
 On Thu, Jul 9, 2015 at 5:36 PM, Haribabu Kommi kommi.harib...@gmail.com 
 wrote:

 I will do some performance tests and send you the results.

 Here are the performance results tested on my machine.


                    Head      vm patch   vm+prefetch patch

   First vacuum     120 sec   1 sec      1 sec
   Second vacuum    180 sec   180 sec    30 sec

 I made some modifications in the code so that the first vacuum command
 skips the truncation.
 This way I collected the time taken by the second vacuum.

 I just combined your vm and prefetch patches into a single vm+prefetch
 patch without a GUC.
 I kept the prefetch size as 32 and did the performance test. I chose the
 prefetch size based on the current buffer access strategy, which is 32
 for vacuum presently, instead of a user option.
 Here I attached the modified patch with the combined vm+prefetch logic.

 I will do some tests on a machine with SSD and let you know the results.
 Based on these results, we can decide whether we need a GUC or not, based
 on the impact of prefetch on SSD machines.

Following are the performance readings on a machine with SSD.
I increased the pgbench scale factor to 1000 in the test instead of 500
to show better performance numbers.

                   Head       vm patch   vm+prefetch patch

  First vacuum     6.24 sec   2.91 sec   2.91 sec
  Second vacuum    6.66 sec   6.66 sec   7.19 sec

There is a small performance impact on SSD with prefetch.

Regards,
Hari Babu
Fujitsu Australia




Re: [HACKERS] optimizing vacuum truncation scans

2015-07-12 Thread Haribabu Kommi
On Thu, Jul 9, 2015 at 5:36 PM, Haribabu Kommi kommi.harib...@gmail.com wrote:

 I will do some performance tests and send you the results.

Here are the performance results tested on my machine.


                   Head      vm patch   vm+prefetch patch

  First vacuum     120 sec   1 sec      1 sec
  Second vacuum    180 sec   180 sec    30 sec

I made some modifications in the code so that the first vacuum command
skips the truncation.
This way I collected the time taken by the second vacuum.

I just combined your vm and prefetch patches into a single vm+prefetch
patch without a GUC.
I kept the prefetch size as 32 and did the performance test. I chose the
prefetch size based on the current buffer access strategy, which is 32
for vacuum presently, instead of a user option.
Here I attached the modified patch with the combined vm+prefetch logic.

I will do some tests on a machine with SSD and let you know the results.
Based on these results, we can decide whether we need a GUC or not, based
on the impact of prefetch on SSD machines.

Regards,
Hari Babu
Fujitsu Australia


vac_trunc_trust_vm_and_prefetch.patch
Description: Binary data



Re: [HACKERS] optimizing vacuum truncation scans

2015-07-09 Thread Haribabu Kommi
On Thu, Jul 9, 2015 at 4:37 PM, Jeff Janes jeff.ja...@gmail.com wrote:
 On Wed, Jul 8, 2015 at 9:46 PM, Haribabu Kommi kommi.harib...@gmail.com
 wrote:

 On Mon, Jun 29, 2015 at 3:54 PM, Jeff Janes jeff.ja...@gmail.com wrote:
 
  There is still the case of pages which had their visibility bit set by a
  prior vacuum and then were not inspected by the current one.  Once the
  truncation scan runs into these pages, it falls back to the previous
  behavior of reading block by block backwards.  So there could still be
  reason to optimize that fallback using forward-reading prefetch.

 The case I didn't understand is: how does the current vacuum miss a page
 which was set by the prior vacuum?


 The prior vacuum set them to all visible, but then doesn't delete them.
 Either because it got interrupted or because there were still some pages
 after them (at the time) that were not all empty.

 The current vacuum skips them entirely on the forward scan because it thinks
 it is a waste of time to vacuum all-visible pages.  Since it skips them, it
 doesn't know if they are empty as well as being all-visible.  There is no
 permanent indicator of the pages being all-empty, there is only the
 inference based on the current vacuum's counters and protected by the lock
 held on the table.



 The page should be counted either in skipped_pages or in nonempty_pages.
 Is it possible that a page doesn't come under either of these two
 categories and is also not empty?

 If the above doesn't exist, then we can directly truncate the relation
 from the higher of nonempty_pages and skipped_pages to the end of the
 relation.


 Right, and that is what this does (provided the vm bit is still set, so it
 does still have to loop over the vm to verify that it is still set, while it
 holds the access exclusive lock).

Thanks, I got it. To re-verify the vm bit of a page after getting the
access exclusive lock, we have to do the backward scan.

I also feel that your prefetch buffer patch is needed to improve the
performance, because there is a need for a backward scan to re-verify the
vm bit.

I will do some performance tests and send you the results.

 The problem is that the pages between the two counters are not known to be
 empty, and also not known to be nonempty.  Someone has to be willing to go
 check on those pages at some point, or else they will never be eligible for
 truncation.

Yes, there is no way to identify whether a page is empty or not without
reading it.

Regards,
Hari Babu
Fujitsu Australia




Re: [HACKERS] optimizing vacuum truncation scans

2015-07-09 Thread Jeff Janes
On Wed, Jul 8, 2015 at 9:46 PM, Haribabu Kommi kommi.harib...@gmail.com
wrote:

 On Mon, Jun 29, 2015 at 3:54 PM, Jeff Janes jeff.ja...@gmail.com wrote:
 
  Attached is a patch that implements the vm scan for truncation.  It
  introduces a variable to hold the last blkno which was skipped during the
  forward portion.  Any blocks after both this blkno and after the last
  inspected nonempty page (which the code is already tracking) must have
 been
  observed to be empty by the current vacuum.  Any other process rendering
 the
  page nonempty is required to clear the vm bit, and no other process can
 set
  the bit again during the vacuum's lifetime.  So if the bit is still set,
 the
  page is still empty without needing to inspect it.

 The patch looks good and it improves the vacuum truncation speed
 significantly.

  There is still the case of pages which had their visibility bit set by a
  prior vacuum and then were not inspected by the current one.  Once the
  truncation scan runs into these pages, it falls back to the previous
  behavior of reading block by block backwards.  So there could still be
  reason to optimize that fallback using forward-reading prefetch.

 The case I didn't understand is: how does the current vacuum miss a page
 which was set by the prior vacuum?


The prior vacuum set them to all visible, but then doesn't delete them.
Either because it got interrupted or because there were still some pages
after them (at the time) that were not all empty.

The current vacuum skips them entirely on the forward scan because it
thinks it is a waste of time to vacuum all-visible pages.  Since it skips
them, it doesn't know if they are empty as well as being all-visible.
There is no permanent indicator of the pages being all-empty, there is only
the inference based on the current vacuum's counters and protected by the
lock held on the table.



 The page should be counted either in skipped_pages or in nonempty_pages.
 Is it possible that a page doesn't come under either of these two
 categories and is also not empty?

 If the above doesn't exist, then we can directly truncate the relation
 from the higher of nonempty_pages and skipped_pages to the end of the
 relation.


Right, and that is what this does (provided the vm bit is still set, so it
does still have to loop over the vm to verify that it is still set, while
it holds the access exclusive lock).
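
In outline, that re-check amounts to something like the following (a
simplified sketch, not the patch text; trust_vm stands for the larger
of the two counters described above):

#include "postgres.h"
#include "access/visibilitymap.h"
#include "storage/bufmgr.h"
#include "utils/rel.h"

/*
 * Sketch: with AccessExclusiveLock held, every block at or above the
 * returned value still had its vm bit set, so it must still be empty
 * and can be truncated away without reading any heap pages.
 */
static BlockNumber
first_still_trusted_block(Relation onerel, BlockNumber rel_pages,
                          BlockNumber trust_vm)
{
    BlockNumber blkno = rel_pages;
    Buffer      vmbuffer = InvalidBuffer;

    while (blkno > trust_vm &&
           visibilitymap_test(onerel, blkno - 1, &vmbuffer))
        blkno--;                /* still all-visible, hence still empty */

    if (BufferIsValid(vmbuffer))
        ReleaseBuffer(vmbuffer);

    return blkno;               /* fall back to reading pages below this */
}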

The problem is that the pages between the two counters are not known to be
empty, and also not known to be nonempty.  Someone has to be willing to go
check on those pages at some point, or else they will never be eligible for
truncation.

Cheers,

Jeff


Re: [HACKERS] optimizing vacuum truncation scans

2015-07-08 Thread Haribabu Kommi
On Mon, Jun 29, 2015 at 3:54 PM, Jeff Janes jeff.ja...@gmail.com wrote:

 Attached is a patch that implements the vm scan for truncation.  It
 introduces a variable to hold the last blkno which was skipped during the
 forward portion.  Any blocks after both this blkno and after the last
 inspected nonempty page (which the code is already tracking) must have been
 observed to be empty by the current vacuum.  Any other process rendering the
 page nonempty is required to clear the vm bit, and no other process can set
 the bit again during the vacuum's lifetime.  So if the bit is still set, the
 page is still empty without needing to inspect it.

The patch looks good and it improves the vacuum truncation speed significantly.

 There is still the case of pages which had their visibility bit set by a
 prior vacuum and then were not inspected by the current one.  Once the
 truncation scan runs into these pages, it falls back to the previous
 behavior of reading block by block backwards.  So there could still be
 reason to optimize that fallback using forward-reading prefetch.

The case I didn't understand is: how does the current vacuum miss a page
which was set by the prior vacuum?

The page should be counted either in skipped_pages or in nonempty_pages.
Is it possible that a page doesn't come under either of these two
categories and is also not empty?

If the above doesn't exist, then we can directly truncate the relation
from the higher of nonempty_pages and skipped_pages to the end of the
relation.

Regards,
Hari Babu
Fujitsu Australia




Re: [HACKERS] optimizing vacuum truncation scans

2015-05-27 Thread Tom Lane
Jeff Janes jeff.ja...@gmail.com writes:
 As it is currently, it is not clear what purpose nonempty_pages serves.

It's to allow an early exit at vacuumlazy.c:271 in case there is no chance
of deleting a useful number of pages.

regards, tom lane




Re: [HACKERS] optimizing vacuum truncation scans

2015-05-26 Thread Jeff Janes
On Tue, May 26, 2015 at 12:37 PM, Jeff Janes jeff.ja...@gmail.com wrote:

 On Mon, Apr 20, 2015 at 10:18 AM, Jim Nasby jim.na...@bluetreble.com
 wrote:

 On 4/20/15 1:50 AM, Jeff Janes wrote:


  For that matter, why do we scan backwards anyway? The comments don't
 explain it, and we have nonempty_pages as a starting point, so why
 don't we just scan forward? I suspect that eons ago we didn't have
 that and just blindly reverse-scanned until we finally hit a
 non-empty buffer...


 nonempty_pages is not concurrency safe, as the pages could become used
 after vacuum passed them over but before the access exclusive lock was
 grabbed before the truncation scan.  But maybe the combination of the
 two?  If it is above nonempty_pages, then anyone who wrote into the page
 after vacuum passed it must have cleared the VM bit. And currently I
 think no one but vacuum ever sets VM bit back on, so once cleared it
 would stay cleared.


 Right.

  In any event nonempty_pages could be used to set the guess as to how
 many pages (if any) might be worth prefetching, as that is not needed
 for correctness.


 Yeah, but I think we'd do a LOT better with the VM idea, because we could
 immediately truncate without scanning anything.


 Right now all the interlocks to make this work seem to be in place (only
 vacuum and startup can set visibility map bits, and only one vacuum can be
  in a table at a time).  But as far as I can tell, those assumptions are not
 baked in and we have pondered loosening them before.

 For example, letting HOT clean up mark a page as all-visible if it finds
  it to be such.  Now in that specific case it would be OK, as HOT cleanup would
 not cause a page to become empty (or could it?  If an insert on a table
 with no indexes was rolled back, and hot clean up found it and cleaned it
 up, it could conceptually become empty--unless we make special code to
 prevent it) , and so the page would have to be below nonempty_pages.  But
 there may be other cases.

 And I know other people have mentioned making VACUUM concurrent (although
 I don't see the value in that myself).

 So doing it this way would be hard to beat (scanning a bitmap vs the table
 itself), but it would also introduce a modularity violation that I am not
 sure is worth it.

 Of course this could always be reverted if its requirements became a
 problem for a more important change (assuming of course that we detected
 the problem)


The fatal problem here is that nonempty_pages is unreliable.  If vacuum
skips all-visible pages, it doesn't necessarily increment nonempty_pages
beyond that skippage.  So if you just rely on nonempty_pages, you will
truncate away pages that were already all visible but are not empty.  If we
changed it so that it did increment nonempty_pages past the skipped ones,
then pages which were all empty and got marked as all-visible without being
truncated (say, because a lock could not be acquired, or because there was
a non-empty page after them which later became empty) would never get
truncated away.

As it is currently, it is not clear what purpose nonempty_pages serves.  It
is a guardian value which doesn't seem to actually guard anything.  At best
it prevents you from needing to inspect one page (the page of the guardian
value itself) to see if that page is actually empty, and finding that it is
not.  That hardly seems worthwhile.

We could adopt two nonempty_pages counters, one that fails low on skipped
all-visible pages, and one that fails high on them.  And then fast-truncate
down to the high one, and do the current page-by-page scan between the low
and the high. That seems rather grotesque.

Cheers,

Jeff


Re: [HACKERS] optimizing vacuum truncation scans

2015-05-26 Thread Jeff Janes
On Mon, Apr 20, 2015 at 10:18 AM, Jim Nasby jim.na...@bluetreble.com
wrote:

 On 4/20/15 1:50 AM, Jeff Janes wrote:


  For that matter, why do we scan backwards anyway? The comments don't
 explain it, and we have nonempty_pages as a starting point, so why
 don't we just scan forward? I suspect that eons ago we didn't have
 that and just blindly reverse-scanned until we finally hit a
 non-empty buffer...


 nonempty_pages is not concurrency safe, as the pages could become used
 after vacuum passed them over but before the access exclusive lock was
 grabbed before the truncation scan.  But maybe the combination of the
 two?  If it is above nonempty_pages, then anyone who wrote into the page
 after vacuum passed it must have cleared the VM bit. And currently I
 think no one but vacuum ever sets VM bit back on, so once cleared it
 would stay cleared.


 Right.

  In any event nonempty_pages could be used to set the guess as to how
 many pages (if any) might be worth prefetching, as that is not needed
 for correctness.


 Yeah, but I think we'd do a LOT better with the VM idea, because we could
 immediately truncate without scanning anything.


Right now all the interlocks to make this work seem to be in place (only
vacuum and startup can set visibility map bits, and only one vacuum can be
in a table at a time).  But as far as I can tell, those assumptions are not
baked in and we have pondered loosening them before.

For example, letting HOT clean up mark a page as all-visible if it finds it
to be such.  Now in that specific case it would be OK, as HOT cleanup would
not cause a page to become empty (or could it?  If an insert on a table
with no indexes was rolled back, and hot clean up found it and cleaned it
up, it could conceptually become empty--unless we make special code to
prevent it) , and so the page would have to be below nonempty_pages.  But
there may be other cases.

And I know other people have mentioned making VACUUM concurrent (although I
don't see the value in that myself).

So doing it this way would be hard to beat (scanning a bitmap vs the table
itself), but it would also introduce a modularity violation that I am not
sure is worth it.

Of course this could always be reverted if its requirements became a
problem for a more important change (assuming of course that we detected
the problem)

Cheers,

Jeff


Re: [HACKERS] optimizing vacuum truncation scans

2015-04-20 Thread Jim Nasby

On 4/20/15 1:50 AM, Jeff Janes wrote:

Shouldn't completely empty pages be set as all-visible in the VM? If
so, can't we just find the largest not-all-visible page and move
forward from there, instead of moving backwards like we currently do?


If the entire table is all-visible, we would be starting from the
beginning, even though the beginning of the table still has read only
tuples present.


Except we'd use it in conjunction with nonempty_pages. IIRC, that's set 
to the last page that vacuum saw data on. If any page after that got 
written to after vacuum visited it, the VM bit would be cleared. So 
after we acquire the exclusive lock, AFAICT it's safe to just scan the 
VM starting with nonempty_pages.



For that matter, why do we scan backwards anyway? The comments don't
explain it, and we have nonempty_pages as a starting point, so why
don't we just scan forward? I suspect that eons ago we didn't have
that and just blindly reverse-scanned until we finally hit a
non-empty buffer...


nonempty_pages is not concurrency safe, as the pages could become used
after vacuum passed them over but before the access exclusive lock was
grabbed before the truncation scan.  But maybe the combination of the
two?  If it is above nonempty_pages, then anyone who wrote into the page
after vacuum passed it must have cleared the VM bit. And currently I
think no one but vacuum ever sets VM bit back on, so once cleared it
would stay cleared.


Right.


In any event nonempty_pages could be used to set the guess as to how
many pages (if any) might be worth prefetching, as that is not needed
for correctness.


Yeah, but I think we'd do a LOT better with the VM idea, because we 
could immediately truncate without scanning anything.

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] optimizing vacuum truncation scans

2015-04-20 Thread Qingqing Zhou
On Sun, Apr 19, 2015 at 7:09 PM, Jeff Janes jeff.ja...@gmail.com wrote:
 I did literally the simplest thing I could think of as a proof of concept 
 patch, to see if it would actually fix things.  I just jumped back a certain 
 number of blocks occasionally and prefetched them forward, then resumed the 
 regular backward scan.
IIRC, this patch introduces optional prefetch for the backward scan,
where file system prefetch may not work well. The result is promising!

 Also, what would be the best way to drill a hole through bufmgr.c into md.c 
 so that the prefetch could specify an entire range, rather than looping over 
 each individual block?

Unlike mdread(), which requires a buffer argument, extending
mdprefetch() looks natural and safe to me. This would also lay a
foundation for other scan-style optimizations.

 What would have to be done to detect people running on SSD and disable the 
 feature, if anything?

There are other call sites of prefetch, so this is a topic not just for
this optimization.

Regards,
Qingqing




Re: [HACKERS] optimizing vacuum truncation scans

2015-04-20 Thread Jeff Janes
On Sun, Apr 19, 2015 at 10:38 PM, Jim Nasby jim.na...@bluetreble.com
wrote:

 On 4/19/15 9:09 PM, Jeff Janes wrote:

 I did literally the simplest thing I could think of as a proof of
 concept patch, to see if it would actually fix things.  I just jumped
 back a certain number of blocks occasionally and prefetched them
 forward, then resumed the regular backward scan.  The patch and driving
 script are attached.


 Shouldn't completely empty pages be set as all-visible in the VM? If so,
 can't we just find the largest not-all-visible page and move forward from
 there, instead of moving backwards like we currently do?


If the entire table is all-visible, we would be starting from the
beginning, even though the beginning of the table still has read only
tuples present.


 For that matter, why do we scan backwards anyway? The comments don't
 explain it, and we have nonempty_pages as a starting point, so why don't we
 just scan forward? I suspect that eons ago we didn't have that and just
 blindly reverse-scanned until we finally hit a non-empty buffer...


nonempty_pages is not concurrency safe, as the pages could become used
after vacuum passed them over but before the access exclusive lock was
grabbed before the truncation scan.  But maybe the combination of the two?
If it is above nonempty_pages, then anyone who wrote into the page after
vacuum passed it must have cleared the VM bit. And currently I think no one
but vacuum ever sets VM bit back on, so once cleared it would stay cleared.

In any event nonempty_pages could be used to set the guess as to how many
pages (if any) might be worth prefetching, as that is not needed for
correctness.

Cheers,

Jeff


[HACKERS] optimizing vacuum truncation scans

2015-04-19 Thread Jeff Janes
After a large bulk load aborted near the end, I decided to vacuum the main
table so as to not leave a huge chunk of free space in the middle of it,
before re-running the bulk load.  This vacuum took a frustratingly long
time, as the backwards scan over the table to truncate the space did not
trigger read-ahead (or read-behind) in the file system (which was ext4).

In hindsight, the best thing to do would have been to open a psql
session and take an access-share lock on the main table, so that it would
prevent the truncation from occurring, leaving cleaned, reusable space at
the end of the table to be refilled upon the next (successful) run of the
bulk load.  But it would be nice if the system could do some optimization
for me, as hindsight never comes in time.

I did literally the simplest thing I could think of as a proof of concept
patch, to see if it would actually fix things.  I just jumped back a
certain number of blocks occasionally and prefetched them forward, then
resumed the regular backward scan.  The patch and driving script are
attached.

It was very successful in this test.  I've reported the fastest time to do
the truncation for each setting of the prefetch size out of about 10 runs
each, after removing a handful of cases where buffer pins prevented it from
doing a full-size truncation.  (I used the minimum, because the server has
other duties that would occasionally kick in and cause a spike in time for
a few runs.)  A setting of 0 disables this feature.

JJFETCH    trunc_time(s)
0          355.78
1          350.5
2          277.61
4          213.9
8          172.79
16         138.87
32         133.02
64         76.25
128        47.63
256        35.28
512        20.44
1024       18.6
2048       15.63
4096       15.34
8192       14.72
16384      13.45
32768      13.13

For context, the forward-scan part of this vacuum took about 80 sec (IQR 75
to 88).

Do we want something like this at all?  If so, what would we have to do to
implement it for real?  I thought that maybe the back-off amount could
start at one block and double each time, until it hits a maximum.  And that
the maximum would be either seg_size, or 1/8 of effective_cache_size,
whichever is lower.  (I don't think we would want to prefetch more than a
fraction of effective_cache_size, or else it could have already been
displaced by the time the scan gets back to it).
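
To make the doubling idea concrete, here is a sketch (the names and the
cap are placeholders of mine; the attached proof-of-concept patch just
jumps back a fixed distance):

#include "postgres.h"
#include "optimizer/cost.h"     /* effective_cache_size */
#include "storage/bufmgr.h"
#include "utils/rel.h"

/*
 * Sketch: before walking a chunk of blocks backward, prefetch that
 * chunk in forward order, doubling the chunk size each round up to
 * 1/8 of effective_cache_size (a real patch might also cap at the
 * segment size).
 */
static void
prefetch_chunk_backward(Relation onerel, BlockNumber *blkno,
                        BlockNumber *chunk)
{
    BlockNumber cap = Max(1, effective_cache_size / 8);
    BlockNumber start = (*blkno > *chunk) ? *blkno - *chunk : 0;
    BlockNumber p;

    /* issue forward prefetches over the range we are about to walk back */
    for (p = start; p < *blkno; p++)
        PrefetchBuffer(onerel, MAIN_FORKNUM, p);

    *blkno = start;             /* caller scans [start, old blkno) backward */
    if (*chunk < cap)
        *chunk *= 2;            /* back-off doubles each time */
}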

Also, what would be the best way to drill a hole through bufmgr.c into md.c
so that the prefetch could specify an entire range, rather than looping
over each individual block?

What would have to be done to detect people running on SSD and disable the
feature, if anything?

I'll add this to the next commitfest as a WIP patch.

Cheers,

Jeff


jjprefetch.patch
Description: Binary data


jjprefetch.sh
Description: Bourne shell script



Re: [HACKERS] optimizing vacuum truncation scans

2015-04-19 Thread Jim Nasby

On 4/19/15 9:09 PM, Jeff Janes wrote:

I did literally the simplest thing I could think of as a proof of
concept patch, to see if it would actually fix things.  I just jumped
back a certain number of blocks occasionally and prefetched them
forward, then resumed the regular backward scan.  The patch and driving
script are attached.


Shouldn't completely empty pages be set as all-visible in the VM? If so, 
can't we just find the largest not-all-visible page and move forward 
from there, instead of moving backwards like we currently do?


For that matter, why do we scan backwards anyway? The comments don't 
explain it, and we have nonempty_pages as a starting point, so why don't 
we just scan forward? I suspect that eons ago we didn't have that and 
just blindly reverse-scanned until we finally hit a non-empty buffer...

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com

