Re: Why doesn't GiST VACUUM require a super-exclusive lock, like nbtree VACUUM?

2021-12-08 Thread Peter Geoghegan
On Tue, Nov 30, 2021 at 5:09 PM Peter Geoghegan  wrote:
> Attached draft patch attempts to explain things in this area within
> the nbtree README. There is a much shorter comment about it within
> vacuumlazy.c. I am concerned about GiST index-only scans themselves,
> of course, but I discovered this issue when thinking carefully about
> the concurrency rules for VACUUM -- I think it's valuable to formalize
> and justify the general rules that index access methods must follow.

I pushed a commit that described how this works for nbtree, in the README file.

I think that there might be an even more subtle race condition in
nbtree itself, though, during recovery. We no longer do a "pin scan"
during recovery these days (see commits 9f83468b, 3e4b7d87, and
687f2cd7 for full information). I think that it might be necessary to
do that, just for the benefit of index-only scans -- if it's necessary
during original execution, then why not during recovery?

The work to remove "pin scans" was justified by pointing out that we
no longer use various kinds of snapshots during recovery, but it said
nothing about index-only scans, which need the TID recycling interlock
(i.e. need to hold onto a leaf page while accessing the heap in sync)
even with an MVCC snapshot. It's easy to imagine how it might have
been missed: nobody ever documented the general issue with index-only
scans, until now. Commit 2ed5b87f recognized that index-only scans were
unsafe for the optimization that it added (to avoid blocking VACUUM),
but never explained why they were unsafe.

Going back to doing pin scans during recovery seems deeply
unappealing, especially to fix a totally narrow race condition.

-- 
Peter Geoghegan




Re: Why doesn't GiST VACUUM require a super-exclusive lock, like nbtree VACUUM?

2021-11-30 Thread Peter Geoghegan
On Tue, Nov 30, 2021 at 5:09 PM Peter Geoghegan  wrote:
> I believe that there have been several historic reasons why we need a
> cleanup lock during nbtree VACUUM, and that there is only one
> remaining reason for it today. So the history is unusually complicated.

Minor correction: we actually also have to worry about plain index
scans that don't use an MVCC snapshot, which are possible within
nbtree. That's quite likely when using logical replication.
See the patch for more.

As with the index-only scan case, a plain nbtree index scan that uses a
non-MVCC snapshot cannot rely on heap access within the index scan node
to protect it -- it won't reliably notice that newer heap tuples (which
are really the result of concurrent TID recycling) are invisible to it,
precisely because there is no MVCC snapshot to filter them out. The
only difference in the index-only scan scenario is that we use the
visibility map (not the heap) -- which is racy in a way that makes our
MVCC snapshot (index-only scans always have an MVCC snapshot) an
ineffective protection.

In summary, to be safe against confusion from concurrent TID recycling
during index/index-only scans, we can do either of the following
things:

1. Hold a pin on our leaf page while accessing the heap -- that'll
definitely conflict with the cleanup lock that nbtree VACUUM will
inevitably try to acquire on our leaf page.

OR:

2. Hold an MVCC snapshot, AND do an actual heap page access during the
plain index scan -- do both together.

With approach 2, our plain index scan must determine visibility using
real XIDs from the heap tuple headers (tested against something like a
dirty snapshot), rather than trusting a visibility map bit. That is
also necessary because the VM bit might be stale or ambiguous in a way
that's clearly not possible when looking at full heap tuple headers
with XIDs -- concurrent recycling becomes safe once we know that we'll
reliably notice it and so won't give wrong answers.
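
To make approach 2 a bit more concrete, here is a rough sketch of the
pattern in C. It is not the actual executor or nbtree code:
plain_index_getnext_tid() and heap_fetch_visible() are invented names
standing in for the real index AM and table AM entry points, while
scan->heapRelation is real. The point is simply that every TID copied
out of the leaf page gets re-checked against the snapshot by visiting
the heap, so a concurrently recycled TID is filtered out rather than
returned.

static bool
getnext_with_mvcc_snapshot(IndexScanDesc scan, Snapshot snapshot,
                           TupleTableSlot *slot)
{
    ItemPointerData tid;

    /* TIDs were already copied out of the leaf page; the pin may be gone */
    while (plain_index_getnext_tid(scan, &tid))
    {
        /*
         * Always visit the heap.  If VACUUM recycled this TID after we
         * copied it, whatever tuple now lives at that TID has an xmin
         * newer than our MVCC snapshot, so the visibility test rejects
         * it and we just move on to the next TID.
         */
        if (heap_fetch_visible(scan->heapRelation, &tid, snapshot, slot))
            return true;
    }
    return false;
}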

Does that make sense?

-- 
Peter Geoghegan




Re: Why doesn't GiST VACUUM require a super-exclusive lock, like nbtree VACUUM?

2021-11-30 Thread Peter Geoghegan
On Fri, Nov 5, 2021 at 3:26 AM Andrey Borodin  wrote:
> > On Nov 4, 2021, at 20:58, Peter Geoghegan  wrote:
> > That's a pretty unlikely scenario. And even
> > if it happened it would only happen once (until the next time we get
> > unlucky). What are the chances of somebody noticing a more or less
> > once-off, slightly wrong answer?
>
> I'd say next to impossible, yet not impossible. Or, perhaps, I do not see 
> protection from this.

I think that that's probably all correct -- I would certainly make the
same guess. It's very unlikely to happen, and when it does happen it
happens only once.

> Moreover there's a "microvacuum". It kills tuples with BUFFER_LOCK_SHARE. 
> AFAIU it should take cleanup lock on buffer too?

No, because there is no heap vacuuming involved (that only happens
inside vacuumlazy.c). The work that VACUUM does inside
lazy_vacuum_heap_rel() is part of the problem here -- we need an
interlock between that work and index-only scans. Making LP_DEAD
items in heap pages LP_UNUSED is only ever possible during a VACUUM
operation (I'm sure you know why). AFAICT there would be no bug at all
without that detail.

I believe that there have been several historical reasons why we need a
cleanup lock during nbtree VACUUM, and that only one of them still
applies today -- so the history is unusually complicated. But AFAICT it
has always been some kind of "interlock with heapam VACUUM" issue:
TID recycling that our MVCC snapshot gives us no protection against. I
would say that that's the "real problem" here, when I go back to first
principles.

Attached draft patch attempts to explain things in this area within
the nbtree README. There is a much shorter comment about it within
vacuumlazy.c. I am concerned about GiST index-only scans themselves,
of course, but I discovered this issue when thinking carefully about
the concurrency rules for VACUUM -- I think it's valuable to formalize
and justify the general rules that index access methods must follow.

We can talk about this some more in NYC. See you soon!
--
Peter Geoghegan
From ea6612300e010f1f2b02119b5a0de95e46d1157d Mon Sep 17 00:00:00 2001
From: Peter Geoghegan 
Date: Wed, 3 Nov 2021 14:38:15 -0700
Subject: [PATCH v1] nbtree README: Improve VACUUM interlock section.

Also document a related issue for index-only scans in vacuumlazy.c.

Author: Peter Geoghegan 
Discussion: https://postgr.es/m/CAH2-Wz=PqOziyRSrnN5jAtfXWXY7-BJcHz9S355LH8Dt=5qxWQ@mail.gmail.com
---
 src/backend/access/heap/vacuumlazy.c |  10 ++
 src/backend/access/nbtree/README | 145 ---
 2 files changed, 75 insertions(+), 80 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 282b44f87..8bfe196bf 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2384,6 +2384,16 @@ lazy_vacuum_heap_rel(LVRelState *vacrel)
  * LP_DEAD items on the page that were determined to be LP_DEAD items back
  * when the same page was visited by lazy_scan_prune() (i.e. those whose TID
  * was recorded in the dead_items array at the time).
+ *
+ * We can opportunistically set the visibility map bit for the page here,
+ * which is valuable when lazy_scan_prune couldn't earlier on, owing only to
+ * the fact that there were LP_DEAD items that we'll now mark as unused.  This
+ * is why index AMs that support index-only scans have to hold a pin on an
+ * index page as an interlock against VACUUM while accessing the visibility
+ * map (which is really just a dense summary of visibility information in the
+ * heap).  If they didn't do this then there would be rare race conditions
+ * where a heap TID that is actually dead appears alive to an unlucky
+ * index-only scan.
  */
 static int
 lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
diff --git a/src/backend/access/nbtree/README b/src/backend/access/nbtree/README
index 2a7332d07..c6f04d856 100644
--- a/src/backend/access/nbtree/README
+++ b/src/backend/access/nbtree/README
@@ -89,25 +89,28 @@ Page read locks are held only for as long as a scan is examining a page.
 To minimize lock/unlock traffic, an index scan always searches a leaf page
 to identify all the matching items at once, copying their heap tuple IDs
 into backend-local storage.  The heap tuple IDs are then processed while
-not holding any page lock within the index.  We do continue to hold a pin
-on the leaf page in some circumstances, to protect against concurrent
-deletions (see below).  In this state the scan is effectively stopped
-"between" pages, either before or after the page it has pinned.  This is
-safe in the presence of concurrent insertions and even page splits, because
-items are never moved across pre-existing page boundaries --- so the scan
-cannot miss any items it should have seen, nor accidentally return the same
-item twice.  The scan must remember the page's right-link at the time it
-was scanned, since that is the page to 

Re: Why doesn't GiST VACUUM require a super-exclusive lock, like nbtree VACUUM?

2021-11-05 Thread Andrey Borodin



> On Nov 4, 2021, at 20:58, Peter Geoghegan  wrote:
> That's a pretty unlikely scenario. And even
> if it happened it would only happen once (until the next time we get
> unlucky). What are the chances of somebody noticing a more or less
> once-off, slightly wrong answer?

I'd say next to impossible, yet not impossible. Or, perhaps, I do not see 
protection from this.

Moreover there's a "microvacuum". It kills tuples with BUFFER_LOCK_SHARE. AFAIU 
it should take cleanup lock on buffer too?

Best regards, Andrey Borodin.



Re: Why doesn't GiST VACUUM require a super-exclusive lock, like nbtree VACUUM?

2021-11-04 Thread Peter Geoghegan
On Thu, Nov 4, 2021 at 8:52 AM Andrey Borodin  wrote:
> Let's enumerate steps how things can go wrong.
>
> Backend1: Index-Only scan returns tid and xs_hitup with index_tuple1 on 
> index_page1 pointing to heap_tuple1 on page1
>
> Backend2: Remove index_tuple1 and heap_tuple1
>
> Backend3: Mark page1 all-visible
> Backend1: Thinks that page1 is all-visible and shows index_tuple1 as visible
>
> To avoid this Backend1 must hold pin on index_page1 until it's done with 
> checking visibility, and Backend2 must do LockBufferForCleanup(index_page1). 
> Do I get things right?

Almost. Backend3 is actually Backend2 here (there is no 3) -- it runs
VACUUM throughout.

Note that it's not particularly likely that Backend2/VACUUM will "win"
this race, because it typically has to do much more work than
Backend1. It has to actually remove the index tuples from the leaf
page, then any other index work (for this and other indexes). Then it
has to arrive back in vacuumlazy.c to set the VM bit in
lazy_vacuum_heap_page(). That's a pretty unlikely scenario. And even
if it happened it would only happen once (until the next time we get
unlucky). What are the chances of somebody noticing a more or less
once-off, slightly wrong answer?

-- 
Peter Geoghegan




Re: Why doesn't GiST VACUUM require a super-exclusive lock, like nbtree VACUUM?

2021-11-04 Thread Andrey Borodin
> 04.11.2021, 04:33, "Peter Geoghegan" :
> But what about index-only scans, which GiST also supports? I think
> that the rules are different there, even though index-only scans use
> an MVCC snapshot.

Let's enumerate steps how things can go wrong.

Backend1: Index-Only scan returns tid and xs_hitup with index_tuple1 on
index_page1 pointing to heap_tuple1 on page1

Backend2: Remove index_tuple1 and heap_tuple1

Backend3: Mark page1 all-visible
Backend1: Thinks that page1 is all-visible and shows index_tuple1 as visible

To avoid this Backend1 must hold pin on index_page1 until it's done with
checking visibility, and Backend2 must do LockBufferForCleanup(index_page1).
Do I get things right?

Best regards, Andrey Borodin.




Why doesn't GiST VACUUM require a super-exclusive lock, like nbtree VACUUM?

2021-11-03 Thread Peter Geoghegan
The code in gistvacuum.c is closely based on similar code in nbtree.c,
except that it only acquires an exclusive lock -- not a
super-exclusive lock. I suspect that that's because it seemed
unnecessary; nbtree plain index scans have their own special reasons
for this, that don't apply to GiST. Namely: nbtree plain index scans
that don't use an MVCC snapshot clearly need some other interlock to
protect against concurrent recycling of pointed-to-by-leaf-page TIDs.
And so as a general rule nbtree VACUUM needs a full
super-exclusive/cleanup lock, just in case there is a plain index scan
that uses some other kind of snapshot (logical replication, say).

To say the same thing another way: nbtree follows "the third rule"
described by "62.4. Index Locking Considerations" in the docs [1], but
GiST does not. The idea that GiST's behavior is okay here does seem
consistent with what the same docs go on to say about it: "When using
an MVCC-compliant snapshot, there is no problem because the new
occupant of the slot is certain to be too new to pass the snapshot
test".

But what about index-only scans, which GiST also supports? I think
that the rules are different there, even though index-only scans use
an MVCC snapshot.

The (admittedly undocumented) reason why we can never drop the leaf
page pin for an index-only scan in nbtree (but can do so for plain
index scans) also relates to heap interlocking -- but with a twist.
Here's the twist: the second heap pass by VACUUM can set visibility
map bits independently of the first (once LP_DEAD items from the first
pass over the heap are set to LP_UNUSED, which renders the page
all-visible) -- this all happens at the end of
lazy_vacuum_heap_page(). That's why index-only scans can't just assume
that VACUUM won't have deleted the TID from the leaf page they're
scanning immediately after they're done reading it. VACUUM could even
manage to set the visibility map bit for a relevant heap page inside
lazy_vacuum_heap_page(), before the index-only scan can read the
visibility map. If that is allowed to happen, the index-only scan would
give wrong answers whenever one of the TID references it holds in local
memory happens to be marked LP_UNUSED inside
lazy_vacuum_heap_page(). IOW, it looks like we run the risk of a
concurrently recycled dead-to-everybody TID becoming visible during
GiST index-only scans, just because we have no interlock.
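
To make the nbtree side of this concrete, here is a rough sketch of the
ordering that keeps index-only scans safe there. It is not the real
executor code: index_only_getnext_tid() is an invented name, though
VM_ALL_VISIBLE() and the hold-the-leaf-pin behavior are real.

ItemPointer tid;
Buffer      vmbuffer = InvalidBuffer;

/*
 * Each TID is consumed while the index AM still holds a pin on the leaf
 * page it came from, so VACUUM cannot have gotten its cleanup lock yet,
 * and so cannot have reached lazy_vacuum_heap_page() for these TIDs.
 */
while ((tid = index_only_getnext_tid(scan)) != NULL)
{
    if (VM_ALL_VISIBLE(heapRel, ItemPointerGetBlockNumber(tid), &vmbuffer))
    {
        /* all-visible: use the index tuple without visiting the heap */
    }
    else
    {
        /* fetch the heap tuple and test it against our MVCC snapshot,
         * which will reject a concurrently recycled TID */
    }
}
/* Only when the scan moves off this leaf page is the pin dropped,
 * unblocking any VACUUM waiting for a cleanup lock. */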

In summary:

IIUC this is only safe for nbtree because 1.) It acquires a
super-exclusive lock when vacuuming leaf pages, and 2.) Index-only
scans never drop their pin on the leaf page when accessing the
visibility map "in-sync" with the scan (of course we hope not to
access the heap proper at all for index-only scans). These precautions
are both necessary to make the race condition I describe impossible,
because they ensure that VACUUM cannot reach lazy_vacuum_heap_page()
until after our index-only scan reads the visibility map (and then has
to read the heap for at least that one dead-to-all TID, discovering
that the TID is dead to its snapshot). Why wouldn't GiST need to take
the same precautions, though?

[1] https://www.postgresql.org/docs/devel/index-locking.html
--
Peter Geoghegan




Re: Questions/Observations related to Gist vacuum

2020-01-12 Thread Dilip Kumar
On Thu, Jan 9, 2020 at 4:41 PM Mahendra Singh Thalor  wrote:
>
> On Mon, 9 Dec 2019 at 14:37, Amit Kapila  wrote:
> >
> > On Mon, Dec 9, 2019 at 2:27 PM Amit Kapila  wrote:
> > >
> > > I have modified the patch for the above points and additionally ran
> > > pgindent.  Let me know what you think about the attached patch?
> > >
> >
> > A new version with a slightly modified commit message.
>
> I reviewed v4 patch and below is the one review comment:
>
> + * These are used to memorize all internal and empty leaf pages. They are
> + * used for deleting all the empty pages.
>   */
> After a dot, there should be 2 spaces. Earlier, there were 2 spaces.
>
> Other than that patch looks fine to me.
>
Thanks for the comment. Amit has already taken care of this before pushing it.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Questions/Observations related to Gist vacuum

2020-01-09 Thread Mahendra Singh Thalor
On Mon, 9 Dec 2019 at 14:37, Amit Kapila  wrote:
>
> On Mon, Dec 9, 2019 at 2:27 PM Amit Kapila  wrote:
> >
> > I have modified the patch for the above points and additionally ran
> > pgindent.  Let me know what you think about the attached patch?
> >
>
> A new version with a slightly modified commit message.

I reviewed v4 patch and below is the one review comment:

+ * These are used to memorize all internal and empty leaf pages. They are
+ * used for deleting all the empty pages.
  */
After a dot, there should be 2 spaces. Earlier, there were 2 spaces.

Other than that patch looks fine to me.

-- 
Thanks and Regards
Mahendra Singh Thalor
EnterpriseDB: http://www.enterprisedb.com


Re: Questions/Observations related to Gist vacuum

2019-12-09 Thread Dilip Kumar
On Mon, Dec 9, 2019 at 2:37 PM Amit Kapila  wrote:
>
> On Mon, Dec 9, 2019 at 2:27 PM Amit Kapila  wrote:
> >
> > I have modified the patch for the above points and additionally ran
> > pgindent.  Let me know what you think about the attached patch?
> >
>
> A new version with a slightly modified commit message.

Your changes look fine to me.  Thanks!

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Questions/Observations related to Gist vacuum

2019-12-09 Thread Amit Kapila
On Mon, Dec 9, 2019 at 2:27 PM Amit Kapila  wrote:
>
> I have modified the patch for the above points and additionally ran
> pgindent.  Let me know what you think about the attached patch?
>

A new version with a slightly modified commit message.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


v4-0001-Delete-empty-pages-in-each-pass-during-GIST-VACUUM.patch
Description: Binary data


Re: Questions/Observations related to Gist vacuum

2019-12-09 Thread Amit Kapila
On Fri, Oct 25, 2019 at 9:22 PM Masahiko Sawada  wrote:
>
> On Wed, Oct 23, 2019 at 8:14 PM Amit Kapila  wrote:
> >
> > On Tue, Oct 22, 2019 at 2:17 PM Dilip Kumar  wrote:
> > >
> > > On Tue, Oct 22, 2019 at 10:53 AM Amit Kapila  
> > > wrote:
> > >
> > > I have modified as we discussed.  Please take a look.
> > >
> >
> > Thanks, I haven't reviewed this yet, but it seems to be on the right
> > lines.  Sawada-San, can you please prepare the next version of the
> > parallel vacuum patch on top of this patch and enable parallel vacuum
> > for Gist indexes?
>
> Yeah I've sent the latest patch set that is built on top of this
> patch[1]. BTW I looked at this patch briefly but it looks good to me.
>

Today, I have looked at this patch and found a few things that need to
be changed:

1.
 static void gistvacuum_delete_empty_pages(IndexVacuumInfo *info,
-   GistBulkDeleteResult *stats);
-static bool gistdeletepage(IndexVacuumInfo *info, GistBulkDeleteResult *stats,
+   GistVacState *stats);

I think stats is not a good name for a GistVacState parameter.  How about vstate?

2.
+ /* we don't need the internal and empty page sets anymore */
+ MemoryContextDelete(vstate.page_set_context);

After deleting the memory context, we can reset this and the other
related variables, as the code did before the patch.
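
Something like this, perhaps (a sketch based on the v3 patch's naming,
with vstate being the GistVacState variable from the patch):

/* we don't need the internal and empty page sets anymore */
MemoryContextDelete(vstate.page_set_context);
vstate.page_set_context = NULL;
vstate.internal_page_set = NULL;
vstate.empty_leaf_set = NULL;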

3.  There are a couple of places in code (like comments, README) that
mentions the deletion of empty pages in the second stage of the
vacuum.  We should change all such places.

I have modified the patch for the above points and additionally ran
pgindent.  Let me know what you think about the attached patch?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


v3-0001-Delete-empty-pages-in-each-pass-during-GIST-VACUUM.patch
Description: Binary data


Re: Questions/Observations related to Gist vacuum

2019-10-25 Thread Masahiko Sawada
On Wed, Oct 23, 2019 at 8:14 PM Amit Kapila  wrote:
>
> On Tue, Oct 22, 2019 at 2:17 PM Dilip Kumar  wrote:
> >
> > On Tue, Oct 22, 2019 at 10:53 AM Amit Kapila  
> > wrote:
> > >
> > > > Basically, only IndexBulkDeleteResult is now shared across the stage
> > > > so we can move all members to GistVacState and completely get rid of
> > > > GistBulkDeleteResult?
> > > >
> > >
> > > Yes, something like that would be better.  Let's try and see how it comes 
> > > out.
> > I have modified as we discussed.  Please take a look.
> >
>
> Thanks, I haven't reviewed this yet, but it seems to be on the right
> lines.  Sawada-San, can you please prepare the next version of the
> parallel vacuum patch on top of this patch and enable parallel vacuum
> for Gist indexes?

Yeah I've sent the latest patch set that is built on top of this
patch[1]. BTW I looked at this patch briefly but it looks good to me.

[1] 
https://www.postgresql.org/message-id/CAD21AoBMo9dr_QmhT%3DdKh7fmiq7tpx%2ByLHR8nw9i5NZ-SgtaVg%40mail.gmail.com

Regards,

--
Masahiko Sawada




Re: Questions/Observations related to Gist vacuum

2019-10-22 Thread Dilip Kumar
On Tue, Oct 22, 2019 at 10:53 AM Amit Kapila  wrote:
>
> On Tue, Oct 22, 2019 at 10:50 AM Dilip Kumar  wrote:
> >
> > On Tue, Oct 22, 2019 at 9:10 AM Amit Kapila  wrote:
> > >
> > > On Fri, Oct 18, 2019 at 4:51 PM Dilip Kumar  wrote:
> > > >
> > > > I have prepared a first version of the patch.  Currently, I am
> > > > performing an empty page deletion for all the cases.
> > > >
> > >
> > > Few comments:
> > > --
> > > 1.
> > > -/*
> > > - * State kept across vacuum stages.
> > > - */
> > >  typedef struct
> > >  {
> > > - IndexBulkDeleteResult stats; /* must be first */
> > > + IndexBulkDeleteResult *stats; /* kept across vacuum stages. */
> > >
> > >   /*
> > > - * These are used to memorize all internal and empty leaf pages in the 
> > > 1st
> > > - * vacuum stage.  They are used in the 2nd stage, to delete all the empty
> > > - * pages.
> > > + * These are used to memorize all internal and empty leaf pages. They are
> > > + * used for deleting all the empty pages.
> > >   */
> > >   IntegerSet *internal_page_set;
> > >   IntegerSet *empty_leaf_set;
> > >
> > > Now, if we don't want to share the remaining stats across
> > > gistbulkdelete and gistvacuumcleanup, isn't it better to keep the
> > > information of internal and empty leaf pages as part of GistVacState?
> >
> > Basically, only IndexBulkDeleteResult is now shared across the stage
> > so we can move all members to GistVacState and completely get rid of
> > GistBulkDeleteResult?
> >
>
> Yes, something like that would be better.  Let's try and see how it comes out.
I have modified as we discussed.  Please take a look.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


v2-0001-delete-empty-page-in-gistbulkdelete.patch
Description: Binary data


Re: Questions/Observations related to Gist vacuum

2019-10-21 Thread Amit Kapila
On Tue, Oct 22, 2019 at 10:50 AM Dilip Kumar  wrote:
>
> On Tue, Oct 22, 2019 at 9:10 AM Amit Kapila  wrote:
> >
> > On Fri, Oct 18, 2019 at 4:51 PM Dilip Kumar  wrote:
> > >
> > > I have prepared a first version of the patch.  Currently, I am
> > > performing an empty page deletion for all the cases.
> > >
> >
> > Few comments:
> > --
> > 1.
> > -/*
> > - * State kept across vacuum stages.
> > - */
> >  typedef struct
> >  {
> > - IndexBulkDeleteResult stats; /* must be first */
> > + IndexBulkDeleteResult *stats; /* kept across vacuum stages. */
> >
> >   /*
> > - * These are used to memorize all internal and empty leaf pages in the 1st
> > - * vacuum stage.  They are used in the 2nd stage, to delete all the empty
> > - * pages.
> > + * These are used to memorize all internal and empty leaf pages. They are
> > + * used for deleting all the empty pages.
> >   */
> >   IntegerSet *internal_page_set;
> >   IntegerSet *empty_leaf_set;
> >
> > Now, if we don't want to share the remaining stats across
> > gistbulkdelete and gistvacuumcleanup, isn't it better to keep the
> > information of internal and empty leaf pages as part of GistVacState?
>
> Basically, only IndexBulkDeleteResult is now shared across the stage
> so we can move all members to GistVacState and completely get rid of
> GistBulkDeleteResult?
>

Yes, something like that would be better.  Let's try and see how it comes out.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: Questions/Observations related to Gist vacuum

2019-10-21 Thread Dilip Kumar
On Tue, Oct 22, 2019 at 9:10 AM Amit Kapila  wrote:
>
> On Fri, Oct 18, 2019 at 4:51 PM Dilip Kumar  wrote:
> >
> > I have prepared a first version of the patch.  Currently, I am
> > performing an empty page deletion for all the cases.
> >
>
> Few comments:
> --
> 1.
> -/*
> - * State kept across vacuum stages.
> - */
>  typedef struct
>  {
> - IndexBulkDeleteResult stats; /* must be first */
> + IndexBulkDeleteResult *stats; /* kept across vacuum stages. */
>
>   /*
> - * These are used to memorize all internal and empty leaf pages in the 1st
> - * vacuum stage.  They are used in the 2nd stage, to delete all the empty
> - * pages.
> + * These are used to memorize all internal and empty leaf pages. They are
> + * used for deleting all the empty pages.
>   */
>   IntegerSet *internal_page_set;
>   IntegerSet *empty_leaf_set;
>
> Now, if we don't want to share the remaining stats across
> gistbulkdelete and gistvacuumcleanup, isn't it better to keep the
> information of internal and empty leaf pages as part of GistVacState?

Basically, only IndexBulkDeleteResult is now shared across the stages,
so we can move all the other members to GistVacState and completely get
rid of GistBulkDeleteResult?

IndexBulkDeleteResult *stats
IntegerSet *internal_page_set;
IntegerSet *empty_leaf_set;
MemoryContext page_set_context;
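
Roughly like this, perhaps -- a sketch of the consolidated struct, not a
final definition, with the other existing per-scan fields elided:

typedef struct GistVacState
{
    /* ... existing per-scan fields (info, callback, and so on) ... */

    IndexBulkDeleteResult *stats;   /* the only piece shared across stages */

    /* empty-page bookkeeping, now local to a single pass */
    IntegerSet *internal_page_set;
    IntegerSet *empty_leaf_set;
    MemoryContext page_set_context;
} GistVacState;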


> Also, I think it is better to call gistvacuum_delete_empty_pages from
> function gistvacuumscan as that will avoid it calling from multiple
> places.
Yeah we can do that.
>
> 2.
> - gist_stats->page_set_context = NULL;
> - gist_stats->internal_page_set = NULL;
> - gist_stats->empty_leaf_set = NULL;
>
> Why have you removed this initialization?
This was a post-cleanup reset: since we were returning gist_stats, it
was better to clean it up, but now we are not returning it, so I
thought we don't need to reset these fields.  But I think now we can
free gist_stats itself.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Questions/Observations related to Gist vacuum

2019-10-21 Thread Amit Kapila
On Fri, Oct 18, 2019 at 4:51 PM Dilip Kumar  wrote:
>
> I have prepared a first version of the patch.  Currently, I am
> performing an empty page deletion for all the cases.
>

Few comments:
--
1.
-/*
- * State kept across vacuum stages.
- */
 typedef struct
 {
- IndexBulkDeleteResult stats; /* must be first */
+ IndexBulkDeleteResult *stats; /* kept across vacuum stages. */

  /*
- * These are used to memorize all internal and empty leaf pages in the 1st
- * vacuum stage.  They are used in the 2nd stage, to delete all the empty
- * pages.
+ * These are used to memorize all internal and empty leaf pages. They are
+ * used for deleting all the empty pages.
  */
  IntegerSet *internal_page_set;
  IntegerSet *empty_leaf_set;

Now, if we don't want to share the remaining stats across
gistbulkdelete and gistvacuumcleanup, isn't it better to keep the
information of internal and empty leaf pages as part of GistVacState?
Also, I think it is better to call gistvacuum_delete_empty_pages from
gistvacuumscan, as that will avoid calling it from multiple places.

2.
- gist_stats->page_set_context = NULL;
- gist_stats->internal_page_set = NULL;
- gist_stats->empty_leaf_set = NULL;

Why have you removed this initialization?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: Questions/Observations related to Gist vacuum

2019-10-21 Thread Dilip Kumar
On Mon, Oct 21, 2019 at 2:58 PM Andrey Borodin  wrote:
>
>
>
> > On Oct 21, 2019, at 11:12, Dilip Kumar  wrote:
> >
> > On Mon, Oct 21, 2019 at 2:30 PM Andrey Borodin  wrote:
> >>
> >> I've took a look into the patch, and cannot understand one simple thing...
> >> We should not call gistvacuum_delete_empty_pages() for same gist_stats 
> >> twice.
> >> Another way once the function is called we should somehow update or zero 
> >> empty_leaf_set.
> >> Does this invariant hold in your patch?
> >>
> > Thanks for looking into the patch.   With this patch now
> > GistBulkDeleteResult is local to single gistbulkdelete call or
> > gistvacuumcleanup.  So now we are not sharing GistBulkDeleteResult,
> > across the calls so I am not sure how it will be called twice for the
> > same gist_stats?  I might be missing something here?
>
> Yes, you are right, sorry for the noise.
> Currently we are doing both gistvacuumscan() and 
> gistvacuum_delete_empty_pages() in both gistbulkdelete() and 
> gistvacuumcleanup(). Is it supposed to be so?

There was an issue discussed in the parallel vacuum thread [1], and to
solve it, it was discussed in this thread [2] that we can delete empty
pages in the bulkdelete phase itself.  But that does not mean we can
remove that work from the gistvacuumcleanup phase: if gistbulkdelete is
not called at all in a given vacuum pass, then gistvacuumcleanup will
perform both gistvacuumscan and gistvacuum_delete_empty_pages.  In
short, whichever pass detects an empty page is also the pass that
deletes it.

> Functions gistbulkdelete() and gistvacuumcleanup() look very similar
> and share some comments. This is what triggered my attention.

[1] - 
https://www.postgresql.org/message-id/CAA4eK1JEQ2y3uNucNopDjK8pse6xSe5%3D_oknoWfRQvAF%3DVqsBA%40mail.gmail.com
[2] - 
https://www.postgresql.org/message-id/69EF7B88-F3E7-4E09-824D-694CF39E5683%40iki.fi

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Questions/Observations related to Gist vacuum

2019-10-21 Thread Andrey Borodin



> On Oct 21, 2019, at 11:12, Dilip Kumar  wrote:
> 
> On Mon, Oct 21, 2019 at 2:30 PM Andrey Borodin  wrote:
>> 
>> I've took a look into the patch, and cannot understand one simple thing...
>> We should not call gistvacuum_delete_empty_pages() for same gist_stats twice.
>> Another way once the function is called we should somehow update or zero 
>> empty_leaf_set.
>> Does this invariant hold in your patch?
>> 
> Thanks for looking into the patch.   With this patch now
> GistBulkDeleteResult is local to single gistbulkdelete call or
> gistvacuumcleanup.  So now we are not sharing GistBulkDeleteResult,
> across the calls so I am not sure how it will be called twice for the
> same gist_stats?  I might be missing something here?

Yes, you are right, sorry for the noise.
Currently we are doing both gistvacuumscan() and 
gistvacuum_delete_empty_pages() in both gistbulkdelete() and 
gistvacuumcleanup(). Is it supposed to be so? Functions gistbulkdelete() and 
gistvacuumcleanup() look very similar and share some comments. This is what 
triggered my attention.

Thanks!

--
Andrey Borodin
Open source RDBMS development team leader
Yandex.Cloud





Re: Questions/Observations related to Gist vacuum

2019-10-21 Thread Dilip Kumar
On Mon, Oct 21, 2019 at 2:30 PM Andrey Borodin  wrote:
>
> Hi!
>
> > On Oct 18, 2019, at 13:21, Dilip Kumar  wrote:
> >
> > On Fri, Oct 18, 2019 at 10:55 AM Amit Kapila  
> > wrote:
> >>
> >>
> >> I think we can do it in general as adding some check for parallel
> >> vacuum there will look bit hackish.
> > I agree with that point.
> > It is not clear if we get enough
> >> benefit by keeping it for cleanup phase of the index as discussed in
> >> emails above.  Heikki, others, let us know if you don't agree here.
> >
> > I have prepared a first version of the patch.  Currently, I am
> > performing an empty page deletion for all the cases.
>
> I've took a look into the patch, and cannot understand one simple thing...
> We should not call gistvacuum_delete_empty_pages() for same gist_stats twice.
> Another way once the function is called we should somehow update or zero 
> empty_leaf_set.
> Does this invariant hold in your patch?
>
Thanks for looking into the patch.  With this patch, GistBulkDeleteResult
is now local to a single gistbulkdelete or gistvacuumcleanup call.  Since
we are no longer sharing GistBulkDeleteResult across the calls, I am not
sure how it could be called twice for the same gist_stats -- am I
missing something here?

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Questions/Observations related to Gist vacuum

2019-10-21 Thread Andrey Borodin
Hi!

> On Oct 18, 2019, at 13:21, Dilip Kumar  wrote:
> 
> On Fri, Oct 18, 2019 at 10:55 AM Amit Kapila  wrote:
>> 
>> 
>> I think we can do it in general as adding some check for parallel
>> vacuum there will look bit hackish.
> I agree with that point.
> It is not clear if we get enough
>> benefit by keeping it for cleanup phase of the index as discussed in
>> emails above.  Heikki, others, let us know if you don't agree here.
> 
> I have prepared a first version of the patch.  Currently, I am
> performing an empty page deletion for all the cases.

I've taken a look at the patch, and cannot understand one simple thing...
We should not call gistvacuum_delete_empty_pages() for the same gist_stats
twice.  Put another way, once the function has been called we should
somehow update or zero empty_leaf_set.
Does this invariant hold in your patch?

Best regards, Andrey Borodin.



Re: Questions/Observations related to Gist vacuum

2019-10-21 Thread Dilip Kumar
On Mon, Oct 21, 2019 at 11:23 AM Amit Kapila  wrote:
>
> On Fri, Oct 18, 2019 at 10:48 AM Amit Kapila  wrote:
> >
> > Thanks for the test.  It shows that prior to patch the memory was
> > getting leaked in TopTransactionContext during multi-pass vacuum and
> > after patch, there is no leak.  I will commit the patch early next
> > week unless Heikki or someone wants some more tests.
> >
>
> Pushed.
Thanks.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Questions/Observations related to Gist vacuum

2019-10-20 Thread Amit Kapila
On Fri, Oct 18, 2019 at 10:48 AM Amit Kapila  wrote:
>
> Thanks for the test.  It shows that prior to patch the memory was
> getting leaked in TopTransactionContext during multi-pass vacuum and
> after patch, there is no leak.  I will commit the patch early next
> week unless Heikki or someone wants some more tests.
>

Pushed.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: Questions/Observations related to Gist vacuum

2019-10-18 Thread Dilip Kumar
On Fri, Oct 18, 2019 at 10:55 AM Amit Kapila  wrote:
>
> On Fri, Oct 18, 2019 at 9:41 AM Dilip Kumar  wrote:
> >
> > On Wed, Oct 16, 2019 at 7:22 PM Heikki Linnakangas  wrote:
> > >
> > > On 16 October 2019 12:57:03 CEST, Amit Kapila  
> > > wrote:
> > > >On Tue, Oct 15, 2019 at 7:13 PM Heikki Linnakangas 
> > > >wrote:
> > > >> All things
> > > >> considered, I'm not sure which is better.
> > > >
> > > >Yeah, this is a tough call to make, but if we can allow it to delete
> > > >the pages in bulkdelete conditionally for parallel vacuum workers,
> > > >then it would be better.
> > >
> > > Yeah, if it's needed for parallel vacuum, maybe that tips the scale.
> >
> > Are we planning to do this only if it is called from parallel vacuum
> > workers or in general?
> >
>
> I think we can do it in general as adding some check for parallel
> vacuum there will look bit hackish.
I agree with that point.
> It is not clear if we get enough
> benefit by keeping it for cleanup phase of the index as discussed in
> emails above.  Heikki, others, let us know if you don't agree here.

I have prepared a first version of the patch.  Currently, I am
performing an empty page deletion for all the cases.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


delete_emptypages_in_gistbulkdelete_v1.patch
Description: Binary data


Re: Questions/Observations related to Gist vacuum

2019-10-17 Thread Amit Kapila
On Fri, Oct 18, 2019 at 9:41 AM Dilip Kumar  wrote:
>
> On Wed, Oct 16, 2019 at 7:22 PM Heikki Linnakangas  wrote:
> >
> > On 16 October 2019 12:57:03 CEST, Amit Kapila  
> > wrote:
> > >On Tue, Oct 15, 2019 at 7:13 PM Heikki Linnakangas 
> > >wrote:
> > >> All things
> > >> considered, I'm not sure which is better.
> > >
> > >Yeah, this is a tough call to make, but if we can allow it to delete
> > >the pages in bulkdelete conditionally for parallel vacuum workers,
> > >then it would be better.
> >
> > Yeah, if it's needed for parallel vacuum, maybe that tips the scale.
>
> Are we planning to do this only if it is called from parallel vacuum
> workers or in general?
>

I think we can do it in general, as adding some check for parallel
vacuum there will look a bit hackish.  It is not clear whether we get
enough benefit by keeping it for the cleanup phase of the index, as
discussed in the emails above.  Heikki, others, let us know if you
don't agree here.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: Questions/Observations related to Gist vacuum

2019-10-17 Thread Amit Kapila
On Fri, Oct 18, 2019 at 9:34 AM Dilip Kumar  wrote:
>
> On Thu, Oct 17, 2019 at 6:32 PM Dilip Kumar  wrote:
> >
> > On Thu, 17 Oct 2019, 14:59 Amit Kapila,  wrote:
> >>
> >> On Thu, Oct 17, 2019 at 1:47 PM Dilip Kumar  wrote:
> >> >
> >> > On Thu, Oct 17, 2019 at 12:27 PM Heikki Linnakangas  
> >> > wrote:
> >> > >
> >> > > Thanks! Looks good to me. Did either of you test it, though, with a
> >> > > multi-pass vacuum?
> >> >
> >> > From my side, I have tested it with the multi-pass vacuum using the
> >> > gist index and after the fix, it's using expected memory context.
> >> >
> >>
> >> I have also verified that, but I think what additionally we can test
> >> here is that without the patch it will leak the memory in
> >> TopTransactionContext (CurrentMemoryContext), but after patch it
> >> shouldn't leak it during multi-pass vacuum.
> >>
> >> Make sense to me, I will test that by tomorrow.
>
> I have performed the test to observe the memory usage in the
> TopTransactionContext using the MemoryContextStats function from gdb.
>
> For testing this, in gistvacuumscan every time, after it resets the
> page_set_context, I have collected the sample.  I have collected 3
> samples for both the head and the patch.  We can clearly see that on
> the head the memory is getting accumulated in the
> TopTransactionContext whereas with the patch there is no memory
> getting accumulated.
>

Thanks for the test.  It shows that prior to patch the memory was
getting leaked in TopTransactionContext during multi-pass vacuum and
after patch, there is no leak.  I will commit the patch early next
week unless Heikki or someone wants some more tests.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: Questions/Observations related to Gist vacuum

2019-10-17 Thread Dilip Kumar
On Wed, Oct 16, 2019 at 7:22 PM Heikki Linnakangas  wrote:
>
> On 16 October 2019 12:57:03 CEST, Amit Kapila  wrote:
> >On Tue, Oct 15, 2019 at 7:13 PM Heikki Linnakangas 
> >wrote:
> >> All things
> >> considered, I'm not sure which is better.
> >
> >Yeah, this is a tough call to make, but if we can allow it to delete
> >the pages in bulkdelete conditionally for parallel vacuum workers,
> >then it would be better.
>
> Yeah, if it's needed for parallel vacuum, maybe that tips the scale.

Are we planning to do this only if it is called from parallel vacuum
workers or in general?

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Questions/Observations related to Gist vacuum

2019-10-17 Thread Dilip Kumar
On Thu, Oct 17, 2019 at 6:32 PM Dilip Kumar  wrote:
>
> On Thu, 17 Oct 2019, 14:59 Amit Kapila,  wrote:
>>
>> On Thu, Oct 17, 2019 at 1:47 PM Dilip Kumar  wrote:
>> >
>> > On Thu, Oct 17, 2019 at 12:27 PM Heikki Linnakangas  
>> > wrote:
>> > >
>> > > On 17/10/2019 05:31, Amit Kapila wrote:
>> > > >
>> > > > The patch looks good to me.  I have slightly modified the comments and
>> > > > removed unnecessary initialization.
>> > > >
>> > > > Heikki, are you fine me committing and backpatching this to 12?  Let
>> > > > me know if you have a different idea to fix.
>> > >
>> > > Thanks! Looks good to me. Did either of you test it, though, with a
>> > > multi-pass vacuum?
>> >
>> > From my side, I have tested it with the multi-pass vacuum using the
>> > gist index and after the fix, it's using expected memory context.
>> >
>>
>> I have also verified that, but I think what additionally we can test
>> here is that without the patch it will leak the memory in
>> TopTransactionContext (CurrentMemoryContext), but after patch it
>> shouldn't leak it during multi-pass vacuum.
>>
>> Make sense to me, I will test that by tomorrow.

I have performed the test to observe the memory usage in
TopTransactionContext, using the MemoryContextStats function from gdb.

For this test, I collected a sample in gistvacuumscan each time, right
after it resets the page_set_context.  I collected 3 samples each for
head and for the patch.  We can clearly see that on head the memory
keeps accumulating in TopTransactionContext, whereas with the patch no
memory accumulates.

head:
TopTransactionContext: 1056832 total in 2 blocks; 3296 free (5
chunks); 1053536 used
  GiST VACUUM page set context: 112 total in 0 blocks (0 chunks); 0
free (0 chunks); 112 used
Grand total: 1056944 bytes in 2 blocks; 3296 free (5 chunks); 1053648 used

TopTransactionContext: 1089600 total in 4 blocks; 19552 free (14
chunks); 1070048 used
  GiST VACUUM page set context: 112 total in 0 blocks (0 chunks); 0
free (0 chunks); 112 used
Grand total: 1089712 bytes in 4 blocks; 19552 free (14 chunks); 1070160 used

TopTransactionContext: 1122368 total in 5 blocks; 35848 free (20
chunks); 1086520 used
  GiST VACUUM page set context: 112 total in 0 blocks (0 chunks); 0
free (0 chunks); 112 used
Grand total: 1122480 bytes in 5 blocks; 35848 free (20 chunks); 1086632 used


With Patch:
TopTransactionContext: 1056832 total in 2 blocks; 3296 free (1
chunks); 1053536 used
  GiST VACUUM page set context: 112 total in 0 blocks (0 chunks); 0
free (0 chunks); 112 used
Grand total: 1056944 bytes in 2 blocks; 3296 free (1 chunks); 1053648 used

TopTransactionContext: 1056832 total in 2 blocks; 3296 free (1
chunks); 1053536 used
  GiST VACUUM page set context: 112 total in 0 blocks (0 chunks); 0
free (0 chunks); 112 used
Grand total: 1056944 bytes in 2 blocks; 3296 free (1 chunks); 1053648 used

TopTransactionContext: 1056832 total in 2 blocks; 3296 free (1
chunks); 1053536 used
  GiST VACUUM page set context: 112 total in 0 blocks (0 chunks); 0
free (0 chunks); 112 used
Grand total: 1056944 bytes in 2 blocks; 3296 free (1 chunks); 1053648 used

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Questions/Observations related to Gist vacuum

2019-10-17 Thread Dilip Kumar
On Thu, 17 Oct 2019, 14:59 Amit Kapila,  wrote:

> On Thu, Oct 17, 2019 at 1:47 PM Dilip Kumar  wrote:
> >
> > On Thu, Oct 17, 2019 at 12:27 PM Heikki Linnakangas 
> wrote:
> > >
> > > On 17/10/2019 05:31, Amit Kapila wrote:
> > > >
> > > > The patch looks good to me.  I have slightly modified the comments
> and
> > > > removed unnecessary initialization.
> > > >
> > > > Heikki, are you fine me committing and backpatching this to 12?  Let
> > > > me know if you have a different idea to fix.
> > >
> > > Thanks! Looks good to me. Did either of you test it, though, with a
> > > multi-pass vacuum?
> >
> > From my side, I have tested it with the multi-pass vacuum using the
> > gist index and after the fix, it's using expected memory context.
> >
>
> I have also verified that, but I think what additionally we can test
> here is that without the patch it will leak the memory in
> TopTransactionContext (CurrentMemoryContext), but after patch it
> shouldn't leak it during multi-pass vacuum.
>
Makes sense to me, I will test that by tomorrow.


Re: Questions/Observations related to Gist vacuum

2019-10-17 Thread Amit Kapila
On Thu, Oct 17, 2019 at 1:47 PM Dilip Kumar  wrote:
>
> On Thu, Oct 17, 2019 at 12:27 PM Heikki Linnakangas  wrote:
> >
> > On 17/10/2019 05:31, Amit Kapila wrote:
> > >
> > > The patch looks good to me.  I have slightly modified the comments and
> > > removed unnecessary initialization.
> > >
> > > Heikki, are you fine me committing and backpatching this to 12?  Let
> > > me know if you have a different idea to fix.
> >
> > Thanks! Looks good to me. Did either of you test it, though, with a
> > multi-pass vacuum?
>
> From my side, I have tested it with the multi-pass vacuum using the
> gist index and after the fix, it's using expected memory context.
>

I have also verified that, but I think what additionally we can test
here is that without the patch it will leak the memory in
TopTransactionContext (CurrentMemoryContext), but after patch it
shouldn't leak it during multi-pass vacuum.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: Questions/Observations related to Gist vacuum

2019-10-17 Thread Dilip Kumar
On Thu, Oct 17, 2019 at 12:27 PM Heikki Linnakangas  wrote:
>
> On 17/10/2019 05:31, Amit Kapila wrote:
> > On Wed, Oct 16, 2019 at 11:20 AM Dilip Kumar  wrote:
> >>
> >> On Tue, Oct 15, 2019 at 7:13 PM Heikki Linnakangas  wrote:
> >>>
> >>> On 15/10/2019 09:37, Amit Kapila wrote:
> >>> While reviewing a parallel vacuum patch [1], we noticed a few things
> >>> about $SUBJECT implemented in commit -
> >>> 7df159a620b760e289f1795b13542ed1b3e13b87.
> 
>  1. A new memory context GistBulkDeleteResult->page_set_context has
>  been introduced, but it doesn't seem to be used.
> >>>
> >>> Oops. internal_page_set and empty_leaf_set were supposed to be allocated
> >>> in that memory context. As things stand, we leak them until end of
> >>> vacuum, in a multi-pass vacuum.
> >>
> >> Here is a patch to fix this issue.
> >
> > The patch looks good to me.  I have slightly modified the comments and
> > removed unnecessary initialization.
> >
> > Heikki, are you fine me committing and backpatching this to 12?  Let
> > me know if you have a different idea to fix.
>
> Thanks! Looks good to me. Did either of you test it, though, with a
> multi-pass vacuum?

From my side, I have tested it with the multi-pass vacuum using the
gist index and after the fix, it's using expected memory context.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Questions/Observations related to Gist vacuum

2019-10-17 Thread Heikki Linnakangas

On 17/10/2019 05:31, Amit Kapila wrote:

> On Wed, Oct 16, 2019 at 11:20 AM Dilip Kumar  wrote:
>>
>> On Tue, Oct 15, 2019 at 7:13 PM Heikki Linnakangas  wrote:
>>>
>>> On 15/10/2019 09:37, Amit Kapila wrote:
>>>> While reviewing a parallel vacuum patch [1], we noticed a few things
>>>> about $SUBJECT implemented in commit -
>>>> 7df159a620b760e289f1795b13542ed1b3e13b87.
>>>>
>>>> 1. A new memory context GistBulkDeleteResult->page_set_context has
>>>> been introduced, but it doesn't seem to be used.
>>>
>>> Oops. internal_page_set and empty_leaf_set were supposed to be allocated
>>> in that memory context. As things stand, we leak them until end of
>>> vacuum, in a multi-pass vacuum.
>>
>> Here is a patch to fix this issue.
>
> The patch looks good to me.  I have slightly modified the comments and
> removed unnecessary initialization.
>
> Heikki, are you fine me committing and backpatching this to 12?  Let
> me know if you have a different idea to fix.


Thanks! Looks good to me. Did either of you test it, though, with a 
multi-pass vacuum?


- Heikki




Re: Questions/Observations related to Gist vacuum

2019-10-16 Thread Dilip Kumar
On Thu, Oct 17, 2019 at 9:15 AM Amit Kapila  wrote:
>
> On Wed, Oct 16, 2019 at 7:21 PM Heikki Linnakangas  wrote:
> >
> > On 16 October 2019 12:57:03 CEST, Amit Kapila  
> > wrote:
> > >On Tue, Oct 15, 2019 at 7:13 PM Heikki Linnakangas 
> > >wrote:
> > >> All things
> > >> considered, I'm not sure which is better.
> > >
> > >Yeah, this is a tough call to make, but if we can allow it to delete
> > >the pages in bulkdelete conditionally for parallel vacuum workers,
> > >then it would be better.
> >
> > Yeah, if it's needed for parallel vacuum, maybe that tips the scale.
> >
>
> makes sense.  I think we can write a patch for it and prepare the
> parallel vacuum patch on top of it.  Once the parallel vacuum is in a
> committable shape, we can commit the gist-index related patch first
> followed by parallel vacuum patch.

+1
I can write a patch for the same.

> > Hopefully, multi-pass vacuums are rare in practice. And we should lift the 
> > current 1 GB limit on the dead TID array, replacing it with something more 
> > compact and expandable, to make multi-pass vacuums even more rare. So I 
> > don't think we need to jump through many hoops to optimize the multi-pass 
> > case.
> >
>
> Yeah, that will be a good improvement.
+1

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Questions/Observations related to Gist vacuum

2019-10-16 Thread Amit Kapila
On Wed, Oct 16, 2019 at 7:21 PM Heikki Linnakangas  wrote:
>
> On 16 October 2019 12:57:03 CEST, Amit Kapila  wrote:
> >On Tue, Oct 15, 2019 at 7:13 PM Heikki Linnakangas 
> >wrote:
> >> All things
> >> considered, I'm not sure which is better.
> >
> >Yeah, this is a tough call to make, but if we can allow it to delete
> >the pages in bulkdelete conditionally for parallel vacuum workers,
> >then it would be better.
>
> Yeah, if it's needed for parallel vacuum, maybe that tips the scale.
>

makes sense.  I think we can write a patch for it and prepare the
parallel vacuum patch on top of it.  Once the parallel vacuum is in a
committable shape, we can commit the gist-index related patch first
followed by parallel vacuum patch.

> Hopefully, multi-pass vacuums are rare in practice. And we should lift the 
> current 1 GB limit on the dead TID array, replacing it with something more 
> compact and expandable, to make multi-pass vacuums even more rare. So I don't 
> think we need to jump through many hoops to optimize the multi-pass case.
>

Yeah, that will be a good improvement.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: Questions/Observations related to Gist vacuum

2019-10-16 Thread Amit Kapila
On Wed, Oct 16, 2019 at 11:20 AM Dilip Kumar  wrote:
>
> On Tue, Oct 15, 2019 at 7:13 PM Heikki Linnakangas  wrote:
> >
> > On 15/10/2019 09:37, Amit Kapila wrote:
> > > While reviewing a parallel vacuum patch [1], we noticed a few things
> > > about $SUBJECT implemented in commit -
> > > 7df159a620b760e289f1795b13542ed1b3e13b87.
> > >
> > > 1. A new memory context GistBulkDeleteResult->page_set_context has
> > > been introduced, but it doesn't seem to be used.
> >
> > Oops. internal_page_set and empty_leaf_set were supposed to be allocated
> > in that memory context. As things stand, we leak them until end of
> > vacuum, in a multi-pass vacuum.
>
> Here is a patch to fix this issue.
>

The patch looks good to me.  I have slightly modified the comments and
removed unnecessary initialization.

Heikki, are you fine with me committing and backpatching this to 12?  Let
me know if you have a different idea for the fix.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


0001-Fix-memory-leak-introduced-in-commit-7df159a620.patch
Description: Binary data


Re: Questions/Observations related to Gist vacuum

2019-10-16 Thread Heikki Linnakangas
On 16 October 2019 12:57:03 CEST, Amit Kapila  wrote:
>On Tue, Oct 15, 2019 at 7:13 PM Heikki Linnakangas 
>wrote:
>> All things
>> considered, I'm not sure which is better.
>
>Yeah, this is a tough call to make, but if we can allow it to delete
>the pages in bulkdelete conditionally for parallel vacuum workers,
>then it would be better.

Yeah, if it's needed for parallel vacuum, maybe that tips the scale.

Hopefully, multi-pass vacuums are rare in practice. And we should lift the 
current 1 GB limit on the dead TID array, replacing it with something more 
compact and expandable, to make multi-pass vacuums even more rare. So I don't 
think we need to jump through many hoops to optimize the multi-pass case.

- Heikki




Re: Questions/Observations related to Gist vacuum

2019-10-16 Thread Amit Kapila
On Tue, Oct 15, 2019 at 7:13 PM Heikki Linnakangas  wrote:
>
> On 15/10/2019 09:37, Amit Kapila wrote:
> > 2. Right now, in gistbulkdelete we make a note of empty leaf pages and
> > internals pages and then in the second pass during gistvacuumcleanup,
> > we delete all the empty leaf pages.  I was thinking why unlike nbtree,
> > we have delayed the deletion of empty pages till gistvacuumcleanup.  I
> > don't see any problem if we do this during gistbulkdelete itself
> > similar to nbtree, also I think there is some advantage in marking the
> > pages as deleted as early as possible.  Basically, if the vacuum
> > operation is canceled or errored out between gistbulkdelete and
> > gistvacuumcleanup, then I think the deleted pages could be marked as
> > recyclable very early in next vacuum operation.  The other advantage
> > of doing this during gistbulkdelete is we can avoid sharing
> > information between gistbulkdelete and gistvacuumcleanup which is
> > quite helpful for a parallel vacuum as the information is not trivial
> > (it is internally stored as in-memory Btree).   OTOH, there might be
> > some advantage for delaying the deletion of pages especially in the
> > case of multiple scans during a single VACUUM command.  We can
> > probably delete all empty leaf pages in one go which could in some
> > cases lead to fewer internal page reads.  However, I am not sure if it
> > is really advantageous to postpone the deletion as there seem to be
> > some downsides to it as well. I don't see it documented why unlike
> > nbtree we consider delaying deletion of empty pages till
> > gistvacuumcleanup, but I might be missing something.
>
> Hmm. The thinking is/was that removing the empty pages is somewhat
> expensive, because it has to scan all the internal nodes to find the
> downlinks to the to-be-deleted pages. Furthermore, it needs to scan all
> the internal pages (or at least until it has found all the downlinks),
> regardless of how many empty pages are being deleted. So it makes sense
> to do it only once, for all the empty pages. You're right though, that
> there would be advantages, too, in doing it after each pass.
>

I was thinking more about this, and it seems there could be more cases
where delaying the delete-marking of pages further delays their
recycling.  It is quite possible that immediately after bulk delete the
value of nextFullXid (used as deleteXid) is X, whereas during vacuum
cleanup it is X + N, and the chances of N being large are greater when
there are multiple passes of the vacuum scan.  Now, if we had set
deleteXid to X, there would be a better chance for the next vacuum to
recycle the page.  I am not sure, but it might be that in the future we
could come up with something (say, if we can recompute RecentGlobalXmin
again) that lets us recycle pages found by the first index scan within
a later scan of the same vacuum operation.

This is just to emphasize that doing the delete-marking of pages in the
same pass has advantages; otherwise, I understand that there are
advantages in delaying it as well.

> All things
> considered, I'm not sure which is better.
>

Yeah, this is a tough call to make, but if we can allow it to delete
the pages in bulkdelete conditionally for parallel vacuum workers,
then it would be better.

I think we have below option w.r.t Gist indexes for parallel vacuum
a. don't allow Gist Index to participate in parallel vacuum
b. allow delete of leaf pages in bulkdelete for parallel worker
c. always allow deleting leaf pages in bulkdelete
d. Invent some mechanism to share all the Gist stats via shared memory

(a) is not a very good option, but it is a safe one, as we can extend
it in the future; we might decide to go with it, especially if we can't
decide among the other options. (b) would serve the need but would add
some additional checks in gistbulkdelete and will look more like a
hack.  (c) would be best, but I think it will be difficult to be sure
that it is a good decision for all types of cases. (d) could require a
lot of effort, and I am not sure if it is worth adding that complexity
to the proposed patch.

Do you have any thoughts on this?

Just to give you an idea of the current parallel vacuum patch: the
master backend scans the heap and forms the dead tuple array in DSM;
then we launch one worker per index, based on the availability of
workers, and share the dead tuple array with each worker.  Each worker
performs bulkdelete for its index.  In the end, we perform cleanup of
all the indexes, either via a worker or the master backend, based on
some conditions.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: Questions/Observations related to Gist vacuum

2019-10-15 Thread Dilip Kumar
On Tue, Oct 15, 2019 at 7:13 PM Heikki Linnakangas  wrote:
>
> On 15/10/2019 09:37, Amit Kapila wrote:
> > While reviewing a parallel vacuum patch [1], we noticed a few things
> > about $SUBJECT implemented in commit -
> > 7df159a620b760e289f1795b13542ed1b3e13b87.
> >
> > 1. A new memory context GistBulkDeleteResult->page_set_context has
> > been introduced, but it doesn't seem to be used.
>
> Oops. internal_page_set and empty_leaf_set were supposed to be allocated
> in that memory context. As things stand, we leak them until end of
> vacuum, in a multi-pass vacuum.

Here is a patch to fix this issue.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


user_correct_memorycontext_gist_stat.patch
Description: Binary data


Re: Questions/Observations related to Gist vacuum

2019-10-15 Thread Heikki Linnakangas

On 15/10/2019 09:37, Amit Kapila wrote:

While reviewing a parallel vacuum patch [1], we noticed a few things
about $SUBJECT implemented in commit -
7df159a620b760e289f1795b13542ed1b3e13b87.

1. A new memory context GistBulkDeleteResult->page_set_context has
been introduced, but it doesn't seem to be used.


Oops. internal_page_set and empty_leaf_set were supposed to be allocated 
in that memory context. As things stand, we leak them until end of 
vacuum, in a multi-pass vacuum.



2. Right now, in gistbulkdelete we make a note of empty leaf pages and
internal pages and then in the second pass during gistvacuumcleanup,
we delete all the empty leaf pages.  I was thinking why unlike nbtree,
we have delayed the deletion of empty pages till gistvacuumcleanup.  I
don't see any problem if we do this during gistbulkdelete itself
similar to nbtree, also I think there is some advantage in marking the
pages as deleted as early as possible.  Basically, if the vacuum
operation is canceled or errored out between gistbulkdelete and
gistvacuumcleanup, then I think the deleted pages could be marked as
recyclable very early in next vacuum operation.  The other advantage
of doing this during gistbulkdelete is we can avoid sharing
information between gistbulkdelete and gistvacuumcleanup which is
quite helpful for a parallel vacuum as the information is not trivial
(it is internally stored as in-memory Btree).   OTOH, there might be
some advantage for delaying the deletion of pages especially in the
case of multiple scans during a single VACUUM command.  We can
probably delete all empty leaf pages in one go which could in some
cases lead to fewer internal page reads.  However, I am not sure if it
is really advantageous to postpone the deletion as there seem to be
some downsides to it as well. I don't see it documented why unlike
nbtree we consider delaying deletion of empty pages till
gistvacuumcleanup, but I might be missing something.


Hmm. The thinking is/was that removing the empty pages is somewhat 
expensive, because it has to scan all the internal nodes to find the 
downlinks to the to-be-deleted pages. Furthermore, it needs to scan all 
the internal pages (or at least until it has found all the downlinks), 
regardless of how many empty pages are being deleted. So it makes sense 
to do it only once, for all the empty pages. You're right though, that 
there would be advantages, too, in doing it after each pass. All things 
considered, I'm not sure which is better.


- Heikki




Questions/Observations related to Gist vacuum

2019-10-15 Thread Amit Kapila
While reviewing a parallel vacuum patch [1], we noticed a few things
about $SUBJECT implemented in commit -
7df159a620b760e289f1795b13542ed1b3e13b87.

1. A new memory context GistBulkDeleteResult->page_set_context has
been introduced, but it doesn't seem to be used.
2. Right now, in gistbulkdelete we make a note of empty leaf pages and
internal pages and then in the second pass during gistvacuumcleanup,
we delete all the empty leaf pages.  I was thinking why unlike nbtree,
we have delayed the deletion of empty pages till gistvacuumcleanup.  I
don't see any problem if we do this during gistbulkdelete itself
similar to nbtree, also I think there is some advantage in marking the
pages as deleted as early as possible.  Basically, if the vacuum
operation is canceled or errored out between gistbulkdelete and
gistvacuumcleanup, then I think the deleted pages could be marked as
recyclable very early in next vacuum operation.  The other advantage
of doing this during gistbulkdelete is we can avoid sharing
information between gistbulkdelete and gistvacuumcleanup which is
quite helpful for a parallel vacuum as the information is not trivial
(it is internally stored as in-memory Btree).   OTOH, there might be
some advantage for delaying the deletion of pages especially in the
case of multiple scans during a single VACUUM command.  We can
probably delete all empty leaf pages in one go which could in some
cases lead to fewer internal page reads.  However, I am not sure if it
is really advantageous to postpone the deletion as there seem to be
some downsides to it as well. I don't see it documented why unlike
nbtree we consider delaying deletion of empty pages till
gistvacuumcleanup, but I might be missing something.

Thoughts?

[1] - 
https://www.postgresql.org/message-id/CAA4eK1JEQ2y3uNucNopDjK8pse6xSe5%3D_oknoWfRQvAF%3DVqsBA%40mail.gmail.com

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: GiST VACUUM

2019-07-24 Thread Peter Geoghegan
On Wed, Jul 24, 2019 at 11:33 AM Heikki Linnakangas  wrote:
> That's probably how it's going to go, but hey, doesn't hurt to ask :-).

I think that it would be fine to be conservative with nbtree, and only
target the master branch. The problem is annoying, certainly, but it's
not likely to make a huge difference for most real world workloads.
OTOH, perhaps the risk is so low that we might as well target
backbranches.

How do you feel about it?

-- 
Peter Geoghegan




Re: GiST VACUUM

2019-07-24 Thread Heikki Linnakangas

On 24/07/2019 21:02, Peter Geoghegan wrote:

On Wed, Jul 24, 2019 at 10:30 AM Heikki Linnakangas  wrote:

Pushed this now, to master and REL_12_STABLE.

Now, B-tree indexes still have the same problem, in all versions. Any
volunteers to write a similar fix for B-trees?


I was hoping that you'd work on it.   :-)


That's probably how it's going to go, but hey, doesn't hurt to ask :-).


Any reason to think that it's much different to what you've done with GiST?


No, it should be very similar.

- Heikki




Re: GiST VACUUM

2019-07-24 Thread Peter Geoghegan
On Wed, Jul 24, 2019 at 10:30 AM Heikki Linnakangas  wrote:
> Pushed this now, to master and REL_12_STABLE.
>
> Now, B-tree indexes still have the same problem, in all versions. Any
> volunteers to write a similar fix for B-trees?

I was hoping that you'd work on it.   :-)

Any reason to think that it's much different to what you've done with GiST?

-- 
Peter Geoghegan




Re: GiST VACUUM

2019-07-24 Thread Heikki Linnakangas

On 22/07/2019 16:09, Heikki Linnakangas wrote:

Unless something comes up, I'll commit this tomorrow.


Pushed this now, to master and REL_12_STABLE.

Now, B-tree indexes still have the same problem, in all versions. Any 
volunteers to write a similar fix for B-trees?


- Heikki




Re: GiST VACUUM

2019-07-22 Thread Heikki Linnakangas

On 28/06/2019 01:02, Thomas Munro wrote:

On Thu, Jun 27, 2019 at 11:38 PM Heikki Linnakangas  wrote:

* I moved the logic to extend a 32-bit XID to 64-bits to a new helper
function in varsup.c.


I'm a bit uneasy about making this into a generally available function
for reuse.  I think we should instead come up with a very small number
of sources of fxids that are known to be free of races because of some
well-explained interlocking.

For example, I believe it is safe to convert an xid obtained from a
WAL record during recovery, because (for now) recovery is a single
thread of execution and the next fxid can't advance underneath you.
So I think XLogRecGetFullXid(XLogReaderState *record)[1] as I'm about
to propose in another thread (though I've forgotten who wrote it,
maybe Andres, Amit or me, but if it wasn't me then it's exactly what I
would have written) is a safe blessed source of fxids.  The rationale
for writing this function instead of an earlier code that looked much
like your proposed helper function, during EDB-internal review of
zheap stuff, was this: if we provide a general purpose xid->fxid
facility, it's virtually guaranteed that someone will eventually use
it to shoot footwards, by acquiring an xid from somewhere, and then
some arbitrary time later extending it to a fxid when no interlocking
guarantees that the later conversion has the correct epoch.


Fair point.


I'd like to figure out how to maintain full versions of the
procarray.c-managed xid horizons, without widening the shared memory
representation.  I was planning to think harder about that for 13.
Obviously that doesn't help you now.  So I'm wondering if you should
just open-code this for now, and put in a comment about why it's safe
and a note that there'll hopefully be a fxid horizon available in a
later release?


I came up with the attached. Instead of having a general purpose "widen" 
function, it adds GetFullRecentGlobalXmin(), to return a 64-bit version 
of RecentGlobalXmin. That's enough for this Gist vacuum patch.

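For context, the 32-bit-to-64-bit widening trick under discussion works
roughly like this.  This is only a sketch, assuming the caller holds an
interlock guaranteeing the xid is not newer than the current next full XID;
the function name is made up and this is not the committed API.

static FullTransactionId
widen_xid_sketch(TransactionId xid, FullTransactionId nextFullXid)
{
    uint32      epoch = EpochFromFullTransactionId(nextFullXid);

    /* a numerically larger xid must belong to the previous epoch */
    if (xid > XidFromFullTransactionId(nextFullXid))
        epoch--;

    return FullTransactionIdFromEpochAndXid(epoch, xid);
}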

In addition to that change, I modified the GistPageSetDeleted(), 
GistPageSetDeleteXid(), GistPageGetDeleteXid() inline functions a bit. I 
merged GistPageSetDeleted() and GistPageSetDeleteXid() into a single 
function, to make sure that the delete-XID is always set when a page is 
marked as deleted. And I modified GistPageGetDeleteXid() to return 
NormalTransactionId (or a FullTransactionId version of it, to be 
precise), for Gist pages that were deleted with older PostgreSQL v12 
beta versions. The previous patch tripped an assertion in that case, 
which is not nice for people binary-upgrading from earlier beta versions.

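The backwards-compatibility behaviour described above might look roughly
like this.  This is a sketch reconstructed from the description in this
mail, not copied from the patch; the exact pd_lower test in the committed
code may differ.

static inline FullTransactionId
GistPageGetDeleteXid(Page page)
{
    Assert(GistPageIsDeleted(page));

    /* Is a GISTDeletedPageContents struct present on the page? */
    if (((PageHeader) page)->pd_lower ==
        MAXALIGN(SizeOfPageHeaderData) + sizeof(GISTDeletedPageContents))
        return ((GISTDeletedPageContents *) PageGetContents(page))->deleteXid;

    /*
     * Page deleted by an older v12 beta version: no 64-bit XID was stored,
     * so return a normal, very old FullTransactionId, as described above.
     */
    return FullTransactionIdFromEpochAndXid(0, FirstNormalTransactionId);
}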

I did some testing of this with XID wraparound, and it seems to work. I 
used the attached bash script for the testing, with the a helper contrib 
module to consume XIDs faster. It's not very well commented and probably 
not too useful for anyone, but I'm attaching it here mainly as a note to 
future-self, if we need to revisit this.


Unless something comes up, I'll commit this tomorrow.

- Heikki
>From bdeb2467211a1ab9e733e070d54dce97c05cf18c Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas 
Date: Mon, 22 Jul 2019 15:57:01 +0300
Subject: [PATCH 1/2] Refactor checks for deleted GiST pages.

The explicit check in gistScanPage() isn't currently really necessary, as
a deleted page is always empty, so the loop would fall through without
doing anything, anyway. But it's a marginal optimization, and it gives a
nice place to attach a comment to explain how it works.
---
 src/backend/access/gist/gist.c| 40 ---
 src/backend/access/gist/gistget.c | 14 +++
 2 files changed, 29 insertions(+), 25 deletions(-)

diff --git a/src/backend/access/gist/gist.c b/src/backend/access/gist/gist.c
index 169bf6fcfed..e9ca4b82527 100644
--- a/src/backend/access/gist/gist.c
+++ b/src/backend/access/gist/gist.c
@@ -709,14 +709,15 @@ gistdoinsert(Relation r, IndexTuple itup, Size freespace,
 			continue;
 		}
 
-		if (stack->blkno != GIST_ROOT_BLKNO &&
-			stack->parent->lsn < GistPageGetNSN(stack->page))
+		if ((stack->blkno != GIST_ROOT_BLKNO &&
+			 stack->parent->lsn < GistPageGetNSN(stack->page)) ||
+			GistPageIsDeleted(stack->page))
 		{
 			/*
-			 * Concurrent split detected. There's no guarantee that the
-			 * downlink for this page is consistent with the tuple we're
-			 * inserting anymore, so go back to parent and rechoose the best
-			 * child.
+			 * Concurrent split or page deletion detected. There's no
+			 * guarantee that the downlink for this page is consistent with
+			 * the tuple we're inserting anymore, so go back to parent and
+			 * rechoose the best child.
 			 */
 			UnlockReleaseBuffer(stack->buffer);
 			xlocked = false;
@@ -735,9 +736,6 @@ gistdoinsert(Relation r, IndexTuple itup, Size freespace,
 			GISTInsertStack *item;
 			OffsetNumber downlinkoffnum;
 
-			/* currently, internal 

Re: GiST VACUUM

2019-07-16 Thread Amit Kapila
On Fri, Jun 28, 2019 at 3:32 AM Thomas Munro  wrote:
>
> On Thu, Jun 27, 2019 at 11:38 PM Heikki Linnakangas  wrote:
> > * I moved the logic to extend a 32-bit XID to 64-bits to a new helper
> > function in varsup.c.
>
> I'm a bit uneasy about making this into a generally available function
> for reuse.  I think we should instead come up with a very small number
> of sources of fxids that known to be free of races because of some
> well explained interlocking.
>

I have two more cases in the undo patch series where the same function
is needed and it is safe to use it there.  The first place is
twophase.c, for rolling back prepared transactions, where we know that
we don't support aborted xacts that are older than 2 billion, so we can
rely on such a function.  We also need it in undodiscard.c to compute
the value of oldestFullXidHavingUnappliedUndo.  See the usage of
GetEpochForXid in the undo-processing patches.  Now, we might find a way
to avoid using it in one of these places by doing some more work, but I
am not sure we can avoid it in all three places (one proposed by this
patch and two by the undo patchset).

> For example, I believe it is safe to convert an xid obtained from a
> WAL record during recovery, because (for now) recovery is a single
> thread of execution and the next fxid can't advance underneath you.
> So I think XLogRecGetFullXid(XLogReaderState *record)[1] as I'm about
> to propose in another thread (though I've forgotten who wrote it,
> maybe Andres, Amit or me, but if it wasn't me then it's exactly what I
> would have written) is a safe blessed source of fxids.  The rationale
> for writing this function instead of an earlier code that looked much
> like your proposed helper function, during EDB-internal review of
> zheap stuff, was this: if we provide a general purpose xid->fxid
> facility, it's virtually guaranteed that someone will eventually use
> it to shoot footwards, by acquiring an xid from somewhere, and then
> some arbitrary time later extending it to a fxid when no interlocking
> guarantees that the later conversion has the correct epoch.
>
> I'd like to figure out how to maintain full versions of the
> procarray.c-managed xid horizons, without widening the shared memory
> representation.  I was planning to think harder about for 13.
> Obviously that doesn't help you now.  So I'm wondering if you should
> just open-code this for now, and put in a comment about why it's safe
> and a note that there'll hopefully be a fxid horizon available in a
> later release?
>

Do you suggest open-coding it in all three places for now?  I am not
against open-coding the logic for now, but I am not sure we can
eliminate the need for it, because even if we can do what you are saying
with the procarray.c-managed xid horizons, I think we still need to do
something about clog.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: GiST VACUUM

2019-07-03 Thread Peter Geoghegan
On Thu, Apr 4, 2019 at 8:15 AM Heikki Linnakangas  wrote:
> I think we should fix this in a similar manner in B-tree, too, but that
> can be done separately. For B-tree, we need to worry about
> backwards-compatibility, but that seems simple enough; we just need to
> continue to understand old deleted pages, where the deletion XID is
> stored in the page opaque field.

What Postgres versions will the B-Tree fix end up targeting? Sounds
like you plan to backpatch all the way?

-- 
Peter Geoghegan




Re: GiST VACUUM

2019-06-27 Thread Thomas Munro
On Thu, Jun 27, 2019 at 11:38 PM Heikki Linnakangas  wrote:
> * I moved the logic to extend a 32-bit XID to 64-bits to a new helper
> function in varsup.c.

I'm a bit uneasy about making this into a generally available function
for reuse.  I think we should instead come up with a very small number
of sources of fxids that are known to be free of races because of some
well-explained interlocking.

For example, I believe it is safe to convert an xid obtained from a
WAL record during recovery, because (for now) recovery is a single
thread of execution and the next fxid can't advance underneath you.
So I think XLogRecGetFullXid(XLogReaderState *record)[1] as I'm about
to propose in another thread (though I've forgotten who wrote it,
maybe Andres, Amit or me, but if it wasn't me then it's exactly what I
would have written) is a safe blessed source of fxids.  The rationale
for writing this function instead of an earlier code that looked much
like your proposed helper function, during EDB-internal review of
zheap stuff, was this: if we provide a general purpose xid->fxid
facility, it's virtually guaranteed that someone will eventually use
it to shoot footwards, by acquiring an xid from somewhere, and then
some arbitrary time later extending it to a fxid when no interlocking
guarantees that the later conversion has the correct epoch.

I'd like to figure out how to maintain full versions of the
procarray.c-managed xid horizons, without widening the shared memory
representation.  I was planning to think harder about that for 13.
Obviously that doesn't help you now.  So I'm wondering if you should
just open-code this for now, and put in a comment about why it's safe
and a note that there'll hopefully be a fxid horizon available in a
later release?

[1] 
https://github.com/EnterpriseDB/zheap/commit/1203c2fa49f5f872f42ea4a02ccba2b191c45f32

-- 
Thomas Munro
https://enterprisedb.com




Re: GiST VACUUM

2019-06-27 Thread Heikki Linnakangas

On 27/06/2019 20:15, Andrey Borodin wrote:

But I have a stupid question again, about this code:

https://github.com/x4m/postgres_g/commit/096d5586537d29ff316ca8ce074bbe1b325879ee#diff-754126824470cb8e68fd5e32af6d3bcaR417

nextFullXid = ReadNextFullTransactionId();
diff = U64FromFullTransactionId(nextFullXid) -
U64FromFullTransactionId(latestRemovedFullXid);
if (diff < MaxTransactionId / 2)
{
TransactionId latestRemovedXid;

// sleep(100500 hours); latestRemovedXid 
becomes xid from future

latestRemovedXid = 
XidFromFullTransactionId(latestRemovedFullXid);

ResolveRecoveryConflictWithSnapshot(latestRemovedXid,
   
 xlrec->node);
}

Do we have a race condition here? Can latestRemovedXid wrap around and start to be 
an xid from the future?
I understand that it is purely hypothetical, but still, latestRemovedXid is already 
from the ancient past.


Good question. No, that can't happen, because this code is in the WAL 
redo function. In a standby, the next XID counter only moves forward 
when a WAL record is replayed that advances it, and all WAL records are 
replayed serially, so that can't happen when we're in the middle of 
replaying this record. A comment on that would be good, though.

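Such a comment, attached to the snippet quoted above, could read roughly like
this (the wording is only a sketch, based on the explanation just given):

        /*
         * We are in WAL redo.  WAL records are replayed serially, so the
         * next-XID counter cannot advance while we are in the middle of
         * replaying this record; hence latestRemovedFullXid cannot turn
         * into an xid "from the future" between the check above and here.
         */
        latestRemovedXid = XidFromFullTransactionId(latestRemovedFullXid);
        ResolveRecoveryConflictWithSnapshot(latestRemovedXid, xlrec->node);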

When I originally had a check like the above in the code that created the 
WAL record, it had exactly that problem, because on the master the next 
XID counter can advance concurrently.


- Heikki




Re: GiST VACUUM

2019-06-27 Thread Andrey Borodin



> On 27 June 2019, at 16:38, Heikki Linnakangas wrote:
> 
> I haven't done any testing on this yet. Andrey, would you happen to have an 
> environment ready to test this?

The patches do not deadlock and they do delete pages on the "rescan test" - the 
setup that we used to detect "left jumps" during development of physical vacuum. 
check-world is happy on my machine.
I really like that now there is GISTDeletedPageContents and we do not just cast 
*(FullTransactionId *) PageGetContents(page).

But I have a stupid question again, about this code:

https://github.com/x4m/postgres_g/commit/096d5586537d29ff316ca8ce074bbe1b325879ee#diff-754126824470cb8e68fd5e32af6d3bcaR417

nextFullXid = ReadNextFullTransactionId();
diff = U64FromFullTransactionId(nextFullXid) -
U64FromFullTransactionId(latestRemovedFullXid);
if (diff < MaxTransactionId / 2)
{
TransactionId latestRemovedXid;

// sleep(100500 hours); latestRemovedXid 
becomes xid from future

latestRemovedXid = 
XidFromFullTransactionId(latestRemovedFullXid);

ResolveRecoveryConflictWithSnapshot(latestRemovedXid,

xlrec->node);
}

Do we have a race condition here? Can latestRemovedXid wrap around and start to be 
an xid from the future?
I understand that it is purely hypothetical, but still, latestRemovedXid is already 
from the ancient past. 

Best regards, Andrey Borodin.



Re: GiST VACUUM

2019-06-27 Thread Andrey Borodin



> On 27 June 2019, at 16:38, Heikki Linnakangas wrote:
> 
> I haven't done any testing on this yet. Andrey, would you happen to have an 
> environment ready to test this?

Thanks!

I will do some testing this evening (UTC+5). But I'm not sure I can reliably 
test wraparound of xids...
I will try to break the code with the usual setup which we used to stress vacuum 
with deletes and inserts.

Best regards, Andrey Borodin.



Re: GiST VACUUM

2019-06-27 Thread Heikki Linnakangas

On 26/06/2019 06:07, Thomas Munro wrote:

On Wed, Jun 26, 2019 at 2:33 PM Michael Paquier  wrote:

On Tue, Jun 25, 2019 at 02:38:43PM +0500, Andrey Borodin wrote:

I feel a little uncomfortable with number 0x7fffffff right in code.


PG_INT32_MAX...


MaxTransactionId / 2?


Yeah, that makes sense.

Here's a new version of the patches. Changes:

* I changed the reuse-logging so that we always write a reuse WAL 
record, even if the deleteXid is very old. I tried to avoid that with 
the check for MaxTransactionId / 2 or 0x7fffffff, but it had some 
problems. In the previous patch version, it wasn't just an optimization. 
Without the check, we would write 32-bit XIDs to the log that had 
already wrapped around, and that caused the standby to unnecessarily 
wait for or kill backends. But checking for MaxTransactionId / 2 isn't 
quite enough: there was a small chance that the next XID counter 
advanced just after checking for that, so that we still logged an XID 
that had just wrapped around. A more robust way to deal with this is to 
log a full 64-bit XID, and check for wraparound at redo in the standby. 
And if we do that, trying to optimize this in the master doesn't seem 
that important anymore. So in this patch version, we always log the 
64-bit XID, and check against MaxTransactionId / 2 when replaying the WAL 
record instead.


* I moved the logic to extend a 32-bit XID to 64-bits to a new helper 
function in varsup.c.


* Instead of storing just a naked FullTransactionId in the "page 
contents" of a deleted page, I created a new struct for that. The effect 
is the same, but I think the struct clarifies what's happening, and it's 
a good location to attach a comment explaining it.

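For reference, that struct is presumably along these lines; this is only a
sketch based on the description above, not copied from the attached patch.

/* Contents of a deleted GiST page: just the XID horizon for recycling. */
typedef struct GISTDeletedPageContents
{
    /* last xid that could still need to see this page in a scan */
    FullTransactionId deleteXid;
} GISTDeletedPageContents;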

* Fixed the mixup between < and >

I haven't done any testing on this yet. Andrey, would you happen to have 
an environment ready to test this?


- Heikki
>From 7fd5901e15ac9e0f1928eeecbb9ae1212bacf3f3 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas 
Date: Thu, 4 Apr 2019 18:06:48 +0300
Subject: [PATCH 1/2] Refactor checks for deleted GiST pages.

The explicit check in gistScanPage() isn't currently really necessary, as
a deleted page is always empty, so the loop would fall through without
doing anything, anyway. But it's a marginal optimization, and it gives a
nice place to attach a comment to explain how it works.
---
 src/backend/access/gist/gist.c| 40 ---
 src/backend/access/gist/gistget.c | 14 +++
 2 files changed, 29 insertions(+), 25 deletions(-)

diff --git a/src/backend/access/gist/gist.c b/src/backend/access/gist/gist.c
index 470b121e7da..79030e77153 100644
--- a/src/backend/access/gist/gist.c
+++ b/src/backend/access/gist/gist.c
@@ -709,14 +709,15 @@ gistdoinsert(Relation r, IndexTuple itup, Size freespace,
 			continue;
 		}
 
-		if (stack->blkno != GIST_ROOT_BLKNO &&
-			stack->parent->lsn < GistPageGetNSN(stack->page))
+		if ((stack->blkno != GIST_ROOT_BLKNO &&
+			 stack->parent->lsn < GistPageGetNSN(stack->page)) ||
+			GistPageIsDeleted(stack->page))
 		{
 			/*
-			 * Concurrent split detected. There's no guarantee that the
-			 * downlink for this page is consistent with the tuple we're
-			 * inserting anymore, so go back to parent and rechoose the best
-			 * child.
+			 * Concurrent split or page deletion detected. There's no
+			 * guarantee that the downlink for this page is consistent with
+			 * the tuple we're inserting anymore, so go back to parent and
+			 * rechoose the best child.
 			 */
 			UnlockReleaseBuffer(stack->buffer);
 			xlocked = false;
@@ -735,9 +736,6 @@ gistdoinsert(Relation r, IndexTuple itup, Size freespace,
 			GISTInsertStack *item;
 			OffsetNumber downlinkoffnum;
 
-			/* currently, internal pages are never deleted */
-			Assert(!GistPageIsDeleted(stack->page));
-
 			downlinkoffnum = gistchoose(state.r, stack->page, itup, giststate);
 			iid = PageGetItemId(stack->page, downlinkoffnum);
 			idxtuple = (IndexTuple) PageGetItem(stack->page, iid);
@@ -858,12 +856,13 @@ gistdoinsert(Relation r, IndexTuple itup, Size freespace,
 	 * leaf/inner is enough to recognize split for root
 	 */
 }
-else if (GistFollowRight(stack->page) ||
-		 stack->parent->lsn < GistPageGetNSN(stack->page))
+else if ((GistFollowRight(stack->page) ||
+		  stack->parent->lsn < GistPageGetNSN(stack->page)) &&
+		 GistPageIsDeleted(stack->page))
 {
 	/*
-	 * The page was split while we momentarily unlocked the
-	 * page. Go back to parent.
+	 * The page was split or deleted while we momentarily
+	 * unlocked the page. Go back to parent.
 	 */
 	UnlockReleaseBuffer(stack->buffer);
 	xlocked = false;
@@ -872,18 +871,6 @@ gistdoinsert(Relation r, IndexTuple itup, Size freespace,
 }
 			}
 
-			/*
-			 * The page might have been deleted after we scanned the parent
-			 * and saw the downlink.
-			 */
-			if (GistPageIsDeleted(stack->page))
-			{
-UnlockReleaseBuffer(stack->buffer);
-xlocked = 

Re: GiST VACUUM

2019-06-25 Thread Thomas Munro
On Wed, Jun 26, 2019 at 2:33 PM Michael Paquier  wrote:
> On Tue, Jun 25, 2019 at 02:38:43PM +0500, Andrey Borodin wrote:
> > I feel a little uncomfortable with number 0x7fffffff right in code.
>
> PG_INT32_MAX...

MaxTransactionId / 2?

-- 
Thomas Munro
https://enterprisedb.com




Re: GiST VACUUM

2019-06-25 Thread Michael Paquier
On Tue, Jun 25, 2019 at 02:38:43PM +0500, Andrey Borodin wrote:
> I feel a little uncomfortable with number 0x7fffffff right in code.

PG_INT32_MAX...
--
Michael




Re: GiST VACUUM

2019-06-25 Thread Andrey Borodin
Hi!

Thanks for clarification, now I understand these patches better.

> On 25 June 2019, at 13:10, Heikki Linnakangas wrote:
> 
>> Also, I did not understand this optimization:
>> +/*
>> + * We can skip this if the page was deleted so long ago, that no scan 
>> can possibly
>> + * still see it, even in a standby. One measure might be anything older 
>> than the
>> + * table's frozen-xid, but we don't have that at hand here. But 
>> anything older than
>> + * 2 billion, from the next XID, is surely old enough, because you 
>> would hit XID
>> + * wraparound at that point.
>> + */
>> +nextxid = ReadNextFullTransactionId();
>> +diff = U64FromFullTransactionId(nextxid) - 
>> U64FromFullTransactionId(latestRemovedXid);
>> +if (diff < 0x7fffffff)
>> +return;
>> Standby can be lagging months from primary, and, theoretically, close
>> the gap in one sudden WAL leap...
> It would still process the WAL one WAL record at a time, even if it's lagging 
> months behind. It can't just jump over 2 billion XIDs.
I feel a little uncomfortable with the number 0x7fffffff right in the code.

Thanks!

Best regards, Andrey Borodin.



Re: GiST VACUUM

2019-06-25 Thread Heikki Linnakangas

(Thanks for the reminder on this, Michael!)

On 05/04/2019 08:39, Andrey Borodin wrote:

On 4 Apr 2019, at 20:15, Heikki Linnakangas wrote:
I suggest that we do the attached. It fixes this for GiST. The
patch expands the "deletion XID" to 64 bits, and changes
where it's stored. Instead of storing it in pd_prune_xid, it's stored
in the page contents. Luckily, a deleted page has no real content.


So, we store the full xid right after the page header?


Yep.


+static inline void
+GistPageSetDeleteXid(Page page, FullTransactionId deletexid)
+{
+   Assert(PageIsEmpty(page));
+   ((PageHeader) page)->pd_lower = MAXALIGN(SizeOfPageHeaderData) + 
sizeof(FullTransactionId);
+
+   *((FullTransactionId *) PageGetContents(page)) = deletexid;
+}

Usually we leave one ItemId (located at the invalid offset number)
untouched. I do not know whether it is done for a reason or not
No. Take a look at PageGetItemId() macro, it subtracts one from the 
offset number. But in any case, that's not really relevant here, because 
this patch stores the transaction id directly as the page content. There 
are no itemids at all on a deleted page.



Also, I did not understand this optimization:
+   /*
+* We can skip this if the page was deleted so long ago, that no scan 
can possibly
+* still see it, even in a standby. One measure might be anything older 
than the
+* table's frozen-xid, but we don't have that at hand here. But 
anything older than
+* 2 billion, from the next XID, is surely old enough, because you 
would hit XID
+* wraparound at that point.
+*/
+   nextxid = ReadNextFullTransactionId();
+   diff = U64FromFullTransactionId(nextxid) - 
U64FromFullTransactionId(latestRemovedXid);
+   if (diff < 0x7fffffff)
+   return;

Standby can be lagging months from primary, and, theoretically, close
the gap in one sudden WAL leap...
It would still process the WAL one WAL record at a time, even if it's 
lagging months behind. It can't just jump over 2 billion XIDs.



Also, I think, that comparison sign should be >, not <.


Ah, good catch! And it shows that this needs more testing..

- Heikki




Re: GiST VACUUM

2019-06-24 Thread Michael Paquier
Heikki,

On Thu, Apr 04, 2019 at 06:15:21PM +0300, Heikki Linnakangas wrote:
> I think we should fix this in a similar manner in B-tree, too, but that can
> be done separately. For B-tree, we need to worry about
> backwards-compatibility, but that seems simple enough; we just need to
> continue to understand old deleted pages, where the deletion XID is stored
> in the page opaque field.

This is an open item present already for a couple of weeks.  Are you
planning to tackle that soon?
--
Michael




Re: GiST VACUUM

2019-04-04 Thread Andrey Borodin
Hi!

> On 4 Apr 2019, at 20:15, Heikki Linnakangas wrote:
> 
> On 25/03/2019 15:20, Heikki Linnakangas wrote:
>> On 24/03/2019 18:50, Andrey Borodin wrote:
>>> I was working on new version of gist check in amcheck and understand one 
>>> more thing:
>>> 
>>> /* Can this page be recycled yet? */
>>> bool
>>> gistPageRecyclable(Page page)
>>> {
>>>  return PageIsNew(page) ||
>>>  (GistPageIsDeleted(page) &&
>>>   TransactionIdPrecedes(GistPageGetDeleteXid(page), 
>>> RecentGlobalXmin));
>>> }
>>> 
>>> Here RecentGlobalXmin can wraparound and page will become unrecyclable for 
>>> half of xid cycle. Can we prevent it by resetting PageDeleteXid to 
>>> InvalidTransactionId before doing RecordFreeIndexPage()?
>>> (Seems like same applies to GIN...)
>> True, and B-tree has the same issue. I thought I saw a comment somewhere
>> in the B-tree code about that earlier, but now I can't find it. I
>> must've imagined it.
>> We could reset it, but that would require dirtying the page. That would
>> be just extra I/O overhead, if the page gets reused before XID
>> wraparound. We could avoid that if we stored the full XID+epoch, not
>> just XID. I think we should do that in GiST, at least, where this is
>> new. In the B-tree, it would require some extra code to deal with
>> backwards-compatibility, but maybe it would be worth it even there.
> 
> I suggest that we do the attached. It fixes this for GiST. The patch changes 
> expands the "deletion XID" to 64-bits, and changes where it's stored. Instead 
> of storing it pd_prune_xid, it's stored in the page contents. Luckily, a 
> deleted page has no real content.

So, we store the full xid right after the page header?
+static inline void
+GistPageSetDeleteXid(Page page, FullTransactionId deletexid)
+{
+   Assert(PageIsEmpty(page));
+   ((PageHeader) page)->pd_lower = MAXALIGN(SizeOfPageHeaderData) + 
sizeof(FullTransactionId);
+
+   *((FullTransactionId *) PageGetContents(page)) = deletexid;
+}

Usually we leave one ItemId (located at the invalid offset number) untouched. I do 
not know whether it is done for a reason or not


Also, I did not understand this optimization:
+   /*
+* We can skip this if the page was deleted so long ago, that no scan 
can possibly
+* still see it, even in a standby. One measure might be anything older 
than the
+* table's frozen-xid, but we don't have that at hand here. But 
anything older than
+* 2 billion, from the next XID, is surely old enough, because you 
would hit XID
+* wraparound at that point.
+*/
+   nextxid = ReadNextFullTransactionId();
+   diff = U64FromFullTransactionId(nextxid) - 
U64FromFullTransactionId(latestRemovedXid);
+   if (diff < 0x7fffffff)
+   return;

Standby can be lagging months from primary, and, theoretically, close the gap 
in one sudden WAL leap... Also, I think, that comparison sign should be >, not 
<.


Best regards, Andrey Borodin.



Re: GiST VACUUM

2019-04-04 Thread Heikki Linnakangas

On 25/03/2019 15:20, Heikki Linnakangas wrote:

On 24/03/2019 18:50, Andrey Borodin wrote:

I was working on new version of gist check in amcheck and understand one more 
thing:

/* Can this page be recycled yet? */
bool
gistPageRecyclable(Page page)
{
  return PageIsNew(page) ||
  (GistPageIsDeleted(page) &&
   TransactionIdPrecedes(GistPageGetDeleteXid(page), RecentGlobalXmin));
}

Here RecentGlobalXmin can wrap around and the page will become unrecyclable for half 
of the xid cycle. Can we prevent it by resetting PageDeleteXid to 
InvalidTransactionId before doing RecordFreeIndexPage()?
(Seems like the same applies to GIN...)


True, and B-tree has the same issue. I thought I saw a comment somewhere
in the B-tree code about that earlier, but now I can't find it. I
must've imagined it.

We could reset it, but that would require dirtying the page. That would
be just extra I/O overhead, if the page gets reused before XID
wraparound. We could avoid that if we stored the full XID+epoch, not
just XID. I think we should do that in GiST, at least, where this is
new. In the B-tree, it would require some extra code to deal with
backwards-compatibility, but maybe it would be worth it even there.


I suggest that we do the attached. It fixes this for GiST. The patch 
expands the "deletion XID" to 64 bits, and changes where it's 
stored. Instead of storing it in pd_prune_xid, it's stored in the page 
contents. Luckily, a deleted page has no real content.


I think we should fix this in a similar manner in B-tree, too, but that 
can be done separately. For B-tree, we need to worry about 
backwards-compatibility, but that seems simple enough; we just need to 
continue to understand old deleted pages, where the deletion XID is 
stored in the page opaque field.


- Heikki
>From b7897577c83a81ec04394ce7113d1d8a47804086 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas 
Date: Thu, 4 Apr 2019 18:06:48 +0300
Subject: [PATCH 1/2] Refactor checks for deleted GiST pages.

The explicit check in gistScanPage() isn't currently really necessary, as
a deleted page is always empty, so the loop would fall through without
doing anything, anyway. But it's a marginal optimization, and it gives a
nice place to attach a comment to explain how it works.
---
 src/backend/access/gist/gist.c| 40 ---
 src/backend/access/gist/gistget.c | 14 +++
 2 files changed, 29 insertions(+), 25 deletions(-)

diff --git a/src/backend/access/gist/gist.c b/src/backend/access/gist/gist.c
index 2db790c840..028b06b264 100644
--- a/src/backend/access/gist/gist.c
+++ b/src/backend/access/gist/gist.c
@@ -693,14 +693,15 @@ gistdoinsert(Relation r, IndexTuple itup, Size freespace,
 			continue;
 		}
 
-		if (stack->blkno != GIST_ROOT_BLKNO &&
-			stack->parent->lsn < GistPageGetNSN(stack->page))
+		if ((stack->blkno != GIST_ROOT_BLKNO &&
+			 stack->parent->lsn < GistPageGetNSN(stack->page)) ||
+			GistPageIsDeleted(stack->page))
 		{
 			/*
-			 * Concurrent split detected. There's no guarantee that the
-			 * downlink for this page is consistent with the tuple we're
-			 * inserting anymore, so go back to parent and rechoose the best
-			 * child.
+			 * Concurrent split or page deletion detected. There's no
+			 * guarantee that the downlink for this page is consistent with
+			 * the tuple we're inserting anymore, so go back to parent and
+			 * rechoose the best child.
 			 */
 			UnlockReleaseBuffer(stack->buffer);
 			xlocked = false;
@@ -719,9 +720,6 @@ gistdoinsert(Relation r, IndexTuple itup, Size freespace,
 			GISTInsertStack *item;
 			OffsetNumber downlinkoffnum;
 
-			/* currently, internal pages are never deleted */
-			Assert(!GistPageIsDeleted(stack->page));
-
 			downlinkoffnum = gistchoose(state.r, stack->page, itup, giststate);
 			iid = PageGetItemId(stack->page, downlinkoffnum);
 			idxtuple = (IndexTuple) PageGetItem(stack->page, iid);
@@ -842,12 +840,13 @@ gistdoinsert(Relation r, IndexTuple itup, Size freespace,
 	 * leaf/inner is enough to recognize split for root
 	 */
 }
-else if (GistFollowRight(stack->page) ||
-		 stack->parent->lsn < GistPageGetNSN(stack->page))
+else if ((GistFollowRight(stack->page) ||
+		  stack->parent->lsn < GistPageGetNSN(stack->page)) &&
+		 GistPageIsDeleted(stack->page))
 {
 	/*
-	 * The page was split while we momentarily unlocked the
-	 * page. Go back to parent.
+	 * The page was split or deleted while we momentarily
+	 * unlocked the page. Go back to parent.
 	 */
 	UnlockReleaseBuffer(stack->buffer);
 	xlocked = false;
@@ -856,18 +855,6 @@ gistdoinsert(Relation r, IndexTuple itup, Size freespace,
 }
 			}
 
-			/*
-			 * The page might have been deleted after we scanned the parent
-			 * and saw the downlink.
-			 */
-			if (GistPageIsDeleted(stack->page))
-			{
-UnlockReleaseBuffer(stack->buffer);
-xlocked = false;
-state.stack = stack = stack->parent;
-continue;
-	

Re: GiST VACUUM

2019-03-25 Thread Heikki Linnakangas

On 24/03/2019 18:50, Andrey Borodin wrote:

I was working on new version of gist check in amcheck and understand one more 
thing:

/* Can this page be recycled yet? */
bool
gistPageRecyclable(Page page)
{
 return PageIsNew(page) ||
 (GistPageIsDeleted(page) &&
  TransactionIdPrecedes(GistPageGetDeleteXid(page), RecentGlobalXmin));
}

Here RecentGlobalXmin can wrap around and the page will become unrecyclable for half 
of the xid cycle. Can we prevent it by resetting PageDeleteXid to 
InvalidTransactionId before doing RecordFreeIndexPage()?
(Seems like the same applies to GIN...)


True, and B-tree has the same issue. I thought I saw a comment somewhere 
in the B-tree code about that earlier, but now I can't find it. I 
must've imagined it.


We could reset it, but that would require dirtying the page. That would 
be just extra I/O overhead, if the page gets reused before XID 
wraparound. We could avoid that if we stored the full XID+epoch, not 
just XID. I think we should do that in GiST, at least, where this is 
new. In the B-tree, it would require some extra code to deal with 
backwards-compatibility, but maybe it would be worth it even there.

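With a full XID+epoch stored on the deleted page, the recyclability check could
become wraparound-proof, roughly like this.  This is only a sketch under that
assumption; GetFullRecentGlobalXmin() does not exist at this point and is assumed
here as a 64-bit counterpart of RecentGlobalXmin.

/* Can this page be recycled yet? (sketch, assuming a 64-bit delete XID) */
bool
gistPageRecyclable(Page page)
{
    return PageIsNew(page) ||
        (GistPageIsDeleted(page) &&
         FullTransactionIdPrecedes(GistPageGetDeleteXid(page),
                                   GetFullRecentGlobalXmin()));
}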

- Heikki



Re: GiST VACUUM

2019-03-24 Thread Andrey Borodin



> On 22 March 2019, at 17:03, Heikki Linnakangas wrote:
> 

I was working on new version of gist check in amcheck and understand one more 
thing:

/* Can this page be recycled yet? */
bool
gistPageRecyclable(Page page)
{
return PageIsNew(page) ||
(GistPageIsDeleted(page) &&
 TransactionIdPrecedes(GistPageGetDeleteXid(page), RecentGlobalXmin));
}

Here RecentGlobalXmin can wrap around and the page will become unrecyclable for half 
of the xid cycle. Can we prevent it by resetting PageDeleteXid to 
InvalidTransactionId before doing RecordFreeIndexPage()?
(Seems like the same applies to GIN...)

Best regards, Andrey Borodin.


Re: GiST VACUUM

2019-03-22 Thread Andrey Borodin



> On 22 March 2019, at 19:37, Heikki Linnakangas wrote:
> 
> On 21/03/2019 19:04, Heikki Linnakangas wrote:
>> Attached is the latest patch version, to be applied on top of the
>> IntegerSet patch.
> 
> And committed. Thanks Andrey!
> 
> - Heikki

Cool! Thank you very much! At the beginning I could not imagine how many caveats 
there are.

> On 22 March 2019, at 18:28, Heikki Linnakangas wrote:
> 
> * Currently, a search needs to scan all items on a page. If the keys are 
> small, that can be pretty slow. Divide each page further into e.g. 4 
> sub-pages, with a "bounding box" key for each sub-page, to speed up search.
BTW, I already have an intra-page indexing patch. But now it obviously needs a 
rebase :) Along with the gist amcheck patch.

Thanks!

Best regards, Andrey Borodin.


Re: GiST VACUUM

2019-03-22 Thread Heikki Linnakangas

On 22/03/2019 13:37, Heikki Linnakangas wrote:

On 21/03/2019 19:04, Heikki Linnakangas wrote:

Attached is the latest patch version, to be applied on top of the
IntegerSet patch.


And committed. Thanks Andrey!


This caused the buildfarm to go pink... I was able to reproduce it, by 
running the regression tests in one terminal, and repeatedly running 
"VACUUM;" in another terminal. Strange that it seems to happen all the 
time on the buildfarm, but never happened to me locally.


Anyway, I'm investigating.

- Heikki




Re: GiST VACUUM

2019-03-22 Thread Heikki Linnakangas

On 21/03/2019 19:04, Heikki Linnakangas wrote:

Attached is the latest patch version, to be applied on top of the
IntegerSet patch.


And committed. Thanks Andrey!

- Heikki



Re: GiST VACUUM

2019-03-22 Thread Heikki Linnakangas

On 22/03/2019 10:00, Andrey Borodin wrote:

On 22 March 2019, at 1:04, Heikki Linnakangas wrote:

PS. for Gist, we could almost use the LSN / NSN mechanism to detect
the case that a deleted page is reused: Add a new field to the GiST
page header, to store a new "deleteNSN" field. When a page is
deleted, the deleted page's deleteNSN is set to the LSN of the
deletion record. When the page is reused, the deleteNSN field is
kept unchanged. When you follow a downlink during search, if you
see that the page's deleteNSN > parent's LSN, you know that it was
concurrently deleted and recycled, and should be ignored. That
would allow reusing deleted pages immediately. Unfortunately that
would require adding a new field to the gist page header/footer,
which requires upgrade work :-(. Maybe one day, we'll bite the
bullet. Something to keep in mind, if we have to change the page
format anyway, for some reason.


Yeah, the same day we will get rid of invalid tuples. I can make a
patch for v13. Actually, I have a lot of patches that I want in GiST
in v13. Or v14.


Cool! Here's my wishlist:

* That deleteNSN thing
* Add a metapage to blk #0.
* Add a "level"-field to page header.
* Currently, a search needs to scan all items on a page. If the keys are 
small, that can be pretty slow. Divide each page further into e.g. 4 
sub-pages, with a "bounding box" key for each sub-page, to speed up search.


- Heikki



Re: GiST VACUUM

2019-03-22 Thread Andrey Borodin


> On 22 March 2019, at 1:04, Heikki Linnakangas wrote:
> ...
> When I started testing this, I quickly noticed that empty pages were not 
> being deleted nearly as much as I expected. I tracked it to this check in 
> gistdeletepage:
> 
>> +   if (GistFollowRight(leafPage)
>> +   || GistPageGetNSN(parentPage) > GistPageGetNSN(leafPage))
>> +   {
>> +   /* Don't mess with a concurrent page split. */
>> +   return false;
>> +   }
> 
> That NSN test was bogus. It prevented the leaf page from being reused, if the 
> parent page was *ever* split after the leaf page was created. I don't see any 
> reason to check the NSN here.
That's true. This check made sense only when the parent page was not locked (and 
it seems like the comparison should be the opposite). When both pages are locked, this 
test makes no sense at all. The check was incorrectly "fixed" by me when transitioning 
from single-scan delete to two-scan delete during summer 2018. Just wondering 
how hard it is to simply delete a page...

>> Though, I'm not sure it is important for GIN. Scariest thing that can
>> happen: it will return same tid twice. But it is doing bitmap scan,
>> you cannot return same bit twice...
> 
> Hmm. Could it return a completely unrelated tuple?
No, I do not think so, it will do comparisons on posting tree tuples.

> We don't always recheck the original index quals in a bitmap index scan, 
> IIRC. Also, a search might get confused if it's descending down a posting 
> tree, and lands on a different kind of a page, altogether.
Yes, the search will return an error, the user will reissue the query, and everything 
will be almost fine.

> PS. for Gist, we could almost use the LSN / NSN mechanism to detect the case 
> that a deleted page is reused: Add a new field to the GiST page header, to 
> store a new "deleteNSN" field. When a page is deleted, the deleted page's 
> deleteNSN is set to the LSN of the deletion record. When the page is reused, 
> the deleteNSN field is kept unchanged. When you follow a downlink during 
> search, if you see that the page's deleteNSN > parent's LSN, you know that it 
> was concurrently deleted and recycled, and should be ignored. That would 
> allow reusing deleted pages immediately. Unfortunately that would require 
> adding a new field to the gist page header/footer, which requires upgrade 
> work :-(. Maybe one day, we'll bite the bullet. Something to keep in mind, if 
> we have to change the page format anyway, for some reason.
Yeah, the same day we will get rid of invalid tuples. I can make a patch for 
v13. Actually, I have a lot of patches that I want in GiST in v13. Or v14.

Best regards, Andrey Borodin.


Re: GiST VACUUM

2019-03-21 Thread Heikki Linnakangas

On 21/03/2019 18:06, Andrey Borodin wrote:

On 21 March 2019, at 2:30, Heikki Linnakangas wrote: one remaining issue that needs to be fixed:

During Hot Standby, the B-tree code writes a WAL reord, when a
deleted page is recycled, to prevent the deletion from being
replayed too early in the hot standby. See _bt_getbuf() and
btree_xlog_reuse_page(). I think we need to do something similar in
GiST.

I'll try fixing that tomorrow, unless you beat me to it. Making
the changes is pretty straightforward, but it's a bit cumbersome to
test.


I've tried to deal with it and got stuck...


So, I came up with the attached. I basically copy-pasted the page-reuse 
WAL-logging stuff from nbtree.


When I started testing this, I quickly noticed that empty pages were not 
being deleted nearly as much as I expected. I tracked it to this check 
in gistdeletepage:



+   if (GistFollowRight(leafPage)
+   || GistPageGetNSN(parentPage) > GistPageGetNSN(leafPage))
+   {
+   /* Don't mess with a concurrent page split. */
+   return false;
+   }


That NSN test was bogus. It prevented the leaf page from being reused, 
if the parent page was *ever* split after the leaf page was created. I 
don't see any reason to check the NSN here. The NSN is normally used to 
detect if a (leaf) page has been concurrently split, when you descend 
the tree. We don't need to care about that here; as long as the 
FOLLOW_RIGHT flag is not set, the page has a downlink, and if we can 
find the downlink and the page is empty, we can delete it.

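In other words, the precondition in gistdeletepage reduces to just the
follow-right test; a sketch of the corrected check, per the reasoning above:

	if (GistFollowRight(leafPage))
	{
		/* Don't mess with a concurrent page split. */
		return false;
	}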

After removing that bogus NSN check, page reuse became much more 
effective. I've been testing this by running this test script repeatedly:


--
/*
create sequence gist_test_seq;
create table gist_point_tbl(id int4, p point);
create index gist_pointidx on gist_point_tbl using gist(p);
*/

insert into gist_point_tbl (id, p)
   select nextval('gist_test_seq'), point(nextval('gist_test_seq'), 
1000 + g) from generate_series(1, 1) g;


delete from gist_point_tbl where id < currval('gist_test_seq') - 2;
vacuum gist_point_tbl;

select pg_table_size('gist_point_tbl'), pg_indexes_size('gist_point_tbl');
--

It inserts a bunch of rows, deletes a bunch of older rows, and vacuums. 
The interesting thing here is that the key values keep "moving", so that 
new tuples are added to different places than where old ones are 
removed. That's the case where page reuse is needed.


Before this patch, the index bloated very quickly. With the patch, it 
still bloats, because we still don't delete internal pages. But it's a 
small fraction of the bloat you got before.


Attached is the latest patch version, to be applied on top of the 
IntegerSet patch.



I think we should make B-tree WAL record for this shared, remove
BlockNumber and other unused stuff, leaving just xid and db oid. And
reuse this record for B-tree, GiST and GIN (yeah, it is not checking
for that conflict).
Good point. For now, I didn't try to generalize this, but perhaps we 
should.



Though, I'm not sure it is important for GIN. Scariest thing that can
happen: it will return same tid twice. But it is doing bitmap scan,
you cannot return same bit twice...


Hmm. Could it return a completely unrelated tuple? We don't always 
recheck the original index quals in a bitmap index scan, IIRC. Also, a 
search might get confused if it's descending down a posting tree, and 
lands on a different kind of a page, altogether.


Alexander, you added the mechanism to GIN recently, to prevent pages 
from being reused too early (commit 52ac6cd2d0). Do we need something 
like B-tree's REUSE_PAGE records in GIN, too, to prevent the same bug 
from happening in hot standby?



PS. for Gist, we could almost use the LSN / NSN mechanism to detect the 
case that a deleted page is reused: Add a new field to the GiST page 
header, to store a new "deleteNSN" field. When a page is deleted, the 
deleted page's deleteNSN is set to the LSN of the deletion record. When 
the page is reused, the deleteNSN field is kept unchanged. When you 
follow a downlink during search, if you see that the page's deleteNSN > 
parent's LSN, you know that it was concurrently deleted and recycled, 
and should be ignored. That would allow reusing deleted pages 
immediately. Unfortunately that would require adding a new field to the 
gist page header/footer, which requires upgrade work :-(. Maybe one day, 
we'll bite the bullet. Something to keep in mind, if we have to change 
the page format anyway, for some reason.

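As a rough sketch of that idea (everything here is hypothetical, since no
deleteNSN field or accessor exists in the current page format):

/*
 * Hypothetical: detect that a downlink points to a page that was deleted
 * (and possibly recycled) after the parent page was last updated.
 */
static bool
gist_page_deleted_after_parent(Page page, XLogRecPtr parentlsn)
{
    return GistPageGetDeleteNSN(page) > parentlsn;  /* made-up accessor */
}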

- Heikki
>From d7a77ad483251b62514778d2235516f6f9237ad7 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas 
Date: Wed, 20 Mar 2019 20:24:44 +0200
Subject: [PATCH 2/2] Delete empty pages during GiST VACUUM

This commit teaches GiST to actually delete pages during VACUUM.

To do this we scan GiST two times. At first pass we make note of empty
pages and internal pages. At second pass we sca

Re: GiST VACUUM

2019-03-21 Thread Andrey Borodin



> On 21 March 2019, at 2:30, Heikki Linnakangas wrote:
> one remaining issue that needs to be fixed:
> 
> During Hot Standby, the B-tree code writes a WAL record, when a deleted 
> page is recycled, to prevent the deletion from being replayed too early 
> in the hot standby. See _bt_getbuf() and btree_xlog_reuse_page(). I 
> think we need to do something similar in GiST.
> 
> I'll try fixing that tomorrow, unless you beat me to it. Making the 
> changes is pretty straightforward, but it's a bit cumbersome to test.

I've tried to deal with it and got stuck... I think we should make the B-tree WAL 
record for this shared, remove BlockNumber and other unused stuff, leaving just 
the xid and db oid.
And reuse this record for B-tree, GiST and GIN (yeah, GIN is not checking for 
that conflict).

Though, I'm not sure it is important for GIN. The scariest thing that can happen: 
it will return the same tid twice. But it is doing a bitmap scan; you cannot return 
the same bit twice...

Eventually, hash, spgist and others will have the same thing too.

Best regards, Andrey Borodin.


Re: GiST VACUUM

2019-03-20 Thread Heikki Linnakangas

On 15/03/2019 20:25, Andrey Borodin wrote:

On 11 March 2019, at 20:03, Heikki Linnakangas wrote:

On 10/03/2019 18:40, Andrey Borodin wrote:

One thing still bothers me. Let's assume that we have internal
page with 2 deletable leaves. We lock these leaves in order of
items on internal page. Is it possible that 2nd page have
follow-right link on 1st and someone will lock 2nd page, try to
lock 1st and deadlock with VACUUM?


Hmm. If the follow-right flag is set on a page, it means that its
right sibling doesn't have a downlink in the parent yet.
Nevertheless, I think I'd sleep better, if we acquired the locks in
left-to-right order, to be safe.

>

Actually, I did not find lock coupling in the GiST code. But I decided
to lock just two pages at once (leaf, then parent, for every pair).
PFA v22 with this concurrency logic.


Good. I just noticed, that the README actually does say explicitly, that 
the child must be locked before the parent.


I rebased this over the new IntegerSet implementation, from the other 
thread, and did another round of refactoring, cleanups, etc. Attached is 
a new version of this patch. I'm also including the IntegerSet patch 
here, for convenience, but it's the same patch I posted at [1].


It's in pretty good shape, but there's one remaining issue that needs to be fixed:

During Hot Standby, the B-tree code writes a WAL record, when a deleted 
page is recycled, to prevent the deletion from being replayed too early 
in the hot standby. See _bt_getbuf() and btree_xlog_reuse_page(). I 
think we need to do something similar in GiST.


I'll try fixing that tomorrow, unless you beat me to it. Making the 
changes is pretty straightforward, but it's a bit cumbersome to test.


[1] 
https://www.postgresql.org/message-id/1035d8e6-cfd1-0c27-8902-40d8d45eb...@iki.fi


- Heikki
>From 4c05c69020334babdc1aa406c5032ae2861e5cb5 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas 
Date: Wed, 20 Mar 2019 02:26:08 +0200
Subject: [PATCH 1/2] Add IntegerSet, to hold large sets of 64-bit ints
 efficiently.

The set is implemented as a B-tree, with a compact representation at leaf
items, using Simple-8b algorithm, so that clusters of nearby values take
less space.

This doesn't include any use of the code yet, but we have two patches in
the works that would benefit from this:

* the GiST vacuum patch, to track empty GiST pages and internal GiST pages.

* Reducing memory usage, and also allowing more than 1 GB of memory to be
  used, to hold the dead TIDs in VACUUM.

This includes a unit test module, in src/test/modules/test_integerset.
It can be used to verify correctness, as a regression test, but if you run
it manully, it can also print memory usage and execution time of some of
the tests.

Author: Heikki Linnakangas, Andrey Borodin
Discussion: https://www.postgresql.org/message-id/b5e82599-1966-5783-733c-1a947ddb7...@iki.fi
---
 src/backend/lib/Makefile  |2 +-
 src/backend/lib/README|2 +
 src/backend/lib/integerset.c  | 1039 +
 src/include/lib/integerset.h  |   25 +
 src/test/modules/Makefile |1 +
 src/test/modules/test_integerset/.gitignore   |4 +
 src/test/modules/test_integerset/Makefile |   21 +
 src/test/modules/test_integerset/README   |7 +
 .../expected/test_integerset.out  |   14 +
 .../test_integerset/sql/test_integerset.sql   |   11 +
 .../test_integerset/test_integerset--1.0.sql  |8 +
 .../modules/test_integerset/test_integerset.c |  622 ++
 .../test_integerset/test_integerset.control   |4 +
 13 files changed, 1759 insertions(+), 1 deletion(-)
 create mode 100644 src/backend/lib/integerset.c
 create mode 100644 src/include/lib/integerset.h
 create mode 100644 src/test/modules/test_integerset/.gitignore
 create mode 100644 src/test/modules/test_integerset/Makefile
 create mode 100644 src/test/modules/test_integerset/README
 create mode 100644 src/test/modules/test_integerset/expected/test_integerset.out
 create mode 100644 src/test/modules/test_integerset/sql/test_integerset.sql
 create mode 100644 src/test/modules/test_integerset/test_integerset--1.0.sql
 create mode 100644 src/test/modules/test_integerset/test_integerset.c
 create mode 100644 src/test/modules/test_integerset/test_integerset.control

diff --git a/src/backend/lib/Makefile b/src/backend/lib/Makefile
index 191ea9bca26..3c1ee1df83a 100644
--- a/src/backend/lib/Makefile
+++ b/src/backend/lib/Makefile
@@ -13,6 +13,6 @@ top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
 
 OBJS = binaryheap.o bipartite_match.o bloomfilter.o dshash.o hyperloglog.o \
-   ilist.o knapsack.o pairingheap.o rbtree.o stringinfo.o
+   ilist.o integerset.o knapsack.o pairingheap.o rbtree.o stringinfo.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/lib/README b/src/backend/lib/README
index ae5debe1bc6..f2fb591237d 100644
--- a/src/backe

Re: GiST VACUUM

2019-03-20 Thread Heikki Linnakangas

On 15/03/2019 20:25, Andrey Borodin wrote:

On 11 Mar 2019, at 20:03, Heikki Linnakangas  wrote:

On 10/03/2019 18:40, Andrey Borodin wrote:

One thing still bothers me. Let's assume that we have internal
page with 2 deletable leaves. We lock these leaves in order of
items on internal page. Is it possible that 2nd page have
follow-right link on 1st and someone will lock 2nd page, try to
lock 1st and deadlock with VACUUM?


Hmm. If the follow-right flag is set on a page, it means that its
right sibling doesn't have a downlink in the parent yet.
Nevertheless, I think I'd sleep better, if we acquired the locks in
left-to-right order, to be safe.

>

Actually, I did not find lock coupling in the GiST code. But I decided
to lock just two pages at once (leaf, then parent, for every pair).
PFA v22 with this concurrency logic.


Good. I just noticed that the README actually does say explicitly that
the child must be locked before the parent.


I rebased this over the new IntegerSet implementation, from the other 
thread, and did another round of refactoring, cleanups, etc. Attached is 
a new version of this patch. I'm also including the IntegerSet patch 
here, for convenience, but it's the same patch I posted at [1].


It's in pretty good shape, but there is one remaining issue that needs to be fixed:

During Hot Standby, the B-tree code writes a WAL record when a deleted 
page is recycled, to prevent the deletion from being replayed too early 
in the hot standby. See _bt_getbuf() and btree_xlog_reuse_page(). I 
think we need to do something similar in GiST.
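
For illustration, here is a rough sketch of what such a conflict-generating
"page reuse" record could look like on the GiST side, modeled on the B-tree
mechanism above. The record layout, the info flag value and the function name
are assumptions made for this sketch, not the code that was eventually committed:

/*
 * Hypothetical sketch, as if added to src/backend/access/gist/gistxlog.c.
 * Before handing out a deleted GiST page for reuse, emit a WAL record
 * carrying the XID that was stamped on the page when it was deleted.  The
 * record changes no page contents, so no buffer is registered; its only
 * purpose is to let hot-standby redo cancel queries that might still need
 * to see the old page contents, like the B-tree "reuse page" record does.
 */
typedef struct gistxlogPageReuse
{
	RelFileNode node;
	BlockNumber block;
	TransactionId latestRemovedXid;
} gistxlogPageReuse;

#define XLOG_GIST_PAGE_REUSE	0x60	/* info bit chosen arbitrarily here */

static void
gistXLogPageReuse(Relation rel, BlockNumber blkno, TransactionId deleteXid)
{
	gistxlogPageReuse xlrec_reuse;

	xlrec_reuse.node = rel->rd_node;
	xlrec_reuse.block = blkno;
	xlrec_reuse.latestRemovedXid = deleteXid;

	XLogBeginInsert();
	XLogRegisterData((char *) &xlrec_reuse, sizeof(xlrec_reuse));
	(void) XLogInsert(RM_GIST_ID, XLOG_GIST_PAGE_REUSE);
}

On the replay side, btree_xlog_reuse_page() resolves the conflict with
ResolveRecoveryConflictWithSnapshot(), and a GiST equivalent would presumably
do the same.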


I'll try fixing that tomorrow, unless you beat me to it. Making the 
changes is pretty straightforward, but it's a bit cumbersome to test.


[1] 
https://www.postgresql.org/message-id/1035d8e6-cfd1-0c27-8902-40d8d45eb...@iki.fi


- Heikki
>From 4c05c69020334babdc1aa406c5032ae2861e5cb5 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas 
Date: Wed, 20 Mar 2019 02:26:08 +0200
Subject: [PATCH 1/2] Add IntegerSet, to hold large sets of 64-bit ints
 efficiently.

The set is implemented as a B-tree, with a compact representation at leaf
items, using Simple-8b algorithm, so that clusters of nearby values take
less space.

This doesn't include any use of the code yet, but we have two patches in
the works that would benefit from this:

* the GiST vacuum patch, to track empty GiST pages and internal GiST pages.

* Reducing memory usage, and also allowing more than 1 GB of memory to be
  used, to hold the dead TIDs in VACUUM.

This includes a unit test module, in src/test/modules/test_integerset.
It can be used to verify correctness, as a regression test, but if you run
it manually, it can also print memory usage and execution time of some of
the tests.

Author: Heikki Linnakangas, Andrey Borodin
Discussion: https://www.postgresql.org/message-id/b5e82599-1966-5783-733c-1a947ddb7...@iki.fi
---
 src/backend/lib/Makefile  |2 +-
 src/backend/lib/README|2 +
 src/backend/lib/integerset.c  | 1039 +
 src/include/lib/integerset.h  |   25 +
 src/test/modules/Makefile |1 +
 src/test/modules/test_integerset/.gitignore   |4 +
 src/test/modules/test_integerset/Makefile |   21 +
 src/test/modules/test_integerset/README   |7 +
 .../expected/test_integerset.out  |   14 +
 .../test_integerset/sql/test_integerset.sql   |   11 +
 .../test_integerset/test_integerset--1.0.sql  |8 +
 .../modules/test_integerset/test_integerset.c |  622 ++
 .../test_integerset/test_integerset.control   |4 +
 13 files changed, 1759 insertions(+), 1 deletion(-)
 create mode 100644 src/backend/lib/integerset.c
 create mode 100644 src/include/lib/integerset.h
 create mode 100644 src/test/modules/test_integerset/.gitignore
 create mode 100644 src/test/modules/test_integerset/Makefile
 create mode 100644 src/test/modules/test_integerset/README
 create mode 100644 src/test/modules/test_integerset/expected/test_integerset.out
 create mode 100644 src/test/modules/test_integerset/sql/test_integerset.sql
 create mode 100644 src/test/modules/test_integerset/test_integerset--1.0.sql
 create mode 100644 src/test/modules/test_integerset/test_integerset.c
 create mode 100644 src/test/modules/test_integerset/test_integerset.control

diff --git a/src/backend/lib/Makefile b/src/backend/lib/Makefile
index 191ea9bca26..3c1ee1df83a 100644
--- a/src/backend/lib/Makefile
+++ b/src/backend/lib/Makefile
@@ -13,6 +13,6 @@ top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
 
 OBJS = binaryheap.o bipartite_match.o bloomfilter.o dshash.o hyperloglog.o \
-   ilist.o knapsack.o pairingheap.o rbtree.o stringinfo.o
+   ilist.o integerset.o knapsack.o pairingheap.o rbtree.o stringinfo.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/lib/README b/src/backend/lib/README
index ae5debe1bc6..f2fb591237d 100644
--- a/src/backe

Re: GiST VACUUM

2019-03-15 Thread Jeff Janes
On Tue, Mar 5, 2019 at 8:21 AM Heikki Linnakangas  wrote:

> On 05/03/2019 02:26, Andrey Borodin wrote:
> >> I also tried your amcheck tool with this. It did not report any
> >> errors.
> >>
> >> Attached is also latest version of the patch itself. It is the
> >> same as your latest patch v19, except for some tiny comment
> >> kibitzing. I'll mark this as Ready for Committer in the commitfest
> >> app, and will try to commit it in the next couple of days.
> >
> > That's cool! I'll work on 2nd step of these patchset to make
> > blockset data structure prettier and less hacky.
>
> Committed the first patch. Thanks for the patch!
>

Thank you.  This is a transformational change; it will allow GiST indexes
larger than RAM to be used in some cases where they were simply not
feasible to use before.  On an HDD, it resulted in a 50-fold improvement in
vacuum time, and the machine went from unusably unresponsive to merely
sluggish during the vacuum.  On an SSD (albeit a very cheap laptop one, and
exposed from a Windows host to Ubuntu over VM VirtualBox) it is still a 30-fold
improvement, from a far faster baseline.  Even on an AWS instance with
a "GP2" SSD volume, which normally shows little benefit from sequential
reads, I get a 3-fold speed-up.

I also ran this through a lot of crash-recovery testing using simulated
torn-page writes using my traditional testing harness with high concurrency
(AWS c4.4xlarge and a1.4xlarge using 32 concurrent update processes) and
did not encounter any problems.  I tested both with btree_gist on a scalar
int, and on tsvector with each tsvector having 101 tokens.

I did notice that the space freed up in the index by vacuum doesn't seem to
get re-used very efficiently, but that is an ancestral problem independent
of this change.

Cheers,

Jeff


Re: GiST VACUUM

2019-03-15 Thread Andrey Borodin
Hi!

> On 11 Mar 2019, at 20:03, Heikki Linnakangas  wrote:
> 
> On 10/03/2019 18:40, Andrey Borodin wrote:
>> Here's new version of the patch for actual page deletion.
>> Changes:
>> 1. Fixed possible concurrency issue:
>> We were locking child page while holding lock on internal page. Notes
>> in GiST README recommend locking child before parent. Thus now we
>> unlock internal page before locking children for deletion, and lock
it again, check that everything is still in its place and delete
>> only then.
> 
> Good catch. The implementation is a bit broken, though. This segfaults:
Sorry for the noise. Few lines of old code leaked into the patch between 
testing and mailing.

> BTW, we don't seem to have test coverage for this in the regression tests. 
> The tests that delete some rows, where you updated the comment, doesn't 
> actually seem to produce any empty pages that could be deleted.
I've updated the test to delete more rows. But it did not trigger the previous bug 
anyway; we had to delete everything for this case.

> 
>> One thing still bothers me. Let's assume that we have internal page
>> with 2 deletable leaves. We lock these leaves in order of items on
>> internal page. Is it possible that 2nd page have follow-right link on
>> 1st and someone will lock 2nd page, try to lock 1st and deadlock with
>> VACUUM?
> 
> Hmm. If the follow-right flag is set on a page, it means that its right 
> sibling doesn't have a downlink in the parent yet. Nevertheless, I think I'd 
> sleep better, if we acquired the locks in left-to-right order, to be safe.
Actually, I did not find lock coupling in the GiST code. But I decided to lock 
just two pages at once (leaf, then parent, for every pair). PFA v22 with this 
concurrency logic.
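
To make the "leaf, then parent, for every pair" order concrete, here is a
simplified sketch of that locking dance; the function name and the exact
re-checks are assumptions for illustration, not the v22 patch itself:

/*
 * Sketch only, as if inside src/backend/access/gist/gistvacuum.c.  The leaf
 * is locked before its parent, following the README rule that a child must
 * be locked before the parent.  Because no locks were held while the
 * candidate pair was collected, everything is re-checked once both locks
 * are taken.
 */
static bool
gistdeleteemptyleaf(Relation rel, BlockNumber leafblk, BlockNumber parentblk)
{
	Buffer		leafbuf;
	Buffer		parentbuf;
	Page		leafpage;
	bool		deleted = false;

	leafbuf = ReadBuffer(rel, leafblk);
	LockBuffer(leafbuf, GIST_EXCLUSIVE);
	leafpage = BufferGetPage(leafbuf);

	parentbuf = ReadBuffer(rel, parentblk);
	LockBuffer(parentbuf, GIST_EXCLUSIVE);

	/*
	 * Re-check: the page must still be an empty leaf and must not be part
	 * of an incomplete split; the parent must still hold its downlink
	 * (that lookup is not sketched here).
	 */
	if (GistPageIsLeaf(leafpage) &&
		!GistFollowRight(leafpage) &&
		PageGetMaxOffsetNumber(leafpage) == 0)
	{
		/* ... remove the downlink from parentbuf, mark leafpage deleted ... */
		deleted = true;
	}

	UnlockReleaseBuffer(parentbuf);
	UnlockReleaseBuffer(leafbuf);
	return deleted;
}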

> 
> An easy cop-out would be to use LWLockConditionalAcquire, and bail out if we 
> can't get the lock. But if it's not too complicated, it'd be nice to acquire 
> the locks in the correct order to begin with.
> 
> I did a round of cleanup and moving things around, before I bumped into the 
> above issue. I'm including them here as separate patches, for easier review, 
> but they should of course be squashed into yours. For convenience, I'm 
> including your patches here as well, so that all the patches you need are in 
> the same place, but they are identical to what you posted.
I've rebased all these patches so that VACUUM should work on every commit. 
Let's just squash them for the next iteration, it was quite a messy rebase.
BTW, you deleted numEmptyPages; this makes the code cleaner, but that variable 
could stop gistvacuum_recycle_pages() early, once everything is already recycled.


Thanks!

Best regards, Andrey Borodin.



0001-Add-block-set-data-structure-v22.patch
Description: Binary data


0002-Delete-pages-during-GiST-VACUUM-v22.patch
Description: Binary data


0003-Minor-cleanup-v22.patch
Description: Binary data


0004-Move-the-page-deletion-logic-to-separate-function-v2.patch
Description: Binary data


0005-Remove-numEmptyPages-.-it-s-not-really-needed-v22.patch
Description: Binary data


0006-Misc-cleanup-v22.patch
Description: Binary data


Re: GiST VACUUM

2019-03-11 Thread Heikki Linnakangas
imit members
+ * and checks that block set behavior is similar to Bitmapset
+ */
+static void test_blockset_bms_compliance(int32_t test_limit)
+{
+	BlockSet bs = NULL;
+	Bitmapset *bms = NULL;
+	BlockNumber blockno;
+	int index;
+	int32_t point_index = 0;
+
+	for (int32_t i = 0; i < test_limit; i++)
+	{
+		blockno = random() & INT32_MAX;
+		/* bms does not support block numbers above INT32_MAX */
+		bs = blockset_set(bs, blockno);
+		bms = bms_add_member(bms, (int)blockno);
+	}
+
+	index = -1;
+	blockno = InvalidBlockNumber;
+
+	while (true)
+	{
+		point_index++;
+		BlockNumber next_bn = blockset_next(bs, blockno);
+		int next_index = bms_next_member(bms, index);
+
+
+		if (next_bn == InvalidBlockNumber && next_index == -2)
+			return; /* We have found everything */
+
+		if (((BlockNumber)next_index) != next_bn)
+		{
+			elog(ERROR,
+ "Bitmapset returned value %X different from block set %X,"
+ " test_limit %d, point index %d",
+ next_index, next_bn, test_limit, point_index);
+		}
+
+		if (!blockset_get(next_bn, bs))
+		{
+			elog(ERROR,
+ "Block set did not found present item %X"
+ " test_limit %d, point index %d",
+ next_bn, test_limit, point_index);
+		}
+
+		index = next_index;
+		blockno = next_bn;
+	}
+
+	for (int32_t i = 0; i < test_limit; i++)
+	{
+		blockno = random() & INT32_MAX;
+		if (blockset_get(blockno, bs) != bms_is_member((int)blockno, bms))
+		{
+			elog(ERROR,
+ "Block set did agree with bitmapset item %X"
+ " test_limit %d, point index %d",
+ blockno, test_limit, point_index);
+		}
+	}
+
+	blockset_free(bs);
+	bms_free(bms);
+}
+
+/* 
+ * This test is similar to test_blockset_bms_compliance()
+ * except that it shifts BlockNumbers by one bit to ensure that blockset
+ * operates correctly on values higher that INT32_MAX
+ * This function is copy-pasted from previous with the exception of barrel
+ * shifts for BlockNumbers. I've tried various refactorings, but they all
+ * looked ugly.
+ */
+static void test_blockset_big_block_numbers(int32_t test_limit)
+{
+	BlockSet bs = NULL;
+	Bitmapset *bms = NULL;
+	BlockNumber blockno;
+	int index;
+	int32_t point_index = 0;
+
+	for (int32_t i = 0; i < test_limit; i++)
+	{
+		blockno = random() & INT32_MAX;
+		/* bms does not support block numbers above INT32_MAX */
+		bs = blockset_set(bs, blockno << 1);
+		bms = bms_add_member(bms, (int)blockno);
+	}
+
+	index = -1;
+	blockno = InvalidBlockNumber;
+
+	while (true)
+	{
+		point_index++;
+		BlockNumber next_bn = blockset_next(bs, blockno);
+		int next_index = bms_next_member(bms, index);
+
+
+		if (next_bn == InvalidBlockNumber && next_index == -2)
+			return; /* We have found everything */
+
+		if (((BlockNumber)next_index) != (next_bn >> 1))
+		{
+			elog(ERROR,
+ "Bitmapset returned value %X different from block set %X,"
+ " test_limit %d, point index %d",
+ next_index, next_bn, test_limit, point_index);
+		}
+
+		if (!blockset_get(next_bn, bs))
+		{
+			elog(ERROR,
+ "Block set did not found present item %X"
+ " test_limit %d, point index %d",
+ next_bn, test_limit, point_index);
+		}
+
+		index = next_index;
+		blockno = next_bn;
+	}
+
+	for (int32_t i = 0; i < test_limit; i++)
+	{
+		blockno = random() & INT32_MAX;
+		if (blockset_get(blockno << 1, bs) != bms_is_member((int)blockno, bms))
+		{
+			elog(ERROR,
+ "Block set did agree with bitmapset item %X"
+ " test_limit %d, point index %d",
+ blockno, test_limit, point_index);
+		}
+	}
+
+	blockset_free(bs);
+	bms_free(bms);
+}
diff --git a/src/test/modules/test_blockset/test_blockset.control b/src/test/modules/test_blockset/test_blockset.control
new file mode 100644
index 000..fdb7598c5a7
--- /dev/null
+++ b/src/test/modules/test_blockset/test_blockset.control
@@ -0,0 +1,4 @@
+comment = 'Test code for block set library'
+default_version = '1.0'
+module_pathname = '$libdir/test_blockset'
+relocatable = true
-- 
2.20.1

>From 1e477f083cd639117944c7910db8aff0c763b4f6 Mon Sep 17 00:00:00 2001
From: Andrey 
Date: Fri, 8 Mar 2019 21:24:37 +0500
Subject: [PATCH 2/6] Delete pages during GiST VACUUM

This commit teaches GiST to actually delete pages during VACUUM.

To do this we scan GiST two times. At first pass we make notes
of empty pages and internal pages. At second pass we scan through
internal pages looking for references to empty leaf pages.
---
 src/backend/access/gist/README |  14 ++
 src/backend/access/gist/gist.c |  18 ++
 src/backend/access/gist/gistutil.c |   3 +-
 src/backend/access/gist/gistvacuum.c   | 218 -
 src/backend/access/gist/gistxlog.c |  60 +++
 src/backend/access/rmgrdesc/gistdesc.c |   3 +
 src/include/access/gist.h  |   4 +
 src/include/access/gist_private.h  |   7 +-
 src/include/access/gistxlog.h  

Re: GiST VACUUM

2019-03-10 Thread Andrey Borodin

> On 5 Mar 2019, at 18:21, Heikki Linnakangas  wrote:
> 
> On 05/03/2019 02:26, Andrey Borodin wrote:
>> 
>> That's cool! I'll work on 2nd step of these patchset to make
>> blockset data structure prettier and less hacky.
> 
> Committed the first patch. Thanks for the patch!

That's cool! Thanks!

> I'll change the status of this patch to "Waiting on Author", to reflect the 
> state of the 2nd patch, since you're working on the radix tree blockset stuff.

Here's new version of the patch for actual page deletion.
Changes:
1. Fixed possible concurrency issue:
We were locking child page while holding lock on internal page. Notes in GiST 
README recommend locking child before parent.
Thus now we unlock internal page before locking children for deletion, and lock 
it again, check that everything is still on it's place and delete only then.
One thing still bothers me. Let's assume that we have internal page with 2 
deletable leaves. We lock these leaves in order of items on internal page. Is 
it possible that the 2nd page has a follow-right link to the 1st, and someone will lock the 
2nd page, try to lock 1st and deadlock with VACUUM?
2. Added radix-tree-based block set to lib, with tests.

Best regards, Andrey Borodin.


0002-Delete-pages-during-GiST-VACUUM.patch
Description: Binary data


0001-Add-block-set-data-structure.patch
Description: Binary data


Re: GiST VACUUM

2019-03-05 Thread Heikki Linnakangas

On 05/03/2019 02:26, Andrey Borodin wrote:
I also tried your amcheck tool with this. It did not report any 
errors.


Attached is also latest version of the patch itself. It is the
same as your latest patch v19, except for some tiny comment
kibitzing. I'll mark this as Ready for Committer in the commitfest
app, and will try to commit it in the next couple of days.


That's cool! I'll work on 2nd step of these patchset to make
blockset data structure prettier and less hacky.


Committed the first patch. Thanks for the patch!

I did some last minute copy-editing on the comments, and fixed one 
little thing that we missed earlier: the check for "invalid tuples" that 
might be left over in pg_upgraded pre-9.1 indexes, was lost at some 
point. I added that check back. (It would be nice if GiST indexes had a 
metadata page with a version number, so we could avoid that work in the 
99% of cases that that check is not needed, but that's a different story.)


I'll change the status of this patch to "Waiting on Author", to reflect 
the state of the 2nd patch, since you're working on the radix tree 
blockset stuff.


- Heikki



Re: GiST VACUUM

2019-03-04 Thread Andrey Borodin
Hi!

Thanks for fixing gist amcheck! The idea of using these two patches together 
seems so obvious now, but never actually visited my mind before.

> On 4 Mar 2019, at 18:04, Heikki Linnakangas  wrote:
> Thanks! As I noted at 
> https://www.postgresql.org/message-id/2ff57b1f-01b4-eacf-36a2-485a12017f6e%40iki.fi,
>  the test patch left the index corrupt. I fixed it so that it leaves behind 
> incompletely split pages, without the corruption, see attached. It's similar 
> to yours, but let me recap what it does:
> 
> * Hacks gistbuild(), create 100 empty pages immediately after the root pages. 
> They are leaked, so they won't be reused until a VACUUM sees them and 
> puts them to the FSM
> 
> * Hacks gistinserttuples(), to leave the split incomplete with 50% 
> probability
> 
> * In vacuum, print a line to the log whenever it needs to "jump left"
> 
> I used this patch, with the attached test script that's similar to yours, but 
> it also tries to verify that the index returns correct results. It prints a 
> result set like this:
> 
>   sum
> -
> -364450
>  364450
> (2 rows)
> 
> If the two numbers add up to 0, the index seems valid. And you should see 
> "RESCAN" lines in the log, to show that jumping left happened. Because the 
> behavior is random and racy, you may need to run the script many times to see 
> both "RESCAN TRIGGERED BY NSN" and "RESCAN TRIGGERED BY FollowRight" cases. 
> Especially the "FollowRight" case happens less frequently than the NSN case, 
> you might need to run the script > 10 times to see it.
Great! I've repeated your tests on my machine, I observe similar frequencies of 
causes of rescan left jumps.

> I also tried your amcheck tool with this. It did not report any errors.
> 
> Attached is also latest version of the patch itself. It is the same as your 
> latest patch v19, except for some tiny comment kibitzing. I'll mark this as 
> Ready for Committer in the commitfest app, and will try to commit it in the 
> next couple of days.

That's cool! I'll work on 2nd step of these patchset to make blockset data 
structure prettier and less hacky.

Best regards, Andrey Borodin.


Re: GiST VACUUM

2019-03-04 Thread Heikki Linnakangas

On 04/01/2019 02:47, Andrey Borodin wrote:

On 2 Jan 2019, at 20:35, Heikki Linnakangas  wrote:

In patch #1, to do the vacuum scan in physical order:
...
I think this is ready to be committed, except that I didn't do any testing. We 
discussed the need for testing earlier. Did you write some test scripts for 
this, or how have you been testing?

Please see test I used to check left jumps for v18:
0001-Physical-GiST-scan-in-VACUUM-v18-with-test-modificat.patch
0002-Test-left-jumps-v18.patch

To trigger FollowRight, GiST sometimes "forgets" to clear the follow-right marker, 
simulating a crash of an insert. This fills the logs with "fixing incomplete split" 
messages. Search for "REMOVE THIS" to disable these ill-behavior triggers.
To trigger an NSN jump, GiST allocates an empty page after every real allocation.

In log output I see
2019-01-03 22:27:30.028 +05 [54596] WARNING:  RESCAN TRIGGERED BY NSN
WARNING:  RESCAN TRIGGERED BY NSN
2019-01-03 22:27:30.104 +05 [54596] WARNING:  RESCAN TRIGGERED BY FollowRight
This means that code paths were really executed (for v18).


Thanks! As I noted at 
https://www.postgresql.org/message-id/2ff57b1f-01b4-eacf-36a2-485a12017f6e%40iki.fi, 
the test patch left the index corrupt. I fixed it so that it leaves 
behind incompletely split pages, without the corruption, see attached. 
It's similar to yours, but let me recap what it does:


* Hacks gistbuild(), create 100 empty pages immediately after the root 
pages. They are leaked, so they won't be reused until a VACUUM sees 
them and puts them to the FSM


* Hacks gistinserttuples(), to leave the split incomplete with 50% 
probability


* In vacuum, print a line to the log whenever it needs to "jump left"

I used this patch, with the attached test script that's similar to 
yours, but it also tries to verify that the index returns correct 
results. It prints a result set like this:


   sum
-
 -364450
  364450
(2 rows)

If the two numbers add up to 0, the index seems valid. And you should 
see "RESCAN" lines in the log, to show that jumping left happened. 
Because the behavior is random and racy, you may need to run the script 
many times to see both "RESCAN TRIGGERED BY NSN" and "RESCAN TRIGGERED 
BY FollowRight" cases. Especially the "FollowRight" case happens less 
frequently than the NSN case, you might need to run the script > 10 
times to see it.


I also tried your amcheck tool with this. It did not report any errors.

Attached is also latest version of the patch itself. It is the same as 
your latest patch v19, except for some tiny comment kibitzing. I'll mark 
this as Ready for Committer in the commitfest app, and will try to 
commit it in the next couple of days.


- Heikki


gist-vacuum-test.sh
Description: application/shellscript
diff --git a/src/backend/access/gist/gist.c b/src/backend/access/gist/gist.c
index 3f52b8f4dc..cad4a2a46e 100644
--- a/src/backend/access/gist/gist.c
+++ b/src/backend/access/gist/gist.c
@@ -1187,6 +1187,8 @@ gistinserttuple(GISTInsertState *state, GISTInsertStack *stack,
 			InvalidBuffer, InvalidBuffer, false, false);
 }
 
+static bool HACK = false;
+
 /* 
  * An extended workhorse version of gistinserttuple(). This version allows
  * inserting multiple tuples, or replacing a single tuple with multiple tuples.
@@ -1230,6 +1232,14 @@ gistinserttuples(GISTInsertState *state, GISTInsertStack *stack,
 	CheckForSerializableConflictIn(state->r, NULL, stack->buffer);
 
 	/* Insert the tuple(s) to the page, splitting the page if necessary */
+	if (BufferIsValid(leftchild) && HACK)
+	{
+		/* skip actually inserting */
+		splitinfo = NULL;
+		is_split = false;
+	}
+	else
+	{
 	is_split = gistplacetopage(state->r, state->freespace, giststate,
 			   stack->buffer,
 			   tuples, ntup,
@@ -1238,6 +1248,7 @@ gistinserttuples(GISTInsertState *state, GISTInsertStack *stack,
 			   ,
 			   true,
 			   state->heapRel);
+	}
 
 	/*
 	 * Before recursing up in case the page was split, release locks on the
@@ -1256,7 +1267,12 @@ gistinserttuples(GISTInsertState *state, GISTInsertStack *stack,
 	 * didn't have to split, release it ourselves.
 	 */
 	if (splitinfo)
+	{
+		if (random() % 2 == 0)
+			HACK = true;
 		gistfinishsplit(state, stack, giststate, splitinfo, unlockbuf);
+		HACK = false;
+	}
 	else if (unlockbuf)
 		LockBuffer(stack->buffer, GIST_UNLOCK);
 
diff --git a/src/backend/access/gist/gistbuild.c b/src/backend/access/gist/gistbuild.c
index bd142a3560..fdfc54b009 100644
--- a/src/backend/access/gist/gistbuild.c
+++ b/src/backend/access/gist/gistbuild.c
@@ -201,6 +201,9 @@ gistbuild(Relation heap, Relation index, IndexInfo *indexInfo)
 	buildstate.indtuples = 0;
 	buildstate.indtuplesSize = 0;
 
+	for (int i = 0; i < 100; i++)
+		ReleaseBuffer(ReadBuffer(index, P_NEW));
+
 	/*
 	 * Do the heap scan.
 	 */
diff --git a/src/backend/access/gist/gistvac

Re: GiST VACUUM

2019-03-04 Thread Heikki Linnakangas

On 04/01/2019 21:26, Andrey Borodin wrote:

On 3 Jan 2019, at 23:47, Andrey Borodin  wrote:



* Bitmapset stores 32 bit signed integers, but BlockNumber is
unsigned. So this will fail with an index larger than 2^31
blocks. That's perhaps academical, I don't think anyone will try
to create a 16 TB GiST index any time soon. But it feels wrong to
introduce an arbitrary limitation like that.


Looks like bitmapset is unsuitable for collecting block numbers due
to the interface. Let's just create custom container for this? Or
there is one other option: for each block number throw away sign
bit and consider page potentially internal and potentially empty
leaf if (blkno & 0x7FFFFFFF) is in bitmapset. And then we will have
to create at least one 17Tb GiST to check it actually works.


Heikki, what do you think: is implementing our own radix tree for this a
viable solution? I've written a working implementation with a 4-level
statically typed tree. If we follow this route, there should probably
be tests for this data structure.


Yeah, seems reasonable.

- Heikki



Re: GiST VACUUM

2019-01-04 Thread Andrey Borodin
On 3 Jan 2019, at 23:47, Andrey Borodin  wrote:
> 
>> * Bitmapset stores 32 bit signed integers, but BlockNumber is unsigned. So 
>> this will fail with an index larger than 2^31 blocks. That's perhaps 
>> academical, I don't think anyone will try to create a 16 TB GiST index any 
>> time soon. But it feels wrong to introduce an arbitrary limitation like that.
> Looks like bitmapset is unsuitable for collecting block numbers due to the 
> interface. Let's just create custom container for this?
> Or there is one other option: for each block number throw away sign bit and 
> consider page potentially internal and potentially empty leaf if (blkno & 
> 0x7FFFFFFF) is in bitmapset.
> And then we will have to create at least one 17Tb GiST to check it actually 
> works.

Heikki, what do you think: is implementing our own radix tree for this a viable 
solution?
I've written a working implementation with a 4-level statically typed tree. If we 
follow this route, there should probably be tests for this data structure.
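
For readers who have not opened the attachment, the shape of such a statically
typed 4-level tree is roughly the following; this is only an illustrative sketch
keyed on the four bytes of a BlockNumber, not the code in radix_tree.diff:

/* Sketch only: one tree level per byte of the 32-bit block number. */
typedef struct BlockSetLeaf
{
	uint64		bitmap[4];		/* 256 bits, one per low-byte value */
} BlockSetLeaf;

typedef struct BlockSetL3
{
	BlockSetLeaf *child[256];
} BlockSetL3;

typedef struct BlockSetL2
{
	BlockSetL3 *child[256];
} BlockSetL2;

typedef struct BlockSetData		/* root level */
{
	BlockSetL2 *child[256];
} BlockSetData;

typedef BlockSetData *BlockSet;

/* Membership test: four pointer hops plus one bit test. */
static bool
blockset_get(BlockNumber blkno, BlockSet bs)
{
	BlockSetL2 *l2;
	BlockSetL3 *l3;
	BlockSetLeaf *leaf;

	if (bs == NULL || (l2 = bs->child[blkno >> 24]) == NULL)
		return false;
	if ((l3 = l2->child[(blkno >> 16) & 0xFF]) == NULL)
		return false;
	if ((leaf = l3->child[(blkno >> 8) & 0xFF]) == NULL)
		return false;
	return ((leaf->bitmap[(blkno & 0xFF) >> 6] >> (blkno & 63)) & 1) != 0;
}

Intermediate nodes are allocated on demand, so dense clusters of block numbers
share nodes; the IntegerSet structure the thread eventually settles on packs
such clusters even more tightly with Simple-8b-encoded leaf items.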

Best regards, Andrey Borodin.


radix_tree.diff
Description: Binary data


Re: GiST VACUUM

2019-01-03 Thread Andrey Borodin
Cool, thanks!

> On 2 Jan 2019, at 20:35, Heikki Linnakangas  wrote:
> 
> In patch #1, to do the vacuum scan in physical order:
> ...
> I think this is ready to be committed, except that I didn't do any testing. 
> We discussed the need for testing earlier. Did you write some test scripts 
> for this, or how have you been testing?
Please see test I used to check left jumps for v18:
0001-Physical-GiST-scan-in-VACUUM-v18-with-test-modificat.patch
0002-Test-left-jumps-v18.patch


0002-Test-left-jumps-v18.patch
Description: Binary data


0001-Physical-GiST-scan-in-VACUUM-v18-with-test-modificat.patch
Description: Binary data

To trigger FollowRight, GiST sometimes "forgets" to clear the follow-right marker, 
simulating a crash of an insert. This fills the logs with "fixing incomplete split" 
messages. Search for "REMOVE THIS" to disable these ill-behavior triggers.
To trigger an NSN jump, GiST allocates an empty page after every real allocation.

In log output I see
2019-01-03 22:27:30.028 +05 [54596] WARNING:  RESCAN TRIGGERED BY NSN
WARNING:  RESCAN TRIGGERED BY NSN
2019-01-03 22:27:30.104 +05 [54596] WARNING:  RESCAN TRIGGERED BY FollowRight
This means that code paths were really executed (for v18).

> 
> Patch #2:

> 
> * Bitmapset stores 32 bit signed integers, but BlockNumber is unsigned. So 
> this will fail with an index larger than 2^31 blocks. That's perhaps 
> academical, I don't think anyone will try to create a 16 TB GiST index any 
> time soon. But it feels wrong to introduce an arbitrary limitation like that.
Looks like bitmapset is unsuitable for collecting block numbers, due to its 
interface. Let's just create a custom container for this?
Or there is one other option: for each block number, throw away the sign bit and 
consider the page potentially internal and potentially an empty leaf if 
(blkno & 0x7FFFFFFF) is in the bitmapset.
And then we will have to create at least one 17 TB GiST index to check that it 
actually works.
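
To spell out the type problem and the sign-bit workaround (illustration only;
remember_block is a made-up helper name):

/*
 * BlockNumber is an unsigned 32-bit type, while Bitmapset members are
 * signed ints, so block numbers at or above 2^31 (an index larger than
 * 16 TB at the default 8 kB page size) do not fit directly.
 */
static Bitmapset *
remember_block(Bitmapset *bms, BlockNumber blkno)
{
	/*
	 * The proposed workaround: drop the sign bit.  Two different blocks can
	 * then map to the same bit, so a set bit only ever means "possibly an
	 * internal page" / "possibly an empty leaf", never a definite yes.
	 */
	return bms_add_member(bms, (int) (blkno & 0x7FFFFFFF));
}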

> * I was surprised that the bms_make_empty() function doesn't set the 'nwords' 
> field to match the allocated size. Perhaps that was intentional, so that you 
> don't need to scan the empty region at the end, when you scan through all 
> matching bits? Still, seems surprising, and needs a comment at least.
Explicitly set nwords to zero. Looking at the code now, I do not see this 
nwords==0 as a very good idea. Probably, it's effective, but it's hacky, 
creates implicit expectations in code.

> * We're now scanning all internal pages in the 2nd phase. Not only those 
> internal pages that contained downlinks to empty leaf pages. That's probably 
> OK, but the comments need updating on that.
Adjusted the comments. The if before the loop
> if (vstate.emptyPages > 0)
seems superfluous, but I kept it until we solve the problem with the 31-bit 
bitmapset.
> * In general, comments need to be fixed and added in several places. For 
> example, there's no clear explanation on what the "delete XID", stored in 
> pd_prune_xid, means. (I know what it is, because I'm familiar with the same 
> mechanism in B-tree, but it's not clear from the code itself.)
I've added comment to GistPageSetDeleteXid()

* In this check
> if (GistPageIsDeleted(page) && 
> TransactionIdPrecedes(GistPageGetDeleteXid(page), RecentGlobalXmin))
I've switched from using RecentGlobalDataXmin to RecentGlobalXmin, because we have 
done so in the similar mechanism for GIN (for uniformity with B-tree).


Thanks for working on this!


Best regards, Andrey Borodin.




0002-Delete-pages-during-GiST-VACUUM-v19.patch
Description: Binary data


0001-Physical-GiST-scan-in-VACUUM-v19.patch
Description: Binary data


Re: GiST VACUUM

2019-01-02 Thread Heikki Linnakangas
+			(opaque->rightlink < orig_blkno))
+		{
+			recurse_to = opaque->rightlink;
+		}
 
-if (RelationNeedsWAL(rel))
-{
-	XLogRecPtr	recptr;
+		/*
+		 * Scan over all items to see which ones need deleted according to the
+		 * callback function.
+		 */
+		if (callback)
+		{
+			OffsetNumber off;
 
-	recptr = gistXLogUpdate(buffer,
-			todelete, ntodelete,
-			NULL, 0, InvalidBuffer);
-	PageSetLSN(page, recptr);
-}
-else
-	PageSetLSN(page, gistGetFakeLSN(rel));
+			for (off = FirstOffsetNumber; off <= maxoff; off = OffsetNumberNext(off))
+			{
+ItemId		iid = PageGetItemId(page, off);
+IndexTuple	idxtuple = (IndexTuple) PageGetItem(page, iid);
 
-END_CRIT_SECTION();
+if (callback(&(idxtuple->t_tid), callback_state))
+	todelete[ntodelete++] = off;
 			}
-
 		}
-		else
+
+		/*
+		 * Apply any needed deletes.  We issue just one WAL record per page,
+		 * so as to minimize WAL traffic.
+		 */
+		if (ntodelete)
 		{
-			/* check for split proceeded after look at parent */
-			pushStackIfSplited(page, stack);
+			START_CRIT_SECTION();
 
-			maxoff = PageGetMaxOffsetNumber(page);
+			MarkBufferDirty(buffer);
 
-			for (i = FirstOffsetNumber; i <= maxoff; i = OffsetNumberNext(i))
+			PageIndexMultiDelete(page, todelete, ntodelete);
+			GistMarkTuplesDeleted(page);
+
+			if (RelationNeedsWAL(rel))
 			{
-iid = PageGetItemId(page, i);
-idxtuple = (IndexTuple) PageGetItem(page, iid);
-
-ptr = (GistBDItem *) palloc(sizeof(GistBDItem));
-ptr->blkno = ItemPointerGetBlockNumber(&(idxtuple->t_tid));
-ptr->parentlsn = BufferGetLSNAtomic(buffer);
-ptr->next = stack->next;
-stack->next = ptr;
-
-if (GistTupleIsInvalid(idxtuple))
-	ereport(LOG,
-			(errmsg("index \"%s\" contains an inner tuple marked as invalid",
-	RelationGetRelationName(rel)),
-			 errdetail("This is caused by an incomplete page split at crash recovery before upgrading to PostgreSQL 9.1."),
-			 errhint("Please REINDEX it.")));
+XLogRecPtr	recptr;
+
+recptr = gistXLogUpdate(buffer,
+		todelete, ntodelete,
+		NULL, 0, InvalidBuffer);
+PageSetLSN(page, recptr);
 			}
-		}
+			else
+PageSetLSN(page, gistGetFakeLSN(rel));
 
-		UnlockReleaseBuffer(buffer);
+			END_CRIT_SECTION();
+
+			stats->tuples_removed += ntodelete;
+			/* must recompute maxoff */
+			maxoff = PageGetMaxOffsetNumber(page);
+		}
 
-		ptr = stack->next;
-		pfree(stack);
-		stack = ptr;
+		stats->num_index_tuples += maxoff - FirstOffsetNumber + 1;
 
-		vacuum_delay_point();
 	}
 
-	return stats;
+	UnlockReleaseBuffer(buffer);
+
+	/*
+	 * This is really tail recursion, but if the compiler is too stupid to
+	 * optimize it as such, we'd eat an uncomfortably large amount of stack
+	 * space per recursion level (due to the deletable[] array). A failure is
+	 * improbable since the number of levels isn't likely to be large ... but
+	 * just in case, let's hand-optimize into a loop.
+	 */
+	if (recurse_to != InvalidBlockNumber)
+	{
+		blkno = recurse_to;
+		goto restart;
+	}
 }
-- 
2.19.2

>From 62fbe0ce5506e006b92dbfb07aee7414040d982f Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas 
Date: Wed, 2 Jan 2019 16:00:16 +0200
Subject: [PATCH 2/2] Delete pages during GiST VACUUM v18-heikki

---
 src/backend/access/gist/README |  14 +++
 src/backend/access/gist/gist.c |  18 +++
 src/backend/access/gist/gistutil.c |   3 +-
 src/backend/access/gist/gistvacuum.c   | 152 -
 src/backend/access/gist/gistxlog.c |  60 ++
 src/backend/access/rmgrdesc/gistdesc.c |   3 +
 src/backend/nodes/bitmapset.c  |  16 +++
 src/include/access/gist.h  |   3 +
 src/include/access/gist_private.h  |   7 +-
 src/include/access/gistxlog.h  |  10 +-
 src/include/nodes/bitmapset.h  |   1 +
 src/test/regress/expected/gist.out |   4 +-
 src/test/regress/sql/gist.sql  |   4 +-
 13 files changed, 282 insertions(+), 13 deletions(-)

diff --git a/src/backend/access/gist/README b/src/backend/access/gist/README
index 02228662b81..c84359de310 100644
--- a/src/backend/access/gist/README
+++ b/src/backend/access/gist/README
@@ -413,6 +413,20 @@ emptied yet; tuples never move upwards in the tree. The final emptying loops
 through buffers at a given level until all buffers at that level have been
 emptied, and then moves down to the next level.
 
+Bulk delete algorithm (VACUUM)
+--
+
+Function gistbulkdelete() is responsible for marking empty leaf pages as free
+so that they can be used it allocate newly split pages. To find this pages
+function scans index in physical order.
+
+Physical scan reads the entire index from the first page to last. This scan
+maintains information necessary to collect block numbers of internal pages
+that need cleansing and block number of empty leafs.
+
+Af

Re: GiST VACUUM

2019-01-02 Thread Heikki Linnakangas

On 28/10/2018 19:32, Andrey Borodin wrote:

Hi everyone!


On 2 Oct 2018, at 6:14, Michael Paquier  wrote:
Andrey, your latest patch does not apply.  I am moving this to the next
CF, waiting for your input.


I'm doing preps for CF.
Here's rebased version.


Thanks, I had another look at these.

In patch #1, to do the vacuum scan in physical order:

* The starting NSN was not acquired correctly for unlogged and temp 
relations. They don't use WAL, so their NSN values are based on the 
'unloggedLSN' counter, rather than current WAL insert pointer. So must 
use gistGetFakeLSN() rather than GetInsertRecPtr() for them. Fixed that.


* Adjusted comments a little bit, mostly by copy-pasting the 
better-worded comments from the corresponding nbtree code, and ran pgindent.


I think this is ready to be committed, except that I didn't do any 
testing. We discussed the need for testing earlier. Did you write some 
test scripts for this, or how have you been testing?



Patch #2:

* Bitmapset stores 32 bit signed integers, but BlockNumber is unsigned. 
So this will fail with an index larger than 2^31 blocks. That's perhaps 
academical, I don't think anyone will try to create a 16 TB GiST index 
any time soon. But it feels wrong to introduce an arbitrary limitation 
like that.


* I was surprised that the bms_make_empty() function doesn't set the 
'nwords' field to match the allocated size. Perhaps that was 
intentional, so that you don't need to scan the empty region at the end, 
when you scan through all matching bits? Still, seems surprising, and 
needs a comment at least.


* We're now scanning all internal pages in the 2nd phase. Not only those 
internal pages that contained downlinks to empty leaf pages. That's 
probably OK, but the comments need updating on that.


* In general, comments need to be fixed and added in several places. For 
example, there's no clear explanation on what the "delete XID", stored 
in pd_prune_xid, means. (I know what it is, because I'm familiar with 
the same mechanism in B-tree, but it's not clear from the code itself.)


These can be fixed, they're not show-stoppers, but patch #2 isn't quite 
ready yet.


- Heikki



Re: GiST VACUUM

2018-11-30 Thread Dmitry Dolgov
> On Sun, Oct 28, 2018 at 6:32 PM Andrey Borodin  wrote:
>
> Hi everyone!
>
> > On 2 Oct 2018, at 6:14, Michael Paquier  wrote:
> > Andrey, your latest patch does not apply.  I am moving this to the next
> > CF, waiting for your input.
>
> I'm doing preps for CF.
> Here's rebased version.

Looks like this patch has been waiting for input since August. Could any of the
reviewers (Heikki?) please take a look at the latest version? In the meantime
I'm moving it to the next CF.



Re: GiST VACUUM

2018-10-28 Thread Andrey Borodin
Hi everyone!

> On 2 Oct 2018, at 6:14, Michael Paquier  wrote:
> Andrey, your latest patch does not apply.  I am moving this to the next
> CF, waiting for your input.

I'm doing preps for CF.
Here's rebased version.

Best regards, Andrey Borodin.


0002-Delete-pages-during-GiST-VACUUM-v17.patch
Description: Binary data


0001-Physical-GiST-scan-in-VACUUM-v17.patch
Description: Binary data
 



Re: GiST VACUUM

2018-10-01 Thread Michael Paquier
On Mon, Aug 06, 2018 at 11:12:00PM +0500, Andrey Borodin wrote:
> Done. Added function bms_make_empty(int size)

Andrey, your latest patch does not apply.  I am moving this to the next
CF, waiting for your input.
--
Michael


signature.asc
Description: PGP signature


Re: GiST VACUUM

2018-08-06 Thread Andrey Borodin
Hi!

PFA v16.

> On 5 Aug 2018, at 21:45, Andrey Borodin  wrote:
>> On 5 Aug 2018, at 16:18, Heikki Linnakangas  wrote:
>> 
>> Hmm. A ListCell is 16 bytes, plus the AllocChunk header, 16 bytes. 32
>> bytes per internal page in total, while a bitmap consumes one bit per page, 
>> leaf or internal. If I'm doing
>> my math right, assuming the ratio of leaf pages vs internal pages is
>> 1:200, a List actually consumes more memory than a bitmap; 256 bits per
>> internal page, vs. 200 bits. An array, with 4 bytes per block number,
>> would be the winner here.
> We do not know size of this array beforehand. I can implement normal 
> ArrayList though (with repallocing array) or linked list of chunks. Or 
> anything from data structures zoo.
> Or just stick with bitmap (my preferred way).
Done.
>> 
>>> But I have to note that default growth strategy of Bitmap is not good: we 
>>> will be repallocing byte by byte.
>> 
>> True, that repallocing seems bad. You could force it to be pre-allocated
>> by setting the last bit. Or add a function to explicitly enlarge the bitmap.
> OK, I'll think of proper resize function (ensure capacity, to be precise).
Done. Added function bms_make_empty(int size)
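
A pre-sizing constructor along those lines could look roughly like this (a
sketch written as if inside src/backend/nodes/bitmapset.c, where the private
WORDNUM and BITMAPSET_SIZE macros are in scope; the patch's actual version
differs in how it treats nwords, which is what gets debated in review):

Bitmapset *
bms_make_empty(int size)
{
	Bitmapset  *result;
	int			wordnum = WORDNUM(size - 1);	/* size must be > 0 */

	/* Allocate and zero room for bits 0 .. size-1 up front */
	result = (Bitmapset *) palloc0(BITMAPSET_SIZE(wordnum + 1));
	result->nwords = wordnum + 1;
	return result;
}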

Best regards, Andrey Borodin.


0002-Delete-pages-during-GiST-VACUUM-v16.patch
Description: Binary data


0001-Physical-GiST-scan-in-VACUUM-v16.patch
Description: Binary data


Re: GiST VACUUM

2018-08-05 Thread Andrey Borodin
Hi!

> On 5 Aug 2018, at 16:18, Heikki Linnakangas  wrote:
> 
> On 31/07/18 23:06, Andrey Borodin wrote:
>>> On a typical GiST index, what's the ratio of leaf vs. internal
>>> pages? Perhaps an array would indeed be better.
> >
>> Typical GiST has around 200 tuples per internal page. I've switched
>> to List since it's more efficient than bitmap.
> 
> Hmm. A ListCell is 16 bytes, plus the AllocChunk header, 16 bytes. 32
> bytes per internal page in total, while a bitmap consumes one bit per page, 
> leaf or internal. If I'm doing
> my math right, assuming the ratio of leaf pages vs internal pages is
> 1:200, a List actually consumes more memory than a bitmap; 256 bits per
> internal page, vs. 200 bits. An array, with 4 bytes per block number,
> would be the winner here.
We do not know the size of this array beforehand. I can implement a normal ArrayList 
(with a repalloc'd array) or a linked list of chunks, or anything else from the 
data-structures zoo.
Or just stick with the bitmap (my preferred way).
> 
>> But I have to note that default growth strategy of Bitmap is not good: we 
>> will be repallocing byte by byte.
> 
> True, that repallocing seems bad. You could force it to be pre-allocated
> by setting the last bit. Or add a function to explicitly enlarge the bitmap.
OK, I'll think of proper resize function (ensure capacity, to be precise).

Best regards, Andrey Borodin.




Re: GiST VACUUM

2018-08-05 Thread Heikki Linnakangas

On 31/07/18 23:06, Andrey Borodin wrote:

On a typical GiST index, what's the ratio of leaf vs. internal
pages? Perhaps an array would indeed be better.

>

Typical GiST has around 200 tuples per internal page. I've switched
to List since it's more efficient than bitmap.


Hmm. A ListCell is 16 bytes, plus the AllocChunk header, 16 bytes. 32
bytes per internal page in total, while a bitmap consumes one bit per 
page, leaf or internal. If I'm doing

my math right, assuming the ratio of leaf pages vs internal pages is
1:200, a List actually consumes more memory than a bitmap; 256 bits per
internal page, vs. 200 bits. An array, with 4 bytes per block number,
would be the winner here.
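
Spelling out that arithmetic, with the assumed 1:200 internal-to-leaf ratio:

  List of internal pages:  16 B ListCell + 16 B chunk header = 32 B
                           = 256 bits per internal page
  Bitmap over all pages:   1 bit x (1 internal + 200 leaves) ~ 201 bits
                           per internal page
  Array of block numbers:  4 B = 32 bits per internal page

So the bitmap narrowly beats the List, and a plain array of internal-page block
numbers is smaller than either, as long as leaves outnumber internal pages by
roughly 200:1.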

But I have to note that default growth strategy of Bitmap is not 
good: we will be repallocing byte by byte.


True, that repallocing seems bad. You could force it to be pre-allocated
by setting the last bit. Or add a function to explicitly enlarge the bitmap.

- Heikki



Re: GiST VACUUM

2018-07-31 Thread Andrey Borodin
Hi! Thanks for looking into the patch!

> On 30 Jul 2018, at 18:39, Heikki Linnakangas  wrote:
> 
> On 29/07/18 14:47, Andrey Borodin wrote:
>> Fixed both problems. PFA v14.
> 
> Thanks, took a really quick look at this.
> 
> The text being added to README is outdated for these latest changes.
Fixed.
> 
>> In second step I still use paloc's memory, but only to store two
>> bitmaps: bitmap of internal pages and bitmap of empty leafs. Second
>> physical scan only reads internal pages. I can omit that bitmap, if
>> I'll scan everything. Also, I can replace emptyLeafs bitmap with
>> array\list, but I do not really think it will be big.
> 
> On a typical GiST index, what's the ratio of leaf vs. internal pages? Perhaps 
> an array would indeed be better.
Typical GiST has around 200 tuples per internal page. I've switched to List 
since it's more efficient than a bitmap.
> If you have a really large index, the bitmaps can take a fair amount of 
> memory, on top of the memory used for tracking the dead TIDs. I.e. that 
> memory will be in addition to maintenance_work_mem. That's not nice, but I 
> think it's OK in practice, and not worth spending too much effort to 
> eliminate. For a 1 TB index with default 8k block size, the two bitmaps will 
> take 32 MB of memory in total. If you're dealing with a database of that 
> size, you ought to have some memory to spare. But if an array would use less 
> memory, that'd be better.

> 
> If you go with bitmaps, please use the existing Bitmapset instead of rolling 
> your own. Saves some code, and it provides more optimized routines for 
> iterating through all the set bits, too (bms_next_member()). Another 
> possibility would be to use Tidbitmap, in the "lossy" mode, i.e. add the 
> pages with tbm_add_page(). That might save some memory, compared to 
> Bitmapset, if the bitmap is very sparse. Not sure how it compares with a 
> plain array.
Yeah, I've stopped reinventing the wheel there. But I have to note that the default 
growth strategy of Bitmapset is not good: we will be repallocing byte by byte.

> 
> A straightforward little optimization would be to skip scanning the internal 
> pages, when the first scan didn't find any empty pages. And stop the scan of 
> the internal pages as soon as all the empty pages have been recycled.
Done.

PFA v15.

Best regards, Andrey Borodin.


0002-Delete-pages-during-GiST-VACUUM-v15.patch
Description: Binary data


0001-Physical-GiST-scan-in-VACUUM-v15.patch
Description: Binary data


Re: GiST VACUUM

2018-07-30 Thread Heikki Linnakangas

On 29/07/18 14:47, Andrey Borodin wrote:

Fixed both problems. PFA v14.


Thanks, took a really quick look at this.

The text being added to README is outdated for these latest changes.


In second step I still use paloc's memory, but only to store two
bitmaps: bitmap of internal pages and bitmap of empty leafs. Second
physical scan only reads internal pages. I can omit that bitmap, if
I'll scan everything. Also, I can replace emptyLeafs bitmap with
array\list, but I do not really think it will be big.


On a typical GiST index, what's the ratio of leaf vs. internal pages? 
Perhaps an array would indeed be better. If you have a really large 
index, the bitmaps can take a fair amount of memory, on top of the 
memory used for tracking the dead TIDs. I.e. that memory will be in 
addition to maintenance_work_mem. That's not nice, but I think it's OK 
in practice, and not worth spending too much effort to eliminate. For a 
1 TB index with default 8k block size, the two bitmaps will take 32 MB 
of memory in total. If you're dealing with a database of that size, you 
ought to have some memory to spare. But if an array would use less 
memory, that'd be better.


If you go with bitmaps, please use the existing Bitmapset instead of 
rolling your own. Saves some code, and it provides more optimized 
routines for iterating through all the set bits, too 
(bms_next_member()). Another possibility would be to use Tidbitmap, in 
the "lossy" mode, i.e. add the pages with tbm_add_page(). That might 
save some memory, compared to Bitmapset, if the bitmap is very sparse. 
Not sure how it compares with a plain array.


A straightforward little optimization would be to skip scanning the 
internal pages, when the first scan didn't find any empty pages. And 
stop the scan of the internal pages as soon as all the empty pages have 
been recycled.
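
In code, that optimization amounts to something like the following sketch; the
vacuum-state fields and the helper are assumed from the patches in this thread,
this is not the committed version:

static void
gistvacuum_recycle_pages(GistVacState *vstate, Relation rel)
{
	BlockNumber npages = RelationGetNumberOfBlocks(rel);
	BlockNumber blkno;

	/* Skip the whole second pass if the first pass saw no empty leaves */
	if (vstate->emptyPages == 0)
		return;

	for (blkno = GIST_ROOT_BLKNO; blkno < npages; blkno++)
	{
		/* Stop as soon as every empty leaf has been recycled */
		if (vstate->emptyPages == 0)
			break;

		if (!gistvacuum_is_internal_page(vstate, blkno))	/* assumed helper */
			continue;

		/*
		 * ... scan this internal page for downlinks to empty leaves,
		 * delete those leaves, and decrement vstate->emptyPages ...
		 */
	}
}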


- Heikki



Re: GiST VACUUM

2018-07-29 Thread Andrey Borodin
Hi!

Thank you!

> On 29 Jul 2018, at 14:04, Thomas Munro  wrote:
> 
> On Tue, Jul 24, 2018 at 6:04 AM, Andrey Borodin  wrote:
>>> On 21 Jul 2018, at 17:11, Andrey Borodin  wrote:
>>> <0001-Physical-GiST-scan-in-VACUUM-v13.patch>
>> 
>> Just in case, here's second part of patch series with actual page deletion.
> 
> Hi Andrey,
> 
> Cfbot says:
> 
> https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.7146
> 
> That's because you declared a new variable after some other
> statements.  You can't do that in old school C89.
Yep, mismerged patch steps and created misplaced declaration

> https://travis-ci.org/postgresql-cfbot/postgresql/builds/409401951
> 
> That segfaulted here:
> 
> #0 0x004ab620 in setinternal (state=,
> state=, blkno=368) at gistvacuum.c:44
> 44 state->internalPagesMap[blkno / 8] |= 1 << (blkno % 8);
> 
> internalPagesMap was NULL, or pointed to memory that was too small and
> happened to be near an unmapped region (unlikely?), or was some other
> corrupted address?
Yes, there was a conditionally uninitialized variable, mapNumPages. I do not know 
why it didn't trigger a failure last time, but now it reproduces reliably on my 
machine.
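
The initialization pattern that avoids this class of crash is to size and zero
both maps unconditionally, before any bit can ever be set; roughly (the field
names follow the backtrace above, the exact struct layout is an assumption):

static void
gistvacuum_init_page_maps(GistVacState *vstate, Relation rel)
{
	BlockNumber npages = RelationGetNumberOfBlocks(rel);

	/* one bit per page, rounded up to whole bytes, always allocated */
	vstate->mapNumPages = npages;
	vstate->internalPagesMap = palloc0((npages + 7) / 8);
	vstate->emptyLeafPagesMap = palloc0((npages + 7) / 8);
}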

Fixed both problems. PFA v14.

Best regards, Andrey Borodin.


0002-Delete-pages-during-GiST-VACUUM-v14.patch
Description: Binary data


0001-Physical-GiST-scan-in-VACUUM-v14.patch
Description: Binary data


Re: GiST VACUUM

2018-07-29 Thread Thomas Munro
On Tue, Jul 24, 2018 at 6:04 AM, Andrey Borodin  wrote:
>> On 21 Jul 2018, at 17:11, Andrey Borodin  wrote:
>> <0001-Physical-GiST-scan-in-VACUUM-v13.patch>
>
> Just in case, here's second part of patch series with actual page deletion.

Hi Andrey,

Cfbot says:

https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.7146

That's because you declared a new variable after some other
statements.  You can't do that in old school C89.

https://travis-ci.org/postgresql-cfbot/postgresql/builds/409401951

That segfaulted here:

#0 0x004ab620 in setinternal (state=,
state=, blkno=368) at gistvacuum.c:44
44 state->internalPagesMap[blkno / 8] |= 1 << (blkno % 8);

internalPagesMap was NULL, or pointed to memory that was too small and
happened to be near an unmapped region (unlikely?), or was some other
corrupted address?

-- 
Thomas Munro
http://www.enterprisedb.com



Re: GiST VACUUM

2018-07-23 Thread Andrey Borodin
Hi!

> On 21 Jul 2018, at 17:11, Andrey Borodin  wrote:
> 
> <0001-Physical-GiST-scan-in-VACUUM-v13.patch>

Just in case, here's second part of patch series with actual page deletion.

I was considering further decreasing memory footprint by using bloom filters 
instead of bitmap, but it will create seriously more work for cpu to compute 
hashes.

Best regards, Andrey Borodin.


0002-Delete-pages-during-GiST-VACUUM-v13.patch
Description: Binary data


0001-Physical-GiST-scan-in-VACUUM-v13.patch
Description: Binary data


Re: GiST VACUUM

2018-07-21 Thread Andrey Borodin
Hi!

> On 19 Jul 2018, at 23:26, Andrey Borodin  wrote:
> 
> I'm working on triggering left split during vacuum. Will get back when done. 
> Thanks!

Here's patch including some messy hacks to trigger NSN and FollowRight jumps 
during VACUUM.

To trigger FollowRight, GiST sometimes "forgets" to clear the follow-right marker, 
simulating a crash of an insert. This fills the logs with "fixing incomplete split" 
messages. Search for "REMOVE THIS" to disable these ill-behavior triggers.
To trigger an NSN jump, GiST allocates an empty page after every real allocation.

gistvacuumcleanup() was constantly generating left jumps because 0 was used 
instead of the real start NSN, so I moved NSN acquisition to gistvacuumscan(). 
Also fixed some comments.
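
For reference, the rescan ("jump left") condition being tuned here boils down
to roughly this check in the per-page vacuum routine (simplified; the helper
name is invented for the sketch):

static BlockNumber
gist_left_jump_target(GistVacState *vstate, Page page, BlockNumber orig_blkno)
{
	GISTPageOpaque opaque = GistPageGetOpaque(page);

	/*
	 * Rescan via the rightlink only if the page shows signs of having been
	 * split after the scan noted startNSN (or still carries the
	 * follow-right marker of an incomplete split), and only if the right
	 * sibling lives at a lower block number than the block this scan
	 * started from.
	 */
	if ((GistFollowRight(page) ||
		 vstate->startNSN < GistPageGetNSN(page)) &&
		opaque->rightlink != InvalidBlockNumber &&
		opaque->rightlink < orig_blkno)
		return opaque->rightlink;

	return InvalidBlockNumber;	/* no left jump needed */
}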

gistvacuumcleanup() will have same effect as gistbulkdelete(), is it OK?

To reproduce left-jumps run ./rescantest.sh
Script contain variables for my local paths.


Best regards, Andrey Borodin.


0001-Physical-GiST-scan-in-VACUUM-v13.patch
Description: Binary data


Re: GiST VACUUM

2018-07-19 Thread Andrey Borodin



> On 19 Jul 2018, at 16:28, Heikki Linnakangas  wrote:
> Hmm. So, while we are scanning the right sibling, which was moved to 
> lower-numbered block because of a concurrent split, the original page is 
> split again? That's OK, we've already scanned all the tuples on the original 
> page, before we recurse to deal with the right sibling. (The corresponding 
> B-tree code also releases the lock on the original page when recursing)
Seems right.

> 
> I did some refactoring, to bring this closer to the B-tree code, for the sake 
> of consistency. See attached patch. This also eliminates the 2nd pass by 
> gistvacuumcleanup(), in case we did that in the bulkdelete-phase already.
Thanks!

> 
> There was one crucial thing missing: in the outer loop, we must ensure that 
> we scan all pages, even those that were added after the vacuum started.
Correct. There is quite neat logic behind the order of acquiring npages, comparing, 
and vacuuming the page. The notes in the FIXMEs look correct, except for the function names.

> There's a comment explaining that in btvacuumscan(). This version fixes that.
> 
> I haven't done any testing on this. Do you have any test scripts you could 
> share?
I use just a simple test that sets up replication and does random inserts and 
vacuums. Nothing fancy, just a mutated script:
for i in $(seq 1 12); do
    size=$((100 * 2**$i))
    ./psql postgres -c "create table x as select cube(random()) c from generate_series(1,$size) y; create index on x using gist(c);"
    ./psql postgres -c "delete from x;"
    ./psql postgres -c "VACUUM x;"
    ./psql postgres -c "VACUUM x;"
    ./psql postgres -c "drop table x;"
    ./psql postgres -c "create table x as select cube(random()) c from generate_series(1,$size) y; create index on x using gist(c);"
    ./psql postgres -c "delete from x where (c~>1)>0.1;"
    ./psql postgres -c "VACUUM x;"
    ./psql postgres -c "insert into x select cube(random()) c from generate_series(1,$size) y;"
    ./psql postgres -c "VACUUM x;"
    ./psql postgres -c "delete from x where (c~>1)>0.1;"
    ./psql postgres -c "select pg_size_pretty(pg_relation_size('x_c_idx'));"
    ./psql postgres -c "VACUUM FULL x;"
    ./psql postgres -c "select pg_size_pretty(pg_relation_size('x_c_idx'));"
    ./psql postgres -c "drop table x;"
done

> I think we need some repeatable tests for the concurrent split cases.
It is hard to trigger left splits until we delete pages. I'll try to hack 
gistNewBuffer() to cause something similar.

> Even if it involves gdb or some other hacks that we can't include in the 
> regression test suite, we need something now, while we're hacking on this.
> 
> One subtle point, that I think is OK, but gave me a pause, and probably 
> deserves comment somewhere: A concurrent root split can turn a leaf page into 
> one internal (root) page, and two new leaf pages. The new root page is placed 
> in the same block as the old page, while both new leaf pages go to freshly 
> allocated blocks. If that happens while vacuum is running, might we miss the 
> new leaf pages? As the code stands, we don't do the "follow-right" dance on 
> internal pages, so we would not recurse into the new leaf pages. At first, I 
> thought that's a problem, but I think we can get away with it. The only 
> scenario where a root split happens on a leaf page, is when the index has 
> exactly one page, a single leaf page. Any subsequent root splits will split 
> an internal page rather than a leaf page, and we're not bothered by those. In 
> the case that a root split happens on a single-page index, we're OK, because 
> we will always scan that page either before, or after the split. If we scan 
> the single page before the split, we see all the leaf tuples on that page. If 
> we scan the single page after the split, it means that we start the scan 
> after the split, and we will see both leaf pages as we continue the scan.
Yes, only page 0 may change type, and page 0 cannot split to left.


I'm working on triggering left split during vacuum. Will get back when done. 
Thanks!

Best regards, Andrey Borodin.


Re: GiST VACUUM

2018-07-19 Thread Heikki Linnakangas

On 19/07/18 14:42, Andrey Borodin wrote:


19.07.2018, 15:20, "Heikki Linnakangas" :


On 19/07/18 13:52, Andrey Borodin wrote:

 Hi!

 On 19 Jul 2018, at 1:12, Heikki Linnakangas  wrote:

 Yeah, please, I think this is the way to go.


 Here's v11 divided into proposed steps.


Thanks, one quick question:

 /* We should not unlock buffer if we are going to jump left */
 if (needScan)
 {
     GistBDItem *ptr = (GistBDItem *) palloc(sizeof(GistBDItem));
     ptr->buffer = buffer;
     ptr->next = bufferStack;
     bufferStack = ptr;
 }
 else
     UnlockReleaseBuffer(buffer);


Why? I don't see any need to keep the page locked, when we "jump left".


Because it can split to the left again, given that we release lock.


Hmm. So, while we are scanning the right sibling, which was moved to 
lower-numbered block because of a concurrent split, the original page is 
split again? That's OK, we've already scanned all the tuples on the 
original page, before we recurse to deal with the right sibling. (The 
corresponding B-tree code also releases the lock on the original page 
when recursing)


I did some refactoring, to bring this closer to the B-tree code, for the 
sake of consistency. See attached patch. This also eliminates the 2nd 
pass by gistvacuumcleanup(), in case we did that in the bulkdelete-phase 
already.


There was one crucial thing missing: in the outer loop, we must ensure 
that we scan all pages, even those that were added after the vacuum 
started. There's a comment explaining that in btvacuumscan(). This 
version fixes that.
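
The shape of that outer loop, mirroring btvacuumscan(), is roughly the following
sketch (the real code also locks against concurrent relation extension before
re-reading the length):

static void
gistvacuumscan_sketch(IndexVacuumInfo *info, GistVacState *vstate)
{
	Relation	rel = info->index;
	BlockNumber num_pages;
	BlockNumber blkno = GIST_ROOT_BLKNO;

	for (;;)
	{
		/* Re-read the relation length on every outer iteration ... */
		num_pages = RelationGetNumberOfBlocks(rel);

		/* ... and quit only when no new pages have appeared */
		if (blkno >= num_pages)
			break;

		for (; blkno < num_pages; blkno++)
			gistvacuumpage(vstate, blkno, blkno);
	}
}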


I haven't done any testing on this. Do you have any test scripts you 
could share? I think we need some repeatable tests for the concurrent 
split cases. Even if it involves gdb or some other hacks that we can't 
include in the regression test suite, we need something now, while we're 
hacking on this.


One subtle point, that I think is OK, but gave me a pause, and probably 
deserves comment somewhere: A concurrent root split can turn a leaf page 
into one internal (root) page, and two new leaf pages. The new root page 
is placed in the same block as the old page, while both new leaf pages 
go to freshly allocated blocks. If that happens while vacuum is running, 
might we miss the new leaf pages? As the code stands, we don't do the 
"follow-right" dance on internal pages, so we would not recurse into the 
new leaf pages. At first, I thought that's a problem, but I think we can 
get away with it. The only scenario where a root split happens on a leaf 
page, is when the index has exactly one page, a single leaf page. Any 
subsequent root splits will split an internal page rather than a leaf 
page, and we're not bothered by those. In the case that a root split 
happens on a single-page index, we're OK, because we will always scan 
that page either before, or after the split. If we scan the single page 
before the split, we see all the leaf tuples on that page. If we scan 
the single page after the split, it means that we start the scan after 
the split, and we will see both leaf pages as we continue the scan.


- Heikki
>From 9978fd22dd7b52b1b3f509f53fbafa505f68b573 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas 
Date: Thu, 19 Jul 2018 15:25:58 +0300
Subject: [PATCH 1/1] Physical GiST scan in VACUUM v12

---
 src/backend/access/gist/gistvacuum.c | 431 ---
 1 file changed, 244 insertions(+), 187 deletions(-)

diff --git a/src/backend/access/gist/gistvacuum.c b/src/backend/access/gist/gistvacuum.c
index 5948218c77..180cc6c63a 100644
--- a/src/backend/access/gist/gistvacuum.c
+++ b/src/backend/access/gist/gistvacuum.c
@@ -21,6 +21,38 @@
 #include "storage/indexfsm.h"
 #include "storage/lmgr.h"
 
+/* Working state needed by gistbulkdelete */
+typedef struct
+{
+	IndexVacuumInfo *info;
+	IndexBulkDeleteResult *stats;
+	IndexBulkDeleteCallback callback;
+	void	   *callback_state;
+	GistNSN		startNSN;
+	BlockNumber totFreePages;	/* true total # of free pages */
+} GistVacState;
+
+static void gistvacuumscan(IndexVacuumInfo *info, IndexBulkDeleteResult *stats,
+			   IndexBulkDeleteCallback callback, void *callback_state,
+			   GistNSN startNSN);
+static void gistvacuumpage(GistVacState *vstate, BlockNumber blkno,
+			   BlockNumber orig_blkno);
+
+IndexBulkDeleteResult *
+gistbulkdelete(IndexVacuumInfo *info, IndexBulkDeleteResult *stats,
+			   IndexBulkDeleteCallback callback, void *callback_state)
+{
+	GistNSN		startNSN;
+
+	/* allocate stats if first time through, else re-use existing struct */
+	if (stats == NULL)
+		stats = (IndexBulkDeleteResult *) 

Re: GiST VACUUM

2018-07-19 Thread Andrey Borodin
On 19.07.2018 at 15:20, Heikki Linnakangas wrote:

 On 19/07/18 13:52, Andrey Borodin wrote:

  Hi!

  On 19 July 2018, at 01:12, Heikki Linnakangas wrote:

  Yeah, please, I think this is the way to go.

  Here's v11 divided into proposed steps.

 Thanks, one quick question:

  /* We should not unlock buffer if we are going to jump left */
  if (needScan)
  {
      GistBDItem *ptr = (GistBDItem *) palloc(sizeof(GistBDItem));

      ptr->buffer = buffer;
      ptr->next = bufferStack;
      bufferStack = ptr;
  }
  else
      UnlockReleaseBuffer(buffer);

 Why? I don't see any need to keep the page locked, when we "jump left".

Because it can split to the left again, given that we release the lock.

Best regards, Andrey Borodin.

Re: GiST VACUUM

2018-07-19 Thread Heikki Linnakangas

On 19/07/18 13:52, Andrey Borodin wrote:

Hi!


On 19 July 2018, at 01:12, Heikki Linnakangas wrote:

Yeah, please, I think this is the way to go.


Here's v11 divided into proposed steps.


Thanks, one quick question:


/* We should not unlock buffer if we are going to jump left */
if (needScan)
{
    GistBDItem *ptr = (GistBDItem *) palloc(sizeof(GistBDItem));

    ptr->buffer = buffer;
    ptr->next = bufferStack;
    bufferStack = ptr;
}
else
    UnlockReleaseBuffer(buffer);


Why? I don't see any need to keep the page locked, when we "jump left".

- Heikki



Re: GiST VACUUM

2018-07-19 Thread Andrey Borodin
Hi!

> On 19 July 2018, at 01:12, Heikki Linnakangas wrote:
> 
> Yeah, please, I think this is the way to go.

Here's v11 divided into proposed steps.

In the second step I still use palloc'd memory, but only to store two bitmaps: 
a bitmap of internal pages and a bitmap of empty leaf pages. The second physical 
scan reads only the internal pages. I can omit that bitmap if I simply scan everything.
Also, I can replace the emptyLeafs bitmap with an array/list, but I do not 
think it will grow large.
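
A rough sketch of that bookkeeping, just to make it concrete (the names 
here are made up for illustration, not the names in the attached patch):

#include "nodes/bitmapset.h"

typedef struct
{
	Bitmapset  *internalPages;	/* blocks that hold internal pages */
	Bitmapset  *emptyLeafPages; /* blocks that hold completely empty leaves */
} GistEmptyPageState;

static void
remember_page(GistEmptyPageState *state, BlockNumber blkno, Page page)
{
	if (!GistPageIsLeaf(page))
		state->internalPages = bms_add_member(state->internalPages, (int) blkno);
	else if (PageGetMaxOffsetNumber(page) == 0)
		state->emptyLeafPages = bms_add_member(state->emptyLeafPages, (int) blkno);
}

The second pass then visits only the blocks in internalPages and checks 
which downlinks point into emptyLeafPages.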

Anyway, I propose focusing on the first step.

Best regards, Andrey Borodin.


0002-Delete-pages-during-GiST-VACUUM-v11.patch
Description: Binary data


0001-Physical-GiST-scan-in-VACUUM-v11.patch
Description: Binary data


Re: GiST VACUUM

2018-07-18 Thread Heikki Linnakangas

On 18/07/18 21:27, Andrey Borodin wrote:

Hi!


On 18 July 2018, at 16:02, Heikki Linnakangas wrote:

, but I think it would be better to split this into two patches as
follows:

1st patch: Scan the index in physical rather than logical order. No
attempt at deleting empty pages yet.

2nd patch: Add support for deleting empty pages.

I would be more comfortable reviewing and committing that first
patch, which just switches to doing physical-order scan, first.


This seems like a very disproportionate division of complexity. The first
patch (PFA) is very simple. All the work is done in one pass, without
remembering anything. Actually, you do not even need to rescan
rightlinks: there can be no splits to the left when no pages are
deleted.


Heh, good point.

I googled around and bumped into an older patch to do this: 
https://www.postgresql.org/message-id/1135121410099068%40web30j.yandex.ru. 
Unfortunately, Костя never got around to updating the patch, and it was 
forgotten. But the idea seemed sound even back then.


As noted in that thread, there might be deleted pages in the index in 
some rare circumstances, even though we don't recycle empty pages: if 
the index was upgraded from a very old version, as VACUUM FULL used to 
recycle empty pages, or if you crash just when extending the index, and 
end up with newly-initialized but unused pages that way. So we do need 
to handle the concurrent split scenario, even without empty page recycling.
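
A sketch of how the scan can cope with such pages (names follow the 
physical-scan structure discussed elsewhere in this thread; illustrative 
only, not the final code):

/* Inside the per-page routine, after reading and locking the buffer */
if (PageIsNew(page) || GistPageIsDeleted(page))
{
	/* Nothing to vacuum here, but report the block as reusable space */
	vstate->totFreePages++;
	RecordFreeIndexPage(info->index, blkno);
	UnlockReleaseBuffer(buffer);
	return;
}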



If you think it is the proper way to go - OK, I'll prepare a
better version of the attached diff (by omitting tail recursion and
adding more comments).


Yeah, please, I think this is the way to go.

- Heikki


