Re: pgsql: Optimize btree insertions for common case of increasing values

2018-04-10 Thread Pavan Deolasee
On Tue, Apr 10, 2018 at 1:18 AM, Peter Geoghegan <p...@bowt.ie> wrote:

> On Thu, Apr 5, 2018 at 10:16 AM, Peter Geoghegan <p...@bowt.ie> wrote:
> > I think you can take that wording almost verbatim. Obviously it should
> > refer to the optimization by name, and blend into the surrounding text
> > in the README. I suggest putting a small section before "On-the-Fly
> > Deletion Of Index Tuples", but after the main discussion of deletion +
> > recycling. It's essentially an exception to the general rule, so that
> > placement makes sense to me.
>
> I also think that we should also say something about extent-based
> storage. This optimization relies on the assumption that reading some
> stale block cannot read a block from some other relation (which could
> perhaps be its own rightmost leaf page). If we ever wanted to share
> storage between small relations as extents, that would invalidate the
> optimization.
>
> This came up recently on the "PostgreSQL's handling of fsync() errors
> is unsafe and risks data loss at least on XFS" thread, and what I
> describe is in fact how other database systems manage storage, so this
> seems like a real practical consideration.
>
>
Hmm. I am a bit confused why we want to mention anything about something
we're not even considering seriously, let alone any patch or work in that
direction. If we at all change everything to use extent based storage,
there would be many other things that will break and require changes, no?

 Apart from that, other changes requested are included in the patch. This
also takes care of Heikki's observation regarding UNLOGGED tables on the
other thread.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


0001-Post-commit-cleanup-for-B-Tree-optimization-work.patch
Description: Binary data


Re: pgsql: Fix comment on B-tree insertion fastpath condition.

2018-04-10 Thread Pavan Deolasee
On Tue, Apr 10, 2018 at 7:28 PM, Heikki Linnakangas <
heikki.linnakan...@iki.fi> wrote:

> Fix comment on B-tree insertion fastpath condition.
>
> The comment earlier in the function correctly states "and the insertion
> key is strictly greater than the first key in this page". That is what
> we check here, not "greater than or equal".
>
>
This was part of the follow-on patch that I'd sent (which needs a little
more adjustments based on Peter's feedback). But thanks for taking care of
it anyways.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: pgsql: Optimize btree insertions for common case of increasing values

2018-04-05 Thread Pavan Deolasee
On Wed, Apr 4, 2018 at 10:46 PM, Peter Geoghegan <p...@bowt.ie> wrote:

>
>
> > You mean passing "fastpath" to _bt_insertonpg and then checking it's
> false
> > if page split is needed? But isn't page split is only needed if the page
> > doesn't have enough free space? If so, we have checked for that before
> > setting "fastpath".
>
> That's not exactly what I meant. I meant that if:
>
> 1. This is an insertion to the leaf level in _bt_insertonpg().
>
> and
>
> 2. We don't have space on the page, and so must do a split (or at
> least free LP_DEAD items).
>
> and
>
> 3. RelationGetTargetBlock(rel) != InvalidBlockNumber
>
> There should be an assertion failure. This new assertion within
> _bt_insertonpg() makes it much less likely that the assumption breaks
> later.
>

I came up with something like this:

+   /*
+* If we're here then a pagesplit is needed. We should
never reach here
+* if we're using the fastpath since we should have checked
for all the
+* required conditions, including the fact that this page
has enough
+* freespace. Note that this routine can in-theory deal
with the
+* situation where a NULL stack pointer is passed (that's
what would
+* happen if the fastpath is taken), like it does during
crash
+* recovery. But that path is much slower, defeating the
very purpose
+* of the optimisation.  The following assertion should
protect us from
+* any future code changes that invalidate those
assumptions.
+*
+* Note the fact that whenever we fail to take the
fastpath, we clear
+* the cached block. So checking for a valid cached block
at this point
+* is enough to decide whether we're in a fastpath or not.
+*/
+   Assert(!(P_ISLEAF(lpageop) &&
+
 BlockNumberIsValid(RelationGetTargetBlock(rel;
+

But then I started thinking can _bt_insertonpg() be called from a code path
which doesn't start at _bt_doinsert()? I traced one such path
_bt_getstackbuf() -> _bt_finish_split() -> _bt_insert_parent(), but that
can only happen in case of a non-leaf page. The assertion which checks for
a leaf page, should be fine, right?


>
> Finally, I'd like to see a small paragraph in the nbtree README, about
> the high level theory behind this optimization and page recycling. The
> assumption is that there can only be one non-ignorable leaf rightmost
> page, and so even a RecentGlobalXmin style interlock isn't required.
> We cannot fail to detect that our hint was invalidated, because there
> can only be one such page in the B-Tree at any time. It's possible
> that the page will be deleted and recycled without a backend's cached
> page also being detected as invalidated, but only when we happen to
> recycle a page that once again becomes the rightmost leaf page.
>
>
Can I borrow that almost verbatim, after adding details about the
optimisation itself? Looks like I'm too tired to think sensibly.


> Once those changes are made, this should be fine to commit.
>

Ok, thanks.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: pgsql: New files for MERGE

2018-04-04 Thread Pavan Deolasee
On Thu, Apr 5, 2018 at 12:16 AM, Andres Freund <and...@anarazel.de> wrote:

> Hi,
>
> On 2018-04-05 00:02:06 +0530, Pavan Deolasee wrote:
> > Apologies from my end. Simon checked with me regarding your referenced
> > email. I was in the middle of responding to it (with a add-on patch to
> take
> > care of your review comments), but got side tracked by some high priority
> > customer escalation. I shall respond soon.
>
> Hows that an explanation for just going ahead and committing? Without
> even commenting on why one thinks the pointed out issues are something
> that can be resolved later or somesuch?  This has an incredibly rushed
> feel to it.
>

While I don't want to answer that on Simon's behalf, my feeling is that he
may not seen your email since it came pretty late. He had probably planned
to commit the patch again first thing in the morning with the fixes I'd
sent.

Anyways, I think your reviews comments are useful and I've incorporated
most of those. Obviously certain things like creating a complete new
executor machinery is not practical given where we're in the release cycle
and I am not sure if that has any significant advantages over what we have
today.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: pgsql: New files for MERGE

2018-04-04 Thread Pavan Deolasee
On Wed, Apr 4, 2018 at 10:40 PM, Andres Freund <and...@anarazel.de> wrote:

> Hi,
>
> On 2018-04-03 08:32:45 -0700, Andres Freund wrote:
> > Hi,
> >
> > On 2018-04-03 09:24:12 +, Simon Riggs wrote:
> > > New files for MERGE
> > > src/backend/executor/nodeMerge.c   |  575 +++
> > > src/backend/parser/parse_merge.c   |  660 
> > > src/include/executor/nodeMerge.h   |   22 +
> > > src/include/parser/parse_merge.h   |   19 +
> >
> > Getting a bit grumpy here.  So you pushed this, without responding in
> > any way to the objections I made in
> > http://archives.postgresql.org/message-id/20180403021800.
> b5nsgiclzanobiup%40alap3.anarazel.de
> > and did it in a manner that doesn't even compile?
>
> This needs at the very least a response to the issues pointed out in the
> referenced email that you chose to ignore without any sort of comment.
>
>
Apologies from my end. Simon checked with me regarding your referenced
email. I was in the middle of responding to it (with a add-on patch to take
care of your review comments), but got side tracked by some high priority
customer escalation. I shall respond soon.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: pgsql: Optimize btree insertions for common case of increasing values

2018-04-04 Thread Pavan Deolasee
On Thu, Mar 29, 2018 at 4:39 AM, Peter Geoghegan <p...@bowt.ie> wrote:

>
>
> Suggested next steps to deal with this:
>
> * A minor point: I don't think you should call
> RelationSetTargetBlock() when the page P_ISROOT(), which, as I
> mentioned, is a condition that can coexist with P_ISLEAF() with very
> small B-Trees. There can be no point in doing so. No need to add
> P_ISROOT() to the main "Is cached page stale?" test within
> _bt_doinsert(), though.
>

Ok. Adding a check for tree height and setting target block only if it's >=
2, as suggested by you and Simon later. Rahila helped me also ran another
round of tests and this does not lead to any performance regression (we
were worried about whether calling _bt_getrootheight will be expensive).

Also moved RelationSetTargetBlock() to a point where we are not holding any
locks and are outside the critical section.


>
> * An assertion would make me feel a lot better about depending on not
> having a page split from a significant distance.
>

Ok. I assume you mean an assertion to check that the target page doesn't
have an in-complete split. Added that though not sure if it's useful since
we concluded that right-most page can't have in-complete split.

Let me know if you mean something else.


> Your optimization should continue to not be used when it would result
> in a page split, but only because that would be slow. The comments
> should say so, IMV.


Added.


> Also, _bt_insertonpg() should have an assertion
> against a page split actually occurring when the optimization was
> used, just in case. When !BufferIsValid(cbuf), we know that we're
> being called from _bt_doinsert() (see existing assertion at the top of
> _bt_insertonpg() as an example of this), so at the point where it's
> clear a page split is needed, we should assert that there is no target
> block that we must have been passed as the target page.
>
>
You mean passing "fastpath" to _bt_insertonpg and then checking it's false
if page split is needed? But isn't page split is only needed if the page
doesn't have enough free space? If so, we have checked for that before
setting "fastpath".


> * The indentation of the main _bt_doinsert() test does not follow
> project guidelines. Please tweak that, too.
>

Ok. Fixed.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


pg_btree_target_block_v4_delta3.patch
Description: Binary data


Re: pgsql: Modified files for MERGE

2018-04-02 Thread Pavan Deolasee
On Tue, Apr 3, 2018 at 2:06 AM, Andres Freund <and...@anarazel.de> wrote:

>
> This seems to have turned several animals red:
> - https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=
> thrips=2018-04-02%2020%3A27%3A28
>   use of uint, which isn't a portable type.
>

Right.


> - https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=
> calliphoridae=2018-04-02%2020%3A20%3A01
>   looks like copy/read/out/equalfuncs aren't properly filled out
>
>
Looks like we missed handling of new MergeAction node in
raw_expression_tree_walker

I will send a add-on patch to fix whatever I could spot before the patch
was reverted. I see Andres has already started discussion about displaying
memory usage which was one other thing I'd spotted while the build farm was
red.

Thanks,
Pavan


-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services