Re: PATCH: psql tab completion for SELECT

2021-10-13 Thread Edmund Horner
On Fri, 8 Oct 2021 at 20:01, David Fetter  wrote:

> On Sat, Sep 22, 2018 at 06:53:55PM +1200, Edmund Horner wrote:
> > Hi all,
> >
> > Here are some rebased versions of the last two patches.  No changes in
> > functionality, except a minor case sensitivity fix in the "completion
> > after commas" patch.
> >
> > Edmund
>
> I've rebased and updated these to be more principled about what
> functions could be tab completed.
>
> Still missing: tests.
>
> What precisely is this supposed to do?


Hi David,

The main patch 0001 was to add completion for "SELECT " using
attributes, functions, and a couple of hard-coded options.
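
For illustration, a hypothetical session (the completion candidates depend
entirely on what is in the catalogs):

postgres=# SELECT relna<TAB>
relname       relnamespace  relnatts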

The changes to _complete_from_query were so that simple_query (when used in
addon mode, as for this feature) could have the current word interpolated
into it.  This feature uses a schema query to get the function names, as
they can be schema qualified, and an addon to get the column names, as they
can't (they could be table-qualified, but that's hard when you don't know
what the table aliases will be).  Previously existing queries using addons
did not need the current word interpolated into them, but this one does,
hence the change.

The second patch 0002 was to build on 0001 and add support for pressing
TAB after a comma while in the column list, i.e. when SELECT was the most
recent major keyword (major being defined by the list of keywords in
CurrentQueryPartMatches).

There's a bit of trickery around completing "," by adding a single space.
Without the space, the next TAB will try to complete using the whole word
including the part before the comma.
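
For example (again a hypothetical session):

postgres=# SELECT relname,<TAB>     -- readline treats "relname," as the word to complete
postgres=# SELECT relname, <TAB>    -- with the trailing space, completion starts a fresh word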

Regarding testing, I can't see any automated tests for tab completion.
Though it seems to me we could do it with a readline driver program of some
sort, if the current regression test scripts don't work interactively.

Cheers,
Edmund


Re: Tid scan improvements

2021-06-07 Thread Edmund Horner
On Mon, 7 Jun 2021 at 22:11, Peter Eisentraut <peter.eisentr...@enterprisedb.com> wrote:

> This patch didn't add _outTidRangePath() even though we have outNode()
> coverage for most/all path nodes.  Was this just forgotten?  See
> attached patch.
>

Yes, it looks like an omission.  Thanks for spotting it.  Patch looks good
to me.

Edmund


Re: Tid scan improvements

2021-01-12 Thread Edmund Horner
On Fri, 1 Jan 2021 at 14:30, David Fetter  wrote:

> On Sun, Dec 01, 2019 at 11:34:16AM +0900, Michael Paquier wrote:
> > Okay, still nothing has happened after two months.  Once this is
> > solved a new patch submission could be done.  For now I have marked
> > the entry as returned with feedback.
>
> I dusted off the last version of the patch without implementing the
> suggestions in this thread between here and there.
>
> I think this is a capability worth having, as I was surprised when it
> turned out that it didn't exist when I was looking to make an
> improvement to pg_dump. My idea, which I'll get back to when this is
> in, was to use special knowledge of heap AM tables to make it possible
> to parallelize dumps of large tables by working separately on each
> underlying file, of which there could be quite a few for a large one.
>
> Will try to understand the suggestions upthread better and implement
> same.
>

Hi David,

Thanks for updating the patch.  I'd be very happy if this got picked up
again, and I'd certainly follow along and do some review.

+   splan->tidrangequals =
+       fix_scan_list(root, splan->tidrangequals,
+                     rtoffset, 1);   /* v9_tid XXX Not sure this is right */

I'm pretty sure the parameter num_exec = 1 is fine; it seems to only affect
correlated subselects, which shouldn't really make their way into the
tidrangequals expressions.  It's more or less the same situation as
tidquals for TidPath, anyway.  We could put a little comment:  /*
correlated subselects shouldn't get into tidquals/tidrangequals */ or
something to that effect.
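
For context, tidrangequals carries the quals extracted from queries of this
shape (table name here is hypothetical):

SELECT * FROM t WHERE ctid >= '(0,1)' AND ctid < '(1000,1)';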

Edmund


Re: Tid scan improvements

2019-09-04 Thread Edmund Horner
On Wed, 4 Sep 2019 at 10:34, Alvaro Herrera  wrote:
>
> On 2019-Aug-01, Edmund Horner wrote:
>
> > Hi everyone,
> >
> > Sadly it is the end of the CF and I have not had much time to work on
> > this.  I'll probably get some time in the next month to look at the
> > heapam changes.
> >
> > Should we move to CF?  We have been in the CF cycle for almost a year now...
>
> Hello, do we have an update on this patch?  Last version that was posted
> was v9 from David on July 17th; you said you had made some changes on
> July 22nd but didn't attach any patch.  v9 doesn't apply anymore.

Hi pgsql-hackers,

I have had a lot of difficulty making the changes to heapam.c and I
think it's becoming obvious I won't get them done by myself.

The last *working* patch pushed the work into heapam.c, but it was
just a naive loop over the whole table.  I haven't found how to
rewrite heapgettup_pagemode in the way Andres suggests.

So, I think we need to either get some help from someone familiar with
heapam.c, or maybe shelve the patch.  It has been work in progress for
over a year now.

Edmund




Re: Tid scan improvements

2019-07-31 Thread Edmund Horner
On Thu, 1 Aug 2019 at 17:47, Thomas Munro  wrote:
> On Thu, Aug 1, 2019 at 5:34 PM Edmund Horner  wrote:
> > Should we move to CF?  We have been in the CF cycle for almost a year now...
>
> Hi Edmund,
>
> No worries, that's how it goes sometimes.  Please move it to the next
> CF if you think you'll find some time before September.  Don't worry
> if it might have to be moved again.  We want the feature, and good
> things take time!

Ok thanks.

I tried moving it to the new commitfest, but it seems I cannot from
its current state.

If it's ok, I'll just leave it to you in 7 hours' time!

Edmund




Re: Tid scan improvements

2019-07-31 Thread Edmund Horner
On Mon, 22 Jul 2019 at 19:44, Edmund Horner  wrote:
> On Mon, 22 Jul 2019 at 19:25, David Rowley wrote:
> > On Sat, 20 Jul 2019 at 06:21, Andres Freund  wrote:
> > > Yea, I was thinking of something like 2.  We already have a few extra
> > > types of scan nodes (bitmap heap and sample scan), it'd not be bad to
> > > add another. And as you say, they can just share most of the guts: For
> > > heap I'd just implement pagemode, and perhaps split heapgettup_pagemode
> > > into two parts (one to do the page processing, the other to determine
> > > the relevant page).
> > >
> > > You say that we'd need new fields in HeapScanDescData - not so sure
> > > about that, it seems feasible to just provide the boundaries in the
> > > call? But I think it'd also just be fine to have the additional fields.
> >
> > Thanks for explaining.
> >
> > I've set the CF entry for the patch back to waiting on author.
> >
> > I think if we get this part the way Andres would like it, then we're
> > pretty close.

> [...]

> I'll have to spend a bit of time looking at heapgettup_pagemode to
> figure how to split it as Andres suggests.

Hi everyone,

Sadly it is the end of the CF and I have not had much time to work on
this.  I'll probably get some time in the next month to look at the
heapam changes.

Should we move to CF?  We have been in the CF cycle for almost a year now...

Edmund




Re: Tid scan improvements

2019-07-22 Thread Edmund Horner
On Mon, 22 Jul 2019 at 19:25, David Rowley wrote:
> On Sat, 20 Jul 2019 at 06:21, Andres Freund  wrote:
> > Yea, I was thinking of something like 2.  We already have a few extra
> > types of scan nodes (bitmap heap and sample scan), it'd not be bad to
> > add another. And as you say, they can just share most of the guts: For
> > heap I'd just implement pagemode, and perhaps split heapgettup_pagemode
> > into two parts (one to do the page processing, the other to determine
> > the relevant page).
> >
> > You say that we'd need new fields in HeapScanDescData - not so sure
> > about that, it seems feasible to just provide the boundaries in the
> > call? But I think it'd also just be fine to have the additional fields.
>
> Thanks for explaining.
>
> I've set the CF entry for the patch back to waiting on author.
>
> I think if we get this part the way Andres would like it, then we're
> pretty close.

I've moved the code in question into heapam, with:

  * a new scan type SO_TYPE_TIDRANGE (renumbering some of the other
    enums).

  * a wrapper table_beginscan_tidrange(Relation rel, Snapshot snapshot);
    I'm not sure whether we need scankeys and the other flags?

  * a new optional callback scan_settidrange(TableScanDesc scan,
    ItemPointer startTid, ItemPointer endTid) with wrapper
    table_scan_settidrange.  I thought about combining it with
    table_beginscan_tidrange but we're not really doing that with any of
    the other beginscan methods.

  * another optional callback scan_getnexttidrangeslot.  The presence of
    these two callbacks indicates that TidRangeScan is supported for
    this relation.

Let me know if you can think of better names.

However, the heap_getnexttidrangeslot function is just the same
iterative code calling heap_getnextslot and checking the tuples
against the tid range (two fields added to the ScanDesc).

I'll have to spend a bit of time looking at heapgettup_pagemode to
figure how to split it as Andres suggests.

Thanks,
Edmund




Re: Tid scan improvements

2019-07-17 Thread Edmund Horner
Thanks for the edits and fixing that pretty glaring copy-paste bug.

Regarding enable_tidscan, I couldn't decide whether we really need it,
and erred on the side of not adding yet another setting.

The current patch only creates a tid range path if there's at least
one ctid qual.  But during development of earlier patches I was a bit
concerned about the possibility of tid range scan being picked instead
of seq scan when the whole table is scanned, perhaps due to a tiny
discrepency in costing.  Both scans will scan the whole table, but seq
scan is preferred since it can be parallellised, synchronised with
other scans, and has a bit less overhead with tuple checking.  If a
future change creates tid range paths for more queries, for instance
for MIN/MAX(ctid) or ORDER BY ctid, then it might be more important to
have a separate setting for it.
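
The whole-table case I had in mind is something like this (hypothetical
table; both plans read every page, so the seq scan should win on cost):

EXPLAIN SELECT * FROM t WHERE ctid >= '(0,1)';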

On Wed, 17 Jul 2019 at 23:11, David Rowley  wrote:
>
> On Mon, 15 Jul 2019 at 17:54, Edmund Horner  wrote:
> > Summary of changes compared to last time:
> >   - I've added the additional "scan_setlimits" table AM method.  To
> > check whether it's implemented in the planner, I have added an
> > additional "has_scan_setlimits" flag to RelOptInfo.  It seems to work.
> >   - I've also changed nodeTidrangescan to not require anything from heapam 
> > now.
> >   - To simplify the patch and avoid changing heapam, I've removed the
> > backward scan support (which was needed for FETCH LAST/PRIOR) and made
> > ExecSupportsBackwardScan return false for this plan type.
> >   - I've removed the vestigial passing of "direction" through
> > nodeTidrangescan.  If my understanding is correct, the direction
> > passed to TidRangeNext will always be forward.  But I did leave FETCH
> > LAST/PRIOR in the regression tests (after adding SCROLL to the
> > cursor).
>
> I spent some time today hacking at this.  I fixed a bug in how
> has_scan_setlimits is set, rewrote a few comments and simplified some of
> the code.
>
> When I mentioned up-thread about the optional scan_setlimits table AM
> callback, I'd forgotten that you'd not have access to check that
> directly during planning. As you mention above, you've added
> RelOptInfo has_scan_setlimits so the planner knows if it can use TID
> Range scans or not. It would be nice to not have to add this flag, but
> that would require either:
>
> 1. Making scan_setlimits a non-optional callback function in table AM, or;
> 2. Allowing the planner to have access to the opened Relation.
>
> #2 is not for this patch, but there has been some talk about it. It
> was done for the executor last year in d73f4c74dd3.
>
> I wonder if Andres has any thoughts on #1?
>
> The other thing I was thinking about was if enable_tidscan should be
> in charge of TID Range scans too. I see you have it that way, but
> should we be adding enable_tidrangescan?  The docs claim that
> enable_tidscan: "Enables or disables the query planner's use of TID
> scan plan types.". Note: "types" is plural.  Maybe we could call that
> fate and keep it the way the patch has it already.  Does anyone have
> another idea about that?
>
> I've attached a delta of the changes I made and also a complete v9 patch.
>
> --
>  David Rowley   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services




Re: Tid scan improvements

2019-07-14 Thread Edmund Horner
On Thu, 11 Jul 2019 at 10:22, David Rowley  wrote:
> So it seems that the plan is to insist that TIDs are tuple identifiers
> for all table AMs, for now.  If that changes in the future, then so be
> it, but I don't think that's cause for delaying any work on TID Range
> Scans.  Also from scanning around tableam.h, I see that there's no
> shortage of usages of BlockNumber, so it seems reasonable to assume
> table AMs must use blocks... It's hard to imagine moving away from
> that given that we have shared buffers.
>
> We do appear to have some table AM methods that are optional, although
> I'm not sure where the documentation is about that. For example, in
> get_relation_info() I see:
>
> info->amhasgetbitmap = amroutine->amgetbitmap != NULL &&
> relation->rd_tableam->scan_bitmap_next_block != NULL;
>
> so it appears that at least scan_bitmap_next_block is optional.
>
> I think what I'd do would be to add a table_setscanlimits API method
> for table AM and perhaps have the planner only add TID range scan
> paths if the relation has a non-NULL function pointer for that API
> function.  It would be good to stick a comment at least in tableam.h
> that mentions that the callback is optional.  That might help a bit
> when it comes to writing documentation on each API function and what
> they do.

Hi.  Here's a new patch.

Summary of changes compared to last time:
  - I've added the additional "scan_setlimits" table AM method.  To
check whether it's implemented in the planner, I have added an
additional "has_scan_setlimits" flag to RelOptInfo.  It seems to work.
  - I've also changed nodeTidrangescan to not require anything from heapam now.
  - To simplify the patch and avoid changing heapam, I've removed the
backward scan support (which was needed for FETCH LAST/PRIOR) and made
ExecSupportsBackwardScan return false for this plan type.
  - I've removed the vestigial passing of "direction" through
nodeTidrangescan.  If my understanding is correct, the direction
passed to TidRangeNext will always be forward.  But I did leave FETCH
LAST/PRIOR in the regression tests (after adding SCROLL to the
cursor).

The patch now only targets the simple use case of restricting the
range of a table to scan.  I think it would be nice to eventually
support the other major use cases of ORDER BY ctid ASC/DESC and
MIN/MAX(ctid), but that can be another feature...
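
That is, queries like these (hypothetical table; today they are planned as
a seq scan plus a sort or aggregate):

SELECT MAX(ctid) FROM t;
SELECT * FROM t ORDER BY ctid DESC LIMIT 10;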

Edmund


v8-0001-Add-a-new-plan-type-Tid-Range-Scan-to-support-range-.patch
Description: Binary data


Re: Tid scan improvements

2019-07-11 Thread Edmund Horner
On Thu, 11 Jul 2019 at 10:22, David Rowley  wrote:
> > A.  Continue to target only heapam tables, making the bare minimum
> > changes necessary for the new tableam api.
> > B.  Try to do something more general that works on all tableam
> > implementations for which it may be useful.
>
> Going by the conversation with Andres above:
>
> On Tue, 26 Mar 2019 at 10:52, Andres Freund  wrote:
> >
> > On 2019-03-18 22:35:05 +1300, David Rowley wrote:
> > > The commit message in 8586bf7ed mentions:
> > >
> > > > Subsequent commits will incrementally abstract table access
> > > > functionality to be routed through table access methods. That change
> > > > is too large to be reviewed & committed at once, so it'll be done
> > > > incrementally.
> > >
> > > and looking at [1] I see patch 0004 introduces some changes in
> > > nodeTidscan.c to call a new tableam API function named
> > > heapam_fetch_row_version. I see this function does have a ItemPointer
> > > argument, so I guess we must be keeping those as unique row
> > > identifiers in the API.
> >
> > Right, we are. At least for now - there's some discussions around
> > allowing different format for TIDs, to allow things like index organized
> > tables, but that's for later.
>
> So it seems that the plan is to insist that TIDs are tuple identifiers
> for all table AMs, for now.  If that changes in the future, then so be
> it, but I don't think that's cause for delaying any work on TID Range
> Scans.  Also from scanning around tableam.h, I see that there's no
> shortage of usages of BlockNumber, so it seems reasonable to assume
> table AMs must use blocks... It's hard to imagine moving away from
> that given that we have shared buffers.
>
> We do appear to have some table AM methods that are optional, although
> I'm not sure where the documentation is about that. For example, in
> get_relation_info() I see:
>
> info->amhasgetbitmap = amroutine->amgetbitmap != NULL &&
> relation->rd_tableam->scan_bitmap_next_block != NULL;
>
> so it appears that at least scan_bitmap_next_block is optional.

> I think what I'd do would be to add a table_setscanlimits API method
> for table AM and perhaps have the planner only add TID range scan
> paths if the relation has a non-NULL function pointer for that API
> function.  It would be good to stick a comment at least in tableam.h
> that mentions that the callback is optional.  That might help a bit
> when it comes to writing documentation on each API function and what
> they do.

This seems like a good path forward.

> > There may not be much difference between them, but B. means a bit more
> > research into zheap, zstore and other possible tableams.
> >
> > Next question, how will the executor access the table?
> >
> > 1. Continue to use the seqscan tableam methods, by setting limits.
> > 2. Use the bitmap scan methods, for instance by faking a 
> > BitmapIteratorResult.
> > 3. Add new tableam methods specially for scanning a range of TIDs.
> >
> > Any thoughts?
>
> I think #1 is fine for now. #3 might be slightly more efficient since
> you'd not need to read each tuple in the given page before the given
> offset and throw it away, but I don't really think it's worth adding a
> new table AM method for.

Yeah, it's not perfectly efficient in that regard.  But it's at least
a step in the right direction.

Thanks for the guidance David.  I think I'll be able to make some
progress now before hitting the next obstacle!

Edmund




Re: Tid scan improvements

2019-07-06 Thread Edmund Horner
On Thu, 4 Jul 2019 at 15:43, David Rowley  wrote:
> On Mon, 1 Jul 2019 at 23:29, Thomas Munro  wrote:
> > The new CF is here.  I'm going through poking threads for submissions
> > that don't apply, but it sounds like this needs more than a rebase?
> > Perhaps this belongs in the next CF?
>
> 0001 and 0004 of v7 got pushed in PG12. The CFbot will be trying to
> apply 0001 still, but on testing 0002, no joy there either.
>
> It would be good to see this back in PG13. For now, I'll mark it as
> waiting on author.

Hi,

I'm not really sure how to proceed.  I started with a fairly pragmatic
solution to "WHERE ctid > ? AND ctid < ?" for tables, and then tableam
came along.

The options I see are:

A.  Continue to target only heapam tables, making the bare minimum
changes necessary for the new tableam api.
B.  Try to do something more general that works on all tableam
implementations for which it may be useful.

There may not be much difference between them, but B. means a bit more
research into zheap, zstore and other possible tableams.

Next question, how will the executor access the table?

1. Continue to use the seqscan tableam methods, by setting limits.
2. Use the bitmap scan methods, for instance by faking a BitmapIteratorResult.
3. Add new tableam methods specially for scanning a range of TIDs.

Any thoughts?




Re: Tid scan improvements

2019-03-26 Thread Edmund Horner
On Tue, 26 Mar 2019 at 11:54, Tom Lane  wrote:
>
> Andres Freund  writes:
> > I was kinda hoping to keep block numbers out of the "main" APIs, to
> > avoid assuming everything is BLCKSZ based. I don't have a particular
> > problem allowing an optional setscanlimits type callback that works with
> > block numbers. The planner could check its presence and just not build
> > tid range scans if not present.  Alternatively a bespoke scan API for
> > tid range scans, like the later patches in the tableam series for
> > bitmap, sample, analyze scans, might be an option.
>
> Given Andres' API concerns, and the short amount of time remaining in
> this CF, I'm not sure how much of this patch set we can expect to land
> in v12.  It seems like it might be a good idea to scale back our ambitions
> and see whether there's a useful subset we can push in easily.

I agree.  It'll take some time to digest Andres' advice and write a
better patch.

Should I update the CF app to a) set the target version to 13, and/or b)
move it to the next commitfest?

> With that in mind, I went ahead and pushed 0001+0004, since improving
> the planner's selectivity estimate for a "ctid vs constant" qual is
> likely to be helpful whether the executor is smart about it or not.

Cool.

> FWIW, I don't really see the point of treating 0002 as a separate patch.
> If it had some utility on its own, then it'd be sensible, but what
> would that be?  Also, it looks from 0002 like you are trying to overload
> rs_startblock with a different meaning than it has for syncscans, and
> I think that might be a bad idea.

Yeah I don't think either patch is useful without the other.  They
were separate because, initially, only some of the TidRangeScan
functionality depended on it, and I was particularly uncomfortable
with what I was doing to heapam.c.

The changes in heapam.c were required for backward scan support, as
used by ORDER BY ctid DESC and MAX(ctid); and also for FETCH LAST and
FETCH PRIOR.  I have removed the backward scans functionality from the
current set of patches, but support for backward cursor fetches
remains.

I guess to brutally simplify the patch further, we could give up
backward cursor fetches entirely?  This means such cursors that end up
using a TidRangeScan will require SCROLL to go backwards (which is a
small pain for user experience), but TBH I don't think backwards-going
cursors on CTID will be hugely common.
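
For concreteness, the kind of cursor usage affected (a sketch with a
hypothetical table; the SCROLL keyword is what keeps the backward fetches
working once the underlying scan itself can't go backwards):

BEGIN;
DECLARE c SCROLL CURSOR FOR SELECT ctid FROM t WHERE ctid < '(10,1)';
FETCH LAST FROM c;
FETCH PRIOR FROM c;
COMMIT;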

I'm still not familiar enough with heapam.c to have any better ideas
on how to support backward scanning a limited range.



Re: Fix foreign key constraint check for partitioned tables

2019-03-23 Thread Edmund Horner
On Sat, 23 Mar 2019 at 12:01, Hadi Moshayedi  wrote:
> Yesterday while doing some tests, I noticed that the following doesn't work 
> properly:
>
> create role test_role with login;
> create table ref(a int primary key);
> grant references on ref to test_role;
> set role test_role;
> create table t1(a int, b int) partition by list (a);
> alter table t1 add constraint t1_b_key foreign key (b) references ref(a);
>
> In postgres 11.2, this results in the following error:
>
> ERROR:  could not open file "base/12537/16390": No such file or directory
>
> and in the master branch it simply crashes.
>
> It seems that validateForeignKeyConstraint() in tablecmds.c cannot use 
> RI_Initial_Check() to check the foreign key constraint, so it tries to open 
> the relation and scan it and verify each row by a call to 
> RI_FKey_check_ins(). Opening and scanning the relation fails, because it is a 
> partitioned table and has no storage.
>
> The attached patch fixes the problem by skipping foreign key constraint check 
> for relations with no storage. In partitioned table case, it will be verified 
> by scanning the partitions, so we are safe to skip the parent table.

Hi Hadi,

I reproduced the problem and tested your fix.  It looks simple and
correct to me.

I was a bit curious about the need for "set role" in the reproduction,
but I see that it's because RI_Initial_Check does some checks to see
if a simple SELECT can be used, and one of the checks is for basic
table permissions.

I wonder if the macro RELKIND_HAS_STORAGE should be used instead of
checking for each relkind?  This would apply to the check on line 4405
too.

Edmund



Re: What is a savepointLevel ?

2019-03-16 Thread Edmund Horner
On Fri, 15 Mar 2019 at 18:18, Chapman Flack  wrote:
> What exactly is a savepointLevel?
>
> They seem to have been there for 15 years[1], diligently copied from
> parent transactions to children, fastidiously checked to avoid crossing
> a level on rollback or release, but does anything ever change the level
> from its initial value? I'm drawing a blank[2].

I had a look too, checking for uses where savepointLevel might be set
as part of a struct initialisation.  I can't find any.

There's some discussion about it in July 2004.

https://www.postgresql.org/message-id/flat/Pine.LNX.4.58.0407101609080.4563%40linuxworld.com.au#dad1807aaa73de2be7070a1bc54d0f6b

https://www.postgresql.org/message-id/flat/5902.1090695230%40sss.pgh.pa.us#53d8db46b7f452acd19ec89fcb023e71

Adding the field was committed on the 27th.

(I'm very ignorant on the following.)

It looks like the point of savepoint levels is to distinguish between
savepoints created in the top transaction level versus those created
in nested function calls, and to stop you from trying to
release/rollback to a savepoint belonging to the outer scope.  But I
don't think we support savepoints from inside functions of any kind.
Various PLs use BeginInternalSubTransaction and they handle the
rolling back/releasing internally.

So the savepointLevel variable, and the two error checks that use it,
look a bit unused.  If SAVEPOINT commands were supported in functions,
you'd want to increment savepointLevel when you made a subtransaction
on entering the function.

Does that sound approximately right?

Edmund



> [1]
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=blame;f=src/backend/access/transam/xact.c;h=fd5d6b5;hb=90cb9c3#l93
>
> [2]
> https://git.postgresql.org/gitweb/?p=postgresql.git=search=0516c61b=grep=savepointLevel



Re: Tid scan improvements

2019-03-14 Thread Edmund Horner
On Thu, 14 Mar 2019 at 23:37, Edmund Horner  wrote:
> On Thu, 14 Mar 2019 at 23:06, David Rowley wrote:
> > Just looking again, I think the block of code starting:
> >
> > + if (density > 0.0)
> >
> > needs a comment to mention what it's doing. Perhaps:
> >
> > + /*
> > + * Using the average tuples per page, calculate how far into
> > + * the page the itemptr is likely to be and adjust block
> > + * accordingly.
> > + */
> > + if (density > 0.0)
> >
> > Or some better choice of words.  With that done, I think 0001 is good to go.
>
> Ok, I'll look at it and hopefully get a new patch up soon.

Hullo,

Here's a new set of patches.

It includes new versions of the other patches, which needed to be
rebased because of the introduction of the "tableam" API by
c2fe139c20.

I've had to adapt it to use the table scan API.  I've got it compiling
and passing tests, but I'm uneasy about some things that still use the
heapam API.

1. I call heap_setscanlimits as I'm not sure there is a tableam equivalent.
2. I'm not sure whether non-heap tableam implementations can also be
supported by my TID Range Scan: we need to be able to set the scan
limits.  There may not be any other implementations yet, but when
there are, how do we stop the planner using a TID Range Scan for
non-heap relations?
3. When fetching tuples, I see that nodeSeqscan.c uses
table_scan_getnextslot, which saves dealing with HeapTuples.  But
nodeTidrangescan wants to do some checking of the block and offset
before returning the slot.  So I have it using heap_getnext and
ExecStoreBufferHeapTuple.  Apart from being heapam-specific, it's just
not as clean as the new API calls.

Ideally, we can get it to support general tableam implementations
rather than using heapam-specific calls.  Any advice on how to do
this?

Thanks
Edmund


v7-0001-Add-selectivity-estimate-for-CTID-system-variables.patch
Description: Binary data


v7-0002-Support-backward-scans-over-restricted-ranges-in-hea.patch
Description: Binary data


v7-0004-TID-selectivity-reduce-the-density-of-the-last-page-.patch
Description: Binary data


v7-0003-Support-range-quals-in-Tid-Scan.patch
Description: Binary data


Re: Tid scan improvements

2019-03-14 Thread Edmund Horner
On Thu, 14 Mar 2019 at 23:06, David Rowley  wrote:
> On Thu, 14 Mar 2019 at 21:12, Edmund Horner  wrote:
> > I'm not sure how an unreasonable underestimation would occur here.  If
> > you have a table bloated to say 10x its minimal size, the estimator
> > still assumes an even distribution of tuples (I don't think we can do
> > much better than that).  So the selectivity of "ctid >= <the max ctid
> > that would exist without bloat>" is still going to be 0.9.
>
> Okay, think you're right there.  I guess the only risk there is just
> varying tuple density per page, and that seems no greater risk than we
> have with the existing stats.

Yeah that is a risk, and will probably come up in practice.  But at
least we're not just picking a hardcoded selectivity any more.

> Just looking again, I think the block of code starting:
>
> + if (density > 0.0)
>
> needs a comment to mention what it's doing. Perhaps:
>
> + /*
> + * Using the average tuples per page, calculate how far into
> + * the page the itemptr is likely to be and adjust block
> + * accordingly.
> + */
> + if (density > 0.0)
>
> Or some better choice of words.  With that done, I think 0001 is good to go.

Ok, I'll look at it and hopefully get a new patch up soon.

Edmund



Re: Tid scan improvements

2019-03-14 Thread Edmund Horner
On Thu, 14 Mar 2019 at 16:46, David Rowley  wrote:
>
> The only possible risk I can foresee is that it may be more likely we
> underestimate the selectivity and that causes something like a nested
> loop join due to the estimation being, say 1 row.
>
> It could happen in a case like:
>
> SELECT * FROM bloated_table WHERE ctid >= <the max ctid without bloat>
>
> but I don't think we should keep using DEFAULT_INEQ_SEL just in case
> this happens. We could probably fix 90% of those cases by returning 2
> rows instead of 1.

Thanks for looking at the patch David.

I'm not sure how an unreasonable underestimation would occur here.  If
you have a table bloated to say 10x its minimal size, the estimator
still assumes an even distribution of tuples (I don't think we can do
much better than that).  So the selectivity of "ctid >= <the max ctid that
would exist without bloat>" is still going to be 0.9.

Edmund



Re: Patch for SortSupport implementation on inet/cdir

2019-02-08 Thread Edmund Horner
On Sat, 9 Feb 2019 at 04:48, Brandur Leach  wrote:
> I've attached a patch that implements SortSupport for the
> inet/cidr types. It has the effect of typically reducing
> the time taken to sort these types by ~50-60% (as measured
> by `SELECT COUNT(DISTINCT ...)` which will carry over to
> common operations like index creation, `ORDER BY`, and
> `DISTINCT`.

Hi Brandur,

I had a look at this.  Your V2 patch applies cleanly, and the code was
straightforward and well commented.  I appreciate the big comment at the
top of network_abbrev_convert explaining how you encode the data.

The tests pass.  I ran a couple of large scale tests myself and didn't find
any problems.  Sorting a million random inets in work_mem = 256MB goes from
roughly 3670ms to 1620ms with the SortSupport, which is pretty impressive.
(But that's in my debug build, so not a serious benchmark.)
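
For anyone wanting to repeat the test, a sketch along these lines (not the
exact script; table and column names are made up):

SET work_mem = '256MB';
CREATE TABLE ips AS
  SELECT ((random()*255)::int || '.' || (random()*255)::int || '.' ||
          (random()*255)::int || '.' || (random()*255)::int)::inet AS addr
  FROM generate_series(1, 1000000);
SELECT count(DISTINCT addr) FROM ips;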

An interesting thing about sorting IPv4 inets on 64-bit machines is that
when the inets are the same, the abbreviated comparator will return 0, which
is taken by the sorting machinery to mean "the datums are the same up to
this point, so you need to call the full comparator" -- but, in this case,
0 means "the datums truly are the same, no need to call the full
comparator".  Since the full comparator assumes its arguments to be the
original (typically pass-by-reference) datums, you can't do it there.
You'd need to add another optional comparator to be called after the
abbreviated one.  In inet's case on a 64-bit machine, it would look at the
abbreviated datums and if they're both in the IPv4 family, would return 0
(knowing that the abbreviated comparator has already done the real work).
I have no reason to believe this particular optimisation is worth anything
much, though; it's outside the scope of this patch, besides.

I have some comments on the comments:

network.c:552
* SortSupport conversion routine. Converts original inet/cidr
representations
* to abbreviated keys . The inet/cidr types are pass-by-reference, so is an
* optimization so that sorting routines don't have to pull full values from
* the heap to compare.

Looks like you have an extra space before the "." on line 553.  And
abbreviated keys being an optimisation for pass-by-reference types can be
taken for granted, so I think the last sentence is redundant.

network.c:567
* IPv4 and IPv6 are identical in this makeup, with the difference being that
* IPv4 addresses have a maximum of 32 bits compared to IPv6's 64 bits, so in
* IPv6 each part may be larger.

IPv6 addresses are 128 bits.  I'm not sure if "maximum" is accurate,
or whether you should just say "IPv4 addresses have 32 bits".

network.c:571
* inet/cdir types compare using these sorting rules. If inequality is
detected
* at any step, comparison is done. If any rule is a tie, the algorithm drops
* through to the next to break it:

When you say "comparison is done" it sounds like more comparing is going to
be done, but what I think you mean is that comparison is finished.

> [...]

> My benchmarking methodology and script is available here
> [1], and involves gathering statistics for 100
> `count(distinct ...)` queries at various data sizes. I've
> saved the results I got on my machine here [2].

I didn't see any links for [1], [2] and [3] in your email.

Finally, there's a duplicate CF entry:
https://commitfest.postgresql.org/22/1990/ .

Since you're updating https://commitfest.postgresql.org/22/1991/ , I
suggest you mark 1990 as Withdrawn to avoid confusion.  If there's a way to
remove it from the CF list, that would be even better.

Edmund


Re: Tid scan improvements

2019-02-03 Thread Edmund Horner
On Sat, 19 Jan 2019 at 17:04, Edmund Horner  wrote:

> On Sat, 19 Jan 2019 at 05:35, Tom Lane  wrote:
>
>> Edmund Horner  writes:
>> > My patch uses the same path type and executor for all extractable
>> > tidquals.
>>
>> > This worked pretty well, but I am finding it difficult to reimplement
>> > it in
>> > the new tidpath.c code.
>>
>> I didn't like that approach to begin with, and would suggest that you go
>> over to using a separate path type and executor node.  I don't think the
>> amount of commonality for the two cases is all that large, and doing it
>> as you had it required some ugly ad-hoc conventions about the semantics
>> of the tidquals list.  Where I think this should go is that the tidquals
>> list still has OR semantics in the existing path type, but you use AND
>> semantics in the new path type, so that "ctid > ? AND ctid < ?" is just
>> represented as an implicit-AND list of two simple RestrictInfos.
>>
>
> Thanks for the advice.  This approach resembles my first draft, which had
> a separate executor type.  However, it did have a combined path type, with
> an enum TidPathMethod to determine how tidquals was interpreted.  At this
> point, I think a different path type is clearer, though generation of both
> types can live in tidpath.c (just as indxpath.c generates different index
> path types).
>

Hi, here's a new set of patches.  This one adds a new path type called
TidRangePath and a new execution node called TidRangeScan.  I haven't
included any of the patches for adding pathkeys to TidPaths or
TidRangePaths.

1. v6-0001-Add-selectivity-estimate-for-CTID-system-variables.patch
2. v6-0002-Support-backward-scans-over-restricted-ranges-in-hea.patch
3. v6-0003-Support-range-quals-in-Tid-Scan.patch
4. v6-0004-TID-selectivity-reduce-the-density-of-the-last-page-.patch

Patches 1, 2, and 4 are basically unchanged from my previous post.  Patch 4
is an optional tweak to the CTID selectivity estimates.

Patch 3 is a substantial rewrite from what I had before.  I've checked
David's most recent review and tried to make sure the new code meets his
suggestions where applicable, although there is one spot where I left the
code as "if (tidrangequals) ..." instead of the preferred "if
(tidrangequals != NIL) ...", just for consistency with the surrounding code.

Questions --

1. Tid Range Paths are costed as random_page_cost for the first page, and
sequential page cost for the remaining pages.  It made sense when there
could be multiple non-overlapping ranges.  Now that there's only one range,
it might not, but it has the benefit of making Tid Range Scans a little bit
more expensive than Sequential Scans, so that they are less likely to be
picked when a Seq Scan will do just as well.  Is there a better cost
formula to use?
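
(For a sense of scale, with the default settings random_page_cost = 4.0 and
seq_page_cost = 1.0, and ignoring per-tuple CPU costs, a range covering N
pages costs roughly 4.0 + (N - 1) * 1.0 versus N * 1.0 for a seq scan -- a
constant penalty of 3 page units that becomes negligible as N grows.)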

2. Is it worth trying to get rid of some of the code duplication between
the TidPath and TidRangePath handling, such as in costsize.c or
createplan.c?

3. TidRangeRecheck (copied from TidRecheck) has an existing comment asking
whether it should actually be performing a check on the returned tuple.  It
seems to me that as long as TidRangeNext doesn't return a tuple outside the
requested range, then the check shouldn't be necessary (and we'd simplify
the comment to "nothing to check").  If a range key can change at runtime,
it should never have been included in the TidRangePath.  Is my
understanding correct?

4. I'm a little uncomfortable with the way heapam.c changes the scan limits
("--scan->rs_numblocks") as it progresses through the pages.  I have the
executor node reset the scan limits after scanning all the tuples, which
seems to work for the tests I have, but I'm using the
heap_setscanlimits feature in a slightly different way from the only
existing use, which is for the one-off scans when building a BRIN index.  I
have added some tests for cursor fetches which seems to exercise the code,
but I'd still appreciate close review of how I'm using heapam.

Edmund


v6-0001-Add-selectivity-estimate-for-CTID-system-variables.patch
Description: Binary data


v6-0003-Support-range-quals-in-Tid-Scan.patch
Description: Binary data


v6-0004-TID-selectivity-reduce-the-density-of-the-last-page-.patch
Description: Binary data


v6-0002-Support-backward-scans-over-restricted-ranges-in-hea.patch
Description: Binary data


Use zero for nullness estimates of system attributes

2019-01-24 Thread Edmund Horner
I added some code to selfuncs.c to estimate the selectivity of CTID,
including nullness, in my ongoing attempt to add TID range scans [1].  And
as Tom pointed out [2], no system attribute can be null, so we might as
well handle them all.

That's what the attached patch does.

I observed a few interesting things with outer join selectivity:

While system attributes aren't NULL in the table, they can be in queries
such as:

SELECT *
FROM a LEFT JOIN b ON a.id = b.id
WHERE b.ctid IS NULL;

And the patch does affect the estimates for such plans.  But it's just
replacing one hardcoded nullness (0.005) with another (0.0), which seems no
worse than the original.

I was a bit concerned that with, for example,

CREATE TABLE a (id INTEGER);
INSERT INTO a SELECT * FROM generate_series(1,1000);
ANALYZE a;
CREATE TABLE b (id INTEGER, id2 INTEGER);
INSERT INTO b SELECT *, * FROM generate_series(1,10);
ANALYZE b;

EXPLAIN ANALYZE
SELECT * FROM a LEFT JOIN b ON a.id = b.id
WHERE b.ctid IS NULL;

you get a row estimate of 1 (vs the actual 990).  It's not specific to
system attributes.  Plain left-join selectivity calculation doesn't seem to
take into account the join selectivity, while anti-join calculation does.

I do not think this affects the usefulness of the present patch, but maybe
it's something we could improve.

Finally: I thought about introducing a macro to attnum.h:

/*
 * AttrNumberIsForSystemAttr
 *      True iff the attribute number corresponds to a system attribute.
 */
#define AttrNumberIsForSystemAttr(attributeNumber) \
    ((bool) ((attributeNumber) < 0))

But there's a zillion places that could be changed to use it, so I haven't
in this version of the patch.

Edmund

[1]
https://www.postgresql.org/message-id/flat/31682.1545415852%40sss.pgh.pa.us#bdca5c18ed64f847f44c2645f98ea3a0
[2] https://www.postgresql.org/message-id/31682.1545415852%40sss.pgh.pa.us


v1-nullness-selectivity-for-system-cols.patch
Description: Binary data


Re: Tid scan improvements

2019-01-18 Thread Edmund Horner
On Sat, 19 Jan 2019 at 05:35, Tom Lane  wrote:

> Edmund Horner  writes:
> > My patch uses the same path type and executor for all extractable
> > tidquals.
>
> > This worked pretty well, but I am finding it difficult to reimplement it
> > in
> > the new tidpath.c code.
>
> I didn't like that approach to begin with, and would suggest that you go
> over to using a separate path type and executor node.  I don't think the
> amount of commonality for the two cases is all that large, and doing it
> as you had it required some ugly ad-hoc conventions about the semantics
> of the tidquals list.  Where I think this should go is that the tidquals
> list still has OR semantics in the existing path type, but you use AND
> semantics in the new path type, so that "ctid > ? AND ctid < ?" is just
> represented as an implicit-AND list of two simple RestrictInfos.
>

Thanks for the advice.  This approach resembles my first draft, which had a
separate executor type.  However, it did have a combined path type, with an
enum TidPathMethod to determine how tidquals was interpreted.  At this
point, I think a different path type is clearer, though generation of both
types can live in tidpath.c (just as indxpath.c generates different index
path types).


> Now admittedly, this wouldn't give us an efficient way to execute
> queries with conditions like "WHERE ctid = X OR (ctid > Y AND ctid < Z)",
> but I find myself quite unable to get excited about supporting that.
> I see no reason for the new code to worry about any cases more complex
> than one or two TID inequalities at top level of the restriction list.
>

I'm a bit sad to see support for multiple ranges go, though I never saw
such queries as ever being particularly common.  (And there was always a
nagging feeling that tidpath.c was beginning to perform feats of boolean
acrobatics out of proportion to its importance.  Perhaps in some distant
future, TID quals will become another way of supplying TIDs to a bitmap
heap scan, which would enable complicated boolean queries using both
indexes and TID scans.  But that's just musing, not a proposal.)

> > In the query information given to the path generator, there is no existing
> > RestrictInfo relating to the whole expression "ctid > ? AND ctid < ?".  I
> > am still learning about RestrictInfos, but my understanding is it doesn't
> > make sense to have a RestrictInfo for an AND clause, anyway; you're
> > supposed to have them for the sub-expressions of it.
>
> FWIW, the actual data structure for cases like that is that there's
> a RestrictInfo for the whole clause ctid = X OR (ctid > Y AND ctid < Z),
> and if you look into its "orclause" field, you will find RestrictInfos
> attached to the primitive clauses ctid = X, ctid > Y, ctid < Z.  (The
> old code in tidpath.c didn't know that, because it'd never been rewritten
> since RestrictInfos were invented.)  However, I think this new code should
> not worry about OR cases at all, but just pull out top-level TID
> comparison clauses.
>

Thanks for the explanation.

> > And it doesn't seem a good idea to try to create new RestrictInfos in the
> > path generation just to pass the tidquals back to plan creation.
>
> No, you should avoid that.  There are places that assume there's only
> one RestrictInfo for any given original clause (or sub-clause).
>


Re: Tid scan improvements

2019-01-17 Thread Edmund Horner
Hi all,

I am a bit stuck and I think it's best to try to explain where.

I'm still rebasing the patches for the changes Tom made to support
parameterised TID paths for joins.  While the addition of join support
itself does not really touch the same code, the modernisation -- in
particular, returning a list of RestrictInfos rather than raw quals -- does
rewrite quite a bit of tidpath.c.

The original code returned:

List (with OR semantics)
  CTID = ?   or   CTID = ANY (...)   or   IS CURRENT OF
  (more items)

That changed recently to return:

List (with OR semantics)
  RestrictInfo
CTID = ?   or ...
  (more items)

My last set of patches extended the tidqual extraction to pull out lists
(with AND semantics) of range quals of the form CTID < ?, etc.  Each list
of more than one item was converted into an AND clause before being added
to the tidqual list; a single range qual can be added to tidquals as is.

This required looking across multiple RestrictInfos at the top level, for
example:

  - "WHERE ctid > ? AND ctid < ?" would arrive at tidpath as a list of two
RestrictInfos, from which we extract a single tidqual in the form of an AND
clause.
  - "WHERE ctid = ? OR (ctid > ? AND ctid < ?)" arrives as only one
RestrictInfo, but we extract two tidquals (an OpExpr, and an AND clause).

The code could also ignore additional unusable quals from a list of
top-level RestrictInfos, or from a list of quals from an AND clause, for
example:

  - "WHERE foo = ? AND ctid > ? AND ctid < ?" gives us the single tidqual
"ctid > ? AND ctid < ?".
  - "WHERE (ctid = ? AND bar = ?) OR (foo = ? AND ctid > ? AND ctid < ?)"
gives us the two tidquals "ctid = ?" and "ctid > ? AND ctid < ?".

As the extracted tidquals no longer match the original query quals, they
aren't removed from scan_clauses in createplan.c, and hence are correctly
checked by the filter.

Aside: The analogous situation with an indexed user attribute "x" behaves a
bit differently:
  - "WHERE x = ? OR (x > ? AND x < ?)", won't use a regular index scan, but
might use a bitmap index scan.

My patch uses the same path type and executor for all extractable tidquals.

This worked pretty well, but I am finding it difficult to reimplement it in
the new tidpath.c code.

In the query information given to the path generator, there is no existing
RestrictInfo relating to the whole expression "ctid > ? AND ctid < ?".  I
am still learning about RestrictInfos, but my understanding is it doesn't
make sense to have a RestrictInfo for an AND clause, anyway; you're
supposed to have them for the sub-expressions of it.

And it doesn't seem a good idea to try to create new RestrictInfos in the
path generation just to pass the tidquals back to plan creation.  They're
complicated objects.

There's also the generation of scan_clauses in create_tidscan_plan
(createplan.c:3107).  This now uses RestrictInfos -- I'd image we'd need
each AND clause to be wrapped in a RestrictInfo to be able to check it
properly.

To summarise, I'm not sure what kind of structure I should add to the
tidquals list to represent a compound range expression.  Maybe it's better
to create a different path (either a new path type, or a flag in TidPath to
say what kind of quals are attached) ?

Edmund


Re: Tid scan improvements

2019-01-14 Thread Edmund Horner
On Sat, 22 Dec 2018 at 07:10, Tom Lane  wrote:

> BTW, with respect to this bit in 0001:
>
> @@ -1795,6 +1847,15 @@ nulltestsel(PlannerInfo *root, NullTestType nulltesttype, Node *arg,
>              return (Selectivity) 0; /* keep compiler quiet */
>          }
>      }
> +    else if (vardata.var && IsA(vardata.var, Var) &&
> +             ((Var *) vardata.var)->varattno == SelfItemPointerAttributeNumber)
> +    {
> +        /*
> +         * There are no stats for system columns, but we know CTID is
> +         * never NULL.
> +         */
> +        selec = (nulltesttype == IS_NULL) ? 0.0 : 1.0;
> +    }
>      else
>      {
>          /*
>
> I'm not entirely sure why you're bothering; surely nulltestsel is
> unrelated to what this patch is about?  And would anybody really
> write "WHERE ctid IS NULL"?
>

I found that it made a difference with selectivity of range comparisons,
because clauselist_selectivity tries to correct for it (clausesel.c:274):

s2 = rqlist->hibound + rqlist->lobound - 1.0

/* Adjust for double-exclusion of NULLs */
s2 += nulltestsel(root, IS_NULL, rqlist->var,
  varRelid, jointype, sjinfo);

It was adding DEFAULT_UNK_SEL = 0.005 to the selectivity, which (while not
major) did make the selectivity less accurate.
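
To put numbers on it (illustrative figures only): if "ctid > ?" has
selectivity 0.55 and "ctid < ?" has selectivity 0.55, then
s2 = 0.55 + 0.55 - 1.0 = 0.10; the old nulltestsel then added
DEFAULT_UNK_SEL, giving 0.105 rather than 0.10, a 5% relative error that
the 0.0 estimate avoids.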

> However, if we do think it's worth adding code to cover this case,
> I wouldn't make it specific to CTID.  *All* system columns can be
> assumed not null, see heap_getsysattr().
>
I guess we could have a standalone patch to add this for all system columns?


Re: Joins on TID

2018-12-31 Thread Edmund Horner
On Sat, 22 Dec 2018 at 12:34, Tom Lane  wrote:
> I decided to spend an afternoon seeing exactly how much work would be
> needed to support parameterized TID scans, ie nestloop-with-inner-TID-
> scan joins, as has been speculated about before, most recently here:
>
> https://www.postgresql.org/message-id/flat/CAMqTPq%3DhNg0GYFU0X%2BxmuKy8R2ARk1%2BA_uQpS%2BMnf71MYpBKzg%40mail.gmail.com
>
> It turns out it's not that bad, less than 200 net new lines of code
> (all of it in the planner; the executor seems to require no work).
>
> Much of the code churn is because tidpath.c is so ancient and crufty.
> It was mostly ignoring the RestrictInfo infrastructure, in particular
> emitting the list of tidquals as just bare clauses not RestrictInfos.
> I had to change that in order to avoid inefficiencies in some places.

It seems good, and I can see you've committed it now.  (I should have
commented sooner, but it's the big summer holiday period here, which
means I have plenty of time to work on PostgreSQL, but none of my
usual resources.  In any case, I was going to say "this looks useful
and not too complicated, please go ahead".)

I did notice that multiple tidquals are no longer removed from scan_clauses:

EXPLAIN SELECT * FROM pg_class WHERE ctid = '(1,1)' OR ctid = '(2,2)';

 Tid Scan on pg_class  (cost=0.01..8.03 rows=2 width=265)
   TID Cond: ((ctid = '(1,1)'::tid) OR (ctid = '(2,2)'::tid))
   Filter: ((ctid = '(1,1)'::tid) OR (ctid = '(2,2)'::tid))

I guess if we thought it was a big deal we could attempt to recreate
the old logic with RestrictInfos.

> I haven't really looked at how much of a merge problem there'll be
> with Edmund Horner's work for TID range scans.  My feeling about it
> is that we might be best off treating that as a totally separate
> code path, because the requirements are significantly different (for
> instance, a range scan needs AND semantics not OR semantics for the
> list of quals to apply).

Well, I guess it's up to me to merge it.  I can't quite see which
parts we'd use a separate code path for.  Can you elaborate?

Edmund



Re: Tid scan improvements

2018-12-21 Thread Edmund Horner
On Fri, 21 Dec 2018 at 16:31, Tom Lane  wrote:
>
> Edmund Horner  writes:
> > For the forward scan, I seem to recall, from your merge join example,
> > that it's useful to set the pathkeys even when there are no
> > query_pathkeys.  We just have to unconditionally set them so that the
> > larger plan can make use of them.
>
> No.  Look at indxpath.c: it does not worry about pathkeys unless
> has_useful_pathkeys is true, and it definitely does not generate
> pathkeys that don't get past truncate_useless_pathkeys.  Those
> functions are responsible for worrying about whether mergejoin
> can use the pathkeys.  It's not tidpath.c's job to outthink them.

Ok.  I think that will simplify things.  So if I follow you correctly,
we should do:

1. If has_useful_pathkeys is true: generate pathkeys (for CTID ASC),
and use truncate_useless_pathkeys on them.
2. If we have tid quals or pathkeys, emit a TID scan path.

For the (optional) backwards scan support patch, should we separately
emit another path, in the reverse direction?  (My current patch only
creates one path, and tries to decide what the best direction is by
looking at query_pathkeys.  This doesn't fit into the above
algorithm.)



Re: Tid scan improvements

2018-12-20 Thread Edmund Horner
On Fri, 21 Dec 2018 at 13:25, David Rowley  wrote:
> On Fri, 21 Dec 2018 at 13:09, Edmund Horner  wrote:
> > On Fri, 21 Dec 2018 at 11:21, Tom Lane  wrote:
> > > I'm having a hard time wrapping my mind around why you'd bother with
> > > backwards TID scans.  The amount of code needed versus the amount of
> > > usefulness seems like a pretty bad cost/benefit ratio, IMO.  I can
> > > see that there might be value in knowing that a regular scan has
> > > "ORDER BY ctid ASC" pathkeys (mainly, that it might let us mergejoin
> > > on TID without an explicit sort).  It does not, however, follow that
> > > there's any additional value in supporting the DESC case.
> >
> > I have occasionally found myself running "SELECT MAX(ctid) FROM t"
> > when I was curious about why a table is so big after vacuuming.
> >
> > Perhaps that's not a common enough use case to justify the amount of
> > code, especially the changes to heapam.c and explain.c.
> >
> > We'd still need the pathkeys to make good use of forward scans.  (And
> > I think the executor still needs to support seeking backward for
> > cursors.)
>
> I think the best thing to do here is separate out all the additional
> backwards scan code into a separate patch to allow it to be more easily
> considered and approved, or rejected. I think if there's any hint of
> this blocking the main patch then it should be a separate patch to
> allow its worth to be considered independently.

Yeah I think you're right.  I'll separate those parts into the basic
forward scan, and then the optional backward scan support.  I think
we'll still only generate a backward scan if the query_pathkeys makes
use of it.

For the forward scan, I seem to recall, from your merge join example,
that it's useful to set the pathkeys even when there are no
query_pathkeys.  We just have to unconditionally set them so that the
larger plan can make use of them.

> Also, my primary interest in this patch is to find tuples that are
> stopping the heap being truncated during a vacuum. Generally, when I'm
> looking for that I have a good idea of what size I expect the relation
> should be, (otherwise I'd not think it was bloated), in which case I'd
> be doing WHERE ctid >= '(N,1)'. However, it might be easier to write
> some auto-bloat-removal script if we could have an ORDER BY ctid DESC
> LIMIT n.



Re: Tid scan improvements

2018-11-27 Thread Edmund Horner
On Thu, 22 Nov 2018 at 20:41, David Rowley  wrote:
> I've now had a look over the latest patches and I've found a few more
> things.  Many of these are a bit nitpicky, but certainly not all. I
> also reviewed 0004 this time.


Whew!  A lot more things to look at.

I've tried to address most of what you've raised, and attach yet
another set of patches.  There are are few things that I'm not settled
on, discussed below under Big Items.

CC'd Tomas, if he wants to check what I've done with the TidRange
array allocation.


* Big Items *

> 0001:
>
> 1. The row estimates are not quite right.  This cases the row
> estimation to go the wrong way for isgt.
>
> For example, the following gets 24 rows instead of 26.
>
> postgres=# create table t (a int);
> CREATE TABLE
> postgres=# insert into t select generate_Series(1,100);
> INSERT 0 100
> postgres=# analyze t;
> postgres=# explain analyze select * from t where ctid >= '(0,75)';
>                                          QUERY PLAN
> ------------------------------------------------------------------------------------------
>  Seq Scan on t  (cost=0.00..2.25 rows=24 width=4) (actual
> time=0.046..0.051 rows=26 loops=1)
>    Filter: (ctid >= '(0,75)'::tid)
>    Rows Removed by Filter: 74
>  Planning Time: 0.065 ms
>  Execution Time: 0.074 ms
> (5 rows)
>
> The < and <= case is not quite right either. < should have 1 fewer
> tuple than the calculated average tuples per page, and <= should have
> the same (assuming no gaps)
>
> I've attached a small delta patch that I think improves things here.

Thanks, I've incorporated your patch.  I think the logic for iseq and
isgt makes sense now.

Since we only have the total number of tuples and the total number of
pages, and no real statistics, this might be the best we can
reasonably do.  There's still a noticeable rowcount error for the last
page, and slighter rowcount errors for other pages.  We estimate
density = ntuples/npages for all pages; but in a densely populated
table, the last page will hold, on average, only half as many tuples as
the earlier pages.

I guess we *could* estimate density = ntuples/(npages - 0.5) for all
but the last page; and half that for the last.  But that adds
complexity, and you'd still only get a good row count when the last
page was about half full.

I implemented this anyway, and it does improve row counts a bit.  I'll
include it in the next patch set and you can take a look.
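
To illustrate with made-up numbers: for ntuples = 100 and npages = 4,
the current estimate is density = 100/4 = 25 tuples for every page,
while the adjusted version gives 100/3.5 ~= 28.6 for each of the first
three pages and half that, ~14.3, for the last one.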

I also spent some time today agonising over how visibility would affect
things, but did not come up with anything useful to add to our
formulas.

> 3. I'd rather see EnsureTidRangeSpace() keep doubling the size of the
> allocation until it reaches the required size. See how
> MakeSharedInvalidMessagesArray() does it.  Doing it this way ensures
> we always have a power of two sized array which is much nicer if we
> ever reach the palloc() limit as if the array is sized at the palloc()
> limit / 2 + 1, then if we try to double it'll fail.  Of course, it's
> unlikely to be a problem here, but... the question would be how to
> decide on the initial size.

I've tried to change things that way, but we still need to deal with
excessive numbers of items.

I've defined a constant MaxTidRanges = MaxAllocSize/sizeof(TidRange),
and raise an error if the required size exceeds that.
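
Roughly this (my sketch; the error message wording is provisional):

    #define MaxTidRanges (MaxAllocSize / sizeof(TidRange))

    if (requiredSpace > MaxTidRanges)
        ereport(ERROR,
                (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
                 errmsg("cannot process more than %d TID ranges",
                        (int) MaxTidRanges)));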

> 4. "at" needs shifted left a couple of words
>
> /*
> * If the lower bound was already or above at the maximum block
> * number, then there is no valid range.
> */
>
> but I don't see how it could be "or above".  The ctid type does not
> have the room for that. Although, that's not to say you should test if
> (block == MaxBlockNumber), the >= seems better for the code. I'm just
> complaining about the comment.

We have to deal with TIDs entered by the user, which can include
invalid ones like (4294967295,0).  MaxBlockNumber is 4294967294.

> 12. expected_comparison_operator is a bit long a name:
>
> IsTidComparison(OpExpr *node, int varno, Oid expected_comparison_operator)
>
> How about just expected_opno?
>
> 14. This is not great:
>
> [horrible macros in tidpath.c]
>
> The 4 macros for >, >=, < and <= are only used by IsTidRangeClause()
> which means IsTidComparison() could get called up to 4 times. Most of
> the work it does would be redundant in that case.  Maybe it's better
> to rethink that?

Yeah.  I've rewritten all this as two functions, IsTidEqualClause and
IsTidRangeClause, which each check the opno, with a helper function
IsTidBinaryExpression that checks everything else.
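
In outline, the shape is something like this (a sketch, not the exact
patch code; IsCTIDVar stands for the existing CTID-variable check, and
I'm ignoring the commuted-argument detail here):

    static bool
    IsTidBinaryExpression(OpExpr *node, int varno)
    {
        Node   *arg1;
        Node   *arg2;
        Node   *other;

        if (list_length(node->args) != 2)
            return false;
        arg1 = (Node *) linitial(node->args);
        arg2 = (Node *) lsecond(node->args);

        /* one side must be the CTID Var, the other pseudo-constant */
        if (IsCTIDVar(arg1, varno))
            other = arg2;
        else if (IsCTIDVar(arg2, varno))
            other = arg1;
        else
            return false;

        return is_pseudo_constant_clause(other);
    }

    static bool
    IsTidEqualClause(OpExpr *node, int varno)
    {
        return node->opno == TIDEqualOperator &&
            IsTidBinaryExpression(node, varno);
    }

    static bool
    IsTidRangeClause(OpExpr *node, int varno)
    {
        return (node->opno == TIDLessOperator ||
                node->opno == TIDLessEqOperator ||
                node->opno == TIDGreaterOperator ||
                node->opno == TIDGreaterEqOperator) &&
            IsTidBinaryExpression(node, varno);
    }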

> 18. I'd expect the following not to produce a sort above the Tid Scan.
>
> postgres=# set enable_seqscan=0;
> SET
> postgres=# explain select * from t inner join t t1 on t.ctid = t1.ctid
> where t.ctid < '(0,10)' ;
>   QUERY PLAN
> ---
>  Merge Join  (cost=108.65..109.28 rows=9 width=8)
>Merge Cond: (t.ctid = t1.ctid)
>

Re: Tid scan improvements

2018-11-23 Thread Edmund Horner
On Sat, 24 Nov 2018 at 15:46, Tomas Vondra  wrote:
> On 11/24/18 1:56 AM, Edmund Horner wrote:
> > On Fri, 23 Nov 2018 at 07:03, Tomas Vondra  
> > wrote:
> >> On 11/22/18 8:41 AM, David Rowley wrote:
> >>> ...
> >>> 3. I'd rather see EnsureTidRangeSpace() keep doubling the size of the
> >>> allocation until it reaches the required size. See how
> >>> MakeSharedInvalidMessagesArray() does it.  Doing it this way ensures
> >>> we always have a power of two sized array which is much nicer if we
> >>> ever reach the palloc() limit as if the array is sized at the palloc()
> >>> limit / 2 + 1, then if we try to double it'll fail.  Of course, it's
> >>> unlikely to be a problem here, but... the question would be how to
> >>> decide on the initial size.
> >>
> >> I think it kinda tries to do that in some cases, by doing this:
> >>
> >>  *numAllocRanges *= 2;
> >>  ...
> >>  tidRanges = (TidRange *)
> >>  repalloc(tidRanges,
> >>   *numAllocRanges * sizeof(TidRange));
> >>
> >> The problem here is that what matters is not numAllocRanges being 2^N,
> >> but the number of bytes allocated being 2^N. Because that's what ends up
> >> in AllocSet, which keeps lists of 2^N chunks.
> >>
> >> And as TidRange is 12B, so this is guaranteed to waste memory, because
> >> no matter what the first factor is, the result will never be 2^N.
> >
> > For simplicity, I think making it a strict doubling of capacity each
> > time is fine.  That's what we see in numerous other places in the
> > backend code.
> >
>
> Sure.
>
> > What we don't really see is intentionally setting the initial capacity
> > so that each subsequent capacity is close-to-but-not-exceeding a power
> > of 2 bytes.  You can't really do that optimally if working in terms of
> > whole numbers of items that aren't each a power of 2 size.  This step,
> > there may be 2/3 of an item spare; next step, we'll have a whole item
> > spare that we're not going to use.
>
> Ah, I missed the detail with setting initial size.
>
> > So we could keep track in terms of bytes allocated, and then figure
> > out how many items we can fit at the current time.
> >
> > In my opinion, such complexity is overkill for Tid scans.
> >
> > Currently, we try to pick an initial size based on the number of
> > expressions.  We assume each expression will yield one range, and
> > allow that a saop expression might require us to enlarge the array.
> >
> > Again, for simplicity, we should scrap that and pick something like
> > floor(256/sizeof(TidRange)) = 21 items, with about 1.5% wastage.
> >
>
> Probably. I don't think it'd be a lot of code to do the exact sizing,
> but you're right 1.5% is close enough. As long as there is a comment
> explaining the initial sizing, I'm fine with that.
>
> If I could suggest one more thing, I'd define a struct combining the
> array of ranges, numRanges and numAllocRanges, like:
>
> typedef struct TidRanges
> {
>     int         numRanges;
>     int         numAllocRanges;
>     TidRange    ranges[FLEXIBLE_ARRAY_MEMBER];
> } TidRanges;
>
> and use that instead of the plain array. I find it easier to follow
> compared to passing the various fields directly (sometimes as a value,
> sometimes pointer to the value, etc.).

Ok, I've made rewritten it to use a struct:

typedef struct TidRangeArray {
    TidRange   *ranges;
    int         numRanges;
    int         numAllocated;
} TidRangeArray;

which is slightly different from the flexible array member version you
suggested.  The TidRangeArray is allocated on the stack in the
function that builds it, and then ranges and numRanges are copied into
the TidScanState before the function returns.

Any particular pros/cons of this versus your approach?  With yours, I
presume we'd have a pointer to TidRanges in TidScanState.

My other concern now is that EnsureTidRangeSpace needs a loop to
double the allocated size.  Most such arrays in the backend only ever
grow by 1, so a single doubling is fine, but the TID scan one can grow
by an arbitrary number with a scalar array op, and it's nice to not
have to check the space for each individual item.  Here's what I've
got.

void
EnsureTidRangeSpace(TidRangeArray *tidRangeArray, int numNewItems)
{
    int requiredSpace = tidRangeArray->numRanges + numNewItems;

    if (requiredSpace <= tidRangeArray->numAllocated)
        return;

    /* it's not safe to double the size unless we're less than half MAX_INT */
    if (requiredSpace >= INT_MAX / 2)
        tidRangeArray->numAllocated = requiredSpace;
    else
    {
        /* (assumes numAllocated starts at a nonzero initial capacity) */
        while (tidRangeArray->numAllocated < requiredSpace)
            tidRangeArray->numAllocated *= 2;
    }

    tidRangeArray->ranges = (TidRange *)
        repalloc(tidRangeArray->ranges,
                 tidRangeArray->numAllocated * sizeof(TidRange));
}

Re: Tid scan improvements

2018-11-23 Thread Edmund Horner
On Fri, 23 Nov 2018 at 07:03, Tomas Vondra  wrote:
> On 11/22/18 8:41 AM, David Rowley wrote:
> > ...
> > 3. I'd rather see EnsureTidRangeSpace() keep doubling the size of the
> > allocation until it reaches the required size. See how
> > MakeSharedInvalidMessagesArray() does it.  Doing it this way ensures
> > we always have a power of two sized array which is much nicer if we
> > ever reach the palloc() limit as if the array is sized at the palloc()
> > limit / 2 + 1, then if we try to double it'll fail.  Of course, it's
> > unlikely to be a problem here, but... the question would be how to
> > decide on the initial size.
>
> I think it kinda tries to do that in some cases, by doing this:
>
>  *numAllocRanges *= 2;
>  ...
>  tidRanges = (TidRange *)
>  repalloc(tidRanges,
>   *numAllocRanges * sizeof(TidRange));
>
> The problem here is that what matters is not numAllocRanges being 2^N,
> but the number of bytes allocated being 2^N. Because that's what ends up
> in AllocSet, which keeps lists of 2^N chunks.
>
> And as TidRange is 12B, so this is guaranteed to waste memory, because
> no matter what the first factor is, the result will never be 2^N.

For simplicity, I think making it a strict doubling of capacity each
time is fine.  That's what we see in numerous other places in the
backend code.

What we don't really see is intentionally setting the initial capacity
so that each subsequent capacity is close-to-but-not-exceeding a power
of 2 bytes.  You can't really do that optimally if working in terms of
whole numbers of items that aren't each a power of 2 size.  This step,
there may be 2/3 of an item spare; next step, we'll have a whole item
spare that we're not going to use.  So we could keep track in terms of
bytes allocated, and then figure out how many items we can fit at the
current time.

In my opinion, such complexity is overkill for Tid scans.

Currently, we try to pick an initial size based on the number of
expressions.  We assume each expression will yield one range, and
allow that a saop expression might require us to enlarge the array.

Again, for simplicity, we should scrap that and pick something like
floor(256/sizeof(TidRange)) = 21 items, with about 1.5% wastage.
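
Concretely, the initialisation would be just:

    /* 21 ranges * 12 bytes = 252 bytes, just under a 256-byte chunk */
    tidRangeArray->numAllocated = 256 / sizeof(TidRange);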



Re: Tid scan improvements

2018-11-11 Thread Edmund Horner
Hi, here's the new patch(s).

Mostly the same, but trying to address your comments from earlier as
well as clean up a few other things I noticed.

Cheers,
Edmund

On Fri, 9 Nov 2018 at 15:01, Edmund Horner  wrote:
>
> On Tue, 6 Nov 2018 at 16:40, David Rowley  
> wrote:
> > I've been looking over 0001 to 0003. I ran out of steam before 0004.
>
> Hi David, thanks for another big review with lots of improvements.
>
> > I like the design of the new patch. From what I threw so far at the
> > selectivity estimation code, it seems pretty good.  I also quite like
> > the design in nodeTidscan.c for range scans.
>
>
> > I didn't quite manage to wrap my head around the code that removes
> > redundant quals from the tidquals. For example, with:
> >
> > postgres=# explain select * from t1 where ctid <= '(0,10)' and a = 0;
> > QUERY PLAN
> > --
> >  Tid Scan on t1  (cost=0.00..3.19 rows=1 width=4)
> >TID Cond: (ctid <= '(0,10)'::tid)
> >Filter: (a = 0)
> > (3 rows)
> >
> > and:
> >
> > postgres=# explain select * from t1 where ctid <= '(0,10)' or a = 20
> > and ctid >= '(0,0)';
> >   QUERY PLAN
> > --
> >  Tid Scan on t1  (cost=0.01..176.18 rows=12 width=4)
> >TID Cond: ((ctid <= '(0,10)'::tid) OR (ctid >= '(0,0)'::tid))
> >Filter: ((ctid <= '(0,10)'::tid) OR ((a = 20) AND (ctid >= 
> > '(0,0)'::tid)))
> > (3 rows)
> >
> > I understand why the 2nd query didn't remove the ctid quals from the
> > filter, and I understand why the first query could. I just didn't
> > manage to convince myself that the code behaves correctly for all
> > cases.
>
> I agree it's not obvious.
>
> 1. We extract a set of tidquals that can be directly implemented by
> the Tid scan.  This set is of the form:  "(CTID op ? AND ...) OR
> (...)" (with some limitations).
> 2. If they happened to come verbatim from the original RestrictInfos,
> then they will be found in scan_clauses, and we can remove them.
> 3. If they're not verbatim, i.e. the original RestrictInfos have
> additional criteria that the Tid scan can't use, then tidquals won't
> match anything in scan_clauses, and hence scan_clauses will be
> unchanged.
> 4. We do a bit of extra work for the common and useful case of "(CTID
> op ? AND ...)".  Since the top-level operation of the input quals is
> an AND, it will typically be split into multiple RestrictInfo items.
> We remove each part from scan_clauses.
>
> > 1. I see a few instances of:
> >
> > #define DatumGetItemPointer(X) ((ItemPointer) DatumGetPointer(X))
> > #define ItemPointerGetDatum(X) PointerGetDatum(X)
> >
> > in both tid.c and ginfuncs.c, and I see you have:
> >
> > + itemptr = (ItemPointer) DatumGetPointer(constval);
> >
> > Do you think it would be worth moving the macros out of tid.c and
> > ginfuncs.c into postgres.h and use that macro instead?
> >
> > (I see the code in this file already did this, so it might not matter
> > about this)
>
> I'm not sure about this one -- I think it's better as a separate
> patch, since we'd also change ginfuncs.c.  I have left it alone for
> now.
>
> > 2. In TidCompoundRangeQualFromExpr() rlst is not really needed. You
> > can just return MakeTidRangeQuals(found_quals); or return NIL.
>
> Yup, gone.
>
> > 3. Can you explain why this only needs to take place when list_length() == 
> > 1?
> >
> > /*
> > * In the case of a compound qual such as "ctid > ? AND ctid < ? AND ...",
> > * the various parts will have come from different RestrictInfos.  So
> > * remove each part separately.
> > */
> > ...
>
> I've tried to improve the comment.
>
> > 4. Accidental change?
> >
> > - tidquals);
> > + tidquals
> > + );
> >
> > 5. Shouldn't this comment get changed?
> >
> > - * NumTids      number of tids in this scan
> > + * NumRanges    number of tids in this scan
> >
> > 6. There's no longer a field named NumTids
> >
> > - * TidList      evaluated item pointers (array of size NumTids)
> > + * TidRanges    evaluated item pointers (array of size NumTids)
> >
> > 7. The following field is not documented in TidScanState:
> >
> > + bool tss_inScan;
> >
> > 8. Can you name this exprtype instead?
> >
> > + TidExprType type; /* type of op */
> >
> > "type" is used by No

Re: Tid scan improvements

2018-11-08 Thread Edmund Horner
On Tue, 6 Nov 2018 at 16:40, David Rowley  wrote:
> I've been looking over 0001 to 0003. I ran out of steam before 0004.

Hi David, thanks for another big review with lots of improvements.

> I like the design of the new patch. From what I threw so far at the
> selectivity estimation code, it seems pretty good.  I also quite like
> the design in nodeTidscan.c for range scans.


> I didn't quite manage to wrap my head around the code that removes
> redundant quals from the tidquals. For example, with:
>
> postgres=# explain select * from t1 where ctid <= '(0,10)' and a = 0;
> QUERY PLAN
> --
>  Tid Scan on t1  (cost=0.00..3.19 rows=1 width=4)
>TID Cond: (ctid <= '(0,10)'::tid)
>Filter: (a = 0)
> (3 rows)
>
> and:
>
> postgres=# explain select * from t1 where ctid <= '(0,10)' or a = 20
> and ctid >= '(0,0)';
>   QUERY PLAN
> --
>  Tid Scan on t1  (cost=0.01..176.18 rows=12 width=4)
>TID Cond: ((ctid <= '(0,10)'::tid) OR (ctid >= '(0,0)'::tid))
>Filter: ((ctid <= '(0,10)'::tid) OR ((a = 20) AND (ctid >= '(0,0)'::tid)))
> (3 rows)
>
> I understand why the 2nd query didn't remove the ctid quals from the
> filter, and I understand why the first query could. I just didn't
> manage to convince myself that the code behaves correctly for all
> cases.

I agree it's not obvious.

1. We extract a set of tidquals that can be directly implemented by
the Tid scan.  This set is of the form:  "(CTID op ? AND ...) OR
(...)" (with some limitations).
2. If they happened to come verbatim from the original RestrictInfos,
then they will be found in scan_clauses, and we can remove them.
3. If they're not verbatim, i.e. the original RestrictInfos have
additional criteria that the Tid scan can't use, then tidquals won't
match anything in scan_clauses, and hence scan_clauses will be
unchanged.
4. We do a bit of extra work for the common and useful case of "(CTID
op ? AND ...)".  Since the top-level operation of the input quals is
an AND, it will typically be split into multiple RestrictInfo items.
We remove each part from scan_clauses.

> 1. I see a few instances of:
>
> #define DatumGetItemPointer(X) ((ItemPointer) DatumGetPointer(X))
> #define ItemPointerGetDatum(X) PointerGetDatum(X)
>
> in both tid.c and ginfuncs.c, and I see you have:
>
> + itemptr = (ItemPointer) DatumGetPointer(constval);
>
> Do you think it would be worth moving the macros out of tid.c and
> ginfuncs.c into postgres.h and use that macro instead?
>
> (I see the code in this file already did this, so it might not matter
> about this)

I'm not sure about this one -- I think it's better as a separate
patch, since we'd also change ginfuncs.c.  I have left it alone for
now.

> 2. In TidCompoundRangeQualFromExpr() rlst is not really needed. You
> can just return MakeTidRangeQuals(found_quals); or return NIL.

Yup, gone.

> 3. Can you explain why this only needs to take place when list_length() == 1?
>
> /*
> * In the case of a compound qual such as "ctid > ? AND ctid < ? AND ...",
> * the various parts will have come from different RestrictInfos.  So
> * remove each part separately.
> */
> ...

I've tried to improve the comment.

> 4. Accidental change?
>
> - tidquals);
> + tidquals
> + );
>
> 5. Shouldn't this comment get changed?
>
> - * NumTids      number of tids in this scan
> + * NumRanges    number of tids in this scan
>
> 6. There's no longer a field named NumTids
>
> - * TidList      evaluated item pointers (array of size NumTids)
> + * TidRanges    evaluated item pointers (array of size NumTids)
>
> 7. The following field is not documented in TidScanState:
>
> + bool tss_inScan;
>
> 8. Can you name this exprtype instead?
>
> + TidExprType type; /* type of op */
>
> "type" is used by Node types to indicate their type.

Yup, yup, yup, yup, yup.

> 9. It would be neater this:
>
> if (expr->opno == TIDLessOperator || expr->opno == TIDLessEqOperator)
> tidopexpr->type = invert ? TIDEXPR_LOWER_BOUND : TIDEXPR_UPPER_BOUND;
> else if (expr->opno == TIDGreaterOperator || expr->opno == 
> TIDGreaterEqOperator)
> tidopexpr->type = invert ? TIDEXPR_UPPER_BOUND : TIDEXPR_LOWER_BOUND;
> else
> tidopexpr->type = TIDEXPR_EQ;
>
> tidopexpr->exprstate = exprstate;
>
> tidopexpr->inclusive = expr->opno == TIDLessEqOperator || expr->opno
> == TIDGreaterEqOperator;
>
> as a switch: ...

Yup, I think the switch is a bit nicer.

> 10. I don't quite understand this comment:
>
> + * Create an ExprState corresponding to the value part of a TID comparison,
> + * and wrap it in a TidOpExpr.  Set the type and inclusivity of the TidOpExpr
> + * appropriately, depending on the operator and position of the its 
> arguments.
>
> I don't quite see how the code sets the inclusivity depending on the
> position of the arguments.
>
> Maybe the comment should be:
>
> + * For the given 'expr' build and return 

Re: Cache relation sizes?

2018-11-08 Thread Edmund Horner
On Wed, 7 Nov 2018 at 11:41, Thomas Munro  wrote:
>
> Hello,
>
> PostgreSQL likes to probe the size of relations with lseek(SEEK_END) a
> lot.  For example, a fully prewarmed pgbench -S transaction consists
> of recvfrom(), lseek(SEEK_END), lseek(SEEK_END), sendto().  I think
> lseek() is probably about as cheap as a syscall can be so I doubt it
> really costs us much, but it's still a context switch and it stands
> out when tracing syscalls, especially now that all the lseek(SEEK_SET)
> calls are gone (commit c24dcd0cfd).
>
> If we had a different kind of buffer mapping system of the kind that
> Andres has described, there might be a place in shared memory that
> could track the size of the relation.  Even if we could do that, I
> wonder if it would still be better to do a kind of per-backend
> lock-free caching, like this:

On behalf of those looking after databases running over NFS (sigh!), I
think this is definitely worth exploring.  Looking at the behaviour of
my (9.4.9) server, there's an lseek(SEEK_END) for every relation
(table or index) used by a query, which is a lot of them for a heavily
partitioned database.  The lseek counts seem to be the same with
native partitions and 10.4.

As an incredibly rough benchmark, a "SELECT * FROM t ORDER BY pk LIMIT
0" on a table with 600 partitions, which builds a
MergeAppend/IndexScan plan, invokes lseek around 1200 times, and takes
600ms to return when repeated.  (It's much slower the first time,
because the backend has to open the files, and read index blocks.  I
found that increasing max_files_per_process above the number of
tables/indexes in the query made a huge difference, too!)  Testing
separately, 1200 lseeks on that NFS mount takes around 400ms.

I'm aware of other improvements since 9.4.9 that would likely improve
things (the pread/pwrite change; possibly using native partitioning
instead of inheritance), but I imagine reducing lseeks would too.

(Of course, an even better improvement is to not put your data
directory on an NFS mount (sigh).)



Re: Tid scan improvements

2018-11-07 Thread Edmund Horner
On Tue, 6 Nov 2018 at 16:52, Alvaro Herrera  wrote:
> On 2018-Nov-06, David Rowley wrote:
> > 14. we pass 'false' to what?
> >
> > + * save the tuple and the buffer returned to us by the access methods in
> > + * our scan tuple slot and return the slot.  Note: we pass 'false' because
> > + * tuples returned by heap_getnext() are pointers onto disk pages and were
> > + * not created with palloc() and so should not be pfree()'d.  Note also
> > + * that ExecStoreHeapTuple will increment the refcount of the buffer; the
> > + * refcount will not be dropped until the tuple table slot is cleared.
> >   */
>
> Seems a mistake stemming from 29c94e03c7d0 ...

Yep -- I copied that bit from nodeSeqscan.c.  Some of the notes were
removed in that change, but nodeSeqscan.c and nodeIndexscan.c still
have them.

I made a little patch to remove them.


remove-obsolete-ExecStoreTuple-notes.patch
Description: Binary data


Re: Tid scan improvements

2018-11-03 Thread Edmund Horner
Hi all,

I have managed to split my changes into 4 patches:

v3-0001-Add-selectivity-and-nullness-estimates-for-the-ItemP.patch
v3-0002-Support-range-quals-in-Tid-Scan.patch
v3-0003-Support-backward-scans-over-restricted-ranges-in-hea.patch
v3-0004-Tid-Scan-results-are-ordered.patch

(1) is basically independent, and usefully improves estimates for ctid quals.
(2) is the main patch, adding basic range scan support to TidPath and TidScan.
(3) is a small change to properly support backward scans over a
restricted range in heapam.c, and is needed for (4).
(4) adds Backward Tid Scans, and adds path keys to Tid Paths so that
the planner doesn't have to add a sort for certain queries.

I have tried to apply David's suggestions.

In (1), I've included the offset part of a CTID constant in the
selectivity calculation.  I've not included "allvisfrac" in the
calculation; I'm not sure it's worth it as it would only affect the
offset part.
I have tried to use iseq to differentiate between <=,>= versus <,>,
but I'm not sure I've got this right.  I am also not entirely sure
it's worth it; the changes are already an improvement over the current
behaviour of using hardcoded selectivity constants.

In (2), the planner now picks up a greater variety of TID quals,
including AND-clauses with arbitrary children instead of the original
lower bound/upper bound pair.  These are resolved in the executor into
a list of ranges to scan.
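
As an illustration (my example, not one from the patch's tests), with
the patch applied a query such as

    EXPLAIN SELECT * FROM t
    WHERE (ctid >= '(0,1)' AND ctid < '(10,1)') OR ctid = '(42,5)';

should plan as a single Tid Scan, with the executor resolving the quals
into one multi-block range plus one single-tuple range.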

(3) is the same code, but I've added a couple of comments to explain the change.

(4) is basically the same pathkey/direction code as before (but as a
separate patch).

I hope the separation will make it easier to review.  Item (2) is
still quite big, but a third of it is tests.

Cheers.
Edmund


v3-0002-Support-range-quals-in-Tid-Scan.patch
Description: Binary data


v3-0001-Add-selectivity-and-nullness-estimates-for-the-ItemP.patch
Description: Binary data


v3-0003-Support-backward-scans-over-restricted-ranges-in-hea.patch
Description: Binary data


v3-0004-Tid-Scan-results-are-ordered.patch
Description: Binary data


Re: Calculate total_table_pages after set_base_rel_sizes()

2018-10-05 Thread Edmund Horner
David Rowley said:
> I believe that we should be delaying the PlannerInfo's
> total_table_pages calculation until after constraint exclusion and
> partition pruning have taken place. Doing this calculation before we
> determine which relations we don't need to scan can lead to
> incorrectly applying random_page_cost to too many pages processed
> during an Index Scan.
>
> We already don't count relations removed by join removals from this
> calculation, so counting pruned partitions seems like an omission.
>
> The attached patch moves the calculation to after set_base_rel_sizes()
> is called and before set_base_rel_pathlists() is called, where the
> information is actually used.
>
> I am considering this a bug fix, but I'm proposing this for PG12 only
> as I don't think destabilising plans in the back branches is a good
> idea. I'll add this to the September commitfest.

I played with the new patched today with a set of large partitions.
It seems to work, though the effect is subtle.  The expected behaviour
of index_pages_fetched is a bit hard to fathom when the cache share
pushes it between cases A, B, and C of the formula, but that's not
really down to this patch.

Basically I think it's ready for a committer to look at.  Should I
update the CF entry?

Edmund



Re: Tid scan improvements

2018-10-04 Thread Edmund Horner
On Wed, 3 Oct 2018 at 17:36, David Rowley  wrote:
> I know commit fest is over, but I made a pass of this to hopefully
> provide a bit of guidance so that it's closer for the November 'fest.

Hi David.  Thanks for the review.  It's fairly thorough and you must
have put some time into it -- I really appreciate it.


> I've only done light testing on the patch and it does seem to work,
> but there are a few things that I think should be changed. Most
> importantly #11 below I think needs to be done. That might overwrite
> some of the items that come before it in the list as you likely will
> have to pull some of code which I mention out out due to changing #11.
> I've kept them around anyway just in case some of it remains.

> 1. Could wrap for tables > 16TB. Please use double. See index_pages_fetched()
> 2. Should multiply by nseqpages, not add.
> 3. Should be double.

I agree with these three.


> 4. Comment needs updated to mention what the new code does in
> heapgettup() and heapgettup_pagemode()
>
> +
>   /* start from last page of the scan */
> - if (scan->rs_startblock > 0)
> - page = scan->rs_startblock - 1;
> + if (scan->rs_numblocks == InvalidBlockNumber)
> + {
> + if (scan->rs_startblock > 0)
> + page = scan->rs_startblock - 1;
> + else
> + page = scan->rs_nblocks - 1;
> + }
>   else
> - page = scan->rs_nblocks - 1;
> + page = scan->rs_startblock + scan->rs_numblocks - 1;
> +

I'm thinking that, as they don't depend on the others, the heapam.c
changes should be a separate preparatory patch?

The heap scan code has to support things like synchronised scans and
parallel scans, but as far as I know, its support for scanning
subranges is currently used only for building BRIN indexes.  I found
that although I could specify a subrange with heap_setscanlimits, I
could not scan backward over it, because the original version of the
above code would start at the end of the whole table.

I'm not especially comfortable with this understanding of heapam, so
close review would be appreciated.
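
For reference, the usage pattern the executor needs is roughly this
(sketched against the heapam API as it stands; variable names are
mine):

    HeapScanDesc scan = heap_beginscan(rel, snapshot, 0, NULL);
    HeapTuple    tuple;

    heap_setscanlimits(scan, startBlock, numBlocks);

    /* with the fix, this starts from the last page of the limited range */
    while ((tuple = heap_getnext(scan, BackwardScanDirection)) != NULL)
    {
        /* process tuple */
    }

    heap_endscan(scan);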

I note that there's a lot of common code in heapgettup and
heapgettup_pagemode, which my changes add to.  It might be worth
trying to factor out somehow.


> 5. Variables should be called "inclusive". We use "strict" to indicate
> an operator comparison cannot match NULL values.
>
> + bool strict; /* Indicates < rather than <=, or > rather */
> + bool strict2; /* than >= */
>
> Don't break the comment like that. If you need more space don't end
> the comment and use a new line and tab the next line out to match the
> * of the first line.

> 8. Many instances of the word "strict" are used to mean "inclusive".
> Can you please change all of them.

I don't mind renaming it.  I took "strict" from "strictly greater/less
than" but I knew it was confusable with the other usages of "strict".


> 9. Confusing comment:
>
> + * If the expression was non-strict (<=) and the offset is 0, then just
> + * pretend it was strict, because offset 0 doesn't exist and we may as
> + * well exclude that block.
>
> Shouldn't this be, "If the operator is non-inclusive, then since TID
> offsets are 1-based, for simplicity, we can just class the expression
> as inclusive.", or something along those lines.

Ok, I'll try to reword it along those lines.


> I think I'm going to stop here as changing this going to cause quite a
> bit of churn.
>
> but one more...

> 12. I think the changes to selfuncs.c to get the selectivity estimate
> is a fairly credible idea, but I think it also needs to account for
> offsets. You should be able to work out the average number of items
> per page with rel->tuples / rel->pages and factor that in to get a
> better estimate for cases like:
>
> postgres=# explain analyze select ctid,* from t1 where ctid <= '(0,200)';
>   QUERY PLAN
> ---
>  Tid Scan on t1  (cost=0.00..5.00 rows=1 width=10) (actual
> time=0.025..0.065 rows=200 loops=1)
>TID Cond: (ctid <= '(0,200)'::tid)
>  Planning Time: 0.081 ms
>  Execution Time: 0.088 ms
> (4 rows)
>
> You can likely add on "(offset / avg_tuples_per_page) / rel->pages" to
> the selectivity and get a fairly accurate estimate... at least when
> there are no dead tuples in the heap

I think the changes to selfuncs.c could also be a separate patch?

I'll try to include the offset in the selectivity too.
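
The combined estimate I have in mind is roughly (names are mine, not
from the patch):

    density = vardata->rel->tuples / vardata->rel->pages;
    selec = (block + offset / density) / vardata->rel->pages;

i.e. treat the offset as a fraction of the way through its block, and
add that to the block number before dividing by the total page count.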

Related -- what should the selectivity be on an empty table?  My code has:

    /* If the relation's empty, we're going to read all of it. */
    if (vardata->rel->pages == 0)
        return 1.0;

(which needs rewording, since selectivity isn't always about reading).
Is 1.0 the right thing to return?


> 6. Why not pass the TidExpr into MakeTidOpExprState() and have it set
> the type instead of repeating code
> 7. It's not very obvious why the following Assert() can't fail. [...]
> I had to hunt around quite a bit to see that
> 

Re: Tid scan improvements

2018-09-28 Thread Edmund Horner
On Fri, 28 Sep 2018 at 17:02, Edmund Horner  wrote:
> I did run pgindent over it though. :)

But I didn't check if it still applied to master.  Sigh.  Here's one that does.
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 3395445..e89343f 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -575,11 +575,18 @@ heapgettup(HeapScanDesc scan,
 			 * forward scanners.
 			 */
 			scan->rs_syncscan = false;
+
 			/* start from last page of the scan */
-			if (scan->rs_startblock > 0)
-page = scan->rs_startblock - 1;
+			if (scan->rs_numblocks == InvalidBlockNumber)
+			{
+if (scan->rs_startblock > 0)
+	page = scan->rs_startblock - 1;
+else
+	page = scan->rs_nblocks - 1;
+			}
 			else
-page = scan->rs_nblocks - 1;
+page = scan->rs_startblock + scan->rs_numblocks - 1;
+
 			heapgetpage(scan, page);
 		}
 		else
@@ -876,11 +883,18 @@ heapgettup_pagemode(HeapScanDesc scan,
 			 * forward scanners.
 			 */
 			scan->rs_syncscan = false;
+
 			/* start from last page of the scan */
-			if (scan->rs_startblock > 0)
-page = scan->rs_startblock - 1;
+			if (scan->rs_numblocks == InvalidBlockNumber)
+			{
+if (scan->rs_startblock > 0)
+	page = scan->rs_startblock - 1;
+else
+	page = scan->rs_nblocks - 1;
+			}
 			else
-page = scan->rs_nblocks - 1;
+page = scan->rs_startblock + scan->rs_numblocks - 1;
+
 			heapgetpage(scan, page);
 		}
 		else
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index ed6afe7..aed7016 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -111,6 +111,7 @@ static void show_foreignscan_info(ForeignScanState *fsstate, ExplainState *es);
 static void show_eval_params(Bitmapset *bms_params, ExplainState *es);
 static const char *explain_get_index_name(Oid indexId);
 static void show_buffer_usage(ExplainState *es, const BufferUsage *usage);
+static void show_scan_direction(ExplainState *es, ScanDirection direction);
 static void ExplainIndexScanDetails(Oid indexid, ScanDirection indexorderdir,
 		ExplainState *es);
 static void ExplainScanTarget(Scan *plan, ExplainState *es);
@@ -1245,7 +1246,6 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_SeqScan:
 		case T_SampleScan:
 		case T_BitmapHeapScan:
-		case T_TidScan:
 		case T_SubqueryScan:
 		case T_FunctionScan:
 		case T_TableFuncScan:
@@ -1254,6 +1254,10 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_WorkTableScan:
 			ExplainScanTarget((Scan *) plan, es);
 			break;
+		case T_TidScan:
+			show_scan_direction(es, ((TidScan *) plan)->direction);
+			ExplainScanTarget((Scan *) plan, es);
+			break;
 		case T_ForeignScan:
 		case T_CustomScan:
 			if (((Scan *) plan)->scanrelid > 0)
@@ -2867,25 +2871,21 @@ show_buffer_usage(ExplainState *es, const BufferUsage *usage)
 }
 
 /*
- * Add some additional details about an IndexScan or IndexOnlyScan
+ * Show the direction of a scan.
  */
 static void
-ExplainIndexScanDetails(Oid indexid, ScanDirection indexorderdir,
-		ExplainState *es)
+show_scan_direction(ExplainState *es, ScanDirection direction)
 {
-	const char *indexname = explain_get_index_name(indexid);
-
 	if (es->format == EXPLAIN_FORMAT_TEXT)
 	{
-		if (ScanDirectionIsBackward(indexorderdir))
+		if (ScanDirectionIsBackward(direction))
 			appendStringInfoString(es->str, " Backward");
-		appendStringInfo(es->str, " using %s", indexname);
 	}
 	else
 	{
 		const char *scandir;
 
-		switch (indexorderdir)
+		switch (direction)
 		{
 			case BackwardScanDirection:
 scandir = "Backward";
@@ -2901,8 +2901,24 @@ ExplainIndexScanDetails(Oid indexid, ScanDirection indexorderdir,
 break;
 		}
 		ExplainPropertyText("Scan Direction", scandir, es);
+	}
+}
+
+/*
+ * Add some additional details about an IndexScan or IndexOnlyScan
+ */
+static void
+ExplainIndexScanDetails(Oid indexid, ScanDirection indexorderdir,
+		ExplainState *es)
+{
+	const char *indexname = explain_get_index_name(indexid);
+
+	show_scan_direction(es, indexorderdir);
+
+	if (es->format == EXPLAIN_FORMAT_TEXT)
+		appendStringInfo(es->str, " using %s", indexname);
+	else
 		ExplainPropertyText("Index Name", indexname, es);
-	}
 }
 
 /*
diff --git a/src/backend/executor/nodeTidscan.c b/src/backend/executor/nodeTidscan.c
index 0cb1946..9b455d8 100644
--- a/src/backend/executor/nodeTidscan.c
+++ b/src/backend/executor/nodeTidscan.c
@@ -22,7 +22,9 @@
  */
 #include "postgres.h"
 
+#include "access/relscan.h"
 #include "access/sysattr.h"
+#include "catalog/pg_operator.h"
 #include "catalog/pg_type.h"
 #include "executor/execdebug.h"
 #include "executor/nodeTidscan.h"
@@ -39,21 +41,78 @@
 	 ((Var *) (node))->varattno == SelfItemPointerAttributeNumber && \

Re: Tid scan improvements

2018-09-27 Thread Edmund Horner
On Wed, 19 Sep 2018 at 18:56, David Rowley  wrote:
>
> On 19 September 2018 at 18:04, Edmund Horner  wrote:
> > I have been generally following this approach (handling more kinds of
> > TID comparisons), and have found myself doing things like pairing up >
> > with <, estimating how much of a table is covered by some set of >, <,
> > or "> AND <" quals, etc.  Things that I'm sure are handled in an
> > advanced way by index paths; unfortunately I didn't see any easily
> > reusable code in the index path code.  So I've ended up writing
> > special-case code for TID scans.  Hopefully it will be worth it.
>
> I don't think it would need to be as complex as the index matching
> code. Just looping over the quals and gathering up all compatible ctid
> quals should be fine.  I imagine the complex handling of sorting the
> quals by ctid and removal of redundant quals that are covered by some
> range would be done in the executor.

I've got the path creation and execution pretty much working, though
with some inefficiencies:
  - Each individual TID is treated as a range of size 1 (but CURRENT
OF is handled as a single fetch)
  - Range scans have to scan whole blocks, and skip over the tuples
that are out of range.
But it's enough to get the tests passing.

Right now I'm looking at costing:

> Probably the costing will get more complex. At the moment it seems we
> add a random_page_cost per ctid, but you'd probably need to make that
> better and loop over the quals in each implicitly ANDed set and find
> the max ctid for the > / >= quals and the the min < / <= ctid, then
> get the page number from each and assume max - min seq_page_cost, then
> add random_page_cost for any remaining equality quals.  The costs from
> other OR branches can likely just be added on.  This would double
> count if someone did WHERE ctid BETWEEN '(0,0') AND '(100,300)' OR
> ctid BETWEEN '(0,0') AND '(100,300)';  The current code seems to
> double count now for duplicate ctids anyway. It even double counts if
> the ctid being compared to is on the same page as another ctid, so I
> don't think that would be unacceptable.

There are two stages of costing:
1. Estimating the number of rows that the relation will return.  This
happens before path generation.
2. Estimating the cost of the path.

In the existing code, (1) goes through the normal clausesel.c
machinery, eventually getting to the restriction function defined in
pg_operator.  For range quals, e.g. >, it looks for a stats entry for
the variable, but since it's a system variable with no stats, it
returns DEFAULT_INEQ_SEL (in function scalarineqsel).  For equality
quals, it does have some special-case code (in function
get_variable_numdistinct) to use stadistinct=-1 for the CTID variable,
resulting in a selectivity estimate of 1/ntuples.

(2), on the other hand, has special-case code in costsize.c (function
cost_tidscan), which estimates each TID as being a separate tuple
fetch from a different page.  (The existing code only has to support
=, IN, and CURRENT OF as quals for a TID path.)

In my work, I have been adding support for range quals to (2), which
includes estimating the selectivity of expressions like (CTID > a AND
CTID < b).  I got tired of handling all the various ways of ordering
the quals, so I thought I would try re-using the clausesel.c
machinery.  In selfuncs.c, I've added special case code for
scalarineqsel and nulltestsel to handle CTID variables.  (This also
improves the row count estimates.)

I'm not 100% sure what the costs of each range should be.  I think the
first block should incur random_page_cost, with subsequent blocks
being seq_page_cost.  Simple "CTID = ?" quals are still estimated as 1
tuple + 1 random block.
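
Per range, that would come out as something like (a sketch only):

    /* one seek into the range, then sequential reads for the rest */
    run_cost += random_page_cost + (pages_in_range - 1) * seq_page_cost;

summed over all the ranges, with the usual per-tuple CPU costs on top.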

Have a look at the attached WIP if you like and tell me if you think
it's going in the right direction.  I'm sorry for the size of the
patch; I couldn't find a nice way to cut it up.  I did run pgindent
over it though. :)

Cheers,
Edmund


tid_scan_improvements-v1.patch
Description: Binary data


Re: Calculate total_table_pages after set_base_rel_sizes()

2018-09-24 Thread Edmund Horner
David Rowley said:
> I believe that we should be delaying the PlannerInfo's
> total_table_pages calculation until after constraint exclusion and
> partition pruning have taken place. Doing this calculation before we
> determine which relations we don't need to scan can lead to
> incorrectly applying random_page_cost to too many pages processed
> during an Index Scan.
> 
> We already don't count relations removed by join removals from this
> calculation, so counting pruned partitions seems like an omission.
> 
> The attached patch moves the calculation to after set_base_rel_sizes()
> is called and before set_base_rel_pathlists() is called, where the
> information is actually used.
> 
> I am considering this a bug fix, but I'm proposing this for PG12 only
> as I don't think destabilising plans in the back branches is a good
> idea. I'll add this to the September commitfest.

Hi David, I had a quick look at this.  (I haven't tested it so this isn't a 
full review.)

It looks like a fairly straightforward code move.  And I think it's correct to 
exclude the pages from partitions that won't be read.

I have a small tweak.  In make_one_rel, we currently have:

/*
 * Compute size estimates and consider_parallel flags for each base rel,
 * then generate access paths.
 */
set_base_rel_sizes(root);
set_base_rel_pathlists(root);

Your patch inserts code between the two lines.  I think the comment should be 
split too.

/* Compute size estimates and consider_parallel flags for each base rel. */
set_base_rel_sizes(root);

// NEW CODE

/* Generate access paths. */
set_base_rel_pathlists(root);

Cheers,
Edmund

Re: PATCH: psql tab completion for SELECT

2018-09-22 Thread Edmund Horner
Hi all,

Here are some rebased versions of the last two patches.  No changes in
functionality, except a minor case sensitivity fix in the "completion
after commas" patch.

Edmund


01-psql-select-tab-completion-v8.patch
Description: Binary data


02-select-completion-after-commas.patch
Description: Binary data


Re: Tid scan improvements

2018-09-19 Thread Edmund Horner
On Mon, 17 Sep 2018 at 23:21, David Rowley  wrote:
> On 15 August 2018 at 11:11, Edmund Horner  wrote:
>> So we'd extend that to:
>>   - Include in the OR-list "range" subquals of the form (ctid > ? AND
>> ctid < ?) (either side could be optional, and we have to deal with >=
>> and <= and having ctid on the rhs, etc.).
>>   - Cost the range subquals by assuming they don't overlap, and
>> estimating how many blocks and tuples they span.
>>   - When beginning the scan, evaluate all the ?s and build an array of
>> "tid ranges" to fetch.  A tid range is a struct with a starting tid,
>> and an ending tid, and might just be a single tid item.
>>   - Sort and remove duplicates.
>>   - Iterate over the array, using a single fetch for single-item tid
>> ranges, and starting/ending a heap scan for multi-item tid ranges.
>>
>> I think I'll try implementing this.
>
>
> I've set this patch as waiting on author in the commitfest app.

Thanks David.

Between work I have found time here and there to work on it, but
making a path type that handles all the above turns out to be
surprisingly harder than my tid range scan.

In the earlier discussion from 2012, Tom Lane said:

> Bruce Momjian  writes:
> > On Wed, Jun 13, 2012 at 03:21:17PM -0500, Merlin Moncure wrote:
> >> IMNSHO, it's a no-brainer for the todo (but I think it's more
> >> complicated than adding some comparisons -- which are working now):
>
> > I see. Seems we have to add index smarts to those comparisons. That
> > might be complicated.
>
> Uh, the whole point of a TID scan is to *not* need an index.
>
> What would be needed is for tidpath.c to let through more kinds of TID
> comparison quals than it does now, and then for nodeTidscan.c to know
> what to do with them. The latter logic might well look something like
> btree indexscan qual preparation, but it wouldn't be the same code.

I have been generally following this approach (handling more kinds of
TID comparisons), and have found myself doing things like pairing up >
with <, estimating how much of a table is covered by some set of >, <,
or "> AND <" quals, etc.  Things that I'm sure are handled in an
advanced way by index paths; unfortunately I didn't see any easily
reusable code in the index path code.  So I've ended up writing
special-case code for TID scans.  Hopefully it will be worth it.

Edmund



Re: Tid scan improvements

2018-08-14 Thread Edmund Horner
On 12 August 2018 at 20:07, David Rowley  wrote:
>> Since range scan execution is rather different from the existing
>> TidScan execution, I ended up making a new plan type, TidRangeScan.
>> There is still only one TidPath, but it has an additional member that
>> describes which method to use.
>
> I always thought that this would be implemented by overloading
> TidScan.  I thought that TidListEval() could be modified to remove
> duplicates accounting for range scans. For example:
>
> SELECT * FROM t WHERE ctid BETWEEN '(0,1)' AND (0,10') OR ctid
> IN('(0,5)','(0,30)');
>
> would first sort all the tids along with their operator and then make
> a pass over the sorted array to remove any equality ctids that are
> redundant because they're covered in a range.

Initially, I figured that 99% of the time, the user either wants to
filter by a specific set of ctids (such as those returned by a
subquery), or wants to process a table in blocks.  Picking up an
OR-set of ctid-conditions and determining which parts should be picked
up row-wise (as in the existing code) versus which parts are blocks
that should be scanned -- and ensuring that any overlaps were removed
-- seemed more complicated than it was worth.

Having thought about it, I think what you propose might be worth it;
at least it limits us to a single TidScan plan to maintain.

The existing code:
  - Looks for a qual that's an OR-list of (ctid = ?) or (ctid IN (?))
  - Costs it by assuming each matching tuple is a separate page.
  - When beginning the scan, evaluates all the ?s and builds an array
of tids to fetch.
  - Sorts and remove duplicates.
  - Iterates over the array, fetching tuples.

So we'd extend that to:
  - Include in the OR-list "range" subquals of the form (ctid > ? AND
ctid < ?) (either side could be optional, and we have to deal with >=
and <= and having ctid on the rhs, etc.).
  - Cost the range subquals by assuming they don't overlap, and
estimating how many blocks and tuples they span.
  - When beginning the scan, evaluate all the ?s and build an array of
"tid ranges" to fetch.  A tid range is a struct with a starting tid,
and an ending tid, and might just be a single tid item.
  - Sort and remove duplicates.
  - Iterate over the array, using a single fetch for single-item tid
ranges, and starting/ending a heap scan for multi-item tid ranges.

I think I'll try implementing this.
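
(For concreteness, the struct I have in mind is just a pair of item
pointers, which is also where the 12-byte size discussed elsewhere in
the thread comes from:

    typedef struct TidRange
    {
        ItemPointerData first;      /* first tid in the range */
        ItemPointerData last;       /* last tid in the range, inclusive */
    } TidRange;

A single-tid entry is then simply a range with first == last.)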

>> As part of the work I also taught TidScan that its results are ordered
>> by ctid, i.e. to set a pathkey on a TidPath.  The benefit of this is
>> that queries such as
>>
>> SELECT MAX(ctid) FROM t;
>> SELECT * FROM t WHERE ctid IN (...) ORDER BY ctid;
>
> I think that can be done as I see you're passing allow_sync as false
> in heap_beginscan_strat(), so the scan will start at the beginning of
> the heap.

I found that heap scan caters to parallel scans, synchronised scans,
and block range indexing; but it didn't quite work for my case of
specifying a subset of a table and scanning backward or forward over
it.  Hence my changes.  I'm not overly familiar with the heap scan
code though.

>>   - Is there a less brittle way to create tables of a specific number
>> of blocks/tuples in the regression tests?
>
> Perhaps you could just populate a table with some number of records
> then DELETE the ones above ctid (x,100) on each page, where 100 is
> whatever you can be certain will fit on a page on any platform. I'm
> not quite sure if our regress test would pass with a very small block
> size anyway, but probably worth verifying that before you write the
> first test that will break it.

I don't think I've tested with extreme block sizes.

> I'll try to look in a bit more detail during the commitfest.
>
> It's perhaps a minor detail at this stage, but generally, we don't
> have code lines over 80 chars in length. There are some exceptions,
> e.g not breaking error message strings so that they're easily
> greppable.  src/tools/pgindent has a tool that you can run to fix the
> whitespace so it's in line with project standard.

I'll try to get pgindent running before my next patch.

Thanks for the comments!



Tid scan improvements

2018-08-11 Thread Edmund Horner
Hello,

To scratch an itch, I have been working on teaching TidScan how to do
range queries, i.e. those using >=, <, BETWEEN, etc.  This means we
can write, for instance,

SELECT * FROM t WHERE ctid >= '(1000,0)' AND ctid < '(2000,0)';

instead of resorting to the old trick:

SELECT * FROM t WHERE ctid = ANY (ARRAY(SELECT format('(%s,%s)', i, j)::tid
FROM generate_series(1000,1999) AS gs(i), generate_series(1,200)
AS gs2(j)));

where "200" is some guess at how many tuples can fit on a page for that table.

There's some previous discussion about this at
https://www.postgresql.org/message-id/flat/CAHyXU0zJhg_5RtxKnNbAK%3D4ZzQEFUFi%2B52RjpLrxtkRTD6CDFw%40mail.gmail.com#3ba2c3a6be217f40130655a3250d80a4
.

Since range scan execution is rather different from the existing
TidScan execution, I ended up making a new plan type, TidRangeScan.
There is still only one TidPath, but it has an additional member that
describes which method to use.

As part of the work I also taught TidScan that its results are ordered
by ctid, i.e. to set a pathkey on a TidPath.  The benefit of this is
that queries such as

SELECT MAX(ctid) FROM t;
SELECT * FROM t WHERE ctid IN (...) ORDER BY ctid;

are now planned a bit more efficiently.  Execution was already
returning tuples in ascending ctid order; I just had to add support
for descending order.

Attached are a couple of patches:
  - 01_tid_scan_ordering.patch
  - 02_tid_range_scan.patch, to be applied on top of 01.

Can I add this to the next CommitFest?

Obviously the whole thing needs thorough review, and I expect there to
be numerous problems.  (I had to make this prototype to demonstrate to
myself that it wasn't completely beyond me.  I know from experience
how easy it is to enthusiastically volunteer something for an open
source project, discover that one does not have the time or skill
required, and be too embarrassed to show one's face again!)

As well as actual correctness, some aspects that I am particularly
unsure about include:

  - Is it messy to use TidPath for both types of scan?
  - What is the planning cost for plans that don't end up being a
TidScan or TidRangeScan?
  - Have I put the various helper functions in the right files?
  - Is there a less brittle way to create tables of a specific number
of blocks/tuples in the regression tests?
  - Have a got the ScanDirection right during execution?
  - Are my changes to heapam ok?

Cheers,
Edmund


01_tid_scan_ordering.patch
Description: Binary data


02_tid_range_scan.patch
Description: Binary data


Re: PATCH: psql tab completion for SELECT

2018-07-16 Thread Edmund Horner
On 17 July 2018 at 03:27, Joao De Almeida Pereira
 wrote:
> After playing around a little bit with the patch I noticed that a comma was
> missing in line 1214
> + 1202 /* min_server_version */
> + 1203 9,
> + 1204 /* catname */
> + 1205 "pg_catalog.pg_proc p",
> + 1206 /* selcondition */
> + 1207 "p.prorettype NOT IN ('trigger'::regtype,
> 'internal'::regtype) "
> + 1208 "AND 'internal'::regtype != ALL (p.proargtypes) "
> + 1209 "AND p.oid NOT IN (SELECT
> unnest(array[typinput,typoutput,typreceive,typsend,typmodin,typmodout,typanalyze])
> FROM pg_type) "
> + 1210 "AND p.oid NOT IN (SELECT
> unnest(array[aggtransfn,aggfinalfn]) FROM pg_aggregate) "
> + 1211 "AND p.oid NOT IN (SELECT
> unnest(array[oprcode,oprrest,oprjoin]) FROM pg_operator) "
> + 1212 "AND p.oid NOT IN (SELECT
> unnest(array[lanplcallfoid,laninline,lanvalidator]) FROM pg_language) "
> + 1213 "AND p.oid NOT IN (SELECT castfunc FROM pg_cast) "
> + 1214 "AND p.oid NOT IN (SELECT amproc FROM pg_amproc) "
> + 1215 /* viscondition */
> + 1216 "pg_catalog.pg_function_is_visible(p.oid)",
>
> To catch these typos would be good if we could get some testing around psql.
> (Newbie question: do we have any kind of testing around tools like psql?)
>
> Thanks
> Joao

Hi Joao,

Ah yes, the embarrassing missing comma.  I kind of wish my IDE or
compiler had pointed it out to me, but how was it to know that I
was foolishly combining two struct fields?  Sadly, I think the best
defence right now is to have others scrutinise the code.

I attached a new version of the patch in reply to Heikki.  Would you
care to take a careful look for me?

I think some automated test framework for the tab completion queries
is possible, by calling psql_completion and examining the results.
Maybe someone will volunteer...

Edmund



Re: PATCH: psql tab completion for SELECT

2018-07-16 Thread Edmund Horner
On 17 July 2018 at 00:00, Heikki Linnakangas  wrote:
> Playing around with this a little bit, I'm not very satisfied with the
> completions. Sometimes this completes too much, and sometimes too little.
> All of this has been mentioned in this and the other thread [1] already,
> this just my opinion on all this.

Hi Heikki, thanks for getting this thread active again.

I do actually have another patch, which I didn't post yet because
everyone was busy with v11, and I wanted to wait for more feedback
about inclusions/exclusions.  Holding it back now seems like a mistake
(especially since the last patch had a bug) so here it is.

There's also a patch, to be applied on top, that adds completion after commas.

> Too much:
>
> postgres=# \d lp
>Table "public.lp"
>Column  |  Type   | Collation | Nullable | Default
> --+-+---+--+-
>   id   | integer |   |  |
>   partkey  | text|   |  |
>   partcol1 | text|   |  |
>   partcol2 | text|   |  |
> Partition key: LIST (partkey)
> Number of partitions: 1000 (Use \d+ to list them.)
>
> postgres=# select pa[TAB]
> pad_attribute  parameter_default  parameter_style  partattrs  partcol2
> partexprs      partrelid
> page           parameter_mode     parameter_types  partclass
> partcollation  partkey            partstrat
> pageno         parameter_name     parse_ident(     partcol1   partdefid
> partnatts      passwd

I agree that there is too much.  I don't know what the best way to
reduce it is, short of specifically excluding certain things.  In
theory, I think the query could be sensitive to how much text is
entered, for example, let pa[TAB] not show partrelid and other columns
from pg_partition, but part[TAB] show them; the general rule could be
"only show attributes for system tables if 3 or more letters entered".
But I'm wary of overcomplicating a patch that is (IMHO) a useful
improvement even if simple.

> Too little:
>
> postgres=# select partkey, p[TAB]
> [no completions]
>
>
> Then there's the multi-column case, which seems weird (to a user - I know
> the implementation and understand why):
>
> postgres=# select partkey, partc[TAB]
> [no completions]

Yep, there was no completion after commas in the previous patches.

> And I'd love this case, where go back to edit the SELECT list, after already
> typing the FROM part, to be smarter:
>
> postgres=# select p[TAB] FROM lp;
> Display all 370 possibilities? (y or n)

I don't know enough about readline to estimate how easy this would be.
I think all our current completions use only the text up to the
cursor.

> There's something weird going on with system columns, from a user's point of
> view:
>
> SELECT oi[TAB]
> oid  oidvectortypes(
>
> postgres=# select xm[TAB]
> xmin  xmlcomment( xml_is_well_formed_content(
> xmlvalidate(
> xmlagg(   xml_is_well_formed(
> xml_is_well_formed_document(
>
> So oid and xmin are completed. But "xmax" and "ctid" are not. I think this
> is because in fact none of the system columns are completed, but there
> happen to be some tables with regular columns named "oid" and "xmin". So it
> makes sense once you know that, but it's pretty confusing to a user.
> Tab-completion is a great way for a user to explore what's available, so
> it's weird that some system columns are effectively completed while others
> are not.

You are correct, system columns are excluded, but of course there are
always the same column names in other tables.  Do you think we should
just include all columns?

> I'd like to not include set-returning functions from the list. Although you
> can do "SELECT generate_series(1,10)", I'd like to discourage people from
> doing that, since using set-returning functions in the target list is a
> PostgreSQL extension to the SQL standard, and IMHO the "SELECT * FROM
> generate_series(1,10)" syntax is easier to understand and works more sanely
> with joins etc. Conversely, it would be really nice to include set-returning
> function in the completions after FROM.

I'm happy to exclude SRFs after SELECT if others agree.  Including
them after FROM seems worthwhile, but best left for a different patch?

> I understand that there isn't much you can do about some of those things,
> short of adding a ton of new context information about previous queries and
> heuristics. I think the completion needs to be smarter to be acceptable. I
> don't know what exactly to do, but perhaps someone else has ideas.
>
> I'm also worried about performance, especially of the query to get all the
> column names. It's pretty much guaranteed to do perform a
> SeqScan+Sort+Unique on pg_attribute. pg_attribute can be very large. In my
> little test database, with the above 1000-partition table, hitting tab after
> "SELECT " takes about 1 second, until you get the "Display all 1000
> 

Re: Add a tab-completion for "SELECT INTO" or "SELECT FROM" in psql

2018-07-02 Thread Edmund Horner
On 2 July 2018 at 17:57, C,C H  wrote:
> I use tab-completion in psql quite often and I find that I can't complete
> "FROM" for SELECT query.
>
> So I try to create a patch for it.
>
> I download the source code from GitHub master branch and modify the file to
> create the patch.
>
> I compile the code and my code change works like follows,
> --
> postgres=# SELECT * <TAB>
> FROM  INTO
> postgres=# SELECT *
> --
>
>
> This is my first patch contribution trial here, please correct me if there
> is something wrong.

Hi CCHsu,

This looks useful, but it does overlap with my patch to add completion
for column and function names to SELECT -- see
https://commitfest.postgresql.org/18/1376/ .  I have been quiet on
this as everyone was (and still is) busy with the 11.0 release.

I also have a patch for columns and functions after a comma in the
select list, but I've been even more quiet on that one.

We can try to combine them.  It won't be as simple as your patch here,
unfortunately.  Luckily the completions are active at mostly*
different times.  Was the last word SELECT?  Offer a column or
function name.  Otherwise offer FROM/INTO (see the sketch below).

* note that FROM/INTO can also appear directly after SELECT!
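
For what it's worth, a rough sketch of how the combined rule might look
in tab-complete.c, using the existing Matches macros (the schema query
name is from my patch; treat the details as illustrative only):

/* Sketch only: after SELECT, offer columns/functions plus FROM and INTO. */
else if (TailMatches1("SELECT"))
    COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_selectable_functions,
                               " UNION SELECT 'FROM'"
                               " UNION SELECT 'INTO'");
/* Sketch of your case: a one-word select list, as in "SELECT *". */
else if (Matches2("SELECT", MatchAny))
    COMPLETE_WITH_LIST2("FROM", "INTO");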

Cheers,
Edmund



Re: TABLE tab completion

2018-06-02 Thread Edmund Horner
On 29 May 2018 at 07:28, Vik Fearing  wrote:
> The tab completion for the TABLE command includes indexes but that's a
> bug.  Attached is a trivial patch to fix it.

Looks correct to me.

You lose tab completion for "TABLE pg_toast.pg_toast_xyz" but that
seems reasonable.  If people want to query toast tables I'm sure
they'll manage.



Redundant psql tab-completion for CREATE PUBLICATION

2018-05-29 Thread Edmund Horner
Hi,

While looking at Justin's patch for VACUUM completions, I found an
existing bit of code that tries to match on a word with a space:

   /* Complete "CREATE PUBLICATION  FOR TABLE " */
else if (Matches4("CREATE", "PUBLICATION", MatchAny, "FOR TABLE"))
COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_tables, NULL);

I think the clause will never match the "FOR TABLE" word; and can, in
any case, be removed, as the final check for completable "things"
(at the end of psql_completion) will see "TABLE" and act
appropriately.

Here's a little patch to remove these lines.

Edmund


remove-redundant-create-publication-completion.patch
Description: Binary data


Re: adding tab completions

2018-05-29 Thread Edmund Horner
On 29 May 2018 at 12:06, Justin Pryzby  wrote:
> Find attached tab completion for the following:
>
> "... Also, recursively perform VACUUM and ANALYZE on partitions when the
> command is applied to a partitioned table."
> 3c3bb99330aa9b4c2f6258bfa0265d806bf365c3
>
> Add parenthesized options syntax for ANALYZE.
> 854dd8cff523bc17972d34772b0e39ad3d6d46a4
>
> Add VACUUM (DISABLE_PAGE_SKIPPING) for emergencies.
> ede62e56fbe809baa1a7bc3873d82f12ffe7540b
>
> Allow multiple tables to be specified in one VACUUM or ANALYZE command.
> 11d8d72c27a64ea4e30adce11cf6c4f3dd3e60db

Hi Justin,

I don't believe it's meaningful to match on words with spaces in them,
for instance, in

else if (Matches3("VACUUM", "FULL|FREEZE|FULL FREEZE", "ANALYZE")) {

as there will never be a word called "FULL FREEZE" (the tab completion
logic splits using spaces, though it will keep things in quotes and
parentheses together).

I don't know what the best approach is for cases like VACUUM, where
there are multiple optional words.  Maybe something like the
following?  It's pretty ugly, but then, it is part of the tab
completion logic; a good sense of compromise is needed there.

else if (Matches1("VACUUM"))
COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_tpmf,
   " UNION SELECT 'FULL'"
   " UNION SELECT 'FREEZE'"
   " UNION SELECT 'VERBOSE'"
   " UNION SELECT 'ANALYZE'"
   " UNION SELECT '('");
else if (HeadMatches1("VACUUM") && TailMatches1("FULL"))
COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_tpmf,
   " UNION SELECT 'FREEZE'"
   " UNION SELECT 'VERBOSE'"
   " UNION SELECT 'ANALYZE'");
else if (HeadMatches1("VACUUM") && TailMatches1("FREEZE"))
COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_tpmf,
   " UNION SELECT 'VERBOSE'"
   " UNION SELECT 'ANALYZE'");
else if (HeadMatches1("VACUUM") && TailMatches1("VERBOSE"))
COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_tpmf,
   " UNION SELECT 'ANALYZE'");

(Not a patch file, so that you don't have to merge it with the rest of
your patch. ;-) )

Cheers,
Edmund



Re: PATCH: psql tab completion for SELECT

2018-04-07 Thread Edmund Horner
On 6 April 2018 at 13:29, Peter Eisentraut
 wrote:
> The selection of which functions to show can be further refined.
>
> I don't think the contents of pg_amproc and pg_cast should be excluded.
> Some of those functions are useful.  Similarly for pg_operator.oprcode.
>
> Things like oprrest and oprjoin will already be excluded because they
> have "internal" arguments.  Similarly for some columns in pg_aggregate.

You're right.  There were lots of useful functions being excluded by
the pg_amproc, pg_cast, and oprcode checks.  And all the oprrest and
oprjoin functions are already excluded, so I can remove that check.

Perhaps we should remove the "appears in this catalog table" exclusion
checks, and just use argument and return type?

> There are also additional pseudo-types that should be excluded.  See
> pg_type.typtype = 'p', except some of those should not be excluded.
> Needs more thought.

I don't know much about some of the pseudotypes but this is what I propose:

any*, record, _record, cstring should NOT be excluded
void should NOT be excluded for return type (and perhaps in general;
void_out and void_send are callable, if not hugely useful in psql)

trigger, event_trigger should be excluded
internal, opaque, unknown, pg_ddl_command should be excluded
language_handler, tsm_handler, index_am_handler, fdw_handler should be excluded

For modern servers, our query can be simplified to something like:

SELECT DISTINCT pg_catalog.quote_ident(p.proname)
FROM pg_catalog.pg_proc p
WHERE NOT arrayoverlap(ARRAY['trigger', 'event_trigger', 'internal',
'opaque', 'unknown', 'pg_ddl_command', 'language_handler',
'tsm_handler', 'index_am_handler', 'fdw_handler']::regtype[]::oid[],
ARRAY(SELECT p.prorettype UNION SELECT unnest(proargtypes)));

What do you think?



Re: PATCH: psql tab completion for SELECT

2018-04-05 Thread Edmund Horner
On 6 April 2018 at 13:29, Peter Eisentraut
 wrote:
> I looked at this a bit now.  I think it still needs some work.

Hi Peter, thanks for the feedback.

> Some of the queries for older versions contain syntax errors that causes
> them not to work.
>
> For example, in 80100:
>
> "'internal'::regtype != ALL ([.proargtypes) "
>
> The query definition structures are missing a comma between selcondition
> and viscondition.  This causes all the following fields to be
> misassigned.  I'm not quite sure how it actually works at all some of
> the time.  There might be another bug that offsets this one.


One of the problems was a missing comma before the viscondition value
in the struct!

/* selcondition */
"p.prorettype NOT IN ('trigger'::regtype, 'internal'::regtype) "
...
"AND p.oid NOT IN (SELECT amproc FROM pg_amproc) " <--
Should be a comma here!
/* viscondition */
"pg_catalog.pg_function_is_visible(p.oid)",

I did some fairly thorough testing against older servers, but that was
before I rewrote it to fit in Tom's versioned SchemaQuery.  I'll fix
this and repeat the testing.

> I'd like to see a short comment what is different between each of the
> version queries.  You had a list earlier in the thread.

Ok, I'll see if I can add that.

> The selection of which functions to show can be further refined.
>
> I don't think the contents of pg_amproc and pg_cast should be excluded.
> Some of those functions are useful.  Similarly for pg_operator.oprcode.
>
> Things like oprrest and oprjoin will already be excluded because they
> have "internal" arguments.  Similarly for some columns in pg_aggregate.
>
> There are also additional pseudo-types that should be excluded.  See
> pg_type.typtype = 'p', except some of those should not be excluded.
> Needs more thought.

Ok I'll play with these.

Selection of which functions and attributes to show is something with
probably no right answer, so we might have to settle for "close
enough" at some point.

> Considering these issues, I think it would be appropriate to move this
> patch to the next commitfest.



Re: Fix for pg_stat_activity putting client hostaddr into appname field

2018-04-05 Thread Edmund Horner
On 29 March 2018 at 20:46, Michael Paquier <mich...@paquier.xyz> wrote:
> On Tue, Mar 27, 2018 at 03:47:07PM +1300, Edmund Horner wrote:
>> But the stats array includes auxiliary processes, which means it has
>> NumBackendStatSlots items.  The pointers for the aux process strings
>> are out of range.  In the case of my query, the pointers for
>> st_appname in the aux processes happen to point into the
>> BackendClientHostnameBuffer segment.
>>
>> This patch uses NumBackendStatSlots for the sizes of those segments,
>> rather than MaxBackends.
>
> I am adding in CC Robert and Kuntal who worked on that (issue added as
> well to the older bug section in v11 open item list).

Thanks for making a note of it, Michael.  I guess it should be
considered for the next patch release of v10 too?



Re: pgbench doc typos

2018-03-30 Thread Edmund Horner
On 30 March 2018 at 19:26, Fabien COELHO  wrote:
> Thanks for the check. You might consider turning the patch as ready in the
> cf app.

Ok, I have done so, since the patch is small and simple.

>> Fixing the abs/hash bracketing seems clear.  I wasn't sure about
>> rewriting "mod(i, bj)" as "mod(i, j)", because there could be some
>> convention about parameter names, but I can't think of anything the "b"
>> could be in the case of mod.
>
>
> Might be a left-over of a previous version which might have used MOD(a, b), but
> a/b are rather used for fp numbers and i/j for integers.

Ah that seems possible.



Re: pgbench doc typos

2018-03-29 Thread Edmund Horner
I did a quick review of this.

The patch is just a doc typo fix and it applies cleanly to master (as of this 
email).  I was able to build the docs, and they look ok.

Fixing the abs/hash bracketing seems clear.  I wasn't sure about rewriting
"mod(i, bj)" as "mod(i, j)", because there could be some convention about
parameter names, but I can't think of anything the "b" could be in the case
of mod.

Edmund

Re: Fix for pg_stat_activity putting client hostaddr into appname field

2018-03-28 Thread Edmund Horner
I sent the original in haste, and now I need to make some corrections... sigh.

> Subject: Fix for pg_stat_activity putting client hostaddr into appname field

Actually, it's the hostname that appears in the appname field.

> I noticed when querying pg_stat_activity (in 10.1):

10.1 was where I first noticed the bug, but it's still present in master.

> I've tracked this down to bootstrap/pgstat.c.

Should be postmaster/pgstat.c.

> In the case of my query, the pointers for st_appname in the aux processes 
> happen to point into the BackendClientHostnameBuffer segment.

To be clear, I think this is a memory error.  These rogue pointers
could do a lot more damage than merely pointing to the wrong strings.

> It's an extra 3 kB ...

A rough estimate, that was also wrong.  7 aux processes * (1024 bytes
activity + 64 for hostname + 64 for appname) = about 8 kB.

I do apologise for the confusion!

Edmund



Fix for pg_stat_activity putting client hostaddr into appname field

2018-03-26 Thread Edmund Horner
I noticed when querying pg_stat_activity (in 10.1):

$ SELECT pid, application_name, client_hostname, backend_type FROM
pg_stat_activity;

  pid  |     application_name     |    client_hostname    |         backend_type
-------+--------------------------+-----------------------+------------------------------
 10207 |                          |                       | autovacuum launcher
 10209 |                          |                       | logical replication launcher
 10211 | psql                     | localhost             | client backend
 10212 | DBeaver 4.3.4 - Main     | ms006593.met.co.nz    | client backend
 10213 | DBeaver 4.3.4 - Metadata | ms006593.met.co.nz    | client backend
 10219 | psql                     | at-ice-db01.met.co.nz | client backend
 10205 | ms006593.met.co.nz       |                       | background writer
 10204 | ms006593.met.co.nz       |                       | checkpointer
 10206 | at-ice-db01.met.co.nz    |                       | walwriter

I've tracked this down to bootstrap/pgstat.c.

We create shared memory segments BackendAppnameBuffer,
BackendClientHostnameBuffer, and BackendActivityBuffer with enough
room for MaxBackends strings each.  In each PgBackendStatus item, we
point st_appname, st_clienthostname, and st_activity_raw to the
appropriate offset within these blocks.

But the stats array includes auxiliary processes, which means it has
NumBackendStatSlots items.  The pointers for the aux process strings
are out of range.  In the case of my query, the pointers for
st_appname in the aux processes happen to point into the
BackendClientHostnameBuffer segment.

This patch uses NumBackendStatSlots for the sizes of those segments,
rather than MaxBackends.
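
For illustration, the shape of the change in CreateSharedBackendStatus()
is roughly this (paraphrased rather than the exact hunk):

/* Paraphrased sketch; only the multiplier changes from MaxBackends. */
size = mul_size(NAMEDATALEN, NumBackendStatSlots);
BackendAppnameBuffer = (char *)
    ShmemInitStruct("Backend Application Name Buffer", size, &found);

/* ... and likewise NumBackendStatSlots-sized allocations for
 * BackendClientHostnameBuffer and BackendActivityBuffer, so the
 * auxiliary-process slots get valid string storage too. */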

I considered whether aux processes really need those strings
(especially st_clienthostname), but decided it was more consistent
just to assume that they might.  (It's an extra 3 kB... if we want to
save that we can put a bunch of if statements in pgstat_bestart and
other places.)

The patch also allocates local memory for st_clienthostname in
pgstat_read_current_status.  These strings shouldn't change very
often, but it seems safer and more consistent to treat them as we
treat the other two string fields.  Without the string copy, I think a
client could disconnect and be replaced after the stats have been copied,
resulting in the new hostname appearing alongside the copied stats
fields from the old connection.

Cheers,
Edmund


pgstats_memory_sizes_v1.patch
Description: Binary data


Re: PATCH: psql tab completion for SELECT

2018-03-21 Thread Edmund Horner
On 8 March 2018 at 17:23, Edmund Horner <ejr...@gmail.com> wrote:
> New patch that fixes a little bug with appending NULL addons to schema 
> queries.

Hi all, I haven't heard anything for a while and so assume you're
beavering away on real features. :)

I've been dogfooding this patch at work, and I am personally pretty
happy with it.

I still think the number of completions on an empty string is a bit
too big, but I don't know what to omit.  There are around 1700
completions on the empty "postgres" database in my testing, and we
show the first 1000 (arbitrarily limited, as the generated SQL has
LIMIT 1000 but no ORDER BY).

Should we just leave it as is?

Whether we improve the filtering or not, I'm optimistic the feature
will be committed in this CF or the next.

I've also been working on adding support for completions after commas,
but I really want to see the current feature finished first.

Cheers,
Edmund



Re: PATCH: psql tab completion for SELECT

2018-03-07 Thread Edmund Horner
Some updates on patch status:
  - "version-dependent-tab-completion-1.patch" by Tom Lane was committed in 
722408bcd.
  - "psql-tab-completion-savepoint-v1.patch" by Edmund Horner is probably not 
needed.
  - "psql-select-tab-completion-v6.patch" (the latest) is still under 
development/review.

Re: PATCH: psql tab completion for SELECT

2018-03-07 Thread Edmund Horner
New patch that fixes a little bug with appending NULL addons to schema queries.


psql-select-tab-completion-v6.patch
Description: Binary data


Re: PATCH: psql tab completion for SELECT

2018-03-05 Thread Edmund Horner
I've reworked the SELECT completion patch to use the VersionedQuery
infrastructure.

I've also made it a schema query (for the functions), with an addon
for the attributes.  This provides schema-aware completion.

Previously, addons to schema queries were appended verbatim; I've
changed this to use the same interpolation as in the simple_query
case, so that the attributes can be filtered in the query.  Otherwise,
relevant items can be omitted when they don't make it into the first
1000 rows in the query result, even when only a small number of items
are ultimately presented for completion.
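
Concretely, the addon now goes through the same substitution as a
simple_query, so it can filter on the word being completed.  The shape
is something like this (illustrative, not the exact addon from the
patch):

/* Illustrative addon; %d/%s are the usual substitutions for the
 * length and (escaped) text of the word being completed. */
" UNION SELECT DISTINCT pg_catalog.quote_ident(attname)"
"   FROM pg_catalog.pg_attribute"
"  WHERE attnum > 0 AND NOT attisdropped"
"    AND substring(pg_catalog.quote_ident(attname),1,%d)='%s'"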

Edmund


psql-select-tab-completion-v5.patch
Description: Binary data


Re: PATCH: psql tab completion for SELECT

2018-03-05 Thread Edmund Horner
On 5 March 2018 at 21:46, Vik Fearing  wrote:
> Tab completion on functions in SELECT (a subset of this thread's patch)
> is quite important to me personally.  I will work on this in the coming
> days.

It's something I've missed numerous times in the last few months at
work.  I guess I should really be running a psql with my own
preliminary patches applied; but I'm only a novice pgsql-hacker, and
easily default to using the official one.

If you are going to work on a patch just for functions, you should
probably make it a SchemaQuery.  I did not find a way to support
schema-qualified functions in a query that also returned column names.

Cheers,
Edmund



Re: PATCH: psql tab completion for SELECT

2018-03-05 Thread Edmund Horner
On 5 March 2018 at 08:06, Tom Lane  wrote:
> I looked into this patch and was disappointed to discover that it had
> only a very ad-hoc solution to the problem of version-dependent tab
> completion queries.  We need something better --- in particular, the
> recent prokind changes mean that there needs to be a way to make
> SchemaQuery queries version-dependent.

Thanks for the review Tom.

I was avoiding changing things that already worked and hence ended up
with the ad-hoc code.  It's much better have a unified approach to
handling this, though.

> So ... here is a modest proposal.  It invents a VersionedQuery concept
> and also extends the SchemaQuery infrastructure to allow those to be
> versioned.  I have not taken this nearly as far as it could be taken,
> since it's mostly just proposing mechanism.  To illustrate the
> VersionedQuery infrastructure, I fixed it so it wouldn't send
> publication/subscription queries to pre-v10 servers, and to illustrate
> the versioned SchemaQuery infrastructure, I fixed the prokind problems.

(Hmm, I guess the new prokind column may be germane to my query for
callable functions.)

> If people like this approach, I propose to commit this more or less
> as-is.  The select-tab-completion patch would then need to be rewritten
> to use this infrastructure, but I think that should be straightforward.

The SELECT completion patch was definitely ugly.  I think the
VersionedQuery is a nice way of making it less ad-hoc, but the work of
writing the numerous query versions, where necessary, remains.

My patch had 4 versions: PRE_81, PRE_90, PRE_96, and current.  This
could be reformulated as

static const VersionedQuery
Query_for_list_of_selectable_functions_or_attributes[] = {
{90600, ... },
{90000, ... },
{80100, ... },
{70300, ... },
{0, NULL}
};

I do think the version indicators are nicer when they mean "supported
as of this server version" rather than my "fallback if the server
version is prior to this".

Is it overkill to have so many variations?

I am still just slightly unclear on where we are in relation to the
SAVEPOINT patch -- is that redundant now?

For SELECT support, I would be happy with applying:

  a. Your patch
  b. A reworked simple SELECT patch (possibly with fewer query versions)
  c. A SAVEPOINT patch *if* it's deemed useful [1]

And perhaps later,
  d. A cleverer SELECT patch (supporting additional items after the
first comma, for instance)

> As a separate line of work, the infrastructure could be applied to fix
> the pre-existing places where tab completion fails against old servers.
> But that's probably work for v12 or beyond, unless somebody's really
> motivated to do it right now.
>
> regards, tom lane

Edmund

[1] There is more complexity with tab completions and transactions
than I want to tackle just for SELECT; see
https://www.postgresql.org/message-id/flat/CAMyN-kAyFTC4Xavp%2BD6XYOoAOZQW2%3Dc79htji06DXF%2BuF6StOQ%40mail.gmail.com#CAMyN-kAyFTC4Xavp+D6XYOoAOZQW2=c79htji06dxf+uf6s...@mail.gmail.com



psql tab completion vs transactions

2018-02-05 Thread Edmund Horner
Hi folks,

While working on tab completion for SELECT I found a few existing
problems with how psql's tab completion queries interact with
transactions.

  - If a tab completion query fails, it will abort the user's
transaction.  For example, typing, "ALTER PUBLICATION " when
connected to an older server version that doesn't have a
pg_publication table.

  - If the user's transaction is already failed, psql tab completions
that depend on queries won't work.

I made a patch to wrap tab completion queries in SAVEPOINT/ROLLBACK TO
SAVEPOINT.  This addresses the first problem, but not the second
(which is less critical IMHO).  Unfortunately I published it in the
same thread as SELECT tab completion, which may have conflated the two
pieces of work:

https://www.postgresql.org/message-id/CAMyN-kCr6aUFaE%3Dov%2B_r%3Db4nS_ihtmJ8-MLMoomZGWdNFgMaVw%40mail.gmail.com

In the meantime, another problem has appeared:

BEGIN;
SET TRANS<TAB>        (User is hoping to avoid typing the
                       following long command)
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

ERROR:  SET TRANSACTION ISOLATION LEVEL must be called before any query

This happens because tab completion for SET does a query on
pg_settings to get a list of variables to suggest.

I don't believe a savepoint will help here, as one cannot issue SET
TRANSACTION ISOLATION LEVEL after a SAVEPOINT.  (The savepoint
mechanism might still be useful for the other cases, though.)

Perhaps it could be fixed by some special case code to suppress the
pg_settings query if and only if:

 1. We're in a transaction.
 2. No queries have been run yet (psql would need to track this
itself, I think, as libpq doesn't provide it -- am I right?).
 3. The user is trying to tab complete on a SET.

If these are true, the code could jump straight to SET TRANSACTION
ISOLATION LEVEL.
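
Something like this, shape-wise (hypothetical: "no_queries_sent_yet"
does not exist in psql today, it's the tracking from point 2):

else if (TailMatches1("SET"))
{
    if (PQtransactionStatus(pset.db) == PQTRANS_INTRANS && no_queries_sent_yet)
        COMPLETE_WITH_CONST("TRANSACTION");     /* skip the pg_settings query */
    else
        COMPLETE_WITH_QUERY(Query_for_list_of_set_vars);
}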

Of course, this is a pain for users who don't want to do that but *do*
want to set a variable when beginning a transaction.

Ideas?

Edmund



Re: PATCH: psql tab completion for SELECT

2018-01-25 Thread Edmund Horner
On 26 January 2018 at 13:44, Vik Fearing <vik.fear...@2ndquadrant.com> wrote:
> On 01/26/2018 01:28 AM, Edmund Horner wrote:
>> The patch mentioned attempts to put savepoints around the tab
>> completion query where appropriate.
>
> I am -1 on this idea.

May I ask why?  It doesn't stop psql working against older versions,
as it checks that the server supports savepoints.



Re: PATCH: psql tab completion for SELECT

2018-01-25 Thread Edmund Horner
On 19 January 2018 at 05:37, Vik Fearing <vik.fear...@2ndquadrant.com> wrote:
> On 01/18/2018 01:07 AM, Tom Lane wrote:
>> Edmund Horner <ejr...@gmail.com> writes:
>>> On 15 January 2018 at 15:45, Andres Freund <and...@anarazel.de> wrote:
>>>> All worries like this are supposed to check the server version.
>>
>>> In psql there are around 200 such tab completion queries, none of
>>> which checks the server version.  Many would cause the user's
>>> transaction to abort if invoked on an older server.  Identifying the
>>> appropriate server versions for each one would be quite a bit of work.
>>
>>> Is there a better way to make this more robust?
>>
>> Maybe it'd be worth the effort to wrap tab completion queries in
>> SAVEPOINT/RELEASE SAVEPOINT if we're inside a user transaction
>> (which we could detect from libpq's state, I believe).
>>
>> That seems like an independent patch, but it'd be a prerequisite
>> if you want to issue tab completion queries with version dependencies.
>>
>> A bigger point here is: do you really want SELECT tab completion
>> to work only against the latest and greatest server version?
>> That would become an argument against committing the feature at all;
>> maybe not enough to tip the scales against it, but still a demerit.
>
> I don't really want such a patch.  I use new psql on old servers all the
> time.

I'm not sure where we got with this.  I really should have put this
patch in a separate thread, since it's an independent feature.

The patch mentioned attempts to put savepoints around the tab
completion query where appropriate.



Re: PATCH: psql tab completion for SELECT

2018-01-25 Thread Edmund Horner
On 23 January 2018 at 21:47, Vik Fearing  wrote:
> Don't forget to include the list :-)
> I'm quoting the entirety of the message---which I would normally trim---for
> the archives.

Thanks for spotting that.  Sorry list!

> If this were my patch, I'd have one query for 8.0, and then queries for all
> currently supported versions if needed.  If 9.2 only gets 8.0-level
> features, that's fine by me.  That limits us to a maximum of six queries.
> Just because we keep support for dead versions doesn't mean we have to
> retroactively develop for them.  Au contraire.
>> I'll make another patch version, with these few differences.

I managed to do testing against servers on all the versions >= 8.0 and
I discovered that prior to version 8.1 they don't support ALL/ANY on
oidvectors; so I added another query form.

But I don't like having so many different queries, so I am in favour
of trimming it down.  Can I leave that up to the committer to remove
some cases or do we need another patch?

> Great!

Here's another.


psql-select-tab-completion-v3.patch
Description: Binary data


Re: PATCH: psql tab completion for SELECT

2018-01-17 Thread Edmund Horner
On 15 January 2018 at 15:45, Andres Freund <and...@anarazel.de> wrote:
> On January 14, 2018 5:44:01 PM PST, Edmund Horner <ejr...@gmail.com> wrote:
>>In psql if you have readline support and press TAB, psql will often
>>run a DB query to get a list of possible completions to type on your
>>current command line.
>>
>>It uses the current DB connection for this, which means that if the
>>tab completion query fails (e.g. because psql is querying catalog
>>objects that don't exist in your server), the current transaction
>>(if any) fails.  An example of this happening is:
>
> Ah, that's what I thought. I don't think this is the right fix.
>
>
>> pg_subscription table doesn't
>>exist on 9.2!  User realises their mistake and types a different
>>command)
>>
>>postgres=# select  * from foo;
>>ERROR:  current transaction is aborted, commands ignored until end
>>of transaction block
>
> All worries like this are supposed to check the server version.

In psql there are around 200 such tab completion queries, none of
which checks the server version.  Many would cause the user's
transaction to abort if invoked on an older server.  Identifying the
appropriate server versions for each one would be quite a bit of work.

Is there a better way to make this more robust?



Re: PATCH: psql tab completion for SELECT

2018-01-14 Thread Edmund Horner
On 15 January 2018 at 14:20, Andres Freund <and...@anarazel.de> wrote:
> On January 14, 2018 5:12:37 PM PST, Edmund Horner <ejr...@gmail.com> wrote:
>>And here's a patch to add savepoint protection for tab completion.
>
> It'd be good to explain what that means, so people don't have to read the 
> patch to be able to discuss whether this is a good idea.


Good idea.

In psql if you have readline support and press TAB, psql will often
run a DB query to get a list of possible completions to type on your
current command line.

It uses the current DB connection for this, which means that if the
tab completion query fails (e.g. because psql is querying catalog
objects that don't exist in your server), the current transaction
(if any) fails.  An example of this happening is:

$ psql -h old_database_server
psql (10.1, server 9.2.24)
Type "help" for help.

postgres=# begin;
BEGIN
postgres=# create table foo (id int);
CREATE TABLE
postgres=# alter subscription <TAB>

(no tab completions because the pg_subscription table doesn't
exist on 9.2!  User realises their mistake and types a different
command)

postgres=# select  * from foo;
ERROR:  current transaction is aborted, commands ignored until end
of transaction block


This patch:
  - checks that the server supports savepoints (version 8.0 and later)
and that the user is currently idle in a transaction
  - if so, creates a savepoint just before running a tab-completion query
  - rolls back to that savepoint just after running the query
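
In exec_query() that's roughly the following (a simplified sketch; the
real patch also checks the results for errors):

/* Simplified sketch of wrapping a tab-completion query in a savepoint. */
bool        use_savepoint = pset.sversion >= 80000 &&
    PQtransactionStatus(pset.db) == PQTRANS_INTRANS;

if (use_savepoint)
    PQclear(PQexec(pset.db, "SAVEPOINT _psql_tab_completion"));

result = PQexec(pset.db, query);

if (use_savepoint)
    PQclear(PQexec(pset.db, "ROLLBACK TO SAVEPOINT _psql_tab_completion"));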

The result is that on an 8.0 or later server, the user's transaction
is still ok:

$ psql -h old_database_server
psql (11devel, server 9.2.24)
Type "help" for help.

postgres=# begin;
BEGIN
postgres=# create table foo (id int);
CREATE TABLE
postgres=# alter subscription <TAB>

(again, no tab completions)

postgres=# select * from foo;
 id
----
(0 rows)

postgres=# commit;
COMMIT


Note that only the automatic tab-completion query is protected; the
user can still fail their transaction by typing an invalid command
like ALTER SUBSCRIPTION ;.



Re: PATCH: psql tab completion for SELECT

2018-01-14 Thread Edmund Horner
And here's a patch to add savepoint protection for tab completion.

It could definitely use some scrutiny to make sure it's not messing up
the user's transaction.

I added some error checking for the savepoint creation and the
rollback, and then wrapped it in #ifdef NOT_USED (just as the query
error checking is) as I wasn't sure how useful it is for normal use.

But I do feel that if psql unexpectedly messes up the transaction, it
should report it somehow.  How can we tell that we've messed it up for
the user?

Cheers,
Edmund


psql-tab-completion-savepoint-v1.patch
Description: Binary data


Re: PATCH: psql tab completion for SELECT

2018-01-11 Thread Edmund Horner
Hi, here's a new patch.

This one makes some changes to the criteria for which functions to
include; namely filtering out trigger functions and those that take or
return values of type "internal"; and including aggregate and window
functions.  Some of the other checks could be removed as they were
covered by the "internal" check.

It also uses the server version to determine which query to run.  I
have not written a custom query for every version from 7.1!  I've
split up the server history into:

pre-7.3 - does not even have pg_function_is_visible.  Not supported.
pre-9.0 - does not have several support functions in types, languages,
etc.  We don't bother filtering using columns in other tables.
pre-9.6 - does not have various aggregate support functions.
9.6 or later - the full query
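
At runtime the choice is just a version ladder, roughly like this (the
query variable names here are made up for the sketch):

/* Sketch only; the real patch stores the query variants differently. */
const char *completion_query;

if (pset.sversion >= 90600)
    completion_query = Query_for_functions_and_attributes_96;
else if (pset.sversion >= 90000)
    completion_query = Query_for_functions_and_attributes_90;
else if (pset.sversion >= 70300)
    completion_query = Query_for_functions_and_attributes_73;
else
    completion_query = NULL;    /* pre-7.3: don't attempt completion */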

I was able to test against 9.2, 9.6, and 10 servers, but compiling and
testing the older releases was a bit much for a Friday evening.  I'm
not sure there's much value in support for old servers, as long as we
can make completion queries fail a bit more gracefully.

Edmund


psql-select-tab-completion-v2.patch
Description: Binary data


Re: [HACKERS] PATCH: psql tab completion for SELECT

2018-01-10 Thread Edmund Horner
On 11 January 2018 at 03:28, Vik Fearing <vik.fear...@2ndquadrant.com> wrote:
> On 01/10/2018 06:38 AM, Edmund Horner wrote:
>> Regarding support for older versions, psql fails silently if a tab
>> completion query fails.
>
> No it doesn't, see below.
>
>> We could just let it do this, which is what
>> happens with, for example, ALTER PUBLICATION against a 9.6 server.  I
>> can't see any existing completions that check the server version --
>> but completions that don't work against older versions, like ALTER
>> PUBLICATION, also aren't useful for older versions.  SELECT is a bit
>> different as it can be useful against older versions that don't have
>> the pg_aggregate.aggcombinefn that my query uses for filtering out
>> aggregation support functions.
>
> That's a bug in my opinion, but not one that needs to be addressed by
> this patch.
>
> There are no completions that cater to the server version (yet) but all
> the \d stuff certainly does.  You can see that in action in
> src/bin/psql/describe.c as well as all over the place in pg_dump and the
> like.
>
>> There's also the small irritation that when a completion query fails,
>> it aborts the user's current transaction to provide an incentive for
>> handling older versions gracefully.
>
> That is actively hostile and not at all what I would consider "silently
> failing".  If you don't want to do the multiple versions thing, you
> should at least check if you're on 9.6+ before issuing the query.

There's also the less-critical problem that you can't complete
anything from an already-failed transaction.

Let's just talk about a separate patch that might improve this.

I can think of two options:

1. Use a separate connection for completions.  The big problem with
this is people want to complete on objects created in their current
transaction.  Maybe there's a way to use SET TRANSACTION SNAPSHOT to
access the user's transaction but this seems far too intrusive just
for a bit of tab completion.

2. Use savepoints.  In exec_query you'd have:

SAVEPOINT _psql_tab_completion;
run the query
ROLLBACK TO _psql_tab_completion;        // Just in case there was an error, but safe to do anyway.
RELEASE SAVEPOINT _psql_tab_completion;  // This may not be worth doing.

It doesn't help with tab completion in already-failed transactions.
But would it be a reasonable way to make tab completion safer?  I
don't know whether savepoint creation/rollback/release has some
cumulative cost that we'd want to avoid incurring.



Re: [HACKERS] PATCH: psql tab completion for SELECT

2018-01-09 Thread Edmund Horner
Hi Vik, thanks so much for the comments and the offer to review!

I kept a low profile after my first message as there was already a
commitfest in progress, but now I'm working on a V2 patch.

I will include aggregate and window functions as you suggest.  And
I'll exclude trigger functions.

I'd like to exclude more if we could, because there are already over
1000 completions on an empty database.  I had thought of filtering out
functions with an argument of type internal but couldn't figure out
how to interrogate the proargtypes oidvector in a nice way.

Regarding support for older versions, psql fails silently if a tab
completion query fails.  We could just let it do this, which is what
happens with, for example, ALTER PUBLICATION against a 9.6 server.  I
can't see any existing completions that check the server version --
but completions that don't work against older versions, like ALTER
PUBLICATION, also aren't useful for older versions.  SELECT is a bit
different as it can be useful against older versions that don't have
the pg_aggregate.aggcombinefn that my query uses for filtering out
aggregation support functions.

There's also the small irritation that when a completion query fails,
it aborts the user's current transaction to provide an incentive for
handling older versions gracefully.

Regarding multiple columns, I have an idea that if we check that:

a) the last of any SELECT/WHERE/GROUP BY/etc.-level keyword is a
SELECT (i.e. that we're in the SELECT clause of the query), and
b) the last word was a comma (or ended in a comma),
then we can proffer the column/function completions.
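
In code that might look very roughly like this (sketch only;
in_select_list() and ends_with_comma() are hypothetical helpers that
scan the previous words, and the query/addon names are from my patch):

/* Very rough sketch of offering columns/functions again after a comma. */
else if (in_select_list(previous_words) && ends_with_comma(prev_wd))
    COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_selectable_functions,
                               Query_addon_for_selectable_attributes);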

There may be other completions that could benefit from such a check,
e.g. table names after a comma in the FROM clause, but I just want to
get SELECT working first.