Re: Improving spin-lock implementation on ARM.

2020-11-29 Thread Alexander Korotkov
On Mon, Nov 30, 2020 at 9:20 AM Krunal Bauskar  wrote:
> Some of us may be surprised by the fact that enabling LSE is causing a
> regression (1816 -> 892 or 714 -> 610) with HEAD itself,
> while LSE is meant to improve performance. This, unfortunately, is not
> always the case, at least based on my previous experience with LSE.

I doubt this is correct.  As Tom showed upthread, Apple clang has LSE
enabled by default [1].  It might happen that CFLAGS="-O2
-march=armv8-a+lse" disables some other optimizations, which are
enabled by default in Apple clang...

Links
1. https://www.postgresql.org/message-id/663376.1606432828%40sss.pgh.pa.us

--
Regards,
Alexander Korotkov




Re: range_agg

2020-11-29 Thread Alexander Korotkov
On Sun, Nov 29, 2020 at 8:11 PM Paul A Jungwirth
 wrote:
> On Fri, Nov 27, 2020 at 12:35 AM Alexander Korotkov
>  wrote:
> > I'd like to review this patch.  Could you please rebase it once again?  
> > Thanks.
>
> Thanks! Here is a rebased version. It also includes one more cleanup
> commit from Alvaro since the last one.

Thank you.  Could you please update doc/src/sgml/catalogs.sgml,
since the pg_type and pg_range catalogs are updated?

--
Regards,
Alexander Korotkov




Re: Improving spin-lock implementation on ARM.

2020-11-29 Thread Alexander Korotkov
On Thu, Nov 26, 2020 at 7:35 AM Krunal Bauskar  wrote:
> * x86 uses an optimized xchg operation.
>   ARM also started supporting it (via the Large System Extension) with
>   ARMv8.1, but since it is not supported on ARMv8.0, GCC by default
>   tends to emit more generic load-store assembly code.
>
> * From gcc 9.4 onwards there is support for outline-atomics, which can
>   emit both variants of the code (load-store and cas/swp) and select the
>   proper variant based on the underlying architecture, but a lot of
>   distros still don't ship GCC 9.4 as the default compiler.

I've checked that gcc 9.3 with the "-march=armv8-a+lse" option uses LSE atomics...

--
Regards,
Alexander Korotkov




Re: Improving spin-lock implementation on ARM.

2020-11-28 Thread Alexander Korotkov
On Sat, Nov 28, 2020 at 5:36 AM Tom Lane  wrote:
> So at least on Apple's hardware, it seems like the CAS
> implementation might be a shade faster when uncontended,
> but it's very clearly worse when there is contention for
> the spinlock.  That's interesting, because the argument
> that CAS should involve strictly less work seems valid ...
> but that's what I'm getting.
>
> It might be useful to try this on other ARM platforms,
> but I lack the energy right now (plus the only other
> thing I've got is a Raspberry Pi, which might not be
> something we particularly care about performance-wise).

I guess that might depend on the implementation of CAS and TAS.  I bet
usage of CAS in a spinlock gives an advantage when ldxr/stxr are used,
but not when swpal/casa are used.  I found out that I can force clang
to use swpal/casa by setting "-march=armv8-a+lse".  I'm going to run
some experiments on a multicore AWS graviton2 instance with different
atomic implementations.

--
Regards,
Alexander Korotkov




Re: [HACKERS] [PATCH] Generic type subscripting

2020-11-27 Thread Alexander Korotkov
Hi!

I've started to review this patch.  My first question is whether we're
able to handle different subscript types differently.  For instance,
one day we could handle jsonpath subscripts for jsonb.  And for sure,
jsonpath subscripts are expected to be handled differently from text
subscripts.  I see we can distinguish types in the prepare and
validate functions.  But it seems there is no type information in the
fetch and assign functions.  Should we add something like this to
SubscriptingRefState for future usage?

Datum uppertypeoid[MAX_SUBSCRIPT_DEPTH];
Datum lowertypeoid[MAX_SUBSCRIPT_DEPTH];
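
For context, here is a sketch of the distinction in question (the text
subscript form is what the patch provides; the jsonpath form is purely
hypothetical future syntax):

-- Text subscripts, as implemented by the patch under review:
SELECT ('{"a": {"b": 1}}'::jsonb)['a']['b'];

-- A hypothetical future jsonpath subscript, which the fetch/assign
-- functions would need to treat differently from a text subscript:
-- SELECT ('{"a": [1, 2, 3]}'::jsonb)['$.a[*] ? (@ > 1)'];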

--
Regards,
Alexander Korotkov




Re: Improving spin-lock implementation on ARM.

2020-11-27 Thread Alexander Korotkov
On Fri, Nov 27, 2020 at 11:55 AM Michael Paquier  wrote:
> Not planning to buy one here; anything I have read on that tells me
> it is worth a performance study.

Another interesting area for experiments is AWS graviton2 instances.
The specification says they support ARMv8.2, so they should have the
swpal/casa instructions as well.

--
Regards,
Alexander Korotkov




Re: range_agg

2020-11-27 Thread Alexander Korotkov
Hi!

On Thu, Sep 24, 2020 at 3:05 AM Paul A Jungwirth
 wrote:
> On Sun, Aug 16, 2020 at 12:55 PM Paul A Jungwirth
>  wrote:
> > This is rebased on the current master, including some changes to doc
> > tables and pg_upgrade handling of type oids.
>
> Here is a rebased version of this patch, including a bunch of cleanup
> from Alvaro. (Thanks Alvaro!)

I'd like to review this patch.  Could you please rebase it once again?  Thanks.

--
Regards,
Alexander Korotkov




Re: [Patch] Optimize dropping of relation buffers using dlist

2020-11-27 Thread Alexander Korotkov
Hi!

I've found this patch is marked RFC in the commitfest application.  I've
quickly checked whether it's really ready for committer.  It seems there
are still unaddressed review notes, so I'm going to switch it to Waiting
on Author.

--
Regards,
Alexander Korotkov




Re: Improving spin-lock implementation on ARM.

2020-11-26 Thread Alexander Korotkov
On Fri, Nov 27, 2020 at 2:20 AM Tom Lane  wrote:
> Alexander Korotkov  writes:
> > On Thu, Nov 26, 2020 at 1:32 PM Heikki Linnakangas  wrote:
> >> Is there some official ARM documentation, like a programmer's reference
> >> manual or something like that, that would show a reference
> >> implementation of a spinlock on ARM? It would be good to refer to an
> >> authoritative source on this.
>
> > I've compared assembly output of gcc implementations of CAS and TAS.
>
> FWIW, I see quite different assembly using Apple's clang on their M1
> processor.  What I get for SpinLockAcquire on HEAD is (lock pointer
> initially in x0):

Yep, ARMv8.1 implements the single-instruction atomic operations swpal
and casa, which look much more like x86 atomic instructions than loops
of ldxr/stlxr.

So, all the reasoning upthread shouldn't apply here, yet the advantage
is even larger.

--
Regards,
Alexander Korotkov




Re: Improving spin-lock implementation on ARM.

2020-11-26 Thread Alexander Korotkov
On Fri, Nov 27, 2020 at 1:55 AM Tom Lane  wrote:
>
> Krunal Bauskar  writes:
> > On Thu, 26 Nov 2020 at 10:50, Tom Lane  wrote:
> >> Also, exactly what hardware/software platform were these curves
> >> obtained on?
>
> > Hardware: ARM Kunpeng 920 BareMetal Server 2.6 GHz. 64 cores (56 cores for
> > server and 8 for client) [2 numa nodes]
> > Storage: 3.2 TB NVMe SSD
> > OS: CentOS Linux release 7.6
> > PGSQL: baseline = Release Tag 13.1
>
> Hmm, might not be the sort of hardware ordinary mortals can get their
> hands on.  What's likely to be far more common ARM64 hardware in the
> near future is Apple's new gear.  So I thought I'd try this on the new
> M1 mini I just got.
>
> ... and, after retrieving my jaw from the floor, I present the
> attached.  Apple's chips evidently like this style of spinlock a LOT
> better.  The difference is so remarkable that I wonder if I made a
> mistake somewhere.  Can anyone else replicate these results?

Results look very surprising to me.  I didn't expect there would be
any very busy spin-lock when the number of clients is as low as 4,
especially in read-only pgbench.

I don't have an M1 at hand.  Could you do some profiling to identify
the source of such a huge difference?

--
Regards,
Alexander Korotkov




Re: Improving spin-lock implementation on ARM.

2020-11-26 Thread Alexander Korotkov
On Thu, Nov 26, 2020 at 1:32 PM Heikki Linnakangas  wrote:
> On 26/11/2020 06:30, Krunal Bauskar wrote:
> > Improving spin-lock implementation on ARM.
> > 
> >
> > * Spin-Lock is known to have a significant effect on performance
> >as scalability increases.
> >
> > * The existing Spin-Lock implementation for ARM is sub-optimal due to
> >the use of TAS (test-and-set).
> >
> > * TAS is implemented on ARM as load-store, so even if the lock is not free,
> >the store operation will execute to replace the same value.
> >This redundant operation (mainly the store) is costly.
> >
> > * CAS is implemented on ARM as load-check-store-check; that means if the
> >lock is not free, the check after the load will cause the loop to
> >return, thereby saving the costlier store operation. [1]
>
> Can you add some code comments to explain why CAS is cheaper than
> TAS on ARM?
>
> Is there some official ARM documentation, like a programmer's reference
> manual or something like that, that would show a reference
> implementation of a spinlock on ARM? It would be good to refer to an
> authoritative source on this.

Let me add my 2 cents.

I've compared the assembly output of gcc's implementations of CAS and TAS.
The sample C program is attached.  I've compiled it on a Raspberry Pi 4
using gcc 9.3.0.

The inner loop of CAS is as follows.  So, if the value loaded by ldxr
doesn't match the expected value, we immediately quit the loop.

.L3:
        ldxr    w3, [x0]
        cmp     w3, w1
        bne     .L4
        stlxr   w4, w2, [x0]
        cbnz    w4, .L3
.L4:

The inner loop of TAS is as follows.  It really does the "stxr"
unconditionally.  In principle, it could check whether the new value
matches the observed value, in which case there would be nothing to do,
but it doesn't.  Moreover, stxr might fail, so we can spend multiple
loops of ldxr/stxr due to concurrent memory access.  AFAIK, those
concurrent accesses could reflect not only lock releases, but also other
unsuccessful lock attempts.  So, there is a good chance the extra loops
are useless.

.L7:
        ldxr    w2, [x0]
        stxr    w3, w1, [x0]
        cbnz    w3, .L7

I've also googled for spinlock implementations on ARM and found a blog
post about the spinlock implementation in the Linux kernel [1].
Surprisingly, it doesn't use the trick of skipping stxr if the lock is
busy.  Instead, it uses an ARM-specific power-saving instruction called
WFE.

So, I think it's quite clear that switching from TAS to CAS on ARM
would be a win.  But there could be other options to do this even
better.

Links
1. 
https://linux-concepts.blogspot.com/2018/05/spinlock-implementation-in-arm.html

--
Regards,
Alexander Korotkov


test.c
Description: Binary data


POC: Better infrastructure for automated testing of concurrency issues

2020-11-25 Thread Alexander Korotkov
Hackers,

PostgreSQL is a complex multi-process system, and we are periodically faced
with complicated concurrency issues. While the postgres community does a
great job of investigating and fixing the problems, our ability to
reproduce concurrency issues in the source code test suites is limited.

I think we currently have two general ways to reproduce concurrency
issues.
1. A text scenario for manual reproduction of the issue, which could
involve psql sessions, gdb sessions etc. A couple of examples are [1] and
[2]. This method provides reliable reproduction of concurrency issues. But
it's hard to automate, because it requires external instrumentation
(a debugger) and it's not stable in terms of postgres code changes (that
is, the particular line numbers for breakpoints can change). I think this
is why we currently don't have such scenarios among the postgres test
suites.
2. Another way is to reproduce the concurrency issue without directly
touching the database internals, using pgbench or another way to simulate
the workload (see [3] for example). This way is easier to automate, because
it doesn't need external instrumentation and it's not so sensitive to
source code changes. But at the same time this way is not reliable and is
resource-consuming.

In view of the above, I'd like to propose a POC patch, which implements new
builtin infrastructure for reproduction of concurrency issues in automated
test suites.  The general idea is so-called "stop events", which are
special places in the code where the execution can be stopped on some
condition.  A stop event also exposes a set of parameters, encapsulated
into a jsonb value.  The condition over the stop event parameters is
defined using the jsonpath language.

The following functions control the behavior:
 * pg_stopevent_set(stopevent_name, jsonpath_condition) – sets the
condition for the stop event.  Once the function is executed, all the
backends which run a given stop event with parameters satisfying the given
jsonpath condition will be stopped.
 * pg_stopevent_reset(stopevent_name) – resets the stop event.  All the
backends previously stopped on a given stop event will continue
execution.

For sure, evaluation of stop events causes CPU overhead.  This is why
it's controlled by the enable_stopevents GUC, which is off by default. I
expect the overhead with enable_stopevents = off shouldn't be observable.
Even if it were observable, we could enable stop events only via a
specific configure parameter.  There is also a trace_stopevents GUC, which
traces all stop events to the log at debug2 level.

In the code, stop events are defined using the macro STOPEVENT(event_id,
params).  The 'params' should be a function call, and it's evaluated only
if stop events are enabled.  pg_isolation_test_session_is_blocked() takes
stop events into account, so stop events are suitable for isolation tests.

The POC patch comes with a sample isolation test in
src/test/isolation/specs/gin-traverse-deleted-pages.spec, which reproduces
the issue described in [2] (a GIN scan steps onto a page concurrently
deleted by vacuum).

From my point of view, stop events open great possibilities to
improve coverage of concurrency issues.  They allow us to reliably test
concurrency issues in both isolation and TAP test suites.  And such test
suites don't take extraordinary resources to execute.  The main cost
here is maintaining the set of stop events across the codebase.  But I
think this cost is justified by much better coverage of concurrency
issues.

Feedback is welcome.

Links
1. https://www.postgresql.org/message-id/4E1DE580.1090905%40enterprisedb.com
2.
https://www.postgresql.org/message-id/CAPpHfdvMvsw-NcE5bRS7R1BbvA4BxoDnVVjkXC5W0Czvy9LVrg%40mail.gmail.com
3.
https://www.postgresql.org/message-id/BF9B38A4-2BFF-46E8-BA87-A2D00A8047A6%40hintbits.com

--
Regards,
Alexander Korotkov


0001-Stopevents-v1.patch
Description: Binary data


Re: [HACKERS] make async slave to wait for lsn to be replayed

2020-11-16 Thread Alexander Korotkov
Hi!

On Mon, Nov 16, 2020 at 1:09 PM  wrote:
> I've changed the BEGIN WAIT FOR LSN statement to core functions
> pg_waitlsn, pg_waitlsn_infinite and pg_waitlsn_no_wait.
> Currently the functions work inside repeatable read transactions, but
> waitlsn creates a snapshot if called first in a transaction block, which
> can possibly lead to the transaction working incorrectly, so the
> function gives a warning.
>
> Usage examples
> ==
> select pg_waitlsn('LSN', timeout);
> select pg_waitlsn_infinite('LSN');
> select pg_waitlsn_no_wait('LSN');

The name pg_waitlsn_no_wait() looks confusing.  I've tried to see how
it's documented, but the patch doesn't contain documentation...

--
Regards,
Alexander Korotkov




Re: Supporting = operator in gin/gist_trgm_ops

2020-11-15 Thread Alexander Korotkov
On Mon, Nov 16, 2020 at 2:13 AM Jeff Janes  wrote:
> On Sat, Nov 14, 2020 at 12:31 AM Alexander Korotkov  
> wrote:
>> I went through and revised this patch.  I made the documentation
>> statement less categorical.  pg_trgm gist/gin indexes might have lower
>> performance for equality operator searches than B-tree.  So, we can't
>> claim the B-tree index is never needed.  Also, the simple comparison
>> operators are <, <=, >, >=, and they are not supported.
>
> Is "simple comparison" here a well-known term of art?  If I read the doc as 
> committed (which doesn't include the sentence above), and if I didn't already 
> know what it was saying, I would be left wondering which comparisons those 
> are.  Could we just say "inequality operators"?

You're right.  "Simple comparison" is vague, let's replace it with
"inequality".  Pushed, thanks!

--
Regards,
Alexander Korotkov




Re: Supporting = operator in gin/gist_trgm_ops

2020-11-14 Thread Alexander Korotkov
On Sat, Nov 14, 2020 at 8:26 PM Julien Rouhaud  wrote:
> On Sat, Nov 14, 2020 at 7:58 PM Erik Rijkers  wrote:
> >
> > On 2020-11-14 12:53, Julien Rouhaud wrote:
> > > On Sat, Nov 14, 2020 at 6:07 PM Alexander Korotkov
> > >  >
> >
> > >Note that those indexes may not be as afficient as regulat B-tree
> > > indexes
> > >for equality operator.
> >
> >
> > 'afficient as regulat'  should be
> > 'efficient as regular'
> >
> >
> > Sorry to be nitpicking - it's the one thing I'm really good at :P
>
> Oops indeed :)

Pushed with all the corrections above.  Thanks!

--
Regards,
Alexander Korotkov




Re: pgbench: option delaying queries till connections establishment?

2020-11-14 Thread Alexander Korotkov
Hi!

On Thu, Feb 27, 2020 at 9:01 PM Andres Freund  wrote:
> I am trying to run a few benchmarks measuring the effects of patch to
> make GetSnapshotData() faster in the face of larger numbers of
> established connections.
>
> Before the patch connection establishment often is very slow due to
> contention. The first few connections are fast, but after that it takes
> increasingly long. The first few connections constantly hold
> ProcArrayLock in shared mode, which then makes it hard for new
> connections to acquire it exclusively (I'm addressing that to a
> significant degree in the patch FWIW).

Hmm...  Let's see the big picture.  You've recently committed a
patchset which greatly improved the performance of GetSnapshotData(),
and you're making further improvements in this direction.  But you're
having trouble measuring the effect, because Postgres is still
stuck on ProcArrayLock.  And in this thread you propose a workaround
for that implemented on the pgbench side.  My very dumb idea is the
following: should we finally give a chance to fairer lwlocks rather
than inventing workarounds?

As I remember, your major argument against fairer lwlocks was the
idea that we should fix lwlock use cases rather than the lwlock
mechanism itself.  But can we expect to fix all the lwlock use cases
in any reasonable timeframe?  My guess is 'no'.

Links
1. 
https://www.postgresql.org/message-id/CAPpHfdvJhO1qutziOp%3Ddy8TO8Xb4L38BxgKG4RPa1up1Lzh_UQ%40mail.gmail.com

--
Regards,
Alexander Korotkov




Re: Supporting = operator in gin/gist_trgm_ops

2020-11-14 Thread Alexander Korotkov
Hi, Erik!

On Sat, Nov 14, 2020 at 11:37 AM Erik Rijkers  wrote:
> On 2020-11-14 06:30, Alexander Korotkov wrote:
>
> > [v4-0001-Handle-equality...in-contrib-pg_trgm.patch (~]
> >
> > I'm going to push this if no objections.
> >
>
> About the sgml, in doc/src/sgml/pgtrgm.sgml :
>
>
> Beginning in PostgreSQL 14, these indexes
> also support equality operator (simple comparison operators are not
> supported).
>
> should be:
>
> Beginning in PostgreSQL 14, these indexes
> also support the equality operator (simple comparison operators are not
> supported).
>
> (added 'the')
>
>
> And:
>
> Although these indexes might have lower the performance of equality
> operator
> search than regular B-tree indexes.
>
> should be (I think - please check the meaning)
>
> Although these indexes might have a lower performance with equality
> operator
> search than with regular B-tree indexes.
>
>
> I am not sure I understood this last sentence correctly. Does this mean
> the slower trgm index might be chosen over the faster btree?

Thank you for your review.

I mean that searching for an equal string with a pg_trgm GiST/GIN index
might be slower than the same search with B-tree.  If you need both
pg_trgm similarity/pattern search and equality search on your column, you
have a choice.  You can run with a single pg_trgm index, but your search
for an equal string won't be as fast as with B-tree.  Alternatively, you
can have two indexes: a pg_trgm index for similarity/pattern search and a
B-tree index for equality search.  The second option gives you fast
equality search, but the B-tree index takes extra space and maintenance
overhead.  For equality search, the B-tree index should be chosen by the
planner (and that was tested).
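
To illustrate the trade-off, here is a sketch of the two-index option
(standard pg_trgm and B-tree syntax; the table and column are made up):

CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE TABLE docs (title text);

-- One pg_trgm index covers similarity/pattern search and, with this
-- patch, equality, though equality lookups may be slower than B-tree.
CREATE INDEX docs_title_trgm_idx ON docs USING gin (title gin_trgm_ops);

-- Optional second index: fast equality search at the cost of extra
-- space and maintenance.
CREATE INDEX docs_title_idx ON docs (title);

SELECT * FROM docs WHERE title % 'some title';   -- uses the pg_trgm index
SELECT * FROM docs WHERE title = 'exact title';  -- planner picks the B-tree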

--
Regards,
Alexander Korotkov




Re: Supporting = operator in gin/gist_trgm_ops

2020-11-13 Thread Alexander Korotkov
Hi!

On Fri, Nov 13, 2020 at 1:47 PM Georgios  wrote:
> In short, I think v3 of the patch looks good to change to 'RFC' status.
> Given the possible costing concerns, I will refrain from changing the
> status just yet, to give the opportunity to more reviewers to chime in.
> If in the next few days there are no more reviews, I will update the
> status to 'RFC' to move the patch forward.
>
> Thoughts?

I went through and revised this patch.  I made the documentation
statement less categorical.  pg_trgm gist/gin indexes might have lower
performance for equality operator searches than B-tree.  So, we can't
claim the B-tree index is never needed.  Also, the simple comparison
operators are <, <=, >, >=, and they are not supported.

I also have checked that btree_gist is preferred over the pg_trgm gist
index for equality search.  Even though our gist cost estimate is quite
dumb, it selects the btree_gist index due to its smaller size.  So, this
part also looks good to me.

I'm going to push this if no objections.

--
Regards,
Alexander Korotkov


v4-0001-Handle-equality-operator-in-contrib-pg_trgm.patch
Description: Binary data


Phrase search vs. multi-lexeme tokens

2020-11-12 Thread Alexander Korotkov
Hackers,

I'm investigating the bug report [1] about the behavior of
websearch_to_tsquery() with quotes and multi-lexeme tokens.  See the
example below.

# select to_tsvector('pg_class foo') @@ websearch_to_tsquery('"pg_class foo"');
 ?column?
----------
 f

So, the tsvector doesn't match the tsquery when exactly the same text was
passed to to_tsvector() and to the quotes of websearch_to_tsquery().
Looks wrong to me.  Let's examine the output of to_tsvector() and
websearch_to_tsquery().

# select to_tsvector('pg_class foo');
   to_tsvector
--------------------------
 'class':2 'foo':3 'pg':1

# select websearch_to_tsquery('"pg_class foo"');
 websearch_to_tsquery
------------------------------
 ( 'pg' & 'class' ) <-> 'foo'
(1 row)

So, the 'pg_class' token was split into two lexemes, 'pg' and 'class'.  But
the output of websearch_to_tsquery() connects 'pg' and 'class' with the &
operator.  The tsquery expects 'pg' and 'class' to both be neighbors of
'foo'.  So, 'pg' and 'class' are expected to share the same position,
and that isn't true for the tsvector.  Let's see how phraseto_tsquery()
handles that.

# select to_tsvector('pg_class foo') @@ phraseto_tsquery('pg_class foo');
 ?column?
----------
 t

# select phraseto_tsquery('pg_class foo');
      phraseto_tsquery
----------------------------
 'pg' <-> 'class' <-> 'foo'

phraseto_tsquery() connects all the lexemes with phrase operators and
everything works OK.

For me it's obvious that phraseto_tsquery() and websearch_to_tsquery()
with quotes should work the same way.  Notably, the current behavior of
websearch_to_tsquery() is recorded in the regression tests.  So it might
look like this behavior is intended, but it's too ridiculous, and I think
the regression tests contain an oversight as well.

I've prepared a fix, which doesn't break the fts parser abstractions
too much (attached patch), but I've faced another similar issue in
to_tsquery().

# select to_tsvector('pg_class foo') @@ to_tsquery('pg_class <-> foo');
 ?column?
----------
 f

# select to_tsquery('pg_class <-> foo');
  to_tsquery
------------------------------
 ( 'pg' & 'class' ) <-> 'foo'

I think if a user writes 'pg_class <-> foo', then it's expected to
match 'pg_class foo' independently of which lexemes 'pg_class' is
split into.

This issue looks like a much more complex design bug in phrase
search.  Fixing it would require some kind of readahead or multipass
processing, because we don't know how to process 'pg_class' in
advance.

Is this really a design bug that has existed in phrase search from the
beginning?  Or am I missing something?

Links
1. https://www.postgresql.org/message-id/16592-70b110ff9731c07d%40postgresql.org

--
Regards,
Alexander Korotkov


websearch_fix_p2.patch
Description: Binary data


Re: Strange GiST logic leading to uninitialized memory access in pg_trgm gist code

2020-11-11 Thread Alexander Korotkov
Hi!

On Wed, Nov 11, 2020 at 12:53 PM Andrew Gierth
 wrote:
> Now the obvious simple fix is just to reorder those last two operations,
> and the original reporter verified that doing so fixed their problem
> (patch attached). But I'd really like to understand the logic here and
> whether there is any reason to have this special treatment at all - why
> would it not be better to just cache all N items upfront and consider
> them all as potential seeds?

I think this comes from the idea that when N items are passed to the
picksplit method, the first N-1 are the existing items on the page,
while the Nth is the new item to be inserted.  So, we try to split the
first N-1 items and then insert the Nth item.  But this is wrong for
two reasons.

1) As you've pointed out, the GiST code doesn't necessarily pass items to
the picksplit method in that way.
2) Even if items are passed as assumed, there is no point in having
special handling of the item to be inserted.  It's better to consider
the whole set of items to produce a better split.

> Another issue I don't understand yet is that even though this code is
> largely unchanged since 8.x, the original reporter could not reproduce
> the crash on any version before 13.0.

I think this is related to my commit 911e702077.  It changed the
memory allocation for the signatures to support signatures of
variable length.  So it seems that although the error has existed since
8.x, it started causing segfaults only with 911e702077.

> Anyone have any ideas? (If not, I'll commit and backpatch something like
> the attached patch at some suitable time.)

I would rather propose to rip out the special handling of the last item
completely (see the attached patch).

--
Regards,
Alexander Korotkov


0001-Fix-handling-of-the-last-item-in-gtrgm_picksplit.patch
Description: Binary data


Re: MultiXact\SLRU buffers configuration

2020-10-28 Thread Alexander Korotkov
Hi, Tomas!

Thank you for your review.

On Wed, Oct 28, 2020 at 4:36 AM Tomas Vondra
 wrote:
> I did a quick review on this patch series. A couple comments:
>
>
> 0001
> 
>
> This looks quite suspicious to me - SimpleLruReadPage_ReadOnly is
> changed to return information about what lock was used, merely to allow
> the callers to do an Assert() that the value is not LW_NONE.

Yes, but this is not merely to allow callers to do an Assert().
Sometimes in multixacts it can save us some relocks: we can skip
relocking to exclusive mode if the lock is already held exclusively.
Adding an Assert() to every caller is probably overkill.

> IMO we could achieve exactly the same thing by passing a simple flag
> that would say 'make sure we got a lock' or something like that. In
> fact, aren't all callers doing the assert? That'd mean we can just do
> the check always, without the flag. (I see GetMultiXactIdMembers does
> two calls and only checks the second result, but I wonder if that's
> intended or omission.)

Having just the flag is exactly what the original version by Andrey
did.  But if we have to read two multixact offsets pages or multiple
members pages in one GetMultiXactIdMembers() call, then it relocks
from exclusive mode to exclusive mode.  I decided that once we set out
to optimize these locks, this situation is worth avoiding.

> In any case, it'd make the lwlock.c changes unnecessary, I think.

I agree that it would be better not to touch lwlock.c.  But I didn't
find a way to avoid relocking from exclusive mode to exclusive mode
without touching lwlock.c or making the code cumbersome in other places.

> 0002
> 
>
> Specifies the number cached MultiXact by backend. Any SLRU lookup ...
>
> should be 'number of cached ...'

Sounds reasonable.

> 0003
> 
>
>  * Conditional variable for waiting till the filling of the next multixact
>  * will be finished.  See GetMultiXactIdMembers() and RecordNewMultiXact()
>  * for details.
>
> Perhaps 'till the next multixact is filled' or 'gets full' would be
> better. Not sure.

Sounds reasonable as well.

--
Regards,
Alexander Korotkov




Re: MultiXact\SLRU buffers configuration

2020-10-27 Thread Alexander Korotkov
On Tue, Oct 27, 2020 at 8:02 PM Alexander Korotkov  wrote:
> On Mon, Oct 26, 2020 at 6:45 PM Andrey Borodin  wrote:
> > Thanks for your review, Alexander!
> > +1 for avoiding double locking in SimpleLruReadPage_ReadOnly().
> > Other changes seem correct to me too.
> >
> >
> > I've tried to find the optimal value for the cache size, and it seems
> > to me that it affects multixact scalability much less than the sizes of
> > the offsets/members buffers. I concur that patch 2 of the patchset does
> > not seem documented enough.
>
> Thank you.  I've made a few more minor adjustments to the patchset.
> I'm going to push 0001 and 0003 if there are no objections.

I see that patchset v5 doesn't pass the tests due to a typo in an assert.
The fixed version is attached.

--
Regards,
Alexander Korotkov


v6-0001-Use-shared-lock-in-GetMultiXactIdMembers-for-offs.patch
Description: Binary data


v6-0002-Make-MultiXact-local-cache-size-configurable.patch
Description: Binary data


v6-0003-Add-conditional-variable-to-wait-for-next-MultXact.patch
Description: Binary data


Re: MultiXact\SLRU buffers configuration

2020-10-27 Thread Alexander Korotkov
On Mon, Oct 26, 2020 at 6:45 PM Andrey Borodin  wrote:
> Thanks for your review, Alexander!
> +1 for avoiding double locking in SimpleLruReadPage_ReadOnly().
> Other changes seem correct to me too.
>
>
> I've tried to find the optimal value for the cache size, and it seems to
> me that it affects multixact scalability much less than the sizes of the
> offsets/members buffers. I concur that patch 2 of the patchset does not
> seem documented enough.

Thank you.  I've made a few more minor adjustments to the patchset.
I'm going to push 0001 and 0003 if there are no objections.

--
Regards,
Alexander Korotkov


v5-0001-Use-shared-lock-in-GetMultiXactIdMembers-for-offs.patch
Description: Binary data


v5-0002-Make-MultiXact-local-cache-size-configurable.patch
Description: Binary data


v5-0003-Add-conditional-variable-to-wait-for-next-MultXact.patch
Description: Binary data


Re: Supporting = operator in gin/gist_trgm_ops

2020-10-26 Thread Alexander Korotkov
On Mon, Oct 26, 2020 at 7:38 AM Julien Rouhaud  wrote:
> Ah, yes this might lead to bad performance if the "fake wildcard"
> matches too many rows, but this shouldn't be a very common use case,
> and the only alternative for that might be to create trigrams for
> non-alphanumeric characters.  I didn't try to do that because it would
> mean meaningful overhead for mainstream usage of pg_trgm, and would
> also mean an on-disk format break.  In my opinion supporting = should be
> a best effort, especially for such corner cases.

It would be more efficient to generate trigrams for the equality operator
using generate_trgm() instead of generate_wildcard_trgm().  In some
cases it would generate more trigrams.  For instance, generate_trgm()
would generate '__a', '_ab', 'ab_' for '%ab%', while
generate_wildcard_trgm() would generate nothing.
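
For reference, the non-wildcard extraction can be inspected from SQL via
show_trgm(), which exposes generate_trgm(); the output below is what I'd
expect (the underscores above stand for the padding spaces):

SELECT show_trgm('ab');
-- expected: {"  a"," ab","ab "}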

Also, I wonder how our costing would work if there are multiple indexes
on the same column.  We should clearly prefer btree over pg_trgm
gist/gin, and I believe our costing provides this.  But we should also
prefer btree_gist/btree_gin over pg_trgm gist/gin, and I'm not sure
our costing provides this, especially for gist.

--
Regards,
Alexander Korotkov




Re: MultiXact\SLRU buffers configuration

2020-10-25 Thread Alexander Korotkov
Hi!

On Mon, Sep 28, 2020 at 5:41 PM Andrey M. Borodin  wrote:
> > 28 авг. 2020 г., в 23:08, Anastasia Lubennikova 
> >  написал(а):
> >
> > 1) The first patch is sensible and harmless, so I think it is ready for 
> > committer. I haven't tested the performance impact, though.
> >
> > 2) I like the initial proposal to make various SLRU buffers
> > configurable; however, I doubt whether it really solves the problem, or
> > just moves it to another place.
> >
> > The previous patch you sent was based on some version that contained 
> > changes for other slru buffers numbers: 'multixact_offsets_slru_buffers' 
> > and 'multixact_members_slru_buffers'. Have you just forgot to attach them? 
> > The patch message "[PATCH v2 2/4]" hints that you had 4 patches)
> > Meanwhile, I attach the rebased patch to calm down the CFbot. The changes 
> > are trivial.
> >
> > 2.1) I think that both min and max values for this parameter are too 
> > extreme. Have you tested them?
> >
> > +   _local_cache_entries,
> > +   256, 2, INT_MAX / 2,
> >
> > 2.2) MAX_CACHE_ENTRIES is not used anymore, so it can be deleted.
> >
> > 3) No changes for the third patch. I just renamed it for consistency.
>
> Thank you for your review.
>
> Indeed, I had a 4th patch with tests, but those tests didn't work well: I
> still did not manage to stress SLRUs enough to reproduce the problem from
> production...
>
> You are absolutely correct in point 2: I only did tests with sane values,
> and observed extreme performance degradation with values of ~64
> megabytes. I do not know which highest value we should pick: 1GB? Or the
> highest possible functioning value?
>
> I greatly appreciate your review; sorry for the long delay. Thanks!

I took a look at this patchset.

The 1st and 3rd patches look good to me.  I made just minor improvements.
1) There was still a case when SimpleLruReadPage_ReadOnly() relocks the
SLRU lock, which is already taken in exclusive mode.  I've avoided this
by passing the lock mode as a parameter to
SimpleLruReadPage_ReadOnly().
3) CHECK_FOR_INTERRUPTS() is not needed anymore, because it's called
inside ConditionVariableSleep() if needed.  Also, no current wait
events use slashes, and I don't think we should introduce slashes
here.  Even if we should, then we should also rename the existing wait
events to be consistent with the new one.  So, I've renamed the new wait
event to remove the slash.

Regarding patch 2, I see the current documentation in the patch
doesn't explain to the user how to set the new parameter.  I think we
should give users an idea of which workloads need high values of the
multixact_local_cache_entries parameter and which don't.  Also, we
should explain the negative aspects of high
multixact_local_cache_entries values.  Ideally, we should get the
advantage without overloading users with new nontrivial parameters, but
I don't have a particular idea here.

I'd like to propose committing 1 and 3, but leaving 2 for further review.

--
Regards,
Alexander Korotkov


v4-0001-Use-shared-lock-in-GetMultiXactIdMembers-for-offsets.patch
Description: Binary data


v4-0002-Make-MultiXact-local-cache-size-configurable.patch
Description: Binary data


v4-0003-Add-conditional-variable-to-wait-for-next-MultXact-o.patch
Description: Binary data


Re: POC: contrib/unaccent as IMMUTABLE

2020-10-03 Thread Alexander Korotkov
On Sat, Oct 3, 2020 at 6:37 PM Tom Lane  wrote:
> Alexander Korotkov  writes:
> > I personally don't have an exact understanding of how strict we are
> > about marking functions immutable.  For example,
> > to_tsvector(regconfig, text) is immutable.
>
> Yeah.  This is in fact wrong, because a TS configuration can *very*
> easily be changed by the user.  We held our noses and did it anyway
> to allow tsvector results to be stored in indexes, figuring that
> (a) we'd put it on the user's head to reindex when necessary, and
> (b) in most situations, small changes in a TS configuration don't
> make an old tsvector index unusable --- at worst, some searches
> will fail that should have succeeded.  (I'm not entirely sure that
> I believe (b), but that was the argument that was advanced.)
>
> I do not think we'd accept such a compromise if it were put forward
> today.  The project's standards for such things have tightened over
> time, as evidenced by the work that's going towards collation
> change detection.
>
> It's also worth noting that the consequences of an out-of-date
> index for unaccent seem likely to be worse than they are for
> tsvectors.  It's not hard to imagine somebody making a unique
> index on an unaccent result, and then setting themselves up for
> dump/reload failures by changing the unaccent rules.  Nobody
> builds unique indexes on tsvectors.

Tom, thank you for the clarification.  Now I have a better understanding
of the project policy here.  I agree that we shouldn't mark unaccent() as
immutable in the current infrastructure.

--
Regards,
Alexander Korotkov




Re: Fix inconsistency in jsonpath .datetime()

2020-09-29 Thread Alexander Korotkov
On Fri, Sep 25, 2020 at 2:02 AM Alexander Korotkov  wrote:
> Other opinions?

Given no other opinions yet, I've pushed both patches.

--
Regards,
Alexander Korotkov




Re: Fix inconsistency in jsonpath .datetime()

2020-09-24 Thread Alexander Korotkov
On Sun, Sep 20, 2020 at 2:23 AM Nikita Glukhov  wrote:
> A beta-tester of PG13 reported an inconsistency in our current jsonpath
> datetime() method implementation.  By the standard, format strings in
> datetime() allow only the characters "-./,':; " to be used as separators
> in format strings.  But our to_json[b]() serializes timestamps into XSD
> format with a "T" separator between date and time, so the serialized data
> cannot be parsed back by jsonpath, which looks inconsistent:
>
> =# SELECT to_jsonb('2020-09-19 23:45:06'::timestamp);
>        to_jsonb
> -----------------------
>  "2020-09-19T23:45:06"
> (1 row)
>
> =# SELECT jsonb_path_query(to_jsonb('2020-09-19 23:45:06'::timestamp),
>'$.datetime()');
> ERROR:  datetime format is not recognized: "2020-09-19T23:45:06"
> HINT:  Use a datetime template argument to specify the input data format.
>
> =# SELECT jsonb_path_query(to_jsonb('2020-09-19 23:45:06'::timestamp),
>'$.datetime("yyyy-mm-dd HH:MI:SS")');
> ERROR:  unmatched format separator " "
>
> =# SELECT jsonb_path_query(to_jsonb('2020-09-19 23:45:06'::timestamp),
>'$.datetime("yyyy-mm-dd\"T\"HH:MI:SS")');
> ERROR:  invalid datetime format separator: """
>
>
>
> Excerpt from the SQL:2016 standard (5.3, page 197):
>
> [BNF grammar of the datetime template; the angle-bracketed nonterminals
> were lost in archiving]
>
>
>
> Attached patch #2 tries to fix this problem by enabling escaped
> characters in standard mode.  I'm not sure whether it's better to enable
> the whole set of text separators or only the problematic "T" character,
> and whether to allow only quoted text separators or not.
>
> Patch #1 is a simpler fix (so it comes first), removing the excess space
> between the time and timezone fields in the built-in format strings used
> for datetime type recognition.  (It seemed to work as expected with the
> extra space in an earlier version of the patch, in which standard mode
> had not yet been introduced.)

Jsonpath .datetime() was developed as an implementation of the
corresponding parts of the SQL standard.  Patch #1 fixes an inconsistency
between our implementation and the standard.  I'm going to backpatch it to
v13.

There is also an inconsistency between to_json[b]() and jsonpath
.datetime().  In this case, I wouldn't say the problem is on the
jsonpath side.  to_json[b]() makes special exceptions for datetime
types and converts them not using the standard output function, but using
a javascript-compatible format (see f30015b6d7).  Luckily, our input
function for the timestamp[tz] datatypes doesn't use strict format
parsing, so it can work with the output of to_json[b]().  But according to
the SQL standard, jsonpath .datetime() implements strict format parsing,
so it can't work with the output of to_json[b]().  So, I wouldn't say in
this case it's an inconsistency in the jsonpath .datetime() method.
But, given that now is not an appropriate time for redesigning
to_json[b](), we should probably improve the jsonpath .datetime() method
to understand more formats.

So, patch #2 is probably acceptable, and might even be backpatched to
v13.  One thing I don't particularly like is "In standard mode format
string characters are strictly matched or matched to spaces."
Instead, I would like to just strictly match characters and add
more options to fmt_str[].

Other opinions?

--
Regards,
Alexander Korotkov




Re: DROP relation IF EXISTS Docs and Tests - Bug Fix

2020-09-15 Thread Alexander Korotkov
Hi!

I've skimmed through the thread and checked the patchset.  Everything
looks good, except one paragraph, which doesn't look completely clear.

+  
+   This emulates the functionality provided by
+   CREATE TYPE but sets the created object's
+   type definition
+   to domain.
+  

As I get it, this states that CREATE DOMAIN somehow "emulates" CREATE
TYPE.  Could you please rephrase it?  It still looks confusing to me.

--
Regards,
Alexander Korotkov




Re: INSERT ON CONFLICT and RETURNING

2020-09-03 Thread Alexander Korotkov
On Thu, Sep 3, 2020 at 7:56 PM Geoff Winkless  wrote:
>
> On Mon, 31 Aug 2020 at 14:53, Konstantin Knizhnik
>  wrote:
> > If we are doing such a query:
> >
> > INSERT INTO jsonb_schemas (schema) VALUES (obj_schema)
> >ON CONFLICT (schema) DO UPDATE SET schema = jsonb_schemas.schema
> >RETURNING id
> >
> >
> > Then as far as I understand no extra lookup is used to return ID:
>
> The conflict resolution checks the unique index on (schema) and
> decides whether or not a conflict will exist. For DO NOTHING it
> doesn't have to get the actual row from the table; however in order
> for it to return the ID it would have to go and get the existing row
> from the table. That's the "extra lookup", as you term it. The only
> difference from doing it with RETURNING id versus WITH... COALESCE()
> as you described is the simpler syntax.

As far as I know, conflict resolution still has to fetch heap tuples; see
_bt_check_unique().  As I understand it, the issues are as follows.
1) Conflict resolution uses a dirty snapshot.  It's unclear whether
we can return this tuple to the user, because the query has a
different snapshot.  Note that the CTE query by Konstantin at the thread
start doesn't handle all the cases correctly: it can return no rows on
conflict.  We should probably do a trick similar to the EPQ mechanism
for UPDATE.  For instance, UPDATE ... RETURNING old.* can return a
tuple which doesn't match the query snapshot.  But INSERT ON CONFLICT
might have other caveats in this area; it needs careful analysis.
2) Checking unique conflicts inside the index am is already an
encapsulation-breaking hack.  Returning the heap tuple from the index am
would be an even worse hack.  We should probably refactor this whole area
first.
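
For reference, the WITH ... COALESCE() workaround mentioned above looks
roughly like this (a sketch; as noted, it can still return no rows when
the conflicting tuple isn't visible to the query snapshot):

WITH ins AS (
    INSERT INTO jsonb_schemas (schema) VALUES ($1)
    ON CONFLICT (schema) DO NOTHING
    RETURNING id
)
SELECT COALESCE((SELECT id FROM ins),
                (SELECT id FROM jsonb_schemas WHERE schema = $1)) AS id;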

--
Regards,
Alexander Korotkov




Re: LSM tree for Postgres

2020-08-08 Thread Alexander Korotkov
On Sat, Aug 8, 2020 at 11:49 PM Konstantin Knizhnik
 wrote:
> On 08.08.2020 21:18, Alexander Korotkov wrote:
> > On Sat, Aug 8, 2020 at 5:07 PM Konstantin Knizhnik
> >  wrote:
> >> I agree with you that losing the sequential order of B-Tree pages may
> >> have a negative impact on performance.
> >> But it is first of all critical for order-by and range queries, when
> >> we should traverse several consecutive leaf pages.
> >> It is less critical for exact-search or delete/insert operations.
> >> Efficiency of merge operations mostly depends on how many keys
> >> will be stored at the same B-Tree page.
> > What do you mean by "mostly"?  Given PostgreSQL has quite small (8k)
> > pages, sequential read is times faster than random read on SSDs
> > (dozens of times on HDDs).  I don't think this is something to
> > neglect.
>
> When you insert one record in a B-Tree, the order of pages doesn't
> matter at all.
> If you insert ten records into one leaf page, then order is also not so
> important.
> If you insert 100 records, 50 go to one page and 50 to the next page,
> then insertion may be faster if the second page follows the first one
> on disk.
> But such insertion may cause a page split and so allocation of a new
> page, so sequential write order can still be violated.

Sorry, I have no idea what you're getting at.

> >> And it is first of all
> >> determined by the size of the top index and the key distribution.
> > How can you be sure that the top index can fit in memory?  On
> > production systems, typically there are multiple consumers of memory:
> > other tables, indexes, other LSMs.  This is one of the reasons why LSM
> > implementations have multiple levels: they don't know in advance which
> > levels fit in memory.  Another reason is dealing with very large
> > datasets.  And I believe there is a quite strong reason to keep page
> > order sequential within a level.
>
> There is no warranty that the top index is kept in memory.
> But as far as top index pages are frequently accessed, I hope that the
> buffer manager's cache replacement algorithm does its best to keep them
> in memory.

So, the top index should be small enough that we can safely assume it
won't be evicted from cache on a heavily loaded production system.
I think it's evident that it should be orders of magnitude smaller
than the total amount of server RAM.

> > I'm OK with your design for a third-party extension.  It's very cool
> > to have.  But I'm -1 for something like this to get into core
> > PostgreSQL, assuming it's feasible to push some effort and get a
> > state-of-the-art LSM there.
> I realize that it is not a true LSM.
> But I still want to note that it is able to provide a ~10x increase in
> insert speed when the size of the index is comparable with RAM size.
> And a "true LSM" from RocksDB shows similar results.

It's very far from being shown.  All you've shown is a
naive benchmark.  I don't object that your design can work out in some
cases.  And it's great that we have the lsm3 extension now.  But I
think for PostgreSQL core we should think about a better design.

> Maybe if the size of the index is 100 times larger than the size of
> RAM, RocksDB will be significantly faster than Lsm3. But modern servers
> have 0.5-1TB of RAM.
> I can't believe that there are databases with 100TB indexes.

Comparison of the whole RAM size to a single index size looks plain wrong
to me.  I think we can roughly compare the whole RAM size to the whole
database size.  But not all of the RAM is available for caching data
either.  Let's assume half of the RAM is used for caching data.
So, a modern server with 0.5-1TB of RAM, which suffers from random
B-tree insertions and badly needs an LSM-like data structure, runs a
database of 25-50TB.  Frankly speaking, there is nothing
counterintuitive here for me.

--
Regards,
Alexander Korotkov




Re: LSM tree for Postgres

2020-08-08 Thread Alexander Korotkov
On Sat, Aug 8, 2020 at 5:07 PM Konstantin Knizhnik
 wrote:
> I agree with you that losing the sequential order of B-Tree pages may
> have a negative impact on performance.
> But it is first of all critical for order-by and range queries, when we
> should traverse several consecutive leaf pages.
> It is less critical for exact-search or delete/insert operations.
> Efficiency of merge operations mostly depends on how many keys
> will be stored at the same B-Tree page.

What do you mean by "mostly"?  Given PostgreSQL has quite small (8k)
pages, sequential read is times faster than random read on SSDs
(dozens of times on HDDs).  I don't think this is something to
neglect.

> And it is first of all
> determined by the size of the top index and the key distribution.

How can you be sure that the top index can fit in memory?  On production
systems, typically there are multiple consumers of memory: other
tables, indexes, other LSMs.  This is one of the reasons why LSM
implementations have multiple levels: they don't know in advance which
levels fit in memory.  Another reason is dealing with very large
datasets.  And I believe there is a quite strong reason to keep page
order sequential within a level.

I'm OK with your design for a third-party extension.  It's very cool
to have.  But I'm -1 for something like this to get into core
PostgreSQL, assuming it's feasible to push some effort and get a
state-of-the-art LSM there.

--
Regards,
Alexander Korotkov




Re: LSM tree for Postgres

2020-08-07 Thread Alexander Korotkov
On Wed, Aug 5, 2020, 09:13 Konstantin Knizhnik wrote:
> Concerning degradation of the base index: the B-Tree itself is a
> balanced tree. Yes, insertion of random keys can cause B-Tree page
> splits.
> In the worst case half of a B-Tree page will be empty, so the B-Tree
> will be two times larger than the ideal tree.
> That may cause up to a twofold degradation, but that is all. There
> should be no infinite degradation of speed tending to zero.

My concerns are not just about space utilization.  My main concern is
the order of the pages.  After the first merge the base index
will be filled in key order.  So physical page ordering perfectly
matches their logical ordering.  After the second merge some pages of
the base index split, and new pages are added to the end of the index.
Splits also happen in key order.  So, now physical and logical
orderings match within two extents corresponding to the first and second
merges, but not within the whole tree.  While there are only a few such
extents, disk page reads may in fact be mostly sequential, thanks to
the OS cache and readahead.  But finally, after many merges, we can end
up with mostly random page reads.  For instance, leveldb doesn't have
the problem of ordering degradation, because it stores levels in sorted
files.

--
Regards,
Alexander Korotkov




Re: LSM tree for Postgres

2020-08-04 Thread Alexander Korotkov
On Tue, Aug 4, 2020 at 7:56 PM Konstantin Knizhnik
 wrote:
> I do not understand why we need multiple indexes.
> We need one "hot" index which fits in memory to perform fast inserts.
> But it should not be too small, so that it can accumulate a substantial
> amount of records to provide efficient bulk inserts.
> I expect that the top index can be efficiently merged with the base
> index because of better access locality,
> i.e. we will insert multiple entries into one B-Tree page and so
> minimize the slowdown of random reads.
>
> The third index is needed to perform a parallel merge (while a merge is
> in progress the top index is blocked and we cannot insert into it).
> I do not understand the benefits of performing more than one merge in
> parallel: it will only increase the fraction of random reads.
>
> Degradation certainly takes place. But it is not as critical as in the
> case of standard nbtree.
> It is possible to tune the threshold for the top index size to make
> merge most efficient.
> But we cannot truncate and swap the index before we complete the merge.
> So if the merge operation takes a long time, then it will cause
> exhaustion of the top index and it will not fit in memory any more.
> That will lead to further slowdown (so we have negative feedback here).
>
> Certainly it is possible to create more top indexes, keeping their size
> small enough to fit in memory.
> But in this case search will require a merge not of 3, but of N indexes.
> I think that may cause unacceptable slowdown of search operations.
> And it is highly undesirable, taking into account that most applications
> send more select-only queries than updates.

The things you're writing make me uneasy.  I initially understood
lsm3 as a quick and dirty prototype, while you're probably keeping
some design in your mind (for instance, the original design of LSM).
However, your message makes me think you're trying to defend the
approach currently implemented in the lsm3 extension.  Therefore, I have
to criticise this approach.

1) The base index can degrade.  First, since merges can cause page
splits, the logical ordering of pages will become less
correlated with their physical ordering with each merge.
2) If your workload includes updates and/or deletes, page
utilization may also degrade.
3) As the base index degrades, merge performance also degrades.
Traversing the base index in logical order will require more and more
random reads (at some point almost every page read will be random).
As the base index becomes large and/or bloated, you push fewer top
index tuples to a single base index page (at some point you will push
one tuple per page).

The original LSM design implies strict guarantees on the average
resources spent per index operation.  Your design doesn't.  Moreover, I
bet lsm3 will degrade significantly even on an insert-only workload.  It
should degrade to the performance level of B-tree once you insert enough
data.  Try something like number_of_merges =
number_of_tuples_per_index_page * 2 and you should see this
degradation.  A real LSM doesn't degrade that way.

> Yes, this approach doubles the amount of logged data:
> we need to write to WAL the inserts into both the top and base indexes.
> And it will certainly have a negative influence on performance.
> Unfortunately I have no idea how to avoid it without patching Postgres
> core.

Huh, I didn't mean "without patching Postgres core" :)  I mean it's
difficult in principle, given that PostgreSQL recovery is single-process
and doesn't have access to the system catalog (because it might be
inconsistent).

> Right now vacuum processes Lsm3 indexes in the usual way.
> Removing records from the top indexes may not be needed at all (just
> because these indexes will be truncated in any case).
> But as far as the size of the top index is expected to be small enough,
> vacuuming it should not take a long time,
> so I didn't try to avoid it (although it should not be difficult - just
> disable ambulkdelete for the corresponding nbtree wrappers).

It doesn't seem important, but I don't get your point here.  Postgres
expects ambulkdelete to delete TIDs from the index.  If you don't delete
a TID from the top index, it will be merged into the base index.  And
that could lead to wrong query answers unless you eliminate those TIDs in
a different way (during the merge stage or something).

> Concerning deletes from the main index - I do not understand how they
> can be optimized.

This is a trick you can learn from almost every LSM implementation.
For instance, check the leveldb docs [1] about the "delete marker".  For
sure, that requires some redesign of vacuum and can't be done in an
extension (at least not in a reasonable way).  But, frankly speaking, I
think core modifications are inevitable to utilize the power of LSM in
PostgreSQL.

Links
1. https://github.com/google/leveldb/blob/master/doc/impl.md

--
Regards,
Alexander Korotkov




Re: Concurrency bug in amcheck

2020-08-04 Thread Alexander Korotkov
On Wed, Aug 5, 2020 at 1:58 AM Peter Geoghegan  wrote:
> On Tue, Aug 4, 2020 at 7:27 AM Alexander Korotkov  
> wrote:
> > Thank you for your reminder.  The revised patch is attached.  Now, the
> > contents of deleted btree pages aren't masked.  I've checked that
> > installcheck passes with wal_consistency_checking='Btree'.  I'm going
> > to push this if no objections.
>
> This looks good to me. One small thing, though: maybe the comments
> should not say anything about the REDO routine -- that seems like a
> case of "the tail wagging the dog" to me. Perhaps say something like:
>
> "Remove the last pivot tuple on the page.  This keeps things simple
> for WAL consistency checking."

Pushed.  The comment is changed as you suggested, but I've replaced "last
pivot tuple" with "remaining tuples", because the page can also have a
high key, which is also a tuple.

--
Regards,
Alexander Korotkov




Re: LSM tree for Postgres

2020-08-04 Thread Alexander Korotkov
On Tue, Aug 4, 2020 at 6:11 PM Tomas Vondra
 wrote:
> On Tue, Aug 04, 2020 at 11:22:13AM +0300, Konstantin Knizhnik wrote:
> >Hi hackers,
> >
> >I want to share the results of my latest research on implementing an
> >LSM index in Postgres.
> >Most modern databases (RocksDB, MongoDB, Tarantool, ...) are using an
> >LSM tree instead of the classical B-Tree.
> >
>
> I was under the impression that LSM is more an alternative primary
> storage, not for indexes. Or am I wrong / confused?

As I understand it, there are different use cases.  We can use LSM for
indexes, and this is good already.  Such indexes would be faster for
insertions and probably even vacuum if we redesign it (see my previous
message), but slower for search.  But for updates/deletes you still
have to do random access to the heap.  And you also need to find the
heap record to update/delete, probably using the LSM index (and it's
slower for search than B-tree).

LSM as a primary storage can do more advanced tricks.  For instance,
some updates/inserts-on-conflict could also just be pushed to the top
level of the LSM without fetching the affected record first.

So, from my point of view LSM as an index AM is far from a full-power
LSM for PostgreSQL, but it's still useful.  Large insert-only tables can
benefit from LSM.  Large tables with many indexes could also benefit,
because non-HOT updates would become cheaper.

--
Regards,
Alexander Korotkov




Re: LSM tree for Postgres

2020-08-04 Thread Alexander Korotkov
166333
> RocksDB 10 141482
> Lsm3 1 151699
> Lsm3 10 65997
>
> Size of nbtree is about 29Gb; single-client performance is even higher
> than that of the RocksDB FDW, but parallel results are significantly
> worse.

Did you investigate the source of the degradation?  Such degradation
doesn't yet look inevitable to me.  Probably, things could be
improved.

> I will be glad to receive any feedback, may be some change requests or 
> proposals.

As I understand, you benchmarked just inserts.  But what about vacuum?  I
think the way Postgres vacuum works now isn't optimal for an LSM.
Postgres vacuum requires a full scan of the index, because it provides a
bitmap of TIDs to be deleted without information about the index keys.
For an LSM, it would be better if vacuum pushed delete requests to the
top level of the LSM (as specially marked tuples or something).  Thanks to
that, index deletes could be as efficient as inserts.  This is
especially important for an LSM with many levels and/or aggressive
vacuum.

--
Regards,
Alexander Korotkov




Re: WIP: BRIN multi-range indexes

2020-08-04 Thread Alexander Korotkov
Hi, Tomas!

Sorry for the late reply.

On Sun, Jul 19, 2020 at 6:19 PM Tomas Vondra
 wrote:
> I think there's a number of weak points in this approach.
>
> Firstly, it assumes the summaries can be represented as arrays of
> built-in types, which I'm not really sure about. It clearly is not true
> for the bloom opclasses, for example. But even for minmax oclasses it's
> going to be tricky because the ranges may be on different data types so
> presumably we'd need somewhat nested data structure.
>
> Moreover, multi-minmax summary contains either points or intervals,
> which requires additional fields/flags to indicate that. That further
> complicates the things ...
>
> maybe we could decompose that into separate arrays or something, but
> honestly it seems somewhat premature - there are far more important
> aspects to discuss, I think (e.g. how the ranges are built/merged in
> multi-minmax, or whether bloom opclasses are useful at all).

I see.  But there is at least a second option: to introduce a new
datatype with just an output function, in a similar way to how
gist/tsvector_ops uses the gtsvector key type.  I think it would be more
transparent than using just bytea.  Also, this approach is already used
in core.

> >BTW, I've applied the patchset to the current master, but I got a lot
> >of duplicate oids.  Could you please resolve these conflicts?  I think
> >it would be good to use high oid numbers to avoid conflicts during
> >development/review, and rely on the committer to set final oids (as
> >discussed in [1]).
> >
> >Links
> >1. 
> >https://www.postgresql.org/message-id/CAH2-WzmMTGMcPuph4OvsO7Ykut0AOCF_i-%3DeaochT0dd2BN9CQ%40mail.gmail.com
>
> Did you use the patchset from 2020/07/03? I don't get any duplicate OIDs
> with it, and it's already using quite high OIDs (part 4 uses >= 8000,
> part 5 uses >= 9000).

Yep, it appears that I was using the wrong version of the patchset.
The patchset from 2020/07/03 works fine on the current master.

--
Regards,
Alexander Korotkov




Re: Concurrency bug in amcheck

2020-08-04 Thread Alexander Korotkov
Hi, Peter!

On Sat, Aug 1, 2020 at 3:23 AM Peter Geoghegan  wrote:

> On Wed, May 13, 2020 at 4:06 PM Peter Geoghegan  wrote:
> > On Mon, May 11, 2020 at 5:56 AM Alexander Korotkov
> >  wrote:
> > > Thank you.  The 2nd patch is proposed for master and makes btree page
> > > unlink remove all the items from the page being deleted.
> >
> > This looks good, but can we do the
> > wal_consistency_checking/btree_mask() improvement, too?
>
> You never got around to committing the second patch (or the
> wal_consistency_checking stuff). Are you planning on picking it up
> again?
>

Thank you for your reminder.  A revised patch is attached.  Now, the
contents of deleted btree pages aren't masked.  I've checked that
installcheck passes with wal_consistency_checking='Btree'.  I'm going to
push this if no objections.

--
Regards,
Alexander Korotkov


0001-Remove-btree-page-items-after-page-unlink-2.patch
Description: Binary data


Re: Mark btree_gist functions as PARALLEL SAFE

2020-07-20 Thread Alexander Korotkov
On Mon, Jul 20, 2020 at 3:18 PM Justin Pryzby  wrote:

> On Wed, Jul 15, 2020 at 03:26:24PM +0300, Alexander Korotkov wrote:
> > On Thu, Jun 18, 2020 at 7:48 PM Winfield, Steven
> >  wrote:
> > > Done - thanks again.
> >
> > This patch looks good to me.
> >
> > I'm going to push this if no objections.
>
> I marked as committed to make patch checker look healthier.
>

Thank you!

--
Regards,
Alexander Korotkov


Re: Mark btree_gist functions as PARALLEL SAFE

2020-07-15 Thread Alexander Korotkov
On Thu, Jun 18, 2020 at 7:48 PM Winfield, Steven
 wrote:
> Done - thanks again.

This patch looks good to me.

I've rechecked that it marks all the functions as parallel safe by
installing the extension and querying the catalog.  I've also rechecked
that there is nothing suspicious in these functions in terms of
parallel safety.  I did just minor adjustments in the migration script
comments.
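
For the record, a catalog query along these lines can serve as such a
recheck (a sketch; the join through pg_depend finds the functions that
belong to the extension):

    SELECT p.oid::regprocedure, p.proparallel
    FROM pg_proc p
    JOIN pg_depend d ON d.classid = 'pg_proc'::regclass
                    AND d.objid = p.oid
    JOIN pg_extension e ON d.refclassid = 'pg_extension'::regclass
                       AND d.refobjid = e.oid
    WHERE e.extname = 'btree_gist'
      AND p.proparallel <> 's';  -- expect zero rows after the update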

I'm going to push this if no objections.

--
Regards,
Alexander Korotkov


0001-Update-btree_gist-extension-for-parallel-query.patch
Description: Binary data


Re: WIP: BRIN multi-range indexes

2020-07-14 Thread Alexander Korotkov
On Mon, Jul 13, 2020 at 5:59 PM Tomas Vondra
 wrote:
> >> > If we really want to print something nicer, I'd say it needs to be a
> >> > special function in the BRIN opclass.
> >>
> >> If that can be done, then +1.  We just need to ensure that the function
> >> knows and can verify the type of index that the value comes from.  I
> >> guess we can pass the index OID so that it can extract the opclass from
> >> catalogs to verify.
> >
> >+1 from me, too. Perhaps we can have it as optional. If a BRIN opclass
> >doesn't have it, the 'values' can be null.
> >
>
> I'd say that if the opclass does not have it, then we should print the
> bytea value (or whatever the opclass uses to store the summary) using
> the type functions.

I've read the recent messages in this thread and I'd like to share my thoughts.

I think the way brin_page_items() displays values is not really
generic.  It uses a range-like textual representation of an array of
values, while that array doesn't necessarily have range semantics.

However, I think it's good that brin_page_items() uses a type output
function to display values.  So, it's not necessary to introduce a new
BRIN opclass function in order to get values displayed in a
human-readable way.  Instead, we could just make it a standard that BRIN
values are human readable.  I see at least two possibilities for
that.
1. Use standard container data types to represent BRIN values.  For
instance, we could use an array of ranges instead of bytea for
multirange.  Not sure about how convenient/performant it would be.
2. Introduce a new data type to represent values in a BRIN index, and for
that type define an output function with user-readable output.  We
did similar things for GiST.  For instance, pg_trgm defines the gtrgm
type, which has no usable input or output.  But for a BRIN opclass we can
define a type with just an output function (see the sketch below).
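
A minimal sketch of option 2 might look like this (all names here are
hypothetical; the C-level functions are assumed to exist in the extension
module, with the input function simply raising an error, the same way
gtrgm's does):

    CREATE TYPE brin_minmax_multi_summary;

    CREATE FUNCTION brin_minmax_multi_summary_in(cstring)
    RETURNS brin_minmax_multi_summary
    AS 'MODULE_PATHNAME' LANGUAGE C IMMUTABLE STRICT;

    CREATE FUNCTION brin_minmax_multi_summary_out(brin_minmax_multi_summary)
    RETURNS cstring
    AS 'MODULE_PATHNAME' LANGUAGE C IMMUTABLE STRICT;

    CREATE TYPE brin_minmax_multi_summary (
        INTERNALLENGTH = VARIABLE,
        INPUT = brin_minmax_multi_summary_in,
        OUTPUT = brin_minmax_multi_summary_out,
        STORAGE = extended
    );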

BTW, I've applied the patchset to the current master, but I got a lot
of duplicate oids.  Could you please resolve these conflicts?  I think
it would be good to use high oid numbers to avoid conflicts during
development/review, and rely on the committer to set final oids (as
discussed in [1]).

Links
1. 
https://www.postgresql.org/message-id/CAH2-WzmMTGMcPuph4OvsO7Ykut0AOCF_i-%3DeaochT0dd2BN9CQ%40mail.gmail.com

--
Regards,
Alexander Korotkov




Re: output columns of \dAo and \dAp

2020-07-13 Thread Alexander Korotkov
On Mon, Jul 13, 2020 at 7:54 PM Jonathan S. Katz  wrote:
> On 7/13/20 10:37 AM, Tom Lane wrote:
> > Alexander Korotkov  writes:
> >> Good compromise.  Done as you proposed.
> >
> > I'm OK with this version.
>
> I saw this was committed and the item was adjusted on the Open Items list.

Thank you!

--
Regards,
Alexander Korotkov




Re: output columns of \dAo and \dAp

2020-07-13 Thread Alexander Korotkov
On Sat, Jul 11, 2020 at 10:59 PM Tom Lane  wrote:
> Alexander Korotkov  writes:
> > The proposed patch is attached.  This patch fixes two points:
> >  * Adds strategy number and purpose to output of \dAo
> >  * Renames "Left/right arg type" columns of \dAp to "Registered left/right 
> > type"
>
> I think that \dAp should additionally be changed to print the
> function via "oid::regprocedure", not just proname.  A possible
> compromise, if you think that's too wordy, is to do it that
> way for "\dAp+" while printing plain proname for "\dAp".

Good compromise.  Done as you proposed.

> BTW, isn't this:
>
>   "  format ('%%s (%%s, %%s)',\n"
>   "CASE\n"
>   "  WHEN pg_catalog.pg_operator_is_visible(op.oid) 
> \n"
>   "  THEN op.oprname::pg_catalog.text \n"
>   "  ELSE 
> o.amopopr::pg_catalog.regoper::pg_catalog.text \n"
>   "END,\n"
>   "pg_catalog.format_type(o.amoplefttype, NULL),\n"
>   "pg_catalog.format_type(o.amoprighttype, NULL)\n"
>   "  ) AS \"%s\"\n,"
>
> just an extremely painful way to duplicate the results of regoperator?
> (You could likely remove the joins to pg_proc and pg_operator altogether
> if you relied on regprocedure and regoperator casts.)

Yeah, this subquery is totally dumb.  Replaced it with a cast to regoperator.
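
For illustration, regoperator output already gives what that format()
construction was assembling by hand; a quick way to see it:

    SELECT amopopr::pg_catalog.regoperator
    FROM pg_catalog.pg_amop
    LIMIT 3;

It prints each operator as opname(lefttype,righttype), schema-qualifying
the name only when the operator isn't visible in the search path, which
is exactly what the removed CASE expression was emulating.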

> > I'm not yet convinced we should change the sort key for \dAo.
>
> After playing with this more, I'm less worried about that than
> I was.  I think I was concerned that the operator name would
> sort ahead of amopstrategy, but now I see that the op name isn't
> part of the sort key at all.

Ok.

> BTW, these queries seem inadequately schema-qualified, notably
> the format() calls.

Thank you for pointing that out.  I've added schema qualification to the
pg_catalog functions and tables.

--
Regards,
Alexander Korotkov


0001-Improvements-to-psql-dAo-and-dAp-commands-2.patch
Description: Binary data


Re: output columns of \dAo and \dAp

2020-07-11 Thread Alexander Korotkov
On Fri, Jul 10, 2020 at 2:24 AM Alexander Korotkov  wrote:
> On Thu, Jul 9, 2020 at 10:03 PM Jonathan S. Katz  wrote:
> > From the RMT perspective, if there is an agreed upon approach (which it
> > sounds like from the above) can someone please commit to working on
> > resolving this open item?
>
> I can hardly extract an approach from this thread, because for me the
> whole issue is about details :)
>
> But I think we can come to an agreement shortly.  And yes, I commit to
> resolving this.

The proposed patch is attached.  This patch fixes two points:
 * Adds strategy number and purpose to output of \dAo
 * Renames "Left/right arg type" columns of \dAp to "Registered left/right type"

I'm not yet convinced we should change the sort key for \dAo.

Any thoughts?

--
Regards,
Alexander Korotkov


0001-Improvements-to-psql-dAo-and-dAp-commands.patch
Description: Binary data


Re: jsonpath versus NaN

2020-07-10 Thread Alexander Korotkov
On Thu, Jul 9, 2020 at 4:04 AM Alexander Korotkov  wrote:
> I regard both patches as fixes and propose to backpatch them to 12
> if no objections.

Both patches are pushed.

--
Regards,
Alexander Korotkov




Re: output columns of \dAo and \dAp

2020-07-09 Thread Alexander Korotkov
On Thu, Jul 9, 2020 at 10:03 PM Jonathan S. Katz  wrote:
> From the RMT perspective, if there is an agreed upon approach (which it
> sounds like from the above) can someone please commit to working on
> resolving this open item?

I can hardly extract an approach from this thread, because for me the
whole issue is about details :)

But I think we can come to an agreement shortly.  And yes, I commit to
resolving this.

--
Regards,
Alexander Korotkov




Re: Postgres is not able to handle more than 4k tables!?

2020-07-09 Thread Alexander Korotkov
On Thu, Jul 9, 2020 at 10:00 PM Nikolay Samokhvalov
 wrote:
> In addition to this, it would be good to consider another optimization for 
> the default transaction isolation level: making autovacuum to clean dead 
> tuples in relations that are not currently used in any transaction and when 
> there are no IN_PROGRESS transactions running at RR or S level (which is a 
> very common case because RC is the default level and this is what is actively 
> used in heavily loaded OLTP systems which often suffer from long-running 
> transactions). I don't know the details of how easy it would be to implement, 
> but it always wondered that autovacuum has the global XID "horizon".
>
> With such an optimization, the "hot_standby_feedback=on" mode could be 
> implemented also more gracefully, reporting "min(xid)" for ongoing 
> transactions on standbys separately for RC and RR/S levels.

Yes, the current way of calculating dead tuples is lossy, because
we rely only on the oldest xid.  However, if we kept the oldest
snapshot instead of the oldest xmin, long-running transactions wouldn't be
such a disaster.  I don't think this is feasible with the current
snapshot model, because keeping full snapshots instead of just xmins
would bloat shared-memory structs and complicate computations.  But
CSN can certainly support this optimization.

--
Regards,
Alexander Korotkov




Re: Postgres is not able to handle more than 4k tables!?

2020-07-09 Thread Alexander Korotkov
On Thu, Jul 9, 2020 at 6:57 PM Konstantin Knizhnik
 wrote:
> 2. Remember in relation info XID of oldest active transaction at the
> moment of last autovacuum.
> At next autovacuum iteration we first of all compare this stored XID
> with current oldest active transaction XID
> and bypass vacuuming this relation if XID is not changed.


This option looks good to me independently of the use case under
consideration.  Long-running transactions are an old and well-known
problem.  If we can skip some useless work here, let's do this.

--
Regards,
Alexander Korotkov




Re: jsonpath versus NaN

2020-07-08 Thread Alexander Korotkov
On Thu, Jul 9, 2020 at 1:20 AM Tom Lane  wrote:
> Alexander Korotkov  writes:
> > The patchset is attached, sorry for the delay.
>
> > The first patch improves error messages that appeared unclear
> > to me.  If one applies the .double() method to a numeric value, we
> > require that this numeric value fit into the double precision type.
> > If it doesn't fit, the current error message just says the following.
>
> > ERROR: jsonpath item method .double() can only be applied to a numeric value
>
> > But that's confusing, because the .double() method is naturally applied to
> > a numeric value.  The patch makes this message explicitly report that the
> > numeric value is out of range for the double type.  This patch also adds a
> > test exercising this error.  When a string can't be converted to double
> > precision, I think it's better to explicitly say that we expected a
> > valid string representation of the double precision type.
>
> I see your point here, but the English of the replacement error messages
> could be improved.  I suggest respectively
>
> numeric argument of jsonpath item method .%s() is out of range for type 
> double precision
>
> string argument of jsonpath item method .%s() is not a valid representation 
> of a double precision number

Good, thank you for the corrections!

> As for 0002, I'd rather see the convertJsonbScalar() code changed back
> to the way it was, ie just
>
> case jbvNumeric:
> numlen = VARSIZE_ANY(scalarVal->val.numeric);
> padlen = padBufferToInt(buffer);
> ...
>
> There is no argument for having an its-not-NaN assertion here when
> there aren't similar assertions throughout the jsonb code.
>
> Also, it seems like it'd be smart to reject isinf() and isnan() results
> from float8in_internal_opt_error in both executeItemOptUnwrapTarget code
> paths, ie numeric source as well as string source.  Yeah, we don't expect
> to see those cases in a jbvNumeric (so I wouldn't change the error message
> text), but it's cheap insurance.

OK, corrected as you proposed.

> No other comments.

Revised patches are attached.

I regard both patches as fixes and propose to backpatch them to 12
if no objections.

--
Regards,
Alexander Korotkov


0001-Improve-error-reporting-for-jsonpath-.double-metho-2.patch
Description: Binary data


0002-Forbid-numeric-NaN-in-jsonpath-2.patch
Description: Binary data


Re: Quick doc patch

2020-07-08 Thread Alexander Korotkov
Hi!

On Wed, Jul 8, 2020 at 4:43 AM Michael Paquier  wrote:
> On Tue, Jul 07, 2020 at 06:36:10PM +0900, Michael Paquier wrote:
> > On Tue, Jul 07, 2020 at 09:58:59AM +0200, Daniel Gustafsson wrote:
> >> I agree, it looks like a copy-pasteo in 15cb2bd2700 which introduced the
> >> paragraph for both GIN and BRIN.  LGTM.  Adding Alexander who committed in 
> >> on
> >> cc.
> >
> > +1.
>
> Alexander does not seem to be around, so I have just applied the fix.
> There were more inconsistencies in gin.sgml and spgist.sgml missed in
> 14903f2, making the docs of GIN/SP-GiST less in line with the BRIN
> equivalent, so I have fixed both while on it.

I just read this thread.
Thank you for fixing this!

--
Regards,
Alexander Korotkov




Re: jsonpath versus NaN

2020-07-07 Thread Alexander Korotkov
On Wed, Jul 8, 2020 at 1:16 AM Tom Lane  wrote:
> Alexander Korotkov  writes:
> > I'm going to push 0002 if there is no objection.
> > Regarding 0001, I think my new error messages need review.
>
> I do intend to review these, just didn't get to it yet.

OK, thank you for noticing.  I won't push anything before your review.

--
Regards,
Alexander Korotkov




Re: jsonpath versus NaN

2020-07-07 Thread Alexander Korotkov
On Mon, Jul 6, 2020 at 3:19 PM Alexander Korotkov  wrote:
> The patchset is attached, sorry for the delay.
>
> The first patch improves error messages that appeared unclear
> to me.  If one applies the .double() method to a numeric value, we
> require that this numeric value fit into the double precision type.
> If it doesn't fit, the current error message just says the following.
>
> ERROR: jsonpath item method .double() can only be applied to a numeric value
>
> But that's confusing, because the .double() method is naturally applied to
> a numeric value.  The patch makes this message explicitly report that the
> numeric value is out of range for the double type.  This patch also adds a
> test exercising this error.  When a string can't be converted to double
> precision, I think it's better to explicitly say that we expected a
> valid string representation of the double precision type.
>
> The second patch forbids converting NaN using the .double() method.  As I
> understand, NaN can't be the result of any jsonpath computation assuming
> there is no NaN input.  So, I just put an assert into convertJsonbScalar()
> ensuring there is no NaN in JsonbValue.

I'm going to push 0002 if there is no objection.

Regarding 0001, I think my new error messages need review.

--
Regards,
Alexander Korotkov




Re: output columns of \dAo and \dAp

2020-07-07 Thread Alexander Korotkov
On Sun, Jun 7, 2020 at 12:34 AM Tom Lane  wrote:
> Peter Eisentraut  writes:
> > I'm also wondering whether this is fully correct.  Would it be possible for 
> > the
> > argument types of the operator/function to differ from the left arg/right 
> > arg
> > types?  (Perhaps binary compatible?)
>
> pg_amop.h specifies that
>
>  * The primary key for this table is <amopfamily, amoplefttype, amoprighttype,
>  * amopstrategy>.  amoplefttype and amoprighttype are just copies of the
>  * operator's oprleft/oprright, ie its declared input data types.
>
> Perhaps it'd be a good idea for opr_sanity.sql to verify that, since
> it'd be an easy thing to mess up in handmade pg_amop entries.  But
> at least for the foreseeable future, there's no reason for \dAo to show
> amoplefttype/amoprighttype separately.

+1 for checking consistency of amoplefttype/amoprighttype in opr_sanity.sql
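
Something along these lines would do (a sketch, not necessarily the
exact form an opr_sanity.sql check would take):

    SELECT a.amopopr::regoperator
    FROM pg_amop a
    JOIN pg_operator o ON a.amopopr = o.oid
    WHERE a.amoplefttype <> o.oprleft
       OR a.amoprighttype <> o.oprright;
    -- should return zero rows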

> I agree that \dAo ought to be showing amopstrategy;

I agree that the strategy and purpose of an operator are valuable
information, and we probably shouldn't hide them in \dAo.  But if we show
them there, then \dAo and \dAo+ differ only by the "sort opfamily" column.
Is it worth keeping the \dAo+ command for a single-column difference?

> moreover that ought
> to be much higher in the sort key than it is.

Do you mean we should sort by strategy number and only then by
arg types?  The current output shows operators grouped by opclass;
after that, cross-opclass operators are shown.  This order seems to me
more worthwhile than seeing all the variations of the same strategy
together.

> Also, if we're not going
> to show amoppurpose, then the view probably ought to hide non-search
> operators altogether.  It is REALLY misleading to not distinguish search
> and ordering operators.

+1

> The situation is different for pg_amproc: if you look for discrepancies
> you will find plenty, since in many cases a support function's signature
> has little to do with what types it is registered under.  Perhaps it'd be
> worthwhile for \dAp to show the functions as regprocedure in addition to
> showing amproclefttype/amprocrighttype explicitly.  In any case, I think
> it's rather misleading for \dAp to label amproclefttype/amprocrighttype as
> "Left arg type" and "Right arg type", because for almost everything except
> btree/hash, that's not what the support function's arguments actually are.
> Perhaps names along the lines of "Registered left type" and "Registered
> right type" would put readers in a better frame of mind to understand
> the entries.

+1 for renaming "Left arg type"/"Right arg type" to "Registered left
type"/"Registered right type".

--
Regards,
Alexander Korotkov




Re: jsonpath versus NaN

2020-07-06 Thread Alexander Korotkov
On Thu, Jun 18, 2020 at 8:04 PM Alexander Korotkov
 wrote:
> On Thu, Jun 18, 2020 at 7:45 PM Tom Lane  wrote:
> > Alexander Korotkov  writes:
> > > Thank you for your answer. I'm trying to understand your point.
> > > Standard claims that .double() method should behave the same way as
> > > CAST to double.  However, standard references the standard behavior of
> > > CAST here, not behavior of your implementation of CAST.  So, if we
> > > extend the functionality of standard CAST in our implementation, that
> > > doesn't automatically mean we should extend the .double() jsonpath
> > > method in the same way.  Is it correct?
> >
> > Right.  We could, if we chose, extend jsonpath to allow Inf/NaN, but
> > I don't believe there's an argument that the spec requires us to.
> >
> > Also the larger point is that it doesn't make sense to extend jsonpath
> > that way when we haven't extended json(b) that way.  This code wart
> > wouldn't exist were it not for that inconsistency.  Also, I find it hard
> > to see why anyone would have a use for NaN in a jsonpath test when they
> > can't write NaN in the input json data, nor have it be correctly reflected
> > into output json data either.
>
> Ok, I got the point.  I have nothing against removing support of NaN
> in jsonpath as long as it doesn't violate the standard.  I'm going to
> write the patch for this.

The patchset is attached, sorry for the delay.

The first patch improves error messages that appeared unclear
to me.  If one applies the .double() method to a numeric value, we
require that this numeric value fit into the double precision type.
If it doesn't fit, the current error message just says the following.

ERROR: jsonpath item method .double() can only be applied to a numeric value

But that's confusing, because the .double() method is naturally applied to
a numeric value.  The patch makes this message explicitly report that the
numeric value is out of range for the double type.  This patch also adds a
test exercising this error.  When a string can't be converted to double
precision, I think it's better to explicitly say that we expected a
valid string representation of the double precision type.
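
For illustration, queries along these lines exercise both cases (a
sketch; the exact message wording was still under review at this point):

    SELECT jsonb_path_query('1e400', '$.double()');
    -- numeric value that doesn't fit into double precision

    SELECT jsonb_path_query('"foo"', '$.double()');
    -- string that isn't a valid representation of double precision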

The second patch forbids converting NaN using the .double() method.  As I
understand, NaN can't be the result of any jsonpath computation assuming
there is no NaN input.  So, I just put an assert into convertJsonbScalar()
ensuring there is no NaN in JsonbValue.

--
Regards,
Alexander Korotkov


0001-Improve-error-reporting-for-jsonpath-.double-method.patch
Description: Binary data


0002-Forbid-NaNs-in-jsonpath.patch
Description: Binary data


Re: Operator class parameters and sgml docs

2020-06-20 Thread Alexander Korotkov
On Sun, Jun 21, 2020 at 2:28 AM Justin Pryzby  wrote:
> And a couple more in spgist.sgml (some of which were not added by this patch).

Pushed, thanks!

--
Regards,
Alexander Korotkov




Re: Operator class parameters and sgml docs

2020-06-20 Thread Alexander Korotkov
On Sat, Jun 20, 2020 at 10:15 PM Peter Geoghegan  wrote:
> On Sat, Jun 20, 2020 at 3:55 AM Alexander Korotkov  
> wrote:
> > So, pushed!
>
> Noticed one small thing. You forgot to update this part from the B-Tree docs:
>
> "As shown in Table 37.9, btree defines one required and three optional
> support functions. The four user-defined methods are:"

Thanks!  I've also spotted a similar issue in SP-GiST.  A fix for both is pushed.

--
Regards,
Alexander Korotkov




Re: git.postgresql.org ok?

2020-06-20 Thread Alexander Korotkov
On Sat, Jun 20, 2020 at 6:08 PM Stefan Kaltenbrunner
 wrote:
> On 6/20/20 4:46 PM, Alexander Korotkov wrote:
> > On Sat, Jun 20, 2020 at 3:32 PM Erik Rijkers  wrote:
> >> The mails that I get today from  pgsql-committers contain links (as
> >> usual) to git.postgresql.org
> >> but these links don't seem to give valid pages: I get what looks like a
> >> gitweb page but with '404 - Unknown commit object '
> >>
> >> example:
> >> https://git.postgresql.org/pg/commitdiff/15cb2bd27009f73a84a35c2ba60fdd105b4bf263
> >
> > I've also discovered similar issues.  It seems that new commits appear
> > at https://git.postgresql.org, but with a delay.  Commit notifications to
> > pgsql-committers also seem to work with delays.
> >
> >> And I can git-pull without error  but nothing more recent than this:
> >
> > I've discovered timeouts while accessing gitmaster.postgresql.org
>
> the root issue should be fixed as of a few minutes ago but it might take
> a bit until everything is synced up again.
>
>
> sorry for the inconvenience :/

No problem.  Thank you!

--
Regards,
Alexander Korotkov




Re: Operator class parameters and sgml docs

2020-06-20 Thread Alexander Korotkov
On Thu, Jun 18, 2020 at 8:06 PM Alexander Korotkov
 wrote:
> On Wed, Jun 17, 2020 at 2:00 PM Alexander Korotkov
>  wrote:
> > On Wed, Jun 17, 2020 at 2:50 AM Peter Geoghegan  wrote:
> > > On Tue, Jun 16, 2020 at 4:24 AM Alexander Korotkov
> > >  wrote:
> > > > Thank you for your patience.  The documentation patch is attached.  I think
> > > > it requires review by a native English speaker.
> > >
> > > * "...paramaters that controls" should be "...paramaters that control".
> > >
> > > * "with set of operator class specific option" should be "with a set
> > > of operator class specific options".
> > >
> > > * "The options could be accessible from each support function" should
> > > be "The options can be accessed from other support functions"
> >
> > Fixed, thanks!
> >
> > > It's very hard to write documentation like this, even for native
> > > English speakers. I think that it's important to have something in
> > > place, though. The GiST example helps a lot.
> >
> > I've added a complete example for defining a set of parameters and
> > accessing them from another support function to the GiST
> > documentation.
>
> I'm going to push this patch if there are no objections.  I'm almost
> sure that documentation of opclass options will require further
> adjustments.  However, I think the current patch makes it better, not
> worse.

So, pushed!

--
Regards,
Alexander Korotkov




Re: git.postgresql.org ok?

2020-06-20 Thread Alexander Korotkov
On Sat, Jun 20, 2020 at 3:32 PM Erik Rijkers  wrote:
> The mails that I get today from  pgsql-committers contain links (as
> usual) to git.postgresql.org
> but these links don't seem to give valid pages: I get what looks like a
> gitweb page but with '404 - Unknown commit object '
>
> example:
> https://git.postgresql.org/pg/commitdiff/15cb2bd27009f73a84a35c2ba60fdd105b4bf263

I've also discovered similar issues.  It seems that new commits appear
at https://git.postgresql.org, but with a delay.  Commit notifications to
pgsql-committers also seem to work with delays.

> And I can git-pull without error  but nothing more recent than this:

I've discovered timeouts while accessing gitmaster.postgresql.org

--
Regards,
Alexander Korotkov




Re: Failures with wal_consistency_checking and 13~

2020-06-20 Thread Alexander Korotkov
On Sat, Jun 20, 2020 at 1:16 PM Alexander Korotkov
 wrote:
> On Fri, Jun 19, 2020 at 10:34 PM Alvaro Herrera
>  wrote:
> >
> > On 2020-Jun-15, Michael Paquier wrote:
> >
> > > I have begun my annual study of WAL consistency across replays, and
> > > wal_consistency_checking = 'all' is pointing out at some issues with
> > > at least VACUUM and SPGist:
> > > FATAL:  inconsistent page found, rel 1663/16385/22133, forknum 0,
> > > blkno 15
> > > CONTEXT:  WAL redo at 0/739CEDE8 for SPGist/VACUUM_REDIRECT: newest
> > > XID 4619
> > >
> > > It may be possible that there are other failures, I have just run
> > > installcheck and this is the first failure I saw after replaying all
> > > the generated WAL on a standby.  Please note that I have also checked
> > > 12, and installcheck passes.
> >
> > Umm.  Alexander, do you have an idea of what this is about?
>
> I don't have an idea yet, but I'll check this out

I have discovered and fixed the issue in a44dd932ff.  spg_mask()
masked unused space only when pagehdr->pd_lower >
SizeOfPageHeaderData.  But during the vacuum regression tests, one
page was erased completely and pagehdr->pd_lower was set to exactly
SizeOfPageHeaderData.  Actually, 13 didn't introduce any issue; it
just added a test that spotted the issue.  The issue has been here since
a507b86900.

--
Regards,
Alexander Korotkov




Re: Failures with wal_consistency_checking and 13~

2020-06-20 Thread Alexander Korotkov
On Fri, Jun 19, 2020 at 10:34 PM Alvaro Herrera
 wrote:
>
> On 2020-Jun-15, Michael Paquier wrote:
>
> > I have begun my annual study of WAL consistency across replays, and
> > wal_consistency_checking = 'all' is pointing out at some issues with
> > at least VACUUM and SPGist:
> > FATAL:  inconsistent page found, rel 1663/16385/22133, forknum 0,
> > blkno 15
> > CONTEXT:  WAL redo at 0/739CEDE8 for SPGist/VACUUM_REDIRECT: newest
> > XID 4619
> >
> > It may be possible that there are other failures, I have just run
> > installcheck and this is the first failure I saw after replaying all
> > the generated WAL on a standby.  Please note that I have also checked
> > 12, and installcheck passes.
>
> Umm.  Alexander, do you have an idea of what this is about?

I don't have an idea yet, but I'll check this out

--
Regards,
Alexander Korotkov




Re: Operator class parameters and sgml docs

2020-06-18 Thread Alexander Korotkov
On Wed, Jun 17, 2020 at 2:00 PM Alexander Korotkov
 wrote:
> On Wed, Jun 17, 2020 at 2:50 AM Peter Geoghegan  wrote:
> > On Tue, Jun 16, 2020 at 4:24 AM Alexander Korotkov
> >  wrote:
> > > Thank you for your patience.  The documentation patch is attached.  I think
> > > it requires review by a native English speaker.
> >
> > * "...paramaters that controls" should be "...paramaters that control".
> >
> > * "with set of operator class specific option" should be "with a set
> > of operator class specific options".
> >
> > * "The options could be accessible from each support function" should
> > be "The options can be accessed from other support functions"
>
> Fixed, thanks!
>
> > It's very hard to write documentation like this, even for native
> > English speakers. I think that it's important to have something in
> > place, though. The GiST example helps a lot.
>
> I've added a complete example for defining a set of parameters and
> accessing them from another support function to the GiST
> documentation.

I'm going to push this patch if there are no objections.  I'm almost
sure that documentation of opclass options will require further
adjustments.  However, I think the current patch makes it better, not
worse.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: jsonpath versus NaN

2020-06-18 Thread Alexander Korotkov
On Thu, Jun 18, 2020 at 7:45 PM Tom Lane  wrote:
> Alexander Korotkov  writes:
> > Thank you for your answer. I'm trying to understand your point.
> > Standard claims that .double() method should behave the same way as
> > CAST to double.  However, standard references the standard behavior of
> > CAST here, not behavior of your implementation of CAST.  So, if we
> > extend the functionality of standard CAST in our implementation, that
> > doesn't automatically mean we should extend the .double() jsonpath
> > method in the same way.  Is it correct?
>
> Right.  We could, if we chose, extend jsonpath to allow Inf/NaN, but
> I don't believe there's an argument that the spec requires us to.
>
> Also the larger point is that it doesn't make sense to extend jsonpath
> that way when we haven't extended json(b) that way.  This code wart
> wouldn't exist were it not for that inconsistency.  Also, I find it hard
> to see why anyone would have a use for NaN in a jsonpath test when they
> can't write NaN in the input json data, nor have it be correctly reflected
> into output json data either.

Ok, I got the point.  I have nothing against removing support of NaN
in jsonpath as long as it doesn't violate the standard.  I'm going to
write the patch for this.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: jsonpath versus NaN

2020-06-18 Thread Alexander Korotkov
On Thu, Jun 18, 2020 at 7:34 PM Alexander Korotkov
 wrote:
> Thank you for your answer. I'm trying to understand your point.
> Standard claims that .double() method should behave the same way as
> CAST to double.  However, standard references the standard behavior of
> CAST here, not behavior of your implementation of CAST.

Typo here: please read "our implementation of CAST" here.

> So, if we
> extend the functionality of standard CAST in our implementation, that
> doesn't automatically mean we should extend the .double() jsonpath
> method in the same way.  Is it correct?


--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: jsonpath versus NaN

2020-06-18 Thread Alexander Korotkov
Tom,

On Thu, Jun 18, 2020 at 7:07 PM Tom Lane  wrote:
> Oleg Bartunov  writes:
> > The problem is that we tried to find a trade-off between standard and
> > postgres implementation, for example, in postgres CAST allows NaN and
> > Inf, and SQL Standard requires .double should works as CAST.
>
> As I said, I think this is a fundamental misreading of the standard.
> The way I read it is that it requires the set of values that are legal
> according to the standard to be processed the same way as CAST would.

Thank you for your answer. I'm trying to understand your point.
Standard claims that .double() method should behave the same way as
CAST to double.  However, standard references the standard behavior of
CAST here, not behavior of your implementation of CAST.  So, if we
extend the functionality of standard CAST in our implementation, that
doesn't automatically mean we should extend the .double() jsonpath
method in the same way.  Is it correct?

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: Operator class parameters and sgml docs

2020-06-17 Thread Alexander Korotkov
On Wed, Jun 17, 2020 at 2:50 AM Peter Geoghegan  wrote:
> On Tue, Jun 16, 2020 at 4:24 AM Alexander Korotkov
>  wrote:
> > Thank you for your patience.  The documentation patch is attached.  I think
> > it requires review by a native English speaker.
>
> * "...paramaters that controls" should be "...paramaters that control".
>
> * "with set of operator class specific option" should be "with a set
> of operator class specific options".
>
> * "The options could be accessible from each support function" should
> be "The options can be accessed from other support functions"

Fixed, thanks!

> It's very hard to write documentation like this, even for native
> English speakers. I think that it's important to have something in
> place, though. The GiST example helps a lot.

I've added a complete example for defining a set of parameters and
accessing them from another support function to the GiST
documentation.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


opclass_options_doc_2.patch
Description: Binary data


Re: Operator class parameters and sgml docs

2020-06-16 Thread Alexander Korotkov
On Thu, May 28, 2020 at 11:02 PM Alexander Korotkov
 wrote:
>
> On Thu, May 21, 2020 at 3:17 AM Alexander Korotkov
>  wrote:
> >
> > On Thu, May 21, 2020 at 12:37 AM Peter Geoghegan  wrote:
> > > Commit 911e7020770 added a variety of new support routines to index
> > > AMs. For example, it added a support function 5 to btree (see
> > > BTOPTIONS_PROC), but didn't document this alongside the other support
> > > functions in btree.sgml.
> > >
> > > It looks like the new support functions are fundamentally different to
> > > the existing ones in that they exist only as a way of supplying
> > > parameters to other support functions. The idea was to preserve
> > > compatibility with the old support function signatures. Even still, I
> > > think that the new support functions should get some mention alongside
> > > the older support functions.
> > >
> > > I also wonder whether or not xindex.sgml needs to be updated to
> > > account for opclass parameters.
> >
> > Thank you for pointing this out.  I'm going to take a look at this in the next few days.
>
> I'm sorry for the delay.  I was very busy with various stuff.  I'm
> going to post a docs patch next week.


Thank you for your patience.  The documentation patch is attached.  I think
it requires review by a native English speaker.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


opclass_options_doc.patch
Description: Binary data


Re: jsonpath versus NaN

2020-06-15 Thread Alexander Korotkov
On Thu, Jun 11, 2020 at 10:00 PM Tom Lane  wrote:
>
> Alexander Korotkov  writes:
> > On Thu, Jun 11, 2020 at 3:45 PM Tom Lane  wrote:
> >> It is entirely clear from the code, the documentation,
> >> and the relevant RFCs that JSONB does not allow NaNs as numeric
> >> values.
>
> > The JSONB itself doesn't store number NaNs.  It stores the string "NaN".
>
> Yeah, but you have a numeric NaN within the JsonbValue tree between
> executeItemOptUnwrapTarget and convertJsonbScalar.  Who's to say that
> that illegal-per-the-data-type structure won't escape to somewhere else?
> Or perhaps more likely, that we'll need additional warts in other random
> places in the JSON code to keep from spitting up on the transiently
> invalid structure.


I would propose to split two things: user-visible behavior and
internal implementation.  The internal implementation, which allows
numeric NaN within the JsonbValue, isn't perfect and we could improve
it.  But I'd like to determine the desired user-visible behavior first;
then we can decide how to fix the implementation.

>
> > I found the relevant part of the standard.  Unfortunately, I can't
> > post the full standard here due to its license, but I think I can cite
> > the relevant part.
>
> I don't think this is very relevant.  The SQL standard has not got the
> concepts of Inf or NaN either (see 4.4.2 Characteristics of numbers),
> therefore their definition is only envisioning that a string representing
> a normal finite number should be castable to DOUBLE PRECISION.  Thus,
> both of the relevant standards think that "numbers" are just finite
> numbers.
>
> So when neither JSON nor SQL consider that "NaN" is an allowed sort
> of number, why are you doing violence to the code to allow it in a
> jsonpath?

Yes, I see.  No standard insists we should support NaN.  However, the
standard claims .double() should behave the same as CAST to double.
So, I think if CAST supports NaN but .double() doesn't, it's still a
violation.

> And if you insist on doing such violence, why didn't you
> do some more and kluge it to the point where "Inf" would work too?


Yep, according to the standard, .double() should support "Inf" as long as
CAST to double does.  The reason why it wasn't implemented is that we
use numeric as the internal storage for all the numbers, and numeric
doesn't support Inf yet.

> (It would require slightly less klugery in the wake of the infinities-
> in-numeric patch that I'm going to post soon ... but that doesn't make
> it a good idea.)


If numeric supported infinities, we could follow the standard and make
the .double() method work the same way as CAST to double does.  Now, I
see that there is not much reason to keep the current behaviour, which
supports NaN but doesn't support Inf.  I think we should either support
both NaN and Inf or support neither of them.  The latter is a violation
of the standard, but provides us with a simpler and cleaner
implementation.  What do you think?

BTW, we found what the standard says about serialization of SQL/JSON items.

9.37 Serializing an SQL/JSON item (page 695)
ii) Let JV be an implementation-dependent value of type TT and
encoding ENC such that these two conditions hold:
1) JV is a JSON text.
2) When applying the General Rules of Subclause 9.36, “Parsing JSON
text” with JV as JSON TEXT, FO as FORMAT OPTION, and WITHOUT UNIQUE
KEYS as UNIQUENESS CONSTRAINT, the returned STATUS is successful
completion and the returned SQL/JSON ITEM is an SQL/JSON item that is
equivalent to SJI.
If there is no such JV, then let ST be the exception condition: data
exception — invalid JSON text.

Basically it says that the resulting text should result in the same
SQL/JSON item when parsed.  I think this literally means that
serialization of numeric NaN is impossible, since it's impossible
to get numeric NaN as the result of JSON parsing.  However, in the same
way this would mean that serialization of datetime is also impossible,
which seems like nonsense.  So, I think this paragraph of the
standard is ill-conceived.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: jsonpath versus NaN

2020-06-11 Thread Alexander Korotkov
Hi Tom,

Thank you for raising this issue.

On Thu, Jun 11, 2020 at 3:45 PM Tom Lane  wrote:
> Commit 72b646033 inserted this into convertJsonbScalar:
>
> break;
>
> case jbvNumeric:
> +   /* replace numeric NaN with string "NaN" */
> +   if (numeric_is_nan(scalarVal->val.numeric))
> +   {
> +   appendToBuffer(buffer, "NaN", 3);
> +   *jentry = 3;
> +   break;
> +   }
> +
> numlen = VARSIZE_ANY(scalarVal->val.numeric);
> padlen = padBufferToInt(buffer);
>
> To characterize this as hack, slash, and burn programming would be
> charitable.  It is entirely clear from the code, the documentation,
> and the relevant RFCs that JSONB does not allow NaNs as numeric
> values.

The JSONB itself doesn't store number NaNs.  It stores the string "NaN".
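
For example, here is how that looks at the SQL level (a sketch of the
behavior before the change under discussion):

    SELECT jsonb_path_query('"NaN"', '$.double()');
    -- returns the jsonb string "NaN": the numeric NaN produced by
    -- .double() is serialized back to a string by the code quoted above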

I found the relevant part of the standard.  Unfortunately, I can't
post the full standard here due to its license, but I think I can cite
the relevant part.

1) If JM specifies double, then For all j, 1 (one) ≤ j ≤ n,
Case:
a) If Ij is not a number or character string, then let ST be data
exception — non-numeric SQL/JSON item.
b) Otherwise, let X be an SQL variable whose value is Ij. Let Vj be
the result of
CAST (X AS DOUBLE PRECISION)
If this conversion results in an exception condition, then let ST be
that exception condition.

So, when we apply the .double() method to a string, the result
should be the same as if we cast the string to double in SQL.  In SQL we
convert the string 'NaN' to numeric NaN.  So, the standard requires us
to do the same in SQL/JSON.
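
For example, this is the CAST behavior the standard text above refers to
(both casts succeed in Postgres):

    SELECT CAST('NaN' AS double precision);  -- NaN
    SELECT CAST('NaN' AS numeric);           -- NaN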

I haven't yet found what the standard says about serializing NaNs back
to JSON.  I'll keep you posted.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: [Patch] pg_rewind: options to use restore_command from recovery.conf or command line

2020-06-07 Thread Alexander Korotkov
Hi!

On Sat, Jun 6, 2020 at 8:49 PM Peter Eisentraut
 wrote:
>
> On 2020-03-31 08:48, Michael Paquier wrote:
> > On Mon, Mar 30, 2020 at 05:00:01PM +0300, Alexey Kondratov wrote:
> >> What do you think about adding the following sanity check into xlogarchive.c?
> >>
> >> +#ifdef FRONTEND
> >> +#error "This file is not expected to be compiled for frontend code"
> >> +#endif
> >>
> >> It would prevent someone from adding typedefs and any other common
> >> definitions into xlogarchive.h in the future, leading to the accidental
> >> inclusion of both xlogarchive.h and fe_archive.h in the same time.
> > I don't see much the point as this would fail to compile anyway, and
> > that's not project-style.  Note that we have already a clear
> > separation here between the backend and the frontend code here as
> > xlogarchive.h is backend-only and fe_archive.h is frontend-only.
>
> Why is fe_archive.c in src/common/ and not in src/fe_utils/?  It is not
> "common" to frontend and backend.

Yep, this seems wrong to me.  I can propose the following file renames.

src/common/fe_archive.c => src/fe_utils/archive.c
include/common/fe_archive.h => include/fe_utils/archive.h

> It actually defines functions that clash with functions in the backend,
> so this seems doubly wrong.


Let's have the frontend version of RestoreArchivedFile() renamed as well.
What about RestoreArchivedFileFrontend()?

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: Operator class parameters and sgml docs

2020-05-28 Thread Alexander Korotkov
On Thu, May 21, 2020 at 3:17 AM Alexander Korotkov
 wrote:
>
> On Thu, May 21, 2020 at 12:37 AM Peter Geoghegan  wrote:
> > Commit 911e7020770 added a variety of new support routines to index
> > AMs. For example, it added a support function 5 to btree (see
> > BTOPTIONS_PROC), but didn't document this alongside the other support
> > functions in btree.sgml.
> >
> > It looks like the new support functions are fundamentally different to
> > the existing ones in that they exist only as a way of supplying
> > parameters to other support functions. The idea was to preserve
> > compatibility with the old support function signatures. Even still, I
> > think that the new support functions should get some mention alongside
> > the older support functions.
> >
> > I also wonder whether or not xindex.sgml needs to be updated to
> > account for opclass parameters.
>
> Thank you for pointing this out.  I'm going to take a look at this in the next few days.


I'm sorry for the delay.  I was very busy with various stuff.  I'm
going to post a docs patch next week.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: Operator class parameters and sgml docs

2020-05-20 Thread Alexander Korotkov
Hi, Peter!

On Thu, May 21, 2020 at 12:37 AM Peter Geoghegan  wrote:
> Commit 911e7020770 added a variety of new support routines to index
> AMs. For example, it added a support function 5 to btree (see
> BTOPTIONS_PROC), but didn't document this alongside the other support
> functions in btree.sgml.
>
> It looks like the new support functions are fundamentally different to
> the existing ones in that they exist only as a way of supplying
> parameters to other support functions. The idea was to preserve
> compatibility with the old support function signatures. Even still, I
> think that the new support functions should get some mention alongside
> the older support functions.
>
> I also wonder whether or not xindex.sgml needs to be updated to
> account for opclass parameters.

Thank you for pointing this out.  I'm going to take a look at this in the next few days.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: pgsql: Show opclass and opfamily related information in psql

2020-05-17 Thread Alexander Korotkov
On Thu, May 14, 2020 at 1:34 PM Alexander Korotkov
 wrote:
> On Thu, May 14, 2020 at 1:30 PM Nikita Glukhov  
> wrote:
> > I agree that this patch is an improvement.
>
> OK, I'm going to push this patch if no objections.
> (Sergey doesn't seem to be involved in PostgreSQL development
> anymore, so it doesn't look like we should wait for him.)

Pushed.  I also applied the same ordering modification to \dAp.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: pgsql: Show opclass and opfamily related information in psql

2020-05-14 Thread Alexander Korotkov
On Thu, May 14, 2020 at 1:30 PM Nikita Glukhov  wrote:
> I agree that this patch is an improvement.

OK, I'm going to push this patch if no objections.
(Sergey doesn't seem to be involved in PostgreSQL development
anymore, so it doesn't look like we should wait for him.)

--
Alexander Korotkov

Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: pgsql: Show opclass and opfamily related information in psql

2020-05-14 Thread Alexander Korotkov
Hi!

On Tue, May 12, 2020 at 12:09 AM Alvaro Herrera
 wrote:
> On 2020-Mar-08, Alexander Korotkov wrote:
>
> > Show opclass and opfamily related information in psql
> >
> > This commit provides psql commands for listing operator classes, operator
> > families and its contents in psql.  New commands will be useful for 
> > exploring
> > capabilities of both builtin opclasses/opfamilies as well as
> > opclasses/opfamilies defined in extensions.
>
> I had a chance to use these new commands this morning.

Great, thank you!

> Note how operator for strategy 1 are all together, then strategy 2, and
> so on.  But I think we'd prefer the operators to be grouped together for
> the same types (just like \dAp already works); so I would change the clause
> from:
>   ORDER BY 1, 2, o.amopstrategy, 3;
> to:
>   ORDER BY 1, 2, pg_catalog.format_type(o.amoplefttype, NULL), 
> pg_catalog.format_type(o.amoprighttype, NULL), o.amopstrategy;

+1

> Also, while I'm going about this, ISTM it'd make sense to
> list same-class operators first, followed by cross-class operators.
> That requires to add "o.amoplefttype = o.amoprighttype DESC," after
> "ORDER BY 1, 2,".  For brin's integer_minmax_ops, the resulting list
> would have first (bigint,bigint) then (integer,integer) then
> (smallint,smallint), then all the rest:

+1

Nikita, what do you think?

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: PG 13 release notes, first draft

2020-05-11 Thread Alexander Korotkov
On Mon, May 11, 2020 at 4:20 PM Bruce Momjian  wrote:
> Sorry for the delay in replying.  Yes, I agree we should list all of
> those operator class cases where we now take arguments.  I looked at the
> patch but got confused and missed the doc changes that clearly need to
> be in the release notes.  I see these operator classes now taking
> parameters, as you helpfully listed in your commit message:
>
> tsvector_ops
> gist_ltree_ops
> gist__ltree_ops
> gist_trgm_ops
> gist_hstore_ops
> gist__int_ops
> gist__intbig_ops
>
> I assume the double-underscore is because the first underscore is to
> separate words, and the second one is for the array designation, right?

Yes, this is true.

> So my big question is whether people will understand when they are using
> these operator classes, since many of them are defaults.  Can you use an
> operator class parameter when you are just using the default operator
> class and not specifying its name?

Actually, no.  The initial version of the patch allowed explicitly
specifying the DEFAULT keyword instead of an opclass name.  But I didn't
like the idea of allowing a keyword instead of a name there.

I tried to implement syntax allowing parameters to be specified without
either a new keyword or an opclass name, but that causes a lot of
grammar problems.

Finally, I've decided to provide the parameters functionality only when
the opclass name is specified.  My motivation is that opclass parameters
are functionality for advanced users, who are deeply concerned with what
the opclass does.  For this category of users it's natural to know the
opclass name.
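
For example, such a user writes something like this (hypothetical table
and column names; siglen is one of the new GiST opclass parameters):

    CREATE INDEX docs_tsv_idx ON docs
        USING gist (tsv tsvector_ops (siglen = 256));

    CREATE INDEX docs_body_trgm_idx ON docs
        USING gist (body gist_trgm_ops (siglen = 32));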

> What I am thinking is that just saying
> the operator class takes arguments might not be helpful.  I think I see
> enough detail in the documentation to write release note items for
> these, but I will have to point out they need to specify the operator
> class, even if it is the default, right?

My point was that we should specify where to look to find the new
functionality.  We don't have to write the opclass names, because those
names might be confusing for users who are not aware of them.  We may
briefly say that new parameters are introduced for GiST opclasses for
tsvector, contrib/intarray, contrib/pg_trgm, contrib/ltree, and
contrib/hstore.  What do you think?

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: Concurrency bug in amcheck

2020-05-11 Thread Alexander Korotkov
On Tue, Apr 28, 2020 at 4:05 AM Peter Geoghegan  wrote:
> On Mon, Apr 27, 2020 at 4:17 AM Alexander Korotkov
>  wrote:
> > > Since it doesn't seem we actually need any items on deleted pages,
> > > I can propose to delete them on the primary as well.  That would make
> > > the contents of primary and standby more consistent.  What do you think?
> >
> > So, my proposal is the following.  Backpatch my fix upthread to 11.  In
> > master, additionally make _bt_unlink_halfdead_page() remove page items
> > on the primary as well.  Do you have any objections?
>
> What I meant was that we might as well review the behavior of
> _bt_unlink_halfdead_page() here, since we have to change it anyway.
> But I think you're right: No matter what happens or doesn't happen to
> _bt_unlink_halfdead_page(), the fact is that deleted pages may or may
> not have a single remaining item (which was originally the "top
> parent" item from the page at offset number P_HIKEY), now and forever.
> We have to conservatively assume that it could be either state, now
> and forever. That means that we definitely have to give up on the
> check, per your patch. So, +1 for backpatching that back to 11.

Thank you.  I've worked a bit on the comments and commit message.  I would
appreciate your review.

> (BTW, I don't think that this is a concurrency issue, except in the
> sense that a test case that recreates the false positive is sensitive
> to concurrency -- I imagine you agree with this.)

Yes, I agree it's related to concurrency, but it's not exactly a concurrency issue.

> I like your idea of making the primary consistent with the REDO
> routine on the master branch only. I wonder if that will make it
> possible to change btree_mask() so that wal_consistency_checking can
> check deleted pages as well. The contents of a deleted page's special
> area do matter, and yet we don't currently verify that it matches (we
> use mask_page_content() within btree_mask() for deleted pages, which
> seems inappropriately broad). In particular, the left and right
> sibling links should be consistent with the primary on a deleted page.

Thank you.  The 2nd patch is proposed for master and makes btree page
unlink remove all the items from the page being deleted.


--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


0001-Fix-amcheck-for-page-checks-concurrent-to-replay-of-.patch
Description: Binary data


0002-Remove-btree-page-items-after-page-unlink.patch
Description: Binary data


Re: PG 13 release notes, first draft

2020-05-07 Thread Alexander Korotkov
On Thu, May 7, 2020 at 2:46 AM Bruce Momjian  wrote:
> 
> Allow CREATE INDEX to specify the GiST signature length and maximum 
> number of integer ranges (Nikita Glukhov)
> 

Should we specify which particular opclasses are affected?  Or at
least mention it affects core and particular contribs?

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: PG 13 release notes, first draft

2020-05-06 Thread Alexander Korotkov
On Tue, May 5, 2020 at 6:16 AM Bruce Momjian  wrote:
>
> I have committed the first draft of the PG 13 release notes.  You can
> see them here:
>
> https://momjian.us/pgsql_docs/release-13.html

Great, thank you!

It seems that opclass parameters (911e702077) are not reflected in the
release notes.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: Own index methods

2020-05-06 Thread Alexander Korotkov
On Tue, May 5, 2020 at 5:10 PM Tom Lane  wrote:
>
> Benjamin Schaller  writes:
> > Even though it's described as fairly complicated: If I would want to
> > define my own index method, what would be a good approach to do so?
>
> contrib/bloom would make a sensible template, perhaps.

+1

You can also take a look at https://github.com/postgrespro/rum

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: Concurrency bug in amcheck

2020-04-27 Thread Alexander Korotkov
On Mon, Apr 27, 2020 at 11:51 AM Alexander Korotkov
 wrote:
> On Wed, Apr 22, 2020 at 7:47 PM Peter Geoghegan  wrote:
> > On Tue, Apr 21, 2020 at 2:54 AM Alexander Korotkov
> >  wrote:
> > > Proposed fix is attached.  Spotted by Konstantin Knizhnik,
> > > reproduction case and fix from me.
> >
> > I wonder if we should fix btree_xlog_unlink_page() instead of amcheck.
> > We already know that its failure to be totally consistent with the
> > primary causes problems for backwards scans -- this problem can be
> > fixed at the same time:
> >
> > https://postgr.es/m/cantu0ohkr-evawbpzju54v8ecotqjjyyp3pq_sgobtrgxwh...@mail.gmail.com
> >
> > We'd probably still use your patch for the backbranches if we went this way.
> >
> > What do you think?
>
> I've skimmed through the thread.  It seems to be quite an independent
> issue from this one.  This issue is related to the fact that we leave
> some items on deleted pages on the primary, while at the same time we
> have no items on deleted pages on the standby.  This inconsistency
> causes amcheck to pass normally on the primary, but fail on the standby.
> The backward scan on standby issue seems to be related solely to the
> locking protocol and the btpo_prev and btpo_next pointers.  It wasn't
> mentioned in that thread that we might need hikeys on deleted pages.
>
> Given that we don't seem to actually need any items on deleted pages, I
> can propose to delete them on the primary as well.  That would make the
> contents of the primary and standby more consistent.  What do you think?

So, my proposal is the following: backpatch my fix upthread to 11, and
in master additionally make _bt_unlink_halfdead_page() remove page items
on the primary as well.  Do you have any objections?

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: Concurrency bug in amcheck

2020-04-27 Thread Alexander Korotkov
On Wed, Apr 22, 2020 at 7:47 PM Peter Geoghegan  wrote:
> On Tue, Apr 21, 2020 at 2:54 AM Alexander Korotkov
>  wrote:
> > Proposed fix is attached.  Spotted by Konstantin Knizhnik,
> > reproduction case and fix from me.
>
> I wonder if we should fix btree_xlog_unlink_page() instead of amcheck.
> We already know that its failure to be totally consistent with the
> primary causes problems for backwards scans -- this problem can be
> fixed at the same time:
>
> https://postgr.es/m/cantu0ohkr-evawbpzju54v8ecotqjjyyp3pq_sgobtrgxwh...@mail.gmail.com
>
> We'd probably still use your patch for the backbranches if we went this way.
>
> What do you think?

I've skimmed through the thread.  It seems to be quite an independent
issue from this one.  This issue is related to the fact that we leave
some items on deleted pages on the primary, while at the same time we
have no items on deleted pages on the standby.  This inconsistency
causes amcheck to pass normally on the primary, but fail on the standby.
The backward scan on standby issue seems to be related solely to the
locking protocol and the btpo_prev and btpo_next pointers.  It wasn't
mentioned in that thread that we might need hikeys on deleted pages.

Given that we don't seem to actually need any items on deleted pages, I
can propose to delete them on the primary as well.  That would make the
contents of the primary and standby more consistent.  What do you think?
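
Just for illustration (not part of the patch): the difference can be
observed by comparing the same deleted page on the primary and on the
standby with the pageinspect extension.  A sketch, reusing the index
name and block number from the reproduction case elsewhere in the
thread (both hypothetical here):

-- On the primary, a deleted page may still carry the single remaining
-- "top parent" item; on the standby, after btree_xlog_unlink_page()
-- replay, it has no items at all.
CREATE EXTENSION IF NOT EXISTS pageinspect;
SELECT type, live_items, dead_items, btpo_flags
FROM bt_page_stats('test_x_idx', 289);
SELECT itemoffset, ctid, itemlen
FROM bt_page_items('test_x_idx', 289);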

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: Fix for pg_statio_all_tables

2020-04-22 Thread Alexander Korotkov
On Tue, Apr 21, 2020 at 4:59 PM Tom Lane  wrote:
> Alexander Korotkov  writes:
> > On Tue, Apr 21, 2020 at 7:58 AM Tom Lane  wrote:
> >> Yeah, but that was for a security hole.  I am doubtful that the
> >> severity of this problem is bad enough to justify jumping through
> >> similar hoops.  Even if we fixed it and documented it, how many
> >> users would bother to apply the manual correction?
>
> > Sure, only the most conscientious users will do the manual correction.
> > But if there are only two options, backpatch it this way or don't
> > backpatch at all, then I would choose the first one.
>
> Well, if it were something that you could just do and forget, then
> maybe.  But actually, you are proposing to invest a lot of *other*
> people's time --- notably me, as the likely author of the next
> set of release notes --- so it's not entirely up to you.

Sure, this is not entirely up to me.

> Given the lack of field complaints, I'm still of the opinion that
> this isn't really worth back-patching.

So, what exact strategy do you propose?

I don't like the idea of postponing the decision on what we do with the
backbranches.  We may decide not to fix it in previous releases.  But
in order to handle that decision correctly, I think we should document
this bug there.  I'm OK with doing this.  And I can put my effort into
fixing it in head and backpatching the documentation.  But does this
save significant resources in comparison with fixing the bug in the
backbranches?
--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: Concurrency bug in amcheck

2020-04-21 Thread Alexander Korotkov
On Tue, Apr 21, 2020 at 12:54 PM Alexander Korotkov
 wrote:
> I found a concurrency bug in amcheck running on a replica.  When
> btree_xlog_unlink_page() replays changes on the replica, the deleted
> page is left with no items.  But if amcheck steps on such a deleted
> page, palloc_btree_page() expects it to have items.

I forgot to mention that I've reproduced it on master.  A quick check
shows the bug should exist since 11.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: Fix for pg_statio_all_tables

2020-04-21 Thread Alexander Korotkov
On Tue, Apr 21, 2020 at 7:58 AM Tom Lane  wrote:
> Michael Paquier  writes:
> > On Tue, Apr 21, 2020 at 02:44:45AM +0300, Alexander Korotkov wrote:
> >> As a bugfix, I think this should be backpatched.  But this patch
> >> requires a catalog change.  Have there been similar cases before?  If
> >> so, how did we resolve them?
>
> > A backpatch can happen in such cases, see for example b6e39ca9.  In
> > this case, the resolution was done with a backpatch to
> > system_views.sql and the release notes include an additional note
> > saying that the fix applies itself only on already-initialized
> > clusters.  For other clusters, it was necessary to apply a SQL query,
> > given also in the release notes, to fix the issue (just grep for
> > CVE-2017-7547 in release-9.6.sgml on the REL9_6_STABLE branch).
>
> Yeah, but that was for a security hole.  I am doubtful that the
> severity of this problem is bad enough to justify jumping through
> similar hoops.  Even if we fixed it and documented it, how many
> users would bother to apply the manual correction?

Sure, only the most conscientious users will do the manual correction.
But if there are only two options, backpatch it this way or don't
backpatch at all, then I would choose the first one.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: Fix for pg_statio_all_tables

2020-04-21 Thread Alexander Korotkov
On Tue, Apr 21, 2020 at 4:38 AM Michael Paquier  wrote:
> On Tue, Apr 21, 2020 at 02:44:45AM +0300, Alexander Korotkov wrote:
> > Among all the joined tables, only "pg_index I" is expected to have
> > multiple rows associated with a single relation.  But we do sum() for
> > the toast index "pg_index X" as well.  As a result, we multiply the
> > statistics for the toast index by the number of the relation's
> > indexes.  This is obviously wrong.
>
> Oops.
>
> > As a bugfix, I think this should be backpatched.  But this patch
> > requires a catalog change.  Have there been similar cases before?  If
> > so, how did we resolve them?
>
> A backpatch can happen in such cases, see for example b6e39ca9.  In
> this case, the resolution was done with a backpatch to
> system_views.sql and the release notes include an additional note
> saying that the fix applies itself only on already-initialized
> clusters.  For other clusters, it was necessary to apply a SQL query,
> given also in the release notes, to fix the issue (just grep for
> CVE-2017-7547 in release-9.6.sgml on the REL9_6_STABLE branch).

Thank you for pointing that out!

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Concurrency bug in amcheck

2020-04-21 Thread Alexander Korotkov
Hi!

I found a concurrency bug in amcheck running on a replica.  When
btree_xlog_unlink_page() replays changes on the replica, the deleted
page is left with no items.  But if amcheck steps on such a deleted
page, palloc_btree_page() expects it to have items.
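
The reproduction below uses debugger breakpoints to pause the primary's
VACUUM and the replica's bt_index_check() so that the page deletion is
replayed between two page reads on the replica: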

(lldb_on_primary) b btbulkdelete

primary=# drop table test;
primary=# create table test as (select random() x from
generate_series(1,100) i);
primary=# create index test_x_idx on test(x);
primary=# delete from test;
primary=# vacuum test;

(lldb_on_replica) b bt_check_level_from_leftmost

replica=# select bt_index_check('test_x_idx');

# skip to internal level
(lldb_on_replica) c
(lldb_on_replica) b palloc_btree_page
# skip to non-leftmost page
(lldb_on_replica) c
(lldb_on_replica) c
# concurrently delete btree pages
(lldb_on_primary) c
# continue; the replica now reads the concurrently deleted page
(lldb_on_replica) c

Finally, the replica gets an error:
ERROR:  internal block 289 in index "test_x_idx" lacks high key and/or
at least one downlink

Proposed fix is attached.  Spotted by Konstantin Knizhnik,
reproduction case and fix from me.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


fix_amcheck_concurrency.patch
Description: Binary data


Fix for pg_statio_all_tables

2020-04-20 Thread Alexander Korotkov
Hi!

It appears that the definition of pg_statio_all_tables has a bug.

CREATE VIEW pg_statio_all_tables AS
    SELECT
            C.oid AS relid,
            N.nspname AS schemaname,
            C.relname AS relname,
            pg_stat_get_blocks_fetched(C.oid) -
                    pg_stat_get_blocks_hit(C.oid) AS heap_blks_read,
            pg_stat_get_blocks_hit(C.oid) AS heap_blks_hit,
            sum(pg_stat_get_blocks_fetched(I.indexrelid) -
                    pg_stat_get_blocks_hit(I.indexrelid))::bigint AS idx_blks_read,
            sum(pg_stat_get_blocks_hit(I.indexrelid))::bigint AS idx_blks_hit,
            pg_stat_get_blocks_fetched(T.oid) -
                    pg_stat_get_blocks_hit(T.oid) AS toast_blks_read,
            pg_stat_get_blocks_hit(T.oid) AS toast_blks_hit,
            sum(pg_stat_get_blocks_fetched(X.indexrelid) -
                    pg_stat_get_blocks_hit(X.indexrelid))::bigint AS tidx_blks_read,
            sum(pg_stat_get_blocks_hit(X.indexrelid))::bigint AS tidx_blks_hit
    FROM pg_class C LEFT JOIN
            pg_index I ON C.oid = I.indrelid LEFT JOIN
            pg_class T ON C.reltoastrelid = T.oid LEFT JOIN
            pg_index X ON T.oid = X.indrelid
            LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
    WHERE C.relkind IN ('r', 't', 'm')
    GROUP BY C.oid, N.nspname, C.relname, T.oid, X.indrelid;

Among all the joined tables, only "pg_index I" is expected to have
multiple rows associated with a single relation.  But we do sum() for
the toast index "pg_index X" as well.  As a result, we multiply the
statistics for the toast index by the number of the relation's indexes.
This is obviously wrong.

The attached patch fixes the view definition to count the toast index statistics once.
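
For illustration, one possible shape of such a fix (just a sketch under
my assumptions; the attached patch is authoritative and may differ) is
to aggregate the toast index counters in a subquery, so that they are
computed once per toast table instead of once per "pg_index I" row:

CREATE VIEW pg_statio_all_tables AS
    SELECT
            C.oid AS relid,
            N.nspname AS schemaname,
            C.relname AS relname,
            pg_stat_get_blocks_fetched(C.oid) -
                    pg_stat_get_blocks_hit(C.oid) AS heap_blks_read,
            pg_stat_get_blocks_hit(C.oid) AS heap_blks_hit,
            sum(pg_stat_get_blocks_fetched(I.indexrelid) -
                    pg_stat_get_blocks_hit(I.indexrelid))::bigint AS idx_blks_read,
            sum(pg_stat_get_blocks_hit(I.indexrelid))::bigint AS idx_blks_hit,
            pg_stat_get_blocks_fetched(T.oid) -
                    pg_stat_get_blocks_hit(T.oid) AS toast_blks_read,
            pg_stat_get_blocks_hit(T.oid) AS toast_blks_hit,
            X.tidx_blks_read,
            X.tidx_blks_hit
    FROM pg_class C LEFT JOIN
            pg_index I ON C.oid = I.indrelid LEFT JOIN
            pg_class T ON C.reltoastrelid = T.oid LEFT JOIN LATERAL (
                -- computed once per toast table, independently of I
                SELECT sum(pg_stat_get_blocks_fetched(XI.indexrelid) -
                           pg_stat_get_blocks_hit(XI.indexrelid))::bigint
                           AS tidx_blks_read,
                       sum(pg_stat_get_blocks_hit(XI.indexrelid))::bigint
                           AS tidx_blks_hit
                FROM pg_index XI
                WHERE XI.indrelid = T.oid) X ON true
            LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
    WHERE C.relkind IN ('r', 't', 'm')
    GROUP BY C.oid, N.nspname, C.relname, T.oid,
             X.tidx_blks_read, X.tidx_blks_hit;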

As a bugfix, I think this should be backpatched.  But this patch
requires a catalog change.  Have there been similar cases before?  If
so, how did we resolve them?

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


fix_pg_statio_all_tables.patch
Description: Binary data


Re: Improving connection scalability: GetSnapshotData()

2020-04-08 Thread Alexander Korotkov
On Wed, Apr 8, 2020 at 3:43 PM Andres Freund  wrote:
> Realistically it's still 2-3 hours of proof-reading.
>
> This makes me sad :(

Can we ask the RMT to extend the feature freeze for this particular
patchset?  I think that's reasonable given its extreme importance.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: [PATCH] RUM Postgres 13 patch

2020-04-08 Thread Alexander Korotkov
Hi, Pavel!

On Wed, Apr 8, 2020 at 12:14 PM Pavel Borisov  wrote:
> Due to changes in PG13, the RUM extension had errors when compiling.
> I propose a short patch to correct this.

RUM is an extension managed by Postgres Pro and is not discussed on the
pgsql-hackers mailing list.  Please make a pull request on GitHub:
https://github.com/postgrespro/rum

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: [HACKERS] make async slave to wait for lsn to be replayed

2020-04-07 Thread Alexander Korotkov
On Wed, Apr 8, 2020 at 2:14 AM Kartyshov Ivan
 wrote:
> On 2020-04-08 00:27, Tom Lane wrote:
> > Alexander Korotkov  writes:
> »   WAIT FOR LSN lsn [ TIMEOUT timeout ]
> >
> > This seems like a really carelessly chosen syntax --- *three* new
> > keywords, when you probably didn't need any.  Are you not aware that
> > there is distributed overhead in the grammar for every keyword?
> > Plus, each new keyword carries the risk of breaking existing
> > applications, since it no longer works as an alias-not-preceded-by-AS.
> >
>
> To avoid creating new keywords, we could change the syntax in the following
> way:
> WAIT FOR => DEPENDS ON

Looks OK to me.

> LSN => EVENT

I think it's too generic.  Not every event is an LSN.  TBH, an LSN is
not an event at all :)

I wonder if we can still use the word "lsn" without making it a keyword.
Can we take an arbitrary non-quoted literal there and check it later?

> TIMEOUT => WITH INTERVAL

I'm not sure about this yet.  There are probably better options.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: [HACKERS] make async slave to wait for lsn to be replayed

2020-04-07 Thread Alexander Korotkov
On Tue, Apr 7, 2020 at 10:58 PM Anna Akenteva  wrote:
> Thank you for your review!
> Ivan and I have worked on the patch and tried to address your comments:

I've pushed this.  I promise to do a careful post-commit review and
resolve any issues that arise.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: [HACKERS] make async slave to wait for lsn to be replayed

2020-04-07 Thread Alexander Korotkov
On Tue, Apr 7, 2020 at 3:07 PM Anna Akenteva  wrote:
> On 2017-10-31 12:42:56, Ants Aasma wrote:
> > For lack of a better proposal I would like something along the lines
> > of:
> > WAIT FOR state_id[, state_id] [ OPTIONS (..)]
>
> As for giving up waiting for multiple events: we can only wait for LSN-s
> at the moment, and there seems to be no point in waiting for multiple
> LSN-s at once, because it's equivalent to waiting for the biggest LSN.
> So we opted for simpler grammar for now, only letting the user specify
> one LSN and one TIMEOUT. If in the future we allow waiting for something
> else, like XID-s, we can expand the grammar as needed.

+1
In the latest version of the patch we have a very brief and simple
syntax allowing us to wait for a given LSN with a given timeout.  In the
future we can expand this syntax in different ways.  I don't see how the
current syntax limits us.
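
For illustration, hypothetical usage on a standby (the LSN value, the
timeout units, and the table name are made up; the exact spelling is
whatever the patch settles on):

-- Wait until this standby has replayed everything up to the given LSN,
-- giving up after the timeout.
WAIT FOR LSN '0/306EE20' TIMEOUT 1000;
-- From this point, reads here see the effects of all commits the
-- primary made up to that LSN.
SELECT count(*) FROM my_table;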

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: [HACKERS] make async slave to wait for lsn to be replayed

2020-04-06 Thread Alexander Korotkov
On Tue, Apr 7, 2020 at 12:58 AM Kartyshov Ivan
 wrote:
> On 2020-04-04 03:14, Alexander Korotkov wrote:
> > I think that now we would be fine with single LSN and single TIMEOUT.
> > In future we may add multiple LSNs/TIMEOUTs or/and support for
> > expressions as LSNs/TIMEOUTs if we figure out it's necessary.
> >
> > I also think coupling waiting for an LSN with the beginning of a
> > transaction is a good idea.  A separate WAIT FOR LSN statement called
> > in the middle of a transaction looks problematic to me.  Imagine we
> > have RR isolation and have already acquired the snapshot.  Then our
> > snapshot can block applying the WAL records we are waiting for.  That
> > would be an implicit deadlock.  It would be nice to avoid such
> > deadlocks by design.
> Ok, here is a new version of patch with single LSN and TIMEOUT.

I think this is quite a small feature, which has already received a fair
amount of review.  The last version is very pared down.  But I think it
would be good to commit some very basic version, which is at least some
progress in the area and could be extended in the future.  I'm going to
pass through the code tomorrow and commit this unless I find major
issues or somebody objects.

--
Alexander Korotkov

Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: WIP: BRIN multi-range indexes

2020-04-05 Thread Alexander Korotkov
On Sun, Apr 5, 2020 at 8:00 PM Tomas Vondra
 wrote:
> On Sun, Apr 05, 2020 at 07:33:40PM +0300, Alexander Korotkov wrote:
> >On Sun, Apr 5, 2020 at 6:53 PM Tomas Vondra
> > wrote:
> >> On Sun, Apr 05, 2020 at 06:29:15PM +0300, Alexander Korotkov wrote:
> >> >On Thu, Apr 2, 2020 at 5:29 AM Tomas Vondra
> >> > wrote:
> >> >> On Sun, Dec 01, 2019 at 10:55:02AM +0900, Michael Paquier wrote:
> >> >> >On Thu, Sep 26, 2019 at 09:01:48PM +0200, Tomas Vondra wrote:
> >> >> >> Yeah, the opclass params patches got broken by 773df883e adding enum
> >> >> >> reloptions. The breakage is somewhat extensive so I'll leave it up to
> >> >> >> Nikita to fix it in [1]. Until that happens, apply the patches on
> >> >> >> top of caba97a9d9 for review.
> >> >> >
> >> >> >This has been close to two months now, so I have marked the patch as RwF.
> >> >> >Feel free to update if you think that's incorrect.
> >> >>
> >> >> I see the opclass parameters patch got committed a couple days ago, so
> >> >> I've rebased the patch series on top of it. The patch was marked RwF
> >> >> since 2019-11, so I'll add it to the next CF.
> >> >
> >> >I think this patchset was marked RwF mainly because of slow progress on
> >> >opclass parameters.  Now we've got opclass parameters committed, and I
> >> >think this patchset is in pretty good shape.  Moreover, the opclass
> >> >parameters patch comes with very small examples.  This patchset would
> >> >be a great showcase for opclass parameters.
> >> >
> >> >I'd like to give this patchset a chance for v13.  I'm going to make
> >> >another pass through this patchset.  If I don't find serious issues,
> >> >I'm going to commit it.  Any objections?
> >> >
> >>
> >> I'm an author of the patchset and I'd love to see it committed, but I
> >> think that might be a bit too rushed and unfair (considering it was not
> >> included in the current CF at all).
> >>
> >> I think the code is correct and I'm not aware of any bugs, but I'm not
> >> sure there was sufficient discussion about things like costing, choosing
> >> parameter values (e.g.  number of values in the multi-minmax or bloom
> >> filter parameters).
> >
> >Ok!
> >
> >> That being said, I think the first couple of patches (that modify how
> >> BRIN deals with multi-key scans and IS NULL clauses) are simple enough
> >> and non-controversial, so maybe we could get 0001-0003 committed, and
> >> leave the bloom/multi-minmax opclasses for v14.
> >
> >Regarding 0001-0003 I have a couple of notes:
> >1) They should revise the BRIN extensibility documentation section.
> >2) I think 0002 and 0003 should be merged.  NULL ScanKeys should
> >still be passed to the consistent function when oi_regular_nulls == false.
> >
> >Assuming we're not going to get 0001-0003 into v13, I'm not so
> >inclined to rush on these three as well.  But if you're willing to commit
> >them, you can count on a round of review from me.
> >
>
> I have no intention to get 0001-0003 committed. I think those changes
> are beneficial on their own, but the primary reason was to support the
> new opclasses (which require those changes). And those parts are not
> going to make it into v13 ...

OK, no problem.
Let's do this for v14.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: SQL/JSON: functions

2020-04-05 Thread Alexander Korotkov
On Mon, Mar 23, 2020 at 8:28 PM Nikita Glukhov  wrote:
> Attached 47th version of the patches.
>
> On 21.03.2020 22:38, Pavel Stehule wrote:
>
>
> On 21. 3. 2020 v 11:07 Nikita Glukhov  wrote:
>>
>> Attached 46th version of the patches.
>>
>> On 20.03.2020 22:34, Pavel Stehule wrote:
>>
>>
>> On 19.03.2020 23:57 Nikita Glukhov  wrote:
>>>
>>> Attached 45th version of the patches.
>>>
>>> Nodes JsonFormat, JsonReturning, JsonPassing, JsonBehavior were fixed.
>>>
>>> On 17.03.2020 21:35, Pavel Stehule wrote:
>>>>
>>>> User functions json[b]_build_object_ext() and json[b]_build_array_ext() 
>>>> also
>>>> can be easily removed.   But it seems harder to remove new aggregate 
>>>> functions
>>>> json[b]_objectagg() and json[b]_agg_strict(), because they can't be called
>>>> directly from JsonCtorExpr node.
>>>
>>>
>>> I don't see reasons for another reduction now. It would be great if
>>> you could finalize the work you plan for pg13.
>>>
>> I have removed json[b]_build_object_ext() and json[b]_build_array_ext().
>>
>> But json[b]_objectagg() and json[b]_agg_strict() are still present.
>> It seems that removing them requires major refactoring of the execution
>> of Aggref and WindowFunc nodes.
>
> I have replaced aggregate function
>
> json[b]_objectagg(key any, val any, absent_on_null boolean, unique_keys 
> boolean)
>
> with three separate functions:
>
> json[b]_object_agg_strict(any, any)
> json[b]_object_agg_unique(any, any)
> json[b]_object_agg_unique_strict(any, any)
>
>
> This should be more correct than a single aggregate with additional parameters.

I have the following notes about this patchset.

1) The uniqueness checks using JsonbUniqueCheckContext and
JsonUniqueCheckContext have quadratic complexity over the number of
keys.  That doesn't look good, especially for jsonb, which sorts object
keys before object serialization anyway.
2) We have two uniqueness checks for the json type, which use
JsonbUniqueCheckContext and JsonUniqueState.  JsonUniqueState uses a
stack of hashes, while JsonbUniqueCheckContext has just a plain array
of keys.  I think we can make JsonUniqueState use a single hash, where
the object identity would be part of the hash key.  And we should
replace JsonbUniqueCheckContext with JsonUniqueState.  That would
eliminate extra entities and provide reasonable complexity for the
cases that now use JsonbUniqueCheckContext.
3) The new SQL/JSON clauses don't use the timezone and are considered
immutable assuming all their children are immutable.  Immutability is
good, but ignoring the timezone in all cases is plain wrong.  The first
thing we can do is to use the timezone and make the SQL/JSON clauses
stable.  But that limits their usage in functional and partial indexes.
I see a couple of things we can do next (one of them, or probably both).
3.1) Provide the user a way to specify that we should ignore the
timezone in a particular case (an IGNORE TIMEZONE clause or something
like that).  Then the SQL/JSON clause will be considered immutable.
3.2) Automatically detect whether the jsonpath might use the timezone.
If the jsonpath doesn't use the .datetime() method, it doesn't need the
timezone for sure.  Also, from the datetime format specifiers we can
determine that we don't compare non-timezoned values to timezoned
values.  So, if we detect that a jsonpath never uses the timezone, we
can consider the SQL/JSON clause immutable.
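
To illustrate 3.2 with the existing jsonpath functions (example values
are made up): a path that never calls .datetime() cannot depend on the
session timezone, while one that does might, depending on the format
specifiers:

-- No .datetime() call: the result can never depend on the timezone.
SELECT jsonb_path_query('{"x": "2020-04-05"}', '$.x');
-- Uses .datetime() with TZH/TZM specifiers: timezoned values appear,
-- so immutability can't be assumed in the general case.
SELECT jsonb_path_query('"12:34:56 +05:30"',
                        '$.datetime("HH24:MI:SS TZH:TZM")');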

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: WIP: BRIN multi-range indexes

2020-04-05 Thread Alexander Korotkov
On Sun, Apr 5, 2020 at 6:53 PM Tomas Vondra
 wrote:
> On Sun, Apr 05, 2020 at 06:29:15PM +0300, Alexander Korotkov wrote:
> >On Thu, Apr 2, 2020 at 5:29 AM Tomas Vondra
> > wrote:
> >> On Sun, Dec 01, 2019 at 10:55:02AM +0900, Michael Paquier wrote:
> >> >On Thu, Sep 26, 2019 at 09:01:48PM +0200, Tomas Vondra wrote:
> >> >> Yeah, the opclass params patches got broken by 773df883e adding enum
> >> >> reloptions. The breakage is somewhat extensive so I'll leave it up to
> >> >> Nikita to fix it in [1]. Until that happens, apply the patches on
> >> >> top of caba97a9d9 for review.
> >> >
> >> >This has been close to two months now, so I have marked the patch as RwF.
> >> >Feel free to update if you think that's incorrect.
> >>
> >> I see the opclass parameters patch got committed a couple days ago, so
> >> I've rebased the patch series on top of it. The patch was marked RwF
> >> since 2019-11, so I'll add it to the next CF.
> >
> >I think this patchset was marked RwF mainly because of slow progress on
> >opclass parameters.  Now we've got opclass parameters committed, and I
> >think this patchset is in pretty good shape.  Moreover, the opclass
> >parameters patch comes with very small examples.  This patchset would
> >be a great showcase for opclass parameters.
> >
> >I'd like to give this patchset a chance for v13.  I'm going to make
> >another pass through this patchset.  If I don't find serious issues,
> >I'm going to commit it.  Any objections?
> >
>
> I'm an author of the patchset and I'd love to see it committed, but I
> think that might be a bit too rushed and unfair (considering it was not
> included in the current CF at all).
>
> I think the code is correct and I'm not aware of any bugs, but I'm not
> sure there was sufficient discussion about things like costing, choosing
> parameter values (e.g.  number of values in the multi-minmax or bloom
> filter parameters).

Ok!

> That being said, I think the first couple of patches (that modify how
> BRIN deals with multi-key scans and IS NULL clauses) are simple enough
> and non-controversial, so maybe we could get 0001-0003 committed, and
> leave the bloom/multi-minmax opclasses for v14.

Regarding 0001-0003 I have a couple of notes:
1) They should revise the BRIN extensibility documentation section.
2) I think 0002 and 0003 should be merged.  NULL ScanKeys should
still be passed to the consistent function when oi_regular_nulls == false.

Assuming we're not going to get 0001-0003 into v13, I'm not so
inclined to rush on these three as well.  But if you're willing to commit
them, you can count on a round of review from me.
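
For context, hypothetical usage of the patchset's opclasses together
with opclass parameters (the opclass and parameter names follow the
patchset and may change before commit):

CREATE TABLE t (x int);
CREATE INDEX ON t USING brin (x int4_minmax_multi_ops (values_per_range = 16));
CREATE INDEX ON t USING brin (x int4_bloom_ops (false_positive_rate = 0.05));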

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: WIP: BRIN multi-range indexes

2020-04-05 Thread Alexander Korotkov
On Sun, Apr 5, 2020 at 6:51 PM Tom Lane  wrote:
> Alexander Korotkov  writes:
> > I'd like to give this patchset a chance for v13.  I'm going to make
> > another pass through this patchset.  If I don't find serious issues,
> > I'm going to commit it.  Any objections?
>
> I think it is way too late to be reviving major features that nobody
> has been looking at for months, that indeed were never even listed
> in the final CF.  At this point in the cycle I think we should just be
> trying to get small stuff over the line, not shove in major patches
> and figure they can be stabilized later.
>
> In this particular case, the last serious work on the patchset seems
> to have been Tomas' revision of 2019-09-14, and he specifically stated
> then that the APIs still needed work.  That doesn't sound like
> "it's about ready to commit" to me.

OK, got it.  Thank you for the feedback.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: WIP: BRIN multi-range indexes

2020-04-05 Thread Alexander Korotkov
Hi!

On Thu, Apr 2, 2020 at 5:29 AM Tomas Vondra
 wrote:
> On Sun, Dec 01, 2019 at 10:55:02AM +0900, Michael Paquier wrote:
> >On Thu, Sep 26, 2019 at 09:01:48PM +0200, Tomas Vondra wrote:
> >> Yeah, the opclass params patches got broken by 773df883e adding enum
> >> reloptions. The breakage is somewhat extensive so I'll leave it up to
> >> Nikita to fix it in [1]. Until that happens, apply the patches on
> >> top of caba97a9d9 for review.
> >
> >This has been close to two months now, so I have marked the patch as RwF.
> >Feel free to update if you think that's incorrect.
>
> I see the opclass parameters patch got committed a couple days ago, so
> I've rebased the patch series on top of it. The patch was marked RwF
> since 2019-11, so I'll add it to the next CF.

I think this patchset was marked RwF mainly because of slow progress on
opclass parameters.  Now we've got opclass parameters committed, and I
think this patchset is in pretty good shape.  Moreover, the opclass
parameters patch comes with very small examples.  This patchset would
be a great showcase for opclass parameters.

I'd like to give this patchset a chance for v13.  I'm going to make
another pass through this patchset.  If I don't find serious issues,
I'm going to commit it.  Any objections?

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



