Re: [HACKERS] Issues with logical replication

2017-10-09 Thread Stas Kelvich

> On 2 Oct 2017, at 19:59, Petr Jelinek <petr.jeli...@2ndquadrant.com> wrote:
> 
>> 
>> Program terminated with signal SIGABRT, Aborted.
>> #0  0x7f3608f8ec37 in __GI_raise (sig=sig@entry=6) at
>> ../nptl/sysdeps/unix/sysv/linux/raise.c:56
>> 56../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
>> (gdb) bt
>> #0  0x7f3608f8ec37 in __GI_raise (sig=sig@entry=6) at
>> ../nptl/sysdeps/unix/sysv/linux/raise.c:56
>> #1  0x7f3608f92028 in __GI_abort () at abort.c:89
>> #2  0x009f5740 in ExceptionalCondition (conditionName=0xbf6b30
>> "!(((xid) != ((TransactionId) 0)))",
>> errorType=0xbf69af "FailedAssertion", fileName=0xbf69a8 "lmgr.c",
>> lineNumber=582) at assert.c:54
>> #3  0x0086ac1d in XactLockTableWait (xid=0, rel=0x0, ctid=0x0,
>> oper=XLTW_None) at lmgr.c:582
>> #4  0x0081c9f7 in SnapBuildWaitSnapshot (running=0x2749198,
>> cutoff=898498) at snapbuild.c:1400
>> #5  0x0081c7c7 in SnapBuildFindSnapshot (builder=0x2807910,
>> lsn=798477760, running=0x2749198) at snapbuild.c:1311
>> #6  0x0081c339 in SnapBuildProcessRunningXacts
>> (builder=0x2807910, lsn=798477760, running=0x2749198) at snapbuild.c:1106
> 
> 
> Hmm this would suggest that XLOG_RUNNING_XACTS record contains invalid
> transaction ids but we specifically skip those in
> GetRunningTransactionData(). Can you check pg_waldump output for that
> record (lsn is shown in the backtrace)?

  I investigated this case, and it seems that XactLockTableWait() in SnapBuildWaitSnapshot()
does not always work as expected. XactLockTableWait() waits on LockAcquire() for the xid
to complete, and if we finally obtain that lock while the transaction is still in progress,
the xid is assumed to be a subxid. However, the LockAcquire/LockRelease cycle can happen
after the transaction has set its xid, but before XactLockTableInsert().

Namely, the following history happened for xid 5225 and led to the crash:

[backend] LOG:  AssignTransactionId: XactTopTransactionId = 5225
   [walsender] LOG:  LogCurrentRunningXacts: Wrote RUNNING_XACTS xctn=1, xid[0]=5225
   [walsender] LOG:  XactLockTableWait: LockAcquire 5225
   [walsender] LOG:  XactLockTableWait: LockRelease 5225
[backend] LOG:  AssignTransactionId: LockAcquire ExclusiveLock 5225
   [walsender] LOG:  TransactionIdIsInProgress: SVC->latestCompletedXid=5224 < xid=5225 => true
[backend] LOG:  CommitTransaction: ProcArrayEndTransaction xid=5225, ipw=0
[backend] LOG:  CommitTransaction: ResourceOwnerRelease locks xid=5225

I’ve quickly checked the other usages of XactLockTableWait(), and it seems it is mostly
used with xids taken from the heap, so the transaction has definitely set its lock at some
point in the past.

I'm not sure what the best approach to handling this is. Maybe write running xacts only
if they have already set their lock? A toy model of the interleaving is sketched below.
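
To make the window concrete, here is a toy model of that interleaving (plain Python with
threads, purely illustrative and not PostgreSQL code; the sleeps just force the ordering
shown in the log above):

```python
# Toy model of the race: the backend's xid becomes visible to RUNNING_XACTS
# before the backend takes its own xact lock, so a concurrent
# XactLockTableWait()-style check can acquire/release the (still absent) lock
# and wrongly conclude the xid is a subxid. Illustrative only.
import threading, time

xact_locks = {}          # xid -> threading.Lock, stands in for the lock table
running_xids = set()     # what a RUNNING_XACTS record would report
in_progress = set()      # what TransactionIdIsInProgress() would report
XID = 5225

def backend():
    in_progress.add(XID)
    running_xids.add(XID)                 # xid already visible to walsender...
    time.sleep(0.1)                       # ...but XactLockTableInsert() not done yet
    lock = xact_locks.setdefault(XID, threading.Lock())
    lock.acquire()                        # XactLockTableInsert() equivalent
    time.sleep(0.1)
    in_progress.discard(XID)              # commit
    lock.release()

def walsender():
    time.sleep(0.05)                      # lands inside the race window
    for xid in list(running_xids):
        lock = xact_locks.get(xid)
        if lock is not None:              # wait for the xact lock, then drop it
            lock.acquire(); lock.release()
        if xid in in_progress:            # lock cycle done, yet still in progress:
            print(f"xid {xid} misclassified as subxid -> wait on xid 0, assert")

b, w = threading.Thread(target=backend), threading.Thread(target=walsender)
b.start(); w.start(); b.join(); w.join()
```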

I'm also attaching a pgbench script that can reproduce the failure. I run it over a
standard pgbench database with 10 threads; it takes several minutes to crash.



reload.pgb
Description: Binary data

Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: [HACKERS] logical decoding of two-phase transactions

2017-09-27 Thread Stas Kelvich

> On 7 Sep 2017, at 18:58, Nikhil Sontakke <nikh...@2ndquadrant.com> wrote:
> 
> Hi,
> 
> FYI all, wanted to mention that I am working on an updated version of
> the latest patch that I plan to submit to a later CF.
> 

Cool!

So what kind of architecture do you have in mind? The same way it was implemented before?
As far as I remember, there were two main issues:

* Decoding of aborted prepared transactions.

If such a transaction modified the catalog, then we can't read reliable info with our
historic snapshot: since clog already has the aborted bit set for our tx, it will break the
visibility logic. There are ways to deal with that — doing the catalog seq scan twice and
counting the number of tuples (details upthread), or hijacking clog values in the historic
visibility function. But ISTM it is better not to solve this issue at all =) In most cases
the intended usage of decoding a 2PC transaction is to do some form of distributed commit,
so naturally decoding will happen only for in-progress transactions, and the commit/abort
will happen only after the transaction is decoded, sent, and a response is received. So we
can just have an atomic flag that prevents commit/abort of the tx currently being decoded.
And we can filter the interesting prepared transactions based on GID, to avoid holding this
lock for ordinary 2PC.
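
A rough sketch of that flag idea (illustrative Python, not the actual backend change; the
GID prefix and the function names are made up): the decoder holds a per-GID guard while
decoding, and COMMIT PREPARED / ROLLBACK PREPARED for that GID waits until the guard is
released; ordinary 2PC with uninteresting GIDs is never blocked.

```python
# Illustrative sketch of the per-GID guard described above (not backend code).
import threading

decode_guards = {}                       # gid -> threading.Lock
guards_mutex = threading.Lock()

def is_interesting(gid: str) -> bool:
    # Filter by GID: only transactions issued by our distributed-commit
    # coordinator are decoded at PREPARE time (the prefix is made up).
    return gid.startswith("dtx:")

def decode_prepare(gid: str):
    if not is_interesting(gid):
        return None                      # ordinary 2PC: nothing to hold
    with guards_mutex:
        guard = decode_guards.setdefault(gid, threading.Lock())
    guard.acquire()                      # commit/abort of this GID must wait
    return guard                         # released once decode + ack are done

def finish_prepared(gid: str):           # COMMIT PREPARED / ROLLBACK PREPARED
    with guards_mutex:
        guard = decode_guards.pop(gid, None)
    if guard is not None:
        guard.acquire()                  # blocks while the GID is being decoded
        guard.release()
```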

* Possible deadlocks that Andres was talking about.

I spent some time trying to find such cases, but didn't find any. If locking pg_class in a
prepared tx is the only example, then (IMHO) it is better to simply forbid preparing such
transactions. Otherwise, if realistic examples that can block decoding actually exist, then
we probably need to reconsider the way a tx is decoded. Anyway, this part probably needs
Andres's blessing.



Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company






Re: [HACKERS] Transactions involving multiple postgres foreign servers

2017-08-01 Thread Stas Kelvich

> On 31 Jul 2017, at 20:03, Robert Haas <robertmh...@gmail.com> wrote:
> 
> Regardless of whether we share XIDs or DXIDs, we need a more complex
> concept of transaction state than we have now.

It seems the discussion has shifted from 2PC itself to general issues with distributed
transactions, so it is probably appropriate to share here a summary of what we have done in
the area of distributed visibility. During the last two years we tried three quite different
approaches and finally settled on Clock-SI.

First, to test the different approaches, we wrote a small patch that wraps calls to the
visibility-related functions (SetTransactionStatus, GetSnapshot, etc.; described in detail
on the wiki [1]) so that they can be overridden from an extension. Such an approach allows
implementing almost anything related to distributed visibility, since you have full control
over how local visibility is done. That API isn't a hard prerequisite, and if one wants to
create a concrete implementation it can be done in place. However, I think it is good to
have such an API in some form.

So three approaches that we tried:

1) Postgres-XL-like:

That is the most straightforward way. Basically we need a separate network service
(GTM/DTM) that is responsible for xid generation and for managing the running list of
transactions, so acquiring an xid or a snapshot is done via network calls. Because of the
shared xid space it is possible to compare xids in the ordinary way and get the right
ordering. The gap between the non-simultaneous 2PC commits is covered by the fact that we
get our snapshots from the GTM, and it removes an xid from the running list only when the
transaction has committed on both nodes.

Such an approach is okay for OLAP-style transactions where the tps isn't high. But for OLTP
with a high transaction rate the GTM immediately becomes a bottleneck, since even write
transactions need to get their snapshot from the GTM, even if they access only one node.
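
For illustration, a toy model of that protocol (Python sketch, not Postgres-XL code): the
GTM hands out xids and snapshots over what would be network calls, and an xid leaves the
running list only once every participating node has committed.

```python
# Toy model of the GTM-style approach described above (not Postgres-XL code).
class ToyGTM:
    def __init__(self):
        self.next_xid = 1
        self.running = {}                 # xid -> nodes that have not committed yet

    def begin(self, nodes):               # a network call in the real thing
        xid, self.next_xid = self.next_xid, self.next_xid + 1
        self.running[xid] = set(nodes)
        return xid

    def snapshot(self):                   # also a network call, even for
        return set(self.running)          # single-node write transactions

    def node_committed(self, xid, node):
        self.running[xid].discard(node)
        if not self.running[xid]:          # only now does the xid stop being
            del self.running[xid]          # "running" for everyone

gtm = ToyGTM()
xid = gtm.begin(["node1", "node2"])
gtm.node_committed(xid, "node1")          # 2PC commit gap: half committed...
assert xid in gtm.snapshot()              # ...but still invisible to readers
gtm.node_committed(xid, "node2")
assert xid not in gtm.snapshot()
```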


2) Incremental SI [2]

An approach with a central coordinator that allows local reads without network
communication by slightly altering the visibility rules.

Despite the fact that it is kind of patented, we also failed to achieve proper visibility
when implementing the algorithms from that paper; it always showed some inconsistencies.
Maybe because of bugs in our implementation, maybe because of typos/mistakes in the
algorithm description itself. The reasoning in the paper wasn't very clear to us, and
neither were the patent issues, so we just left it.


3) Clock-SI [3]

It is an MS Research paper that describes an algorithm similar to the ones used in Spanner
and CockroachDB, without a central GTM and with reads that do not require a network round
trip.

There are two ideas behind it:

* Assuming snapshot isolation where visibility on a node is based on CSNs, use the local
time as the CSN; then, when doing 2PC, collect the prepare time from all participating
nodes and commit the transaction everywhere with the maximum of those times. If during a
read a node faces tuples committed by a tx with a CSN greater than the snapshot's CSN
(which can happen due to clock desynchronisation between nodes), it just waits until that
time comes. So clock desynchronisation can affect performance, but it can't affect
correctness.

* During a distributed commit the transaction is neither running (if it commits, the tuple
should already be visible) nor committed/aborted (it can still be aborted, so it is illegal
to read it). So an IN-DOUBT transaction state appears, in which readers must wait for
writers.
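
A compressed sketch of those two rules (illustrative Python, not the actual XTM-based
extension; the IN-DOUBT wait for still-prepared writers is omitted for brevity):

```python
# Toy sketch of Clock-SI as described above: local clocks serve as CSNs,
# a distributed commit uses the maximum of the participants' prepare times,
# and a reader whose snapshot CSN is ahead of a node's clock waits it out.
import time

class Node:
    def __init__(self, skew=0.0):
        self.skew = skew                   # simulated clock desynchronisation
        self.versions = []                 # (commit_csn, value)

    def now(self):
        return time.time() + self.skew     # local clock doubles as CSN source

    def read(self, snapshot_csn):
        # If the snapshot CSN is still "in the future" for this node, wait:
        # afterwards no transaction can commit here with a smaller CSN, so
        # clock skew costs latency, never correctness.
        while self.now() < snapshot_csn:
            time.sleep(0.001)
        return [v for csn, v in self.versions if csn <= snapshot_csn]

def commit_2pc(participants, values):
    # Collect prepare timestamps from every participant; commit everywhere
    # with their maximum as the transaction's CSN.
    commit_csn = max(node.now() for node in participants)
    for node, value in zip(participants, values):
        node.versions.append((commit_csn, value))
    return commit_csn

a, b = Node(), Node(skew=0.005)            # b's clock runs 5 ms ahead
commit_2pc([a, b], ["row@a", "row@b"])
print(a.read(snapshot_csn=b.now()))        # a may wait ~5 ms, then sees the row
```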

We managed to implement that using the XTM API mentioned above. The XID<->CSN mapping is
maintained by the extension itself. Speed/scalability are also good.

I want to resubmit the implementation of that algorithm for FDW later in August, along with
some isolation tests based on the set of queries in [4].


[1] https://wiki.postgresql.org/wiki/DTM#eXtensible_Transaction_Manager_API
[2] http://pi3.informatik.uni-mannheim.de/~norman/dsi_jour_2014.pdf
[3] https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/samehe-clocksi.srds2013.pdf
[4] https://github.com/ept/hermitage


Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company






Re: [HACKERS] Transactions involving multiple postgres foreign servers

2017-07-27 Thread Stas Kelvich

> On 27 Jul 2017, at 04:28, Robert Haas <robertmh...@gmail.com> wrote:
> 
>  However, you would not
> be guaranteed that all of those commits or rollbacks happen at
> anything like the same time.  So, you would have a sort of eventual
> consistency.

As far as I understand, any solution that provides proper isolation for distributed
transactions in Postgres will require distributed 2PC somewhere under the hood. That is
just a consequence of the parallelism the database allows — transactions can abort due to
concurrent operations. So the dichotomy is simple: either we need 2PC, or we restrict write
transactions to being physically serial.

In particular, both Postgres-XL/XC and Postgres Pro multimaster use 2PC to commit
distributed transactions.

Some years ago we created patches to implement a transaction manager API; it is just a way
to inject consistent snapshots on different nodes, while atomic commit itself is out of the
scope of the TM API (hmm, maybe it is better to call it a snapshot manager API?). That
allows us to use it in quite different environments, such as FDW and logical replication,
and both are using 2PC.

I want to submit the TM API again during this release cycle, along with an implementation
for FDW, and I planned to base it on top of this patch. I have already rebased Masahiko's
patch onto current master and started writing a long list of nitpicks, but I haven't
finished yet.

I also see quite a big value in this patch even without TM/snapshots/whatever. Right now
FDW guarantees neither isolation nor atomicity. If one isn't doing cross-node analytical
transactions, it can be safe to live without isolation; but living without atomicity means
that parts of the data can be lost with no simple way to detect and fix that.


Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company






Re: [HACKERS] Dynamic instrumentation of lwlock wait times (lwlock flamegraphs)

2017-06-23 Thread Stas Kelvich

> On 23 Jun 2017, at 00:08, Andres Freund <and...@anarazel.de> wrote:
> 
> Hi,
> 
> At pgcon some people were talking about the difficulty of instrumenting
> the time actually spent waiting for lwlocks and related measurements.
> I'd mentioned that linux these days provides infrastructure to measure
> such things in unmodified binaries.
> 
> Attached is a prototype of a script that measures the time spent inside
> PGSemaphoreLock(), aggregates that inside the kernel, grouped by pid and
> stacktrace.  That allows one to generate nice flame graphs showing which
> part of the code waits how long for lwlocks.
> 
> The attached script clearly needs improvements, but I thought I'd show
> some of the results it can get.  To run it you need the the python
> library of the 'bcc' project [1], and a sufficiently new kernel.  Some
> distributions, e.g. newer debian versions, package this as python-bpfcc
> and similar.
> 
> The output of the script can be turned into a flamegraph with
> https://github.com/brendangregg/FlameGraph 's flamegraph.pl.
> 
> Here's a few examples of a pgbench run. The number is the number of
> clients, sync/async indicates synchronous_commit on/off.  All the
> numbers here were generated with the libpq & pgbench batch patches
> applied and in use, but that's just because that was the state of my
> working tree.
> 
> http://anarazel.de/t/2017-06-22/pgsemwait_8_sync.svg
> http://anarazel.de/t/2017-06-22/pgsemwait_8_async.svg
> http://anarazel.de/t/2017-06-22/pgsemwait_64_sync.svg
> http://anarazel.de/t/2017-06-22/pgsemwait_64_async.svg
> 
> A bonus, not that representative one is the wait for a pgbench readonly
> run after the above, with autovacuum previously disabled:
> http://anarazel.de/t/2017-06-22/pgsemwait_64_select.svg
> interesting to see how the backends themselves never end up having to
> flush WAL themselves, or at least not in a manner triggering contention.
> 
> I plan to write a few more of these, because they're hugely useful to
> understand actual locking behaviour. Among them:
> - time beteen Acquire/Release of lwlocks, grouped by lwlock
> - time beteen Acquire/Release of lwlocks, grouped by stack
> - callstack of acquirer and waker of lwlocks, grouped by caller stack, waiter 
> stack
> - ...
> 
> I think it might be interesting to collect a few of these somewhere
> centrally once halfway mature.  Maybe in src/tools or such.

Wow, that’s extremely helpful, thanks a lot.


Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company






Re: [HACKERS] snapbuild woes

2017-05-11 Thread Stas Kelvich

> On 10 May 2017, at 11:43, Petr Jelinek <petr.jeli...@2ndquadrant.com> wrote:
> 
> On 09/05/17 22:11, Erik Rijkers wrote:
>> On 2017-05-09 21:00, Petr Jelinek wrote:
>>> On 09/05/17 19:54, Erik Rijkers wrote:
>>>> On 2017-05-09 11:50, Petr Jelinek wrote:
>>>> 
>>> 
>>> Ah okay, so this is same issue that's reported by both Masahiko Sawada
>>> [1] and Jeff Janes [2].
>>> 
>>> [1]
>>> https://www.postgresql.org/message-id/CAD21AoBYpyqTSw%2B%3DES%2BxXtRGMPKh%3DpKiqjNxZKnNUae0pSt9bg%40mail.gmail.com
>>> 
>>> [2]
>>> https://www.postgresql.org/message-id/flat/CAMkU%3D1xUJKs%3D2etq2K7bmbY51Q7g853HLxJ7qEB2Snog9oRvDw%40mail.gmail.com
>>> 
>> 
>> I don't understand why you come to that conclusion: both Masahiko Sawada
>> and Jeff Janes have a DROP SUBSCRIPTION in the mix; my cases haven't. 
>> Isn't that a real difference?
>> 
> 

Just noted another issue/feature with the snapshot builder — when logical decoding is in
progress and a VACUUM FULL command is issued, quite a big number of files appears in
pg_logical/mappings and resides there until the checkpoint. Also, consecutive VACUUM FULLs
before a checkpoint yield even more files.

Having two pgbench-filled instances with logical replication between them:

for i in {1..100}; do psql -c 'vacuum full' && ls -la pg_logical/mappings | wc -l; done;
VACUUM
  73
VACUUM
 454
VACUUM
1146
VACUUM
2149
VACUUM
3463
VACUUM
5088
<...>
VACUUM
   44708
<…> // checkpoint happens somewhere here
VACUUM
   20921
WARNING:  could not truncate file "base/16384/61773": Too many open files in system
ERROR:  could not fsync file "pg_logical/mappings/map-4000-4df-0_A4EA29F8-5aa5-5ae6": Too many open files in system


I'm not sure whether this boils down to some of the previous issues mentioned here or not,
so I'm posting it as an observation.


Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company






Re: [HACKERS] Logical replication ApplyContext bloat

2017-05-09 Thread Stas Kelvich

> On 9 May 2017, at 21:53, Peter Eisentraut <peter.eisentr...@2ndquadrant.com> 
> wrote:
> 
> On 5/8/17 11:26, Peter Eisentraut wrote:
>> I think it would make more sense to reset ApplyMessageContext in
>> apply_dispatch(), so that it is done consistently after every message
>> (as the name implies), not only some messages.
> 
> Committed that.
> 
>> Also, perhaps ApplyMessageContext should be a child of
>> TopTransactionContext.  (You have it as a child of ApplyContext, which
>> is under TopMemoryContext.)
> 
> Left that as is.

Thanks!

Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company






Re: [HACKERS] Logical replication ApplyContext bloat

2017-05-03 Thread Stas Kelvich

> On 20 Apr 2017, at 17:01, Dilip Kumar <dilipbal...@gmail.com> wrote:
> 
> On Thu, Apr 20, 2017 at 7:04 PM, Stas Kelvich <s.kelv...@postgrespro.ru> 
> wrote:
>> Thanks for noting.
>> 
>> Added short description of ApplyContext and ApplyMessageContext to README.
> 
> Typo
> 
> /analysys/analysis
> 

Fixed, thanks.

Added this to the open items list.



apply_contexts_3.patch
Description: Binary data


Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS] Logical replication ApplyContext bloat

2017-04-20 Thread Stas Kelvich

> On 19 Apr 2017, at 16:07, Alvaro Herrera <alvhe...@2ndquadrant.com> wrote:
> 
> Stas Kelvich wrote:
> 
>> With patch MemoryContextStats() shows following hierarchy during slot 
>> operations in 
>> apply worker:
>> 
>> TopMemoryContext: 83824 total in 5 blocks; 9224 free (8 chunks); 74600 used
>>  ApplyContext: 8192 total in 1 blocks; 6520 free (4 chunks); 1672 used
>>ApplyMessageContext: 8192 total in 1 blocks; 6632 free (11 chunks); 1560 
>> used
>>  ExecutorState: 8192 total in 1 blocks; 7624 free (0 chunks); 568 used
>>ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
> 
> Please update src/backend/utils/mmgr/README to list the new contexts.


Thanks for noting.

Added a short description of ApplyContext and ApplyMessageContext to the README.



apply_contexts_2.patch
Description: Binary data




Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: [HACKERS] Logical replication ApplyContext bloat

2017-04-19 Thread Stas Kelvich

> On 19 Apr 2017, at 14:30, Petr Jelinek <petr.jeli...@2ndquadrant.com> wrote:
> 
> On 19/04/17 12:46, Stas Kelvich wrote:
>> 
>> Right now ApplyContext cleaned after each transaction and by this patch I 
>> basically 
>> suggest to clean it after each command counter increment. 
>> 
>> So in your definitions that is ApplyMessageContext, most short-lived one. We 
>> can
>> rename now ApplyContext -> ApplyMessageContext to make that clear and avoid
>> possible name conflict when somebody will decide to keep something during 
>> whole
>> transaction or worker lifespan.
> 
> Yeah we can rename, and then rename the ApplyCacheContext to
> ApplyContext. That would leave the ApplyTransactionContext from Simon's
> proposal which is not really need it anywhere right now and I am not
> sure it will be needed given there is still TopTransactionContext
> present. If we really want ApplyTransactionContext we can probably just
> assign it to TopTransactionContext in ensure_transaction.
> 
>> 
>> P.S. It also seems to me now that resetting context in ensure_transaction() 
>> isn’t that
>> good idea, probably better just explicitly call reset at the end of each 
>> function involved.
>> 
> 
> Well it would definitely improve clarity.
> 

Okay, done.

With the patch, MemoryContextStats() shows the following hierarchy during slot operations
in the apply worker:

TopMemoryContext: 83824 total in 5 blocks; 9224 free (8 chunks); 74600 used
  ApplyContext: 8192 total in 1 blocks; 6520 free (4 chunks); 1672 used
     ApplyMessageContext: 8192 total in 1 blocks; 6632 free (11 chunks); 1560 used
  ExecutorState: 8192 total in 1 blocks; 7624 free (0 chunks); 568 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used




apply_contexts.patch
Description: Binary data

Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: [HACKERS] Logical replication ApplyContext bloat

2017-04-19 Thread Stas Kelvich

> On 19 Apr 2017, at 13:34, Simon Riggs <si...@2ndquadrant.com> wrote:
> 
> On 19 April 2017 at 11:24, Petr Jelinek <petr.jeli...@2ndquadrant.com> wrote:
> 
>> I'd still like you to add comment to the ApplyContext saying that it's
>> cleaned after handling each message so nothing that needs to survive for
>> example whole transaction can be allocated in it as that's not very well
>> visible IMHO (and I know I will forget about it when writing next patch
>> in that area).
> 
> It would be better to invent the contexts we want now, so we get the
> architecture right for future work. Otherwise we have problems with
> "can't backpatch this fix because that infrastructure doesn't exist in
> PG10" in a couple of years.
> 
> So I suggest we have
> 
> ApplyMessageContext - cleaned after every message
> ApplyTransactionContext - cleaned at EOXact
> ApplyContext - persistent

Right now ApplyContext is cleaned after each transaction, and with this patch I basically
suggest cleaning it after each command counter increment.

So in your definitions that is ApplyMessageContext, the most short-lived one. We can rename
the current ApplyContext -> ApplyMessageContext to make that clear and avoid a possible
name conflict when somebody decides to keep something for the whole transaction or worker
lifespan.

P.S. It also seems to me now that resetting the context in ensure_transaction() isn't such
a good idea; it is probably better to explicitly call reset at the end of each function
involved.

> 
> -- 
> Simon Riggs    http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company






Re: [HACKERS] Logical replication ApplyContext bloat

2017-04-19 Thread Stas Kelvich

> On 19 Apr 2017, at 12:37, Petr Jelinek <petr.jeli...@2ndquadrant.com> wrote:
> 
> On 18/04/17 13:45, Stas Kelvich wrote:
>> Hi,
>> 
>> currently logical replication worker uses ApplyContext to decode received 
>> data
>> and that context is never freed during transaction processing. Hence if 
>> publication
>> side is performing something like 10M row inserts in single transaction, then
>> subscription worker will occupy more than 10G of ram (and can be killed by 
>> OOM).
>> 
>> Attached patch resets ApplyContext after each insert/update/delete/commit.
>> I’ve tried to reset context only on each 100/1000/1 value of 
>> CommandCounter,
>> but didn’t spotted any measurable difference in speed.
>> 
>> Problem spotted by Mikhail Shurutov.
>> 
> 
> Hmm we also do replication protocol handling inside the ApplyContext
> (LogicalRepApplyLoop), are you sure this change is safe in that regard?

Memory is bloated by the logicalrep_read_* functions when palloc happens inside them. I've
inserted the context reset in ensure_transaction(), which is only called at the beginning
of handle_insert/update/delete/commit, when data from the previous call of that function
isn't needed. So it is probably safe to clear memory there; at least I don't see any state
that could be accessed later in these functions. Do you have any specific concerns?


Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company






[HACKERS] Logical replication ApplyContext bloat

2017-04-18 Thread Stas Kelvich
Hi,

Currently the logical replication worker uses ApplyContext to decode received data, and
that context is never freed during transaction processing. Hence, if the publication side
performs something like 10M row inserts in a single transaction, the subscription worker
will occupy more than 10 GB of RAM (and can be killed by the OOM killer).

The attached patch resets ApplyContext after each insert/update/delete/commit. I've tried
resetting the context only on every 100/1000/1 value of the CommandCounter, but didn't spot
any measurable difference in speed.

Problem spotted by Mikhail Shurutov.



applycontext_bloat.patch
Description: Binary data


Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: [HACKERS] Logical replication - TRAP: FailedAssertion in pgstat.c

2017-04-17 Thread Stas Kelvich

> On 17 Apr 2017, at 10:30, Erik Rijkers <e...@xs4all.nl> wrote:
> 
> On 2017-04-16 20:41, Andres Freund wrote:
>> On 2017-04-16 10:46:21 +0200, Erik Rijkers wrote:
>>> On 2017-04-15 04:47, Erik Rijkers wrote:
>>> >
>>> > 0001-Reserve-global-xmin-for-create-slot-snasphot-export.patch +
>>> > 0002-Don-t-use-on-disk-snapshots-for-snapshot-export-in-l.patch+
>>> > 0003-Prevent-snapshot-builder-xmin-from-going-backwards.patch  +
>>> > 0004-Fix-xl_running_xacts-usage-in-snapshot-builder.patch  +
>>> > 0005-Skip-unnecessary-snapshot-builds.patch
>>> I am now using these newer patches:
>>> https://www.postgresql.org/message-id/30242bc6-eca4-b7bb-670e-8d0458753a8c%402ndquadrant.com
>>> > It builds fine, but when I run the old pbench-over-logical-replication
>>> > test I get:
>>> >
>>> > TRAP: FailedAssertion("!(entry->trans == ((void *)0))", File:
>>> > "pgstat.c", Line: 828)
>>> To get that error:
>> I presume this is the fault of
>> http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=139eb9673cb84c76f493af7e68301ae204199746
>> if you git revert that individual commit, do things work again?
> 
> Yes, compiled from 67c2def11d4 with the above 4 patches, it runs flawlessly 
> again. (flawlessly= a few hours without any error)
> 

I've reproduced the failure; this happens in the tablesync worker, and putting
pgstat_report_stat() under the preceding condition block should help.

However, for me it took about an hour of running this script to catch the original assert.

Can you check with the attached patch applied?




logical_worker.patch
Description: Binary data

Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: [HACKERS] Failed recovery with new faster 2PC code

2017-04-17 Thread Stas Kelvich

> On 17 Apr 2017, at 12:14, Simon Riggs <si...@2ndquadrant.com> wrote:
> 
> On 15 April 2017 at 23:37, Jeff Janes <jeff.ja...@gmail.com> wrote:
>> After this commit, I get crash recovery failures when using prepared
>> transactions.
>> 
>> commit 728bd991c3c4389fb39c45dcb0fe57e4a1dccd71
>> Author: Simon Riggs <si...@2ndquadrant.com>
>> Date:   Tue Apr 4 15:56:56 2017 -0400
>> 
>>Speedup 2PC recovery by skipping two phase state files in normal path
> 
> Thanks Jeff for your tests.
> 
> So that's now two crash bugs in as many days and lack of clarity about
> how to fix it.
> 
> Stas, I thought this patch was very important to you, yet two releases
> in a row we are too-late-and-buggy.

I'm looking at the pgstat issue in the nearby thread right now and will switch to this
shortly.

If possible, I'm asking to delay the revert for several days.

Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company






Re: [HACKERS] GSOC'17 project introduction: Parallel COPY execution with errors handling

2017-04-12 Thread Stas Kelvich

> On 12 Apr 2017, at 20:23, Robert Haas <robertmh...@gmail.com> wrote:
> 
> On Wed, Apr 12, 2017 at 1:18 PM, Nicolas Barbier
> <nicolas.barb...@gmail.com> wrote:
>> 2017-04-11 Robert Haas <robertmh...@gmail.com>:
>>> If the data quality is poor (say, 50% of lines have errors) it's
>>> almost impossible to avoid runaway XID consumption.
>> 
>> Yup, that seems difficult to work around with anything similar to the
>> proposed. So the docs might need to suggest not to insert a 300 GB
>> file with 50% erroneous lines :-).
> 
> Yep.  But it does seem reasonably likely that someone might shoot
> themselves in the foot anyway.  Maybe we just live with that.
> 

Moreover, if that file consists of one-byte lines (plus one byte for the newline char),
then during its import the xid counter will wrap around about 18 times =) (300 GB at two
bytes per line is roughly 150 billion rows; with 50% of them failing, that is about
75 billion aborted subtransactions, or roughly 18 times the 2^32 xid space.)

I think it's reasonable to at least have something like a max_errors parameter for COPY,
set by default to 1000, for example. If the user hits that limit, that is a good moment to
warn about possible xid consumption with a bigger limit.

However, I think it is worth a quick bit of research into whether it is possible to create
a special code path for COPY in which errors don't cancel the transaction, at least when
COPY is called outside of a transaction block.


Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company






Re: [HACKERS] logical replication worker and statistics

2017-04-10 Thread Stas Kelvich

> On 10 Apr 2017, at 19:50, Peter Eisentraut <peter.eisentr...@2ndquadrant.com> 
> wrote:
> 
> On 4/10/17 05:49, Stas Kelvich wrote:
>> Here is small patch to call statistics in logical worker. Originally i 
>> thought that stat
>> collection during logical replication should manually account amounts of 
>> changed tuples,
>> but seems that it is already smoothly handled on relation level. So call to 
>> pgstat_report_stat() is enough.
> 
> I wonder whether we need a similar call somewhere in tablesync.c.  It
> seems to work without it, though.

I thought it spins up the same worker from worker.c.


Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company






Re: [HACKERS] logical replication worker and statistics

2017-04-10 Thread Stas Kelvich

> On 10 Apr 2017, at 05:20, Noah Misch <n...@leadboat.com> wrote:
> 
> On Wed, Apr 05, 2017 at 05:02:18PM +0300, Stas Kelvich wrote:
>>> On 27 Mar 2017, at 18:59, Robert Haas <robertmh...@gmail.com> wrote:
>>> On Mon, Mar 27, 2017 at 11:14 AM, Fujii Masao <masao.fu...@gmail.com> wrote:
>>>> Logical replication worker should call pgstat_report_stat()?
>>>> Currently it doesn't seem to do that and no statistics about
>>>> table accesses by logical replication workers are collected.
>>>> For example, this can prevent autovacuum from working on
>>>> those tables properly.
>>> 
>>> Yeah, that doesn't sound good.
>> 
>> Seems that nobody is working on this, so i’m going to create the patch.
> 
> [Action required within three days.  This is a generic notification.]
> 
> The above-described topic is currently a PostgreSQL 10 open item.  Peter,
> since you committed the patch believed to have created it, you own this open
> item.  If some other commit is more relevant or if this does not belong as a
> v10 open item, please let us know.  Otherwise, please observe the policy on
> open item ownership[1] and send a status update within three calendar days of
> this message.  Include a date for your subsequent status update.  Testers may
> discover new open items at any time, and I want to plan to get them all fixed
> well in advance of shipping v10.  Consequently, I will appreciate your efforts
> toward speedy resolution.  Thanks.
> 
> [1] 
> https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com

Here is a small patch to report statistics in the logical worker. Originally I thought that
stats collection during logical replication would have to account for the number of changed
tuples manually, but it seems that this is already handled smoothly at the relation level,
so a call to pgstat_report_stat() is enough.

I've also added statistics checks to the logical replication TAP tests, but that is
probably quite fragile without something like wait_for_stats() from the stats.sql
regression test.




call_pgstat_report_stat.diff
Description: Binary data


logical_worker_stats_test.diff
Description: Binary data



Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: [HACKERS] logical replication worker and statistics

2017-04-05 Thread Stas Kelvich

> On 27 Mar 2017, at 18:59, Robert Haas <robertmh...@gmail.com> wrote:
> 
> On Mon, Mar 27, 2017 at 11:14 AM, Fujii Masao <masao.fu...@gmail.com> wrote:
>> Logical replication worker should call pgstat_report_stat()?
>> Currently it doesn't seem to do that and no statistics about
>> table accesses by logical replication workers are collected.
>> For example, this can prevent autovacuum from working on
>> those tables properly.
> 
> Yeah, that doesn't sound good.


It seems that nobody is working on this, so I'm going to create the patch.


Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company






Re: [HACKERS] logical decoding of two-phase transactions

2017-04-04 Thread Stas Kelvich

> On 4 Apr 2017, at 04:23, Masahiko Sawada <sawada.m...@gmail.com> wrote:
> 
> 
> I reviewed this patch but when I tried to build contrib/test_decoding
> I got the following error.
> 

Thanks!

Yes, it seems that 18ce3a4a changed the ProcessUtility_hook signature.
Updated.

> There are still some unnecessary code in v5 patch.

Actually, the second diff isn't intended to be part of the patch; I've just shared the way
I ran the regression test suite through 2PC decoding, changing all commits into
prepare/commit pairs where the commit happens only after decoding of the prepare is
finished (more details in my previous message in this thread).

That is just an argument against Andres's concern that a prepared transaction is able to
deadlock with the decoding process — at least there are no such cases in the regression
tests.

And that concern is the main thing blocking this patch. Except for explicit catalog locks
in a prepared tx, nobody has yet found such cases, and it is hard to address or argue
about.

Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



logical_twophase_v6.diff
Description: Binary data


logical_twophase_regresstest.diff
Description: Binary data



Re: [HACKERS] logical decoding of two-phase transactions

2017-03-29 Thread Stas Kelvich

> On 28 Mar 2017, at 18:08, Andres Freund <and...@anarazel.de> wrote:
> 
> On 2017-03-28 15:55:15 +0100, Simon Riggs wrote:
>> 
>> 
>> That assertion is obviously false... the plugin can resolve this in
>> various ways, if we allow it.
> 
> Handling it by breaking replication isn't handling it (e.g. timeouts in
> decoding etc).  Handling it by rolling back *prepared* transactions
> (which are supposed to be guaranteed to succeed!), isn't either.
> 
> 
>> You can say that in your opinion you prefer to see this handled in
>> some higher level way, though it would be good to hear why and how.
> 
> It's pretty obvious why: A bit of DDL by the user shouldn't lead to the
> issues mentioned above.
> 
> 
>> Bottom line here is we shouldn't reject this patch on this point,
> 
> I think it definitely has to be rejected because of that.  And I didn't
> bring this up at the last minute, I repeatedly brought it up before.
> Both to Craig and Stas.

Okay. In order to find more realistic cases that block replication, I've created the
following setup:

* In the backend: test_decoding hooks into xact events and utility statement hooks and
transforms each commit into a prepare, then sleeps on a latch. If a transaction contains
DDL, that whole statement is pushed into the WAL as a transactional message. If the DDL
cannot be prepared or disallows execution in a transaction block, it goes out as a
non-transactional logical message without the prepare/decode injection. If the transaction
didn't issue any DDL and didn't write anything to the WAL, then it skips 2PC too.

* After the prepare is decoded, the output plugin in the walsender unlocks the backend,
allowing it to proceed with COMMIT PREPARED. So if decoding tries to access a blocked
catalog, everything should stop.

* A small Python script that consumes decoded WAL from the walsender (thanks Craig and
Petr).

After some acrobatics with those hooks, I've managed to run the whole regression suite in
parallel mode through such a test_decoding setup without any deadlocks. I've added two
xact_events to Postgres and allowed preparing transactions that touched temp tables, since
they are heavily used in the tests and create a lot of noise in the diffs.

So it boils down to 3 failed regression tests out of 177, namely:

* transactions.sql — here the commit of a tx gets stuck obtaining a SafeSnapshot(). I
didn't look at what is happening there specifically, but just checked that the walsender
isn't blocked. I'm going to look more closely at this.

* prepared_xacts.sql — here selecting from pg_prepared_xacts sees our prepared tx. It is
possible to filter those out, but it obviously works as expected.

* guc.sql — here pendingActions arrive on 'DISCARD ALL', preventing the tx from being
prepared. I didn't find a way to check for the presence of pendingActions outside of
async.c, so I decided to leave it as is.

It seems that at least in the regression tests nothing can block two-phase logical
decoding. Is that a strong enough argument for the hypothesis that the current approach
doesn't create deadlocks, apart from locks on the catalog, which should be disallowed
anyway?

Patches attached. logical_twophase_v5 is a slightly modified version of the previous patch
merged with Craig's changes. The second file is a set of patches on top of the previous one
that implements the logic I've just described. There is a runtest.sh script that sets up
Postgres, runs the Python logical consumer in the background, and starts the regression
test.


Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



logical_twophase_v5.diff
Description: Binary data


logical_twophase_regresstest.diff
Description: Binary data



Re: [HACKERS] logical decoding of two-phase transactions

2017-03-27 Thread Stas Kelvich

> On 28 Mar 2017, at 00:25, Andres Freund <and...@anarazel.de> wrote:
> 
> Hi,
> 
> On 2017-03-28 00:19:29 +0300, Stas Kelvich wrote:
>> Ok, here it is.
> 
> On a very quick skim, this doesn't seem to solve the issues around
> deadlocks of prepared transactions vs. catalog tables.  What if the
> prepared transaction contains something like LOCK pg_class; (there's a
> lot more realistic examples)? Then decoding won't be able to continue,
> until that transaction is committed / aborted?

But why is that a deadlock? It seems like just a lock.

In the case of a prepared lock on pg_class, decoding will wait until it is committed and
then continue to decode — as will anything in Postgres that accesses pg_class, up to and
including the inability to connect to the database and bricking the database if you
accidentally disconnect before committing that tx (as you showed me a while ago :-).

IMO this is an issue of being able to prepare such a lock, rather than an issue of
decoding.

Are there any other scenarios where catalog readers are blocked, other than an explicit
lock on a catalog table? ALTERs on catalogs seem to be prohibited.

Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company






Re: [HACKERS] logical decoding of two-phase transactions

2017-03-27 Thread Stas Kelvich

> On 28 Mar 2017, at 00:19, Stas Kelvich <s.kelv...@postgrespro.ru> wrote:
> 
> * It is actually doesn’t pass one of mine regression tests. I’ve added 
> expected output
> as it should be. I’ll try to send follow up message with fix, but right now 
> sending it
> as is, as you asked.
> 
> 

Fixed. I forgot to postpone the ReorderBufferTXN cleanup in the case of a prepare.

So it passes the provided regression tests right now.

I'll give it more testing tomorrow and am going to write a TAP test to check the behaviour
when we lose the info about whether the prepare was sent to the subscriber or not.




logical_twophase.diff
Description: Binary data



Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: [HACKERS] logical decoding of two-phase transactions

2017-03-27 Thread Stas Kelvich

> On 27 Mar 2017, at 16:29, Craig Ringer <cr...@2ndquadrant.com> wrote:
> 
> On 27 March 2017 at 17:53, Stas Kelvich <s.kelv...@postgrespro.ru> wrote:
> 
>> I’m heavily underestimated amount of changes there, but almost finished
>> and will send updated patch in several hours.
> 
> Oh, brilliant! Please post whatever you have before you knock off for
> the day anyway, even if it's just a WIP, so I can pick it up tomorrow
> my time and poke at its tests etc.
> 

Ok, here it is.

Major differences compared to the previous version:

* The GID is stored in commit/abort records only when wal_level >= logical.

* More consistency in storing and parsing origin info. Now it is stored in prepare and
abort records when a replication origin session is active.

* Some cleanup; function renames to get rid of the xact_event/gid fields in ReorderBuffer,
which I used only to copy them into ReorderBufferTXN.

* Changed the output plugin interface to the one suggested upthread. Now prepare / commit
prepared / abort prepared are separate callbacks, and if none of them is set then a 2PC tx
will be decoded as 1PC to provide backwards compatibility.

* New callback filter_prepare() that can be used to switch between 1PC- and 2PC-style
decoding of a 2PC tx.

* test_decoding uses the new API and filters out aborted and running prepared transactions.
It is actually easy to move the unlock of the 2PC state to the prepare callback to allow
decoding of a running tx, but since that extension is an example, ISTM it is better not to
hold that lock there during the whole prepare decoding. However, I left enough information
there about this, and about the case when those locks are not needed at all (when we are
coordinating this tx).
  Speaking of locking a running prepared tx during decoding, I think a better solution
would be to use our own custom lock here and register an XACT_EVENT_PRE_ABORT callback in
the extension to conflict with this lock. Decoding should hold it in shared mode, and
commit in exclusive mode. That would allow locking things granularly and blocking only the
tx being decoded.
  However, we don't have XACT_EVENT_PRE_ABORT; it is only a few LOCs to add it. Should I?

* It actually doesn't pass one of my regression tests. I've added the expected output as it
should be. I'll try to send a follow-up message with a fix, but right now I'm sending it as
is, as you asked.




logical_twophase.diff
Description: Binary data




Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: [HACKERS] logical decoding of two-phase transactions

2017-03-27 Thread Stas Kelvich

> On 27 Mar 2017, at 12:26, Craig Ringer <cr...@2ndquadrant.com> wrote:
> 
> On 27 March 2017 at 09:31, Craig Ringer <cr...@2ndquadrant.com> wrote:
> 
>> We're in the last week of the CF. If you have a patch that's nearly
>> ready or getting there, now would be a good time to post it for help
>> and input from others.
>> 
>> I would really like to get this in, but we're running out of time.
>> 
>> Even if you just post your snapshot management work, with the cosmetic
>> changes discussed above, that would be a valuable start.
> 
> I'm going to pick up the last patch and:

I heavily underestimated the amount of changes there, but I've almost finished and will
send an updated patch in a few hours.

> * Ensure we only add the GID to xact records for 2pc commits and aborts

And only when wal_level >= logical. Done.
The patch also adds origin info to prepares and aborts.

> * Add separate callbacks for prepare, abort prepared, and commit
> prepared (of xacts already processed during prepare), so we aren't
> overloading the "commit" callback and don't have to create fake empty
> transactions to pass to the commit callback;

Done.

> * Add another callback to determine whether an xact should be
> processed at PREPARE TRANSACTION or COMMIT PREPARED time.

Also done.

> * Rename the snapshot builder faux-commit stuff in the current patch
> so it's clearer what's going on.

Hm. Okay, I'll leave that part to you.

> * Write tests covering DDL, abort-during-decode, etc

I've extended the tests, but it would be good to have some more.

> Some special care is needed for the callback that decides whether to
> process a given xact as 2PC or not. It's called before PREPARE
> TRANSACTION to decide whether to decode any given xact at prepare time
> or wait until it commits. It's called again at COMMIT PREPARED time if
> we crashed after we processed PREPARE TRANSACTION and advanced our
> confirmed_flush_lsn such that we won't re-process the PREPARE
> TRANSACTION again. Our restart_lsn might've advanced past it so we
> never even decode it, so we can't rely on seeing it at all. It has
> access to the xid, gid and invalidations, all of which we have at both
> prepare and commit time, to make its decision from. It must have the
> same result at prepare and commit time for any given xact. We can
> probably use a cache in the reorder buffer to avoid the 2nd call on
> commit prepared if we haven't crashed/reconnected between the two.

Good point. I didn't think about restart_lsn in the case where we skip this particular
prepare (filter_prepared() -> true, in my terms). I think that should work properly, as it
uses the same code path as before, but I'll look at it.

> This proposal does not provide a way to safely decode a 2pc xact that
> made catalog changes which may be aborted while being decoded. The
> plugin must lock such an xact so that it can't be aborted while being
> processed, or defer decoding until commit prepared. It can use the
> invalidations for the commit to decide.

I had played with that two-pass catalog scan, and it seems to work, but after some time I
realised that it is not useful for the main case, where the commit/abort is generated only
after the receiver side has answered the prepare. Also, that two-pass scan is a massive
change to relcache.c and genam.c (FWIW, there were no problems with the cache, but there
were some problems with index scans and with handling one-to-many catalog queries, e.g. a
table with its fields).

In the end I decided to throw it away and switched to the filter_prepare callback, passing
the txn structure there to allow access to the has_catalog_changes field.


Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company






Re: [HACKERS] GSOC'17 project introduction: Parallel COPY execution with errors handling

2017-03-23 Thread Stas Kelvich

> On 23 Mar 2017, at 15:53, Craig Ringer  wrote:
> 
> On 23 March 2017 at 19:33, Alexey Kondratov  
> wrote:
> 
>> (1) Add errors handling to COPY as a minimum program
> 
> Huge +1 if you can do it in an efficient way.
> 
> I think the main barrier to doing so is that the naïve approach
> creates a subtransaction for every row, which is pretty dire in
> performance terms and burns transaction IDs very rapidly.
> 
> Most of our datatype I/O functions, etc, have no facility for being
> invoked in a mode where they fail nicely and clean up after
> themselves. We rely on unwinding the subtransaction's memory context
> for error handling, for releasing any LWLocks that were taken, etc.
> There's no try_timestamptz_in function or anything, just
> timestamptz_in, and it ERROR's if it doesn't like its input. You
> cannot safely PG_TRY / PG_CATCH such an exception and continue
> processing to, say, write another row.
> 
> Currently we also don't have a way to differentiate between
> 
> * "this row is structurally invalid" (wrong number of columns, etc)
> * "this row is structually valid but has fields we could not parse
> into their data types"
> * "this row looks structurally valid and has data types we could
> parse, but does not satisfy a constraint on the destination table"
> 
> Nor do we have a way to write to any kind of failure-log table in the
> database, since a simple approach relies on aborting subtransactions
> to clean up failed inserts so it can't write anything for failed rows.
> Not without starting a 2nd subxact to record the failure, anyway.

If we are optimising COPY for the case with a small number of bad rows, then a 2nd
subtransaction doesn't seem like a bad idea. We can try to apply the batch in a
subtransaction; if it fails on some row N, then insert rows [1, N) in the next
subtransaction, report an error, and commit that subtransaction. Row N+1 can be treated as
the beginning of the next batch (a sketch of this follows below).
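
As a rough illustration of that scheme (toy Python, nothing COPY-specific; each attempt
stands in for one subtransaction, and the names are made up):

```python
# Toy model of the batching scheme above: try the remaining rows in one
# "subtransaction"; on the first bad row replay the good prefix in a fresh
# one, record the error, and continue right after the bad row.
class SubTx:
    def __init__(self, table):
        self.table, self.staged = table, []
    def insert(self, row):
        if row is None:                    # stand-in for a row that fails to parse
            raise ValueError("bad row")
        self.staged.append(row)
    def commit(self):
        self.table.extend(self.staged)

def copy_with_errors(rows, table, errors):
    start = 0
    while start < len(rows):
        subtx = SubTx(table)
        try:
            for i in range(start, len(rows)):
                subtx.insert(rows[i])
            subtx.commit()                 # the whole remainder went in cleanly
            return
        except ValueError:
            redo = SubTx(table)            # replay known-good rows [start, i)
            for j in range(start, i):
                redo.insert(rows[j])
            redo.commit()
            errors.append(i)               # report the bad row
            start = i + 1                  # next batch begins after it

table, errors = [], []
copy_with_errors([1, 2, None, 4, None, 6], table, errors)
print(table, errors)                       # -> [1, 2, 4, 6] [2, 4]
```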


But if there are problems with handling everything via subtransactions, and since
parallelism is mentioned anyway, what about starting a bgworker that performs the data
insertion and is controlled by the backend?

For example, the backend could do the following:

* Start a bgworker (or even a parallel worker).
* Get a chunk of rows out of the file and try to apply them as a batch in a subtransaction
in the bgworker.
* If it fails, we can open a transaction in the backend itself and raise a notice / move
the failed rows to a special errors table.

> So, having said why it's hard, I don't really have much for you in
> terms of suggestions for ways forward. User-defined data types,
> user-defined constraints and triggers, etc mean anything involving
> significant interface changes will be a hard sell, especially in
> something pretty performance-sensitive like COPY.
> 
> I guess it'd be worth setting out your goals first. Do you want to
> handle all the kinds of problems above? Malformed  rows, rows with
> malformed field values, and rows that fail to satisfy a constraint? or
> just some subset?
> 
> 
> 
> -- 
> Craig Ringer   http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
> 
> 





Re: [HACKERS] logical decoding of two-phase transactions

2017-03-20 Thread Stas Kelvich

> On 20 Mar 2017, at 16:39, Craig Ringer <cr...@2ndquadrant.com> wrote:
> 
> On 20 March 2017 at 20:57, Stas Kelvich <s.kelv...@postgrespro.ru> wrote:
>> 
>>> On 20 Mar 2017, at 15:17, Craig Ringer <cr...@2ndquadrant.com> wrote:
>>> 
>>>> I thought about having special field (or reusing one of the existing 
>>>> fields)
>>>> in snapshot struct to force filtering xmax > snap->xmax or xmin = 
>>>> snap->xmin
>>>> as Petr suggested. Then this logic can reside in ReorderBufferCommit().
>>>> However this is not solving problem with catcache, so I'm looking into it 
>>>> right now.
>>> 
>>> OK, so this is only an issue if we have xacts that change the schema
>>> of tables and also insert/update/delete to their heaps. Right?
>>> 
>>> So, given that this is CF3 for Pg10, should we take a step back and
>>> impose the limitation that we can decode 2PC with schema changes or
>>> data row changes, but not both?
>> 
>> Yep, time is tight. I’ll try today/tomorrow to proceed with this two scan 
>> approach.
>> If I’ll fail to do that during this time then I’ll just update this patch to 
>> decode
>> only non-ddl 2pc transactions as you suggested.
> 
> I wasn't suggesting not decoding them, but giving the plugin the
> option of whether to proceed with decoding or not.
> 
> As Simon said, have a pre-decode-prepared callback that lets the
> plugin get a lock on the 2pc xact if it wants, or say it doesn't want
> to decode it until it commits.
> 
> That'd be useful anyway, so we can filter and only do decoding at
> prepare transaction time of xacts the downstream wants to know about
> before they commit.

Ah, got that. Okay.

> -- 
> Craig Ringer   http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services





Re: [HACKERS] logical decoding of two-phase transactions

2017-03-20 Thread Stas Kelvich

> On 20 Mar 2017, at 15:17, Craig Ringer <cr...@2ndquadrant.com> wrote:
> 
>> I thought about having special field (or reusing one of the existing fields)
>> in snapshot struct to force filtering xmax > snap->xmax or xmin = snap->xmin
>> as Petr suggested. Then this logic can reside in ReorderBufferCommit().
>> However this is not solving problem with catcache, so I'm looking into it 
>> right now.
> 
> OK, so this is only an issue if we have xacts that change the schema
> of tables and also insert/update/delete to their heaps. Right?
> 
> So, given that this is CF3 for Pg10, should we take a step back and
> impose the limitation that we can decode 2PC with schema changes or
> data row changes, but not both?

Yep, time is tight. I'll try today/tomorrow to proceed with this two-scan approach. If I
fail to do that in time, then I'll just update this patch to decode only non-DDL 2PC
transactions, as you suggested.

>> Just as before I marking this transaction committed in snapbuilder, but after
>> decoding I delete this transaction from xip (which holds committed 
>> transactions
>> in case of historic snapshot).
> 
> That seems kind of hacky TBH. I didn't much like marking it as
> committed then un-committing it.
> 
> I think it's mostly an interface issue though. I'd rather say
> SnapBuildPushPrepareTransaction and SnapBuildPopPreparedTransaction or
> something, to make it clear what we're doing.

Yes, that would be less confusing. However, there is no queue of any kind, so
SnapBuildStartPrepare / SnapBuildFinishPrepare should work too.

> -- 
> Craig Ringer   http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services


Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company






Re: [HACKERS] logical decoding of two-phase transactions

2017-03-20 Thread Stas Kelvich

> On 20 Mar 2017, at 11:32, Craig Ringer <cr...@2ndquadrant.com> wrote:
> 
> On 19 March 2017 at 21:26, Petr Jelinek <petr.jeli...@2ndquadrant.com> wrote:
> 
>> I think only genam would need changes to do two-phase scan for this as
>> the catalog scans should ultimately go there. It's going to slow down
>> things but we could limit the impact by doing the two-phase scan only
>> when historical snapshot is in use and the tx being decoded changed
>> catalogs (we already have global knowledge of the first one, and it
>> would be trivial to add the second one as we have local knowledge of
>> that as well).
> 
> 
> TBH, I have no idea how to approach the genam changes for the proposed
> double-scan method. It sounds like Stas has some idea how to proceed
> though (right?)
> 

I thought about having a special field (or reusing one of the existing fields)
in the snapshot struct to force filtering xmax > snap->xmax or xmin = snap->xmin,
as Petr suggested. Then this logic can reside in ReorderBufferCommit().
However, this does not solve the problem with the catcache, so I'm looking into it
right now.


> On 17 Mar 2017, at 05:38, Craig Ringer <cr...@2ndquadrant.com> wrote:
> 
> On 16 March 2017 at 19:52, Stas Kelvich <s.kelv...@postgrespro.ru> wrote:
> 
>> 
>> I’m working right now on issue with building snapshots for decoding prepared 
>> tx.
>> I hope I'll send updated patch later today.
> 
> 
> Great.
> 
> What approach are you taking?


Just as before, I mark this transaction as committed in the snapshot builder, but after
decoding I delete it from xip (which holds committed transactions in the case of a
historic snapshot).


> -- 
> Craig Ringer   http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services





Re: [HACKERS] logical decoding of two-phase transactions

2017-03-16 Thread Stas Kelvich

>> On 2 Mar 2017, at 11:00, Craig Ringer <cr...@2ndquadrant.com> wrote:
>> 
>> BTW, I've been reviewing the patch in more detail. Other than a bunch
>> of copy-and-paste that I'm cleaning up, the main issue I've found is
>> that in DecodePrepare, you call:
>> 
>>   SnapBuildCommitTxn(ctx->snapshot_builder, buf->origptr, xid,
>>  parsed->nsubxacts, parsed->subxacts);
>> 
>> but I am not convinced it is correct to call it at PREPARE TRANSACTION
>> time, only at COMMIT PREPARED time. We want to see the 2pc prepared
>> xact's state when decoding it, but there might be later commits that
>> cannot yet see that state and shouldn't have it visible in their
>> snapshots. 
> 
> Agree, that is problem. That allows to decode this PREPARE, but after that
> it is better to mark this transaction as running in snapshot or perform 
> prepare
> decoding with some kind of copied-end-edited snapshot. I’ll have a look at 
> this.
> 

While working on this I’ve spotted quite a nasty corner case with aborted prepared
transactions. I have some ideas how to fix it, though none of them are great, and maybe
my view is blurred and I missed something, so I want to ask here first.

Suppose we create a table, then in a 2PC transaction we alter it and after that abort
the transaction. pg_class will then have something like this:

xmin | xmax | relname
100  | 200  | mytable
200  | 0    | mytable

After that abort, the tuple (100, 200, mytable) becomes visible again, and if we alter
the table once more then the xmax of the first tuple will be set to the current xid,
resulting in the following:

xmin | xmax | relname
100  | 300  | mytable
200  | 0    | mytable
300  | 0    | mytable

At that moment we’ve lost the information that the first tuple was deleted by our
prepared tx. From the POV of the historic snapshot that will be constructed to decode
the prepare, the first tuple is visible, but actually the second tuple should be used.
Moreover, such a snapshot could see both tuples, violating oid uniqueness, but the heap
scan stops after finding the first one.
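
For concreteness, a rough SQL sketch that should produce the pg_class history above
(names and xids are purely illustrative, and max_prepared_transactions must be > 0):

CREATE TABLE mytable (id int);              -- pg_class row inserted by xid 100

BEGIN;
ALTER TABLE mytable ADD COLUMN a text;      -- our 2PC tx (xid 200) sets xmax = 200 on the
                                            -- old pg_class row and inserts a new version
PREPARE TRANSACTION 'gid1';
ROLLBACK PREPARED 'gid1';                   -- the prepared tx aborts

ALTER TABLE mytable ADD COLUMN b text;      -- a later tx (xid 300) overwrites the aborted
                                            -- xmax of the original row with 300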

I see here two possible workarounds:

* First try to scan the catalog filtering out tuples with xmax bigger than
snapshot->xmax, since they were possibly deleted by our tx. Then, if nothing is found,
scan in the usual way.

* Do not decode such transactions at all. If by the time we decode the prepare record we
already know that it is aborted, then such decoding doesn’t make much sense.
IMO the intended usage of logical 2PC decoding is to decide about commit/abort based
on answers from logical subscribers/replicas. So there will be a barrier between
prepare and commit/abort, and such situations shouldn’t happen.

--
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company






Re: [HACKERS] logical decoding of two-phase transactions

2017-03-16 Thread Stas Kelvich

> On 16 Mar 2017, at 14:44, Craig Ringer  wrote:
> 
> I'm going to try to pick this patch up and amend its interface per our
> discussion earlier, see if I can get it committable.

I’m working right now on an issue with building snapshots for decoding prepared transactions.
I hope to send an updated patch later today.

> -- 
> Craig Ringer   http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services





Re: [HACKERS] logical decoding of two-phase transactions

2017-03-02 Thread Stas Kelvich

> On 2 Mar 2017, at 11:00, Craig Ringer <cr...@2ndquadrant.com> wrote:
> 
> We already have it, because we just decoded the PREPARE TRANSACTION.
> I'm preparing a patch revision to demonstrate this.

Yes, we already have it, but if the server reboots between COMMIT PREPARED (when all
prepared state is gone) and the decoding of that COMMIT PREPARED, then we lose
that mapping, don’t we?

> BTW, I've been reviewing the patch in more detail. Other than a bunch
> of copy-and-paste that I'm cleaning up, the main issue I've found is
> that in DecodePrepare, you call:
> 
>SnapBuildCommitTxn(ctx->snapshot_builder, buf->origptr, xid,
>   parsed->nsubxacts, parsed->subxacts);
> 
> but I am not convinced it is correct to call it at PREPARE TRANSACTION
> time, only at COMMIT PREPARED time. We want to see the 2pc prepared
> xact's state when decoding it, but there might be later commits that
> cannot yet see that state and shouldn't have it visible in their
> snapshots. 

Agreed, that is a problem. This allows us to decode the PREPARE, but after that
it is better to mark the transaction as running in the snapshot, or to perform prepare
decoding with some kind of copied-and-edited snapshot. I’ll have a look at this.

--
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company






Re: [HACKERS] logical decoding of two-phase transactions

2017-03-01 Thread Stas Kelvich

> On 2 Mar 2017, at 01:20, Petr Jelinek <petr.jeli...@2ndquadrant.com> wrote:
> 
> The info gets removed on COMMIT PREPARED but at that point
> there is no real difference between replicating it as 2pc or 1pc since
> the 2pc behavior is for all intents and purposes lost at that point.
> 

If we are doing 2PC and COMMIT PREPARED happens, then we should
replicate it without the transaction body to the receiving servers, since the tx
is already prepared on them with some GID. So we need a way to construct
that GID.

It seems that for the last ~10 messages I have been failing to explain some points about
this topic. Or maybe I’m failing to understand some points. Could we perhaps set up a
Skype call to discuss this and post a summary here? Craig? Peter?

--
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company






[HACKERS] Walsender crash

2017-02-13 Thread Stas Kelvich
Hello.

While playing with logical publications/subscriptions I’ve noticed that the walsender
crashes when I try to CREATE SUBSCRIPTION pointing at the current server.

So having one running postgres instance on a standard port:

stas=# CREATE SUBSCRIPTION mysub CONNECTION 'dbname=stas host=127.0.0.1' 
PUBLICATION mypub;
^CCancel request sent
ERROR:  canceling statement due to user request
stas=# CREATE SUBSCRIPTION mysub CONNECTION 'dbname=stas host=127.0.0.1' 
PUBLICATION mypub;
WARNING:  terminating connection because of crash of another server process

The crash happens due to the uninitialised StringInfo reply_message in walsender.c:

(lldb) bt
* thread #1: tid = 0x13e3f, 0x00010e74ebaf 
postgres`resetStringInfo(str=0x00010ecd12a0) + 15 at stringinfo.c:96, queue 
= 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
  * frame #0: 0x00010e74ebaf 
postgres`resetStringInfo(str=0x00010ecd12a0) + 15 at stringinfo.c:96
frame #1: 0x00010e8778c9 postgres`ProcessRepliesIfAny + 169 at 
walsender.c:1347
frame #2: 0x00010e8770fd postgres`WalSndWaitForWal(loc=35929928) + 221 
at walsender.c:1142
frame #3: 0x00010e876cf2 
postgres`logical_read_xlog_page(state=0x7fbcac0446f8, 
targetPagePtr=35921920, reqLen=8008, targetRecPtr=35929904, cur_page="\x95?", 
pageTLI=0x7fbcac044fa4) + 50 at walsender.c:717
frame #4: 0x00010e571e5e 
postgres`ReadPageInternal(state=0x7fbcac0446f8, pageptr=35921920, 
reqLen=8008) + 542 at xlogreader.c:556
frame #5: 0x00010e571336 
postgres`XLogReadRecord(state=0x7fbcac0446f8, RecPtr=35929904, 
errormsg=0x7fff5178f1e0) + 326 at xlogreader.c:255
frame #6: 0x00010e85ea9b 
postgres`DecodingContextFindStartpoint(ctx=0x7fbcac044438) + 139 at 
logical.c:432
frame #7: 0x00010e874dfd 
postgres`CreateReplicationSlot(cmd=0x7fbcad8004d0) + 333 at walsender.c:796
frame #8: 0x00010e8747a4 
postgres`exec_replication_command(cmd_string="CREATE_REPLICATION_SLOT \"mysub\" 
LOGICAL pgoutput") + 532 at walsender.c:1272
frame #9: 0x00010e8e6adc postgres`PostgresMain(argc=1, 
argv=0x7fbcad005f50, dbname="stas", username="stas") + 2332 at 
postgres.c:4064
frame #10: 0x00010e839abe postgres`BackendRun(port=0x7fbcabc033a0) 
+ 654 at postmaster.c:4310
frame #11: 0x00010e838eb3 
postgres`BackendStartup(port=0x7fbcabc033a0) + 483 at postmaster.c:3982
frame #12: 0x00010e837ea5 postgres`ServerLoop + 597 at postmaster.c:1722
frame #13: 0x00010e8356b5 postgres`PostmasterMain(argc=3, 
argv=0x7fbcabc02b90) + 5397 at postmaster.c:1330
frame #14: 0x00010e76371f postgres`main(argc=3, 
argv=0x7fbcabc02b90) + 751 at main.c:228
frame #15: 0x7fffa951c255 libdyld.dylib`start + 1
frame #16: 0x7fffa951c255 libdyld.dylib`start + 1

A patch adding the missing initStringInfo() call is attached.



init_reply_message.diff
Description: Binary data

--
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: [HACKERS] logical decoding of two-phase transactions

2017-02-09 Thread Stas Kelvich

> On 31 Jan 2017, at 12:22, Craig Ringer <cr...@2ndquadrant.com> wrote:
> 
> Personally I don't think lack of access to the GID justifies blocking 2PC 
> logical decoding. It can be added separately. But it'd be nice to have 
> especially if it's cheap.

Agreed.

> On 2 Feb 2017, at 00:35, Craig Ringer <cr...@2ndquadrant.com> wrote:
> 
> Stas was concerned about what happens in logical decoding if we crash between 
> PREPSRE TRANSACTION and COMMIT PREPARED. But we'll always go back and decode 
> the whole txn again anyway so it doesn't matter.

Not exactly. It seems that in previous discussions we were not on the same 
page, probably due to unclear arguments by me.

From my point of view there are no problems (or at least no new problems compared
to ordinary 2PC) with preparing transactions on slave servers with something
like “#{xid}#{node_id}” instead of the GID, if the issuing node is the coordinator of
that transaction. In case of failure, restart, or crash we have the same options for
deciding what to do with uncommitted transactions.

My concern is about the situation with an external coordinator. That scenario is
quite important for users of postgres native 2PC, notably J2EE users. Suppose a
user (or his framework) issues “prepare transaction ‘mytxname’;” to servers
with ordinary synchronous physical replication. If the master crashes and a
replica is promoted, then the user can reconnect to it and commit/abort that
transaction using his GID. It is unclear to me how to achieve the same
behaviour with logical replication of 2PC without the GID in the commit record. If we
prepare with “#{xid}#{node_id}” on acceptor nodes, then if the donor node crashes
we’ll lose the mapping between the user’s GID and our internal GID; conversely, we
can prepare with the user's GID on the acceptors, but then we will not know that GID on
the donor during commit decode (by the time decoding happens all memory state is
already gone and we can’t map our xid to the GID).

I performed some tests to understand the real impact on WAL size. I've compared
postgres master with wal_level = logical, after 3M 2PC transactions, against
patched postgres where GIDs are stored inside the commit record too, testing with
194-byte and 6-byte GIDs. (The GID max size is 200 bytes.)

-master, 6-byte GID after 3M transactions: pg_current_xlog_location = 0/9572CB28
-patched, 6-byte GID after 3M transactions: pg_current_xlog_location = 0/96C442E0

so with 6-byte GIDs the difference in WAL size is less than 1%

-master, 194-byte GID after 3M transactions: pg_current_xlog_location = 0/B7501578
-patched, 194-byte GID after 3M transactions: pg_current_xlog_location = 0/D8B43E28

and with 194-byte GIDs the difference in WAL size is about 18%

So using big GIDs (as J2EE does) can cause notable WAL bloat, while small
GIDs are almost unnoticeable.
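
For reference, spelling out the arithmetic from the LSNs above (treated as byte positions):

194-byte GIDs: 0xD8B43E28 - 0xB7501578 ~ 560 MB of extra WAL over ~3.08 GB, i.e. ~18%
  6-byte GIDs: 0x96C442E0 - 0x9572CB28 ~  22 MB of extra WAL over ~2.51 GB, i.e. under 1%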

Maybe we can introduce a configuration option track_commit_gid, by analogy with
track_commit_timestamp, and make that behaviour optional? Any objections to that?

--
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company






Re: [HACKERS] logical decoding of two-phase transactions

2017-01-26 Thread Stas Kelvich

> On 26 Jan 2017, at 12:51, Craig Ringer <cr...@2ndquadrant.com> wrote:
> 
> * Tracking xid/gid map in memory also doesn’t help much — if server reboots 
> between prepare
> and commit we’ll lose that mapping.
> 
> Er what? That's why I suggested using the prepared xacts shmem state. It's 
> persistent as you know from your work on prepared transaction files. It has 
> all the required info.

Imagine the following scenario:

1. PREPARE happens
2. The PREPARE is decoded and sent where it should be sent
3. We get all responses from the participating nodes and issue COMMIT/ABORT
4. The COMMIT/ABORT is decoded and sent

After step 3 there is no memory state associated with that prepared tx anymore, so if we
fail between 3 and 4 then we can’t know the GID unless we wrote it into the commit record
(or a table).

--
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company






Re: [HACKERS] Speedup twophase transactions

2017-01-26 Thread Stas Kelvich

> On 26 Jan 2017, at 10:34, Michael Paquier <michael.paqu...@gmail.com> wrote:
> 
> On Thu, Jan 26, 2017 at 4:09 PM, Nikhil Sontakke
> <nikh...@2ndquadrant.com> wrote:
>>> I look at this patch from you and that's present for me:
>>> https://www.postgresql.org/message-id/CAMGcDxf8Bn9ZPBBJZba9wiyQq->Qk5uqq=vjomnrnw5s+fks...@mail.gmail.com
>> 
>>> --- a/src/backend/access/transam/xlog.c
>>> +++ b/src/backend/access/transam/xlog.c
>>> @@ -9573,6 +9573,7 @@ xlog_redo(XLogReaderState *record)
>>> (errmsg("unexpected timeline ID %u (should be %u)
>>> in checkpoint record",
>>> checkPoint.ThisTimeLineID, ThisTimeLineID)));
>>> 
>>> +KnownPreparedRecreateFiles(checkPoint.redo);
>>> RecoveryRestartPoint();
>>> }
>> 
>> Oh, sorry. I was asking about CheckpointTwoPhase(). I don't see a
>> function by this name. And now I see, the name is CheckPointTwoPhase()
>> :-)
> 
> My mistake then :D
> 
>>> And actually, when a XLOG_CHECKPOINT_SHUTDOWN record is taken, 2PC
>>> files are not flushed to disk with this patch. This is a problem as a
>>> new restart point is created... Having the flush in CheckpointTwoPhase
>>> really makes the most sense.
>> 
>> Umm, AFAICS, CheckPointTwoPhase() does not get called in the "standby
>> promote" code path.
> 
> CreateRestartPoint() calls it via CheckPointGuts() while in recovery.
> 

Huh, glad that this thread received a lot of attention.

> On 24 Jan 2017, at 17:26, Nikhil Sontakke <nikh...@2ndquadrant.com> wrote:
> 
> We are talking about the recovery/promote code path. Specifically this
> call to KnownPreparedRecreateFiles() in PrescanPreparedTransactions().
> 
> We write the files to disk and they get immediately read up in the
> following code. We could not write the files to disk and read
> KnownPreparedList in the code path that follows as well as elsewhere.

Thanks Nikhil, now I get it. Since we are talking about promotion we are on a
different timescale, and a 1-10 second lag matters a lot.

I have in mind a realistic scenario where the proposed recovery code path will hit the
worst case: Google Cloud. They have quite fast storage, but the fsync time is really big
and can go up to 10-100 ms (I suppose the storage is network-attached).
With, say, 300 prepared transactions, we can delay promotion by up to half a minute.

So I think it is worth examining.

--
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company






Re: [HACKERS] logical decoding of two-phase transactions

2017-01-25 Thread Stas Kelvich
>> 
>> Yes, that’s also possible but seems to be less flexible restricting us to 
>> some
>> specific GID format.
>> 
>> Anyway, I can measure WAL space overhead introduced by the GID’s inside 
>> commit records
>> to know exactly what will be the cost of such approach.
> 
> Stas,
> 
> Have you had a chance to look at this further?

Generally I’m okay with Simon’s approach and will send an updated patch. Anyway, I want
to perform some tests to estimate how much disk space is actually wasted by the extra
WAL records.

> I think the approach of storing just the xid and fetching the GID
> during logical decoding of the PREPARE TRANSACTION is probably the
> best way forward, per my prior mail.

I don’t think that’s possible that way. If we do not put the GID in the commit
record, then by the time logical decoding happens the transaction will already be
committed/aborted and there will be no easy way to get that GID. I thought about
several possibilities:

* Tracking the xid/GID map in memory doesn’t help much either: if the server reboots
between prepare and commit we’ll lose that mapping.
* We can provide some hooks on prepared tx recovery during startup, but that approach
also fails if the reboot happened between the commit and the decoding of that commit.
* Logical messages are WAL-logged, but they don’t have any redo function, so they
don’t help much.

So to support user-accessible 2PC over replication based on 2PC decoding we would have
to invent something nastier, like writing the GIDs into a table.

> That should eliminate Simon's
> objection re the cost of tracking GIDs and still let us have access to
> them when we want them, which is the best of both worlds really.

Having 2PC decoding in core is a good thing anyway even without GID tracking =)

--
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company






Re: [HACKERS] Speedup twophase transactions

2017-01-24 Thread Stas Kelvich

> On 24 Jan 2017, at 09:42, Michael Paquier <michael.paqu...@gmail.com> wrote:
> 
> On Mon, Jan 23, 2017 at 9:00 PM, Nikhil Sontakke
> <nikh...@2ndquadrant.com> wrote:
>> Speeding up recovery or failover activity via a faster promote is a
>> desirable thing. So, maybe, we should look at teaching the relevant
>> code about using "KnownPreparedList"? I know that would increase the
>> size of this patch and would mean more testing, but this seems to be
>> last remaining optimization in this code path.
> 
> That's a good idea, worth having in this patch. Actually we may not
> want to call KnownPreparedRecreateFiles() here as promotion is not
> synonym of end-of-recovery checkpoint for a couple of releases now.

Thanks for review, Nikhil and Michael.

I don’t follow here. We are moving data away from the WAL to files at checkpoint because
after a checkpoint there is no guarantee that the WAL segment with our prepared tx will
still be available.

> The difference between those two is likely noise.
> 
> By the way, in those measurements, the OS cache is still filled with
> the past WAL segments, which is a rather best case, no? What happens
> if you do the same kind of tests on a box where memory is busy doing
> something else and replayed WAL segments get evicted from the OS cache
> more aggressively once the startup process switches to a new segment?
> This could be tested for example on a VM with few memory (say 386MB or
> less) so as the startup process needs to access again the past WAL
> segments to recover the 2PC information it needs to get them back
> directly from disk... One trick that you could use here would be to
> tweak the startup process so as it drops the OS cache once a segment
> is finished replaying, and see the effects of an aggressive OS cache
> eviction. This patch is showing really nice improvements with the OS
> cache backing up the data, still it would make sense to test things
> with a worse test case and see if things could be done better. The
> startup process now only reads records sequentially, not randomly
> which is a concept that this patch introduces.
> 
> Anyway, perhaps this does not matter much, the non-recovery code path
> does the same thing as this patch, and the improvement is too much to
> be ignored. So for consistency's sake we could go with the approach
> proposed which has the advantage to not put any restriction on the
> size of the 2PC file contrary to what an implementation saving the
> contents of the 2PC files into memory would need to do.

Maybe I’m missing something, but I don’t see how the OS cache can affect anything here.

The total WAL size was 0x44 * 16 = 1088 MB, and recovery time is about 20 s. Sequentially
reading 1 GB of data is an order of magnitude faster even on an old HDD, not to speak of
an SSD. Also, you can take a look at the flame graphs attached to the previous message:
the majority of recovery time is spent in pg_qsort while replaying
PageRepairFragmentation, while the whole xact_redo_commit() takes about 1% of the time.
That amount can grow in case of uncached disk reads, but taking into account the total
recovery time this should not matter much.

If you are talking about uncached access only during checkpoint, then here we are
restricted by max_prepared_transactions, so at most we will read about a hundred small
files (usually fitting into one filesystem page), which will also be barely noticeable
compared to the recovery time between checkpoints. Also, WAL segment cache eviction
during replay doesn’t seem like a standard scenario to me.

Anyway, I took a machine with an HDD to slow down the read speed and ran the tests again.
During one of the runs I launched in parallel a bash loop that dropped the OS cache each
second (while replaying one WAL segment also takes about one second).

1.5M transactions
  start segment: 0x06
  last segment:  0x47

patched, with constant cache_drop:
  total recovery time: 86s

patched, without constant cache_drop:
  total recovery time: 68s

(while the difference is significant, I bet it happens mostly because database
file segments have to be re-read after the cache drop)

master, without constant cache_drop:
  time to recover 35 segments: 2h 25m (after that I got tired of waiting)
  expected total recovery time: 4.5 hours

--
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company






Re: [HACKERS] logical decoding of two-phase transactions

2017-01-05 Thread Stas Kelvich

> On 5 Jan 2017, at 13:49, Simon Riggs <si...@2ndquadrant.com> wrote:
> 
> Surely in this case the master server is acting as the Transaction
> Manager, and it knows the mapping, so we are good?
> 
> I guess if you are using >2 nodes then you need to use full 2PC on each node.
> 
> Please explain precisely how you expect to use this, to check that GID
> is required.
> 

For example, if we are using logical replication just for failover/HA and allowing the
user to be the transaction manager itself. Then suppose that the user prepared a tx on
server A and server A crashed. After that the client may want to reconnect to server B
and commit/abort that tx. But the user only has the GID that was used during the prepare.
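
To make that concrete, a rough sketch of the client-visible flow (the GID is just an
example):

-- on server A, before the crash
BEGIN;
-- ... some work ...
PREPARE TRANSACTION 'mytxname';

-- server A crashes, the replica is promoted;
-- the client reconnects there and resolves the tx by its GID
COMMIT PREPARED 'mytxname';    -- or ROLLBACK PREPARED 'mytxname';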

> But even then, if you adopt the naming convention that all in-progress
> xacts will be called RepOriginId-EPOCH-XID, so they have a fully
> unique GID on all of the child nodes then we don't need to add the
> GID.

Yes, that’s also possible, but it seems less flexible, restricting us to some
specific GID format.

Anyway, I can measure the WAL space overhead introduced by the GIDs inside commit
records to know exactly what the cost of such an approach will be.

-- 
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company






Re: [HACKERS] logical decoding of two-phase transactions

2017-01-05 Thread Stas Kelvich
Thank you for looking into this.

> On 5 Jan 2017, at 09:43, Simon Riggs <si...@2ndquadrant.com> wrote:
>> 
>> GID is now variable sized. You seem to have added this to every
>> commit, not just 2PC
> 

Hm, I didn’t realise that; I’ll fix it.

> I've just realised that you're adding GID because it allows you to
> uniquely identify the prepared xact. But then the prepared xact will
> also have a regular TransactionId, which is also unique. GID exists
> for users to specify things, but it is not needed internally and we
> don't need to add it here.

I think we can’t avoid pushing the GID down to the client side anyway.

If we push down only the local TransactionId to the remote server, then we will lose the
mapping of GID to TransactionId, and there will be no way for the user to identify his
transaction on the second server. Also, Open XA and lots of libraries (e.g. J2EE) assume
that there is the same GID everywhere, and that it’s the same GID that was issued by the
client.

Requirements for two-phase decoding can differ depending on what one wants to build
around it, and I believe in some situations pushing down the xid is enough. But IMO
dealing with reconnects, failures and client libraries will force the programmer to use
the same GID everywhere.

> What we do need is for the commit prepared
> message to remember what the xid of the prepare was and then re-find
> it using the commit WAL record's twophase_xid field. So we don't need
> to add GID to any WAL records, nor to any in-memory structures.

The other part of the story is how to find the GID during decoding of the commit
prepared record. I did that by adding a GID field to the commit WAL record, because by
the time of decoding all memory structures that were holding the xid<->GID correspondence
have already been cleaned up.

-- 
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company






[HACKERS] logical decoding of two-phase transactions

2016-12-31 Thread Stas Kelvich
Here is a resubmission of the patch to implement logical decoding of two-phase
transactions (instead of treating them as usual transactions at commit time) [1].
I’ve slightly polished things and used the test_decoding output plugin as the client.

The general idea is quite simple:

* Write the GID along with commit/prepare records in the case of 2PC.
* Add several routines to decode prepare records in the same way as already happens in
logical decoding.

I’ve also added an explicit LOCK statement to the test_decoding regression suite to
check that it doesn’t break things. If somebody can create a scenario that blocks
decoding because of the existing dummy backend lock, that would be a great help.
Right now all my tests pass (including the TAP tests that check recovery of two-phase
transactions in case of failures, from the adjacent mail thread).
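
To illustrate how this is exercised from the client side, roughly (slot, table and GID
names are just examples, and the exact output depends on the patch):

CREATE TABLE tbl (id int);
SELECT 'init' FROM pg_create_logical_replication_slot('slot2pc', 'test_decoding');

BEGIN;
INSERT INTO tbl VALUES (1);
PREPARE TRANSACTION 'gid2pc';

-- with the patch, the PREPARE should already be visible here:
SELECT data FROM pg_logical_slot_get_changes('slot2pc', NULL, NULL);

COMMIT PREPARED 'gid2pc';
SELECT data FROM pg_logical_slot_get_changes('slot2pc', NULL, NULL);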

If we agree on the current approach, then I’m ready to add this stuff to the proposed
in-core logical replication.

[1] 
https://www.postgresql.org/message-id/EE7452CA-3C39-4A0E-97EC-17A414972884%40postgrespro.ru



logical_twophase.diff
Description: Binary data


-- 
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: [HACKERS] Speedup twophase transactions

2016-12-26 Thread Stas Kelvich

> On 22 Dec 2016, at 05:35, Michael Paquier <michael.paqu...@gmail.com> wrote:
> 
> True. The more spread the checkpoints and 2PC files, the more risk to
> require access to disk. Memory's cheap anyway. What was the system
> memory? How many checkpoints did you trigger for how many 2PC files
> created?

A standard config with increased shared_buffers. I think the most significant
impact on recovery speed here is on the client side, namely the time between
prepare and commit. Right now I’m using a pgbench script that issues the commit
right after the prepare. It’s also possible to put a sleep between prepare and commit
and increase the number of connections to thousands. That will probably be the
worst case: the majority of prepared transactions will be moved to files.

> Perhaps it would be a good idea to look for the 2PC files
> from WAL records in a specific order. Did you try to use
> dlist_push_head instead of dlist_push_tail? This may make a difference
> on systems where WAL segments don't fit in system cache as the latest
> files generated would be looked at first for 2PC data.

Ouch! Good catch. I didn’t actually notice that the list is populated in the opposite
order with respect to traversal. I’ll fix that.

> On 27 Dec 2016, at 08:33, Michael Paquier <michael.paqu...@gmail.com> wrote:
> 
> Stas, have you tested as well tested the impact on recovery time when
> WAL segments are very likely evicted from the OS cache? This could be
> a plausible scenario if a standby instance is heavily used for
> read-only transactions (say pgbench -S), and that the data quantity is
> higher than the amount of RAM available. It would not be complicated
> to test that: just drop_caches before beginning recovery. The maximum
> amount of 2PC transactions that need to have access to the past WAL
> segments is linearly related to the volume of WAL between two
> checkpoints, so max_wal_size does not really matter. What matters is
> the time it takes to recover the same amount of WAL. Increasing
> max_wal_size would give more room to reduce the overall noise between
> two measurements though.

Okay, I’ll perform such testing.

-- 
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: [HACKERS] Speedup twophase transactions

2016-12-21 Thread Stas Kelvich

> On 21 Dec 2016, at 19:56, Michael Paquier <michael.paqu...@gmail.com> wrote:
> 
> That's indeed way simpler than before. Have you as well looked at the
> most simple approach discussed? That would be just roughly replacing
> the pg_fsync() calls currently in RecreateTwoPhaseFile() by a save
> into a list as you are doing, then issue them all checkpoint.Even for
> 2PC files that are created and then removed before the next
> checkpoint, those will likely be in system cache.

  Yes, I tried that as well. But with such an approach another bottleneck arises:
new file creation isn’t a very cheap operation itself. A dual-Xeon box with 100 backends
quickly hits that, and the OS routines for file creation occupy the first places in
perf top. That probably depends on the filesystem (I used ext4), but avoiding
file creation when it isn’t necessary seems like the cleaner approach.
  On the other hand, it is possible to skip file creation by reusing files, for example
naming them by the dummy PGPROC offset, but that would require some changes
to places that right now look only at filenames.

> This removes as well
> the need to have XlogReadTwoPhaseData() work in crash recovery, which
> makes me a bit nervous.

Hm, do you have any particular bad scenario for that case in mind?

> And this saves lookups at the WAL segments
> still present in pg_xlog, making the operation at checkpoint much
> faster with many 2PC files to process.

ISTM your reasoning about the filesystem cache applies here as well, just without
spending time on file creation.

-- 
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: [HACKERS] Speedup twophase transactions

2016-12-17 Thread Stas Kelvich

> On 18 Dec 2016, at 01:54, Bruce Momjian <br...@momjian.us> wrote:
> 
> On Sun, Dec 18, 2016 at 07:41:50AM +0900, Michael Paquier wrote:
>> 
>> 
>> Strange. I can confirm that I have received the patch as attached, but
>> it is not on the archives.
> 
> It must have been stripped by our email system.  You were a direct CC so
> you received it.
> 

Then, probably, my mail client did something strange. I’ll check.

-- 
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: [HACKERS] Speedup twophase transactions

2016-12-16 Thread Stas Kelvich
> On 27 Sep 2016, at 03:30, Michael Paquier wrote:
>
> OK. I am marking this patch as returned with feedback then. Looking
> forward to seeing the next investigations. At least this review has
> taught us one thing or two.

So, here is a brand new implementation of the same thing.

Now, instead of creating a pgproc entry for a prepared transaction during recovery,
I just store the recptr/xid correspondence in a separate 2L-list and delete entries from
that list if the redo process faces a commit/abort. In case of a checkpoint or the end of
recovery, transactions remaining in that list are dumped to files in pg_twophase.

Seems that the current approach is much simpler, and the patch has half as many LOCs as
the previous one.

-- 
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

twophase_recovery_list.diff
Description: Binary data


[HACKERS] Stats sender and 2pc minor problem

2016-10-13 Thread Stas Kelvich
Hello.

The statistics sender logic for a usual commit and for a two-phase commit do not
strictly match each other, and that leads to delta_live_tuples being added to n_live_tup
in the case of a truncate in a two-phase commit.

That can be seen in the following example:

CREATE TABLE trunc_stats_test5(id serial);
INSERT INTO trunc_stats_test5 DEFAULT VALUES;
INSERT INTO trunc_stats_test5 DEFAULT VALUES;
INSERT INTO trunc_stats_test5 DEFAULT VALUES;
BEGIN;
TRUNCATE trunc_stats_test5;
PREPARE TRANSACTION 'twophase_stats';
COMMIT PREPARED 'twophase_stats';

After that pg_stat_user_tables will have n_live_tup = 3 instead of 0.
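
For reference, the discrepancy can be observed (after giving the stats collector a
moment to catch up) with something like:

SELECT n_live_tup, n_dead_tup FROM pg_stat_user_tables WHERE relname = 'trunc_stats_test5';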

A fix along with a test is attached.



2pc-stats.patch
Description: Binary data

-- 
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company




Re: [HACKERS] WIP: About CMake v2

2016-10-03 Thread Stas Kelvich
> On 17 Sep 2016, at 20:21, Yury Zhuravlev <u.zhurav...@postgrespro.ru> wrote:
> 
> Michael Paquier wrote:
>> On Sat, Sep 17, 2016 at 1:40 AM, Yury Zhuravlev
>> <u.zhurav...@postgrespro.ru> wrote:
>>> Michael Paquier wrote:
>>> I merged master to my branch and I spent time to porting all changes. I hope
>>> send patch in the weekend without terrible flaws.
>> 
>> By the way, I noticed that you did not register this patch in the current 
>> CF..
> 
> Now, I published the first version of the patch. This patch not ready for 
> commit yet and all current task you can read here:
> https://github.com/stalkerg/postgres_cmake/issues
> 
> I hope we realy close to end. 
> In this patch I forbade in-source build for avoid overwriting current 
> Makefiles. We will can remove all Makefiles only after shall see in CMake. 
> You don't need support two system. During commitfests CMake build system will 
> be supported by me.  
> I need help with buildfarm because my knowledge of Perl is very bad (thought 
> about rewrite buildfarm to Python). 
> 
> I hope for your support.


I tried to generate an Xcode project out of CMake; the build fails on genbki.pl: it can't
locate Catalog.pm (which lives in src/backend/catalog/Catalog.pm).

-- 
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company





[HACKERS] assert violation in logical messages serialization

2016-09-27 Thread Stas Kelvich
Hello.

During processing of logical messages in ReorderBufferSerializeChange(), the pointer to
the ondisk structure isn’t updated after a possible reorder buffer realloc by
ReorderBufferSerializeReserve().

The actual reason was spotted by Konstantin Knizhnik.



reorderbuffer.patch
Description: Binary data


-- 
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company




Re: [HACKERS] Speedup twophase transactions

2016-09-21 Thread Stas Kelvich
> On 21 Sep 2016, at 10:32, Michael Paquier <michael.paqu...@gmail.com> wrote:
> 
> On Tue, Sep 20, 2016 at 11:13 PM, Stas Kelvich <s.kelv...@postgrespro.ru> 
> wrote:
>> 
>> Putting that before actual WAL replay is just following historical order of 
>> events.
>> Prepared files are pieces of WAL that happened before checkpoint, so ISTM
>> there is no conceptual problem in restoring their state before replay.
> 
> For wal_level = minimal there is no need to call that to begin with..
> And I'd think that it is better to continue with things the same way.
> 

Hm, why?

>> 
>> With this patch we are reusing one infrastructure for normal work, recovery 
>> and replay.
> 
> The mix between normal work and recovery is the scary part. Normal
> work and recovery are entirely two different things.
> 

Okay, agreed. A commit procedure that checks whether recovery is active or not
is quite a hacky solution.

>> So taking into account my comments what do you think? Should we keep current 
>> approach
>> or try to avoid replaying prepare records at all?
> 
> I'd really like to give a try to the idea you just mentioned, aka
> delay the fsync of the 2PC files from RecreateTwoPhaseFile to
> CreateRestartPoint, or get things into memory.. I could write one, or
> both of those patches. I don't mind.

I’m not giving up yet, I’ll write them :) I still have in mind several other
patches to 2PC handling in postgres during this release cycle: logical decoding, and a
partitioned hash instead of the TwoPhaseState list.

My bet is that the relative speed of those patches will depend on the filesystem used. As
with the first patch in this mail thread, it is entirely possible to sometimes hit
filesystem limits on file creation speed. Otherwise both approaches should be more or
less equal, I suppose.

-- 
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company






Re: [HACKERS] Speedup twophase transactions

2016-09-20 Thread Stas Kelvich
art point in
> RecoveryRestartPoint(). We would need as well to tweak
> PrescanPreparedTransactions accordingly than everything we are trying
> to do here.
> That would be way more simple than what's proposed here, and we'd
> still keep all the performance benefits.

So the file arrives at the replica through the WAL, and instead of directly reading it
you suggest copying it to DSM if it is small enough, and to the filesystem if not.
Probably that will allow us to stay only around reading/writing files, but:

* That will be measurably slower than the proposed approach because of the unnecessary
copying between WAL and DSM. Not to mention prepare records of arbitrary length.
* Different behaviour on replica and master means less tested code for the replica.
* Almost the same behaviour can be achieved by delaying the fsync of 2PC files till
checkpoint.

But if I reword your comment as saying that it is better to avoid replaying the prepare
record at all, as happens now in master, then I see the point. And that is possible even
without DSM: it is possible to allocate a static-sized array storing some info about
each tx (whether it is in the WAL or in a file, xid, GID), some sort of PGXACT
doppelganger used only for replay purposes instead of the normal one.

So, taking into account my comments, what do you think? Should we keep the current
approach or try to avoid replaying prepare records at all?

-- 
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company






Re: [HACKERS] Speedup twophase transactions

2016-09-16 Thread Stas Kelvich
> On 07 Sep 2016, at 11:07, Stas Kelvich <s.kelv...@postgrespro.ru> wrote:
> 
>> On 07 Sep 2016, at 03:09, Michael Paquier <michael.paqu...@gmail.com> wrote:
>> 
>>>> On 06 Sep 2016, at 12:03, Michael Paquier <michael.paqu...@gmail.com> 
>>>> wrote:
>>>> 
>>>> On Tue, Sep 6, 2016 at 5:58 PM, Stas Kelvich <s.kelv...@postgrespro.ru> 
>>>> wrote:
>>>>> Oh, I was preparing new version of patch, after fresh look on it. 
>>>>> Probably, I should
>>>>> said that in this topic. I’ve found a bug in sub transaction handling and 
>>>>> now working
>>>>> on fix.
>>>> 
>>>> What's the problem actually?
>>> 
>>> Handling of xids_p array in PrescanPreparedTransactions() is wrong for 
>>> prepared tx's in memory.
>>> Also I want to double-check and add comments to RecoveryInProgress() checks 
>>> in FinishGXact.
>>> 
>>> I’ll post reworked patch in several days.
>> 
>> Could you use as a base the version I just sent yesterday then? I
>> noticed many mistakes in the comments, for example many s/it's/its/
>> and did a couple of adjustments around the code, the goto next_file
>> was particularly ugly. That will be that much work not do to again
>> later.
> 
> Yes. Already merged branch with your version.

Here is an updated version of the patch.

Looking through the old version I’ve noted a few things that were bothering me:

* In case of crash replay, PREPARE redo accesses SUBTRANS, but StartupSUBTRANS() hasn’t
been called yet in StartupXLOG().
* Several functions in twophase.c have to walk through both PGPROC and the pg_twophase
directory.

Now I have slightly changed the order of things in StartupXLOG(), so the two-phase state
is loaded from file and StartupSUBTRANS() is called before the actual recovery starts.
So in all other functions we can be sure that the files were already loaded into memory.

Also, since 2PC transactions are now dumped to files only at checkpoint, we can get rid
of PrescanPreparedTransactions(): all the necessary info can reside in the checkpoint
itself. I’ve changed the behaviour of the oldestActiveXid write in the checkpoint, and
that’s probably a discussable topic, but ISTM that it simplifies the recovery logic a lot
in both twophase.c and xlog.c. More comments on that are in the patch itself.



twophase_replay.v7.patch
Description: Binary data

-- 
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company





Re: [HACKERS] Suggestions for first contribution?

2016-09-07 Thread Stas Kelvich
> On 05 Sep 2016, at 20:25, Christian Convey <christian.con...@gmail.com> wrote:
> 
> Hi guys,
> 
> Can anyone suggest a project for my first PG contribution?
> 
> My first two ideas didn't pan out:  Yury doesn't seem to need help
> with CMake, and the TODO list's "-Wcast-align" project (now deleted)
> appeared impractical.

There is also a list of projects for Google Summer of Code:
https://wiki.postgresql.org/wiki/GSoC_2016

Those topics are expected to be doable by a newcomer within several months. The list is
also slightly outdated, but you can always check the current state of things by
searching this mailing list for the topic of interest.

Also, postgres has several internal APIs, like FDW [1] or GiST [2], that allow you to
implement something useful without digging too much into postgres internals.

[1] https://www.postgresql.org/docs/current/static/fdwhandler.html
[2] https://www.postgresql.org/docs/current/static/gist-extensibility.html

-- 
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company





[HACKERS] Bug in two-phase transaction recovery

2016-09-07 Thread Stas Kelvich
Hello.

Some time ago the two-phase state file format was changed to have a variable-size GID,
but several places that read those files were not updated to use the new offsets. The
problem exists in master and 9.6 and can be reproduced with prepared transactions that
contain savepoints. For example:

create table t(id int);
begin;
insert into t values (42);
savepoint s1;
insert into t values (43);
prepare transaction 'x';
begin;
insert into t values (142);
savepoint s1;
insert into t values (143);
prepare transaction 'y';

Then restart the server. The startup process will hang. A fix is attached.

Also, while looking at StandbyRecoverPreparedTransactions() I’ve noticed that the buffer
for the 2PC file is allocated in TopMemoryContext but never freed. That has probably
existed for a long time.



gidlen_fixes.diff
Description: Binary data





standby_recover_pfree.diff
Description: Binary data

-- 
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company




Re: [HACKERS] Speedup twophase transactions

2016-09-07 Thread Stas Kelvich
> On 07 Sep 2016, at 03:09, Michael Paquier <michael.paqu...@gmail.com> wrote:
> 
>>> On 06 Sep 2016, at 12:03, Michael Paquier <michael.paqu...@gmail.com> wrote:
>>> 
>>> On Tue, Sep 6, 2016 at 5:58 PM, Stas Kelvich <s.kelv...@postgrespro.ru> 
>>> wrote:
>>>> Oh, I was preparing new version of patch, after fresh look on it. 
>>>> Probably, I should
>>>> said that in this topic. I’ve found a bug in sub transaction handling and 
>>>> now working
>>>> on fix.
>>> 
>>> What's the problem actually?
>> 
>> Handling of xids_p array in PrescanPreparedTransactions() is wrong for 
>> prepared tx's in memory.
>> Also I want to double-check and add comments to RecoveryInProgress() checks 
>> in FinishGXact.
>> 
>> I’ll post reworked patch in several days.
> 
> Could you use as a base the version I just sent yesterday then? I
> noticed many mistakes in the comments, for example many s/it's/its/
> and did a couple of adjustments around the code, the goto next_file
> was particularly ugly. That will be that much work not do to again
> later.

Yes. I have already merged the branch with your version.

-- 
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company






Re: [HACKERS] Speedup twophase transactions

2016-09-06 Thread Stas Kelvich

> On 06 Sep 2016, at 12:09, Simon Riggs <si...@2ndquadrant.com> wrote:
> 
> On 6 September 2016 at 09:58, Stas Kelvich <s.kelv...@postgrespro.ru> wrote:
>> 
>> I'll check it against my failure scenario with subtransactions and post 
>> results or updated patch here.
> 
> Make sure tests are added for that. It would have been better to say
> you knew there were bugs in it earlier.

I spotted that yesterday during the rebase. Sorry. Next time in the same situation I’ll
write right away to save everyone’s time.

> On 06 Sep 2016, at 12:03, Michael Paquier <michael.paqu...@gmail.com> wrote:
> 
> On Tue, Sep 6, 2016 at 5:58 PM, Stas Kelvich <s.kelv...@postgrespro.ru> wrote:
>> Oh, I was preparing new version of patch, after fresh look on it. Probably, 
>> I should
>> said that in this topic. I’ve found a bug in sub transaction handling and 
>> now working
>> on fix.
> 
> What's the problem actually?

The handling of the xids_p array in PrescanPreparedTransactions() is wrong for prepared
transactions in memory.

Also, I want to double-check and add comments to the RecoveryInProgress() checks in
FinishGXact.

I’ll post a reworked patch in several days.

-- 
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company






Re: [HACKERS] Speedup twophase transactions

2016-09-06 Thread Stas Kelvich
> On 06 Sep 2016, at 04:41, Michael Paquier <michael.paqu...@gmail.com> wrote:
> 
> On Sat, Sep 3, 2016 at 10:26 PM, Michael Paquier
> <michael.paqu...@gmail.com> wrote:
>> On Fri, Sep 2, 2016 at 5:06 AM, Simon Riggs <si...@2ndquadrant.com> wrote:
>>> On 13 April 2016 at 15:31, Stas Kelvich <s.kelv...@postgrespro.ru> wrote:
>>> 
>>>> Fixed patch attached. There already was infrastructure to skip currently
>>>> held locks during replay of standby_redo() and I’ve extended that with 
>>>> check for
>>>> prepared xids.
>>> 
>>> Please confirm that everything still works on current HEAD for the new
>>> CF, so review can start.
>> 
>> The patch does not apply cleanly. Stas, could you rebase? I am
>> switching the patch to "waiting on author" for now.
> 
> So, I have just done the rebase myself and did an extra round of
> reviews of the patch. Here are couple of comments after this extra
> lookup.
> 
> LockGXactByXid() is aimed to be used only in recovery, so it seems
> adapted to be to add an assertion with RecoveryInProgress(). Using
> this routine in non-recovery code paths is risky because it assumes
> that a PGXACT could be missing, which is fine in recovery as prepared
> transactions could be moved to twophase files because of a checkpoint,
> but not in normal cases. We could also add an assertion of the kind
> gxact->locking_backend == InvalidBackendId before locking the PGXACT
> but as this routine is just normally used by the startup process (in
> non-standalone mode!) that's fine without.
> 
> The handling of invalidation messages and removal of relfilenodes done
> in FinishGXact can be grouped together, checking only once for
> !RecoveryInProgress().
> 
> + *
> + * The same procedure happens during replication and crash recovery.
>  *
> "during WAL replay" is more generic and applies here.
> 
> +
> +next_file:
> +   continue;
> +
> That can be removed and replaced by a "continue;".
> 
> +   /*
> +* At first check prepared tx that wasn't yet moved to disk.
> +*/
> +   LWLockAcquire(TwoPhaseStateLock, LW_SHARED);
> +   for (i = 0; i < TwoPhaseState->numPrepXacts; i++)
> +   {
> +   GlobalTransaction gxact = TwoPhaseState->prepXacts[i];
> +   PGXACT *pgxact = >allPgXact[gxact->pgprocno];
> +
> +   if (TransactionIdEquals(pgxact->xid, xid))
> +   {
> +   LWLockRelease(TwoPhaseStateLock);
> +   return true;
> +   }
> +   }
> +   LWLockRelease(TwoPhaseStateLock);
> This overlaps with TwoPhaseGetGXact but I'd rather keep both things
> separated: it does not seem worth complicating the existing interface,
> and putting that in cache during recovery has no meaning.

Oh, I was preparing a new version of the patch after a fresh look at it. Probably I
should have said that in this thread. I’ve found a bug in subtransaction handling and am
now working on a fix.

> 
> I have also reworked the test format, and fixed many typos and grammar
> problems in the patch as well as in the tests.

Thanks!

> 
> After review the result is attached. Perhaps a committer could get a look at 
> it?

I'll check it against my failure scenario with subtransactions and post results 
or updated patch here.

-- 
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company






Re: [HACKERS] Logical decoding restart problems

2016-08-31 Thread Stas Kelvich
> On 31 Aug 2016, at 03:28, Craig Ringer <cr...@2ndquadrant.com> wrote:
> 
> On 25 Aug. 2016 20:03, "Stas Kelvich" <s.kelv...@postgrespro.ru> wrote:
> >
> > Thanks for clarification about how restart_lsn is working.
> >
> > Digging slightly deeper into this topic revealed that problem was in two 
> > phase decoding, not it logical decoding itself.
> 
> Good to know. The behavior didn't make much sense for logical decoding but 
> bugs usually don't.
> 
> Do you plan to submit a patch for logical decoding of prepared transactions 
> to 10.0?
> 

I do, probably in CF2. There is an issue with locks that Andres pointed me to; also,
twophase.c was written before logical decoding existed and it deserves some refactoring
to avoid duplicated functionality between 2PC decoding and the restoring of old prepared
xacts. I want to include that in the patch too.

-- 
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company





Re: [HACKERS] Logical decoding restart problems

2016-08-25 Thread Stas Kelvich
> On 20 Aug 2016, at 15:59, Craig Ringer <cr...@2ndquadrant.com> wrote:
> 
> I'll wait for a test case or some more detail.

Thanks for the clarification about how restart_lsn works.

Digging slightly deeper into this topic revealed that the problem was in two-phase
decoding, not in logical decoding itself.
While I was writing DecodePrepare() I copied the call to the SnapBuildCommitTxn()
function from DecodeCommit(), which removes the current transaction from the running
list, and that’s obviously the wrong thing to do for a prepared tx.

So I ended up with the following workaround:

--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -950,7 +950,7 @@ SnapBuildAbortTxn(SnapBuild *builder, XLogRecPtr lsn,
  */
 void
 SnapBuildCommitTxn(SnapBuild *builder, XLogRecPtr lsn, TransactionId xid,
-  int nsubxacts, TransactionId *subxacts)
+  int nsubxacts, TransactionId *subxacts, bool 
isCommit)
 {
int nxact;
 
@@ -1026,7 +1026,8 @@ SnapBuildCommitTxn(SnapBuild *builder, XLogRecPtr lsn, 
TransactionId xid,
 * Make sure toplevel txn is not tracked in running txn's anymore, 
switch
 * state to consistent if possible.
 */
-   SnapBuildEndTxn(builder, lsn, xid);
+   if (isCommit)
+   SnapBuildEndTxn(builder, lsn, xid);


I call SnapBuildCommitTxn() with isCommit=true from commit and false from
prepare. However, while I'm no longer observing partially decoded transactions, I'm not
sure that this is the right way to build a proper snapshot and that nothing else is broken.
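
For illustration, the two call sites would then look roughly like this (a sketch based on
the usual decode.c calling pattern, not the actual patch; variable names are assumed):

/* DecodeCommit(): the xact is finished, let the snapshot builder end it */
SnapBuildCommitTxn(ctx->snapshot_builder, buf->origptr, xid,
                   parsed->nsubxacts, parsed->subxacts, true);

/* DecodePrepare(): decode the changes, but keep the xact in the running list
 * until its COMMIT PREPARED / ROLLBACK PREPARED is decoded */
SnapBuildCommitTxn(ctx->snapshot_builder, buf->origptr, xid,
                   parsed->nsubxacts, parsed->subxacts, false);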

Also, while playing with the psycopg2 logical decoding client, I did see empty
transactions containing only BEGIN/COMMIT (with the test_decoding output plugin and
current postgres master).


-- 
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company





Re: [HACKERS] Logical Replication WIP

2016-08-15 Thread Stas Kelvich
> On 11 Aug 2016, at 17:43, Petr Jelinek <p...@2ndquadrant.com> wrote:
> 
>> 
>> * Also I wasn’t able actually to run replication itself =) While regression 
>> tests passes, TAP
>> tests and manual run stuck. pg_subscription_rel.substate never becomes ‘r’. 
>> I’ll investigate
>> that more and write again.
> 
> Interesting, please keep me posted. It's possible for tables to stay in 's' 
> state for some time if there is nothing happening on the server, but that 
> should not mean anything is stuck.

I played around a bit; it seems that the apply worker waits forever for the substate
change.

(lldb) bt
* thread #1: tid = 0x183e00, 0x7fff88c7f2a2 libsystem_kernel.dylib`poll + 
10, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
frame #0: 0x7fff88c7f2a2 libsystem_kernel.dylib`poll + 10
frame #1: 0x0001017ca8a3 
postgres`WaitEventSetWaitBlock(set=0x7fd2dc816b30, cur_timeout=1, 
occurred_events=0x7fff5e7f67d8, nevents=1) + 51 at latch.c:1108
frame #2: 0x0001017ca438 
postgres`WaitEventSetWait(set=0x7fd2dc816b30, timeout=1, 
occurred_events=0x7fff5e7f67d8, nevents=1) + 248 at latch.c:941
frame #3: 0x0001017c9fde 
postgres`WaitLatchOrSocket(latch=0x00010ab208a4, wakeEvents=25, sock=-1, 
timeout=1) + 254 at latch.c:347
frame #4: 0x0001017c9eda postgres`WaitLatch(latch=0x00010ab208a4, 
wakeEvents=25, timeout=1) + 42 at latch.c:302
  * frame #5: 0x000101793352 
postgres`wait_for_sync_status_change(tstate=0x000101e409b0) + 178 at 
tablesync.c:228
frame #6: 0x000101792bbe 
postgres`process_syncing_tables_apply(slotname="subbi", 
end_lsn=140734778796592) + 430 at tablesync.c:436
frame #7: 0x0001017928c1 
postgres`process_syncing_tables(slotname="subbi", end_lsn=140734778796592) + 81 
at tablesync.c:518
frame #8: 0x00010177b620 
postgres`LogicalRepApplyLoop(last_received=140734778796592) + 704 at 
apply.c:1122
frame #9: 0x00010177bef4 postgres`ApplyWorkerMain(main_arg=0) + 1044 at 
apply.c:1353
frame #10: 0x00010174cb5a postgres`StartBackgroundWorker + 826 at 
bgworker.c:729
frame #11: 0x000101762227 
postgres`do_start_bgworker(rw=0x7fd2db70) + 343 at postmaster.c:5553
frame #12: 0x00010175d42b postgres`maybe_start_bgworker + 427 at 
postmaster.c:5761
frame #13: 0x00010175bccf 
postgres`sigusr1_handler(postgres_signal_arg=30) + 383 at postmaster.c:4979
frame #14: 0x7fff9ab2352a libsystem_platform.dylib`_sigtramp + 26
frame #15: 0x7fff88c7e07b libsystem_kernel.dylib`__select + 11
frame #16: 0x00010175d5ac postgres`ServerLoop + 252 at postmaster.c:1665
frame #17: 0x00010175b2e0 postgres`PostmasterMain(argc=3, 
argv=0x7fd2db403840) + 5968 at postmaster.c:1309
frame #18: 0x00010169507f postgres`main(argc=3, 
argv=0x7fd2db403840) + 751 at main.c:228
frame #19: 0x7fff8d45c5ad libdyld.dylib`start + 1
(lldb) p state
(char) $1 = 'c'
(lldb) p tstate->state
(char) $2 = ‘c’

Also I've noted that some LSN positions look wrong on the publisher:

postgres=# select restart_lsn, confirmed_flush_lsn from pg_replication_slots;
 restart_lsn | confirmed_flush_lsn 
-+-
 0/1530EF8   | 7FFF/5E7F6A30
(1 row)

postgres=# select sent_location, write_location, flush_location, 
replay_location from pg_stat_replication;
 sent_location | write_location | flush_location | replay_location 
---+++-----
 0/1530F30 | 7FFF/5E7F6A30  | 7FFF/5E7F6A30  | 7FFF/5E7F6A30
(1 row)



-- 
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company





Re: [HACKERS] Logical Replication WIP

2016-08-11 Thread Stas Kelvich
> On 05 Aug 2016, at 18:00, Petr Jelinek <p...@2ndquadrant.com> wrote:
> 
> Hi,
> 
> as promised here is WIP version of logical replication patch.

Great!

The proposed DDL for publications/subscriptions looks very nice to me.

Some notes and thoughts about the patch:


* Clang grumbles at the following pieces of code:

apply.c:1316:6: warning: variable 'origin_startpos' is used uninitialized 
whenever 'if' condition is false [-Wsometimes-uninitialized]

tablesync.c:436:45: warning: if statement has empty body [-Wempty-body]
if (wait_for_sync_status_change(tstate));


* max_logical_replication_workers is mentioned everywhere in the docs, but guc.c
defines a variable called max_logical_replication_processes for postgresql.conf

* Since pg_subscription is already shared across the cluster, it could also be handy to
share pg_publication too and allow publication of tables from different databases. That
is a rare scenario but quite important for the virtual hosting use case — tons of
small databases in a single postgres cluster.

* There is no way to see the tables/schemas attached to a publication through \drp

* As far as I understand, there is no way to add a table/tablespace right in CREATE
PUBLICATION, and one needs to explicitly do ALTER PUBLICATION right after creation.
Maybe add something like WITH TABLE/TABLESPACE to CREATE?

* So the binary protocol goes into core. Is it still possible to use it as a decoding
plugin for a manually created walsender? Maybe also include JSON as it was in pglogical?
While I'm not arguing that it should be done, I'm interested in your opinion on that.

* Also I've noted that you got rid of the reserved byte (flags) in the protocol compared to
pglogical_native. It was very handy for two-phase tx decoding (0 — usual
commit, 1 — prepare, 2 — commit prepared), because both prepare and commit
prepared generate a commit record in xlog.

> On 05 Aug 2016, at 18:00, Petr Jelinek <p...@2ndquadrant.com> wrote:
> 
> - DDL, I see several approaches we could do here for 10.0. a) don't
>   deal with DDL at all yet, b) provide function which pushes the DDL
>   into replication queue and then executes on downstream (like
>   londiste, slony, pglogical do), c) capture the DDL query as text
>   and allow user defined function to be called with that DDL text on
>   the subscriber

* Since the DDL here is mostly ALTER / CREATE / DROP TABLE (or am I wrong?), maybe
we can add something like WITH SUBSCRIBERS to those statements?

* Talking about the exact mechanism of DDL replication, I like your variant b), but since we
have transactional DDL, we can do two-phase commit here. That will require two-phase
decoding and some logic for catching prepare responses through logical messages. If that
approach sounds interesting, I can describe the proposal in more detail and create a patch.

* Also I wasn't actually able to run replication itself =) While the regression tests pass,
the TAP tests and a manual run get stuck: pg_subscription_rel.substate never becomes 'r'.
I'll investigate that more and write again.

* As far as I understand, sync starts automatically on enabling the publication. Maybe
split that logic into a different command with some options? Like not syncing at all,
for example.

* When I try to create a subscription to a non-existent publication, CREATE
SUBSCRIPTION creates a replication slot and does not destroy it:

# create subscription sub connection 'host=127.0.0.1 dbname=postgres' 
publication mypub;
NOTICE:  created replication slot "sub" on provider
ERROR:  could not receive list of replicated tables from the provider: ERROR:  
cache lookup failed for publication 0
CONTEXT:  slot "sub", output plugin "pgoutput", in the list_tables callback

after that:

postgres=# drop subscription sub;
ERROR:  subscription "sub" does not exist
postgres=# create subscription sub connection 'host=127.0.0.1 dbname=postgres' 
publication pub;
ERROR:  could not crate replication slot "sub": ERROR:  replication slot "sub" 
already exists

* Also can’t drop subscription:

postgres=# \drs
  List of subscriptions
 Name | Database | Enabled | Publication |Conninfo
--+--+-+-+
 sub  | postgres | t   | {mypub} | host=127.0.0.1 dbname=postgres
(1 row)

postgres=# drop subscription sub;
ERROR:  unrecognized object class: 6102

* Several times I've run into a situation where the provider's postmaster ignores
Ctrl-C until the subscriber node is switched off.

* A patch with small typo fixes is attached.

I'll do more testing; I just wanted to share what I have so far.




typos.diff
Description: Binary data


-- 
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company




Re: [HACKERS] Naming of new tsvector functions

2016-05-05 Thread Stas Kelvich

> On 06 May 2016, at 00:46, Gavin Flower <gavinflo...@archidevsys.co.nz> wrote:
> 
> On 06/05/16 07:44, Tom Lane wrote:
>> 
>> Yeah, I see we're already a bit inconsistent here.  The problem with using
>> a ts_ prefix, to my mind, is that it offers no option for distinguishing
>> tsvector from tsquery, should you need to do that.  Maybe this isn't a
>> problem for functions that have tsvector as input.
>> 
>>  regards, tom lane
>> 
>> 
> use tsv_ and tsq_?
> 
> 
> Cheers,
> Gavin
> 

That would be a good convention if we were able to easily rename the old functions.
But now it would just create another pattern on top of the three existing ones (no
prefix, ts_*, tsvector_*).

Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company






Re: [HACKERS] Naming of new tsvector functions

2016-05-05 Thread Stas Kelvich
> On 04 May 2016, at 20:15, Tom Lane <t...@sss.pgh.pa.us> wrote:
> 
> Stas Kelvich <s.kelv...@postgrespro.ru> writes:
>>> On 04 May 2016, at 16:58, Tom Lane <t...@sss.pgh.pa.us> wrote:
>>> The other ones are not so problematic because they do not conflict with
>>> SQL keywords.  It's only delete() and filter() that scare me.
> 
>> Okay, so changed functions to ts_setweight, ts_delete, ts_unnest, ts_filter.
> 
> Somehow, I don't think you read what I wrote.
> 
> Renaming the pre-existing setweight() function to ts_setweight() is
> not going to happen; it's been like that for half a dozen years now.
> It would make no sense to call the new variant ts_setweight() while
> keeping setweight() for the existing function, either.

Oh, I accidentally renamed one of the old functions, my mistake.

> I also don't see that much point in ts_unnest(), since unnest()
> in our implementation is a function not a keyword.  I don't have
> a strong opinion about that one, though.

Just to keep some level of uniformity in function names. But I'm not insisting
either.

> Also, I'd supposed that we'd rename to tsvector_something, since
> the same patch also introduced tsvector_to_array() and
> array_to_tsvector().  What's the motivation for using ts_ as the
> prefix?

There are already several functions named ts_* (ts_rank, ts_headline, ts_rewrite)
and two whose names start with tsvector_* (tsvector_update_trigger,
tsvector_update_trigger_column).

Personally I'd prefer ts_ over tsvector_ since it is shorter and still keeps the
semantics.

> 
>   regards, tom lane


-- 
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company






Re: [HACKERS] Naming of new tsvector functions

2016-05-04 Thread Stas Kelvich
> On 04 May 2016, at 16:58, Tom Lane <t...@sss.pgh.pa.us> wrote:
> 
> Stas Kelvich <s.kelv...@postgrespro.ru> writes:
>>> On 03 May 2016, at 00:59, David Fetter <da...@fetter.org> wrote:
>>> I suspect that steering that ship would be a good idea starting with
>>> deprecation of the old name in 9.6, etc.  hs_filter(), perhaps?
> 
>> In 9.5 there already were tsvector functions length(), numnode(), strip()
> 
>> Recent commit added setweight(), delete(), unnest(), tsvector_to_array(), 
>> array_to_tsvector(), filter().
> 
>> Last bunch can be painlessly renamed, for example to ts_setweight, 
>> ts_delete, ts_unnest, ts_filter.
> 
>> The question is what to do with old ones? Leave them as is? Rename to ts_* 
>> and create aliases with deprecation warning?
> 
> The other ones are not so problematic because they do not conflict with
> SQL keywords.  It's only delete() and filter() that scare me.
> 
>   regards, tom lane

Okay, so I changed the functions to ts_setweight, ts_delete, ts_unnest, and ts_filter.



tsvector_ops_rename.diff
Description: Binary data

-- 
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company





Re: [HACKERS] Naming of new tsvector functions

2016-05-04 Thread Stas Kelvich
> On 03 May 2016, at 00:59, David Fetter <da...@fetter.org> wrote:
> 
> On Mon, May 02, 2016 at 01:58:11PM -0400, Tom Lane wrote:
>> I wrote:
>>> I think we'd be better off to rename these to tsvector_delete()
>>> and tsvector_filter() while we still can.
>> 
>> ... although I now notice that hstore already exposes a function
>> named delete(), so that ship may have sailed already.  But I'm more
>> troubled by filter() anyhow, since that keyword can appear in
>> expressions --- it seems much more likely that that would pose a
>> parsing conflict after future SQL extensions.
> 
> I suspect that steering that ship would be a good idea starting with
> deprecation of the old name in 9.6, etc.  hs_filter(), perhaps?
> 
> Cheers,
> David.


In 9.5 there were already the tsvector functions length(), numnode(), and strip().

A recent commit added setweight(), delete(), unnest(), tsvector_to_array(),
array_to_tsvector(), and filter().

The last bunch can be painlessly renamed, for example to ts_setweight, ts_delete,
ts_unnest, ts_filter.

The question is what to do with the old ones? Leave them as-is? Rename them to ts_* and
create aliases with a deprecation warning?

-- 
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company





[HACKERS] tsvector filter problem

2016-05-04 Thread Stas Kelvich
Hi.

As discovered by Oleg Bartunov, the current filter() function for tsvector can
crash the backend.
The bug was caused by erroneous use of the char type in a memmove argument.



tsvector_bugfix_type.diff
Description: Binary data

-- 
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company




Re: [HACKERS] Parser extensions (maybe for 10?)

2016-04-19 Thread Stas Kelvich
> On 12 Apr 2016, at 07:36, Arcadiy Ivanov <arca...@gmail.com> wrote:
> 
> [ 
>   DISTRIBUTE BY { REPLICATION | ROUNDROBIN | { [HASH | MODULO ] ( column_name 
> ) } } |
>   DISTRIBUTED { { BY ( column_name ) } | { RANDOMLY } |
>   DISTSTYLE { EVEN | KEY | ALL } DISTKEY ( column_name )
> ]
> [ TO { GROUP groupname | NODE ( nodename [, ... ] ) } ]

A less invasive way to achieve the same is to use the WITH parameter list
that already exists in CREATE TABLE, CREATE INDEX, etc.

Like this:

create table foo(id int) with(distributed_by='id', nodes='node1, node2');

It is easier to allow extensions to define custom parameters for WITH than
to extend the parser.

-- 
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company





Re: [HACKERS] Speedup twophase transactions

2016-04-13 Thread Stas Kelvich
> On 13 Apr 2016, at 01:04, Michael Paquier <michael.paqu...@gmail.com> wrote:
> 
> On Wed, Apr 13, 2016 at 1:53 AM, Stas Kelvich <s.kelv...@postgrespro.ru> 
> wrote:
>>> On 12 Apr 2016, at 15:47, Michael Paquier <michael.paqu...@gmail.com> wrote:
>>> 
>>> It looks to be the case... The PREPARE phase replayed after the
>>> standby is restarted in recovery creates a series of exclusive locks
>>> on the table created and those are not taken on HEAD. Once those are
>>> replayed the LOCK_STANDBY record is conflicting with it. In the case
>>> of the failure, the COMMIT PREPARED record cannot be fetched from
>>> master via the WAL stream so the relation never becomes visible.
>> 
>> Yep, it is. It is okay for prepared xact hold a locks for created/changed 
>> tables,
>> but code in standby_redo() was written in assumption that there are no 
>> prepared
>> xacts at the time of recovery. I’ll look closer at checkpointer code and 
>> will send
>> updated patch.
>> 
>> And thanks again.
> 
> That's too late for 9.6 unfortunately, don't forget to add that in the next 
> CF!

Fixed patch attached. There was already infrastructure to skip currently
held locks during replay in standby_redo(), and I've extended that with a check for
prepared xids.
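
In rough terms the idea is the following (a simplified sketch, not the exact patch text;
StandbyTransactionIdIsPrepared() is the existing helper from twophase.c):

/* in the lock-replay path of standby_redo() */
if (TransactionIdIsValid(xid) && StandbyTransactionIdIsPrepared(xid))
	return;		/* lock is already held by a recovered prepared xact, keep it */
StandbyAcquireAccessExclusiveLock(xid, dbOid, relOid);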

The reason I'm still active on this thread is that I see real problems with
deploying 9.6 in its current state. Let me stress my concern: the current state of
things _WILL_BREAK_ async replication under a substantial load of two-phase
transactions on the master. A lot of J2EE apps fall into that category, as they
wrap most of their transactions in prepare/commit. The slave server will just keep
increasing its lag and will never catch up. It is possible to deal with that by
switching to synchronous replication or by inserting triggers with pg_sleep on the
master, but that doesn't look like normal behaviour for the system.



twophase_replay.v5.patch
Description: Binary data

-- 
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company





Re: [HACKERS] Speedup twophase transactions

2016-04-12 Thread Stas Kelvich
> On 12 Apr 2016, at 15:47, Michael Paquier <michael.paqu...@gmail.com> wrote:
> 
> On Mon, Apr 11, 2016 at 7:16 PM, Stas Kelvich wrote:
>> Michael, it looks like that you are the only one person who can reproduce 
>> that bug. I’ve tried on bunch of OS’s and didn’t observe that behaviour, 
>> also looking at your backtraces I can’t get who is holding this lock (and 
>> all of that happens before first prepare record is replayed).
> 
> Where did you try it. FWIW, I can reproduce that on Linux and OSX, and
> only manually though:

Thanks a lot, Michael! Now I was able to reproduce that. It seems that when
you performed the manual setup, the master instance issued a checkpoint, but in
my script that didn't happen because of the shorter timing. There are tests
with a checkpoint between prepare/commit in the proposed test suite, but none of
them issues DDL.

> It looks to be the case... The PREPARE phase replayed after the
> standby is restarted in recovery creates a series of exclusive locks
> on the table created and those are not taken on HEAD. Once those are
> replayed the LOCK_STANDBY record is conflicting with it. In the case
> of the failure, the COMMIT PREPARED record cannot be fetched from
> master via the WAL stream so the relation never becomes visible.

Yep, it is. It is okay for a prepared xact to hold locks on created/changed tables,
but the code in standby_redo() was written on the assumption that there are no prepared
xacts at the time of recovery. I'll look closer at the checkpointer code and will send
an updated patch.

And thanks again.

-- 
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company






Re: [HACKERS] 'Create table if not exists as' breaks SPI_execute

2016-04-11 Thread Stas Kelvich
> On 11 Apr 2016, at 18:41, Stas Kelvich <s.kelv...@postgrespro.ru> wrote:
> 
> Hi.
> 
> SPI_execute assumes that CreateTableAsStmt always have completionTag == 
> “completionTag”.
> But it isn’t true in case of ‘IF NOT EXISTS’ present.
> 
> 
> 

Sorry, I meant completionTag == “SELECT”.

-- 
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company






[HACKERS] 'Create table if not exists as' breaks SPI_execute

2016-04-11 Thread Stas Kelvich
Hi.

SPI_execute assumes that CreateTableAsStmt always has completionTag ==
“completionTag”.
But that isn't true when 'IF NOT EXISTS' is present.




spi-cta.patch
Description: Binary data


-- 
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company




Re: [HACKERS] Speedup twophase transactions

2016-04-11 Thread Stas Kelvich
> On 08 Apr 2016, at 16:09, Stas Kelvich <s.kelv...@postgrespro.ru> wrote:
> 
> Tried on linux and os x 10.11 and 10.4.
> 
> Still can’t reproduce, but have played around with your backtrace.
> 
> I see in xlodump on slave following sequence of records:
> 
> rmgr: Storage len (rec/tot): 16/42, tx:  0, lsn: 
> 0/03015AF0, prev 0/03015958, desc: CREATE base/12669/16387
> rmgr: Heaplen (rec/tot):  3/  1518, tx:867, lsn: 
> 0/03015B20, prev 0/03015AF0, desc: INSERT off 8, blkref #0: rel 
> 1663/12669/1247 blk 8 FPW
> <...>
> rmgr: Btree   len (rec/tot):  2/72, tx:867, lsn: 
> 0/03019CD0, prev 0/03019C88, desc: INSERT_LEAF off 114, blkref #0: rel 
> 1663/12669/2674 blk 22
> rmgr: Standby len (rec/tot): 16/42, tx:867, lsn: 
> 0/03019D18, prev 0/03019CD0, desc: LOCK xid 867 db 12669 rel 16387 
> rmgr: Transaction len (rec/tot):784/   813, tx:867, lsn: 
> 0/03019D48, prev 0/03019D18, desc: PREPARE 
> rmgr: Transaction len (rec/tot):380/   409, tx:  0, lsn: 
> 0/0301A090, prev 0/03019D48, desc: COMMIT_PREPARED 867: 2016-04-08 
> 14:38:33.347851 MSK;
> 
> It looks like that you had stuck in LOCK xid 867 even before replaying 
> PREPARE record, so I have not that much ideas on why that can be caused by 
> changing procedures of PREPARE replay.
> 
> Just to keep things sane, here is my current diff:
> 
> 

Michael, it looks like you are the only person who can reproduce that
bug. I've tried on a bunch of OSes and didn't observe that behaviour; also,
looking at your backtraces, I can't tell who is holding this lock (and all of
that happens before the first prepare record is replayed).

Can you investigate it more? In particular, can you find out who holds the lock?

Here is the latest version of the patch:



twophase_replay.v4.patch
Description: Binary data

-- 
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company





Re: [HACKERS] closing CommitFest 2016-03, feature freeze now in effect

2016-04-09 Thread Stas Kelvich

> On 09 Apr 2016, at 03:05, Robert Haas <robertmh...@gmail.com> wrote:
> 
> CommitFest 2016-03 is now closed.  I have moved "Twophase transactions
> on slave", "Partial sort", and "amcheck (B-Tree integrity checking
> tool)" to the next CommitFest in accordance with the policy previous
> set by the release management team.   I have left "Replace buffer
> manager spinlock with atomic operations" active in the current
> CommitFest because it was granted an extension.  The RMT has received
> Tom's request for an extension on the "Unique Joins" patch but has not
> yet reached a decision.
> 

Doesn't "Twophase transactions on slave" fall into the category of patches that fix
previously introduced behaviour? I'm not trying to argue with the RMT decision, but just
want to ensure that it was a deliberate decision, taking into account that the absence of
that patch in the release can cause problems with replication in some cases, as was
warned by Jesper[1] and Andres[2].

[1] http://www.postgresql.org/message-id/5707a8cc.6080...@redhat.com

[2] 
http://www.postgresql.org/message-id/80856693-5065-4392-8606-cf572a2ff...@anarazel.de

Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company






Fwd: [HACKERS] Speedup twophase transactions

2016-04-08 Thread Stas Kelvich

> On 08 Apr 2016, at 21:55, Jesper Pedersen <jesper.peder...@redhat.com> wrote:
> 
> On 04/08/2016 02:42 PM, Robert Haas wrote:
>> On Tue, Jan 26, 2016 at 7:43 AM, Stas Kelvich <s.kelv...@postgrespro.ru> 
>> wrote:
>>> Hi,
>>> 
>>> Thanks for reviews and commit!
>> 
>> I apologize for being clueless here, but was this patch committed?
>> It's still marked as "Needs Review" in the CommitFest application.
>> 
> 
> There are 2 parts to this - both in the same email thread.
> 
> Part 1 [0] dealt with 2-phase commits on the master node. Part 2 [1] deals 
> with replaying on slaves, which currently shows lag.
> 
> There is still an open item found by Michael, so part 2 isn't ready to be 
> moved to "Ready for Committer" yet.
> 
> [0] 
> http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=978b2f65aa1262eb4ecbf8b3785cb1b9cf4db78e
> [1] https://commitfest.postgresql.org/9/523/
> 
> Best regards,
> Jesper
> 

By the way, Jesper, can you please try to run the tests in the diff that I sent
today? It includes the test scenario that was causing problems for Michael, but works
fine on all systems that I have access to.


Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company






Re: [HACKERS] Speedup twophase transactions

2016-04-08 Thread Stas Kelvich

> On 08 Apr 2016, at 21:42, Robert Haas <robertmh...@gmail.com> wrote:
> 
> On Tue, Jan 26, 2016 at 7:43 AM, Stas Kelvich <s.kelv...@postgrespro.ru> 
> wrote:
>> Hi,
>> 
>> Thanks for reviews and commit!
> 
> I apologize for being clueless here, but was this patch committed?
> It's still marked as "Needs Review" in the CommitFest application.

There was a patch to skip two-phase file creation when there was no checkpoint
between PREPARE and COMMIT, and that patch was committed.
But that patch didn't touch anything in replay, so the replay speed of 2PC is
significantly slower than 2PC in normal mode. And that can cause constantly
increasing replication lag for async replication.
After that I wrote a new patch introducing the same behaviour in replay and used
the same mail thread. Now Michael has found a (heisen)bug in the second patch that
I can't reproduce.

Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company






Re: [HACKERS] Speedup twophase transactions

2016-04-08 Thread Stas Kelvich
> On 08 Apr 2016, at 16:36, Simon Riggs <si...@2ndquadrant.com> wrote:
> 
> On 7 April 2016 at 07:29, Michael Paquier <michael.paqu...@gmail.com> wrote:
> 
> With the test case attached in my case the COMMIT PREPARED record does
> not even get replayed.
> 
> I was surprised to see this in the test...
> 
>sleep 2; # wait for changes to arrive on slave 
> 
> I think the test framework needs a WaitForLSN function to allow us to know 
> for certain that something has been delivered.

Yes, the test framework already has that function. That was just a quick script to
reproduce the bug that Michael faced.
Once there is a deterministic way to reproduce that bug, I'll rework it and
move it to 00X_twophase.pl.
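
Just for reference, the idea is simply to replace the fixed sleep with something like this
(a sketch; the helper names are assumed from the TAP framework of that time, not
necessarily exactly what the final test will use):

my $lsn;
$node_master->psql('postgres', "select pg_current_xlog_location()", stdout => \$lsn);
$node_slave->poll_query_until('postgres',
	"select pg_last_xlog_replay_location() >= '$lsn'::pg_lsn");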

-- 
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company






Re: [HACKERS] Speedup twophase transactions

2016-04-08 Thread Stas Kelvich
> On 07 Apr 2016, at 09:29, Michael Paquier <michael.paqu...@gmail.com> wrote:
> 
> relOid=16385) + 358 at standby.c:627
>frame #11: 0x0001023dac6b
> postgres`standby_redo(record=0x7fde50841e38) + 251 at
> standby.c:809
> 
> (LOCALLOCK) $1 = {
>  tag = {
>lock = {
>  locktag_field1 = 16384
>  locktag_field2 = 16385
>  locktag_field3 = 0
>  locktag_field4 = 0
>  locktag_type = '\0'
>  locktag_lockmethodid = '\x01'
>}
>mode = 8
>  }
> 
> =# select relname, oid from pg_class where oid > 16000;
> relname |  oid

Tried on Linux and on OS X 10.11 and 10.4.

Still can't reproduce it, but I have played around with your backtrace.

In pg_xlogdump output on the slave I see the following sequence of records:

rmgr: Storage len (rec/tot): 16/42, tx:  0, lsn: 
0/03015AF0, prev 0/03015958, desc: CREATE base/12669/16387
rmgr: Heaplen (rec/tot):  3/  1518, tx:867, lsn: 
0/03015B20, prev 0/03015AF0, desc: INSERT off 8, blkref #0: rel 1663/12669/1247 
blk 8 FPW
<...>
rmgr: Btree   len (rec/tot):  2/72, tx:867, lsn: 
0/03019CD0, prev 0/03019C88, desc: INSERT_LEAF off 114, blkref #0: rel 
1663/12669/2674 blk 22
rmgr: Standby len (rec/tot): 16/42, tx:867, lsn: 
0/03019D18, prev 0/03019CD0, desc: LOCK xid 867 db 12669 rel 16387 
rmgr: Transaction len (rec/tot):784/   813, tx:867, lsn: 
0/03019D48, prev 0/03019D18, desc: PREPARE 
rmgr: Transaction len (rec/tot):380/   409, tx:  0, lsn: 
0/0301A090, prev 0/03019D48, desc: COMMIT_PREPARED 867: 2016-04-08 
14:38:33.347851 MSK;

It looks like you got stuck on the LOCK record for xid 867 even before replaying the
PREPARE record, so I don't have many ideas on how that could be caused by changing the
procedure of PREPARE replay.

Just to keep things sane, here is my current diff:



twophase_replay.v4.patch
Description: Binary data

-- 
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company





Re: [HACKERS] Speedup twophase transactions

2016-04-06 Thread Stas Kelvich
> On Apr 2, 2016, at 3:14 AM, Michael Paquier <michael.paqu...@gmail.com> wrote:
> 
> On Fri, Apr 1, 2016 at 10:53 PM, Stas Kelvich <s.kelv...@postgrespro.ru> 
> wrote:
>> I wrote:
>>> While testing the patch, I found a bug in the recovery conflict code
>>> path. You can do the following to reproduce it:
>>> 1) Start a master with a standby
>>> 2) prepare a transaction on master
>>> 3) Stop immediate on standby to force replay
>>> 4) commit prepare transaction on master
>>> 5) When starting the standby, it remains stuck here:
>> 
>> Hm, I wasn’t able to reproduce that. Do you mean following scenario or am I 
>> missing something?
>> 
>> (async replication)
>> 
>> $node_master->psql('postgres', "
>>begin;
>>insert into t values (1);
>>prepare transaction 'x';
>> ");
>> $node_slave->teardown_node;
>> $node_master->psql('postgres',"commit prepared 'x'");
>> $node_slave->start;
>> $node_slave->psql('postgres',"select count(*) from pg_prepared_xacts", 
>> stdout => \$psql_out);
>> is($psql_out, '0', "Commit prepared on master while slave is down.");
> 
> Actually, not exactly, the transaction prepared on master created a
> table. Sorry for the lack of precisions in my review.

Sorry for the delay.

Actually I can't reproduce that again; I tried with the following tx:

begin;
insert into t values(0);
create table t1(id int);
insert into t1 values(1);
create table t2(id int);
insert into t2 values(2);
savepoint s1;
drop table t1;
select * from t for update;
select * from t2 for share;
prepare transaction 'x';

use strict;
use warnings;
use PostgresNode;
use TestLib;
use Test::More tests => 2;

# Setup master node
my $node_master = get_new_node("master");
$node_master->init(allows_streaming => 1);
$node_master->append_conf('postgresql.conf', qq(
max_prepared_transactions = 10
));
$node_master->start;
$node_master->backup('master_backup');
$node_master->psql('postgres', "create table t(id int)");

# Setup slave node
my $node_slave = get_new_node('slave');
$node_slave->init_from_backup($node_master, 'master_backup', has_streaming => 1);
$node_slave->start;

my $psql_out = '';
my $psql_rc = '';

$node_master->psql('postgres', "
	begin;
	insert into t values(0);
	create table t1(id int);
	insert into t1 values(1);
	create table t2(id int);
	insert into t2 values(2);
	savepoint s1;
	drop table t1;
	select * from t for update;
	select * from t2 for share;
	prepare transaction 'x';
");
sleep 2; # wait for changes to arrive on slave
$node_slave->teardown_node;
$node_master->psql('postgres',"commit prepared 'x'");
$node_slave->start;
$node_slave->psql('postgres',"select count(*) from pg_prepared_xacts", stdout => \$psql_out);
is($psql_out, '0', "Commit prepared on master while slave is down.");
$node_slave->psql('postgres',"select sum(id) from t2", stdout => \$psql_out);
is($psql_out, '2', "Check that tx changes are visible.");


-- 
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company





Re: [HACKERS] Speedup twophase transactions

2016-04-01 Thread Stas Kelvich
veRecoveryConflictWithLock(locktag=LOCKTAG at
> 0x7fff581efde8) + 187 at standby.c:391
>frame #7: 0x000107e7a6a8
> postgres`ProcSleep(locallock=0x7f90c203dac8,
> lockMethodTable=0x0001082f6218) + 1128 at proc.c:1215
>frame #8: 0x000107e72886
> postgres`WaitOnLock(locallock=0x7f90c203dac8,
> owner=0x) + 358 at lock.c:1703
>frame #9: 0x000107e70f93
> postgres`LockAcquireExtended(locktag=0x7fff581f0238, lockmode=8,
> sessionLock='\x01', dontWait='\0', reportMemoryError='\0') + 2819 at
> lock.c:998
>frame #10: 0x000107e6a9a6
> postgres`StandbyAcquireAccessExclusiveLock(xid=863, dbOid=16384,
> relOid=16385) + 358 at standby.c:627
>frame #11: 0x000107e6af0b
> postgres`standby_redo(record=0x7f90c2041e38) + 251 at
> standby.c:809
>frame #12: 0x000107b0e227 postgres`StartupXLOG + 9351 at xlog.c:6871
> It seems that the replay on on-memory state of the PREPARE transaction
> is conflicting directly with replay.


-- 
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company






Re: [HACKERS] Speedup twophase transactions

2016-03-30 Thread Stas Kelvich
> On Mar 29, 2016, at 6:04 PM, David Steele wrote:
>
> It looks like you should post a new patch or respond to Michael's comments.
> Marked as "waiting on author".

Yep, here it is.

> On Mar 22, 2016, at 4:20 PM, Michael Paquier wrote:
>
> Looking at this patch….

Thanks.

> +++ b/src/test/recovery/t/006_twophase.pl
> @@ -0,0 +1,226 @@
> +# Checks for recovery_min_apply_delay
> +use strict;
>
> This description is wrong, this file has been copied from 005.

Yep, done.

> +my $node_master = get_new_node("Candie");
> +my $node_slave = get_new_node('Django');
>
> Please let's use node names that are more descriptive.

Hm, it's hard to create descriptive names because the test changes the master/slave
roles of those nodes several times during the test. It's possible to call them
"node1" and "node2", but I'm not sure that improves anything. Anyway, I'm not
insisting on those particular names and will agree with any reasonable suggestion.

> +# Switch to synchronous replication
> +$node_master->append_conf('postgresql.conf', qq(
> +synchronous_standby_names = '*'
> +));
> +$node_master->restart;
>
> Reloading would be fine.

Good catch, done.

> +   /* During replay that lock isn't really necessary, but let's take it anyway */
> +   LWLockAcquire(TwoPhaseStateLock, LW_EXCLUSIVE);
> +   for (i = 0; i < TwoPhaseState->numPrepXacts; i++)
> +   {
> +       gxact = TwoPhaseState->prepXacts[i];
> +       proc = &ProcGlobal->allProcs[gxact->pgprocno];
> +       pgxact = &ProcGlobal->allPgXact[gxact->pgprocno];
> +
> +       if (TransactionIdEquals(xid, pgxact->xid))
> +       {
> +           gxact->locking_backend = MyBackendId;
> +           MyLockedGxact = gxact;
> +           break;
> +       }
> +   }
> +   LWLockRelease(TwoPhaseStateLock);
>
> Not taking ProcArrayLock here?

All accesses to the 2PC dummies in the ProcArray are covered by TwoPhaseStateLock,
so I think that's safe. I've also deleted the comment above that block; it was
probably more confusing than descriptive.

> The comment at the top of XlogReadTwoPhaseData is incorrect.

Yep, fixed.

> RecoverPreparedFromXLOG and RecoverPreparedFromFiles have a lot of
> code in common, having this duplication is not good, and you could
> simplify your patch.

I reworked the patch to avoid duplicated code between
RecoverPreparedFromXLOG/RecoverPreparedFromFiles and also between
FinishPreparedTransaction/XlogRedoFinishPrepared.

twophase_replay.v3.diff
Description: Binary data

-- 
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company




Re: [HACKERS] 2PC support for pglogical

2016-03-24 Thread Stas Kelvich

> On 24 Mar 2016, at 17:03, Robert Haas <robertmh...@gmail.com> wrote:
> 
> On Wed, Mar 23, 2016 at 1:44 AM, Craig Ringer <cr...@2ndquadrant.com> wrote:
>> On 10 March 2016 at 22:50, Stas Kelvich <s.kelv...@postgrespro.ru> wrote:
>>> Hi.
>>> 
>>> Here is proof-of-concept version of two phase commit support for logical
>>> replication.
>> 
>> I missed this when you posted it, so sorry for the late response.
>> 
>> I've read through this but not tested it yet. I really appreciate you doing
>> it, it's been on my wishlist/backburner for ages.
>> 
>> On reading through the patch I noticed that there doesn't seem to be any
>> consideration of locking. The prepared xact can still hold strong locks on
>> catalogs. How's that handled? I think Robert's group locking stuff is what
>> we'll want here - for a prepared xact we can join the prepared xact as a
>> group lock member so we inherit its locks. Right now if you try DDL in a
>> prepared xact I suspect it'll break.
> 
> I have grave doubts about that approach.  It's not impossible it could
> be right, but I wouldn't bet the farm on it.
> 

I think all the locking is already handled properly by creating a dummy PGPROC entry
(a dummy backend), as is done in the regular postgres 2PC implementation.

I just checked DDL with the following scenario:

* prepare a tx that inserts a row into a table on the master
* execute DROP TABLE on the pglogical slave
* commit prepared on the master

Now it behaves as expected — the slave blocks the DROP TABLE until the commit prepared
on the master.
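
In SQL terms the check looks roughly like this (an illustration only; the table and GID
names are made up):

-- master
BEGIN;
INSERT INTO t VALUES (1);
PREPARE TRANSACTION 'gid1';

-- pglogical subscriber, separate session: blocks on the lock held by the
-- replayed prepared transaction
DROP TABLE t;

-- master: once this is decoded and applied on the subscriber, the DROP above proceeds
COMMIT PREPARED 'gid1';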

---
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company





Re: [HACKERS] [PATCH] we have added support for box type in SP-GiST index

2016-03-21 Thread Stas Kelvich

> On 22 Mar 2016, at 01:47, Jim Nasby <jim.na...@bluetreble.com> wrote:
> 
> On 3/21/16 11:57 AM, Teodor Sigaev wrote:
>> A and B are points of intersection of lines. So, box PBCAis a bounding
>> box for points contained in 3-rd (see labeling above). For example X
>> labeled point is not a descendace of child node with centroid  C because
>> it must be in branch of 1-st quad of parent node. So, each node (except
>> root) will have a limitation in its quadrant. To transfer that
>> limitation the traversalValue is used.
> 
> Isn't this basically the same thing that the cube contrib module does? (Which 
> has the added benefit of kNN-capable operators).

Cube and postgres' own geometric types are indexed by building an R-tree. While
this is one of the most popular solutions, it degrades almost to a linear search
when the bounding boxes of the index nodes overlap a lot. This problem arises when
one tries to index the streets of a city laid out on a grid system — a lot of the
streets' bounding boxes will simply coincide with the bounding box of the whole city.

But this patch uses a totally different strategy for building the index and does not
suffer from the above problem.
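
For example, the comparison being discussed is between something like the following two
indexes (assuming the patch provides a default SP-GiST opclass for box):

CREATE INDEX streets_gist_idx   ON streets USING gist   (bbox);  -- R-tree behaviour
CREATE INDEX streets_spgist_idx ON streets USING spgist (bbox);  -- this patch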

> 
> If that's true then ISTM it'd be better to work on pulling cube's features 
> into box?
> 
> If it's not true, I'm still wondering if there's enough commonality here that 
> we should pull cube into core…

There is also the intarray module with closely related functionality, but it too
addresses a different data pattern.

The optimal indexing and kNN strategy are very sensitive to the data itself, and if
we want some general multidimensional search routines, then I think none of the
mentioned extensions is general enough. Cube is barely applicable when the number of
dimensions is higher than 10-20, intarray performs badly on data with big sets of
possible coordinates, and this patch is likewise intended to help with a specific,
niche problem.

While people tend to use machine learning and regression models more and more, it is
interesting to have some general n-dimensional indexing with kNN, but I think that is
a different problem and should be solved in a different way.

Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company






Re: [HACKERS] [PATCH] we have added support for box type in SP-GiST index

2016-03-21 Thread Stas Kelvich

> On 21 Mar 2016, at 22:38, Kevin Grittner <kgri...@gmail.com> wrote:
> 
> On Mon, Mar 21, 2016 at 11:57 AM, Teodor Sigaev <teo...@sigaev.ru> wrote:
> 
>>> I couldn't get the term 4D point.  Maybe it means that we are
>>> using box datatype as the prefix, but we are not treating them
>>> as boxes.
>> 
>> exactly, we treat boxes as 4-dimentional points.
> 
> I'm not entirely sure I understand the terminology either, since I
> tend to think of a *point* as having *zero* dimensions.  Would it
> perhaps be more accurate to say we are treating a 2-dimensional box
> as a point in 4-dimensional space?

Or just say 4-D vector instead of 4-D point. It looks like that would be rigorous
enough.

Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company






Re: [HACKERS] fd.c: flush data problems on osx

2016-03-21 Thread Stas Kelvich
On 21 Mar 2016, at 14:53, Andres Freund <and...@anarazel.de> wrote:
> Hm. I think we should rather just skip calling pg_flush_data in the
> directory case, that very likely isn't beneficial on any OS.

Seems reasonable, changed.

> I think we still need to fix the mmap() implementation to support the
> offset = 0, nbytes = 0 case (via fseek(SEEK_END).

It is already in this diff; I added it a few messages ago.



flushdata.v4.patch
Description: Binary data

---
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company




Re: [HACKERS] fd.c: flush data problems on osx

2016-03-21 Thread Stas Kelvich

> On 18 Mar 2016, at 14:45, Stas Kelvich <s.kelv...@postgrespro.ru> wrote:
>> 
>>> One possible solution for that is just fallback to pg_fdatasync in case 
>>> when offset = nbytes = 0.
>> 
>> Hm, that's a bit heavyweight. I'd rather do an lseek(SEEK_END) to get
>> the file size. Could you test that?
>> 
> 
> It looks like OSX mmap raises EINVAL when length isn’t aligned to pagesize 
> while manual says it can be of arbitrary length, so i aligned it.
> Also there were call to mmap with PROT_READ | PROT_WRITE, but when called 
> from pre_sync_fname file descriptor is just O_RDONLY, so i changed mmap mode 
> to PROT_READ — seems that PROT_WRITE wasn’t needed anyway.
> 
> And all of that reduces number of warnings in order of magnitude but there 
> are still some and I don’t yet understand why are they happening.

I've spent some more time on this issue and found that the remaining warnings were
caused by mmap-ing directories — that raises EINVAL on OS X (probably not only
OS X, but I didn't try).
So I've skipped mmap for directories and now restore happens without warnings. Also
I've fixed a wrong error check that was in the previous version of the patch.
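
The shape of the change is roughly this (a sketch, not the exact hunk; pre_sync_fname()
already receives an isdir flag from walkdir()):

/* pre_sync_fname(): directories cannot be mmap-ed portably and there is
 * nothing useful to flush for them anyway, so just skip them */
if (isdir)
	return;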




flushdata.v3.patch
Description: Binary data


---
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company




[HACKERS] fd.c: flush data problems on osx

2016-03-20 Thread Stas Kelvich
Hi

The current implementation of pg_flush_data, when called with a zero offset and zero
nbytes, is supposed to flush the whole file. On OS X it uses mmap with these arguments,
but according to the man page:

"[EINVAL]   The len argument was negative or zero. Historically, the 
system call would not return an
error if the argument was zero.  See other potential 
additional restrictions in the COMPAT-
IBILITY section below."

so it generates a lot of warnings:

"WARNING:  could not mmap while flushing dirty data: Invalid argument"

The call to pg_flush_data with zero offset and nbytes happens when a replica starts
from a base backup with fsync=on. A TAP test to reproduce it is attached.

One possible solution is to just fall back to pg_fdatasync in the case where
offset = nbytes = 0.

Also, there is no default branch among the ifdefs inside this function; is there any
check that guarantees pg_flush_data will not end up with an empty body on some
platform?

---
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
# Minimal test testing streaming replication
use strict;
use warnings;
use PostgresNode;
use TestLib;
use Test::More tests => 1;

# Initialize master node
my $node_master = get_new_node('master');
$node_master->init(allows_streaming => 1);
$node_master->append_conf(
	'postgresql.conf', qq(
fsync = 'on'
));
$node_master->start;
my $backup_name = 'my_backup';

# Take backup
$node_master->backup($backup_name);

# Create streaming standby linking to master
my $node_standby = get_new_node('standby');
$node_standby->init_from_backup($node_master, $backup_name);
$node_standby->start;


flushdata.patch
Description: Binary data



Re: [HACKERS] fd.c: flush data problems on osx

2016-03-19 Thread Stas Kelvich

> On 17 Mar 2016, at 20:23, Andres Freund <and...@anarazel.de> wrote:
> 
>> 
>> Also there are no default ifdef inside this function, is there any
>> check that will guarantee that pg_flush_data will not end up with
>> empty body on some platform?
> 
> There doesn't need to be - it's purely "advisory", i.e. just an
> optimization.


Ah, okay, then I misinterpreted the purpose of that function and it shouldn't be
forced to sync.

> 
>> One possible solution for that is just fallback to pg_fdatasync in case when 
>> offset = nbytes = 0.
> 
> Hm, that's a bit heavyweight. I'd rather do an lseek(SEEK_END) to get
> the file size. Could you test that?
> 

It looks like OS X mmap raises EINVAL when the length isn't aligned to the page size,
while the manual says it can be of arbitrary length, so I aligned it.
Also, there was a call to mmap with PROT_READ | PROT_WRITE, but when called from
pre_sync_fname the file descriptor is just O_RDONLY, so I changed the mmap mode to
PROT_READ — it seems PROT_WRITE wasn't needed anyway.

And all of that reduces the number of warnings by an order of magnitude, but there are
still some and I don't yet understand why they are happening.
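
For the whole-file case the mmap branch then ends up looking roughly like this (a sketch
with assumed variable names, not the final patch):

if (nbytes == 0)
{
	off_t	end = lseek(fd, 0, SEEK_END);

	if (end <= offset)
		return;				/* purely advisory, just give up */
	nbytes = end - offset;
}
/* OS X insists on a non-zero, page-aligned length */
nbytes = TYPEALIGN(sysconf(_SC_PAGESIZE), nbytes);
p = mmap(NULL, nbytes, PROT_READ, MAP_SHARED, fd, offset);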

> 
> Greetings,
> 
> Andres Freund
> 

---
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company



flushdata.v2.patch
Description: Binary data






Re: [HACKERS] eXtensible Transaction Manager API (v2)

2016-03-19 Thread Stas Kelvich
On 12 Mar 2016, at 13:19, Michael Paquier <michael.paqu...@gmail.com> wrote:
> 
> On Fri, Mar 11, 2016 at 9:35 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>> IMO this is not committable as-is, and I don't think that it's something
>> that will become committable during this 'fest.  I think we'd be well
>> advised to boot it to the 2016-09 CF and focus our efforts on other stuff
>> that has a better chance of getting finished this month.
> 
> Yeah, I would believe that a good first step would be to discuss
> deeply about that directly at PGCon for folks that will be there and
> interested in the subject. It seems like a good timing to brainstorm
> things F2F at the developer unconference for example, a couple of
> months before the 1st CF of 9.7. We may perhaps (or not) get to
> cleaner picture of what kind of things are wanted in this area.

To give an overview of XTM coupled with postgres_fdw from a user's perspective, I've
packaged the patched postgres with Docker and provided a test case in which it is easy
to spot a violation of the READ COMMITTED isolation level without XTM.

The test fills a database with users across two shards connected by postgres_fdw and
inheriting from the same table, then starts concurrently transferring money between
users on different shards:

begin;
update t set v = v - 1 where u=%d; -- this is user from t_fdw1, first shard
update t set v = v + 1 where u=%d; -- this is user from t_fdw2, second shard
commit;

The test also simultaneously runs a reader thread that counts all the money in the system:

select sum(v) from t;

So in a transactional system we expect that sum to always be constant (zero in our
case, as we initialize every user with a zero balance).
But we can see that without tsdtm the total amount of money fluctuates around zero.
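
In other words, the reader's invariant is simply:

select sum(v) from t;  -- must always be 0 if the two shards are seen consistently

and without a shared snapshot the reader intermittently sees a non-zero value such as
-1 or +1, i.e. it observes only one half of an in-flight transfer.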

https://github.com/kelvich/postgres_xtm_docker

---
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company





Re: [HACKERS] async replication code

2016-03-19 Thread Stas Kelvich

> On 16 Mar 2016, at 15:52, otheus uibk <otheus.u...@gmail.com> wrote:
> 
> Greetings,

Hi

> 
> I am new here.  Apologetic question: how can i search the archives of this 
> mailing list? Is there some set of magic words to put in Google? Must I wade 
> through 20 pages of hits? Should I download all the archives and grep? :) 
> 

http://www.postgresql.org/list/

https://wiki.postgresql.org/wiki/Mailing_Lists

> Main question:  I want to understand very precisely the exact algirithm used 
> in PG to do asynchronous streaming replication. Eg, I want to know how the 
> record is written and flushed to the socket and how that happens in time 
> w.r.t WAL-to-disk and response to client. Will someone please give me a head 
> start as to where to begin my code-spelunking? 
> 

The WAL is synced to disk first; everything else happens after that.

http://www.postgresql.org/docs/9.5/static/warm-standby.html#STREAMING-REPLICATION

---
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company





Re: [HACKERS] Speedup twophase transactions

2016-03-18 Thread Stas Kelvich

> On 11 Mar 2016, at 19:41, Jesper Pedersen <jesper.peder...@redhat.com> wrote:
> 

Thanks for review, Jesper.

> Some comments:
> 
> * The patch needs a rebase against the latest TwoPhaseFileHeader change

Done.

> * Rework the check.sh script into a TAP test case (src/test/recovery), as 
> suggested by Alvaro and Michael down thread

Done. Originally I thought about reducing the number of tests (11 right now), but now,
after some debugging, I'm more convinced that it is better to include them all, as they
really exercise different code paths.

> * Add documentation for RecoverPreparedFromXLOG

Done.

---
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company



twophase_replay.v2.diff
Description: Binary data



Re: [HACKERS] Tsvector editing functions

2016-03-11 Thread Stas Kelvich
> 
> On 11 Mar 2016, at 16:13, Stas Kelvich <s.kelv...@postgrespro.ru> wrote:
> 
> 
>> On 10 Mar 2016, at 20:29, Teodor Sigaev <teo...@sigaev.ru> wrote:
>> 
>> I would like to suggest rename both functions to array_to_tsvector and 
>> tsvector_to_array to have consistent name. Later we could add 
>> to_tsvector([regconfig, ], text[]) with morphological processing.
>> 
>> Thoughts?
>> 

Hi, thanks for the commit.

I saw errors on Windows; here is the fix:



tsvector.fix.patch
Description: Binary data

---
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company




Re: [HACKERS] Tsvector editing functions

2016-03-11 Thread Stas Kelvich

> On 10 Mar 2016, at 20:29, Teodor Sigaev <teo...@sigaev.ru> wrote:
> 
> I would like to suggest rename both functions to array_to_tsvector and 
> tsvector_to_array to have consistent name. Later we could add 
> to_tsvector([regconfig, ], text[]) with morphological processing.
> 
> Thoughts?
> 

Seems reasonable, done.



tsvector_ops-v6.diff
Description: Binary data


---
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company





[HACKERS] 2PC support for pglogical

2016-03-10 Thread Stas Kelvich
Hi.

Here is a proof-of-concept version of two-phase commit support for logical
replication. There are changes in core postgres and in the pglogical_output and
pglogical extensions. I've used the version from the 2ndQuadrant repo, pglogical
branch, and rebased it onto current head. Some notes about this patch:

* LWLockAssign() was deleted, so I changed that to the new LWLock tranche API.

* It seems that the only reliable way to get the GID during replay of commit/rollback
prepared is to force postgres to write the GID into the corresponding records; otherwise
we can lose the correspondence between xid and GID if we are replaying data from the
walsender after the transaction was committed some time ago. So I've changed postgres to
write GIDs not only on prepare, but also on commit/rollback prepared. That should
probably be done only at wal_level = logical, but for now I just want to hear some other
opinions on that.

* The abort prepared xlog record also lacks database information. Normally logical
decoding just cleans the reorder buffer when it encounters an abort, but in the 2PC case
we should send it to the callbacks anyway. So I've added that info to abort records.

* PREPARE emits an xlog record with a TwoPhaseFileHeader in it, and that structure is
essentially the same as xl_xact_parsed_commit but with some fields renamed, probably
just for historical reasons. It is possible to change PREPARE to write ordinary commit
records with a flag set and then use the same infrastructure to parse them, so that
DecodePrepare/DecodeCommit could be handled by the same function.





pglogical_twophase.diff
Description: Binary data


---
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company




Re: [HACKERS] transam README small fix

2016-03-04 Thread Stas Kelvich
Thanks.

Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


> On 04 Mar 2016, at 22:14, Robert Haas <robertmh...@gmail.com> wrote:
> 
> On Tue, Mar 1, 2016 at 4:31 AM, Stas Kelvich <s.kelv...@postgrespro.ru> wrote:
>> The transaction function call sequence description in transam/README is slightly 
>> outdated: SELECT is now handled by PortalRunSelect instead of ProcessQuery. It 
>> is also hard to follow what the indentation there means; sometimes it means 
>> “function called by function”, sometimes it doesn't. So I’ve also changed it 
>> to reflect the actual call nesting.
> 
> After some study, this looks good to me, so committed.
> 
> -- 
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] transam README small fix

2016-03-01 Thread Stas Kelvich
Hi,

The transaction function call sequence description in transam/README is slightly 
outdated: SELECT is now handled by PortalRunSelect instead of ProcessQuery. It is 
also hard to follow what the indentation there means; sometimes it means 
“function called by function”, sometimes it doesn't. So I’ve also changed it to 
reflect the actual call nesting.




transam.readme.patch
Description: Binary data

---
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Tsvector editing functions

2016-02-17 Thread Stas Kelvich
> 
> On 02 Feb 2016, at 20:10, Teodor Sigaev <teo...@sigaev.ru> wrote:
> 
> Some notices:
> 
> 1 tsin in the documentation doesn't look like a good name. Changed it to vector, 
> as in other places.
> 
> 2 I did some editing regarding freeing memory, forgotten names, etc.

Thanks.

> 
> 3 It seems to me that tsvector_unnest() could be seriously optimized for
> large tsvectors: with the current coding it detoasts/decompresses the tsvector 
> value on each call. It would be much better to do it once in the
> multi_call_memory_ctx context at first-call init.

Done, moved the detoasting to the first SRF call.

> 4 It seems debatable to return an empty array for positions/weights if they 
> are absent:
> =# select * from unnest('a:1 b'::tsvector);
>  lexeme | positions | weights
> --------+-----------+---------
>  a      | {1}       | {D}
>  b      | {}        | {}
> I think it's better to return NULL in this case
> 

Okay, done.
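
So after this change the example above comes out like this (sketching the expected
output, assuming nothing else in the final patch changes it):

=# select * from unnest('a:1 b'::tsvector);
 lexeme | positions | weights
--------+-----------+---------
 a      | {1}       | {D}
 b      |           |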

> 5 array_to_tsvector/tsvector_setweight_by_filter/tsvector_delete_arr/tsvector_filter
> don't check or pay attention to NULL elements in input arrays
> 

Thanks! Fixed and added tests.
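
With that, passing an array that contains NULL elements should now be rejected
cleanly instead of silently misbehaving, roughly along these lines (the exact error
wording is my guess, not copied from the patch):

=# select array_to_tsvector(ARRAY['a', NULL, 'b']);
ERROR:  lexeme array may not contain nulls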

> -- 
> Teodor Sigaev   E-mail: teo...@sigaev.ru
>   WWW: http://www.sigaev.ru/
> 
> 
> -- 
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers



tsvector_ops-v4.diff
Description: Binary data


---
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] tsearch_extras extension

2016-02-17 Thread Stas Kelvich

> On 17 Feb 2016, at 08:14, Oleg Bartunov <obartu...@gmail.com> wrote:
> 
> 
> 
> On Wed, Feb 17, 2016 at 6:57 AM, Tim Abbott <tabb...@mit.edu> wrote:
> Just following up here since I haven't gotten a reply -- I'd love to work 
> with someone from the Postgres community on a plan to make the tsearch_extras 
> functionality available as part of mainline postgres.
> 
> 
> -Tim Abbott
> 
> On Wed, Feb 3, 2016 at 9:41 PM, Tim Abbott <tabb...@mit.edu> wrote:
> Hello,
> 
> I'm a maintainer of the Zulip open source group chat application.  Zulip 
> depends on a small (~200 lines) postgres extension called tsearch_extras 
> (https://github.com/zbenjamin/tsearch_extras) that returns the actual 
> (offset, length) pairs of all the matches for a postgres full text search 
> query.  This extension allows Zulip to do its own highlighting of the full 
> text search matches, using a more complicated method than what Postgres 
> supports natively.  
> 
> I think tsearch_extras is probably of value to others using postgres 
> full-text search (and I'd certainly love to get out of the business of 
> maintaining an out-of-tree postgres extension), so I'm wondering if this 
> feature (or a variant of it) would be of interest to upstream?  
> 
> Thanks!
> 
> -Tim Abbott
> 
> (See 
> http://www.postgresql.org/message-id/flat/52c7186d.8010...@strangersgate.com#52c7186d.8010...@strangersgate.com
>  for the discussion on postgres mailing lists that caused us to write this 
> module in the first place.)
> 
> Tim,
> 
> take a look at this patch (https://commitfest.postgresql.org/7/385/) and 
> contact the author.  It contains everything you need for your purposes.
> 
> btw, Stas, check on status "Returned with feedback" !
> 
> 
> Regards,
> Oleg 
> 

Hi,

Yes, this extension has some functionality in common with the patch. In particular, 
your tsvector_lexemes and my to_array do the same thing. I wasn’t aware of your patch 
when I started working on this, so I think we can ask the committer to mention you in 
the commit message (if it gets committed).

And how do you use ts_match_locs_array? To highlight search results? There is a 
function called ts_headline that can mark matches with custom start/stop strings.
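
For example (a minimal sketch; StartSel/StopSel are regular ts_headline options,
and the document and query here are made up):

=# SELECT ts_headline('english',
                      'PostgreSQL supports full text search',
                      to_tsquery('english', 'search'),
                      'StartSel=<<, StopSel=>>');
                ts_headline
-------------------------------------------
 PostgreSQL supports full text <<search>>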

---
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] In-core regression tests for replication, cascading, archiving, PITR, etc.

2016-02-04 Thread Stas Kelvich
Hi.

I’ve looked over the proposed patch and migrated the shell test scripts that I’ve 
used for testing two-phase commits on master/slave to this test framework. 
Everything looks mature, and I didn’t encounter any problems writing new tests 
using this infrastructure.

From my point of view I don’t see any problem with committing these patches in 
their current state.

Also, some things that came to mind about the test suite:

0) There are several routines that do the actual checking, like 
is/command_ok/command_fails. I think it would be very handy to have wrappers 
psql_ok/psql_fails that call psql through command_ok/command_fails.

1) It would be better to raise a more meaningful error when IPC::Run is absent.

2) --enable-tap-tests deserves a mention in test/recovery/README, and there should 
be a more obvious error message when one tries to run make check in test/recovery 
without --enable-tap-tests.

3) Would it be hard to provide the ability to run TAP tests in extensions?

4) It would be handy if make check wrote the path to the log files in case of a 
failed test.

5) psql() accepts a database name as its first argument, but everywhere in the tests 
it is ‘postgres’. Wouldn’t it be simpler to store the dbname in the connstr and have 
a separate function to change the database?

6) Clean logs on prove restart? Clean up tmp installations?

7) make check sets PGPORT and PG_REGRESS for prove. Is that necessary?

> On 22 Jan 2016, at 09:17, Michael Paquier <michael.paqu...@gmail.com> wrote:
> 
> On Mon, Dec 21, 2015 at 4:45 PM, Michael Paquier
> <michael.paqu...@gmail.com> wrote:
>> As this thread is stalling a bit, please find attached a series of
>> patch gathering all the pending issues for this thread:
>> - 0001, fix config_default.pl for MSVC builds to take into account TAP tests
>> - 0002, append a node name in get_new_node (per Noah's request)
>> - 0003, the actual recovery test suite
>> Hopefully this facilitates future reviews.
> 
> Patch 2 has been pushed as c8642d9 (thanks Alvaro). The remaining two
> patches still apply and pass cleanly.
> -- 
> Michael
> 
> 
> -- 
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers


Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Tsvector editing functions

2016-01-27 Thread Stas Kelvich
Hi

> On 22 Jan 2016, at 19:03, Tomas Vondra <tomas.von...@2ndquadrant.com> wrote:
> OK, although I do recommend using more sensible variable names, i.e. why how 
> to use 'lexemes' instead of 'lexarr' for example? Similarly for the other 
> functions.


Changed. With the old names I tried to follow the conventions in the surrounding 
code, but it’s probably a good idea to switch to more meaningful names in new code.

>> 
>> 
>> delete(tsin tsvector, tsv_filter tsvector) — Delete lexemes and/or positions 
>> of tsv_filter from tsin. When lexeme in tsv_filter has no positions function 
>> will delete any occurrence of same lexeme in tsin. When tsv_filter lexeme 
>> have positions function will delete them from positions of matching lexeme 
>> in tsin. If after such removal resulting positions set is empty then 
>> function will delete that lexeme from resulting tsvector.
>> 
> 
> I can't really imagine situation in which I'd need this, but if you do have a 
> use case for it ... although in the initial paragraph you say "... but if 
> somebody wants to delete for example ..." which suggests you may not have 
> such use case.
> 
> Based on bad experience with extending API based on vague ideas, I recommend 
> only really adding functions with existing need. It's easy to add a function 
> later, much more difficult to remove it or change the signature.

I tried to create a more or less self-contained API, e.g. to have the ability to 
undo the effect of concatenation. But I’ve also asked people around about extending 
the API, and everybody was convinced that it is better to stick to a smaller API. 
So let’s drop it. At least these functions exist in the mailing list archives in 
case somebody googles for this kind of behaviour.

>> 
>> Also if we want some level of completeness of API and taking into account 
>> that concat() function shift positions on second argument I thought that it 
>> can be useful to also add function that can shift all positions of specific 
>> value. This helps to undo concatenation: delete one of concatenating 
>> tsvectors and then shift positions in resulting tsvector. So I also wrote 
>> one another small function:
>> 
>> shift(tsin tsvector,offset int16) — Shift all positions in tsin by given 
>> offset
> 
> That seems rather too low-level. Shouldn't it be really built into delete() 
> directly somehow?


I think that would be an ambiguous task for delete(). But if we are dropping support 
for delete(tsvector, tsvector), I don’t see much point in keeping that function.

>>> 
>>> 7) Some of the functions use intexterm that does not match the function
>>>   name. I see two such cases - to_tsvector and setweight. Is there a
>>>   reason for that?
>>> 
>> 
>> Because the sgml compiler wants a unique indexterm. Both functions that you
>> mentioned use argument overloading and have non-unique names.
> 
> As Michael pointed out, that should probably be handled by using  
> and  tags.


Done.


> On 19 Jan 2016, at 00:21, Alvaro Herrera <alvhe...@2ndquadrant.com> wrote:
> 
> 
> It's a bit funny that you reintroduce the "unrecognized weight: %d"
> (instead of %c) in tsvector_setweight_by_filter.
> 


Ah, I was thinking about moving it into a separate diff and messed that up. Fixed, 
and I’m attaching a diff with the same fix for the old tsvector_setweight.
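
For the record, the user-visible difference is just that the offending weight is
reported as a character rather than as its numeric byte value (88 for 'X'), roughly
like this (output sketched from memory, not taken from the patch):

=# select setweight('fat:2 cat:3'::tsvector, 'X');
ERROR:  unrecognized weight: X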




tsvector_ops-v2.1.diff
Description: Binary data


tsvector_ops-v2.2.diff
Description: Binary data



---
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Mac OS: invalid byte sequence for encoding "UTF8"

2016-01-27 Thread Stas Kelvich
Hi.

I tried that and can confirm the strange behaviour. It seems the problem is with the 
small Cyrillic letter ‘х’. (the simplest obscene-language filter? =)

It can be reproduced with a simpler test:

Stas



test.c
Description: Binary data

 
> On 27 Jan 2016, at 13:59, Artur Zakirov  wrote:
> 
> On 27.01.2016 13:46, Shulgin, Oleksandr wrote:
>> 
>> Not sure why the file uses "SET KOI8-R" directive then?
>> 
> 
> This directive is used only by the Hunspell program. PostgreSQL ignores this 
> directive and assumes that the input affix and dictionary files are in the UTF-8 
> encoding.
> 
>> 
>> 
>> What error message do you get with this test program?  (I don't get any,
>> but I'm not on Mac OS.)
>> --
>> Alex
>> 
>> 
> 
> With this program you will get the wrong output. An error message is not raised. 
> You can execute the following commands:
> 
> > cc test.c -o test
> > ./test
> 
> You will get the output:
> 
> SFX/Y/?/аться/шутся
> 
> Although the output should be:
> 
> SFX/Y/хаться/шутся/хаться
> 
> -- 
> Artur Zakirov
> Postgres Professional: http://www.postgrespro.com
> Russian Postgres Company
> 
> 
> -- 
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

