Re: Extension Enhancement: Buffer Invalidation in pg_buffercache

2024-04-30 Thread Maksim Milyutin


On 29.04.2024 23:59, Thomas Munro wrote:

On Tue, Apr 30, 2024 at 7:17 AM Thomas Munro  wrote:

On Tue, Apr 30, 2024 at 6:47 AM Maksim Milyutin  wrote:

Shouldn't we call the StrategyFreeBuffer() function at the end to add the
target buffer to the freelist and not miss it after invalidation?

Please take a look at this issue: the current implementation of
EvictUnpinnedBuffer() IMO is erroneous - evicted buffers are lost permanently
and will not be reused again.

I don't think that's true: it is not lost permanently, it'll be found
by the regular clock hand.  Perhaps it should be put on the freelist
so it can be found again quickly, but I'm not sure that's a bug, is
it?



Yeah, you are right. Thanks for the clarification.

The CLOCK algorithm will reuse it eventually, but placing the evicted
(cleared) buffer on the freelist could greatly reduce buffer allocation time
when all other buffers are in use.
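For illustration, a minimal sketch of the suggestion against the quoted
committed version (pinning and validity checks are elided; StrategyFreeBuffer
is the in-core routine that pushes a buffer onto the freelist):

bool
EvictUnpinnedBuffer(Buffer buf)
{
	BufferDesc *desc = GetBufferDescriptor(buf - 1);
	bool		result;

	/* ... pin the buffer and check validity, as in the committed version ... */

	/* This will return false if it becomes dirty or someone else pins it. */
	result = InvalidateVictimBuffer(desc);

	UnpinBuffer(desc);

	/* the suggestion: make the cleared buffer immediately allocatable again */
	if (result)
		StrategyFreeBuffer(desc);

	return result;
}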


--
Best regards,
Maksim Milyutin


Re: Extension Enhancement: Buffer Invalidation in pg_buffercache

2024-04-29 Thread Maksim Milyutin

On 14.04.2024 21:16, Maksim Milyutin wrote:


On 07.04.2024 02:07, Thomas Munro wrote:


So this is the version I plan to commit.

+bool
+EvictUnpinnedBuffer(Buffer buf)
+{
...
+/* This will return false if it becomes dirty or someone else pins it. */
+result = InvalidateVictimBuffer(desc);
+
+UnpinBuffer(desc);
+
+return result;
+}



Hi, Thomas!

Shouldn't we call the StrategyFreeBuffer() function at the end to add the
target buffer to the freelist and not miss it after invalidation?




Hello everyone!

Please take a look at this issue: the current implementation of
EvictUnpinnedBuffer() IMO is erroneous - evicted buffers are lost
permanently and will not be reused again.


--
Best regards,
Maksim Milyutin


Re: Extension Enhancement: Buffer Invalidation in pg_buffercache

2024-04-14 Thread Maksim Milyutin

On 07.04.2024 02:07, Thomas Munro wrote:


So this is the version I plan to commit.

+bool
+EvictUnpinnedBuffer(Buffer buf)
+{
...
+/* This will return false if it becomes dirty or someone else pins it. */
+result = InvalidateVictimBuffer(desc);
+
+UnpinBuffer(desc);
+
+return result;
+}



Hi, Thomas!

Shouldn't we call the StrategyFreeBuffer() function at the end to add the
target buffer to the freelist and not miss it after invalidation?


--
Best regards,
Maksim Milyutin


Re: Add client connection check during the execution of the query

2021-10-11 Thread Maksim Milyutin

On 12.06.2021 07:24, Thomas Munro wrote:


On Fri, Apr 30, 2021 at 2:23 PM Thomas Munro  wrote:

Here's something I wanted to park here to look into for the next
cycle:  it turns out that kqueue's EV_EOF flag also has the right
semantics for this.  That leads to the idea of exposing the event via
the WaitEventSet API, and would bring the
client_connection_check_interval feature to 6/10 of our OSes, up from
2/10.  Maybe Windows' FD_CLOSE event could get us up to 7/10, not
sure.

Rebased.  Added documentation tweak and a check to reject the GUC on
unsupported OSes.



Good work. I have tested your patch on Linux and FreeBSD on three basic
cases: killing the client, breaking the cable (via manipulations with the
firewall), and silently closing the client connection asynchronously before a
previously started query completes. All work fine.


Some comments from me:

 bool
 pq_check_connection(void)
 {
-#if defined(POLLRDHUP)
+    WaitEvent events[3];


3 looks like a magic constant. We might want to define a constant for the
number of event groups in FeBeWaitSet.
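For example (FeBeWaitSetNEvents is a hypothetical name invented for this
sketch; it just counts the socket, latch, and postmaster-death positions):

#define FeBeWaitSetNEvents	3	/* socket + latch + postmaster death */

	WaitEvent	events[FeBeWaitSetNEvents];
	int			rc;

	rc = WaitEventSetWait(FeBeWaitSet, 0, events, lengthof(events), 0);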



+    ModifyWaitEvent(FeBeWaitSet, FeBeWaitSetLatchPos, WL_LATCH_SET, NULL);
+    ModifyWaitEvent(FeBeWaitSet, FeBeWaitSetSocketPos, WL_SOCKET_CLOSED, NULL);

+    rc = WaitEventSetWait(FeBeWaitSet, 0, events, lengthof(events), 0);
+    ModifyWaitEvent(FeBeWaitSet, FeBeWaitSetLatchPos, WL_LATCH_SET, MyLatch);


AFAICS, the side effect on
(FeBeWaitSet->events[FeBeWaitSetSocketPos]).events from setting the
WL_SOCKET_CLOSED value inside pq_check_connection() doesn't have a negative
impact later, does it? That is, all WaitEventSetWait() callers have to set up
the socket events on their own from scratch.
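For instance, a later wait on the client socket would be expected to re-arm
the event itself; a sketch of such a call site (assumed, not quoted from the
patch):

	WaitEvent	event;

	/* overwrite, rather than inherit, the earlier WL_SOCKET_CLOSED setting */
	ModifyWaitEvent(FeBeWaitSet, FeBeWaitSetSocketPos, WL_SOCKET_READABLE, NULL);
	(void) WaitEventSetWait(FeBeWaitSet, -1, &event, 1, WAIT_EVENT_CLIENT_READ);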



--
Regards,
Maksim Milyutin





Re: Synchronous commit behavior during network outage

2021-04-20 Thread Maksim Milyutin



On 20.04.2021 19:38, Tomas Vondra wrote:


On 4/20/21 6:23 PM, Aleksander Alekseev wrote:

Hi Ondřej,

Thanks for the report. It seems to be a clear violation of what is
promised in the docs. Although it's unlikely that someone implemented
an application which deals with important data and "pressed Ctrl+C" as
it's done in psql. So this might be not such a critical issue after
all. BTW what version of PostgreSQL are you using?


Which part of the docs does this contradict?



I think Aleksander refers to the following phrase in the docs:

"The guarantee we offer is that the application will not receive 
explicit acknowledgment of the successful commit of a transaction until 
the WAL data is known to be safely received by all the synchronous 
standbys." [1]


And IMO the confusion here concerns the notion of a `successful commit`.
Does the warning attached to the received commit message make it not
*successful*? I think we have to explicitly mention the cancellation and
session-termination cases in the docs to avoid ambiguity in understanding
the phrase above.



1. 
https://www.postgresql.org/docs/current/warm-standby.html#SYNCHRONOUS-REPLICATION-HA


--
Regards,
Maksim Milyutin





Re: Synchronous commit behavior during network outage

2021-04-20 Thread Maksim Milyutin

Hi!


This is a known issue with synchronous replication [1]. You might inject a
dummy modification into an otherwise non-modifying operation to overcome the
negative sides of such partial committing without patching the source code.



On 20.04.2021 19:23, Aleksander Alekseev wrote:

Although it's unlikely that someone implemented
an application which deals with important data and "pressed Ctrl+C" as
it's done in psql.



Some client libraries have a feature to cancel a session (with an effect
similar to "Ctrl+C" in psql) after a client-specified deadline expires [2].
Hence, this case might be quite frequent when an application interacts with
the database.
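As a sketch of what such a driver does under the hood with plain libpq (the
deadline bookkeeping itself is assumed, not shown):

#include <stdio.h>
#include <libpq-fe.h>

/* called when the client-side deadline expires while a query is running */
static void
cancel_running_query(PGconn *conn)
{
	PGcancel   *cancel = PQgetCancel(conn);
	char		errbuf[256];

	if (cancel == NULL)
		return;
	if (!PQcancel(cancel, errbuf, sizeof(errbuf)))
		fprintf(stderr, "could not send cancel request: %s\n", errbuf);
	PQfreeCancel(cancel);
}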




On Mon, Apr 19, 2021 at 10:13 PM Ondřej Žižka  wrote:

From the synchronous_commit=remote_write level and "higher", I would expect
that when the remote action (doesn't matter if flush, write or apply) is not
performed, I would not receive a confirmation of the commit (even with a
warning). Something like: if there is no commit from the sync replica, there
is no commit on the primary, and if someone performs the steps above, the
whole transaction will not send a confirmation.



The warning has to be taken into account here, and the performed commit must
not be treated as *successful*.



1. 
https://www.postgresql.org/message-id/C1F7905E-5DB2-497D-ABCC-E14D4DEE506C%40yandex-team.ru


2. 
https://www.postgresql.org/message-id/CANtu0ogbu%2By6Py963p-zKJ535b8zm5AOq7zkX7wW-tryPYi1DA%40mail.gmail.com



--
Regards,
Maksim Milyutin





Re: Add client connection check during the execution of the query

2021-03-29 Thread Maksim Milyutin

Hi Thomas! Thanks for working on this patch.

I have attached a new version with some typo corrections in the doc entry and
removal of redundant `include` entries and trailing whitespace. I also added
to the doc the case where a single-query transaction with a disconnected
client might eventually be committed upon completion in autocommit mode if it
doesn't return any rows (doesn't communicate with the user) before sending
the final commit message. This behavior might be unexpected for clients and
hence IMO it's worth noting.



> diff --git a/src/backend/libpq/pqcomm.c b/src/backend/libpq/pqcomm.c
> index 4c7b1e7bfd..8cf95d09a4 100644
> --- a/src/backend/libpq/pqcomm.c
> +++ b/src/backend/libpq/pqcomm.c
>
> @@ -919,13 +923,13 @@ socket_set_nonblocking(bool nonblocking)
>  }
>
>  /* 
> - *    pq_recvbuf - load some bytes into the input buffer
> + *    pq_recvbuf_ext - load some bytes into the input buffer
>   *
>   *    returns 0 if OK, EOF if trouble
>   * 
>   */
>  static int
> -pq_recvbuf(void)
> +pq_recvbuf_ext(bool nonblocking)
>  {
>  if (PqRecvPointer > 0)
>  {
> @@ -941,8 +945,7 @@ pq_recvbuf(void)
>  PqRecvLength = PqRecvPointer = 0;
>  }
>
> -    /* Ensure that we're in blocking mode */
> -    socket_set_nonblocking(false);
> +    socket_set_nonblocking(nonblocking);
>
>  /* Can fill buffer from PqRecvLength and upwards */
>  for (;;)
> @@ -954,6 +957,9 @@ pq_recvbuf(void)
>
>  if (r < 0)
>  {
> +    if (nonblocking && (errno == EAGAIN || errno == EWOULDBLOCK))
> +    return 0;
> +
>  if (errno == EINTR)
>  continue;    /* Ok if interrupted */
>
> @@ -981,6 +987,13 @@ pq_recvbuf(void)
>  }
>  }
>
> +static int
> +pq_recvbuf(void)
> +{
> +    return pq_recvbuf_ext(false);
> +}
> +
> +

AFAICS, the above fragment is not directly related to the primary fix.


AFAICS, the following open items don't allow treating the current patch as
complete:


* Absence of TAP tests emulating some scenarios of client disconnection.
IIRC, you wanted to rewrite the test case posted by Sergey.


* Concerns about the portability of the `pq_check_connection()`
implementation. BTW, postgres with this patch does not build on Windows [1].


* Absence of benchmark results showing the lack of a noticeable performance
regression after applying a non-zero timeout for checking client liveness.



1. 
https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.131820



--
Regards,
Maksim Milyutin

>From 3ec788ac5e7c47fe135a3618849db179942f4b27 Mon Sep 17 00:00:00 2001
From: Maksim Milyutin 
Date: Fri, 26 Mar 2021 10:18:30 +0300
Subject: [PATCH v9] Detect POLLHUP while running queries

Provide a new optional GUC that can be used to check whether the client
connection has gone away periodically while running very long queries.

Author: Sergey Cherkashin 
Author: Thomas Munro 
Reviewed-by: Thomas Munro 
Reviewed-by: Tatsuo Ishii 
Reviewed-by: Konstantin Knizhnik 
Reviewed-by: Zhihong Yu 
Reviewed-by: Andres Freund 
Reviewed-by: Maksim Milyutin 
Reviewed-by: Tom Lane  (much earlier version)
Discussion: https://postgr.es/m/77def86b27e41f0efcba411460e929ae%40postgrespro.ru
---
 doc/src/sgml/config.sgml  | 36 +++
 src/backend/libpq/pqcomm.c| 63 +--
 src/backend/tcop/postgres.c   | 27 
 src/backend/utils/init/globals.c  |  1 +
 src/backend/utils/init/postinit.c | 10 +++
 src/backend/utils/misc/guc.c  | 10 +++
 src/backend/utils/misc/postgresql.conf.sample |  3 +
 src/include/libpq/libpq.h |  2 +
 src/include/miscadmin.h   |  1 +
 src/include/utils/timeout.h   |  1 +
 10 files changed, 150 insertions(+), 4 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index ddc6d789d8..abec47ada7 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -998,6 +998,42 @@ include_dir 'conf.d'
   
  
 
+ 
+  client_connection_check_interval (integer)
+  
+   client_connection_check_interval configuration parameter
+  
+  
+  
+   
+Sets the time interval between checks that the client is still
+connected, while running queries. The check is performed by polling the
+socket to allow long running queries to be aborted immediately.
+   
+   
+If this value is specified without units, it is taken as milliseconds.
+The default value is 0, which disables connection
+checks. Without connection checks, the server will detect the loss of
+the connection only after next interaction w

Re: Disallow cancellation of waiting for synchronous replication

2020-01-15 Thread Maksim Milyutin

On 15.01.2020 01:53, Andres Freund wrote:


On 2020-01-12 16:18:38 +0500, Andrey Borodin wrote:

On 11 Jan 2020, at 7:34, Bruce Momjian wrote:

Actually, it might be worse than that.  In my reading of
RecordTransactionCommit(), we do this:

write to WAL
flush WAL (durable)
make visible to other backends
replicate
communicate to the client

I think this means we make the transaction commit visible to all
backends _before_ we replicate it, and potentially wait until we get a
replication reply to return SUCCESS to the client.

No. Data is not visible to other backend when we await sync rep.

Yea, as the relevant comment in RecordTransactionCommit() says;

 * Note that at this stage we have marked clog, but still show as running
 * in the procarray and continue to hold locks.
 */
if (wrote_xlog && markXidCommitted)
	SyncRepWaitForLSN(XactLastRecEnd, true);


But it's worthwhile to emphasize that data at that stage actually *can*
be visible on standbys. The fact that the transaction still shows as
running via procarray, on the primary, does not influence visibility
determinations on the standby.



In the common case, consistent reading in a cluster (even if remote_apply is
on) is available (and has to be) only on the master node. For example, if a
random load balancer for read-only queries is set up over the master and sync
replicas (with remote_apply on), it's possible to hit the case where
preceding reads return data that is absent in subsequent ones.
Moreover, such visible commits on a sync standby are not durable from the
cluster's point of view. For example, if we have two sync standbys, then
under failover we can switch the master to the sync standby on which the
waiting commit was not replicated, while it was applied (and visible) on the
other standby. This switchover is completely safe because the client hasn't
received an acknowledgment of the commit request, so the transaction is in an
indeterminate state: it can end up either committed or aborted depending on
which standby is promoted.



--
Best regards,
Maksim Milyutin





Re: Disallow cancellation of waiting for synchronous replication

2019-12-28 Thread Maksim Milyutin

On 29.12.2019 00:55, Robert Haas wrote:


On Fri, Dec 20, 2019 at 12:04 AM Andrey Borodin  wrote:

Currently, we can have split brain with the combination of following steps:
0. Setup cluster with synchronous replication. Isolate primary from standbys.
1. Issue upsert query INSERT .. ON CONFLICT DO NOTHING
2. CANCEL 1 during wait for synchronous replication
3. Retry 1. Idempotent query will succeed and client have confirmation of 
written data, while it is not present on any standby.

All that being said, like Tom and Michael, I don't think teaching the
backend to ignore cancels is the right approach. We have had
innumerable problems over the years that were caused by the backend
failing to respond to cancels when we would really have liked it to do
so, and users REALLY hate it when you tell them that they have to shut
down and restart (with crash recovery) the entire database because of
a single stuck backend.



The stuck backend is not deadlocked here. To cancel the backend's wait
gracefully, it is enough for the client to turn off synchronous replication
(change synchronous_standby_names through a server reload) or to switch the
synchronous replica to another live one (again by changing
synchronous_standby_names). In the first case he explicitly agrees to the
existence of local (not replicated) commits on the master.



--
Best regards,
Maksim Milyutin





Re: Disallow cancellation of waiting for synchronous replication

2019-12-26 Thread Maksim Milyutin


On 25.12.2019 13:45, Andrey Borodin wrote:

On 25 Dec 2019, at 15:28, Maksim Milyutin wrote:


Synchronous replication
does not guarantee that a committed write is actually on any replica,
but it does in general guarantee that a commit has been replicated
before sending a response to the client. That's arguably more
important because the rest of what the application does might depend on the
transaction completing and replicating successfully. I don't know of
cases other than cancellation in which a response is sent to the
client without replication when synchronous replication is enabled.


Yes, at query cancellation (e.g. by a timeout from the client driver) the
client receives a response about a completed transaction (though with a
warning, which not all client drivers can handle properly), and the guarantee
about the successfully replicated transaction is *violated*.

We obviously need a design discussion here to address the issue. But the 
immediate question is should we add this topic to January CF items?



+1 on posting this topic to January CF.

Andrey, some fixes from me:

1) moved the resetting of QueryCancelPending out of the internal branch where
synchronous_commit_cancelation is checked, so as to avoid dummy iterations
that print the message "canceling the wait for ..."


2) rewrote the errdetail message for query cancellation: I hold that in this
case we cannot assert that the transaction committed locally, because its
changes are not yet visible, so I propose to write about the locally flushed
commit WAL record instead.


Updated patch is attached.

--
Best regards,
Maksim Milyutin

diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 5353b6ab0b..844ce36d79 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -80,6 +80,7 @@ bool		DefaultXactDeferrable = false;
 bool		XactDeferrable;
 
 int			synchronous_commit = SYNCHRONOUS_COMMIT_ON;
+bool		synchronous_commit_cancelation = true;
 
 /*
  * When running as a parallel worker, we place only a single
diff --git a/src/backend/replication/syncrep.c b/src/backend/replication/syncrep.c
index 7bf7a1b7f8..0a533d2d79 100644
--- a/src/backend/replication/syncrep.c
+++ b/src/backend/replication/syncrep.c
@@ -250,13 +250,21 @@ SyncRepWaitForLSN(XLogRecPtr lsn, bool commit)
 		 */
 		if (ProcDiePending)
 		{
-			ereport(WARNING,
+			if (synchronous_commit_cancelation)
+			{
+ereport(WARNING,
+		(errcode(ERRCODE_ADMIN_SHUTDOWN),
+		errmsg("canceling the wait for synchronous replication and terminating connection due to administrator command"),
+		errdetail("The transaction has already committed locally, but might not have been replicated to the standby.")));
+whereToSendOutput = DestNone;
+SyncRepCancelWait();
+break;
+			}
+
+			ereport(PANIC,
 	(errcode(ERRCODE_ADMIN_SHUTDOWN),
-	 errmsg("canceling the wait for synchronous replication and terminating connection due to administrator command"),
-	 errdetail("The transaction has already committed locally, but might not have been replicated to the standby.")));
-			whereToSendOutput = DestNone;
-			SyncRepCancelWait();
-			break;
+	errmsg("canceling the wait for synchronous replication and terminating connection due to administrator command"),
+	errdetail("The transaction has already committed locally, but might not have been replicated to the standby.")));
 		}
 
 		/*
@@ -268,11 +276,18 @@ SyncRepWaitForLSN(XLogRecPtr lsn, bool commit)
 		if (QueryCancelPending)
 		{
 			QueryCancelPending = false;
+			if (synchronous_commit_cancelation)
+			{
+ereport(WARNING,
+		(errmsg("canceling wait for synchronous replication due to user request"),
+		 errdetail("The transaction has already committed locally, but might not have been replicated to the standby.")));
+SyncRepCancelWait();
+break;
+			}
+
 			ereport(WARNING,
-	(errmsg("canceling wait for synchronous replication due to user request"),
-	 errdetail("The transaction has already committed locally, but might not have been replicated to the standby.")));
-			SyncRepCancelWait();
-			break;
+	(errmsg("canceling wait for synchronous replication due requested, but cancelation is not allowed"),
+	 errdetail("The COMMIT record has already flushed to WAL locally and might not have been replicated to the standby. We must wait here.")));
 		}
 
 		/*
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 4b4911d5ec..a279faaeb5 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -1222,6 +1222,15 @@ static struct config_bool ConfigureNamesBool[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"synchronous_commit_cancelation", PGC_USERSET, WAL_SETTINGS,
+			gettext_noop("Allow to cancel waiting for replication of transaction commited localy."),
+			NULL
+		},
+		_commit_ca

Re: Disallow cancellation of waiting for synchronous replication

2019-12-25 Thread Maksim Milyutin

On 25.12.2019 14:27, Marco Slot wrote:




On Wed, Dec 25, 2019, 11:28 Maksim Milyutin <milyuti...@gmail.com> wrote:


But in this case locally committed data becomes visible to new incoming
transactions, which is a bad side effect of this issue.


Your application should be prepared for that in any case.

At the point where synchronous replication waits, the commit has 
already been written to disk on the primary. If postgres 
restarts while waiting for replication then the write becomes 
immediately visible regardless of whether it was replicated.



Yes, this write is recovered after the instance starts. At first, I want to
delegate this case to an external HA tool around PostgreSQL. It can handle
instance stopping and perform a switchover to a sync replica, or start the
current instance with connections from external users closed until all
writes replicate to sync replicas. Later, arguably, closing connections
after the recovery process could be implemented inside the kernel.



I don't think throwing a PANIC actually prevents that and if it does 
it's coincidental.



PANIC brings the instance down and prevents clients from reading locally
committed data. The HA tool takes further steps to close access to this
data, as described above.



That's far from ideal, but fixing it requires a much bigger change to 
streaming replication. The write should be replicated prior to commit 
on the primary, but applied after in a way where unapplied writes on 
the secondary can be overwritten/discarded if it turns out they did 
not commit on the primary.



Thanks for sharing your opinion about enhancing the synchronous commit
protocol. My position is stated in [1]. I would like to see the positions of
other community members.



1. 
https://www.postgresql.org/message-id/f3ffc220-e601-cc43-3784-f9bba66dc382%40gmail.com


--
Best regards,
Maksim Milyutin



Re: Disallow cancellation of waiting for synchronous replication

2019-12-25 Thread Maksim Milyutin

On 21.12.2019 13:34, Marco Slot wrote:


I do agree with the general sentiment that terminating the connection
is preferable over sending a response to the client (except when
synchronous replication was already disabled).



But in this case locally committed data becomes visible to new incoming
transactions, which is a bad side effect of this issue. Under failover those
changes are potentially undone.




Synchronous replication
does not guarantee that a committed write is actually on any replica,
but it does in general guarantee that a commit has been replicated
before sending a response to the client. That's arguably more
important because the rest of what the application does might depend on the
transaction completing and replicating successfully. I don't know of
cases other than cancellation in which a response is sent to the
client without replication when synchronous replication is enabled.



Yes, at query cancellation (e.g. by a timeout from the client driver) the
client receives a response about a completed transaction (though with a
warning, which not all client drivers can handle properly), and the guarantee
about the successfully replicated transaction is *violated*.



--
Best regards,
Maksim Milyutin





Re: Disallow cancellation of waiting for synchronous replication

2019-12-25 Thread Maksim Milyutin

On 21.12.2019 00:19, Tom Lane wrote:


Three is still a problem when backend is not canceled, but terminated [2].

Exactly.  If you don't have a fix that handles that case, you don't have
anything.  In fact, you've arguably made things worse, by increasing the
temptation to terminate or "kill -9" the nonresponsive session.



I assume that the termination of a backend, which causes termination of the
PostgreSQL instance in Andrey's patch proposal, has to be resolved by
external HA agents that can intercept such terminations (as the parent
process of postmaster) and make appropriate decisions, e.g., restart the
PostgreSQL node in a state closed to external users (via pg_hba.conf
manipulation) until all sync replicas have synchronized the changes from the
master. The Stolon HA tool implements this strategy [1]. This logic (waiting
until all replicas declared in synchronous_standby_names replicate all WAL
from the master) could be implemented inside the PostgreSQL kernel, after the
start of the recovery process and before the database is opened to users, and
this can be done separately later.


Another approach is to implement two-phase commit over the master and sync
replicas (as Oracle did in old versions [2]), where the risk of getting
locally committed data under instance restart or query cancellation is
minimal (once the final commit phase has started). But this approach has a
latency penalty and the complexity of automatically resolving partial
(prepared but not committed) transactions under coordinator (in this case,
master node) failure. It would be nice if this approach were implemented
later as an option of synchronous commit.



1. 
https://github.com/sorintlab/stolon/blob/master/doc/syncrepl.md#handling-postgresql-sync-repl-limits-under-such-circumstances


2. 
https://docs.oracle.com/cd/B28359_01/server.111/b28326/repmaster.htm#i33607


--
Best regards,
Maksim Milyutin





Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2019-03-19 Thread Maksim Milyutin

On 3/16/19 5:32 PM, Robert Haas wrote:


There's only one query ID field available, and
you can't use two extensions that care about query ID unless they
compute it the same way, and replicating all the code that computes
the query ID into each new extension that wants one sucks.  I think we
should actually bite the bullet and move all of that code into core,
and then just let extensions say whether they care about it getting
set.



+1.

But I think it is enough to integrate into core the query normalization
routine and to store generalized query strings (from which the queryId is
produced) in shared memory (for example, a hashtable that maps queryId to the
text representation of the generalized query), activating the normalization
routine and the filling of the generalized-query table via a dedicated GUC.


This would unbind extensions that require queryId from pg_stat_statements
and make such computation of queryId canonical.
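A rough sketch of the shared hashtable I have in mind (names and sizes are
illustrative only, using the standard shmem hash API):

#include "postgres.h"
#include "storage/shmem.h"
#include "utils/hsearch.h"

typedef struct NormalizedQueryEntry
{
	uint64		queryid;		/* hash key */
	char		query[1024];	/* text of the generalized query */
} NormalizedQueryEntry;

static HTAB *normalized_queries = NULL;

static void
normalized_queries_shmem_init(void)
{
	HASHCTL		info;

	memset(&info, 0, sizeof(info));
	info.keysize = sizeof(uint64);
	info.entrysize = sizeof(NormalizedQueryEntry);

	normalized_queries = ShmemInitHash("normalized query texts",
									   1024, 1024,	/* init and max entries */
									   &info,
									   HASH_ELEM | HASH_BLOBS);
}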



--
Regards,
Maksim Milyutin




Re: O_DIRECT for relations and SLRUs (Prototype)

2019-01-15 Thread Maksim Milyutin

On 1/15/19 11:28 AM, Michael Paquier wrote:


On Tue, Jan 15, 2019 at 11:19:48AM +0500, Andrey Borodin wrote:

Is it possible to avoid those memcpy's by aligning available buffers
instead?  I couldn't understand this from the patch and this thread.

Sure, it had better do that.  That's just a lazy implementation.



Hi!

Could you specify all the cases when buffers will not be aligned to BLCKSZ?

AFAICS shared and temp buffers are aligned. Which ones are not?
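For reference, a common way to obtain a BLCKSZ-aligned I/O buffer from an
arbitrary allocation (just to illustrate the alignment in question):

	char		rawbuf[BLCKSZ + BLCKSZ];	/* over-allocate by one block */
	char	   *alignedbuf = (char *) TYPEALIGN(BLCKSZ, rawbuf);

	/* alignedbuf can now be handed to read()/write() on an O_DIRECT fd */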


--
Regards, Maksim Milyutin




Re: Hint to set owner for tablespace directory

2018-10-01 Thread Maksim Milyutin

01.10.2018 15:15, Peter Eisentraut wrote:


On 24/09/2018 14:50, Peter Eisentraut wrote:

On 11/09/2018 17:10, Peter Eisentraut wrote:

On 07/09/2018 17:59, Maksim Milyutin wrote:

those directories was that user). The error message "could not set
permissions on directory ..." disoriented that user. The need to change
the owner of those directories came after careful reading of
documentation. I think it would be helpful to show the proposed hint to
more operatively resolve the problem.

I think it might be worth clarifying the documentation instead.  I'm
looking at the CREATE TABLESPACE reference page and it's not super clear
on first reading.

How about the attached patch?

I have committed this patch and will close this commitfest item.  I
don't think the idea with the hints in the code was going anywhere.



Yes, thanks. Sorry for the delayed answer, I was busy. You have clearly
described the workflow for creating a tablespace (including the *chown*
command), and in general it is enough to resolve the issue of a wrong owner
of the tablespace directory.


--
Regards,
Maksim Milyutin




Re: Hint to set owner for tablespace directory

2018-09-07 Thread Maksim Milyutin

On 09/07/2018 06:05 PM, Peter Eisentraut wrote:


On 07/09/2018 11:59, Maksim Milyutin wrote:

OK. However I think it would be helpful to leave the mention about
setting necessary owner for tablespace directory. My final version of
hint is "You might need to fix permissions on this directory or its
parents or install the PostgreSQL system user as the owner of this
directory." And updated version of patch is attached.

You are still proposing to hint that the permissions on the tablespace
directory might be correct but that the main database instance is
running under the wrong user.


Not exactly. The case that motivated me to make this patch was a novice user
who, after installing postgres from a package (which sets up the postgres
user and initializes PGDATA with the right permissions), tried to create the
necessary tablespace directories himself (so the owner of those directories
was that user). The error message "could not set permissions on directory
..." disoriented that user. The need to change the owner of those directories
became clear only after careful reading of the documentation. I think it
would be helpful to show the proposed hint to resolve the problem more
quickly.


--
Regards, Maksim Milyutin




Re: Hint to set owner for tablespace directory

2018-09-07 Thread Maksim Milyutin

On 08/31/2018 04:59 PM, Tom Lane wrote:


Maksim Milyutin  writes:

30.08.2018 19:52, Peter Eisentraut wrote:

I think the hint is backwards.  When you don't have permission to chmod
the tablespace directory, you probably want to fix the permissions on
the tablespace directory or its parent.

In this case I propose to:
- replace my initial hint message with a guess at what to do if errno ==
EPERM, something like "You might need to install the PostgreSQL system user
as the owner of this directory";
- add another hint if errno == EACCES: "Fix permissions on the parent
directories".

I agree with what I think Peter is saying: the hint should just recommend
fixing permissions on the directory, for either errno code.  The more
detail you try to give, the more likely the hint will be wrong
... especially when you're dealing with errno codes that aren't all that
well standardized, as these aren't.


OK. However, I think it would be helpful to keep the mention of setting the
necessary owner for the tablespace directory. My final version of the hint is
"You might need to fix permissions on this directory or its parents or
install the PostgreSQL system user as the owner of this directory." An
updated version of the patch is attached.


--
Regards, Maksim Milyutin

diff --git a/src/backend/commands/tablespace.c b/src/backend/commands/tablespace.c
index f7e9160..1478462 100644
--- a/src/backend/commands/tablespace.c
+++ b/src/backend/commands/tablespace.c
@@ -585,10 +585,15 @@ create_tablespace_directories(const char *location, const Oid tablespaceoid)
 	 InRecovery ? errhint("Create this directory for the tablespace before "
 		  "restarting the server.") : 0));
 		else
+		{
 			ereport(ERROR,
 	(errcode_for_file_access(),
 	 errmsg("could not set permissions on directory \"%s\": %m",
-			location)));
+			location),
+	 errhint("You might need to fix permissions on this "
+			 "directory or its parents or install the PostgreSQL "
+			 "system user as the owner of this directory.")));
+		}
 	}
 
 	if (InRecovery)


Re: Hint to set owner for tablespace directory

2018-08-31 Thread Maksim Milyutin

30.08.2018 19:52, Peter Eisentraut wrote:


On 23/08/2018 13:24, Maksim Milyutin wrote:

I have noticed that novice users get stuck trying to create a tablespace
over a directory whose owner is not the system postgres user. They
observe the message "could not set permissions on directory ...:
permission denied".

I want to add a patch that prints a hint to set the required owner for the
tablespace directory if this is the cause of the problem (*errno ==
EPERM* after calling *chmod*).

I think the hint is backwards.  When you don't have permission to chmod
the tablespace directory, you probably want to fix the permissions on
the tablespace directory or its parent.


According to the man page of chmod(2), when permissions on parent
directories are insufficient, errno takes the EACCES value. However, the
Linux man page of chmod(2) mentions that errno is EPERM when the process is
not privileged (does not have the CAP_FOWNER capability).



But the hint says to run the database server as a different user.  That's a bit 
unusual.


In this case I propose to:
- replace my initial hint message with a guess at what to do if errno ==
EPERM, something like "You might need to install the PostgreSQL system user
as the owner of this directory";
- add another hint if errno == EACCES: "Fix permissions on the parent
directories".


Any thoughts?

--
Regards,
Maksim Milyutin




Re: Hint to set owner for tablespace directory

2018-08-31 Thread Maksim Milyutin

On 08/31/2018 01:12 PM, Rafia Sabih wrote:




On Fri, Aug 31, 2018 at 3:30 PM, Maksim Milyutin <milyuti...@gmail.com> wrote:


On 08/31/2018 11:55 AM, Rafia Sabih wrote:



Adding to that this patch needs a rebase. And, please don't
forget to run 'git diff --check', as it has some white-space issues.


 git apply /home/edb/Desktop/patches/others/owner_tbsp_hint.patch
 /home/edb/Desktop/patches/others/owner_tbsp_hint.patch:9: trailing whitespace.

        {
/home/edb/Desktop/patches/others/owner_tbsp_hint.patch:10: trailing whitespace.

            bool wrong_owner = (errno == EPERM);


This is the hex+ASCII display (from *hexdump -C owner_tbsp_hint.patch*
output) of the code fragment above:


01a0  0a 20 09 09 65 6c 73 65  0a 2b 09 09 7b 0a 2b 09  |. ..else.+..{.+.|
01b0  09 09 62 6f 6f 6c 20 77  72 6f 6e 67 5f 6f 77 6e  |..bool wrong_own|
01c0  65 72 20 3d 20 28 65 72  72 6e 6f 20 3d 3d 20 45  |er = (errno == E|
01d0  50 45 52 4d 29 3b 0a 2b  0a 20 09 09 09 65 72 65  |PERM);.+. ...ere|


The problem lines don't have trailing whitespace, only line feed (0x0a)
characters.

This is what I got. Please correct me if I am missing anything. I am using
CentOS 7, just an FYI.


My git version is 2.7.4.

--
Regards, Maksim Milyutin



Re: Hint to set owner for tablespace directory

2018-08-31 Thread Maksim Milyutin

On 08/31/2018 11:55 AM, Rafia Sabih wrote:



Adding to that this patch needs a rebase. And, please don't forget to 
run 'git diff --check', as it has some white-space issues.


Hmm, that's odd. The provided patch applies cleanly on the current master
HEAD (2e39f69), and *git diff --check* doesn't print any warnings.


--
Regards, Maksim Milyutin



Re: Hint to set owner for tablespace directory

2018-08-24 Thread Maksim Milyutin

On 08/24/2018 05:18 AM, Michael Paquier wrote:


On Thu, Aug 23, 2018 at 02:24:25PM +0300, Maksim Milyutin wrote:

I want to add a patch that prints a hint to set the required owner for the
tablespace directory if this is the cause of the problem (*errno == EPERM*
after calling *chmod*).

Please do not forget to add this patch to the next commit fest.


Thanks, done.

--
Regards,
Maksim Milyutin




Hint to set owner for tablespace directory

2018-08-23 Thread Maksim Milyutin

Hi!


I have noticed that novice users get stuck trying to create a tablespace
over a directory whose owner is not the system postgres user. They
observe the message "could not set permissions on directory ...:
permission denied".


I want to add a patch that prints a hint to set the required owner for the
tablespace directory if this is the cause of the problem (*errno ==
EPERM* after calling *chmod*).



--
Regards,
Maksim Milyutin

diff --git a/src/backend/commands/tablespace.c b/src/backend/commands/tablespace.c
index f7e9160..a8733d4 100644
--- a/src/backend/commands/tablespace.c
+++ b/src/backend/commands/tablespace.c
@@ -585,10 +585,16 @@ create_tablespace_directories(const char *location, const Oid tablespaceoid)
 	 InRecovery ? errhint("Create this directory for the tablespace before "
 		  "restarting the server.") : 0));
 		else
+		{
+			bool wrong_owner = (errno == EPERM);
+
 			ereport(ERROR,
 	(errcode_for_file_access(),
 	 errmsg("could not set permissions on directory \"%s\": %m",
-			location)));
+			location),
+	 wrong_owner ? errhint("Install the PostgreSQL system user as "
+		   "the owner of this directory.") : 0));
+		}
 	}
 
 	if (InRecovery)


Re: Slow planning time for simple query

2018-06-13 Thread Maksim Milyutin

13.06.2018 12:40, Maksim Milyutin wrote:


On 09.06.2018 22:49, Tom Lane wrote:


Maksim Milyutin  writes:

On hot standby I faced with the similar problem.
...
is planned 4.940 ms on master and *254.741* ms on standby.

(I wonder though why, if you executed the same query on the master,
its setting of the index-entry-is-dead bits didn't propagate to the
standby.)


I have verified the number of dead item pointers (through the pageinspect
extension) in the first leaf page of the index participating in the query
('main.message_instance_pkey') on the master and slave nodes and have
noticed a big difference.


SELECT * FROM monitoring.bt_page_stats('main.message_instance_pkey', 3705);

On master:

 blkno | type | live_items | dead_items | avg_item_size | page_size | free_size | btpo_prev | btpo_next | btpo | btpo_flags
-------+------+------------+------------+---------------+-----------+-----------+-----------+-----------+------+------------
  3705 | l    |          1 |         58 |            24 |      8192 |      6496 |         0 |      3719 |    0 |         65

On standby:

 blkno | type | live_items | dead_items | avg_item_size | page_size | free_size | btpo_prev | btpo_next | btpo | btpo_flags
-------+------+------------+------------+---------------+-----------+-----------+-----------+-----------+------+------------
  3705 | l    |         59 |          0 |            24 |      8192 |      6496 |         0 |      3719 |    0 |          1





At this point I want to highlight the issue that changes to the *lp_flags*
bits of index item pointers (namely, marking items as dead) don't propagate
from master to replica in my case. As a consequence, on the standby I have
live index items most of which are marked dead on the master. And at the
planning stage my queries are forced to descend to heap pages during
*get_actual_variable_range* execution, which considerably slows down
planning.

Is this a bug, a restriction of the implementation, or a misconfiguration of
WAL/replication?


--
Regards,
Maksim Milyutin



Re: Slow planning time for simple query

2018-06-13 Thread Maksim Milyutin

On 09.06.2018 22:49, Tom Lane wrote:


Maksim Milyutin  writes:

On hot standby I faced with the similar problem.
...
is planned 4.940 ms on master and *254.741* ms on standby.


(I wonder though why, if you executed the same query on the master,
its setting of the index-entry-is-dead bits didn't propagate to the
standby.)


I have verified the number of dead item pointers (through the pageinspect
extension) in the first leaf page of the index participating in the query
('main.message_instance_pkey') on the master and slave nodes and have
noticed a big difference.


SELECT * FROM monitoring.bt_page_stats('main.message_instance_pkey', 3705);

On master:

 blkno | type | live_items | dead_items | avg_item_size | page_size | free_size | btpo_prev | btpo_next | btpo | btpo_flags
-------+------+------------+------------+---------------+-----------+-----------+-----------+-----------+------+------------
  3705 | l    |          1 |         58 |            24 |      8192 |      6496 |         0 |      3719 |    0 |         65

On standby:

 blkno | type | live_items | dead_items | avg_item_size | page_size | free_size | btpo_prev | btpo_next | btpo | btpo_flags
-------+------+------------+------------+---------------+-----------+-----------+-----------+-----------+------+------------
  3705 | l    |         59 |          0 |            24 |      8192 |      6496 |         0 |      3719 |    0 |          1



The vacuum routine improves the situation.
Can it be that I have incorrectly configured WAL logging or replication?



I wonder if we should extend the "SnapshotNonVacuumable" logic introduced
in commit 3ca930fc3 so that in hot standby, *all* index entries are deemed
non vacuumable.  This would essentially get rid of long standby planning
times in this sort of scenario by instead accepting worse (possibly much
worse) planner range estimates.  I'm unsure if that's a good tradeoff or
not.


I applied the patch introduced in this commit to the test standby (not the
master; I don't know if this is correct) and haven't noticed any
differences.


--
Regards,
Maksim Milyutin



Re: [HACKERS] PoC: custom signal handler for extensions

2018-03-05 Thread Maksim Milyutin

Hello David,


On 05.03.2018 18:50, David Steele wrote:

Hello Maksim,

On 1/27/18 2:19 PM, Arthur Zakirov wrote:


Is there actual need in UnregisterCustomProcSignal() within _PG_init()?
An extension registers a handler and then unregister it doing
nothing. It seems useless.

Also process_shared_preload_libraries_in_progress within _PG_fini() will
be false I think. _PG_fini() won't be called though, because
implementation of internal_unload_library() is disabled.

Actually, is there need in UnregisterCustomProcSignal() at all? It
unregisters a handler only in current backend, for actual unergistering
we need to call it everywhere, if I'm not mistaken.

This patch has been in Waiting on Author state for almost three weeks.
Have you had a chance to consider Arthur's suggestions?


Yes, I want to rework my patch to enable the setup of custom signals on a
working backend without preload initialization.



Do you know when you'll have an updated patch available?


I want to resume the work on this patch for the next release (12). Sorry,
for now I cannot keep up with the current release.


--
Regards,
Maksim Milyutin




Re: [HACKERS] PoC: custom signal handler for extensions

2018-01-22 Thread Maksim Milyutin

Hello!


On 11.01.2018 18:53, Arthur Zakirov wrote:

The relationship between custom signals and
assigned handlers (function addresses) is replicated from postmaster to
child processes including working backends.

I think this happens only if a custom signal is registered during
shared_preload_libraries initialization.
But from RegisterCustomProcSignalHandler()'s implementation I see that you
can register the signal anytime. Am I wrong?

If so, then custom signal handlers may not work as expected.


Yeah, thanks. I added checks on the
process_shared_preload_libraries_in_progress flag.



What is purpose of AssignCustomProcSignalHandler() function? This function can 
erase a handler set by another extension.
For example, extension 1 set handler passing reason PROCSIG_CUSTOM_1, and 
extension 2 set another handler passing reason PROCSIG_CUSTOM_1 too. Here the 
handler of the extension 2 will be purged.


+
+   Assert(reason >= PROCSIG_CUSTOM_1 && reason <= PROCSIG_CUSTOM_N);
+

I think it is not good to use asserts within AssignCustomProcSignalHandler()
and GetCustomProcSignalHandler(), because these functions can be executed by
an external extension, and it may pass a reason outside this boundary. It
would be better if the reason were checked at runtime.


I was guided by the consideration that assertions define preconditions on a
function's input parameter (in our case, the procsignal), and meeting these
preconditions has to be ensured by the function's callers. Since these
checks are not expensive, it would be good to replace them with ereport
calls.
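Something along these lines (a sketch, not the final patch wording):

	/* runtime range check replacing the assertion */
	if (reason < PROCSIG_CUSTOM_1 || reason > PROCSIG_CUSTOM_N)
		ereport(ERROR,
				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
				 errmsg("custom procsignal reason %d is out of range",
						(int) reason)));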


Updated patch is attached in [1].

1. 
https://www.postgresql.org/message-id/37a48ac6-aa14-4e36-5f08-cf8581fe1382%40gmail.com


--
Regards,
Maksim Milyutin



Re: [HACKERS] PoC: custom signal handler for extensions

2018-01-22 Thread Maksim Milyutin

On 12.01.2018 18:51, Teodor Sigaev wrote:

In perspective, this mechanism provides a low-level instrument for defining
remote procedure calls on the extension side. A simple RPC that determines
the effective userid on a remote backend (the remote_effective_user
function) is attached as an example.


Suppose it's a useful patch. Some notes:

1) AssignCustomProcSignalHandler is an unneeded API function, easily
replaced by the
GetCustomProcSignalHandler/ReleaseCustomProcSignal/RegisterCustomProcSignalHandler
functions; on the other hand, reassigning a custom signal handler doesn't
look like a widely used feature.


Agreed. AssignCustomProcSignalHandler is unnecessary. Also I have 
removed GetCustomProcSignalHandler as not see any application of this 
function.


2) The naming of RegisterCustomProcSignalHandler and ReleaseCustomProcSignal
is not self-consistent. Other parts suggest the pair Register/Unregister or
Acquire/Release. It seems the Register/Unregister pair is preferred here.


Thanks for notice. Fixed.

3) Agree that Assert(reason >= PROCSIG_CUSTOM_1 && reason <=
PROCSIG_CUSTOM_N) should be replaced with ereport. By the way,
ReleaseCustomProcSignal() is the single place where there isn't a range
check of the reason arg.


Oops, I missed this assert check.
I considered assert checks in user-available routines accepting a procsignal
to be preconditions for the routines' callers, i.e. violating such a check
leads to undefined behavior. The checks are not expensive in themselves, so
it makes sense to replace them with runtime checks via ereport calls.


4) CustomSignalInterrupt() - playing with errno and SetLatch() seems
unneeded here; there is no call of the custom signal handler there, and
SetLatch will be called several lines below.


I have noticed that even the simple HandleNotifyInterrupt() and
HandleParallelMessageInterrupt() routines do at least SetLatch(MyLatch). I
think it makes sense to leave this call in our case. Perhaps I'm wrong; I
don't entirely understand the intention of SetLatch() in those cases.


5) Using a special memory context for the handler call looks questionable. I
think complicated handlers can manage memory themselves (and will, if they
need to store something between calls). So I prefer to remove the context.


Fixed. But in this case we have to explicitly document the ambiguity of the
memory context in effect under the signal handler and the intention to leave
memory management to the extension developer.


6) As I can see, all possible (not only custom) signal handlers could
perfectly use this API, or at least be stored in the CustomHandlers array,
which could be renamed to InterruptHandlers, for example. Next, the custom
handler type could be made closer to the built-in handlers: let its
signature be extended to
void (*ProcSignalHandler_type) (ProcSignalReason reason) like the others.
That will allow a single handler to support several reasons.


OK, fixed.

7) Since the API allows using different handlers in different processes for
the same reason, it could be a source of confusion. I suggest restricting
Register/Unregister calls to shared_preload_libraries, i.e. only during
startup.


Yeah, I added checks on the process_shared_preload_libraries_in_progress
flag. Thanks for the notice.


8) I'd like to see an example of the usage of this API somewhere in contrib
in existing modules. Ideas?


Besides its usage in the pg_query_state extension [1], which provides a way
to extract the query state from a running backend, I see the possibility of
using this patch to implement a statistical sampling-based profiler for
plpgsql functions on the extension side. If we can get a call stack of
plpgsql functions from an interruptible backend, then there are no obstacles
to capturing stack frames at some interval over a fixed period of time
defined by the arguments of an extension's function.



I have attached a new version of the patch and an updated version of the
remote_effective_user function implementation that demonstrates the usage of
the custom signals API.



1. https://github.com/postgrespro/pg_query_state

--
Regards,
Maksim Milyutin

diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index b0dd7d1..ae7d007 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -60,12 +60,20 @@ typedef struct
  */
 #define NumProcSignalSlots	(MaxBackends + NUM_AUXPROCTYPES)
 
+#define IsCustomProcSignalReason(reason) \
+	((reason) >= PROCSIG_CUSTOM_1 && (reason) <= PROCSIG_CUSTOM_N)
+
+static bool CustomSignalPendings[NUM_CUSTOM_PROCSIGNALS];
+static ProcSignalHandler_type CustomInterruptHandlers[NUM_CUSTOM_PROCSIGNALS];
+
 static ProcSignalSlot *ProcSignalSlots = NULL;
 static volatile ProcSignalSlot *MyProcSignalSlot = NULL;
 
 static bool CheckProcSignal(ProcSignalReason reason);
 static void CleanupProcSignalState(int status, Datum arg);
 
+static void CheckAndSetCustomSignalInterrupts(void);
+
 /*
  * ProcSignalShmemSize
  *		Compute space needed for procsign

Re: [HACKERS] wrong t_bits alignment in pageinspect

2018-01-04 Thread Maksim Milyutin

Hi!


02.01.2018 12:33, Andrey Borodin wrote:

On 15 Dec 2017, at 18:53, Maksim Milyutin <milyuti...@gmail.com> wrote:

I found a problem in exposing values of the t_bits field from the
heap_page_items function.

Probably this place [0] contains a similar bug too?

[0] 
https://github.com/postgres/postgres/blob/master/contrib/pageinspect/heapfuncs.c#L439


Yes, that's so. Thanks a lot for the notice.

Attached is a new version of the patch with a regression test.
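The essence of the fix: the old computation
((t_infomask2 & HEAP_NATTS_MASK) / 8 + 1) * 8 always adds a whole extra
byte, which overshoots exactly when the number of attributes is a multiple
of 8, while BITMAPLEN rounds up:

/* from src/include/access/htup_details.h: bytes in a null bitmap of NATTS bits */
#define BITMAPLEN(NATTS)	(((int) (NATTS) + 7) / 8)

/* natts = 8: old formula yields (8/8 + 1) * 8 = 16 bits, past the real bitmap;
 * BITMAPLEN(8) * BITS_PER_BYTE = 8 bits, matching the on-page layout */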

--
Regards,
Maksim Milyutin

diff --git a/contrib/pageinspect/expected/page.out b/contrib/pageinspect/expected/page.out
index 8e15947a81..97af6f5bd2 100644
--- a/contrib/pageinspect/expected/page.out
+++ b/contrib/pageinspect/expected/page.out
@@ -92,3 +92,20 @@ create table test_part1 partition of test_partitioned for values from ( 1 ) to (
 select get_raw_page('test_part1', 0); -- get farther and error about empty table
 ERROR:  block number 0 is out of range for relation "test_part1"
 drop table test_partitioned;
+-- check null bitmap alignment for a table whose number of attributes is a multiple of 8
+create table test8 (f1 int, f2 int, f3 int, f4 int, f5 int, f6 int, f7 int, f8 int);
+insert into test8(f1, f8) values (x'f1'::int, 0);
+select t_bits, t_data from heap_page_items(get_raw_page('test8', 0));
+  t_bits  |       t_data       
+----------+--------------------
+ 10000001 | \xf100000000000000
+(1 row)
+
+select tuple_data_split('test8'::regclass, t_data, t_infomask, t_infomask2, t_bits)
+from heap_page_items(get_raw_page('test8', 0));
+                      tuple_data_split                       
+-------------------------------------------------------------
+ {"\\xf1000000",NULL,NULL,NULL,NULL,NULL,NULL,"\\x00000000"}
+(1 row)
+
+drop table test8;
diff --git a/contrib/pageinspect/heapfuncs.c b/contrib/pageinspect/heapfuncs.c
index 088254453e..99f409846c 100644
--- a/contrib/pageinspect/heapfuncs.c
+++ b/contrib/pageinspect/heapfuncs.c
@@ -234,7 +234,7 @@ heap_page_items(PG_FUNCTION_ARGS)
 				int			bits_len;
 
 				bits_len =
-					((tuphdr->t_infomask2 & HEAP_NATTS_MASK) / 8 + 1) * 8;
+					BITMAPLEN(HeapTupleHeaderGetNatts(tuphdr)) * BITS_PER_BYTE;
 				values[11] = CStringGetTextDatum(
 									bits_to_text(tuphdr->t_bits, bits_len));
 			}
@@ -436,12 +436,12 @@ tuple_data_split(PG_FUNCTION_ARGS)
 		int			bits_str_len;
 		int			bits_len;
 
-		bits_len = (t_infomask2 & HEAP_NATTS_MASK) / 8 + 1;
+		bits_len = BITMAPLEN(t_infomask2 & HEAP_NATTS_MASK) * BITS_PER_BYTE;
 		if (!t_bits_str)
 			ereport(ERROR,
 					(errcode(ERRCODE_DATA_CORRUPTED),
 					 errmsg("argument of t_bits is null, but it is expected to be null and %d character long",
-							bits_len * 8)));
+							bits_len)));
 
 		bits_str_len = strlen(t_bits_str);
 		if ((bits_str_len % 8) != 0)
@@ -449,11 +449,11 @@ tuple_data_split(PG_FUNCTION_ARGS)
 					(errcode(ERRCODE_DATA_CORRUPTED),
 					 errmsg("length of t_bits is not a multiple of eight")));
 
-		if (bits_len * 8 != bits_str_len)
+		if (bits_len != bits_str_len)
 			ereport(ERROR,
 					(errcode(ERRCODE_DATA_CORRUPTED),
 					 errmsg("unexpected length of t_bits %u, expected %d",
-							bits_str_len, bits_len * 8)));
+							bits_str_len, bits_len)));
 
 		/* do the conversion */
 		t_bits = text_to_bits(t_bits_str, bits_str_len);
diff --git a/contrib/pageinspect/sql/page.sql b/contrib/pageinspect/sql/page.sql
index 493ca9b211..16b4571fbf 100644
--- a/contrib/pageinspect/sql/page.sql
+++ b/contrib/pageinspect/sql/page.sql
@@ -41,3 +41,11 @@ select get_raw_page('test_partitioned', 0); -- error about partitioned table
 create table test_part1 partition of test_partitioned for values from ( 1 ) to (100);
 select get_raw_page('test_part1', 0); -- get farther and error about empty table
 drop table test_partitioned;
+
+-- check null bitmap alignment for a table whose number of attributes is a multiple of 8
+create table test8 (f1 int, f2 int, f3 int, f4 int, f5 int, f6 int, f7 int, f8 

Re: Using ProcSignal to get memory context stats from a running backend

2017-12-27 Thread Maksim Milyutin

On 27.12.2017 10:44, Craig Ringer wrote:

On 22 December 2017 at 23:19, Maksim Milyutin <milyuti...@gmail.com> wrote:


Noticing the interest in calling some routines on a remote backend
through signals: in a parallel thread [1] I have proposed the
possibility of defining user-defined signal handlers in extensions.
There is a patch taken from the pg_query_state module.

1.
https://www.postgresql.org/message-id/3f905f10-cf7a-d4e0-64a1-7fd9b8351592%40gmail.com


Haven't read the patch, but the idea seems useful if a bit hairy in 
practice. It'd be done on a "use at your own risk, and if it breaks 
you get to keep the pieces" basis, where no guarantees are made by Pg 
about how and when the function is called so it must be utterly 
defensive. The challenge there being that you can't always learn 
enough about the state of the system without doing things that might 
break it, so lots of things you could do would be fragile.


Yes, I agree with your thesis that the state of the system has to be
checked/rechecked before manipulating it in a signal handler. And the
responsibility is on the extension developer to provide the necessary level
of defensiveness. Perhaps it makes sense to add a try-catch block around the
signal handler to intercept potential errors coming from it.
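For instance, the handler invocation could be wrapped like this (a sketch
using the standard PG_TRY machinery; the error-propagation policy is a design
decision, not settled here):

	PG_TRY();
	{
		handler(reason);	/* the extension-supplied custom signal handler */
	}
	PG_CATCH();
	{
		/* report the error but keep the interrupted backend alive */
		EmitErrorReport();
		FlushErrorState();
	}
	PG_END_TRY();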


--
Regards,
Maksim Milyutin



Re: PoC: custom signal handler for extensions

2017-12-23 Thread Maksim Milyutin

23.12.17 12:58, legrand legrand wrote:


+1
if this permits using the extension pg_query_state
(https://github.com/postgrespro/pg_query_state), that would be great!



Yes, the attached patch is the single critical point. It frees
pg_query_state from the need to patch the postgres core.


--
Regards,
Maksim Milyutin




Re: Using ProcSignal to get memory context stats from a running backend

2017-12-22 Thread Maksim Milyutin

On 22.12.2017 16:56, Craig Ringer wrote:

On 22 December 2017 at 20:50, Maksim Milyutin <milyuti...@gmail.com> wrote:


On 19.12.2017 16:54, Pavel Stehule wrote:


Sorry for the small offtopic: can this mechanism be used to log the
executed plan or the full query?



That's a really good idea. I'd love to be able to pg_explain_backend(...)

I left the mechanism as a generic diagnostic signal exactly so that we 
could add other things we wanted to be able to ask backends. I think a 
follow-on patch that adds support for dumping explain-format plans 
would be great, if it's practical to do that while a query's already 
running.


Noticing the interest in calling some routines on a remote backend through
signals: in a parallel thread [1] I have proposed the possibility of
defining user-defined signal handlers in extensions. There is a patch taken
from the pg_query_state module.


1. 
https://www.postgresql.org/message-id/3f905f10-cf7a-d4e0-64a1-7fd9b8351592%40gmail.com


--
Regards,
Maksim Milyutin



Re: Using ProcSignal to get memory context stats from a running backend

2017-12-22 Thread Maksim Milyutin

On 19.12.2017 16:54, Pavel Stehule wrote:


Hi

2017-12-19 14:44 GMT+01:00 Craig Ringer <cr...@2ndquadrant.com>:


On 18 December 2017 at 10:05, Robert Haas <robertmh...@gmail.com> wrote:

On Thu, Dec 14, 2017 at 9:34 PM, Craig Ringer <cr...@2ndquadrant.com> wrote:
> On 15 December 2017 at 09:24, Greg Stark <st...@mit.edu> wrote:
>> Another simpler option would be to open up a new file in the log
>> directory
>
> ... if we have one.
>
> We might be logging to syslog, or whatever else.
>
> I'd rather keep it simple(ish).

+1.  I still think just printing it to the log is fine.


Here we go. Implemented pretty much as outlined above. A new
pg_diag_backend(pid) function sends a new
ProcSignalReason PROCSIG_DIAG_REQUEST. It's handled by
CHECK_FOR_INTERRUPTS() and logs MemoryContextStats() output to
elog(...).

I didn't want to mess with the MemoryContextMethods and expose a
printf-wrapper style typedef in memnodes.h, so I went with a hook
global. It's a diagnostic routine so I don't think that's going to
be a great bother. By default it's set to write_stderr. That just
writes to vfprintf on unix so the outcome's the same as our direct
use of fprintf now.

On Windows, though, using write_stderr will make Pg attempt to
write memory context dumps to the eventlog with ERROR level if
running as a service with no console. That means we vsnprintf the
string into a static buffer first. I'm not 100% convinced of the
wisdom of that because it could flood the event log, which is
usually size limited by # of events and recycled. It'd be better
to write one event than write one per memory context line, but
it's not safe to do that when OOM. I lean toward this being a win:
at least Windows users should have some hope of finding out why Pg
OOM'd, which currently they don't when it runs as a service. If we
don't, we should look at using SetStdHandle to write stderr to a
secondary logfile instead.

I'm happy to just add a trivial vfprintf wrapper so we preserve
exactly the same behaviour instead, but I figured I'd start with
reusing write_stderr.

I'd really like to preserve the writing to elog(...) not fprintf,
because on some systems simply finding where stderr is written can
be a challenge, if it's not going to /dev/null via some detached
session. Is it in journald? In some separate log? Being captured
by the logging collector (and probably silently eaten)? Who knows!

Using elog(...) provides a log_line_prefix and other useful info
to associate the dump with the process of interest and what it's
doing at the time.

Also, an astonishing number of deployed systems I've worked with
actually don't put the pid or anything useful in log_line_prefix
to make grouping up multi-line output practical. Which is insane.
But it's only PGC_SIGHUP so fixing it is easy enough.


sorry for small offtopic. Can be used this mechanism for log of 
executed plan or full query?


This idea (but without logging) is implemented in the work on the 
pg_progress command proposed by Remi Colinet[1] and in the extension 
pg_query_state[2].


1. 
https://www.postgresql.org/message-id/CADdR5nxQUSh5kCm9MKmNga8%2Bc1JLxLHDzLhAyXpfo9-Wmc6s5g%40mail.gmail.com

2. https://github.com/postgrespro/pg_query_state

--
Regards,
Maksim Milyutin



[HACKERS] PoC: custom signal handler for extensions

2017-12-22 Thread Maksim Milyutin

Hi, hackers!


I want to propose a patch that allows extensions to define custom signals 
and their handlers. It is based on the ProcSignal module: it defines a 
fixed set of custom signals (their number is specified by a constant) 
that can be reserved at the postgres initialization stage (in the _PG_init 
function of a shared preloaded module) through specific interface 
functions. The functions serving as custom signal handlers are defined in 
the extension. The mapping between custom signals and their assigned 
handlers (function addresses) is replicated from the postmaster to child 
processes, including working backends. Using these signals we are able to 
invoke a specific handler on a remote backend (via the SendProcSignal 
function); the handler actually comes into action at a 
CHECK_FOR_INTERRUPTS point.

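For illustration, extension-side usage might look like the following (a 
minimal sketch, not the attached example; it assumes ProcSignalHandler_type 
is a plain void-returning, zero-argument callback, as the handler table in 
the patch suggests, and that the module is loaded via 
shared_preload_libraries):

    #include "postgres.h"
    #include "fmgr.h"
    #include "miscadmin.h"
    #include "storage/procsignal.h"

    PG_MODULE_MAGIC;

    static ProcSignalReason my_reason = INVALID_PROCSIGNAL;

    /* Runs on the signalled backend at its next CHECK_FOR_INTERRUPTS point */
    static void
    my_custom_handler(void)
    {
        elog(LOG, "custom signal handled by PID %d", MyProcPid);
    }

    void
    _PG_init(void)
    {
        /* Reserve one of the fixed custom signal slots at startup */
        my_reason = RegisterCustomProcSignalHandler(my_custom_handler);
        if (my_reason == INVALID_PROCSIGNAL)
            elog(WARNING, "no free custom ProcSignal slot");
    }

Any backend could then poke a target process with 
SendProcSignal(pid, my_reason, backend_id), and the handler would fire on 
the target side at its next interrupt check.
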

Looking ahead, this mechanism provides a low-level instrument for defining 
remote procedure calls on the extension side. A simple RPC that reports 
the effective userid of a remote backend (the remote_effective_user 
function) is attached as an example.


Comments welcome!


--
Regards,
Maksim Milyutin

diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index b9302ac630..75d4ea72b7 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -27,6 +27,7 @@
 #include "storage/shmem.h"
 #include "storage/sinval.h"
 #include "tcop/tcopprot.h"
+#include "utils/memutils.h"
 
 
 /*
@@ -60,12 +61,17 @@ typedef struct
  */
 #define NumProcSignalSlots (MaxBackends + NUM_AUXPROCTYPES)
 
+static bool CustomSignalPendings[NUM_CUSTOM_PROCSIGNALS];
+static ProcSignalHandler_type CustomHandlers[NUM_CUSTOM_PROCSIGNALS];
+
 static ProcSignalSlot *ProcSignalSlots = NULL;
 static volatile ProcSignalSlot *MyProcSignalSlot = NULL;
 
 static bool CheckProcSignal(ProcSignalReason reason);
 static void CleanupProcSignalState(int status, Datum arg);
 
+static void CustomSignalInterrupt(ProcSignalReason reason);
+
 /*
  * ProcSignalShmemSize
  * Compute space needed for procsignal's shared memory
@@ -166,6 +172,70 @@ CleanupProcSignalState(int status, Datum arg)
 }
 
 /*
+ * RegisterCustomProcSignalHandler
+ * Assign specific handler of custom process signal with new ProcSignalReason key
+ *
+ * Return INVALID_PROCSIGNAL if all custom signals have been assigned.
+ */
+ProcSignalReason
+RegisterCustomProcSignalHandler(ProcSignalHandler_type handler)
+{
+   ProcSignalReason reason;
+
+   /* iterate through custom signal keys to find free spot */
+   for (reason = PROCSIG_CUSTOM_1; reason <= PROCSIG_CUSTOM_N; reason++)
+   if (!CustomHandlers[reason - PROCSIG_CUSTOM_1])
+   {
+   CustomHandlers[reason - PROCSIG_CUSTOM_1] = handler;
+   return reason;
+   }
+   return INVALID_PROCSIGNAL;
+}
+
+/*
+ * ReleaseCustomProcSignal
+ *  Release slot of specific custom signal
+ */
+void
+ReleaseCustomProcSignal(ProcSignalReason reason)
+{
+   CustomHandlers[reason - PROCSIG_CUSTOM_1] = NULL;
+}
+
+/*
+ * AssignCustomProcSignalHandler
+ * Assign handler of custom process signal with specific ProcSignalReason key
+ *
+ * Return old ProcSignal handler.
+ * Assume incoming reason is one of custom ProcSignals.
+ */
+ProcSignalHandler_type
+AssignCustomProcSignalHandler(ProcSignalReason reason, ProcSignalHandler_type handler)
+{
+   ProcSignalHandler_type old;
+
+   Assert(reason >= PROCSIG_CUSTOM_1 && reason <= PROCSIG_CUSTOM_N);
+
+   old = CustomHandlers[reason - PROCSIG_CUSTOM_1];
+   CustomHandlers[reason - PROCSIG_CUSTOM_1] = handler;
+   return old;
+}
+
+/*
+ * GetCustomProcSignalHandler
+ * Get handler of custom process signal
+ *
+ * Assume incoming reason is one of custom ProcSignals.
+ */
+ProcSignalHandler_type
+GetCustomProcSignalHandler(ProcSignalReason reason)
+{
+   Assert(reason >= PROCSIG_CUSTOM_1 && reason <= PROCSIG_CUSTOM_N);
+
+   return CustomHandlers[reason - PROCSIG_CUSTOM_1];
+}
+
+/*
  * SendProcSignal
  * Send a signal to a Postgres process
  *
@@ -260,7 +330,8 @@ CheckProcSignal(ProcSignalReason reason)
 void
 procsignal_sigusr1_handler(SIGNAL_ARGS)
 {
-   int save_errno = errno;
+   int save_errno = errno;
+   ProcSignalReason reason;
 
if (CheckProcSignal(PROCSIG_CATCHUP_INTERRUPT))
HandleCatchupInterrupt();
@@ -292,9 +363,84 @@ procsignal_sigusr1_handler(SIGNAL_ARGS)
if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN))
RecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN);
 
+   for (reason = PROCSIG_CUSTOM_1; reason <= PROCSIG_CUSTOM_N; reason++)
+   if (CheckProcSignal(reason))
+   CustomSignalInterrupt(reason);
+
SetLatch(MyLatch);
 

[HACKERS] wrong t_bits alignment in pageinspect

2017-12-15 Thread Maksim Milyutin

Hi!


I found a problem in the values of the t_bits field exposed by the 
heap_page_items function.


When the number of attributes in a table is a multiple of eight, the 
t_bits column shows double the expected number of bits, and bits from the 
data fields are included in it.


For example:

# create table tbl(f1 int, f2 int, f3 int, f4 int, f5 int, f6 int, f7 
int, f8 int);

# insert into tbl(f1, f8) values (x'f1'::int, 0);

# select t_bits, t_data from heap_page_items(get_raw_page('tbl', 0));

  t_bits  | t_data
----------+--------
 10011000 | \xf100

I suppose the prefix 1001 corresponds to the real value of t_bits, and 
the remaining part 1000 comes from the lower byte of the f1 field of tbl.

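To see the off-by-one byte concretely, here is a quick hypothetical 
standalone check of the two bit-length expressions (TYPEALIGN is the same 
rounding macro as in PostgreSQL's c.h; nothing below is part of the patch 
itself):

    #include <stdio.h>
    #include <stdint.h>

    /* Same rounding macro as in PostgreSQL's c.h */
    #define TYPEALIGN(ALIGNVAL, LEN) \
        (((uintptr_t) (LEN) + ((ALIGNVAL) - 1)) & \
         ~((uintptr_t) ((ALIGNVAL) - 1)))

    int
    main(void)
    {
        for (int natts = 7; natts <= 9; natts++)
            printf("natts=%d old=%d new=%lu\n",
                   natts,
                   (natts / 8 + 1) * 8,                  /* buggy bits_len */
                   (unsigned long) TYPEALIGN(8, natts)); /* fixed bits_len */
        return 0;
    }

It prints old=16 but new=8 for natts=8: the old expression always adds a 
whole extra byte when natts is already a multiple of eight, so eight bits 
of tuple data get rendered as part of t_bits.
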


The attached patch fixes this issue.


--
Regards,
Maksim Milyutin

diff --git a/contrib/pageinspect/heapfuncs.c b/contrib/pageinspect/heapfuncs.c
index ca4d3f5..fe53a1a 100644
--- a/contrib/pageinspect/heapfuncs.c
+++ b/contrib/pageinspect/heapfuncs.c
@@ -234,7 +234,7 @@ heap_page_items(PG_FUNCTION_ARGS)
 	int			bits_len;
 
 	bits_len =
-		((tuphdr->t_infomask2 & HEAP_NATTS_MASK) / 8 + 1) * 8;
+		TYPEALIGN(8, (tuphdr->t_infomask2 & HEAP_NATTS_MASK));
 	values[11] = CStringGetTextDatum(
 	 bits_to_text(tuphdr->t_bits, bits_len));
 }


Re: [HACKERS] Add support for tuple routing to foreign partitions

2017-12-12 Thread Maksim Milyutin

Hi, Fujita-san!


On 24.11.2017 16:03, Etsuro Fujita wrote:

(2017/10/27 20:00), Etsuro Fujita wrote:

Please find attached an updated version of the patch.


Amit rebased this patch and sent me the rebased version off list. 
Thanks for that, Amit!


One thing I noticed I overlooked is about this change I added to 
make_modifytable to make a valid-looking plan node to pass to 
PlanForeignModify to plan remote insert to each foreign partition:


+   /*
+    * The column list of the child might have a different column
+    * order and/or a different set of dropped columns than that
+    * of its parent, so adjust the subplan's tlist as well.
+    */
+   tlist = preprocess_targetlist(root,
+                                 child_parse->targetList);

This would be needed because the FDW might reference the tlist. Since 
preprocess_targetlist references root->parse, it's needed to replace 
that with the child query before calling that function, but I forgot 
to do that.  So I fixed that.  Attached is an updated version of the 
patch.


Your patch no longer applies on master. Please rebase it.

--
Regards,
Maksim Milyutin