Re: wrong comment in libpq.h

2024-05-10 Thread David Zhang



It looks like this wording "prototypes for functions in" is used many 
times in src/include/, in many cases equally inaccurately, so I would 
suggest creating a more comprehensive patch for this.


I noticed this "prototypes for functions in" wording in many header files and 
briefly checked them. It mostly all makes sense, except that bufmgr.h has 
something like the following.


/* in buf_init.c */
extern void InitBufferPool(void);
extern Size BufferShmemSize(void);

/* in localbuf.c */
extern void AtProcExit_LocalBuffers(void);

/* in freelist.c */

which doesn't say "prototypes for functions in xxx", but it still makes 
sense to me.



The main confusing part is in libpq.h.

/*
 * prototypes for functions in be-secure.c
 */
extern PGDLLIMPORT char *ssl_library;
extern PGDLLIMPORT char *ssl_cert_file;
extern PGDLLIMPORT char *ssl_key_file;
extern PGDLLIMPORT char *ssl_ca_file;
extern PGDLLIMPORT char *ssl_crl_file;
extern PGDLLIMPORT char *ssl_crl_dir;
extern PGDLLIMPORT char *ssl_dh_params_file;
extern PGDLLIMPORT char *ssl_passphrase_command;
extern PGDLLIMPORT bool ssl_passphrase_command_supports_reload;
#ifdef USE_SSL
extern PGDLLIMPORT bool ssl_loaded_verify_locations;
#endif

If we delete the comment and move the variable declarations to the /* 
GUCs */ section, then it should be more consistent.


/* GUCs */
extern PGDLLIMPORT char *SSLCipherSuites;
extern PGDLLIMPORT char *SSLECDHCurve;
extern PGDLLIMPORT bool SSLPreferServerCiphers;
extern PGDLLIMPORT int ssl_min_protocol_version;
extern PGDLLIMPORT int ssl_max_protocol_version;

One more argument for my previous patch is that, with this minor change, 
the declarations better align with the parameters in postgresql.conf.


# - SSL -

#ssl = off
#ssl_ca_file = ''
#ssl_cert_file = 'server.crt'
#ssl_crl_file = ''
#ssl_crl_dir = ''
#ssl_key_file = 'server.key'
#ssl_ciphers = 'HIGH:MEDIUM:+3DES:!aNULL'    # allowed SSL ciphers
#ssl_prefer_server_ciphers = on
#ssl_ecdh_curve = 'prime256v1'
#ssl_min_protocol_version = 'TLSv1.2'
#ssl_max_protocol_version = ''
#ssl_dh_params_file = ''
#ssl_passphrase_command = ''
#ssl_passphrase_command_supports_reload = off


best regards,

David






Re: wrong comment in libpq.h

2024-05-03 Thread David Zhang

On 2024-05-03 4:53 a.m., Daniel Gustafsson wrote:

On 3 May 2024, at 13:48, Peter Eisentraut  wrote:
Maybe it's easier if we just replaced

prototypes for functions in xxx.c

with

declarations for xxx.c

throughout src/include/libpq/libpq.h.

+1

+1


--
Daniel Gustafsson


David




wrong comment in libpq.h

2024-05-02 Thread David Zhang

Hi Hackers,

There is a comment like below in src/include/libpq/libpq.h,

 /*
  * prototypes for functions in be-secure.c
  */
extern PGDLLIMPORT char *ssl_library;
extern PGDLLIMPORT char *ssl_cert_file;

...

However, 'ssl_library', 'ssl_cert_file' and the rest are global 
parameter settings, not functions. To address this confusion, it would 
be better to move all global configuration settings to the proper 
section, such as /* GUCs */, to maintain consistency.


I have attached an attempt to help address this issue.


Thank you,

David
From 9154102c34ec0c4c956d25942b8ea600dd740a07 Mon Sep 17 00:00:00 2001
From: David Zhang 
Date: Thu, 2 May 2024 15:15:13 -0700
Subject: [PATCH] fix the wrong comment to keep the consistency

---
 src/include/libpq/libpq.h | 26 +-
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/src/include/libpq/libpq.h b/src/include/libpq/libpq.h
index 83e338f604..bece50b69c 100644
--- a/src/include/libpq/libpq.h
+++ b/src/include/libpq/libpq.h
@@ -86,19 +86,6 @@ extern bool pq_check_connection(void);
 /*
  * prototypes for functions in be-secure.c
  */
-extern PGDLLIMPORT char *ssl_library;
-extern PGDLLIMPORT char *ssl_cert_file;
-extern PGDLLIMPORT char *ssl_key_file;
-extern PGDLLIMPORT char *ssl_ca_file;
-extern PGDLLIMPORT char *ssl_crl_file;
-extern PGDLLIMPORT char *ssl_crl_dir;
-extern PGDLLIMPORT char *ssl_dh_params_file;
-extern PGDLLIMPORT char *ssl_passphrase_command;
-extern PGDLLIMPORT bool ssl_passphrase_command_supports_reload;
-#ifdef USE_SSL
-extern PGDLLIMPORT bool ssl_loaded_verify_locations;
-#endif
-
 extern int secure_initialize(bool isServerStart);
 extern bool secure_loaded_verify_locations(void);
 extern void secure_destroy(void);
@@ -117,6 +104,19 @@ extern ssize_t secure_open_gssapi(Port *port);
 #endif
 
 /* GUCs */
+extern PGDLLIMPORT char *ssl_library;
+extern PGDLLIMPORT char *ssl_cert_file;
+extern PGDLLIMPORT char *ssl_key_file;
+extern PGDLLIMPORT char *ssl_ca_file;
+extern PGDLLIMPORT char *ssl_crl_file;
+extern PGDLLIMPORT char *ssl_crl_dir;
+extern PGDLLIMPORT char *ssl_dh_params_file;
+extern PGDLLIMPORT char *ssl_passphrase_command;
+extern PGDLLIMPORT bool ssl_passphrase_command_supports_reload;
+#ifdef USE_SSL
+extern PGDLLIMPORT bool ssl_loaded_verify_locations;
+#endif
+
 extern PGDLLIMPORT char *SSLCipherSuites;
 extern PGDLLIMPORT char *SSLECDHCurve;
 extern PGDLLIMPORT bool SSLPreferServerCiphers;
-- 
2.34.1



Re: enhance the efficiency of migrating particularly large tables

2024-05-02 Thread David Zhang

Thanks a lot, David Rowley, for your detailed suggestion.

On 2024-04-08 3:23 p.m., David Rowley wrote:

On Tue, 9 Apr 2024 at 09:52, David Zhang  wrote:
Finding the exact ctid seems overkill for what you need.  Why not just
find the maximum block with:

N = pg_relation_size('name_of_your_table'::regclass) /
current_setting('block_size')::int;

and do WHERE ctid < '(N,1)';
We experimented with this approach using pg_relation_size and compared 
the performance. Below are some simple timing results for 100 million 
records in a table:


Using system function max():
SELECT max(ctid) from t;
Time: 2126.680 ms (00:02.127)

Using pg_relation_size and where condition:
SELECT pg_relation_size('t'::regclass) / current_setting('block_size')::int;
Time: 0.561 ms

Using the experimental function introduced in previous patch:
SELECT ctid from get_ctid('t', 1);
Time: 0.452 ms


After deleting about 1/3 of the records from the end of the table:
SELECT max(ctid) from t;
Time: 1552.975 ms (00:01.553)

SELECT pg_relation_size('t'::regclass) / current_setting('block_size')::int;
Time: 0.533 ms
But before vacuum, pg_relation_size always returns the same value as 
before, so the relation size may not be accurate.


SELECT ctid from get_ctid('t', 1);
Time: 251.105 ms

After vacuum:
SELECT ctid from get_ctid('t', 1);
Time: 0.478 ms


Below is a comparison between the system function min() and the 
experimental function:


SELECT min(ctid) from t;
Time: 1932.554 ms (00:01.933)

SELECT ctid from get_ctid('t', 0);
Time: 0.478 ms

After deleting about 1/3 of the records from the beginning of the table:
SELECT min(ctid) from t;
Time: 1305.799 ms (00:01.306)

SELECT ctid from get_ctid('t', 0);
Time: 244.336 ms

After vacuum:
SELECT ctid from get_ctid('t', 0);
Time: 0.468 ms


If we wanted to optimise this in PostgreSQL, the way to do it would
be, around set_plain_rel_pathlist(), check if the relation's ctid is a
required PathKey by the same means as create_index_paths() does, then
if found, create another seqscan path without synchronize_seqscans *
and tag that with the ctid PathKey sending the scan direction
according to the PathKey direction. nulls_first does not matter since
ctid cannot be NULL.

Min(ctid) query should be able to make use of this as the planner
should rewrite those to subqueries with a ORDER BY ctid LIMIT 1.


Is there a simple way to get the minimum ctid faster than using min(), 
similar to how the maximum can be estimated using pg_relation_size?
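
One direction that might be worth exploring (purely a speculative sketch, not 
something the experimental get_ctid() function does) is to bound the scan with 
a ctid range condition so that only the first few blocks need to be read:

-- Speculative sketch: with a ctid range condition the planner can use a
-- TID Range Scan (PostgreSQL 14+), so only blocks 0..99 are read here.
-- The table name "t" and the 100-block bound are illustrative assumptions;
-- if those blocks are all empty this returns NULL and a larger bound is needed.
SELECT min(ctid) FROM t WHERE ctid < '(100,0)';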



Thank you,

David Zhang


enhance the efficiency of migrating particularly large tables

2024-04-08 Thread David Zhang

Hi Postgres hackers,

I'm reaching out to gather some comments on enhancing the efficiency of 
migrating particularly large tables with significant data volumes in 
PostgreSQL.


When migrating a particularly large table with a significant amount of 
data, users sometimes tend to split the table into multiple segments and 
utilize multiple sessions to process data from different segments in 
parallel, aiming to enhance efficiency. When segmenting a large table, 
it's challenging if the table lacks fields suitable for segmentation or 
if the data distribution is uneven. I believe that the data volume in 
each block should be relatively balanced when vacuum is enabled. 
Therefore, the ctid can be used to segment a large table, and I think 
the entire process can be outlined as follows:

1) determine the minimum and maximum ctid.
2) calculate the number of data blocks based on the maximum and minimum 
ctid.
3) generate multiple SQL queries, such as SELECT * FROM tbl WHERE ctid 
>= '(xx,1)' AND ctid < '(xxx,1)' (see the sketch below).
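
As an illustration only (not part of the attached POC), step 3 could be 
scripted roughly as follows, assuming a table named tbl and an arbitrary 
segment count of 4:

-- Rough sketch: derive the block count from the relation size and emit one
-- ctid-range query per segment.  The table name and segment count are
-- placeholders for illustration.
WITH params AS (
    SELECT pg_relation_size('tbl'::regclass) /
           current_setting('block_size')::int AS nblocks,
           4 AS nsegments
)
SELECT format('SELECT * FROM tbl WHERE ctid >= ''(%s,1)'' AND ctid < ''(%s,1)'';',
              nblocks * s / nsegments,
              nblocks * (s + 1) / nsegments)
FROM params, generate_series(0, nsegments - 1) AS s;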


However, when executing SELECT min(ctid) and max(ctid), it performs a 
Seq Scan, which can be slow for a large table. Is there a way to 
retrieve the minimum and maximum ctid other than using the system 
functions min() and max()?


Since ctids are ordered, retrieving the minimum ctid should, in theory, 
be able to start searching from the first block and stop as soon as the 
first visible tuple is found. Similarly, retrieving the maximum ctid 
should search in reverse order from the last block and stop upon finding 
the first occurrence. Here's a code snippet:


    /* scan the relation for minimum or maximum ctid */
    if (find_max_ctid)
        dir = BackwardScanDirection;
    else
        dir = ForwardScanDirection;

    while ((tuple = heap_getnext(scan, dir)) != NULL)
    ...

Attached is a simple POC modeled on the pgstattuple extension. Any 
feedback, suggestions, or alternative solutions from the community 
would be greatly appreciated.


Thank you,

David

#include "postgres.h"
#include "fmgr.h"

#include "funcapi.h"
#include "miscadmin.h"
#include "nodes/primnodes.h"
#include "utils/relcache.h"
#include "catalog/namespace.h"
#include "catalog/pg_am_d.h"
#include "utils/varlena.h"
#include "storage/bufmgr.h"
#include "utils/snapmgr.h"
#include "access/heapam.h"
#include "access/relscan.h"
#include "access/tableam.h"


PG_MODULE_MAGIC;
PG_FUNCTION_INFO_V1(get_ctid);


Datum
get_ctid(PG_FUNCTION_ARGS)
{
boolfind_max_ctid = 0;
text*relname;
RangeVar*relrv;
Relationrel;
charctid[32] = {0};

if (PG_ARGISNULL(0) || PG_ARGISNULL(1))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 errmsg("two parameters are required")));

relname = PG_GETARG_TEXT_PP(0);
find_max_ctid = PG_GETARG_UINT32(1);

/* open relation */
relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname));
rel = relation_openrv(relrv, AccessShareLock);

if (RELATION_IS_OTHER_TEMP(rel))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 errmsg("cannot access temporary tables of 
other sessions")));

if (RELKIND_HAS_TABLE_AM(rel->rd_rel->relkind) ||
rel->rd_rel->relkind == RELKIND_SEQUENCE)
{
TableScanDesc scan;
HeapScanDesc hscan;
HeapTuple   tuple;
SnapshotData SnapshotDirty;
BlockNumber blockNumber;
OffsetNumber offsetNumber;
ScanDirection dir;

if (rel->rd_rel->relam != HEAP_TABLE_AM_OID)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 errmsg("only heap AM is supported")));

/* Disable syncscan because we assume we scan from block zero upwards */
scan = table_beginscan_strat(rel, SnapshotAny, 0, NULL, true, false);
hscan = (HeapScanDesc) scan;

InitDirtySnapshot(SnapshotDirty);

/* scan the relation for minimum or maximum ctid */
if (find_max_ctid)
dir = BackwardScanDirection;
else
dir = ForwardScanDirection;

while ((tuple = heap_getnext(scan, dir)) != NULL)
{
CHECK_FOR_INTERRUPTS();

/* must hold a buffer lock to call HeapTupleSatisfiesVisibility */
LockBuffer(hscan->rs_cbuf, BUFFER_LOCK_SHARE);

if (HeapTupleSatisfiesVisibility(tuple, &SnapshotDirty, hscan->rs_cbuf))

improve ssl error code, 2147483650

2024-03-06 Thread David Zhang

Hi Hackers,

When configuring SSL on the Postgres server side with the following 
information:


ssl = on
ssl_ca_file = 'root_ca.crt'
ssl_cert_file = 'server-cn-only.crt'
ssl_key_file = 'server-cn-only.key'

If a user makes a mistake, for example, accidentally using 'root_ca.crl' 
instead of 'root_ca.crt', Postgres will report an error like the one below:
FATAL:  could not load root certificate file "root_ca.crl": SSL error 
code 2147483650


Here, the error code 2147483650 is not very helpful for the user. The 
reason is that Postgres does not perform an initial check on the SSL file 
before passing it to the OpenSSL API. For example:


    if (ssl_ca_file[0])
    {
    STACK_OF(X509_NAME) * root_cert_list;

    if (SSL_CTX_load_verify_locations(context, ssl_ca_file, NULL) != 1 ||
        (root_cert_list = SSL_load_client_CA_file(ssl_ca_file)) == NULL)
    {
        ereport(isServerStart ? FATAL : LOG,
                (errcode(ERRCODE_CONFIG_FILE_ERROR),
                 errmsg("could not load root certificate file \"%s\": %s",
                        ssl_ca_file, SSLerrmessage(ERR_get_error()))));
        goto error;
    }

When there is a system error such as "No such file or directory", OpenSSL 
cannot provide a reason string; ERR_reason_error_string() returns NULL in 
that case:


const char *ERR_reason_error_string(unsigned long e)
{
    ERR_STRING_DATA d, *p = NULL;
    unsigned long l, r;

    if (!RUN_ONCE(&err_string_init, do_err_strings_init)) {
    return NULL;
    }

    /*
 * ERR_reason_error_string() can't safely return system error strings,
 * since openssl_strerror_r() needs a buffer for thread safety, and we
 * haven't got one that would serve any sensible purpose.
 */
    if (ERR_SYSTEM_ERROR(e))
    return NULL;

It would be better to perform a simple SSL file check before passing the 
SSL file to OpenSSL APIs so that the system error can be captured and a 
meaningful message provided to the end user.


Attached is a simple patch to help address this issue for ssl_ca_file, 
ssl_cert_file, and ssl_crl_file. With this patch, a similar test will 
return something like the message below:
FATAL:  could not access certificate file "root_ca.crl": No such file or 
directory


I believe this can help end users quickly realize the mistake.


Thank you,
David
From bb6264791aab6a3e8150704fc8f1c8774c27018d Mon Sep 17 00:00:00 2001
From: David Zhang 
Date: Wed, 6 Mar 2024 15:51:11 -0800
Subject: [PATCH] improve ssl files error code

---
 src/backend/libpq/be-secure-common.c  | 19 +++
 src/backend/libpq/be-secure-openssl.c | 10 ++
 src/include/libpq/libpq.h |  1 +
 3 files changed, 30 insertions(+)

diff --git a/src/backend/libpq/be-secure-common.c 
b/src/backend/libpq/be-secure-common.c
index 0582606192..01d567cbfc 100644
--- a/src/backend/libpq/be-secure-common.c
+++ b/src/backend/libpq/be-secure-common.c
@@ -102,7 +102,26 @@ error:
return len;
 }
 
+/*
+ * Check SSL certificate files.
+ */
+bool
+check_ssl_file(const char *ssl_file, bool isServerStart)
+{
+   int loglevel = isServerStart ? FATAL : LOG;
+   struct stat buf;
+
+   if (stat(ssl_file, &buf) != 0)
+   {
+   ereport(loglevel,
+   (errcode_for_file_access(),
+errmsg("could not access certificate file 
\"%s\": %m",
+   ssl_file)));
+   return false;
+   }
 
+   return true;
+}
 /*
  * Check permissions for SSL key files.
  */
diff --git a/src/backend/libpq/be-secure-openssl.c 
b/src/backend/libpq/be-secure-openssl.c
index e12b1cc9e3..c5d58545d9 100644
--- a/src/backend/libpq/be-secure-openssl.c
+++ b/src/backend/libpq/be-secure-openssl.c
@@ -144,6 +144,10 @@ be_tls_init(bool isServerStart)
/*
 * Load and verify server's certificate and private key
 */
+
+   if (!check_ssl_file(ssl_cert_file, isServerStart))
+   goto error;
+
if (SSL_CTX_use_certificate_chain_file(context, ssl_cert_file) != 1)
{
ereport(isServerStart ? FATAL : LOG,
@@ -297,6 +301,9 @@ be_tls_init(bool isServerStart)
{
STACK_OF(X509_NAME) * root_cert_list;
 
+   if (!check_ssl_file(ssl_ca_file, isServerStart))
+   goto error;
+
if (SSL_CTX_load_verify_locations(context, ssl_ca_file, NULL) 
!= 1 ||
(root_cert_list = SSL_load_client_CA_file(ssl_ca_file)) 
== NULL)
{
@@ -336,6 +343,9 @@ be_tls_init(bool isServerStart)
{
X509_STORE *cvstore = SSL_CTX_get_cert_store(context);
 
+   if (ssl_crl_file[0] && !check_ssl_file(ssl_crl_file, 
isServerStart))
+   goto error;
+
if (cvstore)
  

Re: Proposal for implementing OCSP Stapling in PostgreSQL

2024-03-05 Thread David Zhang

Hi Hackers,

This is the third version of the patch for "Certificate status check using 
OCSP Stapling", with SSL regression test cases added.


Here is how I run the ssl regression test:
    ./configure --enable-tap-tests --with-openssl
    make -j
    cd src/test/ssl
    make sslfiles
    make check PG_TEST_EXTRA=ssl

expected results:
    # +++ tap check in src/test/ssl +++
    t/001_ssltests.pl .. ok
    t/002_scram.pl . ok
    t/003_sslinfo.pl ... ok
    All tests successful.
    Files=3, Tests=279, 17 wallclock secs ( 0.05 usr  0.01 sys + 2.32 
cusr  2.16 csys =  4.54 CPU)


    Result: PASS

Note: before executing the SSL regression tests with the command `make 
check PG_TEST_EXTRA=ssl`, it is necessary to wait for 1 minute after 
running `make sslfiles`. This delay is required because the newly 
generated OCSP responses for the 'expired' test cases need 1 minute to 
pass the nextUpdate period. Once the stapled OCSP response files for the 
tests are committed as test input, there is no need to wait, similar to 
certificate files.


Any comments or feedback would be greatly appreciated!

Thank you,

DavidFrom 99fc46ed0bf05eedbe7539890d946db472617150 Mon Sep 17 00:00:00 2001
From: David Zhang 
Date: Tue, 5 Mar 2024 15:31:22 -0800
Subject: [PATCH 1/3] support certificate status check using OCSP stapling

---
 src/backend/libpq/be-secure-openssl.c |  87 
 src/backend/libpq/be-secure.c |   1 +
 src/backend/utils/misc/guc_tables.c   |  10 +
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/include/libpq/libpq.h |   1 +
 src/interfaces/libpq/fe-connect.c |  37 
 src/interfaces/libpq/fe-secure-openssl.c  | 198 ++
 src/interfaces/libpq/libpq-int.h  |   1 +
 8 files changed, 336 insertions(+)

diff --git a/src/backend/libpq/be-secure-openssl.c 
b/src/backend/libpq/be-secure-openssl.c
index e12b1cc9e3..c727634dfa 100644
--- a/src/backend/libpq/be-secure-openssl.c
+++ b/src/backend/libpq/be-secure-openssl.c
@@ -50,6 +50,7 @@
 #include 
 #endif
 #include 
+#include 
 
 
 /* default init hook can be overridden by a shared library */
@@ -81,6 +82,8 @@ static bool ssl_is_server_start;
 static int ssl_protocol_version_to_openssl(int v);
 static const char *ssl_protocol_version_to_string(int v);
 
+static int ocsp_stapling_cb(SSL *ssl);
+
 /* for passing data back from verify_cb() */
 static const char *cert_errdetail;
 
@@ -429,6 +432,9 @@ be_tls_open_server(Port *port)
return -1;
}
 
+   /* set up OCSP stapling callback */
+   SSL_CTX_set_tlsext_status_cb(SSL_context, ocsp_stapling_cb);
+
/* set up debugging/info callback */
SSL_CTX_set_info_callback(SSL_context, info_cb);
 
@@ -1653,3 +1659,84 @@ default_openssl_tls_init(SSL_CTX *context, bool 
isServerStart)
SSL_CTX_set_default_passwd_cb(context, 
dummy_ssl_passwd_cb);
}
 }
+
+/*
+ * OCSP stapling callback function for the server side.
+ *
+ * This function is responsible for providing the OCSP stapling response to
+ * the client during the SSL/TLS handshake, based on the client's request.
+ *
+ * Parameters:
+ *   - ssl: SSL/TLS connection object.
+ *
+ * Returns:
+ *   - SSL_TLSEXT_ERR_OK: OCSP stapling response successfully provided.
+ *   - SSL_TLSEXT_ERR_NOACK: OCSP stapling response not provided due to errors.
+ *
+ * Steps:
+ *   1. Check if the server-side OCSP stapling feature is enabled.
+ *   2. Read OCSP response from file if client requested OCSP stapling.
+ *   3. Set the OCSP stapling response in the SSL/TLS connection.
+ */
+static int ocsp_stapling_cb(SSL *ssl)
+{
+   int resp_len = -1;
+   BIO *bio = NULL;
+   OCSP_RESPONSE   *resp = NULL;
+   unsigned char   *rspder = NULL;
+
+   /* return, if ssl_ocsp_file not enabled on server */
+   if (ssl_ocsp_file == NULL)
+   {
+   ereport(WARNING,
+   (errcode(ERRCODE_CONFIG_FILE_ERROR),
+errmsg("could not find ssl_ocsp_file")));
+   return SSL_TLSEXT_ERR_NOACK;
+   }
+
+   /* whether the client requested OCSP stapling */
+   if (SSL_get_tlsext_status_type(ssl) == TLSEXT_STATUSTYPE_ocsp)
+   {
+   bio = BIO_new_file(ssl_ocsp_file, "r");
+   if (bio == NULL)
+   {
+   ereport(WARNING,
+   (errcode(ERRCODE_CONFIG_FILE_ERROR),
+errmsg("could not read 
ssl_ocsp_file")));
+   return SSL_TLSEXT_ERR_NOACK;
+   }
+
+   resp = d2i_OCSP_RESPONSE_bio(bio, NULL);
+   BIO_free(bio);
+   if (resp == NULL)
+   {
+   ereport(WARNING,
+  

Wrong description in server_ca.config and client_ca.config

2024-02-27 Thread David Zhang

Hi Hackers,

The current descriptions for server_ca.config and client_ca.config are 
not so accurate. For example, one of the descriptions in 
server_ca.config states, "This certificate is used to sign server 
certificates. It is self-signed." However, the server_ca.crt and 
client_ca.crt are actually signed by the root_ca.crt, which is the only 
self-signed certificate. Therefore, it would be more accurate to change 
it to "This certificate is used to sign server certificates. It is an 
Intermediate CA."


Attached is a patch attempting to fix the description issue.

Best regards,

David
From ddc07447152331c09daecf0202178cfe77a817a9 Mon Sep 17 00:00:00 2001
From: David Zhang 
Date: Tue, 27 Feb 2024 10:06:18 -0800
Subject: [PATCH] correct description for server_ca and client_ca

---
 src/test/ssl/conf/client_ca.config | 8 +---
 src/test/ssl/conf/server_ca.config | 8 +---
 2 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/src/test/ssl/conf/client_ca.config 
b/src/test/ssl/conf/client_ca.config
index 5990f06000..08365aac95 100644
--- a/src/test/ssl/conf/client_ca.config
+++ b/src/test/ssl/conf/client_ca.config
@@ -1,7 +1,9 @@
-# An OpenSSL format CSR config file for creating the client root certificate.
-# This configuration file is also used when operating the CA.
+# An OpenSSL format CSR config file for creating the client Intermediate
+# Certificate Authority. This configuration file is also used when operating
+# the CA.
 #
-# This certificate is used to sign client certificates. It is self-signed.
+# This certificate is used to sign client certificates. It is an Intermediate
+# CA.
 
 [ req ]
 distinguished_name = req_distinguished_name
diff --git a/src/test/ssl/conf/server_ca.config 
b/src/test/ssl/conf/server_ca.config
index 496aaba29f..15f8d1590f 100644
--- a/src/test/ssl/conf/server_ca.config
+++ b/src/test/ssl/conf/server_ca.config
@@ -1,7 +1,9 @@
-# An OpenSSL format CSR config file for creating the server root certificate.
-# This configuration file is also used when operating the CA.
+# An OpenSSL format CSR config file for creating the server Intermediate
+# Certificate Authority. This configuration file is also used when operating
+# the CA.
 #
-# This certificate is used to sign server certificates. It is self-signed.
+# This certificate is used to sign server certificates. It is an Intermediate
+# CA.
 
 [ req ]
 distinguished_name = req_distinguished_name
-- 
2.34.1



Re: Proposal for implementing OCSP Stapling in PostgreSQL

2024-02-23 Thread David Zhang

Hi Hackers,

This is the 2nd version of the patch with the following updates:

1) Changed the frontend SSL parameter from `ssl_ocsp_stapling` to 
`sslocspstapling` to align with other SSL parameters.


2) Documented both the backend parameter `ssl_ocsp_file` and the 
frontend parameter `sslocspstapling`.


3) Implemented a check for the `nextUpdate` field in the OCSP response. 
If it is present but expired, the TLS connection will fail.


To add the test cases for OCSP Stapling, I think I should 1) add one 
section to conf/cas.config to generate `ocsp_ca.crt`; 2) use this 
`ocsp_ca` to sign some OCSP responses for server side certificates with 
`good`, `revoked` and `unknown`, and then 3) add some test cases to 
t/001_ssltests.pl.


Any comments or feedback would be greatly appreciated!

Thank you,

David


On 2024-02-05 3:51 p.m., David Zhang wrote:

Hello PostgreSQL Hackers,

This proposal suggests implementing OCSP Stapling in PostgreSQL as an 
alternative and more efficient method for checking certificate 
revocation, aligning with the trend away from Certificate Revocation 
Lists (CRLs).



1. benefits
OCSP Stapling offers several advantages over traditional CRL checks, 
including:


*) enhances user trust and real-time certificate verification without 
relying on potentially outdated CRLs.
*) helps address privacy concerns associated with traditional OCSP 
checks, where the client contacts the OCSP responder directly.
*) reduces latency by eliminating the need for the client to perform 
an additional round-trip to the OCSP responder.
*) efficient resource utilization by allowing the server to cache and 
reuse OCSP responses.




2. a POC patch with below changes:
*) a new configuration option 'ssl_ocsp_file' to enable/disable OCSP 
Stapling and specify OCSP responses for PostgreSQL servers. For 
instance, ssl_ocsp_file = '_server.resp'


*) a server-side callback function responsible for generating OCSP 
stapling responses. This function comes into play only when a client 
requests the server's certificate status during the SSL/TLS handshake.


*) a new connection parameter 'ssl_ocsp_stapling' on the client side. 
For example, when 'ssl_ocsp_stapling=1', the psql client will send a 
certificate status request to the PostgreSQL server.


*) a client-side callback function within the libpq interface to 
validate and check the stapled OCSP response received from the server. 
If the server's certificate status is valid, the TLS handshake 
continues; otherwise, the connection is rejected.




3.  test cases for 'make check' are not yet ready as they could be 
complicated, but basic tests can be performed as demonstrated below:
To run the tests, OpenSSL tools are required to simulate the OCSP 
responder for generating OCSP responses. Additionally, initial 
certificate generation, including a self-signed root CA, OCSP response 
signing certificate, and PostgreSQL server certificate, is needed.


*) add ocsp attributes to openssl.cnf
$ openssl version
OpenSSL 3.0.2 15 Mar 2022 (Library: OpenSSL 3.0.2 15 Mar 2022)

$ diff openssl-ocsp.cnf /etc/ssl/openssl.cnf
204d203
< authorityInfoAccess = OCSP;URI:http://127.0.0.1:6655
232,235d230
< [ v3_ocsp ]
< basicConstraints = CA:FALSE
< keyUsage = nonRepudiation, digitalSignature, keyEncipherment
< extendedKeyUsage = OCSPSigning
255c250
< keyUsage = critical, cRLSign, digitalSignature, keyCertSign
---
>


*) prepare OCSP responder for generating OCSP response
$ mkdir -p demoCA/newcerts
$ touch demoCA/index.txt
$ echo '01' > demoCA/serial

# create a self-signed root CA
$ openssl req -new -nodes -out rootCA.csr -keyout rootCA.key -subj 
"/C=CA/ST=BC/L=VAN/O=IDO/OU=DEV/CN=rootCA"
$ openssl x509 -req -in rootCA.csr -days 3650 -extfile 
openssl-ocsp.cnf -extensions v3_ca -signkey rootCA.key -out rootCA.crt


# create a certificate for OCSP responder
$ openssl req -new -nodes -out ocspSigning.csr -keyout ocspSigning.key 
-subj "/C=CA/ST=BC/L=VAN/O=IDO/OU=DEV/CN=ocspSigner"
$ openssl ca -keyfile rootCA.key -cert rootCA.crt -in ocspSigning.csr 
-out ocspSigning.crt -config openssl-ocsp.cnf -extensions v3_ocsp

    Sign the certificate? [y/n]:y
    1 out of 1 certificate requests certified, commit? [y/n]y

# create a certificate for PostgreSQL server
$ openssl req -new -nodes -out server.csr -keyout server.key -subj 
"/C=CA/ST=BC/L=VAN/O=IDO/OU=DEV/CN=server"
$ openssl ca -batch -days 365 -keyfile rootCA.key -cert rootCA.crt 
-config openssl-ocsp.cnf -out server.crt -infiles server.csr



# start OCSP responder
$ openssl ocsp -index demoCA/index.txt -port 6655 -rsigner 
ocspSigning.crt -rkey ocspSigning.key -CA rootCA.crt -text


# make sure PostgreSQL server's certificate is 'good'
$ openssl ocsp -issuer rootCA.crt -url http://127.0.0.1:6655 
-resp_text -noverify -cert server.crt


# generate OCSP response when certificate status is 'good' and save as 
_server.resp-good:
$ openssl ocsp -issuer rootCA.crt -cert server

Proposal for implementing OCSP Stapling in PostgreSQL

2024-02-05 Thread David Zhang
.resp-revoked' depending on the test case, for example:


listen_addresses = '*'
ssl = on
ssl_ca_file = 'rootCA.crt'
ssl_cert_file = 'server.crt'
ssl_key_file = 'server.key'
ssl_ocsp_file = '_server.resp'



*) test with psql client
3.1) PostgreSQL server's certificate status is 'good'
$ cp -pr _server.resp-good _server.resp
$ psql -d "sslmode=verify-ca sslrootcert=rootCA.crt user=david 
dbname=postgres ssl_ocsp_stapling=1" -h 127.0.0.1 -p 5432

psql (17devel)
SSL connection (protocol: TLSv1.2, cipher: ECDHE-RSA-AES256-GCM-SHA384, 
compression: off)

Type "help" for help.

postgres=#


3.2) PostgreSQL server's certificate status is 'revoked'
$ cp -pr _server.resp-revoked _server.resp
$ psql -d "sslmode=verify-ca sslrootcert=rootCA.crt user=david 
dbname=postgres ssl_ocsp_stapling=1" -h 127.0.0.1 -p 5432
psql: error: connection to server at "127.0.0.1", port 5432 failed: SSL 
error: ocsp callback failure



3.3) PostgreSQL server's certificate status is 'revoked' but OCSP 
stapling is not required by psql client:
$ psql -d "sslmode=verify-ca sslrootcert=rootCA.crt user=david 
dbname=postgres ssl_ocsp_stapling=0" -h 127.0.0.1 -p 5432

psql (17devel)
SSL connection (protocol: TLSv1.2, cipher: ECDHE-RSA-AES256-GCM-SHA384, 
compression: off)

Type "help" for help.

postgres=#


This is a highly experimental proof of concept, and any comments or 
feedback would be greatly appreciated!



Best regards,
David Zhang

===
Highgo Software Canada
www.highgo.ca

#
# OpenSSL example configuration file.
# See doc/man5/config.pod for more info.
#
# This is mostly being used for generation of certificate requests,
# but may be used for auto loading of providers

# Note that you can include other files from the main configuration
# file using the .include directive.
#.include filename

# This definition stops the following lines choking if HOME isn't
# defined.
HOME= .

 # Use this in order to automatically load providers.
openssl_conf = openssl_init

# Comment out the next line to ignore configuration errors
config_diagnostics = 1

# Extra OBJECT IDENTIFIER info:
# oid_file   = $ENV::HOME/.oid
oid_section = new_oids

# To use this configuration file with the "-extfile" option of the
# "openssl x509" utility, name here the section containing the
# X.509v3 extensions to use:
# extensions=
# (Alternatively, use a configuration file that has only
# X.509v3 extensions in its main [= default] section.)

[ new_oids ]
# We can add new OIDs in here for use by 'ca', 'req' and 'ts'.
# Add a simple OID like this:
# testoid1=1.2.3.4
# Or use config file substitution like this:
# testoid2=${testoid1}.5.6

# Policies used by the TSA examples.
tsa_policy1 = 1.2.3.4.1
tsa_policy2 = 1.2.3.4.5.6
tsa_policy3 = 1.2.3.4.5.7

# For FIPS
# Optionally include a file that is generated by the OpenSSL fipsinstall
# application. This file contains configuration data required by the OpenSSL
# fips provider. It contains a named section e.g. [fips_sect] which is
# referenced from the [provider_sect] below.
# Refer to the OpenSSL security policy for more information.
# .include fipsmodule.cnf

[openssl_init]
providers = provider_sect
ssl_conf = ssl_sect

# List of providers to load
[provider_sect]
default = default_sect
# The fips section name should match the section name inside the
# included fipsmodule.cnf.
# fips = fips_sect

# If no providers are activated explicitly, the default one is activated 
implicitly.
# See man 7 OSSL_PROVIDER-default for more details.
#
# If you add a section explicitly activating any other provider(s), you most
# probably need to explicitly activate the default provider, otherwise it
# becomes unavailable in openssl.  As a consequence applications depending on
# OpenSSL may not work correctly which could lead to significant system
# problems including inability to remotely access the system.
[default_sect]
# activate = 1



[ ca ]
default_ca  = CA_default# The default ca section


[ CA_default ]

dir = ./demoCA  # Where everything is kept
certs   = $dir/certs# Where the issued certs are kept
crl_dir = $dir/crl  # Where the issued crl are kept
database= $dir/index.txt# database index file.
#unique_subject = no# Set to 'no' to allow creation of
# several certs with same subject.
new_certs_dir   = $dir/newcerts # default place for new certs.

certificate = $dir/cacert.pem   # The CA certificate
serial  = $dir/serial   # The current serial number
crlnumber   = $dir/crlnumber# the current crl number
# must be commented out to leave a V1 
CR

Re: Functions to return random numbers in a given range

2024-01-26 Thread David Zhang

Thank you for the patch.

I applied this patch manually to the master branch, resolving a conflict 
in `numeric.h`. It successfully passed both `make check` and `make 
check-world`.



Best regards,

David





Re: Tab completion for ATTACH PARTITION

2023-10-10 Thread David Zhang

On 2023-09-13 12:19 a.m., Alvaro Herrera wrote:

On 2023-Sep-13, bt23nguyent wrote:


Hi,

Currently, psql's tab completion feature does not work properly for
ATTACH PARTITION. When the <tab> key is typed after "ALTER TABLE <name>
ATTACH PARTITION ", all possible table names should be displayed; however,
foreign table names are not displayed. So I created a patch that addresses
this issue by ensuring that psql displays not only normal table names but
also foreign table names in this case.

Sounds reasonable, but I think if we're going to have a specific query
for this case, we should make it a lot more precise.  For example, any
relation that's already a partition cannot be attached; as can't any
relation that is involved in legacy inheritance as either parent or
child.


I applied the patch and performed the tests below. I think it would be 
better if "attach partition" could filter out the partitions that have 
already been attached, just like "detach partition" filters out the 
partitions that have already been detached.


Here are my test steps and results:


### create a main PG cluster on port 5432 and run below commands:

CREATE EXTENSION postgres_fdw;
CREATE SERVER s1 FOREIGN DATA WRAPPER postgres_fdw OPTIONS (dbname  
'postgres', host '127.0.0.1', port '5433');

CREATE USER MAPPING for david SERVER s1 OPTIONS(user 'david');
CREATE TABLE t (a INT, b TEXT) PARTITION BY RANGE (a);
CREATE TABLE t_local PARTITION OF t FOR VALUES FROM (1) TO (10);
CREATE FOREIGN TABLE t_s1 PARTITION OF t FOR VALUES FROM (11) TO (20) 
SERVER s1 OPTIONS(schema_name 'public', table_name 't');
CREATE FOREIGN TABLE t_s1 SERVER s1 OPTIONS(schema_name 'public', 
table_name 't');



### create a foreign PG cluster on port 5433 and run below command:
CREATE TABLE t (a INT, b TEXT);


### "detach partition" can filter out already detached partition, in 
this case, "t_local".


postgres=# alter table t detach partition
information_schema.  public.  t_local t_s1
postgres=# alter table t detach partition t_s1 ;
ALTER TABLE
postgres=# alter table t detach partition
information_schema.  public. t_local


### before the patch, "attach partition" can't display the foreign table:
postgres=# alter table t attach partition
information_schema.  public.  t t_local


### after the patch, "attach partition" does display the foreign table 
(the patch works).

postgres=# alter table t attach partition
information_schema.  public.  t t_local  t_s1

In both cases, the already attached partition "t_local" still shows up. If 
it could be filtered out as well, I believe that would be a better user 
experience (a possible catalog query is sketched below).
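
As a rough illustration of the kind of filtering I have in mind (this query is 
hypothetical and not what psql currently issues), something along these lines 
could exclude relations that are already attached as partitions:

-- Hypothetical sketch of a completion query: list ordinary tables, foreign
-- tables and partitioned tables that are not already partitions of anything.
-- (A real version would also need to exclude legacy-inheritance children, as
-- Alvaro pointed out.)
SELECT c.relname
FROM pg_catalog.pg_class c
WHERE c.relkind IN ('r', 'f', 'p')
  AND NOT c.relispartition
  AND pg_catalog.pg_table_is_visible(c.oid);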



Best regards,

David









Re: [PATCH] Add inline comments to the pg_hba_file_rules view

2023-09-08 Thread David Zhang
This is a very useful feature. I applied the patch to the master branch, 
and both make check and make check-world passed without any issues.


Just one comment here, based on the example below,


host db jim 127.0.0.1/32 md5 # #foo#

... it returns the following pg_hba_file_rules records:

postgres=#  SELECT type, database, user_name, address, comment
 FROM pg_hba_file_rules
 WHERE user_name[1]='jim';

 type | database | user_name |  address  | comment
--+--+---+---+-
 host | {db} | {jim} | 127.0.0.1 | #foo#


Since "only the first #" and "any leading spaces" are removed, IMO, it 
can be more accurate to say,


Text after the first # comment character in the end 
of a valid pg_hba.conf entry, if any



Best regards,

David






Re: [PATCH] psql: Add tab-complete for optional view parameters

2023-08-14 Thread David Zhang




[..]

For the changes below,

  else if (TailMatches("CREATE", "VIEW", MatchAny, "AS") ||
-             TailMatches("CREATE", "OR", "REPLACE", "VIEW", MatchAny,
"AS"))
+             TailMatches("CREATE", "VIEW", MatchAny, "WITH", "(*)", "AS")
||
+             TailMatches("CREATE", "OR", "REPLACE", "VIEW", MatchAny, "AS")
||
+             TailMatches("CREATE", "OR", "REPLACE", "VIEW", MatchAny,
"WITH", "(*)", "AS"))

it would be great to switch the order of the 3rd and the 4th line to make a
better match for "CREATE" and "CREATE OR REPLACE" .

I don't see how it would effect matching in any way - or am I
overlooking something here?


It won't affect the SQL matching. What I was trying to say is that using 
'CREATE OR REPLACE ...' after 'CREATE ...' can enhance code structure, 
making it more readable. For instance,


/* Complete CREATE [ OR REPLACE ] VIEW  WITH ( ... ) with "AS" */
else if (TailMatches("CREATE", "VIEW", MatchAny, "WITH", "(*)") ||
 TailMatches("CREATE", "OR", "REPLACE", "VIEW", 
MatchAny, "WITH", "(*)"))

    COMPLETE_WITH("AS");

"CREATE", "OR", "REPLACE", "VIEW", MatchAny, "WITH", "(*)" follows 
"CREATE", "VIEW", MatchAny, "WITH", "(*)") immediately.


best regards,

David





Re: [PATCH] psql: Add tab-complete for optional view parameters

2023-08-11 Thread David Zhang

Applied the v3 patch to master and verified it with the commands below:

#Alter view

postgres=# alter view v 
ALTER COLUMN  OWNER TO  RENAME    RESET (   SET

postgres=# alter view v set 
(   SCHEMA

postgres=# alter view v set ( 
CHECK_OPTION  SECURITY_BARRIER  SECURITY_INVOKER

postgres=# alter view v reset ( 
CHECK_OPTION  SECURITY_BARRIER  SECURITY_INVOKER

postgres=# alter view v set ( check_option = 
CASCADED  LOCAL

postgres=# alter view v set ( security_barrier = 
FALSE  TRUE

postgres=# alter view v set ( security_invoker = 
FALSE  TRUE


#Create view

postgres=# create view v
AS  WITH (
postgres=# create or replace view v
AS  WITH (

postgres=# create view v with (
CHECK_OPTION  SECURITY_BARRIER  SECURITY_INVOKER

postgres=# create or replace view v with (
CHECK_OPTION  SECURITY_BARRIER  SECURITY_INVOKER

postgres=# create view v with (*)AS
postgres=# create or replace view v with (*)AS

postgres=# create view v as SELECT
postgres=# create or replace view v as SELECT


For the changes below,

 else if (TailMatches("CREATE", "VIEW", MatchAny, "AS") ||
-             TailMatches("CREATE", "OR", "REPLACE", "VIEW", MatchAny, 
"AS"))
+             TailMatches("CREATE", "VIEW", MatchAny, "WITH", "(*)", 
"AS") ||
+             TailMatches("CREATE", "OR", "REPLACE", "VIEW", MatchAny, 
"AS") ||
+             TailMatches("CREATE", "OR", "REPLACE", "VIEW", MatchAny, 
"WITH", "(*)", "AS"))


it would be great to switch the order of the 3rd and the 4th line to 
make a better match for "CREATE" and "CREATE OR REPLACE" .



Since it supports <tab> completion in the middle for the case below,
postgres=# alter view v set ( security_
security_barrier  security_invoker

and during view reset it can also provide all the options list,
postgres=# alter view v reset (
CHECK_OPTION  SECURITY_BARRIER  SECURITY_INVOKER

but I'm not sure whether it is a good idea, or even possible, to 
autocomplete the remaining reset options after one option has already been 
entered followed by ",", for example:

postgres=# alter view v reset ( CHECK_OPTION, 
SECURITY_BARRIER  SECURITY_INVOKER


Thank you,

David





Re: Requiring recovery.signal or standby.signal when recovering with a backup_label

2023-07-19 Thread David Zhang

On 2023-07-16 6:27 p.m., Michael Paquier wrote:


Deleting a backup_label from a fresh base backup can easily lead to data
corruption, as the startup process would pick up as LSN to start
recovery from the control file rather than the backup_label file.
This would happen if a checkpoint updates the redo LSN in the control
file while a backup happens and the control file is copied after the
checkpoint, for instance.  If one wishes to deploy a new primary from
a base backup, recovery.signal is the way to go, making sure that the
new primary is bumped into a new timeline once recovery finishes, on
top of making sure that the startup process starts recovery from a
position where the cluster would be able to achieve a consistent
state.

Thanks a lot for sharing this information.


How would you rewrite that?  I am not sure how many details we want to
put here in terms of differences between recovery.signal and
standby.signal, still we surely should mention these are the two
possible choices.


Honestly, I can't convince myself to mention the backup_label here too. 
But, I can share some information regarding my testing of the patch and 
the corresponding results.


To assess the impact of the patch, I executed the following commands 
before and after applying it:

pg_basebackup -h localhost -p 5432 -U david -D pg_backup1

pg_ctl -D pg_backup1 -l /tmp/logfile start

Before the patch, there were no issues encountered when starting an 
independent Primary server.



However, after applying the patch, I observed the following behavior 
when starting from the base backup:


1) simply start server from a base backup

FATAL:  could not find recovery.signal or standby.signal when recovering 
with backup_label


HINT:  If you are restoring from a backup, touch 
"/media/david/disk1/pg_backup1/recovery.signal" or 
"/media/david/disk1/pg_backup1/standby.signal" and add required recovery 
options.


2) after touching a recovery.signal file and trying to start the server, 
the following error was encountered:


FATAL:  must specify restore_command when standby mode is not enabled

3) after touching a standby.signal file, the server started successfully; 
however, it operates in standby mode, whereas the intended behavior was 
for it to function as a primary server.



Best regards,

David








Re: Requiring recovery.signal or standby.signal when recovering with a backup_label

2023-07-14 Thread David Zhang
I believe that previously users could take a backup using pg_basebackup 
and then start that backup as an independent primary server for whatever 
reason. Now, if this is still allowed, users need to be aware that the 
backup_label must be manually deleted; otherwise, the backup won't be 
able to start as a primary.


The current message below doesn't provide such a hint.

+   if (!ArchiveRecoveryRequested)
+   ereport(FATAL,
+   (errmsg("could not find recovery.signal or 
standby.signal when recovering with backup_label"),
+errhint("If you are restoring from a backup, touch 
\"%s/recovery.signal\" or \"%s/standby.signal\" and add required recovery options.",
+DataDir, DataDir)));

On 2023-03-12 6:06 p.m., Michael Paquier wrote:

On Fri, Mar 10, 2023 at 03:59:04PM +0900, Michael Paquier wrote:

My apologies for the long message, but this deserves some attention,
IMHO.

Note: A CF entry has been added as of [1], and I have added an item in
the list of live issues on the open item page for 16.

[1]:https://commitfest.postgresql.org/43/4244/
[2]:https://wiki.postgresql.org/wiki/PostgreSQL_16_Open_Items#Live_issues
--
Michael


Best regards,

David


Re: [PATCH] pg_regress.c: Fix "make check" on Mac OS X: Pass DYLD_LIBRARY_PATH

2023-06-22 Thread David Zhang
After conducting a further investigation into this issue, I have made 
some discoveries. The previous patch successfully resolves the problem 
when running the commands `./configure && make && make check` (without 
any previous sudo make install or make install). However, it stops at 
the 'isolation check' when using the commands `./configure 
--enable-tap-tests && make && make check-world`.


To address this, I attempted to apply a similar approach to the previous 
patch, resulting in an experimental patch (attached). This new patch helps 
the `make check-world` process progress further and passes the 'isolation 
check', but there are still several remaining issues that need to be 
addressed.



Currently, there is a description suggesting a workaround by running a 
'make install' command first, but I find it to be somewhat inaccurate. 
It would be better to update the existing description to provide more 
precise instructions on how to overcome this issue. Here are the changes 
I would suggest.


from:
"You can work around that by doing make install before make check. Most 
PostgreSQL developers just turn off SIP, though."


to:
"You can execute sudo make install if you do not specify a prefix during 
the configure step, or make install without sudo if you do specify a 
prefix (assuming proper permissions) before make check. Most PostgreSQL 
developers just turn off SIP, though."


Otherwise, following the current description, if you run `./configure && 
make install` you will get the error: "mkdir: /usr/local/pgsql: 
Permission denied"



Below are the steps I took that led to the discovery of additional issues.

git apply  pg_regress_mac_os_x_dyld.patch
./configure
make
make check

... ...
# All 215 tests passed.


./configure --enable-tap-tests
make
make check-world

... ...
echo "# +++ isolation check in src/test/isolation +++"
... ...
dyld[32335]: Library not loaded: /usr/local/pgsql/lib/libpq.5.dylib
  Referenced from:  
/Users/david/hg/sandbox/postgres/src/test/isolation/isolationtester
  Reason: tried: '/usr/local/pgsql/lib/libpq.5.dylib' (no such file), 
'/System/Volumes/Preboot/Cryptexes/OS/usr/local/pgsql/lib/libpq.5.dylib' 
(no such file), '/usr/local/pgsql/lib/libpq.5.dylib' (no such file), 
'/usr/local/lib/libpq.5.dylib' (no such file), '/usr/lib/libpq.5.dylib' 
(no such file, not in dyld cache)
no data was returned by command 
""/Users/david/hg/sandbox/postgres/src/test/isolation/isolationtester" -V"




git apply pg_regress_mac_os_x_dyld_isolation_check_only.patch
./configure --enable-tap-tests
make
make check-world

... ...
# All 215 tests passed.
... ...
# +++ isolation check in src/test/isolation +++
... ...
# All 112 tests passed.

echo "# +++ tap check in src/test/modules/brin +++"
... ...
# +++ tap check in src/test/modules/brin +++
t/01_workitems.pl  Bailout called.  Further testing stopped:  
command "initdb -D 
/Users/david/hg/sandbox/postgres/src/test/modules/brin/tmp_check/t_01_workitems_tango_data/pgdata 
-A trust -N" died with signal 6

t/01_workitems.pl  Dubious, test returned 255 (wstat 65280, 0xff00)
No subtests run


Any thoughts ?

Thank you

David

On 2023-06-16 2:25 p.m., David Zhang wrote:

I have applied the patch to the latest master branch and successfully executed './configure && 
make && make check' on macOS Ventura. However, during the process, a warning was encountered: 
"mixing declarations and code is incompatible with standards before C99 
[-Wdeclaration-after-statement]". Moving the declaration of 'result' to the beginning like below can 
resolve the warning, and it would be better to use a unique variable instead of 'result'.

#ifdef __darwin__
static char extra_envvars[4096];
+int result = -1;
... ...
-int result = snprintf(extra_envvars, sizeof(extra_envvars), 
"DYLD_LIBRARY_PATH=%s",
+result = snprintf(extra_envvars, sizeof(extra_envvars), "DYLD_LIBRARY_PATH=%s",

diff --git a/src/common/exec.c b/src/common/exec.c
index f209b934df..8cf2c21a66 100644
--- a/src/common/exec.c
+++ b/src/common/exec.c
@@ -74,6 +74,12 @@ static char *pg_realpath(const char *fname);
 static BOOL GetTokenUser(HANDLE hToken, PTOKEN_USER *ppTokenUser);
 #endif
 
+#ifdef __darwin__
+static char extra_envvars[4096];
+#else
+static const char extra_envvars[] = "";
+#endif 
+
 /*
  * validate_exec -- validate "path" as an executable file
  *
@@ -330,6 +336,15 @@ find_other_exec(const char *argv0, const char *target,
charcmd[MAXPGPATH];
charline[MAXPGPATH];
 
+#ifdef __darwin__
+int result = 1;
+result = snprintf(extra_envvars, sizeof(extra_envvars), 
"DYLD_LIBRARY_PATH=%s",
+  getenv("DYLD_LIBRARY_PATH"));
+if (result < 0 || result >= sizeof(extra_envvars))
+   

Re: [PATCH] pg_regress.c: Fix "make check" on Mac OS X: Pass DYLD_LIBRARY_PATH

2023-06-16 Thread David Zhang
I have applied the patch to the latest master branch and successfully executed 
'./configure && make && make check' on macOS Ventura. However, during the 
process, a warning was encountered: "mixing declarations and code is 
incompatible with standards before C99 [-Wdeclaration-after-statement]". Moving 
the declaration of 'result' to the beginning like below can resolve the 
warning, and it would be better to use a unique variable instead of 'result'. 

#ifdef __darwin__
static char extra_envvars[4096];
+int result = -1;
... ...
-int result = snprintf(extra_envvars, sizeof(extra_envvars), 
"DYLD_LIBRARY_PATH=%s",
+result = snprintf(extra_envvars, sizeof(extra_envvars), "DYLD_LIBRARY_PATH=%s",

Re: PGDOCS - Replica Identity quotes

2023-05-05 Thread David Zhang

On 2023-03-16 4:46 p.m., Peter Smith wrote:

A rebase was needed due to the recent REPLICA IDENTITY push [1].

PSA v2.


-   A published table must have a replica identity configured in
+   A published table must have a replica identity 
configured in

+1

 order to be able to replicate UPDATE
 and DELETE operations, so that appropriate rows to
 update or delete can be identified on the subscriber side.  By default,
 this is the primary key, if there is one.  Another unique index (with
 certain additional requirements) can also be set to be the replica
 identity.  If the table does not have any suitable key, then it can be set
-   to replica identity full, which means the entire row becomes
-   the key.  When replica identity full is specified,
+   to REPLICA IDENTITY FULL, which means the entire row 
becomes
+   the key.  When REPLICA IDENTITY FULL is specified,
 indexes can be used on the subscriber side for searching the rows.  
Candidate
 indexes must be btree, non-partial, and have at least one column reference
 (i.e. cannot consist of only expressions).  These restrictions
 on the non-unique index properties adhere to some of the restrictions that
 are enforced for primary keys.  If there are no such suitable indexes,
 the search on the subscriber side can be very inefficient, therefore
-   replica identity full should only be used as a
+   REPLICA IDENTITY FULL should only be used as a
 fallback if no other solution is possible.  If a replica identity other
IMO, it would be better to just change "full" to "FULL". On one hand, it 
emphasizes that "FULL" is one of the specific values (DEFAULT | USING 
INDEX index_name | FULL | NOTHING); on the other hand, it leaves 
"replica identity" in lowercase, which is more consistent with the 
terminology used throughout this paragraph.

-   than full is set on the publisher side, a replica identity
+   than FULL is set on the publisher side, a replica 
identity

+1

 comprising the same or fewer columns must also be set on the subscriber
 side.  See  for details on
 how to set the replica identity.  If a table without a replica identity is


David





Re: Add missing copyright for pg_upgrade/t/* files

2023-04-21 Thread David Zhang




I checked the commit log.
About 001_basic.pl, it had been added at 2017 once but been reverted soon 
[1][2].
322bec added the file again at 2022[3], so I chose 2022.

About 002_pg_upgrade.pl, it has been added at the same time[3].
Definitively it should be 2022.


It is great to make sure each file has a copyright notice, and I see this 
patch has already been committed.

Just curious: is there a rule for where the copyright notice goes in 
Postgres? For example, if I run the command `grep -rn Copyright 
--include="*.pl" | awk -F ':' {'print $2, $1'} | sort -nr` inside 
postgres/src/bin, it seems most copyright notices were added on the second 
line, but these two were added at the very beginning (of course, there are 
three other files following this pattern as well).


...

2 pg_archivecleanup/t/010_pg_archivecleanup.pl
2 pg_amcheck/t/005_opclass_damage.pl
2 pg_amcheck/t/004_verify_heapam.pl
2 pg_amcheck/t/003_check.pl
2 pg_amcheck/t/002_nonesuch.pl
2 pg_amcheck/t/001_basic.pl
2 initdb/t/001_initdb.pl
1 pg_verifybackup/t/010_client_untar.pl
1 pg_verifybackup/t/008_untar.pl
1 pg_upgrade/t/002_pg_upgrade.pl
1 pg_upgrade/t/001_basic.pl
1 pg_basebackup/t/011_in_place_tablespace.pl


David





Re: [BUG] pg_stat_statements and extended query protocol

2023-03-13 Thread David Zhang

It appears you must "make clean; make install" to correctly compile after
applying the patch.

In a git repository, I've learnt to rely on this simple formula, even
if it means extra cycles when running ./configure:
git clean -d -x -f

Thank you all for pointing out that it needs a make clean first. After 
make clean followed by a recompile with the patch, both the core 
regression tests (make check) and the pg_stat_statements extension tests 
report that all tests passed. So the existing test cases can't really 
detect any change from this patch; it would be better to add some test 
cases to cover this (a rough sketch follows below).
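
For example, a minimal sketch of such a test (hypothetical, relying on psql's 
\bind meta-command to drive the extended query protocol) might look like:

-- Hypothetical test sketch for contrib/pg_stat_statements (assumes the
-- module is loaded via shared_preload_libraries): run a parameterized
-- statement through the extended query protocol via \bind and check that
-- its execution is counted.
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
SELECT pg_stat_statements_reset();
SELECT $1::int AS val \bind 42 \g
SELECT calls, rows FROM pg_stat_statements
 WHERE query LIKE 'SELECT $1::int%';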


Best regards,

David






Re: [BUG] pg_stat_statements and extended query protocol

2023-03-10 Thread David Zhang




Yes, I agree that proper test coverage is needed here. Will think
about how to accomplish this.


I tried to apply this patch to the current master branch and the build was 
OK; however, it crashed during initdb with a message like the one below:

"performing post-bootstrap initialization ... Segmentation fault (core 
dumped)"


If I remove this patch and recompile again, then "initdb -D $PGDATA" works.

Thanks,

David





Re: psql: Add role's membership options to the \du+ command

2023-02-15 Thread David Zhang

On 2023-02-15 1:37 p.m., David G. Johnston wrote:


On Wed, Feb 15, 2023 at 2:31 PM David Zhang  wrote:

There is a default built-in role `pg_monitor` and its behavior
changed after the patch. If `\dg+` and `\du+` are treated as the
same, and `make check` all passes, then I assume there is no test
case verifying the output of `\duS+`. My point is: should we
consider adding a test case?

I mean, either you accept the change in how this meta-command presents 
its information or you don't.  I don't see how a test case is 
particularly beneficial.  Or, at least the pg_monitor role is not 
special in this regard.  Alice changed too and you don't seem to be 
including it in your complaint.

Good improvement, +1.

Re: psql: Add role's membership options to the \du+ command

2023-02-15 Thread David Zhang

On 2023-02-10 2:27 p.m., David G. Johnston wrote:

On Fri, Feb 10, 2023 at 2:08 PM David Zhang  wrote:


I noticed the document psql-ref.sgml has been updated for both
`\du+` and `\dg+`, but only `\du` and `\du+` are covered in the
regression test. Is that because `\dg+` is treated exactly the same
as `\du+` from a testing point of view?


Yes.


The reason I am asking this question is that I notice that
`pg_monitor`
also has the detailed information, so not sure if more test cases
required.


Of course it does.  Why does that bother you?  And what does that have 
to do with the previous paragraph?


There is a default built-in role `pg_monitor` and its behavior changed 
after the patch. If `\dg+` and `\du+` are treated as the same, and `make 
check` all passes, then I assume there is no test case verifying the 
output of `\duS+`. My point is: should we consider adding a test case?

Before patch the output for `pg_monitor`,

postgres=# \duS+
List of roles
  Role name  | Attributes | 
Member of   | Description

-++--+-
 alice |    | 
{pg_read_all_settings,pg_read_all_stats,pg_stat_scan_tables} |
 pg_checkpoint   | Cannot 
login   | 
{}   |
 pg_database_owner   | Cannot 
login   | 
{}   |
 pg_execute_server_program   | Cannot 
login   | 
{}   |
 pg_maintain | Cannot 
login   | 
{}   |
 pg_monitor  | Cannot 
login   | 
{pg_read_all_settings,pg_read_all_stats,pg_stat_scan_tables} |
 pg_read_all_data    | Cannot 
login   | 
{}   |
 pg_read_all_settings    | Cannot 
login   | 
{}   |
 pg_read_all_stats   | Cannot 
login   | 
{}   |
 pg_read_server_files    | Cannot 
login   | 
{}   |
 pg_signal_backend   | Cannot 
login   | 
{}   |
 pg_stat_scan_tables | Cannot 
login   | 
{}   |
 pg_use_reserved_connections | Cannot 
login   | 
{}   |
 pg_write_all_data   | Cannot 
login   | 
{}   |
 pg_write_server_files   | Cannot 
login   | 
{}   |
 ubuntu  | Superuser, Create role, Create DB, 
Replication, Bypass RLS | 
{}   |


After patch the output for `pg_monitor`,

postgres=# \duS+
                                            List of roles
          Role name           |  Attributes  |                    Member of                    | Description
------------------------------+--------------+-------------------------------------------------+-------------
 alice                        |              | pg_read_all_settings WITH ADMIN, INHERIT, SET  +|
                              |              | pg_read_all_stats WITH INHERIT                 +|
                              |              | pg_stat_scan_tables                             |
 pg_checkpoint                | Cannot login |                                                 |
 pg_database_owner            | Cannot login |                                                 |
 pg_execute_server_program    | Cannot login |                                                 |
 pg_maintain                  | Cannot login |                                                 |
 pg_monitor                   | Cannot login | pg_read_all_settings WITH INHERIT, SET

Re: psql: Add role's membership options to the \du+ command

2023-02-10 Thread David Zhang
Thanks a lot for the improvement; it will definitely help provide more
useful information.


I noticed the document psql-ref.sgml has been updated for both `\du+` and
`\dg+`, but only `\du` and `\du+` are covered in the regression tests. Is that
because `\dg+` is treated exactly the same as `\du+` from a testing point of
view?


The reason I am asking this question is that I noticed that `pg_monitor`
also has the detailed information, so I am not sure if more test cases are required.


postgres=# \duS+
                                            List of roles
          Role name           |  Attributes  |                    Member of                    | Description
------------------------------+--------------+-------------------------------------------------+-------------
 alice                        |              | pg_read_all_settings WITH ADMIN, INHERIT, SET   |
 pg_checkpoint                | Cannot login |                                                 |
 pg_database_owner            | Cannot login |                                                 |
 pg_execute_server_program    | Cannot login |                                                 |
 pg_maintain                  | Cannot login |                                                 |
 pg_monitor                   | Cannot login | pg_read_all_settings WITH INHERIT, SET         +|
                              |              | pg_read_all_stats WITH INHERIT, SET            +|
                              |              | pg_stat_scan_tables WITH INHERIT, SET           |


Best regards,

David

On 2023-01-09 8:09 a.m., Pavel Luzanov wrote:

When you include one role in another, you can specify three options:
ADMIN, INHERIT (added in e3ce2de0) and SET (3d14e171).

For example.

CREATE ROLE alice LOGIN;

GRANT pg_read_all_settings TO alice WITH ADMIN TRUE, INHERIT TRUE, SET 
TRUE;
GRANT pg_stat_scan_tables TO alice WITH ADMIN FALSE, INHERIT FALSE, 
SET FALSE;
GRANT pg_read_all_stats TO alice WITH ADMIN FALSE, INHERIT TRUE, SET 
FALSE;


For information about the options, you need to look in the 
pg_auth_members:


SELECT roleid::regrole, admin_option, inherit_option, set_option
FROM pg_auth_members
WHERE member = 'alice'::regrole;
        roleid         | admin_option | inherit_option | set_option
-----------------------+--------------+----------------+------------
 pg_read_all_settings  | t            | t              | t
 pg_stat_scan_tables   | f            | f              | f
 pg_read_all_stats     | f            | t              | f
(3 rows)

I think it would be useful to be able to get this information with a 
psql command

like \du (and \dg). With the proposed patch, the \du command still only lists
the roles of which alice is a member:

\du alice
                                 List of roles
 Role name | Attributes |                           Member of
-----------+------------+---------------------------------------------------------------
 alice     |            | {pg_read_all_settings,pg_read_all_stats,pg_stat_scan_tables}


But the \du+ command adds information about the selected ADMIN, INHERIT
and SET options:

\du+ alice
                                     List of roles
 Role name | Attributes |                   Member of                    | Description
-----------+------------+------------------------------------------------+-------------
 alice     |            | pg_read_all_settings WITH ADMIN, INHERIT, SET +|
           |            | pg_read_all_stats WITH INHERIT                +|
           |            | pg_stat_scan_tables                            |

One more change. The roles in the "Member of" column are sorted for both
\du+ and \du for consistent output.

Any comments are welcome.






Re: Patch: Global Unique Index

2022-12-27 Thread David Zhang

On 2022-12-19 7:51 a.m., Nikita Malakhov wrote:

Sorry to bother - but is this patch used in IvorySQL?
Here:
https://www.ivorysql.org/docs/Global%20Unique%20Index/create_global_unique_index
According to syntax it definitely looks like this patch.


The global unique index is one of the features required in IvorySQL
development. We want to share it with the community to get more
feedback, and then hopefully we can better contribute it back to
PostgreSQL.


Best regards,

David





Re: Make ON_ERROR_STOP stop on shell script failure

2022-12-13 Thread David Zhang

On 2022-10-12 2:13 a.m., bt22nakamorit wrote:

There was a mistake in the error message for \! so I updated the patch.

Tried to apply this patch to the master branch, but got an error like the
one below.

$ git apply --check patch-view.diff
error: patch failed: src/bin/psql/command.c:2693
error: src/bin/psql/command.c: patch does not apply

I think there are some tests related to "ON_ERROR_STOP" in
src/bin/psql/t/001_basic.pl, and we should consider adding corresponding
test cases for "on/off/shell/all" to this patch.



Best regards,

David






Re: Patch: Global Unique Index

2022-12-02 Thread David Zhang

On 2022-11-29 6:16 p.m., Tom Lane wrote:

Assuming that you are inserting into index X, and you've checked
index Y to find that it has no conflicts, what prevents another
backend from inserting a conflict into index Y just after you look?
AIUI the idea is to prevent that by continuing to hold an exclusive
lock on the whole index Y until you've completed the insertion.
Perhaps there's a better way to do that, but it's not what was
described.
Another main change in patch 
`0004-support-global-unique-index-insert-and-update.patch`,

+                search_global:
+                        stack = _bt_search(iRel, insertstate.itup_key,
+                                           , BT_READ, 
NULL);

+                        xwait = _bt_check_unique_gi(iRel, ,
+                                                    hRel, checkUnique, 
_unique,

+ , heapRel);
+                        if (unlikely(TransactionIdIsValid(xwait)))
+                        {
... ...
+                            goto search_global;
+                        }

Here, I am trying to use `BT_READ` to acquire an LW_SHARED lock on the
buffer block if a match is found using the `itup_key` search key. The
cross-partition uniqueness check will wait if the index tuple insertion
on this buffer block has not finished yet; otherwise it runs the
uniqueness check to see if there is an ongoing transaction that may
insert a conflicting value. Once the ongoing insertion is done, it goes
back and checks again (I think it can also handle the case where a
potentially conflicting index tuple was later marked as dead in the same
transaction). Based on this change, my test results are:


1) a select-only query will not be blocked by the ongoing insertion on 
index X


2) an insertion happening on index Y may wait for the buffer block lock
when inserting a different value, but it does not wait for the
transaction lock held by the insertion on index X.


3) when an insertion inserts a conflicting value on index Y,
    3.1) it waits for the buffer block lock if that lock is held by
the insertion on index X;
    3.2) then it waits for the transaction lock until the insertion on
index X is done.






Re: Patch: Global Unique Index

2022-12-02 Thread David Zhang

Thanks a lot for all the comments.

On 2022-11-29 3:13 p.m., Tom Lane wrote:

... not to mention creating a high probability of deadlocks between
concurrent insertions to different partitions.  If they each
ex-lock their own partition's index before starting to look into
other partitions' indexes, it seems like a certainty that such
cases would fail.  The rule of thumb about locking multiple objects
is that all comers had better do it in the same order, and this
isn't doing that.
In the current POC patch, the deadlock happens when backend-1 inserts a
value into index X (partition-1), and backend-2 tries to insert a
conflicting value right after backend-1 has released the buffer block lock
but before it starts the uniqueness check on index Y (partition-2). In this
case, backend-1 holds ExclusiveLock on transaction-1 and waits for
ShareLock on transaction-2, while backend-2 holds ExclusiveLock on
transaction-2 and waits for ShareLock on transaction-1. Based on my
debugging tests, this only happens when backend-1 and backend-2 want to
insert a conflicting value. If this is true, is it OK to error out with
either a `deadlock` or a `duplicated value` error, since this is a
conflicting value anyway? (Hopefully end users can handle both in a similar
way.) I think such a deadlock requires two conditions: 1) users insert a
conflicting value, and 2) the uniqueness check happens at just the right
moment (see above).

That specific issue could perhaps be fixed by having everybody
examine all the indexes in the same order, inserting when you
come to your own partition's index and otherwise just checking
for conflicts.  But that still means serializing insertions
across all the partitions.  And the fact that you need to lock
all the partitions, or even just know what they all are,
Here is the main change for insertion cross-partition uniqueness check 
in `0004-support-global-unique-index-insert-and-update.patch`,
 result = _bt_doinsert(rel, itup, checkUnique, indexUnchanged, 
heapRel);


+    if (checkUnique != UNIQUE_CHECK_NO)
+        btinsert_check_unique_gi(itup, rel, heapRel, checkUnique);
+
 pfree(itup);

where a cross-partition uniqueness check is added after the index tuple's
btree insertion on the current partition. The idea is to make sure other
backends can see the index tuple that was just inserted (before it is
marked as visible), and the uniqueness check on the current partition can
be skipped as it has already been done. Based on this change, I think
insertion serialization can happen in two cases: 1) two insertions happen
on the same buffer block (buffer lock waiting); 2) two ongoing insertions
use duplicated values (transaction id waiting).







Re: Patch: Global Unique Index

2022-11-25 Thread David Zhang

Hi Bruce,

Thank you for helping review the patches in such detail.

On 2022-11-25 9:48 a.m., Bruce Momjian wrote:

Looking at the patch, I am unclear how the the patch prevents concurrent
duplicate value insertion during the partitioned index checking.  I am
actually not sure how that can be done without locking all indexes or
inserting placeholder entries in all indexes.  (Yeah, that sounds bad,
unless I am missing something.)


For the uniqueness check across all partitions, we tried to follow the
implementation of the uniqueness check on a single partition, and added a
loop to check uniqueness on the other partitions after the index tuple has
been inserted into the current index partition but before it has been made
visible. The uniqueness check waits in `XactLockTableWait` if there is a
valid transaction in progress, and performs the uniqueness check again
after that transaction finishes.


We tried to simulate this duplicate value case with the steps below:

1) prepare the partitioned table,
CREATE TABLE gidx_part (a int, b int, c text) PARTITION BY RANGE (a);
CREATE TABLE gidx_part1 partition of gidx_part FOR VALUES FROM (0) TO (10);
CREATE TABLE gidx_part2 partition of gidx_part FOR VALUES FROM (10) TO (20);

2) hook up two psql consoles with gdb and set breakpoints after _bt_doinsert


result = _bt_doinsert(rel, itup, checkUnique, indexUnchanged, heapRel);

inside the btinsert function in nbtree.c.

3) first, execute `INSERT INTO gidx_part values(1, 1, 'test');` on
console-1, and then execute `INSERT INTO gidx_part values(11, 1, 'test');`
on console-2 (expecting the duplicate value '1' in the 2nd column to be
detected).


The test result is: the console-2 query has to wait until console-1 either
commits or aborts. If console-1 commits, console-2 reports that the
duplicated value already exists; if console-1 aborts, console-2 reports a
successful insert. If there is a deadlock, the backend that detects it
errors out to allow the other one to continue.


I am not quite sure if this is a proper way to deal with a deadlock in
this case. I would be grateful if someone could help provide some
cases/methods to verify this cross-partition uniqueness.


Best regards,

David


HighGo Software Canada
www.highgo.ca 





Re: Error for WITH options on partitioned tables

2022-10-28 Thread David Zhang

Hi Karina,

It is not clear to me why `build_reloptions` is removed in patch
`v2-0002-better-error-message-for-setting-parameters-for-p.patch`; it
would be great if you could help explain.


From my observation, it seems the WITH option behaves differently when
creating a partitioned table versus a partitioned index. For example,


pgbench -i --partitions=2 --partition-method=range -d postgres

postgres=# create index idx_bid on pgbench_accounts using btree(bid) 
with (fillfactor = 90);

CREATE INDEX

postgres=# select relname, relkind, reloptions from pg_class where 
relnamespace=2200 order by oid;

          relname           | relkind |   reloptions
----------------------------+---------+-----------------
 idx_bid                    | I       | {fillfactor=90}
 pgbench_accounts_1_bid_idx | i       | {fillfactor=90}
 pgbench_accounts_2_bid_idx | i       | {fillfactor=90}

I can see the `fillfactor` parameter has been added to the indexes;
however, if I try to alter `fillfactor`, I get an error like the one below.

postgres=# alter index idx_bid set (fillfactor=40);
ERROR:  ALTER action SET cannot be performed on relation "idx_bid"
DETAIL:  This operation is not supported for partitioned indexes.

This generic error message is reported by
`errdetail_relkind_not_supported`, and there is a similar error message
for partitioned tables. Does anyone know whether this could be an option
for reporting the error that the `fillfactor` parameter is not supported?



Best regards,

David

On 2022-10-14 8:16 a.m., Karina Litskevich wrote:

Hi, Simon!

The new error message looks better. But I believe it is better to use
"parameters" instead of "options" as it is called "storage parameters"
in the documentation. I also believe it is better to report error in
partitioned_table_reloptions() (thanks to Japin Li for mentioning this
function) as it also fixes the error message in the following situation:

test=# CREATE TABLE parted_col_comment (a int, b text) PARTITION BY 
LIST (a);

CREATE TABLE
test=# ALTER TABLE parted_col_comment SET (fillfactor=100);
ERROR:  unrecognized parameter "fillfactor"

I attached the new versions of patches.

I'm not sure about the errcode. May be it is better to report error with
ERRCODE_INVALID_OBJECT_DEFINITION for CREATE TABLE and with
ERRCODE_WRONG_OBJECT_TYPE for ALTER TABLE (as when you try "ALTER TABLE
partitioned INHERIT nonpartitioned;" an ERROR with 
ERRCODE_WRONG_OBJECT_TYPE

is reported). Then either the desired code should be passed to
partitioned_table_reloptions() or similar checks and ereport calls 
should be

placed in two different places. I'm not sure whether it is a good idea to
change the code in one of these ways just to change the error code.

Best regards,
Karina Litskevich
Postgres Professional: http://postgrespro.com/






Re: postgres_fdw: commit remote (sub)transactions in parallel during pre-commit

2022-09-30 Thread David Zhang

Hi Etsuro,


The patch needs rebase due to commits 4036bcbbb, 8c8d307f8 and
82699edbf, so I updated the patch.  Attached is a new version, in
which I also tweaked comments a little bit.


After rebasing the file `postgres_fdw.out` and applying the patch to the
master branch, make and make check are both OK for postgres_fdw.



--
David

Software Engineer
Highgo Software Inc. (Canada)
www.highgo.ca




Re: Avoid unecessary MemSet call (src/backend/utils/cache/relcache.c)

2022-08-19 Thread David Zhang

Hi Ranier,

Following the comment in commit 9fd45870c1436b477264c0c82eb195df52bc0919,

    (The same could be done with appropriate memset() calls, but this
    patch is part of an effort to phase out MemSet(), so it doesn't touch
    memset() calls.)

Should these obviously possible replacements of the standard library
function "memset" be considered as well? For example, something like the
attached patch, which focuses on the pageinspect extension only.
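
To make the intent concrete, here is a minimal sketch of the pattern in
question (the array name and size are made up for illustration and are not
taken from the attached patch):

#include <string.h>
#include <stdbool.h>

static void
before_style(void)
{
	bool		nulls[8];

	/* old style: zero the array with an explicit memset() call */
	memset(nulls, 0, sizeof(nulls));
	(void) nulls;
}

static void
after_style(void)
{
	/* new style: same effect via an array initializer, no call needed */
	bool		nulls[8] = {0};

	(void) nulls;
}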



Best regards,

David

On 2022-08-01 10:08 a.m., Ranier Vilela wrote:
On Sat., Jul. 16, 2022 at 16:54, Ranier Vilela
wrote:




On Sat., Jul. 16, 2022 at 2:58 AM, Peter Eisentraut
wrote:

On 11.07.22 21:06, Ranier Vilela wrote:
> On Thu., Jul. 7, 2022 at 14:01, Ranier Vilela
<mailto:ranier...@gmail.com> wrote:
>
>     Attached the v1 of your patch.
>     I think that all is safe to switch MemSet by {0}.
>
> Here the rebased patch v2, against latest head.

I have committed my patch with Álvaro's comments addressed

I see.
It's annoying that the old compiler (gcc 4.7.2) doesn't handle this style.


Your patch appears to add in changes that are either arguably
out of
scope or would need further review (e.g., changing memset()
calls,
changing the scope of some variables, changing places that
need to worry
about padding bits).  Please submit separate patches for
those, and we
can continue the analysis.

Sure.

Hi, sorry for the delay.
Like how 
https://github.com/postgres/postgres/commit/9fd45870c1436b477264c0c82eb195df52bc0919

New attempt to remove more MemSet calls, that are safe.

Attached v3 patch.

regards,
Ranier Vilela
From 25bd8af7acfd6bf2e23cdfac93828d810cfbb5b4 Mon Sep 17 00:00:00 2001
From: David Zhang 
Date: Fri, 19 Aug 2022 14:43:01 -0700
Subject: [PATCH] Replace many memset calls with struct initialization

This replaces all memset() calls with struct initialization where that
is easily and obviously possible.
---
 contrib/pageinspect/btreefuncs.c |  3 +--
 contrib/pageinspect/ginfuncs.c   | 12 +++-
 contrib/pageinspect/gistfuncs.c  | 12 +++-
 contrib/pageinspect/heapfuncs.c  |  4 +---
 contrib/pageinspect/rawpage.c|  4 +---
 5 files changed, 9 insertions(+), 26 deletions(-)

diff --git a/contrib/pageinspect/btreefuncs.c b/contrib/pageinspect/btreefuncs.c
index 9375d55e14..a14d83f8cb 100644
--- a/contrib/pageinspect/btreefuncs.c
+++ b/contrib/pageinspect/btreefuncs.c
@@ -311,7 +311,7 @@ bt_page_print_tuples(struct user_args *uargs)
boolrightmost = uargs->rightmost;
boolispivottuple;
Datum   values[9];
-   boolnulls[9];
+   boolnulls[9] = {0};
HeapTuple   tuple;
ItemId  id;
IndexTuple  itup;
@@ -331,7 +331,6 @@ bt_page_print_tuples(struct user_args *uargs)
itup = (IndexTuple) PageGetItem(page, id);
 
j = 0;
-   memset(nulls, 0, sizeof(nulls));
values[j++] = DatumGetInt16(offset);
values[j++] = ItemPointerGetDatum(>t_tid);
values[j++] = Int32GetDatum((int) IndexTupleSize(itup));
diff --git a/contrib/pageinspect/ginfuncs.c b/contrib/pageinspect/ginfuncs.c
index 952e9d51a8..6c0183cc63 100644
--- a/contrib/pageinspect/ginfuncs.c
+++ b/contrib/pageinspect/ginfuncs.c
@@ -40,7 +40,7 @@ gin_metapage_info(PG_FUNCTION_ARGS)
GinMetaPageData *metadata;
HeapTuple   resultTuple;
Datum   values[10];
-   boolnulls[10];
+   boolnulls[10] = {0};
 
if (!superuser())
ereport(ERROR,
@@ -75,8 +75,6 @@ gin_metapage_info(PG_FUNCTION_ARGS)
 
metadata = GinPageGetMeta(page);
 
-   memset(nulls, 0, sizeof(nulls));
-
values[0] = Int64GetDatum(metadata->head);
values[1] = Int64GetDatum(metadata->tail);
values[2] = Int32GetDatum(metadata->tailFreeSize);
@@ -107,7 +105,7 @@ gin_page_opaque_info(PG_FUNCTION_ARGS)
GinPageOpaque opaq;
HeapTuple   resultTuple;
Datum   values[3];
-   boolnulls[3];
+   boolnulls[3] = {0};
Datum   flags[16];
int nflags = 0;
uint16  flagbits;
@@ -162,8 +160,6 @@ gin_page_opaque_info(PG_FUNCTION_ARGS)
flags[nflags++] = DirectFunctionCall1(to_hex32, 
Int32GetDatum(flagbits));
}
 
-   memset(nulls, 0, sizeof(nulls));
-
values[0] = Int64GetDatum(opaq->rightlink);
values[1] = Int32GetDatum(opaq->maxoff);
values[2] = PointerGetDatum(construct_array_builtin(flags, nflags, 
TEXTOID));
@@ -255,14 +251,12 @@ gin_leafpage_items(PG_FUNCTION_ARGS)
HeapTuple   resultTuple;
Datum 

Re: Hash index build performance tweak from sorting

2022-08-05 Thread David Zhang

On 2022-08-01 8:37 a.m., Simon Riggs wrote:

Using the above test case, I'm getting a further 4-7% improvement on
already committed code with the attached patch, which follows your
proposal.


I ran two test cases: for the committed patch `hash_sort_by_hash.v3.patch`, I can
see about a 6~7% improvement; and after applying the patch
`hash_inserted_sorted.v2.patch`, I see about a 3% improvement. All the test
results are based on a 10-run average on two different machines.

Best regards,

--
David

Software Engineer
Highgo Software Inc. (Canada)
www.highgo.ca




Re: Non-replayable WAL records through overflows and >MaxAllocSize lengths

2022-07-08 Thread David Zhang

Hi,

I tried to apply this patch v5 to the current master branch, but it complains:
"git apply --check 
v5-0001-Add-protections-in-xlog-record-APIs-against-large.patch

error: patch failed: src/include/access/xloginsert.h:43
error: src/include/access/xloginsert.h: patch does not apply"

then I checked it out before the commit 
`b0a55e43299c4ea2a9a8c757f9c26352407d0ccc` and applied this v5 patch.


1) both make check and make installcheck passed.

2) and I can also see that this patch v5 prevents the error that happened previously,

"postgres=# SELECT pg_logical_emit_message(false, long, long) FROM 
repeat(repeat(' ', 1024), 1024*1023) as l(long);

ERROR:  too much WAL data"

3) without this v5 patch, the same test causes the standby to crash as
shown below, and the standby is not able to boot up after this crash.


"2022-07-08 12:28:16.425 PDT [2363] FATAL:  invalid memory alloc request 
size 2145388995
2022-07-08 12:28:16.426 PDT [2360] LOG:  startup process (PID 2363) 
exited with exit code 1
2022-07-08 12:28:16.426 PDT [2360] LOG:  terminating any other active 
server processes
2022-07-08 12:28:16.427 PDT [2360] LOG:  shutting down due to startup 
process failure

2022-07-08 12:28:16.428 PDT [2360] LOG:  database system is shut down"


Best regards,

--
David

Software Engineer
Highgo Software Inc. (Canada)
www.highgo.ca




Re: Non-replayable WAL records through overflows and >MaxAllocSize lengths

2022-06-13 Thread David Zhang




A little confused here, does this patch V3 intend to solve this problem "record 
length 2145386550 at 0/360 too long"?

No, not once the record exists. But it does remove Postgres' ability
to create such records, thereby solving the problem for all systems
that generate WAL through Postgres' WAL writing APIs.


I set up a simple primary and standby streaming replication environment, and used
the above query to run the test before and after patch v3. The error
still exists, but with a different message.

Before patch v3, the error is showing below,

2022-06-10 15:32:25.307 PDT [4253] LOG:  record length 2145386550 at 0/360 
too long
2022-06-10 15:32:47.763 PDT [4257] FATAL:  terminating walreceiver process due 
to administrator command
2022-06-10 15:32:47.763 PDT [4253] LOG:  record length 2145386550 at 0/360 
too long

After patch v3, the error displays differently

2022-06-10 15:53:53.397 PDT [12848] LOG:  record length 2145386550 at 0/360 
too long
2022-06-10 15:54:07.249 PDT [12852] FATAL:  could not receive data from WAL 
stream: ERROR:  requested WAL segment 00010045 has already been 
removed
2022-06-10 15:54:07.275 PDT [12848] LOG:  record length 2145386550 at 0/360 
too long

And once the error happens, the standby can't continue replication.

Did you initiate a new cluster or otherwise skip the invalid record
you generated when running the instance based on master? It seems to
me you're trying to replay the invalid record (len > MaxAllocSize),
and this patch does not try to fix that issue. This patch just tries
to forbid emitting records larger than MaxAllocSize, as per the check
in XLogRecordAssemble, so that we wont emit unreadable records into
the WAL anymore.

Reading unreadable records still won't be possible, but that's also
not something I'm trying to fix.


Thanks a lot for the clarification. My testing environment is pretty
simple: initdb for the primary, run a base backup and set the connection
string for the standby, then run the "pg_logical_emit_message" query and
tail the log on the standby side.


Best regards,

--
David

Software Engineer
Highgo Software Inc. (Canada)
www.highgo.ca




Re: Non-replayable WAL records through overflows and >MaxAllocSize lengths

2022-06-10 Thread David Zhang

Hi,

> > MaxAllocSize is pretty easy:
> > SELECT pg_logical_emit_message(false, long, long) FROM 
repeat(repeat(' ', 1024), 1024*1023) as l(long);

> >
> > on a standby:
> >
> > 2022-03-11 16:41:59.336 PST [3639744][startup][1/0:0] LOG:  record 
length 2145386550 at 0/360 too long

>
> Thanks for the reference. I was already playing around with 2PC log
> records (which can theoretically contain >4GB of data); but your
> example is much easier and takes significantly less time.

I am a little confused here: does this patch v3 intend to solve the problem
"record length 2145386550 at 0/360 too long"?


I set up a simple primary and standby streaming replication environment,
and used the above query to run the test before and after patch v3.
The error still exists, but with a different message.


Before patch v3, the error is showing below,

2022-06-10 15:32:25.307 PDT [4253] LOG: record length 2145386550 at 
0/360 too long
2022-06-10 15:32:47.763 PDT [4257] FATAL:  terminating walreceiver 
process due to administrator command
2022-06-10 15:32:47.763 PDT [4253] LOG:  record length 2145386550 at 
0/360 too long


After patch v3, the error displays differently

2022-06-10 15:53:53.397 PDT [12848] LOG: record length 2145386550 at 
0/360 too long
2022-06-10 15:54:07.249 PDT [12852] FATAL:  could not receive data from 
WAL stream: ERROR:  requested WAL segment 00010045 has 
already been removed
2022-06-10 15:54:07.275 PDT [12848] LOG:  record length 2145386550 at 
0/360 too long


And once the error happens, the standby can't continue replication.


Is a particular reason to say "more datas" at line 52 in patch v3?

+ * more datas than are being accounted for by the XLog infrastructure.


On 2022-04-18 10:19 p.m., Michael Paquier wrote:

On Mon, Apr 18, 2022 at 05:48:50PM +0200, Matthias van de Meent wrote:

Seeing that the busiest time for PG15 - the last commitfest before the
feature freeze - has passed, could someone take another look at this?

The next minor release is three weeks away, so now would be a good
time to get that addressed.  Heikki, Andres, are you planning to look
more at what has been proposed here?
--
Michael


Thank you,

--
David

Software Engineer
Highgo Software Inc. (Canada)
www.highgo.ca

Re: postgres_fdw: commit remote (sub)transactions in parallel during pre-commit

2022-05-04 Thread David Zhang

Thanks a lot for the patch update.

On 2022-05-02 1:25 a.m., Etsuro Fujita wrote:

Hi,

On Wed, Apr 20, 2022 at 4:55 AM David Zhang  wrote:

I tried to apply the patch to master and plan to run some tests, but got
below errors due to other commits.

I rebased the patch against HEAD.  Attached is an updated patch.


Applied the patch v8 to the master branch today, and `make check` is OK.
I also repeated the previous performance tests on three virtual Ubuntu
18.04 machines, and the performance improvement of parallel abort (10-run
average) is more consistent.


before:

    abort.1 = 2.6344 ms
    abort.2 = 4.2799 ms

after:

    abort.1 = 1.4105 ms
    abort.2 = 2.2075 ms


+ * remote server in parallel at (sub)transaction end.

Here, I think the comment above could potentially apply to multiple
remote server(s).

I agree on that point, but I think it's correct to say "the remote
server" here, because we determine this for the given remote server.
Maybe I'm missing something, so could you elaborate on it?
Your understanding is correct. I was thinking `remote server(s)` would
be easier for end users to understand, but this is a comment in source
code, so either way is fine with me.

Not sure if there is a way to avoid repeated comments? For example, the
same comment below appears in two places (line 231 and line 296).

+/*
+ * If requested, consume whatever data is available from the socket.
+ * (Note that if all data is available, this allows
+ * pgfdw_get_cleanup_result to call PQgetResult without forcing the
+ * overhead of WaitLatchOrSocket, which would be large compared to the
+ * overhead of PQconsumeInput.)
+ */

IMO I think it's OK to have this in multiple places, because 1) IMO it
wouldn't be that long, and 2) we already duplicated comments like this
in the same file in v14 and earlier.  Here is such an example in
pgfdw_xact_callback() and pgfdw_subxact_callback() in that file in
those versions:

 /*
  * If a command has been submitted to the remote server by
  * using an asynchronous execution function, the command
  * might not have yet completed.  Check to see if a
  * command is still being processed by the remote server,
  * and if so, request cancellation of the command.
  */

Thanks for reviewing!  Sorry for the delay.

Best regards,
Etsuro Fujita

--
Best regards,
David

Software Engineer
Highgo Software Inc. (Canada)
www.highgo.ca




Re: postgres_fdw: commit remote (sub)transactions in parallel during pre-commit

2022-04-19 Thread David Zhang
I tried to apply the patch to master and planned to run some tests, but got
the errors below due to other commits.


$ git apply --check 
v7-0003-postgres-fdw-Add-support-for-parallel-abort.patch

error: patch failed: doc/src/sgml/postgres-fdw.sgml:479
error: doc/src/sgml/postgres-fdw.sgml: patch does not apply


+     * remote server in parallel at (sub)transaction end.

Here, I think the comment above could potentially apply to multiple 
remote server(s).



Not sure if there is a way to avoid repeated comments? For example, the 
same comment below appears in two places (line 231 and line 296).


+    /*
+     * If requested, consume whatever data is available from the socket.
+     * (Note that if all data is available, this allows
+     * pgfdw_get_cleanup_result to call PQgetResult without forcing the
+     * overhead of WaitLatchOrSocket, which would be large compared to the
+     * overhead of PQconsumeInput.)
+     */

On 2022-03-24 11:46 p.m., Etsuro Fujita wrote:

On Thu, Mar 24, 2022 at 1:34 PM Etsuro Fujita  wrote:

Attached is a new patch set.  The new version of 0002 is just a
cleanup patch (see the commit message in 0002), and I think it's
committable, so I'm planning to commit it, if no objections.

Done.

Attached is the 0003 patch, which is the same as the one I sent yesterday.

Best regards,
Etsuro Fujita

--
Best regards,
David

Software Engineer
Highgo Software Inc. (Canada)
www.highgo.ca




Re: postgres_fdw: commit remote (sub)transactions in parallel during pre-commit

2022-03-11 Thread David Zhang
Applied patches v6-0002 and v6-0003 to the master branch, and the `make
check` test is OK.


Here are my test results (10-run average) on 3 virtual machines:
before the patches:

   abort.1 = 2.5473 ms

   abort.2 = 4.1572 ms

after the patches with OPTIONS (ADD parallel_abort 'true'):

   abort.1 = 1.7136 ms

   abort.2 = 2.5833 ms

Overall, it shows about a 32~37% improvement in my testing environment.

On 2022-03-05 2:32 a.m., Etsuro Fujita wrote:

On Mon, Feb 28, 2022 at 6:53 PM Etsuro Fujita  wrote:

Here is an updated version.  I added to the 0003 patch a macro for
defining the milliseconds to wait, as proposed by David upthread.

I modified the 0003 patch further: 1) I added to
pgfdw_cancel_query_end/pgfdw_exec_cleanup_query_end the PQconsumeInput
optimization that we have in do_sql_command_end, and 2) I
added/tweaked comments a bit further.  Attached is an updated version.

Like [1], I ran a simple performance test using the following transaction:

BEGIN;
SAVEPOINT s;
INSERT INTO ft1 VALUES (10, 10);
INSERT INTO ft2 VALUES (20, 20);
ROLLBACK TO SAVEPOINT s;
RELEASE SAVEPOINT s;
INSERT INTO ft1 VALUES (10, 10);
INSERT INTO ft2 VALUES (20, 20);
ABORT;

where ft1 is a foreign table created on a foreign server hosted on the
same machine as the local server, and ft2 is a foreign table created
on a foreign server hosted on a different machine.  (In this test I
used two machines, while in [1] I used three machines: one for the
local server and the others for ft1 and ft2.)  The average latencies
for the ROLLBACK TO SAVEPOINT and ABORT commands over ten runs of the
above transaction with the parallel_abort option disabled/enabled are:

* ROLLBACK TO SAVEPOINT
   parallel_abort=0: 0.3217 ms
   parallel_abort=1: 0.2396 ms

* ABORT
   parallel_abort=0: 0.4749 ms
   parallel_abort=1: 0.3733 ms

This option reduces the latency for ROLLBACK TO SAVEPOINT by 25.5
percent, and the latency for ABORT by 21.4 percent.  From the results,
I think the patch is useful.

Best regards,
Etsuro Fujita

[1]https://www.postgresql.org/message-id/CAPmGK17dAZCXvwnfpr1eTfknTGdt%3DhYTV9405Gt5SqPOX8K84w%40mail.gmail.com

--
David

Software Engineer
Highgo Software Inc. (Canada)
www.highgo.ca

Re: postgres_fdw: commit remote (sub)transactions in parallel during pre-commit

2022-02-18 Thread David Zhang

Thanks a lot for updating the patch.

Tried to apply the patches to the master branch; no warnings found and the
regression tests passed.


Now we have many places (5) calling the same function with a constant
number 3. Is this a good time to consider redefining this number as a
macro somewhere?
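
Just to illustrate the idea, a minimal sketch of such a macro (the macro
name and value below are placeholders; the real name and value would come
from the patch itself, and the snippet assumes the usual PostgreSQL headers
such as utils/timestamp.h):

/* placeholder name and value, for illustration only */
#define CONNECTION_CLEANUP_TIMEOUT	30000	/* milliseconds to wait */

static void
example_caller(void)
{
	TimestampTz	endtime;

	/* each call site would use the macro instead of a bare literal */
	endtime = TimestampTzPlusMilliseconds(GetCurrentTimestamp(),
										  CONNECTION_CLEANUP_TIMEOUT);
	(void) endtime;
}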


Thank you,

On 2022-02-17 8:46 a.m., Fujii Masao wrote:



On 2022/02/11 21:59, Etsuro Fujita wrote:

I tweaked comments/docs a little bit as well.


Thanks for updating the patches!

I reviewed 0001 patch. It looks good to me except the following minor 
things. If these are addressed, I think that the 001 patch can be 
marked as ready for committer.


+ * Also determine to commit (sub)transactions opened on the 
remote server

+ * in parallel at (sub)transaction end.

Like the comment "Determine whether to keep the connection ...", 
"determine to commit" should be "determine whether to commit"?


"remote server" should be "remote servers"?


+    curlevel = GetCurrentTransactionNestLevel();
+    snprintf(sql, sizeof(sql), "RELEASE SAVEPOINT s%d", curlevel);

Why does pgfdw_finish_pre_subcommit_cleanup() need to call 
GetCurrentTransactionNestLevel() and construct the "RELEASE SAVEPOINT" 
query string again? pgfdw_subxact_callback() already does them and 
probably we can make it pass either of them to 
pgfdw_finish_pre_subcommit_cleanup() as its argument.



+   This option controls whether postgres_fdw 
commits

+   remote (sub)transactions opened on a foreign server in a local
+   (sub)transaction in parallel when the local (sub)transaction 
commits.


"a foreign server" should be "foreign servers"?

"in a local (sub)transaction" part seems not to be necessary.

Regards,


--
David

Software Engineer
Highgo Software Inc. (Canada)
www.highgo.ca




Re: Question about 001_stream_rep.pl recovery test

2021-12-14 Thread David Zhang

Thanks a lot Michael for the explanation.

On 2021-12-13 4:05 a.m., Michael Paquier wrote:

On Fri, Dec 10, 2021 at 01:44:40PM -0800, David Zhang wrote:

Inside the test script `src/test/recovery/t/001_stream_rep.pl`, a comment at
line 30 says `my_backup` is "not mandatory",

  30 # Take backup of standby 1 (not mandatory, but useful to check if
  31 # pg_basebackup works on a standby).
  32 $node_standby_1->backup($backup_name);

however if remove the backup folder "my_backup" after line 32, something
like below,

Well, the comment is not completely incorrect IMO.  In this context, I
read taht that we don't need a backup from a standby and we could just
take a backup from the primary instead.  If I were to fix something, I
would suggest to reword the comment as follows:
# Take backup of standby 1 (could be taken from the primary, but this
# is useful to check if pg_basebackup works on a standby).


And then the test script takes another backup `my_backup_2`, but it seems
this backup is not used anywhere.

This had better stay around, per commit f267c1c.  And as far as I can
see, we don't have an equivalent test in a different place, where
pg_basebackup runs on a standby while its *primary* is stopped.
--
Michael

--
David

Software Engineer
Highgo Software Inc. (Canada)
www.highgo.ca




Question about 001_stream_rep.pl recovery test

2021-12-10 Thread David Zhang

Hi Hackers,

Inside the test script `src/test/recovery/t/001_stream_rep.pl`, a 
comment at line 30 says `my_backup` is "not mandatory",


 30 # Take backup of standby 1 (not mandatory, but useful to check if
 31 # pg_basebackup works on a standby).
 32 $node_standby_1->backup($backup_name);

however if remove the backup folder "my_backup" after line 32, something 
like below,


 33 system_or_bail('rm', '-rf', 
'/home/user/postgres/src/test/recovery/tmp_check/t_001_stream_rep_standby_1_data/backup/my_backup');


then the test failed with message like,

recovery$ make check PROVE_TESTS='t/001_stream_rep.pl'
t/001_stream_rep.pl .. # Looks like your test exited with 2 before it 
could output anything.

t/001_stream_rep.pl .. Dubious, test returned 2 (wstat 512, 0x200)
Failed 53/53 subtests
... ...
Result: FAIL
Makefile:23: recipe for target 'check' failed
make: *** [check] Error 1

And then the test script takes another backup `my_backup_2`, but it 
seems this backup is not used anywhere.


 35 # Take a second backup of the standby while the primary is offline.
 36 $node_primary->stop;
 37 $node_standby_1->backup('my_backup_2');
 38 $node_primary->start;

because, if I deleted the backup "my_backup_2" right after line 38 with 
something like below,


39 system_or_bail('rm', '-rf', 
'/home/user/postgres/src/test/recovery/tmp_check/t_001_stream_rep_standby_1_data/backup/my_backup_2');

there is no impact on the test result, and it says all test cases passed.

recovery$ make check PROVE_TESTS='t/001_stream_rep.pl'
...
t/001_stream_rep.pl .. ok
All tests successful.
Files=1, Tests=53,  6 wallclock secs ( 0.01 usr  0.01 sys + 1.00 cusr  
0.82 csys =  1.84 CPU)

Result: PASS

Did I misunderstand something here?

Thank you,

--
David

Software Engineer
Highgo Software Inc. (Canada)
www.highgo.ca




Re: postgres_fdw: commit remote (sub)transactions in parallel during pre-commit

2021-11-01 Thread David Zhang

I evaluated the effectiveness of the patch using a simple
multi-statement transaction:

BEGIN;
SAVEPOINT s;
INSERT INTO ft1 VALUES (10, 10);
INSERT INTO ft2 VALUES (20, 20);
RELEASE SAVEPOINT s;
COMMIT;

where ft1 and ft2 are foreign tables created on different foreign
servers hosted on different machines.  I ran the transaction five
times using the patch with the option enabled/disabled, and measured
the latencies for the RELEASE and COMMIT commands in each run.  The
average latencies for these commands over the five runs are:

* RELEASE
   parallel_commit=0: 0.385 ms
   parallel_commit=1: 0.221 ms

* COMMIT
   parallel_commit=0: 1.660 ms
   parallel_commit=1: 0.861 ms

With the option enabled, the average latencies for both commands are
reduced significantly!
Following your instructions, I performed some basic tests to compare the
performance before and after. In my testing environment (two foreign
servers on the same local machine), the performance varies; sometimes the
time spent on RELEASE and COMMIT without the patch is close to the patched
numbers, but it seems to always perform better with the patch. Then I ran
a 1-million-tuple insert; the 5-run average is something like below:


Before
    RELEASE 0.171 ms, COMMIT 1.861 ms

After
    RELEASE 0.147 ms, COMMIT 1.305 ms

Best regards,
--
David

Software Engineer
Highgo Software Inc. (Canada)
www.highgo.ca




Re: Fix uninitialized variable access (src/backend/utils/mmgr/freepage.c)

2021-10-01 Thread David Zhang

On 2021-08-18 1:29 a.m., Kyotaro Horiguchi wrote:

At Tue, 17 Aug 2021 17:04:44 +0900, Michael Paquier  wrote 
in

On Fri, Jul 02, 2021 at 06:22:56PM -0300, Ranier Vilela wrote:

Em qui., 1 de jul. de 2021 às 17:20, Mahendra Singh Thalor <
mahi6...@gmail.com> escreveu:

Please can we try to hit this rare condition by any test case. If you have
any test cases, please share.

Yeah, this needs to be proved.  Are you sure that this change is
actually right?  The bottom of FreePageManagerPutInternal() has
assumptions that a page may not be found during a btree search, with
an index value used.

By a quick look, FreePageBtreeSearch is called only from
FreePageManagerPutInternal at three points. The first one assumes that
result.found == true, at the rest points are passed only when
fpm->btree_depth > 0, i.e, fpm->btree_root is non-NULL.

In short FreePageBtreeSeach is never called when fpm->btree_root is
NULL.  I don't think we need to fill-in other members since the
contract of the function looks fine.

It might be simpler to turn 'if (btp == NULL)' to an assertion.
After adding the initialization of split_pages in patch
fix_unitialized_var_index_freepage-v1.patch,


+        result->split_pages = 0;

it actually changes the assertion condition after the second call of
FreePageBtreeSearch.

            FreePageBtreeSearch(fpm, first_page, );

            /*
             * The act of allocating pages for use in constructing our btree
             * should never cause any page to become more full, so the new
             * split depth should be no greater than the old one, and perhaps
             * less if we fortuitously allocated a chunk that freed up a slot
             * on the page we need to update.
             */
            Assert(result.split_pages <= fpm->btree_recycle_count);

Should we consider adding some test cases to make sure this assertion 
will still function properly?




regards.


--
David

Software Engineer
Highgo Software Inc. (Canada)
www.highgo.ca




Re: ORDER BY pushdowns seem broken in postgres_fdw

2021-09-10 Thread David Zhang

On 2021-09-06 1:16 a.m., Ronan Dunklau wrote:

Le vendredi 3 septembre 2021, 22:54:25 CEST David Zhang a écrit :

The following review has been posted through the commitfest application:
make installcheck-world:  tested, failed
Implements feature:   tested, passed
Spec compliant:   not tested
Documentation:not tested

Applied the v6 patch to the master branch and ran the regression tests for contrib;
the result was "All tests successful."

What kind of error did you get running make installcheck-world ? If it passed
the make check for contrib, I can't see why it would fail running make
installcheck-world.
Just to clarify, the error I encountered was not related to patch v6;
it was related to other extensions.

In any case, I just checked and running make installcheck-world doesn't
produce any error.

Since HEAD had moved a bit since the last version, I rebased the patch,
resulting in the attached v7.

Best regards,

--
Ronan Dunklau

--
David

Software Engineer
Highgo Software Inc. (Canada)
www.highgo.ca




Re: ORDER BY pushdowns seem broken in postgres_fdw

2021-09-03 Thread David Zhang
The following review has been posted through the commitfest application:
make installcheck-world:  tested, failed
Implements feature:   tested, passed
Spec compliant:   not tested
Documentation:not tested

Applied the v6 patch to the master branch and ran the regression tests for contrib; the
result was "All tests successful."

Re: [UNVERIFIED SENDER] Re: [bug] Logical Decoding of relation rewrite with toast does not reset toast_hash

2021-08-13 Thread David Zhang

Hi Drouvot,

I don't see extra data in your output and it looks like your 
copy/paste is missing some content, no?


On my side, that looks good and here is what i get with the patch applied:

I ran the test again; now I get the same output as yours, and it looks
good to me. (The issue I mentioned in the previous email was caused by my
console output.)


Thank you,

--
David

Software Engineer
Highgo Software Inc. (Canada)
www.highgo.ca




Re: [bug] Logical Decoding of relation rewrite with toast does not reset toast_hash

2021-08-06 Thread David Zhang
Hi Drouvot,
I can reproduce the issue you mentioned on REL_12_STABLE as well as on the master
branch, but the patch doesn't apply to REL_12_STABLE. After applying it to the
master branch, it returns some weird results when running the query for the first
time.
As you can see in the log below, after executing the query `select
* from pg_logical_slot_get_changes('bdt_slot', null, null);` for the first time,
it returns some extra data.
david:postgres$ psql -d postgres
psql (15devel)
Type "help" for help.

postgres=# \q
david:postgres$ psql -d postgres
psql (15devel)
Type "help" for help.

postgres=# select 
pg_create_logical_replication_slot('bdt_slot','test_decoding');
 pg_create_logical_replication_slot 
-------------------------------------
 (bdt_slot,0/1484FA8)
(1 row)

postgres=# CREATE TABLE tbl1 (a INT, b TEXT);
CREATE TABLE
postgres=# CREATE TABLE tbl2 (a INT);
CREATE TABLE
postgres=# ALTER TABLE tbl1 ALTER COLUMN b SET STORAGE EXTERNAL;
ALTER TABLE
postgres=# 
postgres=# BEGIN;
BEGIN
postgres=*# INSERT INTO tbl1 VALUES(1, repeat('a', 4000)) ;
INSERT 0 1
postgres=*# ALTER TABLE tbl1 ADD COLUMN id serial primary key;
ALTER TABLE
postgres=*# INSERT INTO tbl2 VALUES(1);
INSERT 0 1
postgres=*# commit;
COMMIT
postgres=# 
postgres=# select * from pg_logical_slot_get_changes('bdt_slot', null, null);
    lsn    | xid |                              data
-----------+-----+---------------------------------------------------------------
(the remaining rows of this output are not preserved in the archive)
Re: Query about time zone patterns in to_char

2021-07-09 Thread David Zhang
The following review has been posted through the commitfest application:
make installcheck-world:  not tested
Implements feature:   tested, passed
Spec compliant:   not tested
Documentation:not tested

Applied the patch `v2_support_of_tzh_tzm_patterns.patch` to the `REL_14_STABLE`
branch; both `make check` and `make check-world` passed.

Re: Support tab completion for upper character inputs in psql

2021-03-30 Thread David Zhang

Hi Tang,

Thanks a lot for the patch.

I did a quick test based on the latest patch V3 on the latest master branch
(commit 4753ef37e0eda4ba0af614022d18fcbc5a946cc9).


Case 1: before patch

  1 postgres=# set a
  2 all  allow_system_table_mods 
application_name array_nulls

  3 postgres=# set A
  4
  5 postgres=# create TABLE tbl (data text);
  6 CREATE TABLE
  7 postgres=# update tbl SET DATA =
  8
  9 postgres=# update T
 10
 11 postgres=#

Case 2: after patched

  1 postgres=# set a
  2 all  allow_system_table_mods 
application_name array_nulls

  3 postgres=# set A
  4 ALL  ALLOW_SYSTEM_TABLE_MODS 
APPLICATION_NAME ARRAY_NULLS

  5 postgres=# create TABLE tbl (data text);
  6 CREATE TABLE
  7
  8 postgres=# update tbl SET DATA =
  9
 10 postgres=# update TBL SET
 11
 12 postgres=#

So, as you can see, the difference is between lines 8 and 10 in case 2. It
looks like lowercase input can auto-complete further than uppercase input;
secondly, it would be great if you could add some test cases.


Best regards,
David

On 2021-03-22 5:41 a.m., tanghy.f...@fujitsu.com wrote:

On Tuesday, March 16, 2021 5:20 AM, Peter Eisentraut 
 wrote:


The cases that complete with a query
result are not case insensitive right now.  This affects things like

UPDATE T

as well.  I think your first patch was basically right.  But we need to
understand that this affects all completions with query results, not
just the one you wanted to fix.  So you should analyze all the callers
and explain why the proposed change is appropriate.

Thanks for your review and suggestion. Please find attached patch V3 which was 
based on the first patch[1].
Difference from the first patch is:

Add tab completion support for all query results in psql.
complete_from_query
+complete_from_versioned_query
+complete_from_schema_query
+complete_from_versioned_schema_query

[1] 
https://www.postgresql.org/message-id/a63cbd45e3884cf9b3961c2a6a95dcb7%40G08CNEXMBPEKD05.g08.fujitsu.local

The modification to support case-insensitive matching in "_complete_from_query" is based on
"complete_from_const" and "complete_from_list".
Please let me know if you find anything insufficient.

Regards,
Tang



--
David

Software Engineer
Highgo Software Inc. (Canada)
www.highgo.ca




Re: [BUG] Autovacuum not dynamically decreasing cost_limit and cost_delay

2021-02-12 Thread David Zhang

Thanks for the patch, Mead.

For 'MAXVACUUMCOSTLIMIT", it would be nice to follow the current GUC 
pattern to do define a constant.


For example, the constant "MAX_KILOBYTES" is defined in guc.h, with a 
pattern like, "MAX_" to make it easy to read.


Best regards,

David

On 2021-02-08 6:48 a.m., Mead, Scott wrote:

Hello,
   I recently looked at what it would take to make a running
autovacuum pick up a change to either cost_delay or cost_limit.  Users
frequently will have a conservative value set, and then wish to change
it when autovacuum initiates a freeze on a relation.  Most users end
up finding out they are in ‘to prevent wraparound’ after it has
happened; this means that if they want the vacuum to take advantage of
more I/O, they need to stop and then restart the currently running
vacuum (after reloading the GUCs).
  Initially, my goal was to determine the feasibility of making this
dynamic.  I added debug code to vacuum.c:vacuum_delay_point(void) and
found that changes to cost_delay and cost_limit are already processed
by a running vacuum.  There was a bug preventing the cost_delay or
cost_limit from being configured to allow higher throughput, however.
I believe this is a bug because currently autovacuum will dynamically
detect and increase the cost_limit or cost_delay, but it can never
decrease those values beyond their setting when the vacuum began.  The
current behavior is for vacuum to limit the maximum throughput of
currently running vacuum processes to the cost_limit that was set when
the vacuum process began.
I changed this (see attached) to allow the cost_limit to be
re-calculated up to the maximum allowable (currently 10,000).  This
has the effect of allowing users to reload a configuration change, and
an in-progress vacuum can be ‘sped up’ by setting either the
cost_limit or cost_delay.

The problematic piece is:
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index c6ec657a93..d3c6b0d805 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -1834,7 +1834,7 @@ autovac_balance_cost(void)
 		 * cost_limit to more than the base value.
 		 */
 		worker->wi_cost_limit = Max(Min(limit,
-										worker->wi_cost_limit_base),
+										MAXVACUUMCOSTLIMIT),
 									1);
 	}
We limit the worker to the max cost_limit that was set at the
beginning of the vacuum.  I introduced the MAXVACUUMCOSTLIMIT constant
(currently defined to 1, which is the current max limit already
defined) in miscadmin.h so that vacuum will now be able to adjust the
cost_limit up to 1 as the upper limit in a currently running vacuum.


The tests that I’ve run show that the performance of an existing
vacuum can be increased commensurate with the parameter change.
Interestingly, autovac_balance_cost(void) is only updating the
cost_limit, even if the cost_delay is modified.  This is done
correctly; it was just a surprise to see the behavior.



2021-02-01 13:36:52.346 EST [37891] DEBUG:  VACUUM Sleep: Delay: 20.00, CostBalance: 207, CostLimit: 200, msec: 20.70
2021-02-01 13:36:52.346 EST [37891] CONTEXT:  while scanning block 1824 of relation "public.blah"
2021-02-01 13:36:52.362 EST [36460] LOG:  received SIGHUP, reloading configuration files

2021-02-01 13:36:52.364 EST [36460] LOG:  parameter "autovacuum_vacuum_cost_delay" changed to "2"

2021-02-01 13:36:52.365 EST [36463] DEBUG:  checkpointer updated shared memory configuration values
2021-02-01 13:36:52.366 EST [36466] DEBUG:  autovac_balance_cost(pid=37891 db=13207, rel=16384, dobalance=yes cost_limit=2000, cost_limit_base=200, cost_delay=20)
2021-02-01 13:36:52.366 EST [36467] DEBUG:  received inquiry for database 0
2021-02-01 13:36:52.366 EST [36467] DEBUG:  writing stats file "pg_stat_tmp/global.stat"
2021-02-01 13:36:52.366 EST [36467] DEBUG:  writing stats file "pg_stat_tmp/db_0.stat"
2021-02-01 13:36:52.388 EST [37891] DEBUG:  VACUUM Sleep: Delay: 20.00, CostBalance: 2001, CostLimit: 2000, msec: 20.01



--
David

Software Engineer
Highgo Software Inc. (Canada)
www.highgo.ca


Re: Add table access method as an option to pgbench

2021-01-19 Thread David Zhang

On 2021-01-15 1:22 p.m., Andres Freund wrote:


Hi,

On 2020-11-25 12:41:25 +0900, Michael Paquier wrote:

On Tue, Nov 24, 2020 at 03:32:38PM -0800, David Zhang wrote:

But, providing another option for the end user may not be a bad idea, and it
might make the tests easier at some points.

My first thought is that we have no need to complicate pgbench with
this option because there is a GUC able to do that, but we do that for
tablespaces, so...  No objections from here.

I think that objection is right. All that's needed to change this from
the client side is to do something like
PGOPTIONS='-c default_table_access_method=foo' pgbench ...
Yeah, this is a better solution for me too. Thanks a lot for all the 
feedback.

I don't think adding pgbench options for individual GUCs really is a
useful exercise?

Greetings,

Andres Freund

--
David

Software Engineer
Highgo Software Inc. (Canada)
www.highgo.ca




Re: Add table access method as an option to pgbench

2021-01-15 Thread David Zhang

Hi,

`v6-0001-add-table-access-method-as-an-option-to-pgbench` fixes the
wording problems in the pgbench documentation and help output, as pointed
out by Justin and Youichi.


`0001-update-tablespace-to-keep-document-consistency` tries to make
the *tablespace* wording more consistent in the pgbench documentation.


Thank you,

On 2021-01-14 3:51 p.m., David Zhang wrote:

On 2021-01-09 5:44 a.m., youichi aramaki wrote:

+   " create tables with using specified table access 
method,\n"
In my opinion,  this sentence should be "create tables with specified 
table access method" or "create tables using specified table access 
method".
"create tables with specified table access method" may be more 
consistent with other options.


Thank you Youichi. I will change it to "create tables with specified 
table access method" in next patch.



--
David

Software Engineer
Highgo Software Inc. (Canada)
www.highgo.ca
From ff94675fda38a0847b893a29dce8c3068a74e118 Mon Sep 17 00:00:00 2001
From: David Zhang 
Date: Fri, 15 Jan 2021 11:40:30 -0800
Subject: [PATCH] update tablespace to keep document consistency

---
 doc/src/sgml/ref/pgbench.sgml | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index faa7c26b0a..080c25731d 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -325,10 +325,10 @@ pgbench  options 
 d
  
 
  
-  
--index-tablespace=index_tablespace
+  
--index-tablespace=TABLESPACE
   

-Create indexes in the specified tablespace, rather than the default
+Create indexes in the specified TABLESPACE, 
rather than the default
 tablespace.

   
@@ -360,10 +360,10 @@ pgbench  options 
 d
  
 
  
-  
--tablespace=tablespace
+  
--tablespace=TABLESPACE
   

-Create tables in the specified tablespace, rather than the default
+Create tables in the specified TABLESPACE, 
rather than the default
 tablespace.

   
-- 
2.17.1

From 517fcb61c7065b7bccf6105a7bb89007fe2fcb08 Mon Sep 17 00:00:00 2001
From: David Zhang 
Date: Fri, 15 Jan 2021 10:32:53 -0800
Subject: [PATCH] add table access method as an option to pgbench

---
 doc/src/sgml/ref/pgbench.sgml|  9 +++
 src/bin/pgbench/pgbench.c| 25 +++-
 src/bin/pgbench/t/001_pgbench_with_server.pl | 22 +
 3 files changed, 55 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index faa7c26b0a..d8018388e6 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -359,6 +359,15 @@ pgbench  options 
 d
   
  
 
+ 
+  
--table-access-method=TABLEAM
+  
+   
+Create tables with specified table access method, rather than the 
default.
+   
+  
+ 
+ 
  
   
--tablespace=tablespace
   
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index f7da3e1f62..3f8a119811 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -184,10 +184,11 @@ doublethrottle_delay = 0;
 int64  latency_limit = 0;
 
 /*
- * tablespace selection
+ * tablespace and table access method selection
  */
 char  *tablespace = NULL;
 char  *index_tablespace = NULL;
+char  *table_access_method = NULL;
 
 /*
  * Number of "pgbench_accounts" partitions.  0 is the default and means no
@@ -641,6 +642,9 @@ usage(void)
   "  --partition-method=(range|hash)\n"
   "   partition pgbench_accounts with 
this method (default: range)\n"
   "  --partitions=NUM partition pgbench_accounts into 
NUM parts (default: 0)\n"
+  "  --table-access-method=TABLEAM\n"
+  "   create tables with specified 
table access method,\n"
+  "   rather than the default.\n"
   "  --tablespace=TABLESPACE  create tables in the specified 
tablespace\n"
   "  --unlogged-tablescreate tables as unlogged 
tables\n"
   "\nOptions to select what to run:\n"
@@ -3761,6 +3765,20 @@ initCreateTables(PGconn *con)
 
fprintf(stderr, "creating tables...\n");
 
+   if (table_access_method != NULL)
+   {
+   char   *escaped_table_access_method;
+
+   initPQExpBuffer();
+   escaped_table_access_method = PQescapeIdentifier(con,
+   table_access_method, 
strlen(table_access_method));
+   appendPQExpBuffer(, "set default_table_access_method to 
%s;\n",
+ 

Re: Add table access method as an option to pgbench

2021-01-14 Thread David Zhang

On 2021-01-09 5:44 a.m., youichi aramaki wrote:


+  "   create tables with using specified 
table access method,\n"

In my opinion,  this sentence should be "create tables with specified table access 
method" or "create tables using specified table access method".
"create tables with specified table access method" may be more consistent with 
other options.


Thank you Youichi. I will change it to "create tables with specified 
table access method" in the next patch.


--
David

Software Engineer
Highgo Software Inc. (Canada)
www.highgo.ca




Re: Add table access method as an option to pgbench

2021-01-08 Thread David Zhang

tablespace is an extraneous word ?


Thanks a lot for pointing this out. I will fix it in the next patch once all 
the issues are clarified.


On Sun, Dec 27, 2020 at 09:14:53AM -0400, Fabien COELHO wrote:

src/test/regress/sql/create_am.sql:CREATE ACCESS METHOD heap2 TYPE TABLE 
HANDLER heap_tableam_handler;
...
src/test/regress/sql/create_am.sql:DROP ACCESS METHOD heap2;


Do you suggest adding a new positive test case using the table access method 
*heap2* created with the existing heap_tableam_handler?


If you found pre-existing inconsistencies/errors/deficiencies, I'd suggest to
fix them in a separate patch.  Typically those are small, and you can make them
0001 patch.  Errors might be worth backpatching, so in addition to making the
"feature" patches small, it's good if they're separate.



Like writing NAME vs TABLESPACE.  Or escape vs escapes.

I will provide a separate small patch when fixing the *tablespace* typo 
mentioned above. Can you help clarify which releases or branches need to be 
back-patched?


Or maybe using SET default_tablespace instead of modifying the CREATE sql.
That'd need to be done separately for indexes, and RESET after creating the
tables, to avoid accidentally affecting indexes, too.

Why should it not affect indexes?

I rarely use pgbench, and probably never looked at its source before, but I saw
that it has a separate --tablespace and --index-tablespace, and that
--tablespace is *not* the default for indexes.

So if we changed it to use SET default_tablespace instead of amending the DDL
sql, we'd need to make sure the SET applied only to the intended CREATE
command, and not all following create commands.  In the case that
--index-tablespace is not specified, it would be buggy to do this:

SET default_tablespace=foo;
CREATE TABLE ...
CREATE INDEX ...
CREATE TABLE ...
CREATE INDEX ...
...


If this `tablespace` issue also applies to `table-access-method`, yes, I agree 
it could be buggy too. But the purpose of this patch is to provide an option 
for end users to test a newly created table access method. If the *new* 
table access method hasn't fully implemented all the interfaces yet, then no 
matter whether the end user uses the option provided by this patch or the GUC 
parameter, the results will be the same.
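
For illustration only, a rough sketch of the narrowly scoped GUC approach for 
the access method case (plain heap stands in here for whatever AM is under 
test; the table definitions follow the pgbench schema):

SET default_table_access_method = 'heap';   -- substitute the AM under test
CREATE TABLE pgbench_accounts (aid integer, bid integer,
                               abalance integer, filler character(84));
CREATE TABLE pgbench_branches (bid integer, bbalance integer,
                               filler character(88));
RESET default_table_access_method;          -- later statements use the default again

As far as I can tell, default_table_access_method only applies to tables and 
materialized views, so indexes should not be affected the way they are by 
default_tablespace.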

--
David

Software Engineer
Highgo Software Inc. (Canada)
www.highgo.ca


Re: Add table access method as an option to pgbench

2020-12-07 Thread David Zhang

Again, thanks a lot for the feedback.

On 2020-12-02 12:03 p.m., Fabien COELHO wrote:


Hello David,

Some feedback about v4.

It looks like the option is *silently* ignored when creating a 
partitioned table, which currently results in an error, and not 
passed to partitions, which would accept them. This is pretty weird.
The input check is added with an error message when both partitions 
and table access method are used.


Hmmm. If you take the approach of resetting the default, I do not think that 
you should have to test anything? AFAICT the access method is valid on 
partitions, although not on the partitioned table declaration. So I'd 
say that you could remove the check.
Yes, it makes sense to remove the *blocking* check, and the table access 
method interface does in fact work with partitioned tables. I tested this/v5 
by duplicating the heap access method with a different name. For this reason, 
I removed the condition check as well when applying the table access method.
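
For reference, a minimal sketch of that kind of duplicate AM, and one way to 
check which AM the pgbench tables ended up with (the CREATE ACCESS METHOD 
statement mirrors src/test/regress/sql/create_am.sql; the catalog query is 
only an illustration):

CREATE ACCESS METHOD heap2 TYPE TABLE HANDLER heap_tableam_handler;
-- then: pgbench -i --partitions=2 --table-access-method=heap2
SELECT c.relname, a.amname
FROM pg_class c
LEFT JOIN pg_am a ON a.oid = c.relam
WHERE c.relname LIKE 'pgbench_%';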


They should also trigger failures, eg 
"--table-access-method=no-such-table-am", to be added to the @errors 
list.
Not sure how to address this properly, since the table access method can 
potentially be *any* name.


I think it is pretty unlikely that someone would choose the name 
"no-such-table-am" when developing a new storage engine for Postgres 
inside Postgres, so that it would make this internal test fail.


There are numerous such names used in tests, eg "no-such-database", 
"no-such-command".


So I'd suggest to add such a test to check for the expected failure.
The test case "no-such-table-am" has been added to the errors list, and 
the regression test is ok.



I do not understand why you duplicated all possible option entry.

Testing with just the table access method looks redundant if the feature is 
made to work orthogonally to partitions, which should be the case.
Only one positive test case was added, using *heap* as the table access 
method, to verify the initialization.


Yep, only "heap" if currently available for tables.


Thanks,
--
David

Software Engineer
Highgo Software Inc. (Canada)
www.highgo.ca
From 1d6f434d56c36f95da82d1bae4f99bb917351c08 Mon Sep 17 00:00:00 2001
From: David Zhang 
Date: Mon, 7 Dec 2020 13:42:00 -0800
Subject: [PATCH] add table access method as an option to pgbench

---
 doc/src/sgml/ref/pgbench.sgml| 10 
 src/bin/pgbench/pgbench.c| 25 +++-
 src/bin/pgbench/t/001_pgbench_with_server.pl | 22 +
 3 files changed, 56 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index 7180fedd65..992fbbdb4b 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -359,6 +359,16 @@ pgbench  options 
 d
   
  
 
+ 
+  
--table-access-method=TABLEAM
+  
+   
+Create tables using the specified table access method, rather than the 
default.
+tablespace.
+   
+  
+ 
+ 
  
   
--tablespace=tablespace
   
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 3057665bbe..eb91df6d6b 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -184,10 +184,11 @@ doublethrottle_delay = 0;
 int64  latency_limit = 0;
 
 /*
- * tablespace selection
+ * tablespace and table access method selection
  */
 char  *tablespace = NULL;
 char  *index_tablespace = NULL;
+char  *table_access_method = NULL;
 
 /*
  * Number of "pgbench_accounts" partitions.  0 is the default and means no
@@ -641,6 +642,9 @@ usage(void)
   "  --partition-method=(range|hash)\n"
   "   partition pgbench_accounts with 
this method (default: range)\n"
   "  --partitions=NUM partition pgbench_accounts into 
NUM parts (default: 0)\n"
+  "  --table-access-method=TABLEAM\n"
+  "   create tables with using 
specified table access method,\n"
+  "   rather than the default.\n"
   "  --tablespace=TABLESPACE  create tables in the specified 
tablespace\n"
   "  --unlogged-tablescreate tables as unlogged 
tables\n"
   "\nOptions to select what to run:\n"
@@ -3747,6 +3751,20 @@ initCreateTables(PGconn *con)
 
fprintf(stderr, "creating tables...\n");
 
+   if (table_access_method != NULL)
+   {
+   char   *escaped_table_access_method;
+
+   initPQExpBuffer();
+   escaped_table_access_method = PQescapeIdentifier(con,
+   table_access_method, 
strlen(table_access_method));
+   appendPQEx

Re: Add table access method as an option to pgbench

2020-12-01 Thread David Zhang

Thanks a lot for the comments, Fabien.

Some feedback about v3:

In the doc I find TABLEACCESSMETHOD quite hard to read. Use TABLEAM 
instead? 

Yes, in this case, *TABLEAM* is easy to read, updated.

Also, the next entry uses lowercase (tablespace), why change the style?
The original style is not very consistent. For example, in the doc 
--partition-method uses *NAME* but --tablespace uses *tablespace*; in the 
help output --partition-method uses *(range|hash)* but --tablespace uses 
*TABLESPACE*. To keep it more consistent, I would rather use *TABLEAM* 
in both the doc and the help output.
Remove space after comma in help string. I'd use the more readable 
TABLEAM in the help string rather than the mouthful.

Yes the *space* is removed after comma.


It looks like the option is *silently* ignored when creating a 
partitioned table, which currently results in an error, and not 
passed to partitions, which would accept them. This is pretty weird.
The input check is added with an error message when both partitions and 
table access method are used.


I'd suggest that the table am option should rather work by changing the 
default instead, so that it would apply to everything relevant 
implicitly when appropriate.
I think this is a better idea, so in the v4 patch I changed it to something 
like "set default_table_access_method to *TABLEAM*" instead of using the 
*using* clause.
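
In other words, a small sketch of the two forms (heap used as a stand-in):

-- v3 approach: per-statement USING clause
CREATE TABLE t_using (a int) USING heap;

-- v4 approach: set the GUC once, issue a plain CREATE TABLE, then reset it
SET default_table_access_method = 'heap';
CREATE TABLE t_guc (a int);
RESET default_table_access_method;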


About tests :

They should also trigger failures, eg 
"--table-access-method=no-such-table-am", to be added to the @errors 
list.
Not sure how to address this properly, since the table access method can 
potentially be *any* name.


I do not understand why you duplicated all possible option entry.

Testing with just the table access method looks redundant if the feature is 
made to work orthogonally to partitions, which should be the case.
Only one positive test case was added, using *heap* as the table access 
method, to verify the initialization.


By the way, I saw the same style for other variables, such as 
escape_tablespace, should this be fixed as well?


Nope, let it be.


--
David

Software Engineer
Highgo Software Inc. (Canada)
www.highgo.ca
From c25b55580b1b3f3faac63b4d72291c80fb2c9c1f Mon Sep 17 00:00:00 2001
From: David Zhang 
Date: Tue, 1 Dec 2020 20:47:01 -0800
Subject: [PATCH] add table access method as an option to pgbench

---
 doc/src/sgml/ref/pgbench.sgml| 10 +++
 src/bin/pgbench/pgbench.c| 31 +++-
 src/bin/pgbench/t/001_pgbench_with_server.pl | 12 
 3 files changed, 52 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index 7180fedd65..992fbbdb4b 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -359,6 +359,16 @@ pgbench  options 
 d
   
  
 
+ 
+  
--table-access-method=TABLEAM
+  
+   
+Create tables using the specified table access method, rather than the 
default.
+tablespace.
+   
+  
+ 
+ 
  
   
--tablespace=tablespace
   
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 3057665bbe..961841b255 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -184,10 +184,11 @@ doublethrottle_delay = 0;
 int64  latency_limit = 0;
 
 /*
- * tablespace selection
+ * tablespace and table access method selection
  */
 char  *tablespace = NULL;
 char  *index_tablespace = NULL;
+char  *table_access_method = NULL;
 
 /*
  * Number of "pgbench_accounts" partitions.  0 is the default and means no
@@ -641,6 +642,9 @@ usage(void)
   "  --partition-method=(range|hash)\n"
   "   partition pgbench_accounts with 
this method (default: range)\n"
   "  --partitions=NUM partition pgbench_accounts into 
NUM parts (default: 0)\n"
+  "  --table-access-method=TABLEAM\n"
+  "   create tables with using 
specified table access method,\n"
+  "   rather than the default.\n"
   "  --tablespace=TABLESPACE  create tables in the specified 
tablespace\n"
   "  --unlogged-tablescreate tables as unlogged 
tables\n"
   "\nOptions to select what to run:\n"
@@ -3747,6 +3751,20 @@ initCreateTables(PGconn *con)
 
fprintf(stderr, "creating tables...\n");
 
+   if (table_access_method != NULL && partition_method == PART_NONE && 
partitions == 0)
+   {
+   char   *escaped_table_access_method;
+
+   initPQExpBuffer();
+   escaped_table_access_method = PQescapeIdentifier(con,
+   table_access_method, 
strlen(table_access_method));
+   append

Re: Add table access method as an option to pgbench

2020-11-26 Thread David Zhang

Thanks Fabien for the comments.

On 2020-11-25 11:29 p.m., Fabien COELHO wrote:


Hello David,

The previous patch was based on branch "REL_13_STABLE". Now, the 
attached new patch v2 is based on master branch. I followed the new 
code structure using appendPQExpBuffer to append the new clause 
"using TABLEAM" in a proper position and tested. In the meantime, I 
also updated the pgbench.sgml file to reflect the changes.


My 0.02€: I'm fine with the added feature.

The patch lacks minimal coverage test. Consider adding something to 
pgbench tap tests, including failures (ie non existing method).
I added 3 test cases to the tap tests, but I am not sure if they follow the 
rules. One question about the failures test: my thought is that it should be 
in the no_server test cases, but it is hard to verify the input, as the table 
access method can be any name, such as zheap, zedstore or zheap2. Any 
suggestions would be great.


The option in the help string is not at the right ab place.

Fixed in v3 patch.


I would merge the tableam declaration to the previous one with a 
extended comments, eg "tablespace and access method selection".

Updated in v3 patch.


escape_tableam -> escaped_tableam ?


Updated in v3 patch.

By the way, I saw the same style for other variables, such as 
escape_tablespace, should this be fixed as well?


--
David

Software Engineer
Highgo Software Inc. (Canada)
www.highgo.ca





Re: Add table access method as an option to pgbench

2020-11-26 Thread David Zhang

Thanks a lot again for the review and comments.

On 2020-11-25 9:36 p.m., Michael Paquier wrote:

On Wed, Nov 25, 2020 at 12:13:55PM -0800, David Zhang wrote:

The previous patch was based on branch "REL_13_STABLE". Now, the attached
new patch v2 is based on master branch. I followed the new code structure
using appendPQExpBuffer to append the new clause "using TABLEAM" in a proper
position and tested. In the meantime, I also updated the pgbench.sgml file
to reflect the changes.

+ 
+  
--table-am=TABLEAM
+  
+   
+Create all tables with specified table access method
+TABLEAM.
+If unspecified, default is heap.
+   
+  
+ 
This needs to be in alphabetical order.

The order issue is fixed in v3 patch.

And what you say here is
wrong.  The default would not be heap if default_table_access_method
is set to something else.
Right, if the user changes the default setting in the GUC, then the default 
is not `heap` any more.

I would suggest to use table_access_method
instead of TABLEAM,
For all other options that require values, the words are capitalized, such as 
TABLESPACE. Therefore, I used TABLEACCESSMETHOD instead of 
table_access_method in the v3 patch.

and keep the paragraph minimal, say:
"Create tables using the specified table access method, rather than
the default."

Updated in v3 patch.


--table-am is really the best option name?  --table-access-method
sounds a bit more consistent to me.

Updated in v3 patch.


+   if (tableam != NULL)
+   {
+   char   *escape_tableam;
+
+   escape_tableam = PQescapeIdentifier(con, tableam, strlen(tableam));
+   appendPQExpBuffer(, " using %s", escape_tableam);
+   PQfreemem(escape_tableam);
+   }

Thanks for pointing this out. I believe the order is fixed in the v3 patch.

The order is wrong here, generating an unsupported grammar, see by
yourself with this command:
pgbench --partition-method=hash --table-am=heap -i  --partitions 2


Tested in v3 patch, with a command like,

pgbench --partition-method=hash --table-access-method=heap -i --partitions 2



This makes the patch trickier than it looks as the USING clause is
located between PARTITION BY and WITH.  Also, partitioned tables
cannot use the USING clause so you need to be careful that
createPartitions() also uses the correct table AM.

This stuff needs tests.
3 test cases have been added to the tap test, but honestly I am not sure if 
I am following the rules. Any comments would be great.

--
Michael

--
David

Software Engineer
Highgo Software Inc. (Canada)
www.highgo.ca
>From c64173842a163abbdd6210af5f9643b3301d8de0 Mon Sep 17 00:00:00 2001
From: David Zhang 
Date: Thu, 26 Nov 2020 18:08:49 -0800
Subject: [PATCH] add table access method as an option to pgbench

---
 doc/src/sgml/ref/pgbench.sgml|  9 ++
 src/bin/pgbench/pgbench.c| 23 ++-
 src/bin/pgbench/t/001_pgbench_with_server.pl | 30 +++-
 3 files changed, 60 insertions(+), 2 deletions(-)

diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index 7180fedd65..3af5230f85 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -359,6 +359,15 @@ pgbench  options 
 d
   
  
 
+ 
+  
--table-access-method=TABLEACCESSMETHOD
+  
+   
+Create tables using the specified table access method, rather than the 
default. 
+   
+  
+ 
+ 
  
   
--tablespace=tablespace
   
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 3057665bbe..c3939b6bb4 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -184,10 +184,11 @@ doublethrottle_delay = 0;
 int64  latency_limit = 0;
 
 /*
- * tablespace selection
+ * tablespace and table access method selection
  */
 char  *tablespace = NULL;
 char  *index_tablespace = NULL;
+char  *table_access_method = NULL;
 
 /*
  * Number of "pgbench_accounts" partitions.  0 is the default and means no
@@ -641,6 +642,9 @@ usage(void)
   "  --partition-method=(range|hash)\n"
   "   partition pgbench_accounts with 
this method (default: range)\n"
   "  --partitions=NUM partition pgbench_accounts into 
NUM parts (default: 0)\n"
+  "  --table-access-method=TABLEACCESSMETHOD\n"
+  "   create tables with using 
specified table access method, \n"
+  "   rather than the default.\n"
   "  --tablespace=TABLESPACE  create tables in the specified 
tablespace\n"
   "  --unlogged-tablescreate tables as unlogged 
tables\n"
   "\nOptions t

Re: Add table access method as an option to pgbench

2020-11-25 Thread David Zhang

Thanks a lot for your comments, Michael.

On 2020-11-24 7:41 p.m., Michael Paquier wrote:

On Tue, Nov 24, 2020 at 03:32:38PM -0800, David Zhang wrote:

But, providing another option for the end user may not be a bad idea, and it
might make the tests easier at some points.

My first thought is that we have no need to complicate pgbench with
this option because there is a GUC able to do that, but we do that for
tablespaces, so...  No objections from here.


The attached file is a quick patch for this.

Thoughts?

This patch does not apply on HEAD, where you can just use
appendPQExpBuffer() to append the new clause to the CREATE TABLE
query.  This needs proper documentation.
The previous patch was based on branch "REL_13_STABLE". Now, the 
attached new patch v2 is based on master branch. I followed the new code 
structure using appendPQExpBuffer to append the new clause "using 
TABLEAM" in a proper position and tested. In the meantime, I also 
updated the pgbench.sgml file to reflect the changes.

--
Michael

--
David

Software Engineer
Highgo Software Inc. (Canada)
www.highgo.ca
>From 1e908ecdd6add81c96dca489f45f5ae9d0c2a5f1 Mon Sep 17 00:00:00 2001
From: David Zhang 
Date: Wed, 25 Nov 2020 10:41:20 -0800
Subject: [PATCH] add table access method option to pgbench

---
 doc/src/sgml/ref/pgbench.sgml | 11 +++
 src/bin/pgbench/pgbench.c | 20 
 2 files changed, 31 insertions(+)

diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index 7180fedd65..a8f8d24fe1 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -378,6 +378,17 @@ pgbench  options 
 d
   
  
 
+ 
+  
--table-am=TABLEAM
+  
+   
+Create all tables with specified table access method 
+TABLEAM.
+If unspecified, default is heap.
+   
+  
+ 
+
 

 
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 3057665bbe..62b1dd3e23 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -189,6 +189,11 @@ int64  latency_limit = 0;
 char  *tablespace = NULL;
 char  *index_tablespace = NULL;
 
+/*
+ * table access method selection
+ */
+char  *tableam = NULL;
+
 /*
  * Number of "pgbench_accounts" partitions.  0 is the default and means no
  * partitioning.
@@ -643,6 +648,7 @@ usage(void)
   "  --partitions=NUM partition pgbench_accounts into 
NUM parts (default: 0)\n"
   "  --tablespace=TABLESPACE  create tables in the specified 
tablespace\n"
   "  --unlogged-tablescreate tables as unlogged 
tables\n"
+  "  --table-am=TABLEAM   create tables with specified 
table access method (default: heap)\n"
   "\nOptions to select what to run:\n"
   "  -b, --builtin=NAME[@W]   add builtin script NAME weighted 
at W (default: 1)\n"
   "   (use \"-b list\" to list 
available scripts)\n"
@@ -3759,6 +3765,15 @@ initCreateTables(PGconn *con)
  ddl->table,
  (scale >= 
SCALE_32BIT_THRESHOLD) ? ddl->bigcols : ddl->smcols);
 
+   if (tableam != NULL)
+   {
+   char   *escape_tableam;
+
+   escape_tableam = PQescapeIdentifier(con, tableam, 
strlen(tableam));
+   appendPQExpBuffer(, " using %s", escape_tableam);
+   PQfreemem(escape_tableam);
+   }
+
/* Partition pgbench_accounts table */
if (partition_method != PART_NONE && strcmp(ddl->table, 
"pgbench_accounts") == 0)
appendPQExpBuffer(,
@@ -5419,6 +5434,7 @@ main(int argc, char **argv)
{"show-script", required_argument, NULL, 10},
{"partitions", required_argument, NULL, 11},
{"partition-method", required_argument, NULL, 12},
+   {"table-am", required_argument, NULL, 13},
{NULL, 0, NULL, 0}
};
 
@@ -5792,6 +5808,10 @@ main(int argc, char **argv)
exit(1);
}
break;
+   case 13:/* table access method*/
+   initialization_option_set = true;
+   tableam = pg_strdup(optarg);
+   break;
default:
fprintf(stderr, _("Try \"%s --help\" for more 
information.\n"), progname);
exit(1);
-- 
2.17.1



Add table access method as an option to pgbench

2020-11-24 Thread David Zhang

Hi Hackers,

I noticed that there is a table access method API added starting from 
PG12. In other words, PostgreSQL opens the door for developers to add 
their own access methods, for example zheap, zedstore, etc. However, the 
current pgbench doesn't have an option to allow the user to specify which 
table access method to use during the initialization phase. If we could 
have an option to help users address this, it would be great. For 
example, if a user wants to run a performance benchmark to compare "zheap" 
with "heap", then the commands could be something like below:


pgbench -d -i postgres --table-am=heap

pgbench -d -i postgres --table-am=zheap

I know there is a parameter like below available in postgresql.conf:

#default_table_access_method = 'heap'

But, providing another option for the end user may not be a bad idea, 
and it might make the tests easier at some points.


The attached file is a quick patch for this.

Thoughts?


Thank you,

--
David

Software Engineer
Highgo Software Inc. (Canada)
www.highgo.ca
From 798f970e22034d33146265ed98922c605d0dc237 Mon Sep 17 00:00:00 2001
From: David Zhang 
Date: Tue, 24 Nov 2020 15:14:42 -0800
Subject: [PATCH] add table access method option to pgbench

---
 src/bin/pgbench/pgbench.c | 28 ++--
 1 file changed, 26 insertions(+), 2 deletions(-)

diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 08a5947a9e..24bc6bdbe3 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -188,6 +188,11 @@ int64  latency_limit = 0;
 char  *tablespace = NULL;
 char  *index_tablespace = NULL;
 
+/*
+ * table access method selection
+ */
+char  *tableam = NULL;
+
 /*
  * Number of "pgbench_accounts" partitions.  0 is the default and means no
  * partitioning.
@@ -643,6 +648,7 @@ usage(void)
   "  --partitions=NUM partition pgbench_accounts into 
NUM parts (default: 0)\n"
   "  --tablespace=TABLESPACE  create tables in the specified 
tablespace\n"
   "  --unlogged-tablescreate tables as unlogged 
tables\n"
+  "  --table-am=TABLEAM   create tables using specified 
access method\n"
   "\nOptions to select what to run:\n"
   "  -b, --builtin=NAME[@W]   add builtin script NAME weighted 
at W (default: 1)\n"
   "   (use \"-b list\" to list 
available scripts)\n"
@@ -3750,12 +3756,14 @@ initCreateTables(PGconn *con)
for (i = 0; i < lengthof(DDLs); i++)
{
charopts[256];
+   charopam[256];
charbuffer[256];
const struct ddlinfo *ddl = [i];
const char *cols;
 
/* Construct new create table statement. */
opts[0] = '\0';
+   opam[0] = '\0';
 
/* Partition pgbench_accounts table */
if (partition_method != PART_NONE && strcmp(ddl->table, 
"pgbench_accounts") == 0)
@@ -3776,11 +3784,22 @@ initCreateTables(PGconn *con)
PQfreemem(escape_tablespace);
}
 
+   if (tableam != NULL)
+   {
+   char   *escape_tableam;
+
+   escape_tableam = PQescapeIdentifier(con, tableam,
+   
   strlen(tableam));
+   snprintf(opam + strlen(opam), sizeof(opam) - 
strlen(opam),
+" using %s", escape_tableam);
+   PQfreemem(escape_tableam);
+   }
+
cols = (scale >= SCALE_32BIT_THRESHOLD) ? ddl->bigcols : 
ddl->smcols;
 
-   snprintf(buffer, sizeof(buffer), "create%s table %s(%s)%s",
+   snprintf(buffer, sizeof(buffer), "create%s table %s(%s)%s%s",
 unlogged_tables ? " unlogged" : "",
-ddl->table, cols, opts);
+ddl->table, cols, opam, opts);
 
executeStatement(con, buffer);
}
@@ -5422,6 +5441,7 @@ main(int argc, char **argv)
{"show-script", required_argument, NULL, 10},
{"partitions", required_argument, NULL, 11},
{"partition-method", required_argument, NULL, 12},
+   {"table-am", required_argument, NULL, 13},
{NULL, 0, NULL, 0}
};
 
@@ -5795,6 +5815,10 @@ main(int argc, char **argv)
exit(1);
}
bre

Re: BUG #16663: DROP INDEX did not free up disk space: idle connection hold file marked as deleted

2020-11-24 Thread David Zhang
I verified the patch 
"v2-0001-Free-disk-space-for-dropped-relations-on-commit.patch" on master 
branch "0cc9932740f2bf572303b68438e4caf62de9". It works for me. Below is my 
test procedure and results.

=== Before the patch ===
#1 from psql console 1, create table and index then insert enough data
postgres=# CREATE TABLE test_tbl ( a int, b text);
postgres=# CREATE INDEX idx_test_tbl on test_tbl (a);
postgres=# INSERT INTO test_tbl SELECT generate_series(1,8000),'Hello 
world!';
postgres=# INSERT INTO test_tbl SELECT generate_series(1,8000),'Hello 
world!';

#2 check files size 
david:12867$ du -h
12G .

#3 from psql console 2, drop the index
postgres=# drop index idx_test_tbl;

#4 check files size in different ways,
david:12867$ du -h
7.8G.
david:12867$ ls -l
...
-rw--- 1 david david  0 Nov 23 20:07 16402
...

$ lsof -nP | grep '(deleted)' |grep pgdata
...
postgres  25736  david   45u  REG  259,2
  0   12592758 /home/david/sandbox/postgres/pgdata/base/12867/16402 (deleted)
postgres  25736  david   49u  REG  259,2 
1073741824   12592798 /home/david/sandbox/postgres/pgdata/base/12867/16402.1 
(deleted)
postgres  25736  david   53u  REG  259,2 
1073741824   12592739 /home/david/sandbox/postgres/pgdata/base/12867/16402.2 
(deleted)
postgres  25736  david   59u  REG  259,2  
372604928   12592800 /home/david/sandbox/postgres/pgdata/base/12867/16402.3 
(deleted)
...

The index relnode id "16402" displays size "0" from postgres database folder, 
but when using lsof to check, all 16402.x are still in used by a psql 
connection except 16402 is set to 0. Check it again after an hour, lsof shows 
the same results.

=== After the patch ===
Repeating steps 1 ~ 4, lsof shows that all the index relfilenode files (in 
this case, relfilenode 16389) are removed within about 1 minute.
$ lsof -nP | grep '(deleted)' |grep pgdata
...
postgres  32707  david   66u  REG  259,2
  0   12592763 /home/david/sandbox/postgres/pgdata/base/12867/16389.1 (deleted)
postgres  32707  david   70u  REG  259,2
  0   12592823 /home/david/sandbox/postgres/pgdata/base/12867/16389.2 (deleted)
postgres  32707  david   74u  REG  259,2
  0   12592805 /home/david/sandbox/postgres/pgdata/base/12867/16389.3 (deleted)
...

One interesting thing is that if the index is created after the data records 
have been inserted, then lsof doesn't show this issue.

a potential size overflow issue

2020-09-25 Thread David Zhang

Hi hackers,

"InitBufTable" is the function used to initialize the buffer lookup 
table for buffer manager. With the memory size increasing nowadays, 
there is a potential overflow issue for the parameter "int size" used by 
"InitBufTable". This function is invoked in freelist.c as below:

    InitBufTable(NBuffers + NUM_BUFFER_PARTITIONS);

The number of buffer blocks "NBuffers" is also defined as "int", and 
"NUM_BUFFER_PARTITIONS" has a default value of 128. In theory, the sum could 
overflow the "size" parameter in "InitBufTable" (with the default 8 kB block 
size, an int NBuffers would itself only reach INT_MAX at roughly 16 TB of 
shared buffers, but the added partitions shrink that headroom). The 
"size" parameter is later used by "ShmemInitHash" as "init_size" and 
"max_size", which are both defined as "long".


    SharedBufHash = ShmemInitHash("Shared Buffer Lookup Table",
                                  size, size,
                                  ,
                                  HASH_ELEM | HASH_BLOBS | HASH_PARTITION);

Therefore, it would be better to change "InitBufTable(int size)" to 
"InitBufTable(long size)".


A simple patch is attached and it passed the "make installcheck-world" test.

--

David

Software Engineer
Highgo Software Inc. (Canada)
www.highgo.ca
From 7df445121d3cf2aa7c0c22c831a8a920bc818d6e Mon Sep 17 00:00:00 2001
From: David Zhang 
Date: Fri, 25 Sep 2020 15:39:24 -0700
Subject: [PATCH] fix a potential overflow issue for InitBufTable

---
 src/backend/storage/buffer/buf_table.c | 2 +-
 src/include/storage/buf_internals.h| 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/backend/storage/buffer/buf_table.c 
b/src/backend/storage/buffer/buf_table.c
index 4953ae9f82..d4f58c1ce0 100644
--- a/src/backend/storage/buffer/buf_table.c
+++ b/src/backend/storage/buffer/buf_table.c
@@ -49,7 +49,7 @@ BufTableShmemSize(int size)
  * size is the desired hash table size (possibly more than 
NBuffers)
  */
 void
-InitBufTable(int size)
+InitBufTable(long size)
 {
HASHCTL info;
 
diff --git a/src/include/storage/buf_internals.h 
b/src/include/storage/buf_internals.h
index 3377fa5676..25e1db6854 100644
--- a/src/include/storage/buf_internals.h
+++ b/src/include/storage/buf_internals.h
@@ -320,7 +320,7 @@ extern bool have_free_buffer(void);
 
 /* buf_table.c */
 extern Size BufTableShmemSize(int size);
-extern void InitBufTable(int size);
+extern void InitBufTable(long size);
 extern uint32 BufTableHashCode(BufferTag *tagPtr);
 extern int BufTableLookup(BufferTag *tagPtr, uint32 hashcode);
 extern int BufTableInsert(BufferTag *tagPtr, uint32 hashcode, int buf_id);
-- 
2.21.1 (Apple Git-122.3)



Re: history file on replica and double switchover

2020-09-25 Thread David Zhang
The following review has been posted through the commitfest application:
make installcheck-world:  tested, failed
Implements feature:   tested, passed
Spec compliant:   tested, passed
Documentation:tested, passed

"make installcheck-world" test was performed on branch "REL_13_STABLE" with tag 
"REL_13_0". The regression failed was not caused by the "history file" patch, 
since it has the same error on my environment even without any patch. So the 
failure is not related, in other words, the patch "history_replica_v4.patch" is 
good for me. 

Below is the failure message when testing with and without 
"history_replica_v4.patch".
t/004_timeline_switch.pl . ok   
t/005_replay_delay.pl  ok   
t/006_logical_decoding.pl  # Looks like your test exited with 29 
before it could output anything.
t/006_logical_decoding.pl  Dubious, test returned 29 (wstat 7424, 
0x1d00)
Failed 14/14 subtests 
t/007_sync_rep.pl  ok 
t/008_fsm_truncation.pl .. ok   
t/009_twophase.pl  ok 
t/010_logical_decoding_timelines.pl .. # Looks like your test exited with 29 
before it could output anything.
t/010_logical_decoding_timelines.pl .. Dubious, test returned 29 (wstat 7424, 
0x1d00)
Failed 13/13 subtests 
t/011_crash_recovery.pl .. ok   
t/012_subtransactions.pl . ok 
t/013_crash_restart.pl ... ok 
t/014_unlogged_reinit.pl . ok 
t/015_promotion_pages.pl . ok   
t/016_min_consistency.pl . ok   
t/017_shm.pl . ok   
t/018_wal_optimize.pl  ok 
t/019_replslot_limit.pl .. ok 
t/020_archive_status.pl .. ok 

Test Summary Report
---
t/006_logical_decoding.pl  (Wstat: 7424 Tests: 0 Failed: 0)
  Non-zero exit status: 29
  Parse errors: Bad plan.  You planned 14 tests but ran 0.
t/010_logical_decoding_timelines.pl (Wstat: 7424 Tests: 0 Failed: 0)
  Non-zero exit status: 29
  Parse errors: Bad plan.  You planned 13 tests but ran 0.
Files=20, Tests=202, 103 wallclock secs ( 0.18 usr  0.04 sys + 21.20 cusr 23.52 
csys = 44.94 CPU)
Result: FAIL
make[2]: *** [installcheck] Error 1
make[2]: Leaving directory `/home/david/git/postgres/src/test/recovery'
make[1]: *** [installcheck-recovery-recurse] Error 2
make[1]: Leaving directory `/home/david/git/postgres/src/test'
make: *** [installcheck-world-src/test-recurse] Error 2

Re: history file on replica and double switchover

2020-09-24 Thread David Zhang

On 2020-09-24 5:00 p.m., Fujii Masao wrote:



On 2020/09/25 8:15, David Zhang wrote:

Hi,

My understanding is that the "archiver" won't even start if 
"archive_mode = on" has been set on a "replica". Therefore, either 
(XLogArchiveMode == ARCHIVE_MODE_ALWAYS) or (XLogArchiveMode != 
ARCHIVE_MODE_OFF) will produce the same result.


Yes, the archiver isn't invoked in the standby when archive_mode=on.
But, with the original patch, walreceiver creates .ready archive 
status file

even when archive_mode=on and no archiver is running. This causes
the history file to be archived after the standby is promoted and
the archiver is invoked.

With my patch, walreceiver creates .ready archive status for the 
history file

only when archive_mode=always, like it does for the streamed WAL files.
This is the difference between those two patches. To prevent the streamed
timeline history file from being archived, IMO we should use the 
condition

archive_mode==always in the walreceiver.

+1





Please see how the "archiver" is started in 
src/backend/postmaster/postmaster.c


5227 /*
5228  * Start the archiver if we're responsible for 
(re-)archiving received

5229  * files.
5230  */
5231 Assert(PgArchPID == 0);
5232 if (XLogArchivingAlways())
5233 PgArchPID = pgarch_start();

I ran the nice script "double_switchover.sh" using either 
"always" or "on" with patch v1 and v2. They all generate the same 
results below. In other words, whether the history file 
(0003.history) is archived or not depends on the "archive_mode" 
setting.


echo "archive_mode = always" >> ${PGDATA_NODE2}/postgresql.auto.conf

$ ls -l archive
-rw--- 1 david david 16777216 Sep 24 14:40 00010002
... ...
-rw--- 1 david david 16777216 Sep 24 14:40 0002000A
-rw--- 1 david david   41 Sep 24 14:40 0002.history
-rw--- 1 david david   83 Sep 24 14:40 0003.history

echo "archive_mode = on" >> ${PGDATA_NODE2}/postgresql.auto.conf

$ ls -l archive
-rw--- 1 david david 16777216 Sep 24 14:47 00010002
... ...
-rw--- 1 david david 16777216 Sep 24 14:47 0002000A
-rw--- 1 david david   41 Sep 24 14:47 0002.history


Personally, I prefer patch v2 since it aligns with the statement here, 
https://www.postgresql.org/docs/13/warm-standby.html#CONTINUOUS-ARCHIVING-IN-STANDBY


"If archive_mode is set to on, the archiver is not enabled during 
recovery or standby mode. If the standby server is promoted, it will 
start archiving after the promotion, but will not archive any WAL it 
did not generate itself."


By the way, I think the last part of the sentence should be changed 
to something like below:


"but will not archive any WAL, which was not generated by itself."


I'm not sure if this change is an improvement or not. But if we apply
the patch I proposed, maybe we should mention also history file here.
For example, "but will not archive any WAL or timeline history files
that it did not generate itself".

Makes sense to me.


Regards,


--
David

Software Engineer
Highgo Software Inc. (Canada)
www.highgo.ca





Re: history file on replica and double switchover

2020-09-24 Thread David Zhang

Hi,

My understanding is that the "archiver" won't even start if 
"archive_mode = on" has been set on a "replica". Therefore, either 
(XLogArchiveMode == ARCHIVE_MODE_ALWAYS) or (XLogArchiveMode != 
ARCHIVE_MODE_OFF) will produce the same result.


Please see how the "archiver" is started in 
src/backend/postmaster/postmaster.c


5227 /*
5228  * Start the archiver if we're responsible for 
(re-)archiving received

5229  * files.
5230  */
5231 Assert(PgArchPID == 0);
5232 if (XLogArchivingAlways())
5233 PgArchPID = pgarch_start();

I ran the nice script "double_switchover.sh" using either "always" 
or "on" with patch v1 and v2. They all generate the same results below. In 
other words, whether the history file (0003.history) is archived or not 
depends on the "archive_mode" setting.


echo "archive_mode = always" >> ${PGDATA_NODE2}/postgresql.auto.conf

$ ls -l archive
-rw--- 1 david david 16777216 Sep 24 14:40 00010002
... ...
-rw--- 1 david david 16777216 Sep 24 14:40 0002000A
-rw--- 1 david david   41 Sep 24 14:40 0002.history
-rw--- 1 david david   83 Sep 24 14:40 0003.history

echo "archive_mode = on" >> ${PGDATA_NODE2}/postgresql.auto.conf

$ ls -l archive
-rw--- 1 david david 16777216 Sep 24 14:47 00010002
... ...
-rw--- 1 david david 16777216 Sep 24 14:47 0002000A
-rw--- 1 david david   41 Sep 24 14:47 0002.history


Personally, I prefer patch v2 since it aligns with the statement here, 
https://www.postgresql.org/docs/13/warm-standby.html#CONTINUOUS-ARCHIVING-IN-STANDBY


"If archive_mode is set to on, the archiver is not enabled during 
recovery or standby mode. If the standby server is promoted, it will 
start archiving after the promotion, but will not archive any WAL it did 
not generate itself."


By the way, I think the last part of the sentence should be changed to 
something like below:


"but will not archive any WAL, which was not generated by itself."


Best regards,

David

On 2020-09-17 10:18 a.m., Fujii Masao wrote:



On 2020/09/04 13:53, Fujii Masao wrote:



On 2020/09/04 8:29, Anastasia Lubennikova wrote:

On 27.08.2020 16:02, Grigory Smolkin wrote:

Hello!

I`ve noticed, that when running switchover replica to master and 
back to replica, new history file is streamed to replica, but not 
archived,
which is not great, because it breaks PITR if archiving is running 
on replica. The fix looks trivial.

Bash script to reproduce the problem and patch are attached.


Thanks for the report. I agree that it looks like a bug.


+1

+    /* Mark history file as ready for archiving */
+    if (XLogArchiveMode != ARCHIVE_MODE_OFF)
+    XLogArchiveNotify(fname);

I agree that history file should be archived in the standby when
archive_mode=always. But why do we need to do when archive_mode=on?
I'm just concerned about the case where the primary and standby
have the shared archive area, and archive_mode is on.


So I updated the patch so that walreceiver marks the streamed history 
file

as ready for archiving only when archive_mode=always. Patch attached.
Thought?

Regards,


--
David

Software Engineer
Highgo Software Inc. (Canada)
www.highgo.ca





Re: PATCH: Attempt to make dbsize a bit more consistent

2020-09-09 Thread David Zhang



On 2020-09-09 12:41 a.m., gkokola...@pm.me wrote:

‐‐‐ Original Message ‐‐‐
On Tuesday, 8 September 2020 22:26, David Zhang  wrote:



I found the function "table_relation_size" is only used by buffer
manager for "RELKIND_RELATION", "RELKIND_TOASTVALUE" and
"RELKIND_MATVIEW", i.e.

         case RELKIND_RELATION:
         case RELKIND_TOASTVALUE:
         case RELKIND_MATVIEW:
             {
                 /*
                  * Not every table AM uses BLCKSZ wide fixed size blocks.
                  * Therefore tableam returns the size in bytes - but
for the
                  * purpose of this routine, we want the number of blocks.
                  * Therefore divide, rounding up.
                  */
                 uint64        szbytes;

                 szbytes = table_relation_size(relation, forkNum);

                 return (szbytes + (BLCKSZ - 1)) / BLCKSZ;
             }

So using "calculate_relation_size" and "calculate_toast_table_size" in
"calculate_table_size" is easy to understand and the original logic is
simple.


You are correct. This is the logic that is attempted to be applied
in dbsize.c in this patch.

So what do you think of the patch?


I would suggest keeping the original logic unless there is a bug.
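
As a side note for anyone mapping these internals to the SQL level, the 
existing size functions already expose the distinction being discussed (any 
existing table name works here; test_tbl is only a placeholder):

SELECT pg_relation_size('test_tbl')       AS main_fork_only,
       pg_table_size('test_tbl')          AS with_fsm_vm_and_toast,
       pg_total_relation_size('test_tbl') AS with_indexes_too;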

--
David

Software Engineer
Highgo Software Inc. (Canada)
www.highgo.ca





Re: PATCH: Attempt to make dbsize a bit more consistent

2020-09-08 Thread David Zhang



I found the function "table_relation_size" is only used by buffer 
manager for "RELKIND_RELATION", "RELKIND_TOASTVALUE" and 
"RELKIND_MATVIEW", i.e.


        case RELKIND_RELATION:
        case RELKIND_TOASTVALUE:
        case RELKIND_MATVIEW:
            {
                /*
                 * Not every table AM uses BLCKSZ wide fixed size blocks.
                 * Therefore tableam returns the size in bytes - but 
for the

                 * purpose of this routine, we want the number of blocks.
                 * Therefore divide, rounding up.
                 */
                uint64        szbytes;

                szbytes = table_relation_size(relation, forkNum);

                return (szbytes + (BLCKSZ - 1)) / BLCKSZ;
            }

So using "calculate_relation_size" and "calculate_toast_table_size" in 
"calculate_table_size" is easy to understand and the original logic is 
simple.



On 2020-08-27 6:38 a.m., gkokola...@pm.me wrote:

Hi all,

this minor patch is attempting to force the use of the tableam api in dbsize 
where ever it is required.

Apparently something similar was introduced for toast relations only. 
Intuitively it seems that the distinction between a table and a toast table is 
not needed. This patch treats tables, toast tables and materialized views 
equally.

Regards,
//Georgios

Best regards,
--
David

Software Engineer
Highgo Software Inc. (Canada)
www.highgo.ca





Re: Add LWLock blocker(s) information

2020-08-07 Thread David Zhang
Hi, 
This is a very interesting topic. I applied the 2nd patch to the master branch 
and performed a quick test. I can observe the information below:
postgres=# select * from pg_lwlock_blocking_pid(26925);
 requested_mode | last_holder_pid | last_holder_mode | nb_holders 
+-+--+
 LW_EXCLUSIVE   |   26844 | LW_EXCLUSIVE |  1
(1 row)

postgres=# select 
query,pid,state,wait_event,wait_event_type,pg_lwlock_blocking_pid(pid),pg_blocking_pids(pid)
 from pg_stat_activity where state='active' and pid != pg_backend_pid();
query |  pid  | state  
| wait_event | wait_event_type |   pg_lwlock_blocking_pid| 
pg_blocking_pids 
--+---+++-+-+--
 INSERT INTO orders SELECT FROM generate_series(1, 1000); | 26925 | active 
| WALWrite   | LWLock  | (LW_EXCLUSIVE,26844,LW_EXCLUSIVE,1) | {}
(1 row)

At some points, I had to keep repeating the query in order to capture the 
"lock info". I think this is probably part of the design, but I was wondering: 
if a query is stuck in a deadlock and a developer is expected to take a look 
using the methods above, will the process be killed before the developer gets 
the chance to execute one of the queries?
If some statistics information could be added, it may help developers get an 
overall idea about the lock status, and if developers could specify some 
filters, such as the number of times a query entered a deadlock, or the 
queries holding a lock for more than some number of ms, it might help to 
troubleshoot the "lock" issue even better. Moreover, it would be great if this 
feature could be an independent extension, similar to "pg_buffercache".
Best regards,

David

Add an option to allow run regression test for individual module on Windows build

2020-06-16 Thread David Zhang

Hi Hackers,

When I was working on an extension on the Windows platform, I used the 
command 'vcregress contribcheck' to run the regression test for my 
module. However, this command runs the regression tests for all the 
modules, and I couldn't find a way to run the regression test for my module 
only. I think it would be better to have such an option to make development 
more efficient (maybe there is a solution already, but I can't find it).


The attached patch allows the user to run the regression tests either with 
'vcregress contribcheck' as before, or with 'vcregress contribcheck 
postgres_fdw' to test a single extension directly, in this example 
'postgres_fdw'.


--
David

Software Engineer
Highgo Software Inc. (Canada)
www.highgo.ca
diff --git a/src/tools/msvc/vcregress.pl b/src/tools/msvc/vcregress.pl
index dac60f6264..9db740a847 100644
--- a/src/tools/msvc/vcregress.pl
+++ b/src/tools/msvc/vcregress.pl
@@ -448,23 +448,40 @@ sub subdircheck
return;
 }
 
+sub contribmodulecheck
+{
+   my $module = shift;
+   my $mstat = shift;
+
+   # these configuration-based exclusions must match Install.pm
+   next if ($module eq "uuid-ossp"  && !defined($config->{uuid}));
+   next if ($module eq "sslinfo"&& !defined($config->{openssl}));
+   next if ($module eq "xml2"   && !defined($config->{xml}));
+   next if ($module =~ /_plperl$/   && !defined($config->{perl}));
+   next if ($module =~ /_plpython$/ && !defined($config->{python}));
+   next if ($module eq "sepgsql");
+
+   subdircheck($module);
+   my $status = $? >> 8;
+   $$mstat ||= $status;
+}
+
 sub contribcheck
 {
chdir "../../../contrib";
my $mstat = 0;
-   foreach my $module (glob("*"))
-   {
-   # these configuration-based exclusions must match Install.pm
-   next if ($module eq "uuid-ossp"  && !defined($config->{uuid}));
-   next if ($module eq "sslinfo"&& 
!defined($config->{openssl}));
-   next if ($module eq "xml2"   && !defined($config->{xml}));
-   next if ($module =~ /_plperl$/   && !defined($config->{perl}));
-   next if ($module =~ /_plpython$/ && 
!defined($config->{python}));
-   next if ($module eq "sepgsql");
 
-   subdircheck($module);
-   my $status = $? >> 8;
-   $mstat ||= $status;
+   my $module = shift;
+   if ($module ne "" )
+   {
+   contribmodulecheck($module, \$mstat);
+   }
+   else
+   {
+   foreach my $module (glob("*"))
+   {
+   contribmodulecheck($module, \$mstat);
+   }
}
exit $mstat if $mstat;
return;
@@ -760,6 +777,8 @@ sub usage
  "  serial serial mode\n",
  "  parallel   parallel mode\n",
  "\nOption for : for taptest\n",
- "  TEST_DIR   (required) directory where tests reside\n";
+ "  TEST_DIR   (required) directory where tests reside\n",
+ "\nOption for : for contribcheck\n",
+ "  module   (optional) module name to be tested only\n";
exit(1);
 }


Re: Postgres Windows build system doesn't work with python installed in Program Files

2020-06-01 Thread David Zhang

Hi Michael,

I performed a quick test of the patch "msvc-build-init-v2.patch" using the 
cases below:


1. perl build.pl
2. perl build.pl debug psql
3. perl build.pl RELEASE psql
4. perl build.pl deBUG psql
5. perl build.pl psql
The above cases (case-insensitive) all work great without any warning on the 
latest master branch.


When building with more than 3 parameters, yes, I got the expected message 
below as well.


c:\Users\david\Downloads\postgres\src\tools\msvc>perl build.pl DEbug 
psql pg_baseback

Usage: build.pl [ [  ]  ]
Options are case-insensitive.
configuration: Release | Debug. This sets the configuration
to build. Default is Release.
component: name of component to build. An empty value means
to build all components.


However, if I ask for help in the typical Windows way, i.e. "perl 
build.pl -help" or "perl build.pl /?", I get messages like the ones below:


c:\Users\david\Downloads\postgres\src\tools\msvc>perl build.pl -help
Detected hardware platform: Win32
Files src/bin/pgbench/exprscan.l
Files src/bin/pgbench/exprparse.y
Files src/bin/psql/psqlscanslash.l
Files contrib/cube/cubescan.l
Files contrib/cube/cubeparse.y
Files contrib/seg/segscan.l
Files contrib/seg/segparse.y
Generating configuration headers...
Microsoft (R) Build Engine version 16.5.1+4616136f8 for .NET Framework
Copyright (C) Microsoft Corporation. All rights reserved.

MSBUILD : error MSB1001: Unknown switch.
Switch: -help.vcxproj

For switch syntax, type "MSBuild -help"

c:\Users\david\Downloads\postgres\src\tools\msvc>perl build.pl /?
Detected hardware platform: Win32
Files src/bin/pgbench/exprscan.l
Files src/bin/pgbench/exprparse.y
Files src/bin/psql/psqlscanslash.l
Files contrib/cube/cubescan.l
Files contrib/cube/cubeparse.y
Files contrib/seg/segscan.l
Files contrib/seg/segparse.y
Generating configuration headers...
Microsoft (R) Build Engine version 16.5.1+4616136f8 for .NET Framework
Copyright (C) Microsoft Corporation. All rights reserved.

MSBUILD : error MSB1001: Unknown switch.
Switch: /?.vcxproj

For switch syntax, type "MSBuild -help"

It would be a bonus if build.pl could support "help" in the Windows way.


Thanks,

David

On 2020-05-06 10:45 p.m., Michael Paquier wrote:

On Wed, May 06, 2020 at 12:17:03AM +0200, Juan José Santamaría Flecha wrote:

Please forgive me if I am being too nitpicky, but I find the comments a
little too verbose, a usage format might be more visual and easier to
explain:

Usage: build [[CONFIGURATION] COMPONENT]

The options are  case-insensitive.
CONFIGURATION sets the configuration to build, "debug" or "release" (by
default).
COMPONENT defines a component to build. An empty option means all
components.

Your comment makes sense to me.  What about the attached then?  On top
of documenting the script usage in the code, let's trigger it if it
gets called with more than 3 arguments.  What do you think?

FWIW, I forgot to mention that I don't think those warnings are worth
a backpatch.  No objections with improving things on HEAD of course.
--
Michael

--
David

Software Engineer
Highgo Software Inc. (Canada)
www.highgo.ca





Can the OUT parameter be enabled in stored procedure?

2020-04-29 Thread David Zhang

Hi hackers,

I found two email threads below,

https://www.postgresql.org/message-id/b0d099ca-f9c3-00ed-0c95-4d7a9f7c97fc%402ndquadrant.com

https://www.postgresql.org/message-id/CA%2B4BxBwBHmDkSpgvnfG_Ps1SEeYhDRuLpr1AvdbUwFh-otTg8A%40mail.gmail.com

and I understood "OUT parameters in procedures are not implemented yet, 
but would like to have
in the future" at that moment. However, I ran a quick test by simply 
commented out a few lines below in src/backend/commands/functioncmds.c


+// if (objtype == OBJECT_PROCEDURE)
+// {
+// if (fp->mode == FUNC_PARAM_OUT)
+// ereport(ERROR,
+// (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+// errmsg("procedures cannot have OUT arguments"),
+//  errhint("INOUT 
arguments are permitted.")));

+// }

then I can use OUT as a parameter to create a PROCEDURE, and I can do 
something like:


postgres=# create procedure p2(IN a int, OUT b int) as 'select 9*$1' 
language sql;

CREATE PROCEDURE
postgres=# CALL p2(1);
 b
---
 9
(1 row)

By enabling the OUT parameter, I can see some differences, for example:
1. the user doesn't have to provide one (or many) dummy "INOUT" parameters in 
order to get the output (see the sketch below)
2. it has similar behavior compared with FUNCTION when using IN, INOUT, 
and OUT parameters
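
For comparison, a minimal sketch of the INOUT workaround that already works 
today, without the change above; the input value for b is just a dummy:

CREATE PROCEDURE p3(IN a int, INOUT b int) AS 'select 9*$1' LANGUAGE sql;
CALL p3(1, NULL);   -- returns b = 9; the NULL only satisfies the INOUT parameter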


So, the questions are:

1. What are the limitations or concerns that block support of the OUT 
parameter at this moment?


2. If the OUT parameter is enabled like above, what will be the impact?

3. Are there any other components that should be considered following the 
above change?



Thanks,
--
David

Software Engineer
Highgo Software Inc. (Canada)
www.highgo.ca





Re: WIP/PoC for parallel backup

2020-04-27 Thread David Zhang

Hi,

Here are the parallel backup performance test results with and without 
the patch "parallel_backup_v15" in an AWS cloud environment. Two 
"t2.xlarge" machines were used: one for the Postgres server and the other 
for pg_basebackup, both with the same machine configuration shown below.


Machine configuration:
    Instance Type    :t2.xlarge
    Volume type  :io1
    Memory (MiB) :16GB
    vCPU #   :4
    Architecture :x86_64
    IOP  :6000
    Database Size (GB)   :108

Performance test results:
without patch:
    real 18m49.346s
    user 1m24.178s
    sys 7m2.966s

1 worker with patch:
    real 18m43.201s
    user 1m55.787s
    sys 7m24.724s

2 worker with patch:
    real 18m47.373s
    user 2m22.970s
    sys 11m23.891s

4 worker with patch:
    real 18m46.878s
    user 2m26.791s
    sys 13m14.716s

As required, I didn't have the pgbench running in parallel like we did 
in the previous benchmark.


The perf report files for both Postgres server and pg_basebackup sides 
are attached.


The files are listed below, i.e. without the patch, and with the patch using 
1, 2, and 4 workers.


perf report on Postgres server side:
    perf.data-postgres-without-parallel_backup_v15.txt
    perf.data-postgres-with-parallel_backup_v15-j1.txt
    perf.data-postgres-with-parallel_backup_v15-j2.txt
    perf.data-postgres-with-parallel_backup_v15-j4.txt

perf report on pg_basebackup side:
    perf.data-pg_basebackup-without-parallel_backup_v15.txt
    perf.data-pg_basebackup-with-parallel_backup_v15-j1.txt
    perf.data-pg_basebackup-with-parallel_backup_v15-j2.txt
    perf.data-pg_basebackup-with-parallel_backup_v15-j4.txt


If any more information is required, please let me know.


On 2020-04-21 7:12 a.m., Amit Kapila wrote:

On Tue, Apr 21, 2020 at 5:26 PM Ahsan Hadi  wrote:

On Tue, Apr 21, 2020 at 4:50 PM Amit Kapila  wrote:

On Tue, Apr 21, 2020 at 5:18 PM Amit Kapila  wrote:

On Tue, Apr 21, 2020 at 1:00 PM Asif Rehman  wrote:

I did some tests a while back, and here are the results. The tests were done to 
simulate
a live database environment using pgbench.

machine configuration used for this test:
Instance Type:t2.xlarge
Volume Type  :io1
Memory (MiB) :16384
vCPU #   :4
Architecture:X86_64
IOP :16000
Database Size (GB) :102

The setup consist of 3 machines.
- one for database instances
- one for pg_basebackup client and
- one for pgbench with some parallel workers, simulating SELECT loads.

basebackup | 4 workers | 8 Workers  | 16 
workers
Backup Duration(Min):   69.25|  20.44  | 19.86  | 20.15
(pgbench running with 50 parallel client simulating SELECT load)

Backup Duration(Min):   154.75   |  49.28 | 45.27 | 20.35
(pgbench running with 100 parallel client simulating SELECT load)


Thanks for sharing the results, these show nice speedup!  However, I
think we should try to find what exactly causes this speed up.  If you
see the recent discussion on another thread related to this topic,
Andres, pointed out that he doesn't think that we can gain much by
having multiple connections[1].  It might be due to some internal
limitations (like small buffers) [2] due to which we are seeing these
speedups.  It might help if you can share the perf reports of the
server-side and pg_basebackup side.


Just to be clear, we need perf reports both with and without patch-set.


These tests were done a while back, I think it would be good to run the 
benchmark again with the latest patches of parallel backup and share the 
results and perf reports.


Sounds good. I think we should also try to run the test with 1 worker
as well.  The reason it will be good to see the results with 1 worker
is that we can know if the technique to send file by file as is done
in this patch is better or worse than the current HEAD code.  So, it
will be good to see the results of an unpatched code, 1 worker, 2
workers, 4 workers, etc.


--
David

Software Engineer
Highgo Software Inc. (Canada)
www.highgo.ca


Re: ERROR: invalid input syntax for type circle

2020-04-06 Thread David Zhang

Hi Tom,

Thanks for the review.

I generated a new patch v2 (attached) following your suggestion and 
performed the same test again. The test results look good, including 
"make check".


On 2020-04-06 3:16 p.m., Tom Lane wrote:

David Zhang  writes:

I got an error when I was trying to insert a circle using the syntax
(the 3rd one) specified in the latest document.

Hm.  Presumably, that has never worked, and we've had no complaints
to date.  I'm halfway inclined to treat it as a documentation bug
and remove the claim that it works.


The patch based on tag "REL_12_2" is attached.

This patch looks extremely dangerous to me, because it'll allow "s"
to get incremented past the ending nul character ... and then the
code will proceed to keep scanning, which at best is useless and
at worst will end in a core dump.

What actually looks wrong to me in this code is the initial bit

 if ((*s == LDELIM_C) || (*s == LDELIM))
 {
 depth++;
 cp = (s + 1);
 while (isspace((unsigned char) *cp))
 cp++;
 if (*cp == LDELIM)
 s = cp;
 }

If the first test triggers but it doesn't then find a following
paren, then it's incremented depth without moving s, which seems
certain to end badly.  Perhaps the correct fix is like

 if (*s == LDELIM_C)
 depth++, s++;
 else if (*s == LDELIM)
 {
 /* If there are two left parens, consume the first one */
 cp = (s + 1);
 while (isspace((unsigned char) *cp))
 cp++;
 if (*cp == LDELIM)
 depth++, s = cp;
 }

regards, tom lane

--
David

Software Engineer
Highgo Software Inc. (Canada)
www.highgo.ca
diff --git a/src/backend/utils/adt/geo_ops.c b/src/backend/utils/adt/geo_ops.c
index 373784f..c69844e 100644
--- a/src/backend/utils/adt/geo_ops.c
+++ b/src/backend/utils/adt/geo_ops.c
@@ -4503,14 +4503,16 @@ circle_in(PG_FUNCTION_ARGS)
s = str;
while (isspace((unsigned char) *s))
s++;
-   if ((*s == LDELIM_C) || (*s == LDELIM))
+   if (*s == LDELIM_C)
+   depth++, s++;
+   else if (*s == LDELIM)
{
-   depth++;
+   /* If there are two left parens, consume the first one */
cp = (s + 1);
while (isspace((unsigned char) *cp))
cp++;
if (*cp == LDELIM)
-   s = cp;
+   depth++, s = cp;
}
 
	pair_decode(s, &circle->center.x, &circle->center.y, &s, "circle", str);


ERROR: invalid input syntax for type circle

2020-04-06 Thread David Zhang

Hi,

I got an error when I was trying to insert a circle using the syntax 
(the 3rd one) specified in the latest documentation.


https://www.postgresql.org/docs/current/datatype-geometric.html#DATATYPE-CIRCLE
< ( x , y ) , r >
( ( x , y ) , r )
  ( x , y ) , r
    x , y   , r

Here is how to reproduce it.

CREATE TABLE tbl_circle(id serial PRIMARY KEY, a circle);
INSERT INTO tbl_circle(a) VALUES('( 1 , 1 ) , 5'::circle );

ERROR:  invalid input syntax for type circle: "( 1 , 1 ) , 5"
LINE 1: INSERT INTO tbl_circle(a) VALUES('( 1 , 1 ) , 5'::circle );

I made a small change in the "circle_in" function, and now I can insert 
a circle using the 3rd syntax.


INSERT INTO tbl_circle(a) VALUES('( 1 , 1 ) , 5'::circle );
INSERT 0 1

The fix does generate the same output as the other three ways.

INSERT INTO tbl_circle(a) VALUES( '< ( 1 , 1 ) , 5 >'::circle );
INSERT INTO tbl_circle(a) VALUES( '( ( 1 , 1 ) , 5 )'::circle );
INSERT INTO tbl_circle(a) VALUES( '1 , 1 , 5'::circle );

select * from tbl_circle;
 id | a
+---
  1 | <(1,1),5>
  2 | <(1,1),5>
  3 | <(1,1),5>
  4 | <(1,1),5>
(4 rows)

So far, no errors found during "make check".

The patch based on tag "REL_12_2" is attached.

--
David

Software Engineer
Highgo Software Inc. (Canada)
www.highgo.ca
diff --git a/src/backend/utils/adt/geo_ops.c b/src/backend/utils/adt/geo_ops.c
index 373784fcc1..6efd5bc1f2 100644
--- a/src/backend/utils/adt/geo_ops.c
+++ b/src/backend/utils/adt/geo_ops.c
@@ -4528,7 +4528,7 @@ circle_in(PG_FUNCTION_ARGS)
 
while (depth > 0)
{
-   if ((*s == RDELIM) || ((*s == RDELIM_C) && (depth == 1)))
+   if ((*s == RDELIM) || (((*s == RDELIM_C) || (*s == '\0')) && (depth == 1)))
{
depth--;
s++;


Re: Nicer error when connecting to standby with hot_standby=off

2020-04-02 Thread David Zhang
The following review has been posted through the commitfest application:
make installcheck-world:  not tested
Implements feature:   tested, passed
Spec compliant:   not tested
Documentation:not tested

I applied the patch to the latest master branch and ran the test below. The error 
messages are now distinct. Below are the test steps.

### setup primary server
initdb -D /tmp/primary/data
mkdir /tmp/archive_dir
echo "archive_mode='on'" >> /tmp/primary/data/postgresql.conf
echo "archive_command='cp %p /tmp/archive_dir/%f'" >> 
/tmp/primary/data/postgresql.conf
pg_ctl -D /tmp/primary/data -l /tmp/primary-logs start

### setup hot standby server
pg_basebackup -p 5432 -w -R -D /tmp/hotstandby
echo "primary_conninfo='host=127.0.0.1 port=5432 user=pgdev'" >> 
/tmp/hotstandby/postgresql.conf
echo "restore_command='cp /tmp/archive_dir/%f %p'" >> 
/tmp/hotstandby/postgresql.conf
echo "hot_standby = off" >> /tmp/hotstandby/postgresql.conf
pg_ctl -D /tmp/hotstandby -l /tmp/hotstandby-logs -o "-p 5433" start

### keep trying to connect to the hot standby server in order to capture the error 
messages at different stages.
while true; do echo "`date`"; psql postgres -p 5433 -c "SELECT txid_current_snapshot();"; sleep 0.2; done

### before the patch
psql: error: could not connect to server: FATAL:  the database system is 
starting up
...

### after the patch, we get different messages; one indicates that hot_standby 
is off
psql: error: could not connect to server: FATAL:  the database system is 
starting up
...
psql: error: could not connect to server: FATAL:  the database system is up, 
but hot_standby is off
...

Re: Allow continuations in "pg_hba.conf" files

2020-04-02 Thread David Zhang

On 2020-04-01 10:25 p.m., Fabien COELHO wrote:



Hello,


FWIW, I don't think so. Generally a trailing backslash is an escape
character for the following newline.  And '\ ' is an escaped space,
which usually means a space itself.

In this case the escape character doesn't work generally, and I think it 
is natural that a backslash in the middle of a line is a backslash 
character itself.


I concur: The backslash char is only a continuation as the very last 
character of the line, before cr/nl line ending markers.


+Agree. However, it would be nice to update the sentence below, if I 
understand it correctly.


"+   Comments, whitespace and continuations are handled in the same way 
as in" pg_hba.conf


For example, a user may provide a configuration like the one below (even 
though such a comment is not recommended):


"host    all all 127.0.0.1/32    trust  #COMMENTS, it works"

i.e. the original pg_hba.conf allows a comment on each line, but with 
the new continuation introduced, the comment has to be placed on the 
last line.
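
A hypothetical illustration of that point, assuming the proposed
trailing-backslash continuation (the lines below are examples, not
taken from the patch):

    # the comment can only follow the last physical line of a logical record
    host    all     all \
            127.0.0.1/32    trust   # this comment is fine

    # this would NOT continue, because the backslash is no longer the
    # last character of the line:
    # host  all     all \   # a comment here breaks the continuation
    #       127.0.0.1/32    trust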




There are no assumption about backslash escaping, quotes and such, 
which seems reasonable given the lexing structure of the files, i.e. 
records of space-separated words, and # line comments.



--
David

Software Engineer
Highgo Software Inc. (Canada)
www.highgo.ca





Re: Allow continuations in "pg_hba.conf" files

2020-04-01 Thread David Zhang
Hi Fabien,
Should we consider the case "\ ", i.e. one or more spaces after the backslash?
For example, if I replace a user map
"mymap   /^(.*)@mydomain\.com$  \1" with
"mymap   /^(.*)@mydomain\.com$  \ "
"\1"
by adding one extra space after the backslash, then I get pg_role="\\",
but I think what we expect is pg_role="\\1".

Re: BUG #16147: postgresql 12.1 (from homebrew) - pg_restore -h localhost --jobs=2 crashes

2020-03-05 Thread David Zhang

Hi,

I can reproduce this pg_restore crash (pg_dump crashes too when 
running with multiple jobs) on macOS 10.14 Mojave and macOS 10.15 
Catalina using the following steps.


1. build pg_restore from 12.2 with "--with-gssapi" enabled, or use the 
release from https://www.postgresql.org/download/macosx/


2. start pg server and generate some load,
    pgbench -i -p 5432 -d postgres -s 10

3. backup database,
    pg_dump -h localhost -Fc --no-acl --no-owner postgres > /tmp/128m

4. drop the tables,
    psql -d postgres -c "drop table pgbench_accounts; drop table 
pgbench_branches; drop table pgbench_history; drop table pgbench_tellers;"


5. restore database,
    pg_restore -d postgres -h localhost -Fc /tmp/128m --jobs=2
    Password:
    pg_restore: error: a worker process died unexpectedly

6. check tables, all display size 0 bytes.
    postgres=# \d+
      List of relations
     Schema |   Name   | Type  |  Owner   |  Size   | Description
+--+---+--+-+-
     public | pgbench_accounts | table | postgres | 0 bytes |
     public | pgbench_branches | table | postgres | 0 bytes |
     public | pgbench_history  | table | postgres | 0 bytes |
     public | pgbench_tellers  | table | postgres | 0 bytes |
    (4 rows)

7. core dump, about 2G,
(lldb) bt all
* thread #1, stop reason = signal SIGSTOP
  * frame #0: 0x7fff6c29c44e 
libdispatch.dylib`_dispatch_mgr_queue_push + 41
    frame #1: 0x7fff41475a74 
Security`___ZN8Security12KeychainCore14StorageManager14tickleKeychainEPNS0_12KeychainImplE_block_invoke_2 
+ 76
    frame #2: 0x7fff6c29250e 
libdispatch.dylib`_dispatch_client_callout + 8
    frame #3: 0x7fff6c29e567 
libdispatch.dylib`_dispatch_lane_barrier_sync_invoke_and_complete + 60
    frame #4: 0x7fff41475935 
Security`Security::KeychainCore::StorageManager::tickleKeychain(Security::KeychainCore::KeychainImpl*) 
+ 485
    frame #5: 0x7fff412400d8 
Security`Security::KeychainCore::KCCursorImpl::next(Security::KeychainCore::Item&) 
+ 352
    frame #6: 0x7fff41417975 
Security`Security::KeychainCore::IdentityCursor::next(Security::SecPointer&) 
+ 217

    frame #7: 0x7fff4143c4c3 Security`SecIdentitySearchCopyNext + 155
    frame #8: 0x7fff414477d8 
Security`SecItemCopyMatching_osx(__CFDictionary const*, void const**) + 261

    frame #9: 0x7fff4144b024 Security`SecItemCopyMatching + 338
    frame #10: 0x7fff56dab303 Heimdal`keychain_query + 531
    frame #11: 0x7fff56da8f4c Heimdal`hx509_certs_find + 92
    frame #12: 0x7fff56d67b52 Heimdal`_krb5_pk_find_cert + 466
    frame #13: 0x7fff376da9bb GSS`_gsspku2u_acquire_cred + 619
    frame #14: 0x7fff376bfc1c GSS`gss_acquire_cred + 940
    frame #15: 0x00010016e6e1 
libpq.5.dylib`pg_GSS_have_cred_cache(cred_out=0x000100505688) at 
fe-gssapi-common.c:67:10
    frame #16: 0x00010014f769 
libpq.5.dylib`PQconnectPoll(conn=0x000100505310) at fe-connect.c:2785:22
    frame #17: 0x00010014be9f 
libpq.5.dylib`connectDBComplete(conn=0x000100505310) at 
fe-connect.c:2095:10
    frame #18: 0x00010014bb0c 
libpq.5.dylib`PQconnectdbParams(keywords=0x7ffeefbfeee0, 
values=0x7ffeefbfeea0, expand_dbname=1) at fe-connect.c:625:10
    frame #19: 0x0001ec20 
pg_restore`ConnectDatabase(AHX=0x000100505070, dbname="postgres", 
pghost="david.highgo.ca", pgport=0x, username="david", 
prompt_password=TRI_DEFAULT) at pg_backup_db.c:287:20
    frame #20: 0x0001a75a 
pg_restore`CloneArchive(AH=0x0001002020f0) at 
pg_backup_archiver.c:4850:3
    frame #21: 0x000100017b4b 
pg_restore`RunWorker(AH=0x0001002020f0, slot=0x000100221718) at 
parallel.c:866:7
    frame #22: 0x0001000179f5 
pg_restore`ParallelBackupStart(AH=0x0001002020f0) at parallel.c:1028:4
    frame #23: 0x00014473 
pg_restore`RestoreArchive(AHX=0x0001002020f0) at 
pg_backup_archiver.c:662:12
    frame #24: 0x00011be4 pg_restore`main(argc=10, 
argv=0x7ffeefbff8f0) at pg_restore.c:447:3

    frame #25: 0x7fff6c2eb7fd libdyld.dylib`start + 1
(lldb)

8. however it works with either,
    PGGSSENCMODE=disable pg_restore -d postgres -h localhost -Fc 
/tmp/128m --jobs=2

or,
    pg_restore -d "dbname=postgres gssencmode=disable" -h localhost -Fc 
/tmp/128m --jobs=2


9. pg_config output and versions, no SSL configured,

$ pg_config
BINDIR = /Users/david/sandbox/pg122/app/bin
DOCDIR = /Users/david/sandbox/pg122/app/share/doc/postgresql
HTMLDIR = /Users/david/sandbox/pg122/app/share/doc/postgresql
INCLUDEDIR = /Users/david/sandbox/pg122/app/include
PKGINCLUDEDIR = /Users/david/sandbox/pg122/app/include/postgresql
INCLUDEDIR-SERVER = /Users/david/sandbox/pg122/app/include/postgresql/server
LIBDIR = /Users/david/sandbox/pg122/app/lib
PKGLIBDIR = /Users/david/sandbox/pg122/app/lib/postgresql
LOCALEDIR = /Users/david/sandbox/pg122/app/share/locale
MANDIR = 

kerberos regression test enhancement

2020-03-05 Thread David Zhang

Hi Hackers,

I found an interesting behavior when "--with-gssapi" is enabled.

Given a very "common" configuration in pg_hba.conf like the one below,

    host    postgres    david   192.168.0.114/32    trust

the query messages are always encrypted when connecting to the PG 
server in a very "common" way,


    $ psql -h pgserver -d postgres -U david

unless I specify "gssencmode=disable" with -d option,

    $ psql -h pgserver -U david  -d "dbname=postgres gssencmode=disable"

Based on the above behaviors, I took a further look at the kerberos 
regression test and found the test coverage is not sufficient. It should 
be enhanced to cover the behavior above, where the user specifies a 
"host" entry with "trust" access in pg_hba.conf.
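
For reference, the same behavior can be checked manually with psql (a
sketch only, assuming a valid Kerberos ticket is already held for the
test user):

    $ psql -h pgserver -U david -d postgres -c \
        "SELECT gss_authenticated, encrypted FROM pg_stat_gssapi WHERE pid = pg_backend_pid();"

    $ psql -h pgserver -U david -d "dbname=postgres gssencmode=disable" -c \
        "SELECT gss_authenticated, encrypted FROM pg_stat_gssapi WHERE pid = pg_backend_pid();"

With the "host ... trust" entry above, the first connection should show
gss_authenticated = f and encrypted = t, while the second should show
both as f, which is what the two new test cases below assert.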


The attached patch extends the kerberos regression test to cover the 
behaviors mentioned above.


Any thoughts?


Thanks,

--
David

Software Engineer
Highgo Software Inc. (Canada)
www.highgo.ca
diff --git a/src/test/kerberos/t/001_auth.pl b/src/test/kerberos/t/001_auth.pl
index b3aeea9574..7c2e65ce76 100644
--- a/src/test/kerberos/t/001_auth.pl
+++ b/src/test/kerberos/t/001_auth.pl
@@ -19,7 +19,7 @@ use Test::More;
 
 if ($ENV{with_gssapi} eq 'yes')
 {
-   plan tests => 18;
+   plan tests => 20;
 }
 else
 {
@@ -333,3 +333,25 @@ test_access(
0,
'',
'succeeds with include_realm=0 and defaults');
+
+truncate($node->data_dir . '/pg_ident.conf', 0);
+unlink($node->data_dir . '/pg_hba.conf');
+$node->append_conf('pg_hba.conf',
+   qq{host all all $hostaddr/32 trust});
+$node->restart;
+
+test_access(
+   $node,
+   'test1',
+	'SELECT not gss_authenticated AND encrypted from pg_stat_gssapi where pid = pg_backend_pid();',
+   0,
+   '',
+	'succeeds with GSS-encrypted with default gssencmode and host trust hba');
+
+test_access(
+   $node,
+   "test1",
+	'SELECT not gss_authenticated and not encrypted from pg_stat_gssapi where pid = pg_backend_pid();',
+   0,
+   "gssencmode=disable",
+	"succeeds with GSS encryption disabled with access disabled and host trust hba");


Re: Fastpath while arranging the changes in LSN order in logical decoding

2020-03-03 Thread David Zhang

Hi Dilip,

I repeated the same test cases and can no longer reproduce the 
disconnection issue after applying your new patch.


Best regards,

David

On 2020-03-02 9:11 p.m., Dilip Kumar wrote:

On Wed, Feb 19, 2020 at 6:00 AM David Zhang  wrote:

After manually applied the patch, a diff regenerated is attached.

On 2020-02-18 4:16 p.m., David Zhang wrote:

1. Tried to apply the patch to PG 12.2 commit 
45b88269a353ad93744772791feb6d01bc7e1e42 (HEAD -> REL_12_2, tag: REL_12_2), it 
doesn't work. Then tried to check the patch, and found the errors showing below.
$ git apply --check 
0001-Fastpath-for-sending-changes-to-output-plugin-in-log.patch
error: patch failed: contrib/test_decoding/logical.conf:1
error: contrib/test_decoding/logical.conf: patch does not apply
error: patch failed: src/backend/replication/logical/reorderbuffer.c:1133
error: src/backend/replication/logical/reorderbuffer.c: patch does not apply

2. Ran a further check for file "logical.conf", and found there is only one commit since 
2014, which doesn't have the parameter, "logical_decoding_work_mem = 64kB"

3. Manually apply the patch including 
src/backend/replication/logical/reorderbuffer.c, and then ran a simple logical 
replication test. A connection issue is found like below,
"table public.pgbench_accounts: INSERT: aid[integer]:4071 bid[integer]:1 
abalance[integer]:0 filler[character]:'  
  '
pg_recvlogical: error: could not receive data from WAL stream: server closed 
the connection unexpectedly
   This probably means the server terminated abnormally
   before or while processing the request.
pg_recvlogical: disconnected; waiting 5 seconds to try again"

4. This connection issue can be reproduced on PG 12.2 commit mentioned above, 
the basic steps,
4.1 Change "wal_level = logical" in "postgresql.conf"
4.2 create a logical slot and listen on it,
$ pg_recvlogical -d postgres --slot test --create-slot
$ pg_recvlogical -d postgres --slot test --start -f -

4.3 from another terminal, run the command below,
$ pgbench -i -p 5432 -d postgres

Let me know if I did something wrong, and if a new patch is available, I can 
re-run the test on the same environment.

Thanks for testing and rebasing.  I think one of the hunks is missing
in your rebased version.  That could be the reason for failure.  Can
you test on my latest version?


--
David

Software Engineer
Highgo Software Inc. (Canada)
www.highgo.ca




Re: Making psql error out on output failures

2020-02-28 Thread David Zhang

Hi Alvaro,

Thanks for your review. A new patch with the error message in PG style 
is attached.


On 2020-02-28 8:03 a.m., Alvaro Herrera wrote:

On 2020-Feb-18, David Zhang wrote:


3. I think a better way to resolve this issue will still be the solution
with an extra %m, which can make the error message much more informative to
the end users.

Yes, agreed.  However, we use a style like this:

pg_log_error("could not print tuples: %m");


--
David

Software Engineer
Highgo Software Inc. (Canada)
www.highgo.ca
diff --git a/src/bin/psql/common.c b/src/bin/psql/common.c
index 4b2679360f..9ffb1ef10f 100644
--- a/src/bin/psql/common.c
+++ b/src/bin/psql/common.c
@@ -855,6 +855,7 @@ static bool
 PrintQueryTuples(const PGresult *results)
 {
printQueryOpt my_popt = pset.popt;
+   bool result = true;
 
/* one-shot expanded output requested via \gx */
if (pset.g_expanded)
@@ -872,6 +873,11 @@ PrintQueryTuples(const PGresult *results)
disable_sigpipe_trap();
 
	printQuery(results, &my_popt, fout, false, pset.logfile);
+   if (ferror(fout))
+   {
+   pg_log_error("could not print tuples: %m");
+   result = false;
+   }
 
if (is_pipe)
{
@@ -882,9 +888,16 @@ PrintQueryTuples(const PGresult *results)
fclose(fout);
}
else
+   {
		printQuery(results, &my_popt, pset.queryFout, false, pset.logfile);
+   if (ferror(pset.queryFout))
+   {
+   pg_log_error("could not print tuples: %m");
+   result = false;
+   }
+   }
 
-   return true;
+   return result;
 }
 
 


Re: Fastpath while arranging the changes in LSN order in logical decoding

2020-02-18 Thread David Zhang

After manually applying the patch, the regenerated diff is attached.

On 2020-02-18 4:16 p.m., David Zhang wrote:

1. Tried to apply the patch to PG 12.2 commit 
45b88269a353ad93744772791feb6d01bc7e1e42 (HEAD -> REL_12_2, tag: REL_12_2), it 
doesn't work. Then tried to check the patch, and found the errors showing below.
$ git apply --check 
0001-Fastpath-for-sending-changes-to-output-plugin-in-log.patch
error: patch failed: contrib/test_decoding/logical.conf:1
error: contrib/test_decoding/logical.conf: patch does not apply
error: patch failed: src/backend/replication/logical/reorderbuffer.c:1133
error: src/backend/replication/logical/reorderbuffer.c: patch does not apply

2. Ran a further check for file "logical.conf", and found there is only one commit since 
2014, which doesn't have the parameter, "logical_decoding_work_mem = 64kB"

3. Manually apply the patch including 
src/backend/replication/logical/reorderbuffer.c, and then ran a simple logical 
replication test. A connection issue is found like below,
"table public.pgbench_accounts: INSERT: aid[integer]:4071 bid[integer]:1 
abalance[integer]:0 filler[character]:'  
  '
pg_recvlogical: error: could not receive data from WAL stream: server closed 
the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
pg_recvlogical: disconnected; waiting 5 seconds to try again"

4. This connection issue can be reproduced on PG 12.2 commit mentioned above, 
the basic steps,
4.1 Change "wal_level = logical" in "postgresql.conf"
4.2 create a logical slot and listen on it,
$ pg_recvlogical -d postgres --slot test --create-slot
$ pg_recvlogical -d postgres --slot test --start -f -

4.3 from another terminal, run the command below,
$ pgbench -i -p 5432 -d postgres

Let me know if I did something wrong, and if a new patch is available, I can 
re-run the test on the same environment.


--
David

Software Engineer
Highgo Software Inc. (Canada)
www.highgo.ca
diff --git a/contrib/test_decoding/logical.conf 
b/contrib/test_decoding/logical.conf
index 367f706651..02595d99d5 100644
--- a/contrib/test_decoding/logical.conf
+++ b/contrib/test_decoding/logical.conf
@@ -1,2 +1,3 @@
 wal_level = logical
 max_replication_slots = 4
+logical_decoding_work_mem = 64MB
diff --git a/src/backend/replication/logical/reorderbuffer.c 
b/src/backend/replication/logical/reorderbuffer.c
index a74fd705b4..62b661dc4d 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -989,11 +989,6 @@ ReorderBufferIterTXNInit(ReorderBuffer *rb, 
ReorderBufferTXN *txn,
nr_txns++;
}
 
-	/*
-	 * TODO: Consider adding fastpath for the rather common nr_txns=1 case, no
-	 * need to allocate/build a heap then.
-	 */
-
/* allocate iteration state */
state = (ReorderBufferIterTXNState *)
MemoryContextAllocZero(rb->context,
@@ -1009,10 +1004,11 @@ ReorderBufferIterTXNInit(ReorderBuffer *rb, 
ReorderBufferTXN *txn,
state->entries[off].segno = 0;
}
 
-   /* allocate heap */
-	/* allocate heap */
-	state->heap = binaryheap_allocate(state->nr_txns,
-	                                  ReorderBufferIterCompare,
-	                                  state);
+	/* allocate heap, if we have more than one transaction. */
+	if (nr_txns > 1)
+		state->heap = binaryheap_allocate(state->nr_txns,
+		                                  ReorderBufferIterCompare,
+		                                  state);
 
/* Now that the state fields are initialized, it is safe to return it. 
*/
*iter_state = state;
@@ -1044,7 +1040,9 @@ ReorderBufferIterTXNInit(ReorderBuffer *rb, 
ReorderBufferTXN *txn,
state->entries[off].change = cur_change;
state->entries[off].txn = txn;
 
-   binaryheap_add_unordered(state->heap, Int32GetDatum(off++));
+   /* add to heap, only if we have more than one transaction. */
+   if (nr_txns > 1)
+			binaryheap_add_unordered(state->heap, Int32GetDatum(off++));
}
 
/* add subtransactions if they contain changes */
@@ -1073,12 +1071,15 @@ ReorderBufferIterTXNInit(ReorderBuffer *rb, 
ReorderBufferTXN *txn,
state->entries[off].change = cur_change;
state->entries[off].txn = cur_txn;
 
-			binaryheap_add_unordered(state->heap, Int32GetDatum(off++));
+			/* add to heap, only if we have more than one transaction. */
+

Re: Fastpath while arranging the changes in LSN order in logical decoding

2020-02-18 Thread David Zhang
1. I tried to apply the patch to PG 12.2 commit 
45b88269a353ad93744772791feb6d01bc7e1e42 (HEAD -> REL_12_2, tag: REL_12_2), but it 
doesn't apply. I then checked the patch and found the errors shown below.
$ git apply --check 
0001-Fastpath-for-sending-changes-to-output-plugin-in-log.patch
error: patch failed: contrib/test_decoding/logical.conf:1
error: contrib/test_decoding/logical.conf: patch does not apply
error: patch failed: src/backend/replication/logical/reorderbuffer.c:1133
error: src/backend/replication/logical/reorderbuffer.c: patch does not apply

2. I ran a further check on the file "logical.conf" and found there has been only 
one commit since 2014, and it doesn't contain the parameter "logical_decoding_work_mem 
= 64kB".

3. I manually applied the patch, including 
src/backend/replication/logical/reorderbuffer.c, and then ran a simple logical 
replication test. A connection issue was found, as shown below:
"table public.pgbench_accounts: INSERT: aid[integer]:4071 bid[integer]:1 
abalance[integer]:0 filler[character]:' 
   '
pg_recvlogical: error: could not receive data from WAL stream: server closed 
the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
pg_recvlogical: disconnected; waiting 5 seconds to try again"

4. This connection issue can be reproduced on the PG 12.2 commit mentioned above; 
the basic steps are:
4.1 Change "wal_level = logical" in "postgresql.conf"
4.2 create a logical slot and listen on it,
$ pg_recvlogical -d postgres --slot test --create-slot
$ pg_recvlogical -d postgres --slot test --start -f -

4.3 from another terminal, run the command below,
$ pgbench -i -p 5432 -d postgres

Let me know if I did something wrong, and if a new patch is available, I can 
re-run the test on the same environment.

-- 
David
Software Engineer
Highgo Software Inc. (Canada)
www.highgo.ca

Re: First WAL segment file that initdb creates

2020-02-18 Thread David Zhang
The following review has been posted through the commitfest application:
make installcheck-world:  not tested
Implements feature:   not tested
Spec compliant:   not tested
Documentation:tested, passed

The issue has been verified using the steps below:
1. $ initdb -D /home/test/PG122DATA/data
2. $ ls -l /home/test/PG122DATA/data/pg_wal/
total 16388
-rw--- 1 test test 16777216 Feb 18 12:07 00010001
drwx-- 2 test test 4096 Feb 18 12:07 archive_status

The first WAL segment file created by initdb is "00010001", not 
"0001".

Re: Making psql error out on output failures

2020-02-18 Thread David Zhang
2.168.0.19 -At -c "select 
repeat('111', 200)" >> E:/file

Error printing tuples (No space left on device)

### macOS Mojave 10.14
postgres=# select repeat('111', 100) \g  /Volumes/sdcard/file
Error printing tuples (No space left on device)
postgres=# \q

MacBP:bin david$ psql -d postgres -h 192.168.0.10 -At -c "select 
repeat('111', 300)" > /Volumes/sdcard/file

Error printing tuples (No space left on device)

MacBP:bin david$ psql -d postgres -h 192.168.0.10 -At -c "select 
repeat('111', 300)" -o /Volumes/sdcard/file

Error printing tuples (No space left on device)


On 2020-01-29 1:51 a.m., Daniel Verite wrote:

David Zhang wrote:


Are you sure? I don't find that redefinition. Besides
print_aligned_text() also calls putc and puts.

Yes, below is the gdb debug message when psql first time detects the
error "No space left on device". Test case, "postgres=# select
repeat('111', 100) \g /mnt/ramdisk/file"
bt
#0  flushbuffer (target=0x7ffd6a709ad0) at snprintf.c:313

Indeed. For some reason gdb won't let me step into these fprintf()
calls, but you're right they're redefined (through include/port.h):

#define vsnprintf   pg_vsnprintf
#define snprintfpg_snprintf
#define vsprintfpg_vsprintf
#define sprintf pg_sprintf
#define vfprintfpg_vfprintf
#define fprintf pg_fprintf
#define vprintf pg_vprintf
#define printf(...) pg_printf(__VA_ARGS__)

Anyway, I don't see it leading to an actionable way to reliably keep
errno, as discussed upthread.

Best regards,

--
David

Software Engineer
Highgo Software Inc. (Canada)
www.highgo.ca
diff --git a/src/bin/psql/common.c b/src/bin/psql/common.c
index 4b2679360f..4fa63134d3 100644
--- a/src/bin/psql/common.c
+++ b/src/bin/psql/common.c
@@ -855,6 +855,7 @@ static bool
 PrintQueryTuples(const PGresult *results)
 {
printQueryOpt my_popt = pset.popt;
+   bool result = true;
 
/* one-shot expanded output requested via \gx */
if (pset.g_expanded)
@@ -872,6 +873,11 @@ PrintQueryTuples(const PGresult *results)
disable_sigpipe_trap();
 
	printQuery(results, &my_popt, fout, false, pset.logfile);
+   if (ferror(fout))
+   {
+   pg_log_error("Error printing tuples (%m)");
+   result = false;
+   }
 
if (is_pipe)
{
@@ -882,9 +888,16 @@ PrintQueryTuples(const PGresult *results)
fclose(fout);
}
else
+   {
		printQuery(results, &my_popt, pset.queryFout, false, pset.logfile);
+   if (ferror(pset.queryFout))
+   {
+   pg_log_error("Error printing tuples (%m)");
+   result = false;
+   }
+   }
 
-   return true;
+   return result;
 }
 
 


Re: Making psql error out on output failures

2020-01-28 Thread David Zhang



On 2020-01-28 4:14 a.m., Daniel Verite wrote:

David Zhang wrote:


The error "No space left on device" can be logged by fprintf() which is
redefined as pg_fprintf() when print_aligned_text() is called

Are you sure? I don't find that redefinition. Besides
print_aligned_text() also calls putc and puts.
Yes, below is the gdb backtrace from when psql first detects the 
"No space left on device" error. Test case: "postgres=# select 
repeat('111', 100) \g /mnt/ramdisk/file"

bt
#0  flushbuffer (target=0x7ffd6a709ad0) at snprintf.c:313
#1  0x55ba90b88b6c in dopr_outchmulti (c=32, slen=447325, 
target=0x7ffd6a709ad0) at snprintf.c:1435
#2  0x55ba90b88d5e in trailing_pad (padlen=-147, 
target=0x7ffd6a709ad0) at snprintf.c:1514
#3  0x55ba90b87f36 in fmtstr (value=0x55ba90bb4f9a "", leftjust=1, 
minlen=147, maxwidth=0, pointflag=0, target=0x7ffd6a709ad0) at 
snprintf.c:994
#4  0x55ba90b873c6 in dopr (target=0x7ffd6a709ad0, 
format=0x55ba90bb5083 "%s%-*s", args=0x7ffd6a709f40) at snprintf.c:675
#5  0x55ba90b865b5 in pg_vfprintf (stream=0x55ba910cf240, 
fmt=0x55ba90bb507f "%-*s%s%-*s", args=0x7ffd6a709f40) at snprintf.c:257
#6  0x55ba90b866aa in pg_fprintf (stream=0x55ba910cf240, 
fmt=0x55ba90bb507f "%-*s%s%-*s") at snprintf.c:270
#7  0x55ba90b75d22 in print_aligned_text (cont=0x7ffd6a70a210, 
fout=0x55ba910cf240, is_pager=false) at print.c:937
#8  0x55ba90b7ba19 in printTable (cont=0x7ffd6a70a210, 
fout=0x55ba910cf240, is_pager=false, flog=0x0) at print.c:3378
#9  0x55ba90b7bedc in printQuery (result=0x55ba910f9860, 
opt=0x7ffd6a70a2c0, fout=0x55ba910cf240, is_pager=false, flog=0x0) at 
print.c:3496
#10 0x55ba90b39560 in PrintQueryTuples (results=0x55ba910f9860) at 
common.c:874
#11 0x55ba90b39d55 in PrintQueryResults (results=0x55ba910f9860) at 
common.c:1262
#12 0x55ba90b3a343 in SendQuery (query=0x55ba910f2590 "select 
repeat('111', 100) ") at common.c:1446
#13 0x55ba90b51f36 in MainLoop (source=0x7f1623a9ea00 
<_IO_2_1_stdin_>) at mainloop.c:505
#14 0x55ba90b5c4da in main (argc=3, argv=0x7ffd6a70a738) at 
startup.c:445

(gdb) l
308 size_t  written;
309
310 written = fwrite(target->bufstart, 1, nc, 
target->stream);

311 target->nchars += written;
312 if (written != nc)
313 target->failed = true;
314 }
315 target->bufptr = target->bufstart;
316 }
317
(gdb) p written
$2 = 1023
(gdb) p nc
$3 = 1024
(gdb) p strerror(errno)
$4 = 0x7f16238672c9 "No space left on device"
(gdb)



Will that be a better solution if redefine fputs similar to fprintf and
log the exact error when first time discovered?

I think we can assume it's not acceptable to have pg_fprintf()
to print anything to stderr, or to set a flag through a global
variable. So even if using pg_fprintf() for all the printing, no matter
how  (through #defines or otherwise),  there's still the problem that the
error needs to be propagated up the call chain to be processed by psql.


The concern is that if we can't provide a more accurate
information to the end user when error happens, sometimes the
end user might got even confused.

It's better to have a more informative message, but I'm for
not having the best be the enemy of the good.
The first concern is that at the moment, we have no error
report at all in the case when the output can be opened
but the error happens later along the writes.


Best regards,

--
David

Software Engineer
Highgo Software Inc. (Canada)
www.highgo.ca





Re: Making psql error out on output failures

2020-01-27 Thread David Zhang



On 2020-01-20 2:42 a.m., Daniel Verite wrote:

David Zhang wrote:


Yes, I agree with you. For case 2 "select repeat('111', 100) \g
/mnt/ramdisk/file", it can be easily fixed with more accurate error
message similar to pg_dump, one example could be something like below.
But for case 1 "psql -d postgres  -At -c "select repeat('111', 100)"

/mnt/ramdisk/file" , it may require a lot of refactoring work.

I don't quite see why you make that distinction? The relevant bits of
code are common, it's all the code in src/fe_utils/print.c called
from PrintQuery().


The reason is that, within the PrintQuery() call, one case goes to 
print_unaligned_text() and the other goes to print_aligned_text(). 
The "No space left on device" error can be logged by fprintf(), which is 
redefined as pg_fprintf() when print_aligned_text() is called; however, 
the original C fputs() function is used directly when 
print_unaligned_text() is called, and it is used everywhere.


Would it be a better solution to redefine fputs, similarly to fprintf, 
and log the exact error when it is first discovered? The concern is that 
if we can't provide more accurate information to the end user when the 
error happens, the end user might end up even more confused.




Best regards,

--
David

Software Engineer
Highgo Software Inc. (Canada)
www.highgo.ca




Re: Making psql error out on output failures

2020-01-16 Thread David Zhang

On 2020-01-16 5:20 a.m., Daniel Verite wrote:

David Zhang wrote:


If I change the error log message like below, where "%m" is used to pass the
value of strerror(errno), "could not write to output file:" is copied from
function "WRITE_ERROR_EXIT".
-   pg_log_error("Error printing tuples");
+   pg_log_error("could not write to output file: %m");
then the output message is something like below, which, I believe, is more
consistent with pg_dump.

The problem is that errno may not be reliable to tell us what was
the problem that leads to ferror(fout) being nonzero, since it isn't
saved at the point of the error and the output code may have called
many libc functions between the first occurrence of the output error
and when pg_log_error() is called.

The linux manpage on errno warns specifically about this:

NOTES
A common mistake is to do

   if (somecall() == -1) {
   printf("somecall() failed\n");
   if (errno == ...) { ... }
   }

where errno no longer needs to have the value  it  had  upon  return
from  somecall()
(i.e.,  it  may have been changed by the printf(3)).  If the value of
errno should be
preserved across a library call, it must be saved:


This other bit from the POSIX spec [1] is relevant:

   "The value of errno shall be defined only after a call to a function
   for which it is explicitly stated to be set and until it is changed
   by the next function call or if the application assigns it a value."

To use errno in a way that complies with the above, the psql code
should be refactored. I don't know if having a more precise error
message justifies that refactoring. I've elaborated a bit about that
upthread with the initial submission.


Yes, I agree with you. For case 2, "select repeat('111', 100) \g 
/mnt/ramdisk/file", it can easily be fixed with a more accurate error 
message similar to pg_dump's; one example could be something like below. 
But for case 1, "psql -d postgres -At -c "select repeat('111', 100)" 
> /mnt/ramdisk/file", it may require a lot of refactoring work.


diff --git a/src/port/snprintf.c b/src/port/snprintf.c
index 8fd997553e..e6c239fd9f 100644
--- a/src/port/snprintf.c
+++ b/src/port/snprintf.c
@@ -309,8 +309,10 @@ flushbuffer(PrintfTarget *target)

    written = fwrite(target->bufstart, 1, nc, target->stream);
    target->nchars += written;
-   if (written != nc)
+   if (written != nc) {
    target->failed = true;
+		fprintf(stderr, "could not write to output file: %s\n", strerror(errno));

+   }


Besides, I'm not even
sure that errno is necessarily set on non-POSIX platforms when fputc
or fputs fails.

Verified: fputs does set errno, at least on Ubuntu Linux.

That's why this patch uses the safer approach to emit a generic
error message.

[1] https://pubs.opengroup.org/onlinepubs/9699919799/functions/errno.html


Best regards,

--
David

Software Engineer
Highgo Software Inc. (Canada)
www.highgo.ca






Re: Making psql error out on output failures

2020-01-15 Thread David Zhang
Right, the file difference is caused by "-At". 

On the other hand, in order to keep the output message more consistent with 
other tools, I did a little more investigation into pg_dump to see how it 
handles this situation. Here are my findings.
pg_dump uses WRITE_ERROR_EXIT to report the error when "(bytes_written 
!= size * nmemb)"; WRITE_ERROR_EXIT calls fatal("could not write to 
output file: %m"), which ends up in "pg_log_generic(PG_LOG_ERROR, __VA_ARGS__)". 
After running a quick test in the same situation, I got a message like the one below:
$ pg_dump -h localhost -p 5432  -d postgres -t psql_error -f /mnt/ramdisk/file
pg_dump: error: could not write to output file: No space left on device

If I change the error log message as below, where "%m" is used to pass the 
value of strerror(errno) and "could not write to output file:" is copied 
from the WRITE_ERROR_EXIT macro,
-   pg_log_error("Error printing tuples");
+   pg_log_error("could not write to output file: %m");
then the output message is something like below, which, I believe, is more 
consistent with pg_dump.
$ psql -d postgres  -t -c "select repeat('111', 100)" -o /mnt/ramdisk/file
could not write to output file: No space left on device
$ psql -d postgres  -t -c "select repeat('111', 100)" > /mnt/ramdisk/file
could not write to output file: No space left on device
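
For completeness, a sketch of how the small "no space" target used in
these tests can be set up on Linux (the size and mount point are just
examples):

    $ sudo mkdir -p /mnt/ramdisk
    $ sudo mount -t tmpfs -o size=1M tmpfs /mnt/ramdisk
    # any query output larger than ~1MB written there will then fail
    # with "No space left on device"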

Hope the information will help.

David
---
Highgo Software Inc. (Canada)
www.highgo.ca