Re: [HACKERS] create_unique_path and GEQO

2017-03-24 Thread Rushabh Lathia
On Wed, Mar 22, 2017 at 3:43 PM, Ashutosh Bapat <ashutosh.ba...@enterprisedb.com> wrote:

> Hi,
> In create_unique_path() there's this comment:
> /*
>  * We must ensure path struct and subsidiary data are allocated in main
>  * planning context; otherwise GEQO memory management causes trouble.
>  */
> oldcontext = MemoryContextSwitchTo(root->planner_cxt);
>
> pathnode = makeNode(UniquePath);
>
> This means that when GEQO resets the memory context, the RelOptInfo
> for which this path is created (and whose cheapest_unique_path may
> point to it) goes away, while the unique path lingers on in the
> planner context.
> Shouldn't we instead allocate the path in the same context as the
> RelOptInfo, similar to mark_dummy_rel()?
>
>
Do you have a test case which can reproduce the issue you
explained above?


> --
> Best Wishes,
> Ashutosh Bapat
> EnterpriseDB Corporation
> The Postgres Database Company



-- 
Rushabh Lathia


Re: [HACKERS] BUG: pg_dump generates corrupted gzip file in Windows

2017-03-24 Thread Craig Ringer
On 24 March 2017 at 14:07, Kuntal Ghosh  wrote:
> On Fri, Mar 24, 2017 at 11:28 AM, Kuntal Ghosh
>  wrote:
>> Hello,
>> In Windows, if one needs to take a dump in plain text format (this is
>> the default option, or can be specified using -Fp) with some level of
>> compression (-Z[0-9]), an output file has to
>> be specified. Otherwise, if the output is redirected to stdout, it'll
>> create a corrupted dump (cmd is set to ASCII mode, so it'll put
>> carriage returns in the file).
> To reproduce the issue, please use the following command in windows cmd:
>
> pg_dump -Z 9 test > E:\test_xu.backup
> pg_dump -Fp -Z 9 test > E:\test_xu.backup

This is a known problem. It is not specific to PostgreSQL, it affects
any software that attempts to use stdin/stdout on Windows via cmd,
where it is not 8-bit clean.

We don't just refuse to run with stdout as a destination because it's
perfectly sensible if you're not using cmd.exe. pg_dump cannot, as far
as I know, tell whether it's being invoked by cmd or something else.

If you have concrete ideas on how to improve this they'd be welcomed.
Is there anywhere you expected to find info in the docs? Do you know
of a way to detect in Windows if the output stream is not 8-bit clean
from within the application program? ... other?

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [Proposal] Make the optimiser aware of partitions ordering

2017-03-24 Thread Ashutosh Bapat
On Mon, Mar 20, 2017 at 8:33 PM, Ronan Dunklau  wrote:
> On lundi 20 mars 2017 15:52:03 CET Robert Haas wrote:
>> On Mon, Mar 20, 2017 at 6:31 AM, Ronan Dunklau wrote:
>> > With range partitioning, we guarantee that each partition contains non-
>> > overlapping values. Since we know the range allowed for each partition, it
>> > is possible to sort them according to the partition key (as is done
>> > already for looking up partitions).
>> >
>> > Thus, we can generate sorted Append plans instead of MergeAppend when
>> > sorting occurs on the partition key.
>>
>> Great idea.  This is too late for v10 at this point, but please add it
>> to the next CommitFest so we don't forget about it.
>
> I know it is too late, and thought that it was too early to add it to the
> commitfest properly since so many design decisions still need to be
> discussed. Thanks for the feedback; I added it.

Thanks for working on this. I was thinking about the same thing.

I will try to get back to review/work on this for v11. In the meantime,
I am working on partition-wise joins [1]. In those patches, I have added
a structure called PartitionScheme, which represents how a relation is
partitioned. For base relations it's mostly a copy of PartitionDesc and
PartitionKey, but it bubbles up across joins, with each partitioned join
getting the relevant partitioning scheme. If you could base your design
such that it uses PartitionScheme, it could be used for joins, and when
Jeevan's patch for partition-wise aggregate [2] comes along, probably
with grouping as well.

[1] 
https://www.postgresql.org/message-id/CAFjFpRcMWwepj-Do1otxQ-GApGPSZ1FmH7YQvQTwzQOGczq_sw%40mail.gmail.com
[2] http://www.mail-archive.com/pgsql-hackers@postgresql.org/msg308861.html
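To illustrate the idea being discussed (a hypothetical table; the
sorted-Append optimization itself is only proposed, not implemented):

    CREATE TABLE measurement (ts timestamptz NOT NULL, val int)
        PARTITION BY RANGE (ts);
    CREATE TABLE measurement_q1 PARTITION OF measurement
        FOR VALUES FROM ('2017-01-01') TO ('2017-04-01');
    CREATE TABLE measurement_q2 PARTITION OF measurement
        FOR VALUES FROM ('2017-04-01') TO ('2017-07-01');

    -- The partitions cannot overlap, so with the proposal this ORDER BY
    -- on the partition key could use a plain Append of already-sorted
    -- partition scans instead of a MergeAppend.
    SELECT * FROM measurement ORDER BY ts;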

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company




Re: [HACKERS] segfault in hot standby for hash indexes

2017-03-24 Thread Amit Kapila
On Fri, Mar 24, 2017 at 12:25 PM, Ashutosh Sharma  wrote:
>>
>> I think it would have been probably okay to use *int* for ntuples as
>> that matches with what you are actually assigning in the function.
>
> Okay, corrected it. Attached is a newer version of the patch.
>

Thanks, this version looks good to me.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] pageinspect and hash indexes

2017-03-24 Thread Amit Kapila
On Fri, Mar 24, 2017 at 9:50 AM, Ashutosh Sharma  wrote:
> On Fri, Mar 24, 2017 at 9:46 AM, Ashutosh Sharma  
> wrote:
>>
>> Thanks for reviewing my patch. I have removed the extra white space.
>> Attached are both the patches.
>
> Sorry, I have mistakenly attached wrong patch. Here are the correct
> set of patches.
>

The latest version of patches looks fine to me.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Parallel Append implementation

2017-03-24 Thread Amit Khandekar
On 24 March 2017 at 13:11, Rajkumar Raghuwanshi wrote:
> I have applied the patch on latest pg sources (on commit
> 457a4448732881b5008f7a3bcca76fc299075ac3). configure, make all, and
> make install ran successfully, but initdb failed with the below error.

> FailedAssertion("!(LWLockTranchesAllocated >=
> LWTRANCHE_FIRST_USER_DEFINED)", File: "lwlock.c", Line: 501)

Thanks for reporting, Rajkumar.

With the new PARALLEL_APPEND tranche ID, LWTRANCHE_FIRST_USER_DEFINED
value has crossed the value 64. So we need to increase the initial
size of LWLockTrancheArray from 64 to 128. Attached is the updated
patch.
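
For reference, the spot being changed looks roughly like this
(paraphrased from lwlock.c, not the literal hunk in the attached patch):

    if (LWLockTrancheArray == NULL)
    {
        LWLockTranchesAllocated = 128;      /* previously 64 */
        LWLockTrancheArray = (char **)
            MemoryContextAllocZero(TopMemoryContext,
                                   LWLockTranchesAllocated * sizeof(char *));
        /* every built-in tranche ID must fit in the initial array */
        Assert(LWLockTranchesAllocated >= LWTRANCHE_FIRST_USER_DEFINED);
    }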


ParallelAppend_v11.patch
Description: Binary data



Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables

2017-03-24 Thread Amit Langote
Hi Ashutosh,

On 2017/03/23 21:48, Ashutosh Bapat wrote:
>>> I have fixed all the issues reported till now.

In patch 0007, the following code in have_partkey_equi_join() looks
potentially unsafe:

/*
 * If the clause refers to different partition keys from
 * both relations, it can not be used for partition-wise join.
 */
if (ipk1 != ipk2)
continue;

/*
 * The clause allows partition-wise join only if it uses the same
 * operator family as that specified by the partition key.
 */
if (!list_member_oid(rinfo->mergeopfamilies,
 part_scheme->partopfamily[ipk1]))
continue;

What if ipk1 and ipk2 both turn out to be -1? Accessing
part_scheme->partopfamily[ipk1] would be incorrect, no?
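
A minimal guard along these lines (hypothetical, just to illustrate the
concern) would avoid the negative index:

    /* Skip clauses whose expressions match no partition key on either side. */
    if (ipk1 < 0 || ipk2 < 0)
        continue;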

Thanks,
Amit






Re: [HACKERS] Logical replication existing data copy

2017-03-24 Thread Michael Banck
Hi,

On Fri, Mar 24, 2017 at 12:32:28AM +0100, Petr Jelinek wrote:
> Yes, I also forgot to check if the table actually exists on subscriber
> when fetching them in CREATE SUBSCRIPTION (we have check during
> replication but not there).
> 
> Attached patches should fix both issues.

I no longer get a segfault, thanks.

However, replication also seems not to work. I'm using the following
script right now:

--8<--
#!/bin/sh

set -x

#rm -rf data_* pg*.log

initdb --pgdata=data_pg1 1> /dev/null 2>&1
sed -i -e 's/^#wal_level.=.replica/wal_level = logical/' data_pg1/postgresql.conf
pg_ctl --pgdata=data_pg1 -l pg1.log start 1> /dev/null
psql -c "CREATE TABLE t1(id int);"
pg_basebackup --pgdata=data_pg2
sed -i -e 's/^#port.=.5432/port = 5433/' data_pg2/postgresql.conf
psql -c "INSERT INTO t1 VALUES(1);"
pg_ctl --pgdata=data_pg2 -l pg2.log start 1> /dev/null
psql -c "CREATE PUBLICATION pub1;"
psql --port=5433 -c "CREATE SUBSCRIPTION sub1 CONNECTION 'dbname=postgres' PUBLICATION pub1 WITH (COPY DATA);"
sleep 6
psql -c 'SELECT application_name, state FROM pg_stat_replication;'
psql --port=5433 -c "SELECT COUNT(*) FROM t1;"
--8<--

(I had to add the "sleep 6" - 5 seconds wasn't always enough - in order
to get the subscription from 'catchup' to 'streaming', which was a bit
surprising to me.)

This is the output:

--8<--
+ initdb --pgdata=data_pg1
+ sed -i -e s/^#wal_level.=.replica/wal_level = logical/
data_pg1/postgresql.conf
+ pg_ctl --pgdata=data_pg1 -l pg1.log start
+ psql -c CREATE TABLE t1(id int);
CREATE TABLE
+ pg_basebackup --pgdata=data_pg2
+ sed -i -e s/^#port.=.5432/port = 5433/ data_pg2/postgresql.conf
+ psql -c INSERT INTO t1 VALUES(1);
INSERT 0 1
+ pg_ctl --pgdata=data_pg2 -l pg2.log start
+ psql -c CREATE PUBLICATION pub1;
CREATE PUBLICATION
+ psql --port=5433 -c CREATE SUBSCRIPTION sub1 CONNECTION
'dbname=postgres' PUBLICATION pub1 WITH (COPY DATA);
NOTICE:  created replication slot "sub1" on publisher
NOTICE:  synchronized table states
CREATE SUBSCRIPTION
+ sleep 6
+ psql -c SELECT application_name, state FROM pg_stat_replication;
 application_name |   state   
------------------+-----------
 sub1             | streaming
(1 row)

+ psql --port=5433 -c SELECT COUNT(*) FROM t1;
 count 
-------
     0
(1 row)
--8<--

I would have assumed the inserted value would be visible on the standby.
If I insert further values, they don't show up, either.


Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.ba...@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer




Re: [pgsql-www] [HACKERS] Small issue in online devel documentation build

2017-03-24 Thread Fabien COELHO


Hello Peter,


I think the fix belongs into the web site CSS, so there is nothing to
commit into PostgreSQL here.


Indeed, the changes were only for the "remove nesting" solution.

I will close the commit fest entry, but I have added a section to the 
open items list so we keep track of it. 
(https://wiki.postgresql.org/wiki/PostgreSQL_10_Open_Items#Documentation_tool_chain)


I suggest that the quick workaround a colleague of mine proposed (i.e. 
something like "code code { font-size: 100% !important; }") could also be 
applied to the web site CSS while waiting for a more definitive answer, 
which might take some unknown time, possibly approaching never?


--
Fabien.




Re: [HACKERS] [PATCH] Transaction traceability - txid_status(bigint)

2017-03-24 Thread Craig Ringer
On 24 March 2017 at 02:29, Robert Haas  wrote:
> On Tue, Mar 21, 2017 at 11:35 PM, Craig Ringer  wrote:
>> Changes made per discussion.
>
> Committed 0001.

Much appreciated.

Here's the 2nd patch rebased on top of master, with the TAP test
included this time. Looks ready to go.

I really appreciate the time you've taken to look at this. Point me at
anything from your team you want some outside review on.
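
For context, the intended usage pattern is roughly this (a sketch based
on the description in the patch below; the xid value is illustrative):

    BEGIN;
    SELECT txid_current();   -- e.g. returns 1234; remember it client-side
    -- ... work ...
    COMMIT;                  -- suppose the connection is lost here

    -- after reconnecting:
    SELECT txid_status(1234);  -- 'committed', 'aborted', 'in progress', or NULL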

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From eca78655c966613b1b98baf9049b5c4d519d375a Mon Sep 17 00:00:00 2001
From: Craig Ringer 
Date: Mon, 23 Jan 2017 13:34:02 +0800
Subject: [PATCH] Introduce txid_status(bigint) to get status of an xact

If an application loses its connection while a COMMIT request is in
flight, or the backend crashes mid-commit, etc., the application may
not be sure whether or not a commit completed successfully or was
rolled back. While two-phase commit solves this, it does so at
considerable overhead, so introduce a lighter alternative.

txid_status(bigint) lets an application determine the status of a
commit based on an xid-with-epoch as returned by txid_current() or
similar. Status may be committed, aborted, in-progress (including
prepared xacts) or null if the xact is too old for its commit status
to still be retained because it has passed the wrap-around epoch
boundary.

Applications must call txid_current() in their transactions to make
much use of this since PostgreSQL does not automatically report an xid
to the client when one is assigned.

Introduces TransactionIdInRecentPast(...) for the use of other
functions that need similar logic in future.

Authors: Craig Ringer, Robert Haas
---
 doc/src/sgml/func.sgml|  27 ++
 src/backend/utils/adt/txid.c  | 132 ++
 src/include/catalog/pg_proc.h |   2 +
 src/test/recovery/t/011_crash_recovery.pl |  46 +++
 src/test/regress/expected/txid.out|  68 +++
 src/test/regress/sql/txid.sql |  38 +
 6 files changed, 313 insertions(+)
 create mode 100644 src/test/recovery/t/011_crash_recovery.pl

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 64f86ce..356fd33 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -17523,6 +17523,10 @@ SELECT collation for ('foo' COLLATE "de_DE");
 txid_visible_in_snapshot

 
+   
+txid_status
+   
+

 The functions shown in 
 provide server transaction information in an exportable form.  The main
@@ -17573,6 +17577,11 @@ SELECT collation for ('foo' COLLATE "de_DE");
boolean
is transaction ID visible in snapshot? (do not use with subtransaction ids)
   
+  
+   txid_status(bigint)
+   txid_status
+   report the status of the given xact - committed, aborted, in progress, or NULL if the xid is too old
+  
  
 

@@ -17643,6 +17652,24 @@ SELECT collation for ('foo' COLLATE "de_DE");

 

+txid_status(bigint) reports the commit status of a recent
+transaction.  Applications may use it to determine whether a transaction
+committed or aborted when the application and database server become
+disconnected while a COMMIT is in progress.
+The status of a transaction will be reported as either
+in progress,
+committed, or aborted, provided that the
+transaction is recent enough that the system retains the commit status
+of that transaction.  If it is old enough that no references to that
+transaction survive in the system and the commit status information has
+been discarded, this function will return NULL.  Note that prepared
+transactions are reported as in progress; applications must
+check pg_prepared_xacts if they
+need to determine whether the xid is a prepared transaction.
+   
+
+   
 The functions shown in 
 provide information about transactions that have been already committed.
 These functions mainly provide information about when the transactions
diff --git a/src/backend/utils/adt/txid.c b/src/backend/utils/adt/txid.c
index 772d7c7..5c64e32 100644
--- a/src/backend/utils/adt/txid.c
+++ b/src/backend/utils/adt/txid.c
@@ -21,6 +21,7 @@
 
 #include "postgres.h"
 
+#include "access/clog.h"
 #include "access/transam.h"
 #include "access/xact.h"
 #include "access/xlog.h"
@@ -28,6 +29,7 @@
 #include "miscadmin.h"
 #include "libpq/pqformat.h"
 #include "postmaster/postmaster.h"
+#include "storage/lwlock.h"
 #include "utils/builtins.h"
 #include "utils/memutils.h"
 #include "utils/snapmgr.h"
@@ -93,6 +95,70 @@ load_xid_epoch(TxidEpoch *state)
 }
 
 /*
+ * Helper to get a TransactionId from a 64-bit xid with wraparound detection.
+ *
+ * It is an ERROR if the xid is in the future.  Otherwise, returns true if
+ * the transaction is still new enough that we can determine 

Re: [HACKERS] identity columns

2017-03-24 Thread Vitaly Burovoy
On 3/23/17, Peter Eisentraut  wrote:
> On 3/23/17 06:09, Vitaly Burovoy wrote:
>> I think we'll end up with "DROP IDENTITY IF EXISTS" to avoid raising
>> an exception and "ADD OR SET" if your grammar remains.
>
> That sounds reasonable to me.

It would be great if "DROP IDENTITY IF EXISTS" were in the current patch,
since we don't have any disagreements about "DROP IDENTITY" behavior
and the ease of implementing "IF EXISTS" there.

>> Right. From that PoV IDENTITY also changes a default value: "SET (ADD
>> ... AS?) IDENTITY" works as setting a default to "nextval(...)"
>> whereas "DROP IDENTITY" works as setting it back to NULL.
>
> But dropping and re-adding an identity destroys state, so it's not quite
> the same.

How does dropping and re-adding affect the choice between "ADD" and "SET"?
If you drop the identity property, the column's state is destroyed
whatever grammar variation you are using.

> --
> Peter Eisentraut  http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


I have an idea. What about the following version of the DDL:

DROP IDENTITY [ IF EXISTS ]
SET GENERATED { ALWAYS | BY DEFAULT } [ IF NOT EXISTS ] | SET ...

That version:
1. does not introduce a new "ADD" variation;
2. without "IF NOT EXISTS" follows the standard;
3. with "IF NOT EXISTS" sets a column's identity property or alters it
(works as "CREATE OR REPLACE" for functions).
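
For example (hypothetical statements, since the grammar above is only a
proposal):

    ALTER TABLE items ALTER COLUMN id DROP IDENTITY IF EXISTS;
    ALTER TABLE items ALTER COLUMN id SET GENERATED BY DEFAULT IF NOT EXISTS;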

-- 
Best regards,
Vitaly Burovoy




Re: [HACKERS] [POC] A better way to expand hash indexes.

2017-03-24 Thread Mithun Cy
On Fri, Mar 24, 2017 at 1:22 AM, Mithun Cy  wrote:
> Hi Amit, please find the new patch.

pageinspect.sgml has an example which shows the output of
"hash_metapage_info()". Since we increase the spares array and
eventually ovflpoint, I have updated the example with the corresponding
values.


-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


expand_hashbucket_efficiently_05.patch
Description: Binary data



Re: [HACKERS] ANALYZE command progress checker

2017-03-24 Thread vinayak


On 2017/03/23 15:04, Haribabu Kommi wrote:



On Wed, Mar 22, 2017 at 8:11 PM, vinayak wrote:



On 2017/03/21 21:25, Haribabu Kommi wrote:



On Tue, Mar 21, 2017 at 3:41 PM, vinayak wrote:

Thank you for testing the patch on Windows platform.


Thanks for the updated patch.

It works well for a normal relation.  But for a relation that
contains child tables,
PROGRESS_ANALYZE_NUM_ROWS_SAMPLED produces wrong results.


Thank you for reviewing the patch.
The attached patch implements a way to report the sampled-row count
from acquire_sample_rows() even when it is called for child tables.

How about adding another phase called
PROGRESS_ANALYZE_PHASE_COLLECT_INHERIT_SAMPLE_ROWS
and setting this phase only when it is an inheritance analyze
operation? Also, adding some explanation to the ROWS_SAMPLED phase
about how these sampled rows are calculated for inheritance tables
would give a good picture of analyze progress for a relation that
contains child relations.

I have added the phase called
PROGRESS_ANALYZE_PHASE_COLLECT_INH_SAMPLE_ROWS.
I have also updated the documentation.


Thanks for the updated patch. I will check it.

The ANALYZE command takes a long time in the computing statistics
phase, so I think we can add some column or phase so that the user can
easily understand the progress.
How about adding a new column like "num_rows_processed", counting the
rows processed while computing the statistics of the specified column?


I prefer a column with rows processed instead of a phase,
because we already set the compute-stats phase, and showing
the number of rows processed there will be good.

How about separating the computing "inheritance statistics" phase
from computing the regular "single table" statistics?
Comments?


Yes, it will be good to show both states - collecting inherited sample
rows and computing inheritance stats - so that the current status is
clearly visible to the user.

I have updated the patch.
I have added the column "num_rows_processed" and the phase "computing
inherited statistics" to the view.


=# \d+ pg_stat_progress_analyze
                     View "pg_catalog.pg_stat_progress_analyze"
         Column         |  Type   | Collation | Nullable | Default | Storage  | Description
------------------------+---------+-----------+----------+---------+----------+-------------
 pid                    | integer |           |          |         | plain    |
 datid                  | oid     |           |          |         | plain    |
 datname                | name    |           |          |         | plain    |
 relid                  | oid     |           |          |         | plain    |
 phase                  | text    |           |          |         | extended |
 num_target_sample_rows | bigint  |           |          |         | plain    |
 num_rows_sampled       | bigint  |           |          |         | plain    |
 num_rows_processed     | bigint  |           |          |         | plain    |

View definition:
 SELECT s.pid,
s.datid,
d.datname,
s.relid,
CASE s.param1
WHEN 0 THEN 'initializing'::text
WHEN 1 THEN 'collecting sample rows'::text
WHEN 2 THEN 'collecting inherited sample rows'::text
WHEN 3 THEN 'computing statistics'::text
WHEN 4 THEN 'computing inherited statistics'::text
ELSE NULL::text
END AS phase,
s.param2 AS num_target_sample_rows,
s.param3 AS num_rows_sampled,
s.param4 AS num_rows_processed
   FROM pg_stat_get_progress_info('ANALYZE'::text) s(pid, datid, relid,
        param1, param2, param3, param4, param5, param6, param7, param8,
        param9, param10)
     LEFT JOIN pg_database d ON s.datid = d.oid;
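
For instance, the progress of a running ANALYZE could then be watched
from another session with something like this (a sketch against the
view defined above):

    SELECT pid, phase, num_rows_sampled, num_target_sample_rows,
           num_rows_processed
      FROM pg_stat_progress_analyze;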

Regards,
Vinayak Pokale
NTT Open Source Software Center


pg_stat_progress_analyze_v6.patch
Description: binary/octet-stream



Re: [HACKERS] Logical decoding on standby

2017-03-24 Thread Craig Ringer
On 23 March 2017 at 17:44, Craig Ringer  wrote:

Minor update to the catalog_xmin walsender patch to fix a failure to
parenthesize the definition of PROCARRAY_PROC_FLAGS_MASK.

This one's ready to go. Working on dropping slots on DB drop now.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From b5e34ecaa8f43825fe41ae2e2bbf0a97258cb56a Mon Sep 17 00:00:00 2001
From: Craig Ringer 
Date: Wed, 22 Mar 2017 12:29:13 +0800
Subject: [PATCH] Report catalog_xmin separately to xmin in hot standby
 feedback

The catalog_xmin of slots on a standby was reported as part of the standby's
xmin, causing the master's xmin to be held down. This could cause considerable
unnecessary bloat on the master.

Instead, report catalog_xmin as a separate field in hot_standby_feedback. If
the upstream walsender is using a physical replication slot, store the
catalog_xmin in the slot's catalog_xmin field. If the upstream doesn't use a
slot and has only a PGPROC entry, behaviour doesn't change, as we store the
combined xmin and catalog_xmin in the PGPROC entry.

There's no backward compatibility concern here, as nothing except another
postgres instance of the same major version has any business sending hot
standby feedback and it's only used on the physical replication protocol.
---
 doc/src/sgml/protocol.sgml |  33 ++-
 src/backend/replication/walreceiver.c  |  45 +++--
 src/backend/replication/walsender.c| 110 +++--
 src/backend/storage/ipc/procarray.c|  12 ++-
 src/include/storage/proc.h |   5 +
 src/include/storage/procarray.h|  11 +++
 .../recovery/t/010_logical_decoding_timelines.pl   |  38 ++-
 7 files changed, 202 insertions(+), 52 deletions(-)

diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index 48ca414..b3a5026 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -1916,10 +1916,11 @@ The commands accepted in walsender mode are:
   
   
   
-  The standby's current xmin. This may be 0, if the standby is
-  sending notification that Hot Standby feedback will no longer
-  be sent on this connection. Later non-zero messages may
-  reinitiate the feedback mechanism.
+  The standby's current global xmin, excluding the catalog_xmin from any
+  replication slots. If both this value and the following
+  catalog_xmin are 0 this is treated as a notification that Hot Standby
+  feedback will no longer be sent on this connection. Later non-zero
+  messages may reinitiate the feedback mechanism.
   
   
   
@@ -1929,7 +1930,29 @@ The commands accepted in walsender mode are:
   
   
   
-  The standby's current epoch.
+  The epoch of the global xmin xid on the standby.
+  
+  
+  
+  
+  
+  Int32
+  
+  
+  
+  The lowest catalog_xmin of any replication slots on the standby. Set to 0
+  if no catalog_xmin exists on the standby or if hot standby feedback is being
+  disabled.
+  
+  
+  
+  
+  
+  Int32
+  
+  
+  
+  The epoch of the catalog_xmin xid on the standby.
   
   
   
diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index 31c567b..0f22f17 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -1175,8 +1175,8 @@ XLogWalRcvSendHSFeedback(bool immed)
 {
 	TimestampTz now;
 	TransactionId nextXid;
-	uint32		nextEpoch;
-	TransactionId xmin;
+	uint32		xmin_epoch, catalog_xmin_epoch;
+	TransactionId xmin, catalog_xmin;
 	static TimestampTz sendTime = 0;
 	/* initially true so we always send at least one feedback message */
 	static bool master_has_standby_xmin = true;
@@ -1221,29 +1221,56 @@ XLogWalRcvSendHSFeedback(bool immed)
 	 * everything else has been checked.
 	 */
 	if (hot_standby_feedback)
-		xmin = GetOldestXmin(NULL, PROCARRAY_FLAGS_DEFAULT);
+	{
+		TransactionId slot_xmin;
+
+		/*
+		 * Usually GetOldestXmin() would include the global replication slot
+		 * xmin and catalog_xmin in its calculations, but we don't want to hold
+		 * upstream back from vacuuming normal user table tuples just because
+		 * they're within the catalog_xmin horizon of logical replication slots
+		 * on this standby, so we ignore slot xmin and catalog_xmin GetOldestXmin
+		 * then deal with them ourselves.
+		 */
+		xmin = GetOldestXmin(NULL,
+			 PROCARRAY_FLAGS_DEFAULT|PROCARRAY_SLOTS_XMIN);
+
+		ProcArrayGetReplicationSlotXmin(&slot_xmin, &catalog_xmin);
+
+		if (TransactionIdIsValid(slot_xmin) &&
+			TransactionIdPrecedes(slot_xmin, xmin))
+			xmin = slot_xmin;
+	}
 	else
+	{
 		xmin = InvalidTransactionId;
+		catalog_xmin = 

Re: [HACKERS] segfault in hot standby for hash indexes

2017-03-24 Thread Ashutosh Sharma
>>> > I think this will work, but not sure if there is a merit to deviate
>>> > from what btree does to handle this case.   One thing I find slightly
>>> > awkward in hash_xlog_vacuum_get_latestRemovedXid() is that you are
>>> > using a number of tuples registered as part of fixed data
>>> > (xl_hash_vacuum_one_page) to traverse the data registered as buf data.
>>> > I think it will be better if we register offsets also in the fixed
>>> > part of the data, as we do in the btree case.
>>
>> Agreed. I have made the changes accordingly. Please check attached v2 patch.
>>
>
> Changes look good to me.   I think you can modify the comments in
> structure xl_hash_vacuum_one_page to mention "TARGET OFFSET NUMBERS
> FOLLOW AT THE END"
>

Added the comment in xl_hash_vacuum_one_page structure.

>>>
>>> >
>>> >
>>>
>>> Also another small point in this regard, do we need two separate
>>> variables to track number of deleted items in below code?  I think one
>>> variable is sufficient.
>>>
>>> _hash_vacuum_one_page()
>>> {
>>> ..
>>> deletable[ndeletable++] = offnum;
>>> tuples_removed += 1;--
>>>
>>
>> Yes, I think 'ndeletable' alone should be fine.
>>
>
> I think it would have been probably okay to use *int* for ntuples as
> that matches with what you are actually assigning in the function.

Okay, corrected it. Attached is a newer version of the patch.

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com
diff --git a/src/backend/access/hash/hash_xlog.c b/src/backend/access/hash/hash_xlog.c
index da4c2c5..2ccaf46 100644
--- a/src/backend/access/hash/hash_xlog.c
+++ b/src/backend/access/hash/hash_xlog.c
@@ -966,8 +966,6 @@ hash_xlog_vacuum_get_latestRemovedXid(XLogReaderState *record)
 	OffsetNumber	hoffnum;
 	TransactionId	latestRemovedXid = InvalidTransactionId;
 	int		i;
-	char *ptr;
-	Size len;
 
 	xlrec = (xl_hash_vacuum_one_page *) XLogRecGetData(record);
 
@@ -986,12 +984,20 @@ hash_xlog_vacuum_get_latestRemovedXid(XLogReaderState *record)
 		return latestRemovedXid;
 
 	/*
+	 * Check if WAL replay has reached a consistent database state. If not,
+	 * we must PANIC. See the definition of btree_xlog_delete_get_latestRemovedXid
+	 * for more details.
+	 */
+	if (!reachedConsistency)
+		elog(PANIC, "hash_xlog_vacuum_get_latestRemovedXid: cannot operate with inconsistent data");
+
+	/*
 	 * Get index page.  If the DB is consistent, this should not fail, nor
 	 * should any of the heap page fetches below.  If one does, we return
 	 * InvalidTransactionId to cancel all HS transactions.  That's probably
 	 * overkill, but it's safe, and certainly better than panicking here.
 	 */
-	XLogRecGetBlockTag(record, 1, &rnode, NULL, &blkno);
+	XLogRecGetBlockTag(record, 0, &rnode, NULL, &blkno);
 	ibuffer = XLogReadBufferExtended(rnode, MAIN_FORKNUM, blkno, RBM_NORMAL);
 
 	if (!BufferIsValid(ibuffer))
@@ -1003,9 +1009,7 @@ hash_xlog_vacuum_get_latestRemovedXid(XLogReaderState *record)
 	 * Loop through the deleted index items to obtain the TransactionId from
 	 * the heap items they point to.
 	 */
-	ptr = XLogRecGetBlockData(record, 1, &len);
-
-	unused = (OffsetNumber *) ptr;
+	unused = (OffsetNumber *) ((char *) xlrec + SizeOfHashVacuumOnePage);
 
 	for (i = 0; i < xlrec->ntuples; i++)
 	{
@@ -1130,23 +1134,15 @@ hash_xlog_vacuum_one_page(XLogReaderState *record)
 
 	if (action == BLK_NEEDS_REDO)
 	{
-		char *ptr;
-		Size len;
-
-		ptr = XLogRecGetBlockData(record, 0, &len);
-
 		page = (Page) BufferGetPage(buffer);
 
-		if (len > 0)
+		if (XLogRecGetDataLen(record) > SizeOfHashVacuumOnePage)
 		{
 			OffsetNumber *unused;
-			OffsetNumber *unend;
 
-			unused = (OffsetNumber *) ptr;
-			unend = (OffsetNumber *) ((char *) ptr + len);
+			unused = (OffsetNumber *) ((char *) xldata + SizeOfHashVacuumOnePage);
 
-			if ((unend - unused) > 0)
-PageIndexMultiDelete(page, unused, unend - unused);
+			PageIndexMultiDelete(page, unused, xldata->ntuples);
 		}
 
 		/*
diff --git a/src/backend/access/hash/hashinsert.c b/src/backend/access/hash/hashinsert.c
index 8640e85..8699b5b 100644
--- a/src/backend/access/hash/hashinsert.c
+++ b/src/backend/access/hash/hashinsert.c
@@ -344,7 +344,6 @@ _hash_vacuum_one_page(Relation rel, Buffer metabuf, Buffer buf,
 	Page	page = BufferGetPage(buf);
 	HashPageOpaque	pageopaque;
 	HashMetaPage	metap;
-	double tuples_removed = 0;
 
 	/* Scan each tuple in page to see if it is marked as LP_DEAD */
 	maxoff = PageGetMaxOffsetNumber(page);
@@ -355,10 +354,7 @@ _hash_vacuum_one_page(Relation rel, Buffer metabuf, Buffer buf,
 		ItemId	itemId = PageGetItemId(page, offnum);
 
 		if (ItemIdIsDead(itemId))
-		{
 			deletable[ndeletable++] = offnum;
-			tuples_removed += 1;
-		}
 	}
 
 	if (ndeletable > 0)
@@ -386,7 +382,7 @@ _hash_vacuum_one_page(Relation rel, Buffer metabuf, Buffer buf,
 		pageopaque->hasho_flag &= ~LH_PAGE_HAS_DEAD_TUPLES;
 
 		metap = HashPageGetMeta(BufferGetPage(metabuf));
-		metap->hashm_ntuples -= tuples_removed;
+		metap->hashm_ntuples -= ndeletable;
 
 		MarkBufferDirty(buf);
 		

Re: [HACKERS] BUG: pg_dump generates corrupted gzip file in Windows

2017-03-24 Thread Kuntal Ghosh
On Fri, Mar 24, 2017 at 12:35 PM, Craig Ringer  wrote:
> On 24 March 2017 at 14:07, Kuntal Ghosh  wrote:
>> On Fri, Mar 24, 2017 at 11:28 AM, Kuntal Ghosh
>>  wrote:
>>> Hello,
>>> In Windows, if one needs to take a dump in plain text format (this is
>>> the default option, or can be specified using -Fp) with some level of
>>> compression (-Z[0-9]), an output file has to
>>> be specified. Otherwise, if the output is redirected to stdout, it'll
>>> create a corrupted dump (cmd is set to ASCII mode, so it'll put
>>> carriage returns in the file).
>> To reproduce the issue, please use the following command in windows cmd:
>>
>> pg_dump -Z 9 test > E:\test_xu.backup
>> pg_dump -Fp -Z 9 test > E:\test_xu.backup
>
> This is a known problem. It is not specific to PostgreSQL, it affects
> any software that attempts to use stdin/stdout on Windows via cmd,
> where it is not 8-bit clean.
>
> We don't just refuse to run with stdout as a destination because it's
> perfectly sensible if you're not using cmd.exe. pg_dump cannot, as far
> as I know, tell whether it's being invoked by cmd or something else.
AFAICU, if we use binary mode, output is stored byte for byte. In ASCII
mode, cmd pokes its nose in and does CR/LF conversions on its own. So,
whenever we want compression on a plain-text dump file, we can set the
stdout mode to O_BINARY. Is that a wrong approach?
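
For reference, the approach I have in mind is roughly the following
(Windows-only sketch, untested; _setmode()/_fileno()/_O_BINARY are the
standard MSVC CRT facilities):

    #ifdef WIN32
    #include <io.h>
    #include <fcntl.h>

    /* switch stdout to binary mode so the CRT does not turn LF into CRLF */
    if (_setmode(_fileno(stdout), _O_BINARY) == -1)
        fprintf(stderr, "could not set stdout to binary mode\n");
    #endif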

> If you have concrete ideas on how to improve this they'd be welcomed.
> Is there anywhere you expected to find info in the docs? Do you know
> of a way to detect in Windows if the output stream is not 8-bit clean
> from within the application program? ... other?
Actually, I'm not that familiar with the Windows environment. But I
couldn't find any note to the user in the pg_dump documentation
regarding the issue. In cmd, if someone needs a plain-text dump in
compressed format, they should specify the output file; otherwise they
may run into the above problem. However, if a dump is corrupted due to
the above issue, a fix for that is provided in [1]. Should we include
this in the documentation?



[1] http://www.gzip.org/
Use fixgz.c to remove the extra CR (carriage return) bytes.

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] BUG: pg_dump generates corrupted gzip file in Windows

2017-03-24 Thread Kuntal Ghosh
On Fri, Mar 24, 2017 at 11:28 AM, Kuntal Ghosh wrote:
> Hello,
> In Windows, if one needs to take a dump in plain text format (this is
> the default option, or can be specified using -Fp) with some level of
> compression (-Z[0-9]), an output file has to
> be specified. Otherwise, if the output is redirected to stdout, it'll
> create a corrupted dump (cmd is set to ASCII mode, so it'll put
> carriage returns in the file).
To reproduce the issue, please use the following command in windows cmd:

pg_dump -Z 9 test > E:\test_xu.backup
pg_dump -Fp -Z 9 test > E:\test_xu.backup


-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] create_unique_path and GEQO

2017-03-24 Thread Ashutosh Bapat
On Fri, Mar 24, 2017 at 11:52 AM, Rushabh Lathia wrote:
>
>
> On Wed, Mar 22, 2017 at 3:43 PM, Ashutosh Bapat
>  wrote:
>>
>> Hi,
>> In create_unique_path() there's this comment:
>> /*
>>  * We must ensure path struct and subsidiary data are allocated in
>> main
>>  * planning context; otherwise GEQO memory management causes trouble.
>>  */
>> oldcontext = MemoryContextSwitchTo(root->planner_cxt);
>>
>> pathnode = makeNode(UniquePath);
>>
>> This means that when GEQO resets the memory context, the RelOptInfo
>> for which this path is created (and whose cheapest_unique_path may
>> point to it) goes away, while the unique path lingers on in the
>> planner context.
>> Shouldn't we instead allocate the path in the same context as the
>> RelOptInfo, similar to mark_dummy_rel()?
>>
>
> Do you have a test case which can reproduce the issue you
> explained above?
>

No. It would require some surgery in standard_planner() to measure the
memory consumed in the planner context, OR building the code with
SHOW_MEMORY_STATS defined, dumping memory context statistics, and
checking planner context memory usage. I don't think I can produce a
testcase quickly right now. But then, I think the problem is quite
apparent from code inspection alone, esp. comparing what
mark_dummy_rel() does with what create_unique_path() is doing.
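
For illustration, the change I have in mind would look roughly like
this, modeled on mark_dummy_rel() (a sketch of the idea, not a tested
patch):

    /*
     * Allocate in the same context as the parent RelOptInfo, so that GEQO
     * reclaims the path together with the rel it hangs off.
     */
    oldcontext = MemoryContextSwitchTo(GetMemoryChunkContext(rel));

    pathnode = makeNode(UniquePath);
    /* ... fill in the path fields as create_unique_path() does today ... */

    MemoryContextSwitchTo(oldcontext);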

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company




Re: [HACKERS] error handling in RegisterBackgroundWorker

2017-03-24 Thread Michael Paquier
On Fri, Mar 24, 2017 at 1:20 PM, Peter Eisentraut wrote:
> On 2/15/17 12:11, Robert Haas wrote:
>> On Wed, Feb 15, 2017 at 11:30 AM, Peter Eisentraut
>>  wrote:
>>> If RegisterBackgroundWorker() (the non-dynamic kind that is only
>>> loadable from shared_preload_libraries) fails to register the worker, it
>>> writes a log message and proceeds, ignoring the registration request.  I
>>> think that is a mistake, it should be a hard error.  The only way in
>>> practice to fix the problem is to change shared_preload_libraries or
>>> max_worker_processes, both requiring a restart anyway, so proceeding
>>> without the worker is not useful.
>>
>> I guess the question is whether people will prefer to have the
>> database start up and be missing the worker, or to have it not start.
>> As you point out, the former is likely to result in an eventual
>> restart, but the latter may lead to a longer period of downtime RIGHT
>> NOW.  People tend to really hate things that make the database not
>> start, so I'm not sure what's best here.
>
> Any other thoughts on this?  Seems like a potential usability issue.

What if we just let the user choose what they want with a new switch
in bgw_flags, but keep LOG as the default? Both behaviors look
sensible to me.
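
Something like the following is what I have in mind (a sketch; the flag
name is hypothetical and does not exist today):

    BackgroundWorker worker;

    memset(&worker, 0, sizeof(worker));
    snprintf(worker.bgw_name, BGW_MAXLEN, "my worker");
    worker.bgw_flags = BGWORKER_SHMEM_ACCESS |
                       BGWORKER_FATAL_ON_REGISTRATION_FAILURE; /* hypothetical */
    worker.bgw_start_time = BgWorkerStart_RecoveryFinished;
    RegisterBackgroundWorker(&worker);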

@@ -824,7 +824,8 @@ RegisterBackgroundWorker(BackgroundWorker *worker)
   "Up to %d background workers can be
registered with the current settings.",
   max_worker_processes,
   max_worker_processes),
- errhint("Consider increasing the configuration
parameter \"max_worker_processes\".")));
+ errhint("Consider increasing the configuration
parameter \"max_worker_processes\"."),
+ errcontext("registration of background worker
\"%s\"", worker->bgw_name)));
 return;
 }
No issues with this bit in 0001.
-- 
Michael




Re: [HACKERS] Parallel Append implementation

2017-03-24 Thread Rajkumar Raghuwanshi
On Fri, Mar 24, 2017 at 12:38 AM, Amit Khandekar  wrote:
> Meanwhile, attached is a WIP patch v10. The only change in this patch
> w.r.t. the last patch (v9) is that this one has a new function defined
> append_nonpartial_cost(). Just sending this to show what the algorithm
> looks like; it isn't called yet.
>

Hi,

I have applied the patch on latest pg sources (on commit
457a4448732881b5008f7a3bcca76fc299075ac3). configure, make all, and
make install ran successfully, but initdb failed with the below error.

[edb@localhost bin]$ ./initdb -D data
The files belonging to this database system will be owned by user "edb".
This user must also own the server process.

The database cluster will be initialized with locale "en_US.UTF-8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".

Data page checksums are disabled.

creating directory data ... ok
creating subdirectories ... ok
selecting default max_connections ... sh: line 1:  3106 Aborted
 (core dumped)
"/home/edb/WORKDB/PG3/postgresql/inst/bin/postgres" --boot -x0 -F -c
max_connections=100 -c shared_buffers=1000 -c
dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1
sh: line 1:  3112 Aborted (core dumped)
"/home/edb/WORKDB/PG3/postgresql/inst/bin/postgres" --boot -x0 -F -c
max_connections=50 -c shared_buffers=500 -c
dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1
sh: line 1:  3115 Aborted (core dumped)
"/home/edb/WORKDB/PG3/postgresql/inst/bin/postgres" --boot -x0 -F -c
max_connections=40 -c shared_buffers=400 -c
dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1
sh: line 1:  3118 Aborted (core dumped)
"/home/edb/WORKDB/PG3/postgresql/inst/bin/postgres" --boot -x0 -F -c
max_connections=30 -c shared_buffers=300 -c
dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1
sh: line 1:  3121 Aborted (core dumped)
"/home/edb/WORKDB/PG3/postgresql/inst/bin/postgres" --boot -x0 -F -c
max_connections=20 -c shared_buffers=200 -c
dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1
sh: line 1:  3124 Aborted (core dumped)
"/home/edb/WORKDB/PG3/postgresql/inst/bin/postgres" --boot -x0 -F -c
max_connections=10 -c shared_buffers=100 -c
dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1
10
selecting default shared_buffers ... sh: line 1:  3127 Aborted
(core dumped)
"/home/edb/WORKDB/PG3/postgresql/inst/bin/postgres" --boot -x0 -F -c
max_connections=10 -c shared_buffers=16384 -c
dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1
400kB
selecting dynamic shared memory implementation ... posix
creating configuration files ... ok
running bootstrap script ... TRAP:
FailedAssertion("!(LWLockTranchesAllocated >=
LWTRANCHE_FIRST_USER_DEFINED)", File: "lwlock.c", Line: 501)
child process was terminated by signal 6: Aborted
initdb: removing data directory "data"

[edb@localhost bin]$

Thanks & Regards,
Rajkumar Raghuwanshi
QMG, EnterpriseDB Corporation




Re: [HACKERS] Logical replication existing data copy

2017-03-24 Thread Mark Kirkwood

On 24/03/17 12:32, Petr Jelinek wrote:


On 24/03/17 00:14, Mark Kirkwood wrote:

On 24/03/17 02:00, Peter Eisentraut wrote:

On 3/21/17 21:38, Peter Eisentraut wrote:

This patch is looking pretty good to me, modulo the failing pg_dump
tests.

Attached is a fixup patch.  I have mainly updated some comments and
variable naming for (my) clarity.  No functional changes.

Committed all that.


Testing now that this patch is in, I'm unable to create a subscription:

(master)

bench=# CREATE PUBLICATION pgbench
 FOR TABLE pgbench_accounts , pgbench_branches,
   pgbench_tellers, pgbench_history
 WITH (PUBLISH INSERT, PUBLISH UPDATE, PUBLISH DELETE);

(slave)

bench=# CREATE SUBSCRIPTION pgbench
 CONNECTION 'port=5447 user=postgres dbname=bench'
 PUBLICATION pgbench
 WITH (COPY DATA);
ERROR:  duplicate key value violates unique constraint
"pg_subscription_rel_srrelid_srsubid_index"
DETAIL:  Key (srrelid, srsubid)=(0, 16389) already exists.

This is a pair of freshly initdb'ed instances; the master has a
scale-100 pgbench schema.

I'm guessing this is a different bug from the segfault also reported.


Yes, I also forgot to check if the table actually exists on subscriber
when fetching them in CREATE SUBSCRIPTION (we have check during
replication but not there).

Attached patches should fix both issues.




Yep, does seem to.

I note that (probably intentional) specifying 'COPY DATA' does not 
'CREATE TABLES' for you... ok, I probably didn't read ... something 
somewhere... but anyway, after creating the tables it all seems to work. 
Nice.


However, one minor observation - as Michael Banck noted - the elapsed 
time for the slave to catch up after running:


$ pgbench -c8 -T600 bench

on the master was (subjectively) much longer than for physical streaming 
replication. Is this expected?


regards

Mark






Re: [HACKERS] Adding support for Default partition in partitioning

2017-03-24 Thread Rahila Syed
Hello Rushabh,

Thank you for reviewing.
I have addressed all your comments in the attached patch. The attached
patch currently throws an error if a new partition is added after a
default partition.

> Rather than adding a check for default here, I think this should be
> handled inside get_qual_for_list().
I have moved the check inside get_qual_for_partbound(), as some
operations need to be done before calling get_qual_for_list() for
default partitions.

Thank you,
Rahila Syed

On Tue, Mar 21, 2017 at 11:36 AM, Rushabh Lathia wrote:

> I picked this up for review and noticed that the patch is not
> compiling cleanly in my environment.
>
> partition.c: In function ‘RelationBuildPartitionDesc’:
> partition.c:269:6: error: ISO C90 forbids mixed declarations and code
> [-Werror=declaration-after-statement]
>   Const*val = lfirst(c);
>   ^
> partition.c: In function ‘generate_partition_qual’:
> partition.c:1590:2: error: ISO C90 forbids mixed declarations and code
> [-Werror=declaration-after-statement]
>   PartitionBoundSpec *spec = (PartitionBoundSpec *)bound;
>   ^
> partition.c: In function ‘get_partition_for_tuple’:
> partition.c:1820:5: error: array subscript has type ‘char’
> [-Werror=char-subscripts]
>  result = parent->indexes[partdesc->boundinfo->def_index];
>  ^
> partition.c:1824:16: error: assignment makes pointer from integer without
> a cast [-Werror]
>  *failed_at = RelationGetRelid(parent->reldesc);
> ^
> cc1: all warnings being treated as errors
>
> Apart from this, I was reading patch here are few more comments:
>
> 1) Variable initialization happening in two places.
>
> @@ -166,6 +170,8 @@ RelationBuildPartitionDesc(Relation rel)
>  /* List partitioning specific */
>  PartitionListValue **all_values = NULL;
>  boolfound_null = false;
> +boolfound_def = false;
> +intdef_index = -1;
>  intnull_index = -1;
>
>  /* Range partitioning specific */
> @@ -239,6 +245,8 @@ RelationBuildPartitionDesc(Relation rel)
>  i = 0;
>  found_null = false;
>  null_index = -1;
> +found_def = false;
> +def_index = -1;
>  foreach(cell, boundspecs)
>  {
>  ListCell   *c;
> @@ -249,6 +257,15 @@ RelationBuildPartitionDesc(Relation rel)
>
>
> 2)
>
> @@ -1558,6 +1586,19 @@ generate_partition_qual(Relation rel)
>  bound = stringToNode(TextDatumGetCString(boundDatum));
>  ReleaseSysCache(tuple);
>
> +/* Return if it is a default list partition */
> +PartitionBoundSpec *spec = (PartitionBoundSpec *)bound;
> +ListCell *cell;
> +foreach(cell, spec->listdatums)
>
> More comments are needed on the above hunk.
>
> Rather than adding a check for default here, I think this should be
> handled inside get_qual_for_list().
>
> 3) Code is not aligned with the existing style:
>
> diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
> index 6316688..ebb7db7 100644
> --- a/src/backend/parser/gram.y
> +++ b/src/backend/parser/gram.y
> @@ -2594,6 +2594,7 @@ partbound_datum:
>  Sconst{ $$ = makeStringConst($1, @1); }
>  | NumericOnly{ $$ = makeAConst($1, @1); }
>  | NULL_P{ $$ = makeNullAConst(@1); }
> +| DEFAULT  { $$ = (Node *)makeDefElem("DEFAULT", NULL, @1); }
>  ;
>
>
> 4) Unnecessary hunk:
>
> @@ -2601,7 +2602,6 @@ partbound_datum_list:
>  | partbound_datum_list ',' partbound_datum
>  { $$ = lappend($1, $3); }
>  ;
> -
>
> Note: these are just initial review comments; I have yet to do the
> detailed review and testing of the patch.
>
> Thanks.
>
> On Mon, Mar 20, 2017 at 9:27 AM, Rahila Syed 
> wrote:
>
>> Hello,
>>
>> Please find attached a rebased patch with support for pg_dump. I am
>> working on the patch
>> to handle adding a new partition after a default partition by throwing an
>> error if
>> conflicting rows exist in default partition and adding the partition
>> successfully otherwise.
>> Will post an updated patch by tomorrow.
>>
>> Thank you,
>> Rahila Syed
>>
>> On Mon, Mar 13, 2017 at 5:42 AM, Robert Haas 
>> wrote:
>>
>>> On Fri, Mar 10, 2017 at 2:17 PM, Peter Eisentraut
>>>  wrote:
>>> > On 3/2/17 21:40, Robert Haas wrote:
>>> >> On the point mentioned above, I
>>> >> don't think adding a partition should move tuples, necessarily; seems
>>> >> like it would be good enough - maybe better - for it to fail if there
>>> >> are any that would need to be moved.
>>> >
>>> > ISTM that the use cases of various combinations of adding a default
>>> > partition, adding another partition after it, removing the default
>>> > partition, clearing out the default partition in order to add more
>>> > nondefault partitions, and so on, need to be more clearly spelled out

Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-24 Thread Amit Kapila
On Fri, Mar 24, 2017 at 12:25 AM, Pavan Deolasee wrote:
>
>
> On Thu, Mar 23, 2017 at 7:53 PM, Amit Kapila 
> wrote:
>>
>> >
>>
>> I am not sure on what basis user can set such parameters, it will be
>> quite difficult to tune those parameters.  I think the point is
>> whatever threshold we keep, once that is crossed, it will perform two
>> scans for all the indexes.
>
>
> Well, that applies to even vacuum parameters, no?
>

I don't know how much we can directly compare the usability of the new
parameters you are proposing here to existing parameters.

> The general sense I've got
> here is that we're ok to push some work in background if it helps the
> real-time queries, and I kinda agree with that.
>

I don't think we can define this work as "some" work; it can be a lot
of work depending on the number of indexes.  Also, I think for some
cases it will generate maintenance work without generating benefit.
For example, when there is one index on a table and there are updates
for that index column.

> Having said that, I see many ways we can improve on this later. Like we can
> track somewhere else information about tuples which may have received WARM
> updates (I think it will need to be a per-index bitmap or so) and use that
> to do WARM chain conversion in a single index pass.
>

Sure, if we had some way to do it in a single pass, or to do most of
the work in the foreground process (like the dead-marking idea we have
for indexes), that would have been better.

> But this is clearly not
> PG 10 material.
>

I don't see much discussion about this aspect of the patch, so I am not
sure if it is acceptable to increase the cost of vacuum.  Now, I don't
know if your idea of GUCs makes it such that the additional cost will
occur seldom and the additional pass will have a minimal impact, which
would make it acceptable.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Re: [COMMITTERS] pgsql: Fix and simplify check for whether we're running as Windows serv

2017-03-24 Thread Heikki Linnakangas

On 03/22/2017 07:44 PM, Robert Haas wrote:

On Wed, Mar 22, 2017 at 10:13 AM, Alvaro Herrera
 wrote:

Heikki Linnakangas wrote:

I did some archeology, and found CheckTokenMembership() in MinGW's w32api
packages version 3.14
(https://sourceforge.net/projects/mingw/files/MinGW/Base/w32api/w32api-3.14/,
in include/winbase.h). According to the timestamps on that download page,
that was released in 2009. That was the oldest version I could find, so it
might go even further back.

Dave, do you know exactly what version of MinGW narwhal is running? And how
difficult is it to upgrade to something slightly more modern? Ease of
upgrade is another good data point on how far we need to support old
versions.


Given that this was backpatched and that it broke narwhal in all
branches, I think the solution needs to make narwhal work again without
requiring it to upgrade; so we should acquire CheckTokenMembership via
dynloading just like we do the other functions.  If we want to require a
newer mingw version in pg10, that's acceptable, but it should be a
separate patch.


+1 for not moving the minimum system requirements in the back-branches.


Ok. I reverted this patch in the back-branches, and applied the much 
less invasive "V2" patch [1] instead. HEAD is unchanged, so narwhal 
still fails there.


Dave: the consensus is that we no longer support the old version of 
MinGW that narwhal is using, for PostgreSQL v 10. Can you modify the 
configuration of narwhal to not try building 'master' anymore, or 
upgrade the toolchain, please?


[1] 
https://www.postgresql.org/message-id/CAB7nPqSvfu%3DKpJ%3DNX%2BYAHmgAmQdzA7N5h31BjzXeMgczhGCC%2BQ%40mail.gmail.com


- Heikki





Re: [HACKERS] BUG: pg_dump generates corrupted gzip file in Windows

2017-03-24 Thread Kuntal Ghosh
On Fri, Mar 24, 2017 at 2:17 PM, Kuntal Ghosh wrote:
> On Fri, Mar 24, 2017 at 12:35 PM, Craig Ringer  wrote:
>> On 24 March 2017 at 14:07, Kuntal Ghosh  wrote:
>>> On Fri, Mar 24, 2017 at 11:28 AM, Kuntal Ghosh
>>>  wrote:
 Hello,
 In Windows, if one needs to take a dump in plain text format (this is
 the default option, or can be specified using -Fp) with some level of
 compression (-Z[0-9]), an output file has to
 be specified. Otherwise, if the output is redirected to stdout, it'll
 create a corrupted dump (cmd is set to ASCII mode, so it'll put
 carriage returns in the file).
>>> To reproduce the issue, please use the following command in windows cmd:
>>>
>>> pg_dump -Z 9 test > E:\test_xu.backup
>>> pg_dump -Fp -Z 9 test > E:\test_xu.backup
>>
>> This is a known problem. It is not specific to PostgreSQL, it affects
>> any software that attempts to use stdin/stdout on Windows via cmd,
>> where it is not 8-bit clean.
>>
>> We don't just refuse to run with stdout as a destination because it's
>> perfectly sensible if you're not using cmd.exe. pg_dump cannot, as far
>> as I know, tell whether it's being invoked by cmd or something else.
> AFAICU, if we use binary mode, output is stored byte for byte. In ASCII
> mode, cmd pokes its nose in and does CR/LF conversions on its own. So,
> whenever we want compression on a plain-text dump file, we can set the
> stdout mode to O_BINARY. Is that a wrong approach?
With help from Ashutosh Sharma, I tested this in a Windows
environment. Sadly, it still doesn't work. :( IMHO, we should document
the issue somewhere.



-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] increasing the default WAL segment size

2017-03-24 Thread Stephen Frost
Peter,

* Peter Eisentraut (peter.eisentr...@2ndquadrant.com) wrote:
> There is a function for that.
[...]
> There is not a function for that, but there could be one.

I'm not sure you've really considered what you're suggesting here.

We need to make sure we have every file between two LSNs.  Yes, we
could step the LSN forward one byte at a time, calling the appropriate
function for every single byte, to make sure that we have that file, but
that really isn't a reasonable approach.  Nor would it be reasonable if
we go on the assumption that WAL files can't be less than 1MB.

Beyond that, this also bakes in an assumption that we would then require
access to a database (of a current enough version to have the functions
needed too!) to connect to and run these functions, which is a poor
design.  If the user is using a remote system to gather the WAL on, that
system may not have easy access to PG.  Further, backup tools will want
to do things like off-line verification that the backup is complete,
perhaps in another environment entirely which doesn't have PG, or maybe
where what they're trying to do is make sure that a given backup is good
before starting a restore to bring PG back up.

Also, given that one of the things we're talking about here is
specifically that we want to be able to change the WAL size for
different databases, you would have to make sure that the database
you're running these functions on uses the same WAL file size as the one
which is being backed up.

No, I don't agree that we can claim the LSN -> WAL filename mapping is
an internal PG detail that we can whack around because there are
functions to calculate the answer.  External utilities need to be able
to perform that translation and we need to document for them how to do
so correctly.
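
To illustrate, here is roughly the translation every such tool has to
bake in today; this is a sketch mirroring the XLByteToSeg() and
XLogFileName() macros in xlog_internal.h (the helper name is mine, and
wal_segment_size is a parameter precisely because of the initdb-time
option being discussed):

#include <stdint.h>
#include <stdio.h>

/* Sketch: map a timeline ID plus LSN to the WAL segment file name. */
static void
lsn_to_walfile(char *fname, size_t len, unsigned int tli,
               uint64_t lsn, uint64_t wal_segment_size)
{
    uint64_t    segno = lsn / wal_segment_size;
    uint64_t    segs_per_xlogid = UINT64_C(0x100000000) / wal_segment_size;

    snprintf(fname, len, "%08X%08X%08X", tli,
             (unsigned int) (segno / segs_per_xlogid),
             (unsigned int) (segno % segs_per_xlogid));
}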

Thanks!

Stephen




Re: [HACKERS] Re: Declarative partitioning optimization for large amount of partitions

2017-03-24 Thread Aleksander Alekseev
Hi Anastasia,

Thanks a lot for a review!

As was mentioned above, there is also a bottleneck in the
find_all_inheritors procedure. It turned out the problem there is similar,
and it could easily be fixed as well. The corresponding patch is attached
to this message. To keep things in order, I'm attaching the previous patch
as well.

On Wed, Mar 22, 2017 at 11:53:45AM +, Anastasia Lubennikova wrote:
> The following review has been posted through the commitfest application:
> make installcheck-world:  tested, passed
> Implements feature:   tested, passed
> Spec compliant:   tested, passed
> Documentation:not tested
> 
> The patch looks good to me.
> It applies cleanly, passes all the tests and eliminates the bottleneck 
> described in the first message.
> And as I can see from the thread, there were no objections from others,
> except a few minor comments about code style, which are fixed in the last 
> version of the patch.
> So I mark it "Ready for committer".
> 
> The new status of this patch is: Ready for Committer
> 

-- 
Best regards,
Aleksander Alekseev
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 7cacb1e9b2..a22a5a37c8 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -161,6 +161,20 @@ typedef struct TabStatusArray
 static TabStatusArray *pgStatTabList = NULL;
 
 /*
+ * pgStatTabHash entry
+ */
+typedef struct TabStatHashEntry
+{
+	Oid t_id;
+	PgStat_TableStatus* tsa_entry;
+} TabStatHashEntry;
+
+/*
+ * Hash table for O(1) t_id -> tsa_entry lookup
+ */
+static HTAB *pgStatTabHash = NULL;
+
+/*
  * Backends store per-function info that's waiting to be sent to the collector
  * in this hash table (indexed by function OID).
  */
@@ -815,7 +829,13 @@ pgstat_report_stat(bool force)
 pgstat_send_tabstat(this_msg);
 this_msg->m_nentries = 0;
 			}
+
+			/*
+			 * Entry will be zeroed soon so we need to remove it from the lookup table.
+			 */
+			hash_search(pgStatTabHash, &entry->t_id, HASH_REMOVE, NULL);
 		}
+
 		/* zero out TableStatus structs after use */
 		MemSet(tsa->tsa_entries, 0,
 			   tsa->tsa_used * sizeof(PgStat_TableStatus));
@@ -1667,59 +1687,77 @@ pgstat_initstats(Relation rel)
 }
 
 /*
+ * Make sure pgStatTabList and pgStatTabHash are initialized.
+ */
+static void
+make_sure_stat_tab_initialized()
+{
+	HASHCTL ctl;
+
+	if (pgStatTabList)
+		return;
+
+	Assert(!pgStatTabHash);
+
+	memset(&ctl, 0, sizeof(ctl));
+	ctl.keysize = sizeof(Oid);
+	ctl.entrysize = sizeof(TabStatHashEntry);
+	ctl.hcxt = TopMemoryContext;
+
+	pgStatTabHash = hash_create("pgstat t_id to tsa_entry lookup table",
+		TABSTAT_QUANTUM, &ctl, HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
+
+	pgStatTabList = (TabStatusArray *) MemoryContextAllocZero(TopMemoryContext,
+sizeof(TabStatusArray));
+}
+
+/*
  * get_tabstat_entry - find or create a PgStat_TableStatus entry for rel
  */
 static PgStat_TableStatus *
 get_tabstat_entry(Oid rel_id, bool isshared)
 {
+	TabStatHashEntry* hash_entry;
 	PgStat_TableStatus *entry;
 	TabStatusArray *tsa;
-	TabStatusArray *prev_tsa;
-	int			i;
+	bool found;
+
+	make_sure_stat_tab_initialized();
+
+	/*
+	 * Find an entry or create a new one.
+	 */
+	hash_entry = hash_search(pgStatTabHash, &rel_id, HASH_ENTER, &found);
+	if(found)
+		return hash_entry->tsa_entry;
 
 	/*
-	 * Search the already-used tabstat slots for this relation.
+	 * `hash_entry` was just created and now we have to fill it.
+	 * First make sure there is free space in the last element of pgStatTabList.
 	 */
-	prev_tsa = NULL;
-	for (tsa = pgStatTabList; tsa != NULL; prev_tsa = tsa, tsa = tsa->tsa_next)
+	tsa = pgStatTabList;
+	while(tsa->tsa_used == TABSTAT_QUANTUM)
 	{
-		for (i = 0; i < tsa->tsa_used; i++)
+		if(tsa->tsa_next == NULL)
 		{
-			entry = &tsa->tsa_entries[i];
-			if (entry->t_id == rel_id)
-return entry;
+			tsa->tsa_next = (TabStatusArray *) MemoryContextAllocZero(TopMemoryContext,
+		sizeof(TabStatusArray));
 		}
 
-		if (tsa->tsa_used < TABSTAT_QUANTUM)
-		{
-			/*
-			 * It must not be present, but we found a free slot instead. Fine,
-			 * let's use this one.  We assume the entry was already zeroed,
-			 * either at creation or after last use.
-			 */
-			entry = &tsa->tsa_entries[tsa->tsa_used++];
-			entry->t_id = rel_id;
-			entry->t_shared = isshared;
-			return entry;
-		}
+		tsa = tsa->tsa_next;
 	}
 
 	/*
-	 * We ran out of tabstat slots, so allocate more.  Be sure they're zeroed.
-	 */
-	tsa = (TabStatusArray *) MemoryContextAllocZero(TopMemoryContext,
-	sizeof(TabStatusArray));
-	if (prev_tsa)
-		prev_tsa->tsa_next = tsa;
-	else
-		pgStatTabList = tsa;
-
-	/*
-	 * Use the first entry of the new TabStatusArray.
+	 * Add an entry.
 	 */
 	entry = &tsa->tsa_entries[tsa->tsa_used++];
 	entry->t_id = rel_id;
 	entry->t_shared = isshared;
+

Re: [HACKERS] increasing the default WAL segment size

2017-03-24 Thread David Steele

On 3/24/17 12:27 AM, Peter Eisentraut wrote:

On 3/23/17 16:58, Stephen Frost wrote:

The backup tools need to also get the LSN from the pg_stop_backup and
verify that they have the WAL file associated with that LSN.


There is a function for that.


They also
need to make sure that they have all of the WAL files between the
starting LSN and the ending LSN.  Doing that requires understanding how
the files are named to make sure there aren't any missing.


There is not a function for that, but there could be one.


A function would be nice, but tools often cannot depend on the database 
being operational so it's still necessary to re-implement them.  Having 
a sane sequence in the WAL makes that easier.


--
-David
da...@pgmasters.net




Re: [HACKERS] Logical replication existing data copy

2017-03-24 Thread Petr Jelinek
On 24/03/17 11:23, Erik Rijkers wrote:
> On 2017-03-24 10:45, Mark Kirkwood wrote:
>>
>> However one minor observation - as Michael Banck noted - the elapsed
>> time for slave to catch up after running:
>>
>> $ pgbench -c8 -T600 bench
>>
>> on the master was (subjectively) much longer than for physical
>> streaming replication. Is this expected?
>>
> 
> I think you probably want to do (on the slave) :
> 
>   alter role  set synchronous_commit = off;
> 
> otherwise it's indeed extremely slow.
> 

Yes that might be one reason, see
https://www.postgresql.org/message-id/08b7053b-5679-0733-3af7-00b8cde62...@2ndquadrant.com
(although it probably needs rebase now)

Also don't forget that the snapbuild performance fixes haven't been
committed yet so it can take long time to even start.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Monitoring roles patch

2017-03-24 Thread Stephen Frost
Peter,

* Peter Eisentraut (peter.eisentr...@2ndquadrant.com) wrote:
> On 3/22/17 09:17, Stephen Frost wrote:
> >> If we do it via GRANTs instead, then users can easily extend it.
> > The intent here is that users will *also* be able to do it via GRANTs if
> > they wish to.
> 
> But why not do it with GRANTs in the first place then?

This is akin to asking why we need GRANT ALL and ALTER DEFAULT PRIVILEGES.

Would it be technically possible to make users jump through hoops every
time they set up a new system to create their own monitor role that
would then have the right set of privileges for *that* version of PG?
Yes, but it's not exactly friendly.  The whole idea here is that the
tool authors will be able to tell the users that they just need to GRANT
this one role, not a script of 20 GRANT statements, oh, and that it
won't break when doing upgrades.
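
To make the contrast concrete, a sketch (the role name is the one
proposed here; the grantee and the individual function names are purely
illustrative):

-- Proposed: one statement, stable across versions and extensions.
GRANT pg_monitor TO monitoring_user;

-- Today: a per-tool script, re-run per server and per major release.
GRANT EXECUTE ON FUNCTION monitoring_helper_1() TO monitoring_user;
GRANT EXECUTE ON FUNCTION monitoring_helper_2() TO monitoring_user;
-- ... another twenty-odd GRANTs, plus more after installing extensions.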

If we make the users run all the statements individually then they'll
also have to get an updated script for the next version of PG too
because we will have added things that the tools will want access to.

Further, they'll have to make sure to install all the contrib modules
they want to use before running the GRANT script which is provided, or
deal with GRANT'ing the rights out after installing some extension.

With the approach that Dave and I are advocating, we can avoid all of
that.  Contrib modules can bake-in GRANTs to the appropriate roles,
upgrades can be handled smoothly even when we add new capabilities which
are appropriate, users have a simple and straight-forward way to set up
good monitoring, and tool authors will know what permissions are
available and can finally have a better answer than "well, just make the
monitor user superuser if you want to avoid all these complexities."

Thanks!

Stephen




Re: [HACKERS] Logical replication existing data copy

2017-03-24 Thread Erik Rijkers

On 2017-03-24 10:45, Mark Kirkwood wrote:


However one minor observation - as Michael Banck noted - the elapsed
time for slave to catch up after running:

$ pgbench -c8 -T600 bench

on the master was (subjectively) much longer than for physical
streaming replication. Is this expected?



I think you probably want to do (on the slave) :

  alter role  set synchronous_commit = off;

otherwise it's indeed extremely slow.









Re: [HACKERS] Backend crash on non-exclusive backup cancel

2017-03-24 Thread Teodor Sigaev

Thank you all, pushed

Michael Paquier wrote:

On Fri, Mar 24, 2017 at 1:45 AM, Teodor Sigaev  wrote:

I believe patch looks good and it's ready to commit.


Thanks for the review!


As I understand, it fixes bug introduced by
commit 7117685461af50f50c03f43e6a622284c8d54694
Date:   Tue Apr 5 20:03:49 2016 +0200

Implement backup API functions for non-exclusive backups


Indeed.


And, suppose, it should be backpatched to 9.6?


Yes, that's where non-exclusive backups have been introduced.



--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/




Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-24 Thread Pavan Deolasee
On Fri, Mar 24, 2017 at 4:04 PM, Amit Kapila 
wrote:

> On Fri, Mar 24, 2017 at 12:25 AM, Pavan Deolasee
>  wrote:
> >
> >
> > On Thu, Mar 23, 2017 at 7:53 PM, Amit Kapila 
> >
> > The general sense I've got
> > here is that we're ok to push some work in background if it helps the
> > real-time queries, and I kinda agree with that.
> >
>
> I don't think we can define this work as "some" work, it can be a lot
> of work depending on the number of indexes.  Also, I think for some
> cases it will generate maintenance work without generating benefit.
> For example, when there is one index on a table and there are updates
> for that index column.
>
>
That's a fair point. I think we can address this, though. At the end of the
first index scan we would know how many warm pointers the index has and
whether it's worth doing a second scan. For the case you mentioned, we will
do a second scan just on that one index, skip all other indexes, and
still achieve the same result. On the other hand, if one index receives
many updates and other indexes are rarely updated, then we might leave
behind a few WARM chains and won't be able to do IOS on those pages.
But given the premise that other indexes are receiving rare updates, it may
not be a problem. Note: the code is not currently written that way, but it
should be a fairly small change.

The other thing that we didn't talk about is that vacuum will need to track
dead tuples and warm candidate chains separately, which increases memory
overhead. So for very large tables, and for the same amount of
maintenance_work_mem, one round of vacuum will be able to clean fewer
pages. We could work out a more compact representation, but that's not done
currently.


>
> > But this is clearly not
> > PG 10 material.
> >
>
> I don't see much discussion about this aspect of the patch, so not
> sure if it is acceptable to increase the cost of vacuum.  Now, I don't
> know if your idea of GUC's make it such that the additional cost will
> occur seldom and this additional pass has a minimal impact which will
> make it acceptable.


Yeah, I agree. I'm trying to schedule some more benchmarks, but any help is
appreciated.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


[HACKERS] [psql] patch to fix ordering in words_after_create array

2017-03-24 Thread Rushabh Lathia
Hi All,

While looking at the code around tab-complete.c, I
found that the ordering in the words_after_create array is not
correct for DEFAULT PRIVILEGES, which was added
by the commit below:

commit d7d77f3825122bde55be9e06f6c4851028b99795
Author: Peter Eisentraut 
Date:   Thu Mar 16 18:54:28 2017 -0400

psql: Add completion for \help DROP|ALTER

PFA patch to fix the same.

Thanks,
Rushabh Lathia
www.EnterpriseDB.com
diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index f749406..02a1571 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -999,8 +999,8 @@ static const pgsql_thing_t words_after_create[] = {
 	{"CONFIGURATION", Query_for_list_of_ts_configurations, NULL, THING_NO_SHOW},
 	{"CONVERSION", "SELECT pg_catalog.quote_ident(conname) FROM pg_catalog.pg_conversion WHERE substring(pg_catalog.quote_ident(conname),1,%d)='%s'"},
 	{"DATABASE", Query_for_list_of_databases},
-	{"DICTIONARY", Query_for_list_of_ts_dictionaries, NULL, THING_NO_SHOW},
 	{"DEFAULT PRIVILEGES", NULL, NULL, THING_NO_CREATE | THING_NO_DROP},
+	{"DICTIONARY", Query_for_list_of_ts_dictionaries, NULL, THING_NO_SHOW},
 	{"DOMAIN", NULL, _for_list_of_domains},
 	{"EVENT TRIGGER", NULL, NULL},
 	{"EXTENSION", Query_for_list_of_extensions},



[HACKERS] Re: [COMMITTERS] pgsql: Fix and simplify check for whether we're running as Windows serv

2017-03-24 Thread Dave Page
On Friday, March 24, 2017, Heikki Linnakangas  wrote:

> On 03/22/2017 07:44 PM, Robert Haas wrote:
>
>> On Wed, Mar 22, 2017 at 10:13 AM, Alvaro Herrera
>>  wrote:
>>
>>> Heikki Linnakangas wrote:
>>>
>>>> I did some archeology, and found CheckTokenMembership() in MinGW's
>>>> w32api packages version 3.14
>>>> (https://sourceforge.net/projects/mingw/files/MinGW/Base/w32api/w32api-3.14/,
>>>> in include/winbase.h). According to the timestamps on that download
>>>> page, that was released in 2009. That was the oldest version I could
>>>> find, so it might go even further back.
>>>>
>>>> Dave, do you know exactly what version of MinGW narwhal is running? And
>>>> how difficult is it to upgrade to something slightly more modern? Ease
>>>> of upgrade is another good data point on how far we need to support old
>>>> versions.

>>>
>>> Given that this was backpatched and that it broke narwhal in all
>>> branches, I think the solution needs to make narwhal work again without
>>> requiring it to upgrade; so we should acquire CheckTokenMembership via
>>> dynloading just like we do the other functions.  If we want to require a
>>> newer mingw version in pg10, that's acceptable, but it should be a
>>> separate patch.
>>>
>>
>> +1 for not moving the minimum system requirements in the back-branches.
>>
>
> Ok. I reverted this patch in the back-branches, and applied the much less
> invasive "V2" patch [1] instead. HEAD is unchanged, so narwhal still fails
> there.
>
> Dave: the consensus is that we no longer support the old version of MinGW
> that narwhal is using, for PostgreSQL v 10. Can you modify the
> configuration of narwhal to not try building 'master' anymore, or upgrade
> the toolchain, please?
>

I've disabled it. Thanks.


-- 
Dave Page
PostgreSQL Core Team
http://www.postgresql.org/


Re: [HACKERS] WIP: Covering + unique indexes.

2017-03-24 Thread Aleksander Alekseev
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   tested, passed
Documentation:tested, passed

This patch looks good to me. As I understand it, we have both a complete
feature and a consensus in the thread here. If there are no objections, I'm
marking this patch as "Ready for Committer".

The new status of this patch is: Ready for Committer



Re: [HACKERS] scram and \password

2017-03-24 Thread Heikki Linnakangas

On 03/24/2017 03:02 PM, Michael Paquier wrote:

In order to close this thread, I propose to reuse the patches I sent
here to make scram_build_verifier() available to frontends:
https://www.postgresql.org/message-id/CAB7nPqT4yc3u8wspYkWbG088Ndp6asMH3=zb___ck89ctvz...@mail.gmail.com

And on top of it modify \password so that it generates an md5 verifier
for pre-9.6 servers and a scram one for post-10 servers by looking at
the backend version of the current connection. What do you think?


Yep, sounds like a plan.

- Heikki





Re: [HACKERS] increasing the default WAL segment size

2017-03-24 Thread David Steele

On 3/23/17 4:45 PM, Peter Eisentraut wrote:

On 3/22/17 17:33, David Steele wrote:

I think if we don't change the default size it's very unlikely I would
use alternate WAL segment sizes or recommend that anyone else does, at
least in v10.

I simply don't think it would get the level of testing required to be
production worthy


I think we could tweak the test harnesses to run all the tests with
different segment sizes.  That would get pretty good coverage.


I would want to see 1, 16, and 64 at a minimum.  More might be nice but that 
gets a bit ridiculous at some point.  I would be fine with different 
critters having different defaults.  I don't think that each critter 
needs to test each value.



More generally, the methodology that we should not add an option unless
we also change the default because the option would otherwise not get
enough testing is a bit dubious.


Generally, I would agree, but I think this is a special case.  This 
option has been around for a long time and we are just now exposing it 
in a way that's useful to end users.  It's easy to see how various 
assumptions may have arisen around the default and led to code that is 
not quite right when using different values.  Even if that's not true in 
the backend code, it might affect bin, and certainly affects third party 
tools.


--
-David
da...@pgmasters.net




Re: [HACKERS] Logical replication existing data copy

2017-03-24 Thread Peter Eisentraut
On 3/23/17 19:32, Petr Jelinek wrote:
> Yes, I also forgot to check if the table actually exists on subscriber
> when fetching them in CREATE SUBSCRIPTION (we have check during
> replication but not there).

I think for this we can probably just change the missing_ok argument of
RangeVarGetRelid() to false.

Unless we want the custom error message, in which case we need to change
AlterSubscription_refresh(), because right now it errors out because
missing_ok = false.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Logical replication existing data copy

2017-03-24 Thread Petr Jelinek
On 24/03/17 15:05, Peter Eisentraut wrote:
> On 3/23/17 19:32, Petr Jelinek wrote:
>> Yes, I also forgot to check if the table actually exists on subscriber
>> when fetching them in CREATE SUBSCRIPTION (we have check during
>> replication but not there).
> 
> I think for this we can probably just change the missing_ok argument of
> RangeVarGetRelid() to false.
> 
> Unless we want the custom error message, in which case we need to change
> AlterSubscription_refresh(), because right now it errors out because
> missing_ok = false.
> 

You are right, stupid me.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services
From 81c7750095cfac35fad9a841309b2c5501e59a62 Mon Sep 17 00:00:00 2001
From: Petr Jelinek 
Date: Fri, 24 Mar 2017 00:24:47 +0100
Subject: [PATCH 2/2] Check that published table exists on subscriber

---
 src/backend/commands/subscriptioncmds.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 8f201b2..efe70b4 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -416,7 +416,7 @@ CreateSubscription(CreateSubscriptionStmt *stmt, bool isTopLevel)
 RangeVar   *rv = (RangeVar *) lfirst(lc);
 Oid			relid;
 
-relid = RangeVarGetRelid(rv, AccessShareLock, true);
+relid = RangeVarGetRelid(rv, AccessShareLock, false);
 
 SetSubscriptionRelState(subid, relid, table_state,
 		InvalidXLogRecPtr);
-- 
2.7.4




Re: [HACKERS] Enabling parallelism for queries coming from SQL or other PL functions

2017-03-24 Thread Rafia Sabih
On Thu, Mar 23, 2017 at 11:26 PM, Robert Haas  wrote:

>
> The changes to the plpgsql code don't look so good to me.  The change
> to exec_stmt_return_query fixes the same bug that I mentioned in the
> email linked above, but only half of it -- it corrects the RETURN
> QUERY EXECUTE case but not the RETURN QUERY case.  And it's clearly a
> separate change; that part is a bug fix, not an enhancement.

My bad. Since you have given this as a separate patch in the link
upthread, I suppose nothing is expected from me regarding this
right now.

> Some of
> the other changes depend on whether we're in a trigger, which seems
> irrelevant to whether we can use parallelism. Even if the outer query
> is doing writes, we can still use parallelism for queries inside the
> trigger function if warranted.  It's probably a rare case to have
> queries inside a trigger that are expensive enough to justify such
> handling but I don't see the value of putting in special-case logic to
> prevent it.
>
Fixed. I had confused it with not allowing parallel workers while an
update command is in progress.

> I suspect that code fails to achieve its goals anyway.  At the top of
> exec_eval_expr(), you call exec_prepare_plan() and unconditionally
> pass CURSOR_OPT_PARALLEL_OK, so when that function returns, expr->plan
> might now be a parallel plan.  If we reach the call to
> exec_run_select() further down in that function, and if we happen to
> pass false, it's not going to matter, because exec_run_select() is
> going to find the plan already initialized.
>
True, fixed.
The attached patch is to be applied over [1].

[1]  
https://www.postgresql.org/message-id/CA%2BTgmoZ_ZuH%2BauEeeWnmtorPsgc_SmP%2BXWbDsJ%2BcWvWBSjNwDQ%40mail.gmail.com
-- 
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/


pl_parallel_opt_support_v4.patch



Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables

2017-03-24 Thread Amit Langote
Hi Ashutosh,

On 2017/03/23 21:48, Ashutosh Bapat wrote:
>>> I have fixed all the issues reported till now.

I've tried to fix your 0012 patch (Multi-level partitioned table
expansion) considering your message earlier on this thread [1].
Especially the fact that no AppendRelInfo and RelOptInfo are allocated for
partitioned child tables as of commit d3cc37f1d [2].  I've fixed
expand_inherited_rtentry() such that an AppendRelInfo *is* allocated for
partitioned child RTEs whose rte->inh is set to true.  Such an RTE is
recursively expanded, with that RTE as the parent.

Also, as I mentioned elsewhere [3], the multi-level inheritance expansion
of partitioned tables will break update/delete for partitioned tables,
because inheritance_planner() is not ready to handle inheritance sets
structured that way.  I tried to refactor inheritance_planner() such that
its core logic can be recursively invoked for partitioned child RTEs.  The
resulting child paths and other auxiliary information related to planning
across the hierarchy are maintained in one place, using a struct that holds
them in a few flat lists.  The refactoring didn't break any existing
tests, and a couple of new tests are added to check that it indeed works
for multi-level partitioned tables expanded using the new multi-level
structure.

There are some test failures in 0014 (Multi-level partition-wise join
tests), probably because of the changes I made to 0012; I didn't get time
to check why.  I've verified with an example that multi-level join
planning still works, though, so it's not completely broken either.

Thanks,
Amit

[1]
https://www.postgresql.org/message-id/CAFjFpRefs5ZMnxQ2vP9v5zOtWtNPuiMYc01sb1SWjCOB1CT%3DuQ%40mail.gmail.com

[2] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=d3cc37f1d

[3]
https://www.postgresql.org/message-id/744d20fe-fc7b-f89e-8d06-6496ec537b86%40lab.ntt.co.jp
From 7051b9cb4760908e23e64969988b58fcb466868e Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat 
Date: Fri, 10 Feb 2017 13:50:14 +0530
Subject: [PATCH 12/14] Multi-level partitioned table expansion.

Construct inheritance hierarchy of a partitioned table to reflect the
partition hierarchy.

Commit d3cc37f1d got rid of AppendRelInfos for *all* partitioned child
tables.  But we'd need one for recursively expanded partitioned
children.  In its absence, some of the later planning stages would not be
able to access the partitions below the first level, because they are not
directly linked to the root parent.  Fix that by allocating one.  That
means such children also get a RelOptInfo, which also wasn't allocated
previously.

Expanding partitioned table inheritance this new way means we need to
teach a few places how to traverse the hierarchy, such as propagating
lateral join information from the root table down the partition
hierarchy.

Further, inheritance_planner() needs refactoring to enable recursive
invocation of its core logic.  While the query_planner() can handle the
new partition hierarchical structure just fine because of the way the
relevant code works, inheritance_planner() would miss any child tables
below the first level, because it can't do recursion.  Refactor it so
that it can.
---
 src/backend/optimizer/plan/initsplan.c |  14 +-
 src/backend/optimizer/plan/planner.c   | 290 +
 src/backend/optimizer/prep/prepunion.c |  69 ++--
 src/test/regress/expected/inherit.out  |  22 +++
 src/test/regress/sql/inherit.sql   |  17 ++
 5 files changed, 292 insertions(+), 120 deletions(-)

diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c
index b4ac224a7a..20774b29fe 100644
--- a/src/backend/optimizer/plan/initsplan.c
+++ b/src/backend/optimizer/plan/initsplan.c
@@ -630,7 +630,19 @@ create_lateral_join_info(PlannerInfo *root)
 	{
 		RelOptInfo *brel = root->simple_rel_array[rti];
 
-		if (brel == NULL || brel->reloptkind != RELOPT_BASEREL)
+		if (brel == NULL)
+			continue;
+
+		/*
+		 * If an "other rel" RTE is a "partitioned table", we must propagate
+		 * the lateral info inherited all the way from the root parent to its
+		 * children. That's because the children are not linked directly with
+		 * the root parent via AppendRelInfo's unlike in case of a regular
+		 * inheritance set (see expand_inherited_rtentry()).  Failing to
+		 * do this would result in those children not getting marked with the
+		 * appropriate lateral info.
+		 */
+		if (brel->reloptkind != RELOPT_BASEREL && !brel->part_scheme)
 			continue;
 
 		if (root->simple_rte_array[rti]->inh)
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 4487683196..59cb559b9e 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -91,10 +91,28 @@ typedef struct
 	List	   *groupClause;	/* overrides parse->groupClause */
 } standard_qp_extra;
 
+/* Result of a given invocation of 

Re: [HACKERS] scram and \password

2017-03-24 Thread Heikki Linnakangas

On 03/23/2017 06:41 AM, Michael Paquier wrote:

And after a lookup the failure is here:
-   result = get_role_password(port->user_name, &shadow_pass, logdetail);
+   shadow_pass = get_role_password(port->user_name, logdetail);
if (result == STATUS_OK)
result is never setup in this code path, so that may crash.


Ah, of course. For some reason, I had -Wno-maybe-uninitialized in my 
configure command line. Without that, gcc even warns about that.


Fixed, and pushed. Thanks!

- Heikki





Re: [HACKERS] Backend crash on non-exclusive backup cancel

2017-03-24 Thread Michael Paquier
On Fri, Mar 24, 2017 at 7:58 PM, Teodor Sigaev  wrote:
> Thank you all, pushed.

Thanks.
-- 
Michael




Re: [HACKERS] Re: Declarative partitioning optimization for large amount of partitions

2017-03-24 Thread Aleksander Alekseev
Also I would like to share some benchmark results.

Before applying any patches (copied from one of previous messages):

```
latency average = 1153.495 ms
latency stddev = 154.366 ms
tps = 6.061104 (including connections establishing)
tps = 6.061211 (excluding connections establishing)
```

After applying first patch (copied as well):

```
latency average = 853.862 ms
latency stddev = 112.270 ms
tps = 8.191861 (including connections establishing)
tps = 8.192028 (excluding connections establishing)
```

After applying both patches:

```
latency average = 533.646 ms
latency stddev = 86.311 ms
tps = 13.110328 (including connections establishing)
tps = 13.110608 (excluding connections establishing)
```

A small number of partitions (2 to be exact), no patches:

```
latency average = 0.928 ms
latency stddev = 0.296 ms
tps = 7535.224897 (including connections establishing)
tps = 7536.145457 (excluding connections establishing)
```

A small number of partitions, both patches applied:

```
latency average = 0.915 ms
latency stddev = 0.269 ms
tps = 7638.344922 (including connections establishing)
tps = 7639.164769 (excluding connections establishing)
```

As you can see these patches don't make things worse in any regard.

Perf shows that both bottlenecks are gone. Before [1] and after [2].

[1] http://afiskon.ru/s/00/2008c4ae66_temp.png
[2] http://afiskon.ru/s/a5/fd81628a3a_temp.png

On Fri, Mar 24, 2017 at 03:17:44PM +0300, Aleksander Alekseev wrote:
> Hi Anastasia,
> 
> Thanks a lot for a review!
> 
> As was mentioned above, there is also a bottleneck in the
> find_all_inheritors procedure. It turned out the problem there is similar,
> and it could easily be fixed as well. The corresponding patch is attached
> to this message. To keep things in order, I'm attaching the previous patch
> as well.

-- 
Best regards,
Aleksander Alekseev




Re: [HACKERS] Declarative partitioning optimization for large amount of partitions

2017-03-24 Thread Simon Riggs
On 1 March 2017 at 01:36, Amit Langote  wrote:

> I don't know which way you're thinking of fixing this, but a planner patch
> to implement faster partition-pruning will have taken care of this, I
> think.  As you may know, even declarative partitioned tables currently
> depend on constraint exclusion for partition-pruning and planner's current
> approach of handling inheritance requires to open all the child tables
> (partitions), whereas the new approach hopefully shouldn't need to do
> that.  I am not sure if looking for a more localized fix for this would be
> worthwhile, although I may be wrong.

What "new approach" are we discussing?

Is there a patch or design discussion?

-- 
Simon Riggshttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Should we cacheline align PGXACT?

2017-03-24 Thread Simon Riggs
On 10 March 2017 at 13:08, Alexander Korotkov 
wrote:

> Results look good for me.  Idea of committing both patches looks
> attractive.
>

I'll commit mine since I understand what it does. I'll look at the other
one also, but won't commit yet.


> We have pretty much acceleration for read-only case and small acceleration
> for read-write case.
> I'll run benchmark on 72-cores machine as well.
>

No need to wait for that.

-- 
Simon Riggshttp://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] increasing the default WAL segment size

2017-03-24 Thread Stephen Frost
Jeff,

* Jeff Janes (jeff.ja...@gmail.com) wrote:
> On Thu, Mar 23, 2017 at 1:45 PM, Peter Eisentraut <
> peter.eisentr...@2ndquadrant.com> wrote:
> > On 3/22/17 17:33, David Steele wrote:
> > > and I doubt that most tool writers would be quick to
> > > add support for a feature that very few people (if any) use.
> >
> > I'm not one of those tool writers, although I have written my share of
> > DBA scripts over the years, but I wonder why those tools would really
> > care.  They are handed files with predetermined names to archive, and
> > for restore files with predetermined names are requested back from them.
> >  What else do they need?  If something is missing that requires them to
> > parse file names, then maybe that should be added.
> 
> I have a pg_restore which predicts the file 5 files ahead of the one it was
> asked for, and initiates a pre-fetch and decompression of it. Then it
> delivers the file it was asked for, either by pulling it out of the
> pre-staging area set up by the N-5th invocation, or by going directly to
> the archive to get it.  This speeds up play-back dramatically when the
> files are stored compressed and non-local.

Ah, yes, that is on our road-map for pgBackrest to do also, along with
parallel WAL fetch which also needs to figure out the WAL names before
being asked for them.

We do already have parallel push, which also needs to figure out what
the upcoming file names are going to be so we can find them and push
them when they're indicated as ready in archive_status.  Perhaps we
could just push whatever is ready and remember everything that was
pushed for when PG asks, but that is really not ideal.

> That is why I need to know how the files are numbered.  I don't think that
> that makes much of a difference, though.  Any change is going to break
> that, no matter which change.  Then I'll fix it.

Right, but the discussion here is actually about the idea that we're
going to encourage people to use the initdb-time option to change the
WAL size, meaning you'll need to deal with different WAL sizes and
different naming due to that, and then we're going to turn around in the
very next release and break the naming, meaning you'll have to adjust
your tools first for the different possible WAL sizes in PG10 and then
adjust again for the different naming in PG11.

I'm trying to suggest that if we're going to do this that, perhaps, we
should try to make both the changes in one release instead of across
two.

> If we are going to break it, I'd prefer to just do away with the 'segment'
> thing altogether.  You have timelines, and you have files.  That's it.

I'm not sure I follow this proposal.  We have to know which WAL file has
which LSN in it, how do you do that with just 'timelines and files'?

Thanks!

Stephen




Re: [HACKERS] [PATCH] Suppress Clang 3.9 warnings

2017-03-24 Thread Aleksander Alekseev
Hi Tom,

Since no one seems to be particularly excited about this patch I'm
marking it as "Returned with feedback" to save reviewers' time.

On Wed, Mar 15, 2017 at 12:21:21PM -0400, Tom Lane wrote:
> Noah Misch  writes:
> > On Wed, Mar 15, 2017 at 10:57:15AM -0400, Tom Lane wrote:
> >> Seems like the correct solution is to
> >> absorb that fix, either by updating to a newer autoconf release or by
> >> carrying our own version of AC_CHECK_DECLS until they come out with one.
> 
> > As you mention upthread, that Autoconf commit is still newer than every
> > Autoconf release.  (Last release was 58 months ago.)  Altering configure.ac 
> > to
> > work around the bug would be reasonable, but it feels heavy relative to the
> > benefit of suppressing some warnings.
> 
> It does seem like rather a lot of work, but I think it's preferable to
> hacking up the coding in port.h.  Mainly because we could booby-trap the
> substitute AC_CHECK_DECLS to make sure we revert it whenever autoconf 2.70
> does materialize (a check on m4_PACKAGE_VERSION, like the one at
> configure.in line 22, ought to do the trick); whereas I do not think
> we'd remember to de-kluge port.h if we kluge around it there.
> 
> I'm fine with leaving it alone, too.
> 
>   regards, tom lane

-- 
Best regards,
Aleksander Alekseev




Re: [HACKERS] scram and \password

2017-03-24 Thread Michael Paquier
On Fri, Mar 24, 2017 at 8:36 PM, Heikki Linnakangas  wrote:
> On 03/23/2017 06:41 AM, Michael Paquier wrote:
>>
>> And after a lookup the failure is here:
>> -   result = get_role_password(port->user_name, &shadow_pass, logdetail);
>> +   shadow_pass = get_role_password(port->user_name, logdetail);
>> if (result == STATUS_OK)
>> result is never setup in this code path, so that may crash.
>
> Ah, of course. For some reason, I had -Wno-maybe-uninitialized in my
> configure command line. Without that, gcc even warns about that.
>
> Fixed, and pushed. Thanks!

OK, cool.

In order to close this thread, I propose to reuse the patches I sent
here to make scram_build_verifier() available to frontends:
https://www.postgresql.org/message-id/CAB7nPqT4yc3u8wspYkWbG088Ndp6asMH3=zb___ck89ctvz...@mail.gmail.com

And on top of it modify \password so that it generates an md5 verifier
for pre-9.6 servers and a scram one for post-10 servers by looking at
the backend version of the current connection. What do you think?
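
A minimal sketch of the dispatch I have in mind (PQserverVersion() is
the existing libpq call; the two helpers below stand in for
PQencryptPassword() and the scram_build_verifier() to be exported, so
their names and signatures are illustrative):

#include "libpq-fe.h"

extern char *build_md5_verifier(const char *user, const char *password);
extern char *build_scram_verifier(const char *user, const char *password);

/* Sketch: pick the verifier format from the connected server's version. */
static char *
choose_verifier(PGconn *conn, const char *user, const char *password)
{
    if (PQserverVersion(conn) < 100000)     /* pre-10 servers */
        return build_md5_verifier(user, password);
    else                                    /* 10 and later */
        return build_scram_verifier(user, password);
}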
-- 
Michael




Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-24 Thread Amit Kapila
On Thu, Mar 23, 2017 at 3:54 PM, Pavan Deolasee
 wrote:
> Thanks Amit. v19 addresses some of the comments below.
>
> On Thu, Mar 23, 2017 at 10:28 AM, Amit Kapila 
> wrote:
>>
>> On Wed, Mar 22, 2017 at 4:06 PM, Amit Kapila 
>> wrote:
>> > On Tue, Mar 21, 2017 at 6:47 PM, Pavan Deolasee
>> >  wrote:
>
>>
>> 5.
>> +btrecheck(Relation indexRel, IndexInfo *indexInfo, IndexTuple indexTuple,
>> + Relation heapRel, HeapTuple heapTuple)
>> {
>> ..
>> + if (!datumIsEqual(values[i - 1], indxvalue, att->attbyval,
>> + att->attlen))
>> ..
>> }
>>
>> Will this work if the index is using non-default collation?
>>
>
> Not sure I understand that. Can you please elaborate?
>

I was worried about the case where the index is created with a
non-default collation: will datumIsEqual() suffice the need?  Now,
thinking about it again, I think it will, because in the index tuple we
are storing the value as it is in the heap tuple.  However, today it
occurred to me: how will this work for toasted index values (index
value > TOAST_INDEX_TARGET)?  It is mentioned on top of datumIsEqual()
that it probably won't work for toasted values.  Have you considered
that point?

>>
>> 6.
>> +++ b/src/backend/access/nbtree/nbtxlog.c
>> @@ -390,83 +390,9 @@ btree_xlog_vacuum(XLogReaderState *record)
>> -#ifdef UNUSED
>>   xl_btree_vacuum *xlrec = (xl_btree_vacuum *) XLogRecGetData(record);
>>
>>   /*
>> - * This section of code is thought to be no longer needed, after analysis
>> - * of the calling paths. It is retained to allow the code to be
>> reinstated
>> - * if a flaw is revealed in that thinking.
>> - *
>> ..
>>
>> Why does this patch need to remove the above code under #ifdef UNUSED
>>
>
> Yeah, it isn't strictly necessary. But that dead code was coming in the way
> and hence I decided to strip it out. I can put it back if it's an issue or
> remove that as a separate commit first.
>

I think it is better to keep unrelated changes out of the patch.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] create_unique_path and GEQO

2017-03-24 Thread Tom Lane
Ashutosh Bapat  writes:
>> Do you have test case, which can reproduce the issue you
>> explained above?

> No. It would require some surgery in standard_planner() to measure the
> memory consumed in the planner context OR build the code with
> SHOW_MEMORY_STATS defined and dump memory context statistics and check
> planner context memory usage. I don't think I can produce a testcase
> quickly right now. But then, I think the problem is quite apparent
> from the code inspection alone, esp. comparing what mark_dummy_rel()
> does with what create_unique_path() is doing.

Yeah.  I think the code in mark_dummy_rel is newer and better-thought-out
than what's in create_unique_path.  It probably makes sense to change over.
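
For the archives, the change would look roughly like this in
create_unique_path() (a sketch; switching to GetMemoryChunkContext() on
the rel is exactly what mark_dummy_rel() does):

    /*
     * Allocate the path in the same context the RelOptInfo was created
     * in, rather than root->planner_cxt.  Under GEQO the rel lives in a
     * short-lived context, so the UniquePath is then freed along with it.
     */
    oldcontext = MemoryContextSwitchTo(GetMemoryChunkContext(rel));

    pathnode = makeNode(UniquePath);
    /* ... fill in the path as before ... */

    MemoryContextSwitchTo(oldcontext);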

regards, tom lane




Re: [HACKERS] Monitoring roles patch

2017-03-24 Thread Dave Page
On Fri, Mar 24, 2017 at 4:24 PM, Robert Haas  wrote:
>
>> If we make the users run all the statements individually then they'll
>> also have to get an updated script for the next version of PG too
>> because we will have added things that the tools will want access to.
>
> That's possible, but it really depends on the tool, not on core
> PostgreSQL.  The tool should be the one providing the script, because
> the tool is what knows its own permissions requirements.  Users who
> care about security won't want to grant every privilege given by a
> pg_monitor role to a tool that only needs a subset of those
> privileges.

The upshot of this would be that every time there's a database server
upgrade that changes the permissions required somehow, the user has to
log in to every server they have and run a script. It is no longer a
one-time thing, which makes it vastly more painful to deal with
upgrades.

>> With the approach that Dave and I are advocating, we can avoid all of
>> that.  Contrib modules can bake-in GRANTs to the appropriate roles,
>> upgrades can be handled smoothly even when we add new capabilities which
>> are appropriate, users have a simple and straight-forward way to set up
>> good monitoring, and tool authors will know what permissions are
>> available and can finally have a better answer than "well, just make the
>> monitor user superuser if you want to avoid all these complexities."
>
> They can have that anyway.  They just have to run a script provided by
> the tool rather than one provided by the core project as a
> one-size-fits-all solution for every tool.

Do you object to having individual default roles specifically for
cases where there are superuser-only checks at the moment that prevent
GRANT? e.g. pg_show_all_settings for SHOW, pg_show_all_stats for
pg_tablespace_size and friends, pg_stat_statements for, well,
pg_stat_statements and so on? It would be an inferior solution in my
opinion, given that it would demonstrably cause users more work, but
at least we could do something.

-- 
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [COMMITTERS] pgsql: Implement multivariate n-distinct coefficients

2017-03-24 Thread Robert Haas
On Fri, Mar 24, 2017 at 1:16 PM, Alvaro Herrera  wrote:
> Implement multivariate n-distinct coefficients

dromedary and arapaima have failures like this, which seems likely
related to this commit:

  EXPLAIN
   SELECT COUNT(*) FROM ndistinct GROUP BY a, d;
   QUERY PLAN
  -
!  HashAggregate  (cost=225.00..235.00 rows=1000 width=16)
 Group Key: a, d
!->  Seq Scan on ndistinct  (cost=0.00..150.00 rows=1 width=8)
  (3 rows)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] increasing the default WAL segment size

2017-03-24 Thread Beena Emerson
Hello,

On Wed, Mar 22, 2017 at 9:41 PM, Kuntal Ghosh 
wrote:

> On Wed, Mar 22, 2017 at 3:14 PM, Beena Emerson 
> wrote:
> > PFA an updated patch which fixes a minor bug I found. It only increases
> the
> > string size in pretty_wal_size function.
> > The 01-add-XLogSegmentOffset-macro.patch has also been rebased.
> Thanks for the updated versions. Here is a partial review of the patch:
>
> In pg_standby.c and pg_waldump.c,
> + XLogPageHeader hdr = (XLogPageHeader) buf;
> + XLogLongPageHeader NewLongPage = (XLogLongPageHeader) hdr;
> +
> + XLogSegSize = NewLongPage->xlp_seg_size;
> It waits until the file is at least XLOG_BLCKSZ, then gets the
> expected final size from XLogPageHeader. This looks really clean
> compared to the previous approach.
>

Thank you for testing. PFA the rebased patch incorporating your comments.


>
> + * Verify that the min and max wal_size meet the minimum requirements.
> Better to write min_wal_size and max_wal_size.
>

Updated wherever applicable.


>
> + errmsg("Insufficient value for \"min_wal_size\"")));
> "min_wal_size %d is too low" may be? Use lower case for error
> messages. Same for max_wal_size.
>

updated.


>
> + /* Set the XLogSegSize */
> + XLogSegSize = ControlFile->xlog_seg_size;
> +
> A call to IsValidXLogSegSize() will be good after this, no?
>

Is it necessary? We already have the CRC check for ControlFiles. Is that
not enough?


>
> + /* Update variables using XLogSegSize */
> + check_wal_size();
> The method name looks somewhat misleading compared to the comment for
> it, doesn't it?
>

The method name is correct since it only checks that the value provided is
sufficient (at least 2 segments). I have updated the comment to say
"Check and update variables dependent on XLogSegSize".


> This patch introduces a new guc_unit having values in MB for
> max_wal_size and min_wal_size. I'm not sure about the upper limit
> which is set to INT_MAX for 32-bit systems as well. Is it needed to
> define something like MAX_MEGABYTES similar to MAX_KILOBYTES?
> It is worth mentioning that GUC_UNIT_KB can't be used in this case
> since MAX_KILOBYTES is INT_MAX / 1024(<2GB) on 32-bit systems. That's
> not a sufficient value for min_wal_size/max_wal_size.
>

You are right. I can use the same value as the upper limit for GUC_UNIT_MB; we
could probably change the name of MAX_KILOBYTES to something more general
for both GUC_UNIT_MB and GUC_UNIT_KB. The max size in 32-bit systems would
be 2TB.


> While testing with pg_waldump, I got the following error.
> bin/pg_waldump -p master/pg_wal/ -s 0/0100
> Floating point exception (core dumped)
> Stack:
> #0  0x004039d6 in ReadPageInternal ()
> #1  0x00404c84 in XLogFindNextRecord ()
> #2  0x00401e08 in main ()
> I think that the problem is in following code:
> /* parse files as start/end boundaries, extract path if not specified */
> if (optind < argc)
> {
> 
> + if (!RetrieveXLogSegSize(full_path))
> ...
> }
> In this case, RetrieveXLogSegSize is conditionally called. So, if the
> condition is false, XLogSegSize will not be initialized.
>
>
pg_waldump code has been updated.




-- 

Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


02-initdb-walsegsize-v8.patch



Re: [HACKERS] [COMMITTERS] pgsql: Avoid SnapshotResetXmin() during AtEOXact_Snapshot()

2017-03-24 Thread Robert Haas
On Fri, Mar 24, 2017 at 10:23 AM, Simon Riggs  wrote:
> Avoid SnapshotResetXmin() during AtEOXact_Snapshot()
>
> For normal commits and aborts we already reset PgXact->xmin
> Avoiding touching highly contended shmem improves concurrent
> performance.
>
> Simon Riggs

I'm getting occasional crashes with backtraces that look like this:

#0  0x7fff9679c286 in __pthread_kill ()
#1  0x7fff94e1a9f9 in pthread_kill ()
#2  0x7fff9253a9a3 in abort ()
#3  0x000107e0659e in ExceptionalCondition (conditionName=, errorType=0x6 , fileName=, lineNumber=) at assert.c:54
#4  0x000107e4be2b in AtEOXact_Snapshot (isCommit=, isPrepare=0 '\0') at
snapmgr.c:1154
#5  0x000107a76c06 in CleanupTransaction () at xact.c:2643
#6  0x000107a76267 in CommitTransactionCommand () at xact.c:2818
#7  0x000107cecfc2 in exec_simple_query
(query_string=0x7f975481e640 "ABORT TRANSACTION") at postgres.c:2461
#8  0x000107ceabb7 in PostgresMain (argc=, argv=, dbname=, username=) at postgres.c:4071
#9  0x000107c6bb58 in PostmasterMain (argc=, argv=) at postmaster.c:4317
#10 0x000107be5cdd in main (argc=, argv=) at main.c:228

I suspect that is the fault of this patch.  Please fix or revert.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [BUGS] Problem in using pgbench's --connect(-C) and --rate=rate(-R rate) options together.

2017-03-24 Thread Teodor Sigaev

No, it is really needed so that the lag measure is correct.

Thank you, pushed

--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/




Re: [HACKERS] \if, \elseif, \else, \endif (was Re: PSQL commands: \quit_if, \quit_unless)

2017-03-24 Thread Fabien COELHO


Hello Corey,


v24 highlights:

- finally using git format-patch
- all conditional slash commands broken out into their own functions
(exec_command_$NAME); each one tests if it's in an active branch, and if
it's not it consumes the same number of parameters, but discards them.
Comments for each slash-command family were left as-is, and may need to be
bulked up.
- TAP tests discarded in favor of one ON_ERROR_STOP test for invalid \elif
placement
- documentation recommending ON_ERROR_STOP removed, because invalid
expressions no longer throw off if-endif balance
- documentation updated to reflect that contextually-correct-but-invalid
boolean expressions are treated as false
- psql_get_variable has a passthrough void pointer now, but I ended up not
needing it. Instead, all slash commands in false blocks either fetch
OT_NO_EVAL or OT_WHOLE_LINE options. If I'm missing something, let me know.


A few comments about the patch.

Patch applies. "make check" ok.

As already pointed out, there is one useless file in the patch.

Although currently there is only one expected argument for boolean 
expressions, the n² concatenation algorithm in gather_boolean_expression 
is not very elegant. Is there some string buffer data structure which 
could be used instead?
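
(PQExpBuffer, which psql uses everywhere and which grows in amortized
linear time, comes to mind. A rough, untested sketch; the scanning call
is the one the patch already uses:)

  #include "pqexpbuffer.h"

  PQExpBufferData buf;
  char       *opt;

  initPQExpBuffer(&buf);
  while ((opt = psql_scan_slash_option(scan_state, OT_NORMAL, NULL, false)))
  {
      appendPQExpBufferStr(&buf, opt);
      appendPQExpBufferChar(&buf, ' ');
      free(opt);
  }
  /* buf.data now holds the whole expression; termPQExpBuffer() frees it. */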


ISTM that ignore_boolean_expression may call free on a NULL pointer if the 
expression is empty?


Generally I find the per-command functions rather an improvement.

However there is an impact on testing because of all these changes. ISTM 
that test cases should reflect this situation and test that \cd, \edit, 
... are indeed ignored properly, taking into account their expected args...


In "exec_command_connect" an argument is changed from "-reuse-previous" to 
"-reuse-previous=", not sure why.


There seems to be pattern repetition for _ev _ef and _sf _sv. Would it 
make sense to create a function instead of keeping the initial copy-paste?


I think that some functions could be used for some repeated cases such as 
"discard one arg", "discard one or two arg", "discard whole line", for the 
various inactive branches, so as to factor out code.


I would suggest putting together all if-related backslash commands, 
so that the stack management is all in one function instead of 4.


For pset the inactive branch does OT_NORMAL instead of OT_NO_EVAL, not 
sure why.


--
Fabien.


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-24 Thread Pavan Deolasee
On Fri, Mar 24, 2017 at 6:46 PM, Amit Kapila 
wrote:

>
>
> I was worried about the case where the index is created with a
> non-default collation: will datumIsEqual() suffice the need?  Now,
> thinking about it again, I think it will, because in the index tuple we
> are storing the value as it is in the heap tuple.  However, today it
> occurred to me: how will this work for toasted index values (index
> value > TOAST_INDEX_TARGET)?  It is mentioned on top of datumIsEqual()
> that it probably won't work for toasted values.  Have you considered
> that point?
>
>
No, I haven't, and thanks for bringing that up. Now that I think more
about it and look at the code, I think the naive way of just comparing
index attribute values against heap values is probably wrong. The example
of TOAST_INDEX_TARGET is one such case, but I wonder if there are other
varlena attributes that we might store differently in heap and index.
For example, index_form_tuple() -> heap_fill_tuple() seems to do some
churning for varlena. It's not clear to me whether index_getattr() will
return values which are binary-comparable to heap values. I wonder if
calling index_form_tuple() on the heap values, fetching attributes via
index_getattr() on both index tuples, and then doing a binary compare is
a more robust idea. Or maybe that's just duplicating effort.
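
To spell the idea out, a rough sketch only (index_form_tuple(),
index_getattr() and datumIsEqual() are the existing routines; the
function name and error handling are illustrative):

static bool
recheck_via_index_form_tuple(Relation indexRel, IndexTuple indexTuple,
                             Datum *heapValues, bool *heapIsnull)
{
    TupleDesc   itupdesc = RelationGetDescr(indexRel);

    /* Run heap values through the same varlena massaging as an insert. */
    IndexTuple  heapItup = index_form_tuple(itupdesc, heapValues, heapIsnull);
    bool        equal = true;
    int         i;

    for (i = 1; i <= itupdesc->natts && equal; i++)
    {
        bool        isnull1, isnull2;
        Datum       d1 = index_getattr(indexTuple, i, itupdesc, &isnull1);
        Datum       d2 = index_getattr(heapItup, i, itupdesc, &isnull2);

        if (isnull1 != isnull2)
            equal = false;
        else if (!isnull1)
            equal = datumIsEqual(d1, d2,
                                 itupdesc->attrs[i - 1]->attbyval,
                                 itupdesc->attrs[i - 1]->attlen);
    }

    pfree(heapItup);
    return equal;
}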

While looking at this problem, it occurred to me that the assumptions made
for hash indexes are also wrong :-( Hash index has the same problem as
expression indexes have. A change in heap value may not necessarily cause a
change in the hash key. If we don't detect that, we will end up having two
identical hash keys with the same TID pointer. This will cause the
duplicate key scans problem since hashrecheck will return true for both the
hash entries. That's a bummer as far as supporting WARM for hash indexes is
concerned, unless we find a way to avoid duplicate index entries.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] BUG: pg_dump generates corrupted gzip file in Windows

2017-03-24 Thread Robert Haas
On Fri, Mar 24, 2017 at 6:44 AM, Kuntal Ghosh
 wrote:
>> AFAICU, if we use binary mode, output is stored bit by bit. In ASCII
>> mode, cmd pokes its nose in and does CR / LF conversions on its own. So,
>> whenever we want compression on a plain-text dump file, we can set the
>> stdout mode to O_BINARY. Is it a wrong approach?
> With the help from Ashutosh Sharma, I tested this in Windows
> environment. Sadly, it still doesn't work. :( IMHO, we should document
> the issue somewhere.

Why not?  I mean, if there's code there to force the output into
binary mode, does that not work for the -Fc case?  And if it does work
for the -Fc case, then why doesn't it also work for -Z9?
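
For reference, the CRT call that forces binary mode is _setmode(); if
the -Fc path already does something like this somewhere, the same call
would presumably be needed before writing a compressed plain-text dump
to stdout.  A minimal standalone sketch (hedged, I have not tested this
on Windows here):

#include <stdio.h>
#include <stdlib.h>
#ifdef WIN32
#include <io.h>
#include <fcntl.h>
#endif

int
main(void)
{
#ifdef WIN32
	/* stop the CRT from translating \n to \r\n on stdout */
	if (_setmode(_fileno(stdout), _O_BINARY) == -1)
	{
		perror("could not set stdout to binary mode");
		return 1;
	}
#endif
	/* ... write compressed bytes to stdout ... */
	return 0;
}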

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] multivariate statistics (v25)

2017-03-24 Thread Alvaro Herrera
Alvaro Herrera wrote:
> Here's a rebased series on top of today's a3eac988c267.  I call this
> v28.
> 
> I put David's pg_dump and COMMENT patches as second in line, just after
> the initial infrastructure patch.  I suppose those three have to be
> committed together, while the others (which add support for additional
> statistic types) can rightly remain as separate commits.

As I said in another thread, I pushed parts 0002,0003,0004.  Tomas said
he would try to rebase patches 0001,0005,0006 on top of what was
committed.  My intention is to give that one a look as soon as it is
available.  So we will have n-distinct and functional dependencies in
PG10.  It sounds unlikely that we will get MCVs and histograms in, since
they're each a lot of code.

I suppose we need 0011 too (psql tab completion), but that can wait.

-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Enabling parallelism for queries coming from SQL or other PL functions

2017-03-24 Thread Robert Haas
On Fri, Mar 24, 2017 at 5:57 AM, Rafia Sabih
 wrote:
>> I suspect that code fails to achieve its goals anyway.  At the top of
>> exec_eval_expr(), you call exec_prepare_plan() and unconditionally
>> pass CURSOR_OPT_PARALLEL_OK, so when that function returns, expr->plan
>> might now be a parallel plan.  If we reach the call to
>> exec_run_select() further down in that function, and if we happen to
>> pass false, it's not going to matter, because exec_run_select() is
>> going to find the plan already initialized.
>>
> True, fixed.
> The attached patch is to be applied over [1].

After some scrutiny I didn't find anything particularly wrong with
this, with the exception that exec_eval_expr() was passing false as
the parallelOK argument to exec_run_select(), which is inconsistent
with that function's earlier use of CURSOR_OPT_PARALLEL_OK to plan the
same query.  I fixed that by ripping out the parallelOK argument
altogether and just passing CURSOR_OPT_PARALLEL_OK when portalP ==
NULL.  The only reason I added parallelOK in the first place was
because of that RETURN QUERY stuff which subsequent study has shown to
be misguided.

Committed that way; please let me know if you see any problems.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [PATCH] Transaction traceability - txid_status(bigint)

2017-03-24 Thread Robert Haas
On Fri, Mar 24, 2017 at 2:27 AM, Craig Ringer  wrote:
> On 24 March 2017 at 02:29, Robert Haas  wrote:
>> On Tue, Mar 21, 2017 at 11:35 PM, Craig Ringer  wrote:
>>> Changes made per discussion.
>>
>> Committed 0001.
>
> Much appreciated.
>
> Here's the 2nd patch rebased on top of master, with the TAP test
> included this time. Looks ready to go.

Committed.

> I really appreciate the time you've taken to look at this. Point me at
> anything from your team you want some outside review on.

Thanks, that is a valuable offer which I will be pleased to accept!

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [COMMITTERS] pgsql: Avoid SnapshotResetXmin() during AtEOXact_Snapshot()

2017-03-24 Thread Robert Haas
On Fri, Mar 24, 2017 at 12:27 PM, Robert Haas  wrote:
> On Fri, Mar 24, 2017 at 12:14 PM, Robert Haas  wrote:
>> On Fri, Mar 24, 2017 at 10:23 AM, Simon Riggs  wrote:
>>> Avoid SnapshotResetXmin() during AtEOXact_Snapshot()
>>>
>>> For normal commits and aborts we already reset PgXact->xmin
>>> Avoiding touching highly contented shmem improves concurrent
>>> performance.
>>>
>>> Simon Riggs
>>
>> I'm getting occasional crashes with backtraces that look like this:
>>
>> #4  0x000107e4be2b in AtEOXact_Snapshot (isCommit=<value temporarily unavailable, due to optimizations>, isPrepare=0 '\0') at
>> snapmgr.c:1154
>> #5  0x000107a76c06 in CleanupTransaction () at xact.c:2643
>>
>> I suspect that is the fault of this patch.  Please fix or revert.
>
> Also, the entire buildfarm is turning red.
>
> longfin, spurfowl, and magpie all show this assertion failure in the
> log.  I haven't checked the others.
>
> TRAP: FailedAssertion("!(MyPgXact->xmin == 0)", File: "snapmgr.c", Line: 1154)

Another thing that is interesting is that when I run make -j8
check-world, the overall tests appear to succeed even though there are
failures mid-way through:

test tablefunc... FAILED (test process exited with exit code 2)

...but then later we end with:

ok
All tests successful.
Files=11, Tests=80, 251 wallclock secs ( 0.07 usr  0.02 sys + 19.77
cusr 14.45 csys = 34.31 CPU)
Result: PASS

real4m27.421s
user3m50.047s
sys1m31.937s

That's unrelated to the current problem of course, but it seems to
suggest that make's -j option doesn't entirely do what you'd expect
when used with make check-world.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] pg_serial early wraparound

2017-03-24 Thread Thomas Munro
On Sat, Mar 25, 2017 at 3:11 AM, Anastasia Lubennikova
 wrote:
> Hi, I've tried to review this patch, but it seems that I miss something 
> essential.

Hi Anastasia,

Thanks for looking at this.

> You claim that SLRUs now support five digit segment name, while in slru.h
> at current master I see the following:
>
>  * Note: slru.c currently assumes that segment file names will be four hex
>  * digits.  This sets a lower bound on the segment size (64K transactions
>  * for 32-bit TransactionIds).
>  */
> #define SLRU_PAGES_PER_SEGMENT  32
>
> /* Maximum length of an SLRU name */
> #define SLRU_MAX_NAME_LENGTH32

That comment is out of date.  Commit 638cf09e extended SLRUs to
support 5 digit names, to support pg_multixact.  And I see now that
commit 73c986ad more recently created the possibility of 6-character
SLRU file names for pg_commit_ts.

> Could you please clarify the idea of the patch? Is it still relevant?

The idea is simply to remove some strange old code including scary
error messages that is no longer needed.  In my study of predicate.c
for other reasons, I noticed this in passing and thought I'd tidy it
up.  Because I have tangled with pg_multixact and seen 5-character
SLRU files with my own eyes, I knew that the restriction that
motivated this code was no longer valid.

> I've also run your test script.
> pg_clog was renamed to pg_xact, so it need to be changed accordingly
> echo "Contents of pg_clog:"
>   ls $PGDATA/pg_xact/

Right.

> The test shows failed assertion:
>
> == setting next xid to 1073741824 =
> Transaction log reset
> waiting for server to start2017-03-24 17:05:19.897 MSK [1181] LOG:  
> listening on IPv4 address "127.0.0.1", port 5432
> 2017-03-24 17:05:19.981 MSK [1181] LOG:  listening on Unix socket 
> "/tmp/.s.PGSQL.5432"
> 2017-03-24 17:05:20.081 MSK [1183] LOG:  database system was shut down at 
> 2017-03-24 17:05:19 MSK
> 2017-03-24 17:05:20.221 MSK [1181] LOG:  database system is ready to accept 
> connections
>  done
> server started
> vacuumdb: vacuuming database "postgres"
> vacuumdb: vacuuming database "template0"
> vacuumdb: vacuuming database "template1"
> TRAP: FailedAssertion("!(TransactionIdPrecedesOrEquals(oldestXact, 
> ShmemVariableCache->oldestXid))", File: "clog.c", Line: 669)
> vacuumdb: vacuuming of database "template1" failed: server closed the 
> connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> 2017-03-24 17:05:21.541 MSK [1181] LOG:  server process (PID 1202) was 
> terminated by signal 6: Aborted
> 2017-03-24 17:05:21.541 MSK [1181] DETAIL:  Failed process was running: 
> VACUUM (FREEZE);

My cheap trick for moving the xid around the clock quickly to test
wraparound scenarios no longer works, since this new assertion was
added in ea42cc18.  That was committed just a few hours before you
tested this.  Bad luck for me!

> The new status of this patch is: Waiting on Author

It's not urgent, it's just cleanup work, so I've now moved it to the
next commitfest.  I will try to figure out a new way to demonstrate
that it works correctly without having to ask a reviewer to disable
any assertions.  Thanks again.

-- 
Thomas Munro
http://www.enterprisedb.com




Re: [HACKERS] Create replication slot in pg_basebackup if requested and not yet present

2017-03-24 Thread Michael Banck
Hi,

On Thu, Mar 23, 2017 at 12:41:54PM +0100, Magnus Hagander wrote:
> On Tue, Mar 21, 2017 at 8:34 PM, Robert Haas  wrote:
> > So I tend to think that there should always be some explicit user
> > action to cause the creation of a slot, like --create-slot-if-needed
> > or --create-slot=name.  That still won't prevent careless use of that
> > option but it's less dangerous than assuming that a user who refers to
> > a nonexistent slot intended to create it when, perhaps, they just
> > typo'd it.
> 
> Well, the explicit user action would be "--slot". But sure, I can
> definitely live with a --create-if-not-exists. 

Can we just make that option create the slot and not worry if one exists
already? CreateReplicationSlot() can be told to not mind about already
existing slots, and I don't see a huge point in erroring out if the slot
exists already, unless somebody can show that this leads to data loss
somehow.

> The important thing, to me, is that you can run it as a single
> command, which makes it a lot easier to work with. (And not like we
> currently have for pg_receivewal which requires a separate command to
> create the slot)

Oh, that is how it works with pg_receivewal, I have to admit I've never
used it so was a bit confused about this when I read its code.

So in that case I think we don't necessarily need to have the same user
interface at all. I first thought about just adding "-C, --create" (as
in "--create --slot=foo"), but this on second thought this looked a bit
shortsighted - who knows what flashy thing pg_basebackup might create in
5 years... So I settled on --create-slot, which is only slightly more to
type (albeit repetive, "--create-slot --slot=foo"), but adding a short
option "-C" would be fine I thinkg "-C -S foo".

So attached is a patch with adds that option. If people really think it
should be --create-slot-if-not-exists instead I can update the patch, of
course.

I again added a second patch with some further refactoring which makes
it print a message on temporary slot creation in verbose mode.


Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.ba...@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer
From 4e37e110ac402e67874f729832b330a837284d4b Mon Sep 17 00:00:00 2001
From: Michael Banck 
Date: Fri, 24 Mar 2017 18:27:47 +0100
Subject: [PATCH 1/2] Add option to create a replication slot in pg_basebackup
 if not yet present.

When requesting a particular replication slot, the new option --create tries to
create it before starting to replicate from it in case it does not exist
already.
---
 doc/src/sgml/ref/pg_basebackup.sgml  | 12 
 src/bin/pg_basebackup/pg_basebackup.c| 44 +++-
 src/bin/pg_basebackup/t/010_pg_basebackup.pl | 23 +--
 3 files changed, 75 insertions(+), 4 deletions(-)

diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml
index e1cec9d..789f649 100644
--- a/doc/src/sgml/ref/pg_basebackup.sgml
+++ b/doc/src/sgml/ref/pg_basebackup.sgml
@@ -269,6 +269,18 @@ PostgreSQL documentation
  
 
  
+  -C
+  --create-slot
+  
+   
+This option can only be used together with the --slot 
+	option.  It causes the specified slot name to be created if it does not
+exist already.  
+   
+  
+ 
+
+ 
   -T olddir=newdir
   --tablespace-mapping=olddir=newdir
   
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 0a4944d..2af2e22 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -92,6 +92,7 @@ static pg_time_t last_progress_report = 0;
 static int32 maxrate = 0;		/* no limit by default */
 static char *replication_slot = NULL;
 static bool temp_replication_slot = true;
+static bool create_slot = false;
 
 static bool success = false;
 static bool made_new_pgdata = false;
@@ -337,6 +338,7 @@ usage(void)
 			 " write recovery.conf for replication\n"));
 	printf(_("  -S, --slot=SLOTNAMEreplication slot to use\n"));
 	printf(_("  --no-slot  prevent creation of temporary replication slot\n"));
+	printf(_("  -C, --create-slot  create replication slot if not present already\n"));
 	printf(_("  -T, --tablespace-mapping=OLDDIR=NEWDIR\n"
 	  " relocate tablespace in OLDDIR to NEWDIR\n"));
 	printf(_("  -X, --wal-method=none|fetch|stream\n"
@@ -581,6 +583,19 @@ StartLogStreamer(char *startpos, uint32 timeline, char *sysidentifier)
 	else
 		param->temp_slot = temp_replication_slot;
 
+	/*
+	 * Create permanent physical replication slot if requested.
+	 */
+	if (replication_slot && create_slot)
+	{
+		if (verbose)
+			fprintf(stderr, 

Re: [HACKERS] Monitoring roles patch

2017-03-24 Thread Robert Haas
On Fri, Mar 24, 2017 at 8:30 AM, Stephen Frost  wrote:
>> But why not do it with GRANTs in the first place then?
>
> This is akin to asking why do we need GRANT ALL and ALTER DEFAULT PRIVs.

Not really.  ALTER DEFAULT PRIVILEGES affects what happens for future
objects, which isn't necessary here at all.  The monitoring tool only
needs to be granted the minimum set of privileges that it currently
needs, not future privileges which it or other monitoring tools may
need in future versions.  GRANT ALL is closer, but GRANT .. ON ALL
TABLES IN SCHEMA really is just a convenience function.  You would
still have the capability to do the exact same thing without that
function; it would just be more work.  And the same is true here.  I
understand why you and Dave want to make this a single command: that's
easier.  But, like GRANT ON ALL TABLES, it's also less flexible.  If
you want to grant on all tables except one, you can't use that magic
syntax any more, and so here.  pg_monitor will either target one
particular monitoring solution (hey, maybe it'll be the one my
employer produces!) or the judgement of one particular person
(whatever Stephen Frost thinks it SHOULD do in a given release,
independent of what tools on the ground do) and those aren't good
definitions.

> If we make the users run all the statements individually then they'll
> also have to get an updated script for the next version of PG too
> because we will have added things that the tools will want access to.

That's possible, but it really depends on the tool, not on core
PostgreSQL.  The tool should be the one providing the script, because
the tool is what knows its own permissions requirements.  Users who
care about security won't want to grant every privilege given by a
pg_monitor role to a tool that only needs a subset of those
privileges.

> Further, they'll have to make sure to install all the contrib modules
> they want to use before running the GRANT script which is provided, or
> deal with GRANT'ing the rights out after installing some extension.

That doesn't sound very hard.

> With the approach that Dave and I are advocating, we can avoid all of
> that.  Contrib modules can bake-in GRANTs to the appropriate roles,
> upgrades can be handled smoothly even when we add new capabilities which
> are appropriate, users have a simple and straight-forward way to set up
> good monitoring, and tool authors will know what permissions are
> available and can finally have a better answer than "well, just make the
> monior user superuser if you want to avoid all these complexities."

They can have that anyway.  They just have to run a script provided by
the tool rather than one provided by the core project as a
one-size-fits-all solution for every tool.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] parallel "return query" is no good

2017-03-24 Thread Robert Haas
On Thu, Mar 23, 2017 at 1:53 PM, Alvaro Herrera
 wrote:
> Robert Haas wrote:
>> I guess the downside of back-patching this is that it could cause a
>> plan change for somebody which ends up being worse.  On the whole,
>> serial execution of queries intended to be run in parallel isn't
>> likely to work out well, but it's always possible somebody has a cases
>> where it happens to be winning, and this could break it.  So maybe I
>> should do this only in master?  Thoughts?
>
> I think that the chances of someone depending on a parallel plan running
> serially by accident which is better than the non-parallel plan, are
> pretty slim.
>
> +1 for back-patching.

All right, done.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Report the number of skipped frozen pages by manual VACUUM

2017-03-24 Thread Fujii Masao
On Thu, Mar 23, 2017 at 4:28 AM, Fujii Masao  wrote:
> On Wed, Mar 15, 2017 at 7:51 PM, Masahiko Sawada  
> wrote:
>> On Wed, Mar 15, 2017 at 1:09 PM, Yugo Nagata  wrote:
>>> On Fri, 10 Mar 2017 20:08:42 +0900
>>> Masahiko Sawada  wrote:
>>>
 On Fri, Mar 10, 2017 at 3:58 PM, Jim Nasby  wrote:
 > On 3/6/17 8:34 PM, Masahiko Sawada wrote:
 >>
 >> I don't think it can say "1 frozen pages" because the number of
 >> skipped pages according to visibility map is always more than 32
 >> (SKIP_PAGES_THRESHOLD).
 >
 >
 > That's just an artifact of how the VM currently works. I'm not a fan of
 > cross dependencies like that unless there's a pretty good reason.

 Thank you for the comment.
 Agreed. Attached fixed version patch.
>>>
>>> It seems that it says "frozen pages" if pinskipped_pages is not zero,
>>> rather than about frozenskipped_pages.
>>>
>>> How about writing as below?
>>>
>>> appendStringInfo(&buf, ngettext("Skipped %u page due to buffer pins, ",
>>>                                 "Skipped %u pages due to buffer pins, ",
>>>                                 vacrelstats->pinskipped_pages),
>>>                  vacrelstats->pinskipped_pages);
>>> appendStringInfo(&buf, ngettext("%u frozen page.\n", "%u frozen pages.\n",
>>>                                 vacrelstats->frozenskipped_pages),
>>>                  vacrelstats->frozenskipped_pages);
>>>
>>
>> Yeah, you're right. I've attached the updated version patch
>> incorporated your comment and fixing documentation.
>
> The patch looks very simple and good to me.
> Barring objection, I will commit this.

Pushed.

Regards,

-- 
Fujii Masao




Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-24 Thread Andres Freund
Hi,

On 2017-03-24 11:26:27 -0400, Tom Lane wrote:
> Another modest proposal:
> 
> I'm not really sold on the approach of using EEOP_FETCHSOME opcodes to
> trigger initial tupleslot de-forming.  Certainly we want to have a single
> slot_getsomeattrs call per source slot, but as-is, we need a separate
> traversal over the expression tree just to precompute the max attribute
> number needed.  That can't be good for expression compile speed, and
> it introduces nontrivial bug risks because the code that does that
> is completely decoupled from the code that emits the EEOP_VAR opcodes
> (which are what's really relying on the de-forming to have happened).

Hm.  We had the separate traversal for projections for a long while, and
I don't think there've been a a lot of changes to the extraction of the
last attribute number.  I'm very doubtful that the cost of traversing
the expression twice is meaningful in comparison to the other costs.


> What do you think about a design like this:
> 
> * Drop the FETCHSOME opcodes.
> 
> * Add fields to struct ExprState that will hold the maximum inner,
> outer, and scan attribute numbers needed.
> 
> * ExecInitExpr initializes those fields to zero, and then during
> ExecInitExprRec, whenever we generate an EEOP_VAR opcode, we do e.g.
> 
>   state->last_inner_attno = Max(state->last_inner_attno,
> variable->varattno);
> 
> * ExecInitExprSlots, get_last_attnums_walker, etc all go away.
> 
> * In the startup segment of ExecInterpExpr, add
> 
>   if (state->last_inner_attno > 0)
>   slot_getsomeattrs(innerslot, state->last_inner_attno);
>   if (state->last_outer_attno > 0)
>   slot_getsomeattrs(outerslot, state->last_outer_attno);
>   if (state->last_scan_attno > 0)
>   slot_getsomeattrs(scanslot, state->last_scan_attno);
> 
> This would be a little different from the existing code as far as runtime
> branch-prediction behavior goes, but it's not apparent to me that it'd be
> any worse.

I'd be suprised if it weren't.

I'm not super strongly against this setup, but I fail to see the benefit
of whacking this around.  I've benchmarked the previous/current setup
fairly extensively, and I'd rather not redo that.  In contrast to the
other changes you've talked about, this definitely is in the hot-path...


> Also, for JIT purposes it'd still be entirely possible to compile the
> slot_getsomeattrs calls in-line; you'd just be looking to a different
> part of the ExprState struct to find out what to do.

Yea, we could do that.

Greetings,

Andres Freund




Re: [HACKERS] Re: Declarative partitioning optimization for large amount of partitions

2017-03-24 Thread Aleksander Alekseev
Hi Teodor,

Thanks a lot for a review!

> > step1 In pgstat_report_stat() you remove one by one entries from hash and
> > remove them all. Isn't it better to hash_destroy/hash_create or even let 
> > hash
> > lives in separate memory context and just resets it?

Agree, fixed.

> > step1 Again, pgstat_report_stat(), all-zero entries aren't deleted from hash
> > although they will be free from point of view of pgStatTabList.

Good point! Fixed.

-- 
Best regards,
Aleksander Alekseev
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index b704788eb5..4060f241e2 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -161,6 +161,20 @@ typedef struct TabStatusArray
 static TabStatusArray *pgStatTabList = NULL;
 
 /*
+ * pgStatTabHash entry
+ */
+typedef struct TabStatHashEntry
+{
+	Oid t_id;
+	PgStat_TableStatus* tsa_entry;
+} TabStatHashEntry;
+
+/*
+ * Hash table for O(1) t_id -> tsa_entry lookup
+ */
+static HTAB *pgStatTabHash = NULL;
+
+/*
  * Backends store per-function info that's waiting to be sent to the collector
  * in this hash table (indexed by function OID).
  */
@@ -824,6 +838,12 @@ pgstat_report_stat(bool force)
 	}
 
 	/*
+	 * pgStatTabHash is outdated on this point so we have to clean it.
+	 */
+	hash_destroy(pgStatTabHash);
+	pgStatTabHash = NULL;
+
+	/*
 	 * Send partial messages.  Make sure that any pending xact commit/abort
 	 * gets counted, even if there are no table stats to send.
 	 */
@@ -1668,59 +1688,87 @@ pgstat_initstats(Relation rel)
 }
 
 /*
+ * Make sure pgStatTabList and pgStatTabHash are initialized.
+ */
+static void
+make_sure_stat_tab_initialized()
+{
+	HASHCTL			ctl;
+	MemoryContext	new_ctx;
+
+	if(!pgStatTabList)
+	{
+		/* This is first time procedure is called */
+		pgStatTabList = (TabStatusArray *) MemoryContextAllocZero(TopMemoryContext,
+sizeof(TabStatusArray));
+	}
+
+	if(pgStatTabHash)
+		return;
+
+	/* Hash table was freed or never existed.  */
+
+	new_ctx = AllocSetContextCreate(
+		TopMemoryContext,
+		"PGStatLookupHashTableContext",
+		ALLOCSET_DEFAULT_SIZES);
+
+	memset(&ctl, 0, sizeof(ctl));
+	ctl.keysize = sizeof(Oid);
+	ctl.entrysize = sizeof(TabStatHashEntry);
+	ctl.hcxt = new_ctx;
+
+	pgStatTabHash = hash_create("pgstat t_id to tsa_entry lookup hash table",
+		TABSTAT_QUANTUM, &ctl, HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
+}
+
+/*
  * get_tabstat_entry - find or create a PgStat_TableStatus entry for rel
  */
 static PgStat_TableStatus *
 get_tabstat_entry(Oid rel_id, bool isshared)
 {
+	TabStatHashEntry* hash_entry;
 	PgStat_TableStatus *entry;
 	TabStatusArray *tsa;
-	TabStatusArray *prev_tsa;
-	int			i;
+	bool found;
+
+	make_sure_stat_tab_initialized();
 
 	/*
-	 * Search the already-used tabstat slots for this relation.
+	 * Find an entry or create a new one.
 	 */
-	prev_tsa = NULL;
-	for (tsa = pgStatTabList; tsa != NULL; prev_tsa = tsa, tsa = tsa->tsa_next)
+	hash_entry = hash_search(pgStatTabHash, &rel_id, HASH_ENTER, &found);
+	if(found)
+		return hash_entry->tsa_entry;
+
+	/*
+	 * `hash_entry` was just created and now we have to fill it.
+	 * First make sure there is a free space in a last element of pgStatTabList.
+	 */
+	tsa = pgStatTabList;
+	while(tsa->tsa_used == TABSTAT_QUANTUM)
 	{
-		for (i = 0; i < tsa->tsa_used; i++)
+		if(tsa->tsa_next == NULL)
 		{
-			entry = &tsa->tsa_entries[i];
-			if (entry->t_id == rel_id)
-				return entry;
+			tsa->tsa_next = (TabStatusArray *) MemoryContextAllocZero(TopMemoryContext,
+		sizeof(TabStatusArray));
 		}
 
-		if (tsa->tsa_used < TABSTAT_QUANTUM)
-		{
-			/*
-			 * It must not be present, but we found a free slot instead. Fine,
-			 * let's use this one.  We assume the entry was already zeroed,
-			 * either at creation or after last use.
-			 */
-			entry = &tsa->tsa_entries[tsa->tsa_used++];
-			entry->t_id = rel_id;
-			entry->t_shared = isshared;
-			return entry;
-		}
+		tsa = tsa->tsa_next;
 	}
 
 	/*
-	 * We ran out of tabstat slots, so allocate more.  Be sure they're zeroed.
-	 */
-	tsa = (TabStatusArray *) MemoryContextAllocZero(TopMemoryContext,
-	sizeof(TabStatusArray));
-	if (prev_tsa)
-		prev_tsa->tsa_next = tsa;
-	else
-		pgStatTabList = tsa;
-
-	/*
-	 * Use the first entry of the new TabStatusArray.
+	 * Add an entry.
 	 */
 	entry = &tsa->tsa_entries[tsa->tsa_used++];
 	entry->t_id = rel_id;
 	entry->t_shared = isshared;
+
+	/*
+	 * Add a corresponding entry to pgStatTabHash.
+	 */
+	hash_entry->tsa_entry = entry;
 	return entry;
 }
 
@@ -1732,22 +1780,19 @@ get_tabstat_entry(Oid rel_id, bool isshared)
 PgStat_TableStatus *
 find_tabstat_entry(Oid rel_id)
 {
-	PgStat_TableStatus *entry;
-	TabStatusArray *tsa;
-	int			i;
+	TabStatHashEntry* hash_entry;
 
-	for (tsa = pgStatTabList; tsa != NULL; tsa = tsa->tsa_next)
-	{
-		for (i = 0; i < tsa->tsa_used; i++)
-		{
-			entry = &tsa->tsa_entries[i];
-			if (entry->t_id == rel_id)
-				return entry;
-		}
-	}
+	/*
+	 * There are no entries at all.
+	 */
+	if(!pgStatTabHash)
+		

Re: [HACKERS] [COMMITTERS] pgsql: Avoid SnapshotResetXmin() during AtEOXact_Snapshot()

2017-03-24 Thread Andres Freund
On 2017-03-24 13:50:54 -0400, Robert Haas wrote:
> On Fri, Mar 24, 2017 at 12:27 PM, Robert Haas  wrote:
> > On Fri, Mar 24, 2017 at 12:14 PM, Robert Haas  wrote:
> >> On Fri, Mar 24, 2017 at 10:23 AM, Simon Riggs  
> >> wrote:
> >>> Avoid SnapshotResetXmin() during AtEOXact_Snapshot()
> >>>
> >>> For normal commits and aborts we already reset PgXact->xmin
> >>> Avoiding touching highly contented shmem improves concurrent
> >>> performance.
> >>>
> >>> Simon Riggs
> >>
> >> I'm getting occasional crashes with backtraces that look like this:
> >>
> >> #4  0x000107e4be2b in AtEOXact_Snapshot (isCommit=<value temporarily unavailable, due to optimizations>, isPrepare=0 '\0') at
> >> snapmgr.c:1154
> >> #5  0x000107a76c06 in CleanupTransaction () at xact.c:2643
> >>
> >> I suspect that is the fault of this patch.  Please fix or revert.
> >
> > Also, the entire buildfarm is turning red.
> >
> > longfin, spurfowl, and magpie all show this assertion failure in the
> > log.  I haven't checked the others.
> >
> > TRAP: FailedAssertion("!(MyPgXact->xmin == 0)", File: "snapmgr.c", Line: 
> > 1154)
> 
> Another thing that is interesting is that when I run make -j8
> check-world, the overall tests appear to succeed even though there are
> failures mid-way through:
> 
> test tablefunc... FAILED (test process exited with exit code 
> 2)
> 
> ...but then later we end with:
> 
> ok
> All tests successful.
> Files=11, Tests=80, 251 wallclock secs ( 0.07 usr  0.02 sys + 19.77
> cusr 14.45 csys = 34.31 CPU)
> Result: PASS

> real4m27.421s
> user3m50.047s
> sys1m31.937s

> That's unrelated to the current problem of course, but it seems to
> suggest that make's -j option doesn't entirely do what you'd expect
> when used with make check-world.
> 

That's likely the output of a different test from the one that failed.
It's a lot easier to see the result if you're doing something like
make check-world && echo success || echo failure

- Andres




Re: [HACKERS] extended statistics: n-distinct

2017-03-24 Thread Alvaro Herrera
Pushed this after some more tweaking.

-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] [COMMITTERS] pgsql: Avoid SnapshotResetXmin() during AtEOXact_Snapshot()

2017-03-24 Thread Robert Haas
On Fri, Mar 24, 2017 at 12:14 PM, Robert Haas  wrote:
> On Fri, Mar 24, 2017 at 10:23 AM, Simon Riggs  wrote:
>> Avoid SnapshotResetXmin() during AtEOXact_Snapshot()
>>
>> For normal commits and aborts we already reset PgXact->xmin
>> Avoiding touching highly contented shmem improves concurrent
>> performance.
>>
>> Simon Riggs
>
> I'm getting occasional crashes with backtraces that look like this:
>
> #4  0x000107e4be2b in AtEOXact_Snapshot (isCommit=<value temporarily unavailable, due to optimizations>, isPrepare=0 '\0') at
> snapmgr.c:1154
> #5  0x000107a76c06 in CleanupTransaction () at xact.c:2643
>
> I suspect that is the fault of this patch.  Please fix or revert.

Also, the entire buildfarm is turning red.

longfin, spurfowl, and magpie all show this assertion failure in the
log.  I haven't checked the others.

TRAP: FailedAssertion("!(MyPgXact->xmin == 0)", File: "snapmgr.c", Line: 1154)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Supporting huge pages on Windows

2017-03-24 Thread Ashutosh Sharma
Hi,

On Fri, Mar 24, 2017 at 9:00 PM, David Steele  wrote:
> Hi Ashutosh,
>
> On 3/22/17 8:52 AM, Amit Kapila wrote:
>>
>> On Wed, Mar 22, 2017 at 12:07 AM, David Steele 
>> wrote:
>>>
>>>
>>> Amit, Magnus, you are signed up as reviewers for this patch.  Do you know
>>> when you'll have a chance to take a look?
>>>
>>
>> I have provided my feedback and I could not test it on my machine.  I
>> think Ashutosh Sharma has tested it.  I can give it another look, but
>> not sure if it helps.
>
>
> I know you are not signed up as a reviewer, but have you take a look at this
> patch?

Well, I had a quick look into the patch just because I wanted to test
it, as I have a Windows setup. But I never looked into the patch
from a reviewer's perspective. The intention was to reverify the test
results shared by Tsunawaka.

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com




Re: [HACKERS] Monitoring roles patch

2017-03-24 Thread Robert Haas
On Fri, Mar 24, 2017 at 12:46 PM, Dave Page  wrote:
> On Fri, Mar 24, 2017 at 4:24 PM, Robert Haas  wrote:
>>> If we make the users run all the statements individually then they'll
>>> also have to get an updated script for the next version of PG too
>>> because we will have added things that the tools will want access to.
>>
>> That's possible, but it really depends on the tool, not on core
>> PostgreSQL.  The tool should be the one providing the script, because
>> the tool is what knows its own permissions requirements.  Users who
>> care about security won't want to grant every privilege given by a
>> pg_monitor role to a tool that only needs a subset of those
>> privileges.
>
> The upshot of this would be that every time there's a database server
> upgrade that changes the permissions required somehow, the user has to
> login to every server they have and run a script. It is no longer a
> one-time thing, which makes it vastly more painful to deal with
> upgrades.

So, I might be all wet here, but I would have expected that changes on
the TOOL side would be vastly more frequent.  I mean, we do not change
what a certain builtin permission does very often.  If we add
pg_read_all_settings, what is the likelihood that the remit of that
role is ever going to change?  I would judge that was is vastly more
likely is that a new version of some tool would start needing that
privilege (or some other) where it didn't before.  If that happens,
and the script is on the tool side, then you just add a new line to
the script.  If the script is built into initdb, then you've got to
wait for the next major release before you can update the definition
of pg_monitor - and maybe argue with other tool authors with different
requirements.

>>> With the approach that Dave and I are advocating, we can avoid all of
>>> that.  Contrib modules can bake-in GRANTs to the appropriate roles,
>>> upgrades can be handled smoothly even when we add new capabilities which
>>> are appropriate, users have a simple and straight-forward way to set up
>>> good monitoring, and tool authors will know what permissions are
>>> available and can finally have a better answer than "well, just make the
>>> monior user superuser if you want to avoid all these complexities."
>>
>> They can have that anyway.  They just have to run a script provided by
>> the tool rather than one provided by the core project as a
>> one-size-fits-all solution for every tool.
>
> Do you object to having individual default roles specifically for
> cases where there are superuser-only checks at the moment that prevent
> GRANT? e.g. pg_show_all_settings for SHOW, pg_show_all_stats for
> pg_tablespace_size and friends, pg_stat_statements for, well,
> pg_stat_statements and so on? It would be an inferior solution in my
> opinion, given that it would demonstrably cause users more work, but
> at least we could do something.

I think pg_show_all_settings is brilliant.  It's narrowly targeted and
in no way redundant with what can anyway be done with GRANT otherwise.
As far as the others, I think that we should just let people GRANT
privileges one by one whenever that's possible, and use built-in roles
where it isn't.  So I'm fine with this:

-if (tblspcOid != MyDatabaseTableSpace)
+if (tblspcOid != MyDatabaseTableSpace &&
+!is_member_of_role(GetUserId(), DEFAULT_ROLE_READ_ALL_STATS))

But I don't like this:

+GRANT EXECUTE ON FUNCTION pgstattuple(text) TO pg_read_all_stats;

My position is that execute permission on a function is already
grantable, so granting it by default to a built-in role is just
removing flexibility which would otherwise be available to the user.
I do understand that in some cases that may result in a long list of
GRANT commands to make a particular monitoring tool work, but I think
the right solution is to package those commands in an extension or
script distributed with that tool.  When there's a permission that is
reasonably necessary for monitoring tools (or some other reasonable
purpose) and we can't arrange for that permission to be given via a
GRANT EXECUTE ON FUNCTION, then I support adding built-in roles for
those things.
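
To be concrete, what I have in mind is a short grant script shipped and
versioned with the monitoring tool itself, something like this (the role
name and the exact grant list are hypothetical, of course):

-- grants.sql, distributed with the monitoring tool
CREATE ROLE monitor LOGIN;
GRANT pg_read_all_settings TO monitor;
GRANT EXECUTE ON FUNCTION pgstattuple(text) TO monitor;

When the tool's permission needs change, the tool updates its own
script; nothing has to wait for a new major release of PostgreSQL.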

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [COMMITTERS] pgsql: Avoid SnapshotResetXmin() during AtEOXact_Snapshot()

2017-03-24 Thread Simon Riggs
On 24 March 2017 at 16:14, Robert Haas  wrote:

> I suspect that is the fault of this patch.  Please fix or revert.

Will revert then fix.

-- 
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Re: [COMMITTERS] pgsql: Implement multivariate n-distinct coefficients

2017-03-24 Thread Tom Lane
Alvaro Herrera  writes:
> Tom Lane wrote:
>> Why not use COSTS OFF?  Or I'll put that even more strongly: all the
>> existing regression tests use COSTS OFF, exactly to avoid this sort of
>> machine-dependent output.  There had better be a really damn good
>> reason not to use it here.

> If we use COSTS OFF, the test is completely pointless, as the plans look
> identical regardless of whether the multivariate stats are being used or
> not.

Well, I think you are going to find that the exact costs are far too
fragile to have in the regression test output.  Just because you wish
you could test them this way doesn't mean you can.

> If we had a ROWS option to ANALYZE that showed estimated number of rows
> but not the cost, that would be an option.

Unlikely to be any better.  All these numbers are subject to lots of
noise, eg due to auto-analyze happening at unexpected times, random
sampling during analyze, etc.  If you try to constrain the test case
enough that none of that happens, I wonder how useful it will really be.

What I would suggest is devising a test case whereby you actually
get a different plan shape now than you did before.  That shouldn't
be too terribly hard, or else what was the point?
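
For instance, something along these lines, where the improved group
count estimate could flip the planner's choice of aggregation strategy
(a sketch only -- the CREATE STATISTICS spelling here is the one that
ended up in the PG10 documentation, and whether the shape actually
changes will depend on costing parameters such as work_mem):

CREATE TABLE ndtest (a int, b int);
INSERT INTO ndtest SELECT i/100, i/100 FROM generate_series(1, 10000) s(i);
ANALYZE ndtest;
EXPLAIN (COSTS OFF) SELECT count(*) FROM ndtest GROUP BY a, b;  -- before

CREATE STATISTICS ndtest_nd (ndistinct) ON a, b FROM ndtest;
ANALYZE ndtest;
EXPLAIN (COSTS OFF) SELECT count(*) FROM ndtest GROUP BY a, b;  -- after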

regards, tom lane




[HACKERS] Re: [COMMITTERS] pgsql: Implement multivariate n-distinct coefficients

2017-03-24 Thread Alvaro Herrera
Robert Haas wrote:
> On Fri, Mar 24, 2017 at 1:16 PM, Alvaro Herrera  
> wrote:
> > Implement multivariate n-distinct coefficients
> 
> dromedary and arapaima have failures like this, which seems likely
> related to this commit:
> 
>   EXPLAIN
>SELECT COUNT(*) FROM ndistinct GROUP BY a, d;
>QUERY PLAN
>   -
> !  HashAggregate  (cost=225.00..235.00 rows=1000 width=16)
>  Group Key: a, d
!    ->  Seq Scan on ndistinct  (cost=0.00..150.00 rows=1 width=8)
>   (3 rows)

Yes.  What seems to be going on here, is that both arapaima and
dromedary are 32 bit machines; all the 64 bit ones are passing (except
for prion which showed a real relcache bug, which I already stomped).
Now, the difference is that the total cost in those machines for seqscan
is 155 instead of 150.  Tomas suggests that this happens because
MAXALIGN is different, leading to packing tuples differently: the
expected cost (on our laptop's 64 bit) is 155, and the cost we get in 32
bit arch is 150 -- so 5 pages of difference.  We insert 1000 rows on the
table; 4 bytes per tuple would amount to 40 kB, which is exactly 5
pages.

I'll push an alternate expected file for this test, which we think is
the simplest fix.

-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Re: [COMMITTERS] pgsql: Implement multivariate n-distinct coefficients

2017-03-24 Thread Tom Lane
Alvaro Herrera  writes:
> Robert Haas wrote:
>> dromedary and arapaima have failures like this, which seems likely
>> related to this commit:
>> 
>> EXPLAIN
>> SELECT COUNT(*) FROM ndistinct GROUP BY a, d;
>> QUERY PLAN
>> -
>> !  HashAggregate  (cost=225.00..235.00 rows=1000 width=16)
>> Group Key: a, d
>> !    ->  Seq Scan on ndistinct  (cost=0.00..150.00 rows=1 width=8)
>> (3 rows)

> Yes.  What seems to be going on here, is that both arapaima and
> dromedary are 32 bit machines; all the 64 bit ones are passing (except
> for prion which showed a real relcache bug, which I already stomped).
> Now, the difference is that the total cost in those machines for the
> seqscan is 150 instead of 155.  Tomas suggests that this happens because
> MAXALIGN is different, leading to packing tuples differently: the
> expected cost (on our laptop's 64 bit) is 155, and the cost we get on a
> 32 bit arch is 150 -- so 5 pages of difference.  We insert 10000 rows
> into the table; 4 bytes per tuple amounts to 40 kB, which is exactly 5
> pages.

> I'll push an alternate expected file for this test, which we think is
> the simplest fix.

Why not use COSTS OFF?  Or I'll put that even more strongly: all the
existing regression tests use COSTS OFF, exactly to avoid this sort of
machine-dependent output.  There had better be a really damn good
reason not to use it here.

regards, tom lane




Re: [HACKERS] pg_stat_wal_write statistics view

2017-03-24 Thread Fujii Masao
On Wed, Feb 15, 2017 at 12:53 PM, Haribabu Kommi
 wrote:
>
>
> On Wed, Feb 8, 2017 at 9:36 PM, Amit Kapila  wrote:
>>
>> On Tue, Feb 7, 2017 at 11:47 AM, Haribabu Kommi
>>  wrote:
>> > Hi Hackers,
>> >
>> > I just want to discuss adding of a new statistics view that provides
>> > the information of wal writing details as follows
>> >
>>
>> +1.  I think it will be useful to observe WAL activity.
>
>
> Thanks for your opinion.
>
>> > postgres=# \d pg_stat_wal_writer
>> >             View "pg_catalog.pg_stat_wal_writer"
>> >        Column       |           Type           | Collation | Nullable | Default
>> > --------------------+--------------------------+-----------+----------+---------
>> >  num_backend_writes | bigint                   |           |          |
>> >  num_total_writes   | bigint                   |           |          |
>> >  num_blocks         | bigint                   |           |          |
>> >  total_write_time   | bigint                   |           |          |
>> >  stats_reset        | timestamp with time zone |           |          |
>> >
>> > The columns of the view are
>> > 1. Total number of xlog writes that are called from the backend.
>> > 2. Total number of xlog writes that are called from both backend
>> >  and background workers. (This column can be changed to just
>> >  display on the background writes).
>> > 3. The number of the blocks that are written.
>> > 4. Total write_time of the IO operation it took, this variable data is
>> > filled only when the track_io_timing GUC is enabled.
>>
>> So, here is *write_time* the total time system has spent in WAL
>> writing before the last reset?
>
>
> total write_time spent in WAL writing "after" the last reset in
> milliseconds.
>
>> I think there should be a separate column for write and sync time.
>>
>
> Added.
>
>>
>> > Or it is possible to integrate the new columns into the existing
>> > pg_stat_bgwriter view also.
>> >
>>
>> I feel separate view is better.
>
>
> Ok.
>
> Following the sample out of the view after regress run.
>
> postgres=# select * from pg_stat_walwrites;
> -[ RECORD 1 ]--+--
> backend_writes | 19092
> writes | 663
> write_blocks   | 56116
> write_time | 0
> sync_time  | 3064
> stats_reset| 2017-02-15 13:37:09.454314+11
>
> Currently, writer, walwriter and checkpointer processes
> are considered as background processes that can do
> the wal write mainly.

I'm not sure if this categorization is good. You said at the top of this
thread that this view is useful for tuning walwriter parameters. If so,
ISTM that the walwriter's activity should be separated from the others.

What about other processes which *can* write WAL, for example walsender
(e.g., BASE_BACKUP can cause WAL records), the startup process (e.g., the
end-of-recovery checkpoint) and the logical replication worker (not sure if it
always works with synchronous_commit=off, though). There might be other processes
which can write WAL.

Why didn't you separate "write_blocks", "write_time" and "sync_time" per
the process category, like "backend_writes" and "writes"?

This view doesn't seem to take into consideration the WAL writing and flushing
that happen while creating a new WAL segment file.

I think that it's better to make this view also report the number of WAL pages
which are written when wal_buffers is full. This information is useful for
tuning the size of wal_buffers. This was proposed by Nagayasu before.
https://www.postgresql.org/message-id/4ff824f3.5090...@uptime.jp

Regards,

-- 
Fujii Masao




Re: [HACKERS] UPDATE of partition key

2017-03-24 Thread Amit Khandekar
Thanks Amit for your review comments. I am yet to handle all of your
comments, but meanwhile , attached is an updated patch, that handles
RETURNING.

Earlier it was not working because ExecInsert() did not return any
RETURNING result. This is because the setup needed to create RETURNING
projection info for leaf partitions is done in ExecInitModifyTable()
only in case of INSERT. But because it is an UPDATE operation, we have
to do this explicitly as a one-time operation when it is determined
that row-movement is required. This is similar to how we do one-time
setup of mt_partition_dispatch_info. So in the patch, I have moved
this code into a new function ExecInitPartitionReturningProjection(),
and now this is called in ExecInitModifyTable() as well as during row
movement for ExecInsert() processing the returning clause.

Basically we need to do all that is done in ExecInitModifyTable() for
INSERT. There are a couple of other things that I suspect might
need to be done as part of the missing initialization for ExecInsert()
during row-movement:
1. Junk filter handling
2. WITH CHECK OPTION


Yet, ExecDelete() during row-movement is still returning the RETURNING
result redundantly, which I am yet to handle.

On 23 March 2017 at 07:04, Amit Langote  wrote:
> Hi Amit,
>
> Thanks for the updated patch.
>
> On 2017/03/23 3:09, Amit Khandekar wrote:
>> Attached is v2 patch which implements the above optimization.
>
> Would it be better to have at least some new tests?  Also, there are a few
> places in the documentation mentioning that such updates cause error,
> which will need to be updated.  Perhaps also add some explanatory notes
> about the mechanism (delete+insert), trigger behavior, caveats, etc.
> There were some points discussed upthread that could be mentioned in the
> documentation.

Yeah, agreed. Will do this in the subsequent patch.

>
> @@ -633,6 +634,9 @@ ExecDelete(ItemPointer tupleid,
>  HeapUpdateFailureData hufd;
>  TupleTableSlot *slot = NULL;
>
> +if (already_deleted)
> +*already_deleted = false;
> +
>
> concurrently_deleted?

Done.

>
> @@ -962,7 +969,7 @@ ExecUpdate(ItemPointer tupleid,
>  }
>  else
>  {
> -LockTupleMode lockmode;
> +LockTupleMode   lockmode;
>
> Useless hunk.
Removed.


I am yet to handle your other comments , still working on them, but
till then , attached is the updated patch.


update-partition-key_v3.patch
Description: Binary data



Re: [HACKERS] comment/security label for publication/subscription

2017-03-24 Thread Stephen Frost
* Robert Haas (robertmh...@gmail.com) wrote:
> On Fri, Mar 24, 2017 at 12:18 AM, Peter Eisentraut
>  wrote:
> > Here is a patch to add COMMENT support for publications and subscriptions.
> >
> > On a similar issue, do we need SECURITY LABEL support for those?  Does
> > that make sense?
> 
> IMHO, it's good to have COMMENT and SECURITY LABEL support for pretty
> much everything.

+1

Thanks!

Stephen




Re: [HACKERS] increasing the default WAL segment size

2017-03-24 Thread Robert Haas
On Wed, Mar 22, 2017 at 6:05 PM, David Steele  wrote:
>> Wait, really?  I thought you abandoned this approach because there's
>> then no principled way to handle WAL segments of less than the default
>> size.
>
> I did say that, but I thought I had hit on a compromise.
>
> But, as I originally pointed out the hex characters in the filename are not
> aligned correctly for > 8 bits (< 16MB segments) and using different
> alignments just made it less consistent.

I don't think I understand what the compromise is.  Are you saying we
should have one rule for segments < 16MB and another rule for segments
> 16MB?  I think using two different rules for file naming depending
on the segment size will be a negative for both tool authors and
ordinary users.

> It would be OK if we were willing to drop the 1,2,4,8 segment sizes because
> then the alignment would make sense and not change the current 16MB
> sequence.

Well, that is true.  But the thing I'm trying to do here is to keep
this patch down to what actually needs to be changed in order to
accomplish the original purpose, not squeeze more and more changes
into it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] \if, \elseif, \else, \endif (was Re: PSQL commands: \quit_if, \quit_unless)

2017-03-24 Thread Corey Huinker
>
>
> A few comments about the patch.
>
> Patch applies. "make check" ok.
>
> As already pointed out, there is one useless file in the patch.
>
> Although currently there is only one expected argument for boolean
> expressions, the n² concatenation algorithm in gather_boolean_expression is
> not very elegant. Is there some string buffer data structure which could be
> used instead?
>

I wished for the same thing, happy to use one if it is made known to me.
I pulled that pattern from somewhere else in the code, and given that the
max number of args for a command is around 4, I'm not too worried about
scaling.


>
> ISTM that ignore_boolean_expression may call free on a NULL pointer if the
> expression is empty?
>

True. The psql code is actually littered with a lot of un-checked free(p)
calls, so I started to wonder if maybe we had a wrapper on free() that
checked for NULL. I'll fix this one just to be consistent.


>
> Generally I find the per-command functions rather an improvement.
>

I did too. I tried to split this patch up into two parts, one that broke
out the functions, and one that added if-then, and found that the first
patch was just as unwieldily without the if-then stuff as with.


>
> However there is an impact on testing because of all these changes. ISTM
> that test cases should reflect this situation and test that \cd, \edit, ...
> are indeed ignored properly, taking into account their expected args...
>

I think one grand

\if false
\a
\c some_connect_string
...
\z some_table_name
\endif

should do the trick, but it wouldn't detect memory leaks.


>
> In "exec_command_connect" an argument is changed from "-reuse-previous" to
> "-reuse-previous=", not sure why.
>

It shouldn't have been. Good catch. Most commands were able to be migrated
with simple changes (status => *status, strcmp() if-block becomes
active-if-block, etc), but that one was slightly different.


>
> There seems to be pattern repetition for _ev _ef and _sf _sv. Would it
> make sense to create a function instead of keeping the initial copy-paste?
>

Yes, and a few things like that, but I wanted this patch to keep as much
code as-is as possible.


>
> I think that some functions could be used for some repeated cases such as
> "discard one arg", "discard one or two arg", "discard whole line", for the
> various inactive branches, so as to factor out code.
>

I'd be in favor of that as well


>
> I would suggest to put together all if-related backslash commands, so that
> the stack management is all in one function instead of 4.
>

I recognize the urge to group them together, but would there be any actual
code sharing between them? Wouldn't I be either re-checking the string
"cmd" again, or otherwise setting an enum that I immediately re-check
inside the all_branching_commands() function?


>
> For pset the inactive branch does OT_NORMAL instead of OT_NOT_EVAL, not
> sure why.


An oversight. Good catch.


Re: [HACKERS] Logical replication existing data copy

2017-03-24 Thread Peter Eisentraut
On 3/24/17 05:22, Michael Banck wrote:
> However, replication also seems to not work, I'm using the following
> script right now:

The problem is that your publication does not include any tables.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Logical replication existing data copy

2017-03-24 Thread Peter Eisentraut
On 3/24/17 10:09, Petr Jelinek wrote:
> On 24/03/17 15:05, Peter Eisentraut wrote:
>> On 3/23/17 19:32, Petr Jelinek wrote:
>>> Yes, I also forgot to check if the table actually exists on subscriber
>>> when fetching them in CREATE SUBSCRIPTION (we have check during
>>> replication but not there).
>>
>> I think for this we can probably just change the missing_ok argument of
>> RangeVarGetRelid() to false.
>>
>> Unless we want the custom error message, in which case we need to change
>> AlterSubscription_refresh(), because right now it errors out because
>> missing_ok = false.
>>
> 
> You are right, stupid me.

Committed this version.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] \if, \elseif, \else, \endif (was Re: PSQL commands: \quit_if, \quit_unless)

2017-03-24 Thread Fabien COELHO


Hello Corey,


I wished for the same thing, happy to use one if it is made known to me.
I pulled that pattern from somewhere else in the code, and given that the
max number of args for a command is around 4, I'm not too worried about
scaling.


If there are expressions one day like pgbench, the number of arguments 
becomes arbitrary. Have you looked at PQExpBuffer?
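
For instance, gather_boolean_expression could append into one buffer, 
which makes the quadratic concatenation linear (rough sketch; variable 
names approximate):

	char	   *arg;
	PQExpBufferData buf;

	initPQExpBuffer(&buf);
	while ((arg = psql_scan_slash_option(scan_state,
										 OT_NORMAL, NULL, false)) != NULL)
	{
		if (buf.len > 0)
			appendPQExpBufferChar(&buf, ' ');
		appendPQExpBufferStr(&buf, arg);
		free(arg);
	}
	/* ... hand buf.data to the boolean-expression evaluator ... */
	termPQExpBuffer(&buf);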



However there is an impact on testing because of all these changes. ISTM
that test cases should reflect this situation and test that \cd, \edit, ...
are indeed ignored properly, taking into account their expected args...


I think one grand

\if false
\a
\c some_connect_string
...
\z some_table_name
\endif
should do the trick,


Yes. Maybe some commands could be on the same line as well.


but it wouldn't detect memory leaks.


No miracle...


There seems to be pattern repetition for _ev _ef and _sf _sv. Would it
make sense to create a function instead of keeping the initial copy-paste?


Yes, and a few things like that, but I wanted this patch to keep as much
code as-is as possible.


If you put the generic function in the same place, one copy would be 
more or less kept and the other just removed?


"git diff --patience -w" gives a rather convenient output for looking at 
the patch.



I would suggest to put together all if-related backslash command, so that
the stack management is all in one function instead of 4.


I recognize the urge to group them together, but would there be any actual
code sharing between them? Wouldn't I be either re-checking the string
"cmd" again, or otherwise setting an enum that I immediately re-check
inside the all_branching_commands() function?


I do not see that as a significant issue, especially compared to the 
benefit of having the automaton transition management in a single place.
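
To be concrete, here is a self-contained sketch of the shape I mean 
(all names are invented for illustration; the real thing would of 
course work on psql's scan state, and bounds checks are omitted):

    #include <stdbool.h>
    #include <string.h>

    typedef enum { IF_TRUE, IF_FALSE, IF_DONE } IfState;

    /* every \if-family command goes through this one function */
    static bool
    branch_command(const char *cmd, bool expr_val,
                   IfState *stack, int *depth)
    {
        if (strcmp(cmd, "if") == 0)
            stack[(*depth)++] = expr_val ? IF_TRUE : IF_FALSE;
        else if (strcmp(cmd, "elif") == 0)
        {
            if (stack[*depth - 1] == IF_TRUE)
                stack[*depth - 1] = IF_DONE;   /* a branch already ran */
            else if (stack[*depth - 1] == IF_FALSE && expr_val)
                stack[*depth - 1] = IF_TRUE;   /* this branch fires */
        }
        else if (strcmp(cmd, "else") == 0)
            stack[*depth - 1] =
                (stack[*depth - 1] == IF_FALSE) ? IF_TRUE : IF_DONE;
        else if (strcmp(cmd, "endif") == 0)
            (*depth)--;
        else
            return false;       /* not a branching command */
        return true;    /* lines execute iff the whole stack is IF_TRUE */
    }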


--
Fabien.




Re: [HACKERS] comment/security label for publication/subscription

2017-03-24 Thread Robert Haas
On Fri, Mar 24, 2017 at 12:18 AM, Peter Eisentraut
 wrote:
> Here is a patch to add COMMENT support for publications and subscriptions.
>
> On a similar issue, do we need SECURITY LABEL support for those?  Does
> that make sense?

IMHO, it's good to have COMMENT and SECURITY LABEL support for pretty
much everything.
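
e.g., presumably spelled the obvious way:

    COMMENT ON PUBLICATION mypub IS 'tables shipped to reporting';
    COMMENT ON SUBSCRIPTION mysub IS 'feed from the OLTP primary';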

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Guidelines for GSoC student proposals / Eliminate O(N^2) scaling from rw-conflict tracking in serializable transactions

2017-03-24 Thread Kevin Grittner
On Wed, Mar 22, 2017 at 2:24 AM, Mengxing Liu
 wrote:

> I've finished a draft proposal for "Eliminate O(N^2) scaling from
> rw-conflict tracking in serializable transactions".

I've attached some comments to the document; let me know if they
don't show up for you or you need clarification.

Overall, if the comments are addressed, I think it is great.

--
Kevin Grittner




Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-24 Thread Tom Lane
Andres Freund  writes:
> On 2017-03-24 11:26:27 -0400, Tom Lane wrote:
>> Another modest proposal:
>> 
>> I'm not really sold on the approach of using EEOP_FETCHSOME opcodes to
>> trigger initial tupleslot de-forming.  Certainly we want to have a single
>> slot_getsomeattrs call per source slot, but as-is, we need a separate
>> traversal over the expression tree just to precompute the max attribute
>> number needed.  That can't be good for expression compile speed, and
>> it introduces nontrivial bug risks because the code that does that
>> is completely decoupled from the code that emits the EEOP_VAR opcodes
>> (which are what's really relying on the de-forming to have happened).

> Hm.  We had the separate traversal for projections for a long while, and
> I don't think there've been a lot of changes to the extraction of the
> last attribute number.

That's easily disproven just by looking at the code:

/*
 * Don't examine the arguments or filters of Aggrefs or WindowFuncs,
 * because those do not represent expressions to be evaluated within the
 * calling expression's econtext.  GroupingFunc arguments are never
 * evaluated at all.
 */
if (IsA(node, Aggref))
return false;
if (IsA(node, WindowFunc))
return false;
if (IsA(node, GroupingFunc))
return false;
return expression_tree_walker(node, get_last_attnums_walker,
  (void *) info);

The WindowFunc exception hasn't been there so long, and the GroupingFunc
one is very new.  And who's to say whether e.g. the recent XMLTABLE patch
got this right at all?  We could easily be extracting columns we don't
need to.

I'm willing to leave this as-is for the moment, but I really think we
should look into changing it (after the main patch is in).

regards, tom lane




Re: [HACKERS] pageinspect and hash indexes

2017-03-24 Thread Robert Haas
On Fri, Mar 24, 2017 at 3:54 AM, Amit Kapila  wrote:
> On Fri, Mar 24, 2017 at 9:50 AM, Ashutosh Sharma  
> wrote:
>> On Fri, Mar 24, 2017 at 9:46 AM, Ashutosh Sharma  
>> wrote:
>>>
>>> Thanks for reviewing my patch. I have removed the extra white space.
>>> Attached are both the patches.
>>
>> Sorry, I have mistakenly attached wrong patch. Here are the correct
>> set of patches.
>
> The latest version of patches looks fine to me.

I don't really like 0002.  What about this, instead?

--- a/contrib/pageinspect/hashfuncs.c
+++ b/contrib/pageinspect/hashfuncs.c
@@ -80,7 +80,8 @@ verify_hash_page(bytea *raw_page, int flags)
 /* Check that page type is sane. */
 pagetype = pageopaque->hasho_flag & LH_PAGE_TYPE;
 if (pagetype != LH_OVERFLOW_PAGE && pagetype != LH_BUCKET_PAGE &&
-pagetype != LH_BITMAP_PAGE && pagetype != LH_META_PAGE)
+pagetype != LH_BITMAP_PAGE && pagetype != LH_META_PAGE &&
+pagetype != LH_UNUSED_PAGE)
 ereport(ERROR,
 (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
  errmsg("invalid hash page type %08x", pagetype)));

The advantage of that is (1) it won't get confused if in the future we
have an unused page that has some flag bit not in LH_PAGE_TYPE set,
and (2) if in the future we want to add any other checks to this
function which should apply to unused pages also, they won't get
bypassed by an early return statement.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] ANALYZE command progress checker

2017-03-24 Thread Robert Haas
On Fri, Mar 24, 2017 at 3:41 AM, vinayak
 wrote:
> I have updated the patch.

You can't change the definition of AcquireSampleRowsFunc without
updating the documentation in fdwhandler.sgml, but I don't
immediately understand why that's a thing we want to do at all if
neither file_fdw nor postgres_fdw are going to use the additional
argument.  It seems to be entirely pointless because the new value
being passed down is always zero and even if the callee were to update
it, do_analyze_rel() would just ignore the change on return.  Am I
missing something here?  Also, the whitespace-only changes to fdwapi.h
related to AcquireSampleRowsFunc are going to get undone by pgindent, so
even if there's some reason to change that, you should leave the lines
that don't need changing untouched.
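
For reference, the callback signature as it stands in fdwapi.h is:

    typedef int (*AcquireSampleRowsFunc) (Relation relation, int elevel,
                                          HeapTuple *rows, int targrows,
                                          double *totalrows,
                                          double *totaldeadrows);

so any new argument has to pull its weight for every FDW, not just for
core ANALYZE.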

+/* Reset rows processed to zero for the next column */
+pgstat_progress_update_param(PROGRESS_ANALYZE_NUM_ROWS_PROCESSED, 0);

This seems like a bad design.  If you see a value other than zero in
this field, you still won't know how much progress analyze has made,
because you don't know which column you're processing.  I also feel
like you're not paying enough attention to a key point here that I
made in my last review, which is that you need to look at where
ANALYZE actually spends most of the time.  If the process of computing
the per-row statistics consumes significant CPU time, then maybe
there's some point in trying to instrument it, but does it?  Unless
you know the answer to that question, you don't know whether there's
even a point to trying to instrument this.
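
And if it does turn out to be worth instrumenting, the counter at
least needs a companion field saying which column is being processed,
so the pair can be interpreted together. Something like this sketch,
where PROGRESS_ANALYZE_CURRENT_ATTNUM is a made-up constant that does
not exist today:

    /* report the column first, then reset the per-column counter */
    pgstat_progress_update_param(PROGRESS_ANALYZE_CURRENT_ATTNUM, attnum);
    pgstat_progress_update_param(PROGRESS_ANALYZE_NUM_ROWS_PROCESSED, 0);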

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] pageinspect and hash indexes

2017-03-24 Thread Ashutosh Sharma
>>>> Thanks for reviewing my patch. I have removed the extra white space.
>>>> Attached are both the patches.
>>>
>>> Sorry, I have mistakenly attached wrong patch. Here are the correct
>>> set of patches.
>>
>> The latest version of patches looks fine to me.
>
> I don't really like 0002.  What about this, instead?
>
> --- a/contrib/pageinspect/hashfuncs.c
> +++ b/contrib/pageinspect/hashfuncs.c
> @@ -80,7 +80,8 @@ verify_hash_page(bytea *raw_page, int flags)
>  /* Check that page type is sane. */
>  pagetype = pageopaque->hasho_flag & LH_PAGE_TYPE;
>  if (pagetype != LH_OVERFLOW_PAGE && pagetype != LH_BUCKET_PAGE &&
> -pagetype != LH_BITMAP_PAGE && pagetype != LH_META_PAGE)
> +pagetype != LH_BITMAP_PAGE && pagetype != LH_META_PAGE &&
> +pagetype != LH_UNUSED_PAGE)
>  ereport(ERROR,
>  (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
>   errmsg("invalid hash page type %08x", pagetype)));
>
> The advantage of that is (1) it won't get confused if in the future we
> have an unused page that has some flag bit not in LH_PAGE_TYPE set,
> and (2) if in the future we want to add any other checks to this
> function which should apply to unused pages also, they won't get
> bypassed by an early return statement.

Agreed. Moreover, the previous approach might even have changed the
current behaviour of functions like hash_page_items() and
hash_page_stats() when we pass UNUSED pages to them. Attached is the
newer version of the patch.
From 5c54fa58895cd279ff44424a3eb1feafdaca69d5 Mon Sep 17 00:00:00 2001
From: ashu 
Date: Sat, 25 Mar 2017 01:02:35 +0530
Subject: [PATCH] Allow pageinspect to handle UNUSED hash pages v3

---
 contrib/pageinspect/hashfuncs.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/contrib/pageinspect/hashfuncs.c b/contrib/pageinspect/hashfuncs.c
index 812a03f..2632287 100644
--- a/contrib/pageinspect/hashfuncs.c
+++ b/contrib/pageinspect/hashfuncs.c
@@ -80,7 +80,8 @@ verify_hash_page(bytea *raw_page, int flags)
 	/* Check that page type is sane. */
 	pagetype = pageopaque->hasho_flag & LH_PAGE_TYPE;
 	if (pagetype != LH_OVERFLOW_PAGE && pagetype != LH_BUCKET_PAGE &&
-		pagetype != LH_BITMAP_PAGE && pagetype != LH_META_PAGE)
+		pagetype != LH_BITMAP_PAGE && pagetype != LH_META_PAGE &&
+		pagetype != LH_UNUSED_PAGE)
 		ereport(ERROR,
 (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
  errmsg("invalid hash page type %08x", pagetype)));
-- 
1.8.3.1




Re: [HACKERS] Logical replication existing data copy

2017-03-24 Thread Michael Banck
Am Freitag, den 24.03.2017, 14:57 -0400 schrieb Peter Eisentraut:
> On 3/24/17 05:22, Michael Banck wrote:
> > However, replication also seems to not work, I'm using the following
> > script right now:
> 
> The problem is that your publication does not include any tables.

Oops, of course. That part must've got lost in a copy-paste, or when I
tried to dig further into why the CREATE SUBSCRIPTION segfaulted; sorry
for the noise.


Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.ba...@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer






Re: [HACKERS] Re: [COMMITTERS] pgsql: Implement multivariate n-distinct coefficients

2017-03-24 Thread Alvaro Herrera
Tom Lane wrote:

> Why not use COSTS OFF?  Or I'll put that even more strongly: all the
> existing regression tests use COSTS OFF, exactly to avoid this sort of
> machine-dependent output.  There had better be a really damn good
> reason not to use it here.

If we use COSTS OFF, the test is completely pointless, as the plans look
identical regardless of whether the multivariate stats are being used or
not.

If we had a ROWS option to EXPLAIN that showed the estimated number of
rows but not the cost, that would be an option.
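
Something along these lines, where both the syntax and the output are
hypothetical:

    EXPLAIN (COSTS OFF, ROWS) SELECT * FROM t WHERE a = 1 AND b = 1;
              QUERY PLAN
    ---------------------------------
     Seq Scan on t (rows=98)
       Filter: ((a = 1) AND (b = 1))

That would let the test show that the estimate moves when the
multivariate statistics kick in, without printing machine-dependent
costs.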

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] \if, \elseif, \else, \endif (was Re: PSQL commands: \quit_if, \quit_unless)

2017-03-24 Thread Corey Huinker
On Fri, Mar 24, 2017 at 4:10 PM, Fabien COELHO  wrote:

>
> Hello Corey,
>
> I wished for the same thing, happy to use one if it is made known to me.
>> I pulled that pattern from somewhere else in the code, and given that the
>> max number of args for a command is around 4, I'm not too worried about
>> scaling.
>>
>
> If there are expressions one day, as in pgbench, the number of arguments
> becomes arbitrary. Have you looked at PQExpBuffer?


I will look into it.


>
>>> There seems to be pattern repetition for _ev _ef and _sf _sv. Would it
>>> make sense to create a function instead of keeping the initial
>>> copy-paste?
>>>
>>
>> Yes, and a few things like that, but I wanted this patch to keep as much
>> code as-is as possible.
>>
>
> If you put the generic function in the same place, one copy would be
> more or less kept and the other just removed?


> "git diff --patience -w" gives a rather convenient output for looking at
> the patch.


Good to know about that option.

As for a function for digesting ignored slash options, it seems like I can
disregard the true/false value of the "semicolon" parameter. Is that
correct?


> I would suggest to put together all if-related backslash command, so that
>>> the stack management is all in one function instead of 4.
>>>
>>
>> I recognize the urge to group them together, but would there be any actual
>> code sharing between them? Wouldn't I be either re-checking the string
>> "cmd" again, or otherwise setting an enum that I immediately re-check
>> inside the all_branching_commands() function?
>>
>
> I do not see that as a significant issue, especially compared to the
> benefit of having the automaton transition management in a single place.


I'm still struggling to see how this would add any clarity to the code
beyond what I can achieve by clustering the
exec_command_(if/elif/else/endif) functions near one another.


Re: [HACKERS] delta relations in AFTER triggers

2017-03-24 Thread Thomas Munro
On Fri, Mar 24, 2017 at 5:36 PM, Thomas Munro
 wrote:
> On Fri, Mar 24, 2017 at 1:14 PM, Thomas Munro
>  wrote:
>> If that's fixed and the permissions question can be waved away by
>> saying it's the same as the per-row situation, my only other comment
>> would be a bikeshed issue: Enr isn't a great name for a struct.
>
> One more thought: should this be allowed?
>
> postgres=# create table mytab (i int) partition by list (i);
> CREATE TABLE
> postgres=# create table mytab1 partition of mytab for values in (42);
> CREATE TABLE
> postgres=# create function my_trigger_function() returns trigger as $$
> begin end; $$ language plpgsql;
> CREATE FUNCTION
> postgres=# create trigger my_trigger after update on mytab referencing
> old table as my_old for each statement execute procedure
> my_trigger_function();
> CREATE TRIGGER

On second thoughts, that's actually arguably a bug in committed code.
What do you think about the attached patch?

-- 
Thomas Munro
http://www.enterprisedb.com


no-transition-tables-for-partitioned-tables.patch
Description: Binary data



Re: [HACKERS] Page Scan Mode in Hash Index

2017-03-24 Thread Robert Haas
On Thu, Mar 23, 2017 at 2:35 PM, Ashutosh Sharma  wrote:
> I take your suggestion. Please find the attachments.

I think you should consider refactoring this so that it doesn't need
to use goto.  Maybe move the while (offnum <= maxoff) logic into a
helper function and have it return itemIndex.  If itemIndex == 0, you
can call it again.  Notice that the code in the "else" branch of the
if (itemIndex == 0) stuff could actually be removed from the else
block without changing anything, because the "if" block always either
returns or does a goto.  That's worthy of a little more work to try to
make things simple and clear.
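
Schematically, something like this (hand-wavy; the helper and the
page-advance function are invented names, I'm only sketching the
control flow):

    /* load matching items from the current page; returns the count */
    itemIndex = _hash_load_qualified_items(scan, page, offnum, dir);
    while (itemIndex == 0)
    {
        /* nothing qualified here; advance to the next page, if any */
        if (!_hash_advance_page(scan, &page, &offnum, dir))
            return false;
        itemIndex = _hash_load_qualified_items(scan, page, offnum, dir);
    }

No goto, and the per-page logic lives in exactly one place.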

+ *  We find the first item(or, if backward scan, the last item) in

Missing space.

In _hash_dropscanbuf(), the handling of hashso_bucket_buf is now
inconsistent with the handling of hashso_split_bucket_buf, which seems
suspicious. Suppose we enter this function with so->hashso_bucket_buf
and so->currPos.buf both being valid buffers, but not the same one.
It looks to me as if we'll release the first pin but not the second
one.  My guess (which could be wrong) is that so->hashso_bucket_buf =
InvalidBuffer should be moved back up higher in the function where it
was before, just after the first if statement, and that the new
condition so->hashso_bucket_buf == so->currPos.buf on the last
_hash_dropbuf() should be removed.  If that's not right, then I think
I need somebody to explain why not.

I am suspicious that _hash_kill_items() is going to have problems if
the overflow page is freed before it reacquires the lock.
_bt_killitems() contains safeguards against similar cases.

This is not a full review, but I'm out of time for the moment.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] dsm.c API tweak

2017-03-24 Thread Thomas Munro
On Sat, Mar 25, 2017 at 1:59 PM, Thomas Munro
 wrote:
> On Sat, Mar 25, 2017 at 12:35 PM, Alvaro Herrera
>  wrote:
>> Alvaro Herrera wrote:
>>> Per
>>> https://postgr.es/m/CAEepm=11ma_Z1HoPxPcSCANnh5ykHORa=hca1u1v1+5s_jw...@mail.gmail.com
>>> it seems that the dsm.c API is a bit inconvenient right now.  I proposed
>>> in the first patch in that thread to change the API so that a segment is
>>> marked as "pinned" if created with no ResourceOwner set as current;
>>> which is essentially the same as creating a fake one, then marking the
>>> segment as pinned.
>>>
>>> Thomas agrees with me, so I propose this patch as preparatory work for
>>> the autovacuum/BRIN stuff I'm interested in.
>>
>> Here's the proposed patch.
>
> +1

-   seg->resowner = CurrentResourceOwner;
-   ResourceOwnerRememberDSM(CurrentResourceOwner, seg);
+   if (CurrentResourceOwner)
+   {
+   seg->resowner = CurrentResourceOwner;
+   ResourceOwnerRememberDSM(CurrentResourceOwner, seg);
+   }

You need to assign seg->resowner = CurrentResourceOwner
unconditionally here.  Otherwise seg->resowner is uninitialised junk.
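
That is, keep the assignment outside the if:

    seg->resowner = CurrentResourceOwner;
    if (CurrentResourceOwner)
        ResourceOwnerRememberDSM(CurrentResourceOwner, seg);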

-- 
Thomas Munro
http://www.enterprisedb.com



