Re: [HACKERS] WIP: [[Parallel] Shared] Hash

2017-03-23 Thread Thomas Munro
Hi,

Here is a new patch series responding to feedback from Peter and Andres:

1.  Support pgstat_report_tempfile and log_temp_files, which I had
overlooked, as Peter pointed out.

2.  Use a patch format that is acceptable to git am, per complaint
off-list from Andres.  (Not actually made with git format-patch; I
need to learn some more git-fu, but they now apply cleanly with git
am).

On Thu, Mar 23, 2017 at 12:55 PM, Peter Geoghegan  wrote:
> I took a quick look at your V9 today. This is by no means a
> comprehensive review of 0008-hj-shared-buf-file-v9.patch, but it's
> what I can manage right now.

Thanks.  I really appreciate your patience with the resource
management stuff I had failed to think through.

> ...
>
> However, I notice that the place that this happens instead,
> PathNameDelete(), does not repeat the fd.c step of doing a final
> stat(), and using the stats for a pgstat_report_tempfile(). So, a new
> pgstat_report_tempfile() call is simply missing. However, the more
> fundamental issue in my mind is: How can you fix that? Where would it
> go if you had it?

You're right.  I may be missing something here (again), but it does
seem straightforward to implement because we always delete each file
that really exists exactly once (and sometimes we also try to delete
files that don't exist due to imprecise meta-data, but that isn't
harmful and we know when that turns out to be the case).

> If you do the obvious thing of just placing that before the new
> unlink() within PathNameDelete(), on the theory that that needs parity
> with the fd.c stuff, that has non-obvious implications. Does the
> pgstat_report_tempfile() call need to happen when going through this
> path, for example?:
>
>> +/*
>> + * Destroy a shared BufFile early.  Files are normally cleaned up
>> + * automatically when all participants detach, but it might be useful to
>> + * reclaim disk space sooner than that.  The caller asserts that no backends
>> + * will attempt to read from this file again and that only one backend will
>> + * destroy it.
>> + */
>> +void
>> +SharedBufFileDestroy(SharedBufFileSet *set, int partition, int participant)
>> +{

Yes, I think it should definitely go into
PathNameDeleteTemporaryFile() (formerly PathNameDelete()).
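
If we do add it there, the shape of the change might be roughly as in the
sketch below.  This is purely illustrative (not the actual patch code), and
it deliberately ignores the double-counting question you raise further down;
the point is just that the fd.c behaviour of stat()-ing the file before
unlinking it can be mirrored in one place:

    /*
     * Illustrative sketch only: report temporary file statistics before
     * unlinking, mirroring what fd.c does for its own temp files.  The real
     * function and error handling in the patch may differ.
     */
    static void
    PathNameDeleteTemporaryFile_sketch(const char *path)
    {
        struct stat st;

        if (stat(path, &st) == 0)
        {
            pgstat_report_tempfile(st.st_size);

            if (log_temp_files >= 0 && st.st_size / 1024 >= log_temp_files)
                ereport(LOG,
                        (errmsg("temporary file: path \"%s\", size %lu",
                                path, (unsigned long) st.st_size)));

            unlink(path);
        }
        /* ENOENT is expected and silently ignored; see the discussion below. */
    }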

> The theory with the unlink()'ing() function PathNameDelete(), I
> gather, is that it doesn't matter if it happens to be called more than
> once, say from a worker and then in an error handling path in the
> leader or whatever. Do I have that right?

Yes, it may be called for a file that doesn't exist, either because it
never existed or because it has already been deleted.  To recap,
there are two reasons it needs to tolerate attempts to delete files
that aren't there:

1.  To be able to delete the fd.c files backing a BufFile given only a
BufFileTag.  We don't know how many segment files there are, but we
know how to build the prefix of the filename, so we try to delete
[prefix].0, [prefix].1, [prefix].2 ... until we get ENOENT and
terminate (see the sketch after this list).  I think this sort of thing
would be more questionable for
durable storage backing a database object, but for temporary files I
can't think of a problem with it.

2.  SharedBufFileSet doesn't actually know how many partitions exist,
it just knows the *range* of partition numbers (because of its
conflicting fixed space and increasable partitions requirements).
From that information it can loop building BufFileTags for all backing
files that *might* exist, and in practice they usually do because we
don't tend to have a 'sparse' range of partitions.
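
For what it's worth, the deletion-by-prefix loop from point 1 looks roughly
like this (a minimal sketch only; it assumes a "prefix" string already built
from the BufFileTag, and the real function names differ):

    char        path[MAXPGPATH];
    int         segment;

    /* Probe prefix.0, prefix.1, ... until we hit a missing segment. */
    for (segment = 0;; segment++)
    {
        snprintf(path, sizeof(path), "%s.%d", prefix, segment);
        if (unlink(path) < 0)
        {
            if (errno != ENOENT)
                elog(LOG, "could not unlink temporary file \"%s\": %m", path);
            break;              /* no more segments (or unexpected error) */
        }
    }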

The error handling path isn't a special case: whoever is the last to
detach from the DSM segment will delete all the files, whether that
results from an error or not.  Now someone might call
SharedBufFileDestroy() to delete files sooner, but that can't happen
at the same time as a detach cleanup (the caller is still attached).

As a small optimisation to avoid a bunch of pointless unlink syscalls,
I shrink the SharedBufFileSet range if you happen to delete explicitly
with a partition number at either extremity of the range.  It so
happens that Parallel Hash Join explicitly deletes them in partition
order as the join runs, so in practice the range is empty by the time
SharedBufFileSet's cleanup runs and there is nothing to do, unless an
error occurs.

> Obviously the concern I have about that is that any stat() call you
> might add for the benefit of a new pgstat_report_tempfile() call,
> needed to keep parity with fd.c, now has a risk of double counting in
> error paths, if I'm not mistaken. We do need to do that accounting in
> the event of error, just as we do when there is no error, at least if
> current stats collector behavior is to be preserved. How can you
> determine which duplicate call here is the duplicate? In other words,
> how can you figure out which one is not supposed to
> pgstat_report_tempfile()? If the size of temp files in each worker is
> unknowable to the implementation in error pat

Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2017-03-23 Thread Ashutosh Sharma
Hi All,

I have tried to test 'group_update_clog_v11.1.patch', shared upthread by
Amit, on a high-end machine.  I have tested the patch with various numbers
of savepoints in my test script.  The machine details, along with the test
script and the test results, are shown below.

Machine details:

24 sockets, 192 CPU(s)
RAM - 500GB

test script:


\set aid random (1,3000)
\set tid random (1,3000)

BEGIN;
SELECT abalance FROM pgbench_accounts WHERE aid = :aid for UPDATE;
SAVEPOINT s1;
SELECT tbalance FROM pgbench_tellers WHERE tid = :tid for UPDATE;
SAVEPOINT s2;
SELECT abalance FROM pgbench_accounts WHERE aid = :aid for UPDATE;
SAVEPOINT s3;
SELECT tbalance FROM pgbench_tellers WHERE tid = :tid for UPDATE;
SAVEPOINT s4;
SELECT abalance FROM pgbench_accounts WHERE aid = :aid for UPDATE;
SAVEPOINT s5;
SELECT tbalance FROM pgbench_tellers WHERE tid = :tid for UPDATE;
END;

Non-default parameters
==
max_connections = 200
shared_buffers=8GB
min_wal_size=10GB
max_wal_size=15GB
maintenance_work_mem = 1GB
checkpoint_completion_target = 0.9
checkpoint_timeout=900
synchronous_commit=off


pgbench -M prepared -c $thread -j $thread -T $time_for_reading postgres -f
~/test_script.sql

where, time_for_reading = 10 mins

Test Results:
=

With 3 savepoints
=

CLIENT COUNT   TPS (HEAD)   TPS (PATCH)   % IMPROVEMENT
128            50275        53704         6.82048732
64             62860        66561         5.887686923
8              18464        18752         1.559792028

With 5 savepoints
=

CLIENT COUNT   TPS (HEAD)   TPS (PATCH)   % IMPROVEMENT
128            46559        47715         2.482871196
64             52306        52082         -0.4282491492
8              12289        12852         4.581332899


With 7 savepoints
=

CLIENT COUNT   TPS (HEAD)   TPS (PATCH)   % IMPROVEMENT
128            41367        41500         0.3215123166
64             42996        41473         -3.542189971
8              9665         9657          -0.0827728919

With 10 savepoints
==

CLIENT COUNT   TPS (HEAD)   TPS (PATCH)   % IMPROVEMENT
128            34513        34597         0.24338655
64             32581        32035         -1.67582
8              7293         7622          4.511175099

*Conclusion:*
As seen from the test results above, there is some performance improvement
with 3 savepoints, with 5 savepoints the results with the patch are slightly
better than HEAD, and with 7 and 10 savepoints we do see a regression with
the patch.  Therefore, I think the threshold value of 4 for the number of
subtransactions considered in the patch looks fine to me.


--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com

On Tue, Mar 21, 2017 at 6:19 PM, Amit Kapila 
wrote:

> On Mon, Mar 20, 2017 at 8:27 AM, Robert Haas 
> wrote:
> > On Fri, Mar 17, 2017 at 2:30 AM, Amit Kapila 
> wrote:
> >>> I was wondering about doing an explicit test: if the XID being
> >>> committed matches the one in the PGPROC, and nsubxids matches, and the
> >>> actual list of XIDs matches, then apply the optimization.  That could
> >>> replace the logic that you've proposed to exclude non-commit cases,
> >>> gxact cases, etc. and it seems fundamentally safer.  But it might be a
> >>> more expensive test, too, so I'm not sure.
> >>
> >> I think if the number of subxids is very small let us say under 5 or
> >> so, then such a check might not matter, but otherwise it could be
> >> expensive.
> >
> > We could find out by testing it.  We could also restrict the
> > optimization to cases with just a few subxids, because if you've got a
> > large number of subxids this optimization probably isn't buying much
> > anyway.
> >
>
> Yes, and I have modified the patch to compare xids and subxids for
> group update.  In the initial short tests (with few client counts), it
> seems like till 3 savepoints we can win and 10 savepoints onwards
> there is some regression or at the very least there doesn't appear to
> be any benefit.  We need more tests to identify what is the safe
> number, but I thought it is better to share the patch to see if we
> agree on the changes because if not, then the whole testing needs to
> be repeated.  Let me know what do you think about attached?
>
>
>
> --
> With Regards,
> Amit Kapila.
> EnterpriseDB: http://www.enterprisedb.com
>
>


Re: [HACKERS] Logical decoding on standby

2017-03-23 Thread Craig Ringer
On 23 March 2017 at 00:13, Simon Riggs  wrote:
> On 22 March 2017 at 08:53, Craig Ringer  wrote:
>
>> I'm splitting up the rest of the decoding on standby patch set with
>> the goal of getting minimal functionality for creating and managing
>> slots on standbys in, so we can maintain slots on standbys and use
>> them when the standby is promoted to master.
>>
>> The first, to send catalog_xmin separately to the global xmin on
>> hot_standby_feedback and store it in the upstream physical slot's
>> catalog_xmin, is attached.
>>
>> These are extracted directly from the logical decoding on standby
>> patch, with comments by Petr and Andres made re the relevant code
>> addressed.
>
> I've reduced your two patches back to one with a smaller blast radius.
>
> I'll commit this tomorrow morning, barring objections.

This needs rebasing on top of

commit af4b1a0869bd3bb52e5f662e4491554b7f611489
Author: Simon Riggs 
Date:   Wed Mar 22 16:51:01 2017 +

Refactor GetOldestXmin() to use flags

Replace ignoreVacuum parameter with more flexible flags.

Author: Eiji Seki
Review: Haribabu Kommi


That patch ended up using PROCARRAY flags directly as flags to
GetOldestXmin, so it doesn't make much sense to add a flag like
PROCARRAY_REPLICATION_SLOTS.  There won't be any corresponding PROC_
flag for PGXACT->vacuumFlags, because the replication slot xmin and
catalog_xmin are global state not tracked in individual proc entries.

Rather than add some kind of "PROC_RESERVED" flag in proc.h that would
never be used and would only exist to reserve a bit for
PROCARRAY_REPLICATION_SLOTS, which we'd then use as a flag to
GetOldestXmin, I added a new argument to GetOldestXmin like the prior
patch did.

If preferred I can instead add

proc.h:

#define PROC_RESERVED 0x20

procarray.h:

#define PROCARRAY_REPLICATION_SLOTS 0x20

and then test for (flags & PROCARRAY_REPLICATION_SLOTS)

but that's kind of ugly, to say the least; I'd rather just add another argument.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From ffa43fae35857dbff0efe83ef199df165d887d97 Mon Sep 17 00:00:00 2001
From: Craig Ringer 
Date: Wed, 22 Mar 2017 12:29:13 +0800
Subject: [PATCH] Report catalog_xmin separately to xmin in hot standby
 feedback

The catalog_xmin of slots on a standby was reported as part of the standby's
xmin, causing the master's xmin to be held down. This could cause considerable
unnecessary bloat on the master.

Instead, report catalog_xmin as a separate field in hot_standby_feedback. If
the upstream walsender is using a physical replication slot, store the
catalog_xmin in the slot's catalog_xmin field. If the upstream doesn't use a
slot and has only a PGPROC entry, behaviour doesn't change, as we store the
combined xmin and catalog_xmin in the PGPROC entry.

There's no backward compatibility concern here, as nothing except another
postgres instance of the same major version has any business sending hot
standby feedback and it's only used on the physical replication protocol.
---
 contrib/pg_visibility/pg_visibility.c  |   6 +-
 contrib/pgstattuple/pgstatapprox.c |   2 +-
 doc/src/sgml/protocol.sgml |  33 ++-
 src/backend/access/transam/xlog.c  |   4 +-
 src/backend/catalog/index.c|   3 +-
 src/backend/commands/analyze.c |   2 +-
 src/backend/commands/vacuum.c  |   5 +-
 src/backend/replication/walreceiver.c  |  44 +++--
 src/backend/replication/walsender.c| 110 +++--
 src/backend/storage/ipc/procarray.c|  11 ++-
 src/include/storage/procarray.h|   2 +-
 .../recovery/t/010_logical_decoding_timelines.pl   |  38 ++-
 12 files changed, 198 insertions(+), 62 deletions(-)

diff --git a/contrib/pg_visibility/pg_visibility.c b/contrib/pg_visibility/pg_visibility.c
index ee3936e..2db5762 100644
--- a/contrib/pg_visibility/pg_visibility.c
+++ b/contrib/pg_visibility/pg_visibility.c
@@ -557,7 +557,7 @@ collect_corrupt_items(Oid relid, bool all_visible, bool all_frozen)
 	if (all_visible)
 	{
 		/* Don't pass rel; that will fail in recovery. */
-		OldestXmin = GetOldestXmin(NULL, PROCARRAY_FLAGS_VACUUM);
+		OldestXmin = GetOldestXmin(NULL, PROCARRAY_FLAGS_VACUUM, false);
 	}
 
 	rel = relation_open(relid, AccessShareLock);
@@ -674,7 +674,9 @@ collect_corrupt_items(Oid relid, bool all_visible, bool all_frozen)
  * a buffer lock. And this shouldn't happen often, so it's
  * worth being careful so as to avoid false positives.
  */
-RecomputedOldestXmin = GetOldestXmin(NULL, PROCARRAY_FLAGS_VACUUM);
+RecomputedOldestXmin = GetOldestXmin(NULL,
+	 PROCARRAY_FLAGS_VACUUM,
+	 false);
 
 if (!TransactionIdPrecedes(OldestXmin, RecomputedOldestXmin))
 	record_corrupt_item(items, &tuple.t_self);
diff --git a/contrib/pgstattu

Re: [HACKERS] Multiple false-positive warnings from Valgrind

2017-03-23 Thread Michael Paquier
On Tue, Mar 21, 2017 at 10:57 PM, Aleksander Alekseev
 wrote:
> Recently I've decided to run PostgreSQL under Valgrind according to wiki
> description [1]. Lots of warnings are generated [2] but it is my
> understanding that all of them are false-positive. For instance I've
> found these two reports particularly interesting:
>
> ```
> ==00:00:40:40.161 7677== Use of uninitialised value of size 8
> ==00:00:40:40.161 7677==at 0xA15FF5: pg_b64_encode (base64.c:68)
> ==00:00:40:40.161 7677==by 0x6FFE85: scram_build_verifier 
> (auth-scram.c:348)
> ==00:00:40:40.161 7677==by 0x6F3F76: encrypt_password (crypt.c:171)
> ==00:00:40:40.161 7677==by 0x68F40C: CreateRole (user.c:403)
> ==00:00:40:40.161 7677==by 0x85D53A: standard_ProcessUtility 
> (utility.c:716)
> ==00:00:40:40.161 7677==by 0x85CCC7: ProcessUtility (utility.c:353)
> ==00:00:40:40.161 7677==by 0x85BD22: PortalRunUtility (pquery.c:1165)
> ==00:00:40:40.161 7677==by 0x85BF20: PortalRunMulti (pquery.c:1308)
> ==00:00:40:40.161 7677==by 0x85B4A0: PortalRun (pquery.c:788)
> ==00:00:40:40.161 7677==by 0x855672: exec_simple_query (postgres.c:1101)
> ==00:00:40:40.161 7677==by 0x8597BB: PostgresMain (postgres.c:4066)
> ==00:00:40:40.161 7677==by 0x7C6322: BackendRun (postmaster.c:4317)
> ==00:00:40:40.161 7677==  Uninitialised value was created by a stack 
> allocation
> ==00:00:40:40.161 7677==at 0x6FFDB7: scram_build_verifier 
> (auth-scram.c:328)

I can see those warnings as well after calling a code path of
scram_build_verifier(), and I have a hard time seeing that as anything
other than a false positive, as you do.  All those warnings go away if
you just do MemSet(salt, 0, SCRAM_SALT_LEN) before calling
pg_backend_random(), but this data is filled in with RAND_bytes()
afterwards (if built with OpenSSL).  The estimated lengths of the
encoding are also correct.  I don't see immediately what's wrong here;
this deserves a second look...
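
For reference, the workaround I'm describing looks roughly like the sketch
below (illustrative only, based on the auth-scram.c code path in question;
the zeroing exists purely to silence Valgrind, since every byte is
overwritten afterwards):

    char        salt[SCRAM_SALT_LEN];

    /*
     * Zero the buffer only to appease Valgrind; pg_backend_random() (or
     * RAND_bytes() with OpenSSL) overwrites every byte below.
     */
    MemSet(salt, 0, SCRAM_SALT_LEN);
    if (!pg_backend_random(salt, SCRAM_SALT_LEN))
        elog(ERROR, "could not generate random salt");
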
-- 
Michael




Re: [HACKERS] Measuring replay lag

2017-03-23 Thread Simon Riggs
On 23 March 2017 at 06:42, Simon Riggs  wrote:
> On 23 March 2017 at 01:02, Thomas Munro  wrote:
>
>> Thanks!  Please find attached v7, which includes a note we can point
>> at when someone asks why it doesn't show 00:00:00, as requested.
>
> Thanks.
>
> Now I look harder the handling for logical lag seems like it would be
> problematic in many cases. It's up to the plugin whether it sends
> anything at all, so we should make a LagTrackerWrite() call only if
> the plugin sends something. Otherwise the lag tracker will just slow
> down logical replication.
>
> What I think we should do is add an LSN onto LogicalDecodingContext to
> represent the last LSN sent by the plugin, if any.
>
> If that advances after the call to LogicalDecodingProcessRecord() then
> we know we just sent a message and we can track that with
> LagTrackerWrite().
>
> So we make it the plugin's responsibility to maintain this LSN
> correctly, if at all. (It may decide not to)
>
> In English that means the plugin will update the LSN after each
> Commit, and since we reply only on commit this will provide a series
> of measurements we can use. It will still give a saw-tooth, but its
> better than flooding the LagTracker with false measurements.
>
> I think it seems easier to add that as a minor cleanup/open item after
> this commit.

Second thoughts... I'll just make LagTrackerWrite externally
available, so a plugin can send anything it wants to the tracker.
Which means I'm explicitly removing the "logical replication support"
from this patch.

-- 
Simon Riggshttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] [PATCH v1] Add and report the new "in_hot_standby" GUC pseudo-variable.

2017-03-23 Thread Magnus Hagander
On Wed, Mar 22, 2017 at 9:00 PM, Peter Eisentraut <
peter.eisentr...@2ndquadrant.com> wrote:

> On 3/22/17 14:09, Robert Haas wrote:
> >> The opposite means primary.  I can flip the GUC name to "is_primary", if
> >> that's clearer.
> > Hmm, I don't find that clearer.  "hot standby" has a very specific
> > meaning; "primary" isn't vague, but I would say it's less specific.
>
> The problem I have is that there is already a GUC named "hot_standby",
> which determines whether an instance is in hot (as opposed to warm?)
> mode if it is a standby.  This is proposing to add a setting named
> "in_hot_standby" which says nothing about the hotness, but something
> about the standbyness.  Note that these are all in the same namespace.
>
> I think we could use "in_recovery", which would be consistent with
> existing naming.
>

One thing we might want to consider around this -- in 10 we have
target_session_attrs=read-write (since
721f7bd3cbccaf8c07cad2707826b83f84694832), which will issue a SHOW
transaction_read_only on the connection.
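
For context, that existing mechanism looks roughly like this from the client
side (a sketch; the host names are placeholders).  libpq connects to each
listed host in turn, issues SHOW transaction_read_only, and skips servers
that report it as on:

    #include <stdio.h>
    #include <libpq-fe.h>

    int
    main(void)
    {
        /* Try hosts in order until one accepts read-write sessions. */
        PGconn     *conn = PQconnectdb(
            "host=primary.example.com,standby.example.com "
            "dbname=postgres target_session_attrs=read-write");

        if (PQstatus(conn) != CONNECTION_OK)
            fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        PQfinish(conn);
        return 0;
    }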

We should probably consider whether there is some way we can implement these
two things the same way. If we're inventing a new variable that gets pushed
on each connection, perhaps we can use that one and avoid the SHOW command?
Is the read-write thing really interesting in the general case, or is it
more about detecting read-only standbys as well? Or, to flip that, would
sending the transaction_read_only parameter be enough for the use case in
this thread, without having to invent a new variable?

(I haven't thought it through all the way, but figured I should mention the
thought as I'm working through my email backlog.)

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] bug/oversight in TestLib.pm and PostgresNode.pm

2017-03-23 Thread Erik Rijkers

On 2017-03-23 03:28, Michael Paquier wrote:

On Thu, Mar 23, 2017 at 12:51 AM, Erik Rijkers  wrote:
While trying to test pgbench's stderr (looking for 'creating tables' in
output of the initialisation step) I ran into these two bugs (or perhaps
better 'oversights').


+   if (defined $expected_stderr) {
+   like($stderr, $expected_stderr, "$test_name: stderr matches");
+   }
+   else {
is($stderr, '', "$test_name: no stderr");
-   like($stdout, $expected_stdout, "$test_name: matches");
+   }
To simplify that you could as well set expected_output to be an empty
string, and just use like() instead of is(), saving this if/else.


(I'll assume you meant '$expected_stderr' (not 'expected_output'))

That would be nice but with that, other tests start complaining:
"doesn't look like a regex to me"

To avoid that, I uglified your version back to:

+	like($stderr, (defined $expected_stderr ? $expected_stderr : qr{}),
+	"$test_name: stderr matches");

I did it like that in the attached patch (0001-testlib-like-stderr.diff).



The other (PostgresNode.pm.diff) is unchanged.

make check-world without error.


Thanks,

Erik Rijkers

--- src/test/perl/TestLib.pm.orig	2017-03-23 08:11:16.034410936 +0100
+++ src/test/perl/TestLib.pm	2017-03-23 08:12:33.154132124 +0100
@@ -289,13 +289,14 @@
 
 sub command_like
 {
-	my ($cmd, $expected_stdout, $test_name) = @_;
+	my ($cmd, $expected_stdout, $test_name, $expected_stderr) = @_;
 	my ($stdout, $stderr);
 	print("# Running: " . join(" ", @{$cmd}) . "\n");
 	my $result = IPC::Run::run $cmd, '>', \$stdout, '2>', \$stderr;
 	ok($result, "$test_name: exit code 0");
-	is($stderr, '', "$test_name: no stderr");
-	like($stdout, $expected_stdout, "$test_name: matches");
+	like($stderr, (defined $expected_stderr ? $expected_stderr : qr{}),
+	"$test_name: stderr matches");
+	like($stdout, $expected_stdout, "$test_name: stdout matches");
 }
 
 sub command_fails_like
--- src/test/perl/PostgresNode.pm.orig	2017-03-22 15:58:58.690052999 +0100
+++ src/test/perl/PostgresNode.pm	2017-03-22 15:49:38.422777312 +0100
@@ -1283,6 +1283,23 @@
 
 =pod
 
+=item $node->command_fails_like(...) - TestLib::command_fails_like with our PGPORT
+
+See command_ok(...)
+
+=cut
+
+sub command_fails_like
+{
+	my $self = shift;
+
+	local $ENV{PGPORT} = $self->port;
+
+	TestLib::command_fails_like(@_);
+}
+
+=pod
+
 =item $node->issues_sql_like(cmd, expected_sql, test_name)
 
 Run a command on the node, then verify that $expected_sql appears in the



Re: [HACKERS] Fix for grammatical error in code-comment

2017-03-23 Thread Magnus Hagander
On Wed, Mar 22, 2017 at 2:47 PM, Emil Iggland  wrote:

> I found a grammatical error in one of the code comments.
>
> The comment is made regarding the run-time of the algorithm, and
> references big-O performance.
> However the comment text uses "effective", instead of the probably more
> correct "efficient".
>
> The original comment reads as though the code would not do its job
> correctly, rather than just not very quickly.
>

Applied, thanks.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] createlang/droplang deprecated

2017-03-23 Thread Magnus Hagander
On Mon, Mar 20, 2017 at 1:37 AM, Peter Eisentraut <
peter.eisentr...@2ndquadrant.com> wrote:

> On 3/18/17 09:00, Peter Eisentraut wrote:
> > I just noticed that createlang and droplang have been listed as
> > deprecated since PG 9.1.
> >
> > Do we dare remove them?
>
> Patch
>
>
+1


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-23 Thread Amit Kapila
On Thu, Mar 23, 2017 at 12:19 AM, Pavan Deolasee
 wrote:
>
> Ok, no problem. I did some tests on AWS i2.xlarge instance (4 vCPU, 30GB
> RAM, attached SSD) and results are shown below. But I think it is important
> to get independent validation from your side too, just to ensure I am not
> making any mistake in measurement. I've attached naively put together
> scripts which I used to run the benchmark. If you find them useful, please
> adjust the paths and run on your machine.
>
> I reverted back to UNLOGGED table because with WAL the results looked very
> weird (as posted earlier) even when I was taking a CHECKPOINT before each
> set and had set max_wal_size and checkpoint_timeout high enough to avoid any
> checkpoint during the run. Anyways, that's a matter of separate
> investigation and not related to this patch.
>
> I did two kinds of tests.
> a) update last column of the index
> b) update second column of the index
>
> v19 does considerably better than even master for the last column update
> case and pretty much inline for the second column update test. The reason is
> very clear because v19 determines early in the cycle that the buffer is
> already full and there is very little chance of doing a HOT update on the
> page. In that case, it does not check any columns for modification.
>

That sounds like you are dodging the actual problem.  I mean you can
put that same PageIsFull() check in master code as well and then you
will most probably again see the same regression.  Also, I think if we
test at fillfactor 80 or 75 (which is not unrealistic considering an
update-intensive workload), then we might again see regression.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] ICU integration

2017-03-23 Thread Andreas Karlsson
I am fine with this version of the patch. The issues I have with it, 
which I mentioned earlier in this thread, seem to be issues with ICU 
rather than with this patch. For example there seems to be no way for 
ICU to validate the syntax of the BCP 47 locales (or ICU's old format). 
But I think we will just have to accept the weirdness of how ICU handles 
locales.


I think this patch is ready to be committed.

Found a typo in the documentation:

"The inspect the currently available locales" should be "To inspect the 
currently available locales".


Andreas




Re: [HACKERS] Logical decoding on standby

2017-03-23 Thread Craig Ringer
On 23 March 2017 at 16:07, Craig Ringer  wrote:

> If preferred I can instead add
>
> proc.h:
>
> #define PROC_RESERVED 0x20
>
> procarray.h:
>
> #define PROCARRAY_REPLICATION_SLOTS 0x20
>
> and then test for (flags & PROCARRAY_REPLICATION_SLOTS)

Attached done that way.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From 2e887ee19c9c1bae442b9f0682169f9b0c61268a Mon Sep 17 00:00:00 2001
From: Craig Ringer 
Date: Wed, 22 Mar 2017 12:29:13 +0800
Subject: [PATCH] Report catalog_xmin separately to xmin in hot standby
 feedback

The catalog_xmin of slots on a standby was reported as part of the standby's
xmin, causing the master's xmin to be held down. This could cause considerable
unnecessary bloat on the master.

Instead, report catalog_xmin as a separate field in hot_standby_feedback. If
the upstream walsender is using a physical replication slot, store the
catalog_xmin in the slot's catalog_xmin field. If the upstream doesn't use a
slot and has only a PGPROC entry, behaviour doesn't change, as we store the
combined xmin and catalog_xmin in the PGPROC entry.

There's no backward compatibility concern here, as nothing except another
postgres instance of the same major version has any business sending hot
standby feedback and it's only used on the physical replication protocol.
---
 doc/src/sgml/protocol.sgml |  33 ++-
 src/backend/replication/walreceiver.c  |  45 +++--
 src/backend/replication/walsender.c| 110 +++--
 src/backend/storage/ipc/procarray.c|  12 ++-
 src/include/storage/proc.h |   5 +
 src/include/storage/procarray.h|  11 +++
 .../recovery/t/010_logical_decoding_timelines.pl   |  38 ++-
 7 files changed, 202 insertions(+), 52 deletions(-)

diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index 244e381..d8786f0 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -1911,10 +1911,11 @@ The commands accepted in walsender mode are:
   
   
   
-  The standby's current xmin. This may be 0, if the standby is
-  sending notification that Hot Standby feedback will no longer
-  be sent on this connection. Later non-zero messages may
-  reinitiate the feedback mechanism.
+  The standby's current global xmin, excluding the catalog_xmin from any
+  replication slots. If both this value and the following
+  catalog_xmin are 0 this is treated as a notification that Hot Standby
+  feedback will no longer be sent on this connection. Later non-zero
+  messages may reinitiate the feedback mechanism.
   
   
   
@@ -1924,7 +1925,29 @@ The commands accepted in walsender mode are:
   
   
   
-  The standby's current epoch.
+  The epoch of the global xmin xid on the standby.
+  
+  
+  
+  
+  
+  Int32
+  
+  
+  
+  The lowest catalog_xmin of any replication slots on the standby. Set to 0
+  if no catalog_xmin exists on the standby or if hot standby feedback is being
+  disabled.
+  
+  
+  
+  
+  
+  Int32
+  
+  
+  
+  The epoch of the catalog_xmin xid on the standby.
   
   
   
diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index 31c567b..0f22f17 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -1175,8 +1175,8 @@ XLogWalRcvSendHSFeedback(bool immed)
 {
 	TimestampTz now;
 	TransactionId nextXid;
-	uint32		nextEpoch;
-	TransactionId xmin;
+	uint32		xmin_epoch, catalog_xmin_epoch;
+	TransactionId xmin, catalog_xmin;
 	static TimestampTz sendTime = 0;
 	/* initially true so we always send at least one feedback message */
 	static bool master_has_standby_xmin = true;
@@ -1221,29 +1221,56 @@ XLogWalRcvSendHSFeedback(bool immed)
 	 * everything else has been checked.
 	 */
 	if (hot_standby_feedback)
-		xmin = GetOldestXmin(NULL, PROCARRAY_FLAGS_DEFAULT);
+	{
+		TransactionId slot_xmin;
+
+		/*
+		 * Usually GetOldestXmin() would include the global replication slot
+		 * xmin and catalog_xmin in its calculations, but we don't want to hold
+		 * upstream back from vacuuming normal user table tuples just because
+		 * they're within the catalog_xmin horizon of logical replication slots
+		 * on this standby, so we ignore slot xmin and catalog_xmin GetOldestXmin
+		 * then deal with them ourselves.
+		 */
+		xmin = GetOldestXmin(NULL,
+			 PROCARRAY_FLAGS_DEFAULT|PROCARRAY_SLOTS_XMIN);
+
+		ProcArrayGetReplicationSlotXmin(&slot_xmin, &catalog_xmin);
+
+		if (TransactionIdIsValid(slot_xmin) &&
+			TransactionIdPrecedes(slot_xmin, xmin))
+			xmin = slot_xmin;
+	}
 	else
+	{
 		xmin = InvalidTransactionId;
+		catalog_xmin = Invali

Re: [HACKERS] Measuring replay lag

2017-03-23 Thread Simon Riggs
> Second thoughts... I'll just make LagTrackerWrite externally
> available, so a plugin can send anything it wants to the tracker.
> Which means I'm explicitly removing the "logical replication support"
> from this patch.

Done.

Here's the patch I'm looking to commit, with some docs and minor code
changes as discussed.

-- 
Simon Riggshttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


replication-lag-v7sr1.patch
Description: Binary data



Re: [HACKERS] identity columns

2017-03-23 Thread Vitaly Burovoy
On 3/22/17, Peter Eisentraut  wrote:
> On 3/22/17 03:59, Vitaly Burovoy wrote:
>> Column's IDENTITY behavior is very similar to a DEFAULT one. We write
>> "SET DEFAULT" and don't care whether it was set before or not, because
>> we can't have many of them for a single column. Why should we do that
>> for IDENTITY?
>
> One indication is that the SQL standard requires that DROP IDENTITY only
> succeed if the column is currently an identity column.  That is
> different from how DEFAULT works.

I think we'll end up with "DROP IDENTITY IF EXISTS" to avoid raising
an exception and "ADD OR SET" if your grammar remains.

> Another difference is that there is no such thing as "no default",
> because in absence of an explicit default, it is NULL.  So all you are
> doing with SET DEFAULT or DROP DEFAULT is changing the default.  You are
> not actually adding or removing it.

Right. From that PoV IDENTITY also changes a default value: "SET (ADD
... AS?) IDENTITY" works as setting a default to "nextval(...)"
whereas "DROP IDENTITY" works as setting it back to NULL.

> Therefore, the final effect of SET DEFAULT is the same no matter whether
> another default was there before or not.  For ADD/SET IDENTITY, you get
> different behaviors.  For example:
>
> ADD .. AS IDENTITY (START 2)
>
> creates a new sequence that starts at 2 and uses default parameters
> otherwise.  But
>
> SET (START 2)
>
> alters the start parameter of an existing sequence.  So depending on
> whether you already have an identity sequence, these commands do
> completely different things.

If you use "SET START 2" to a non-identity columns, you should get an
exception (no information about an identity type: neither "by default"
nor "always"). The correct example is:
ADD GENERATED BY DEFAULT AS IDENTITY (START 2)
and
SET GENERATED BY DEFAULT SET START 2

Note that creating a sequence is internal machinery hidden from users.
Try to see it from the user's PoV: the goal is to have an auto-incrementing
column. If it already auto-increments, there is no reason to create a
sequence (it is already present) and no reason to restart with 2
(there can be rows with such numbers).
"... SET START 2" is for the next "RESTART" DDL, and if a user insists
on starting with 2, it is still possible:

SET GENERATED BY DEFAULT SET START 2 RESTART 2


I still think that introducing "ADD" for a property which cannot be
used more than once (compare with "ADD CHECK": you can use it with the
same expression multiple times) is not a good idea.

I think there should be a consensus in the community for a grammar.

> --
> Peter Eisentraut  http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

-- 
Best regards,
Vitaly Burovoy




Re: [HACKERS] Measuring replay lag

2017-03-23 Thread Thomas Munro
On Thu, Mar 23, 2017 at 10:50 PM, Simon Riggs  wrote:
>> Second thoughts... I'll just make LagTrackerWrite externally
>> available, so a plugin can send anything it wants to the tracker.
>> Which means I'm explicitly removing the "logical replication support"
>> from this patch.
>
> Done.
>
> Here's the patch I'm looking to commit, with some docs and minor code
> changes as discussed.

Giving LagTrackerWrite external linkage seems sensible, assuming there
is a reasonable way for logical replication to decide when to call it.

+system. Monitoring systems should choose whether the represent this
+as missing data, zero or continue to display the last known value.

s/whether the/whether to/

-- 
Thomas Munro
http://www.enterprisedb.com




Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-23 Thread Pavan Deolasee
Thanks Amit. v19 addresses some of the comments below.

On Thu, Mar 23, 2017 at 10:28 AM, Amit Kapila 
wrote:

> On Wed, Mar 22, 2017 at 4:06 PM, Amit Kapila 
> wrote:
> > On Tue, Mar 21, 2017 at 6:47 PM, Pavan Deolasee
> >  wrote:
> >>
> >>>
> >>
> >> Please find attached rebased patches.
> >>
> >
> > Few comments on 0005_warm_updates_v18.patch:
> >
>
> Few more comments on 0005_warm_updates_v18.patch:
> 1.
> @@ -234,6 +241,25 @@ index_beginscan(Relation heapRelation,
>   scan->heapRelation = heapRelation;
>   scan->xs_snapshot = snapshot;
>
> + /*
> + * If the index supports recheck,
> make sure that index tuple is saved
> + * during index scans. Also build and cache IndexInfo which is used by
> + * amrecheck routine.
> + *
> + * XXX Ideally, we should look at
> all indexes on the table and check if
> + * WARM is at all supported on the base table. If WARM is not supported
> + * then we don't need to do any recheck.
> RelationGetIndexAttrBitmap() does
> + * do that and sets rd_supportswarm after looking at all indexes. But we
> + * don't know if the function was called earlier in the
> session when we're
> + * here. We can't call it now because there exists a risk of causing
> + * deadlock.
> + */
> + if (indexRelation->rd_amroutine->amrecheck)
> + {
> +scan->xs_want_itup = true;
> + scan->indexInfo = BuildIndexInfo(indexRelation);
> + }
> +
>
> Don't we need to do this rechecking during parallel scans?  Also what
> about bitmap heap scans?
>
>
Yes, we need to handle parallel scans. Bitmap scans are not a problem
because they can never return the same TID twice. I fixed this by moving
the code inside index_beginscan_internal.


> 2.
> +++ b/src/backend/access/nbtree/nbtinsert.c
> -
>  typedef struct
>
> Above change is not require.
>
>
Sure. Fixed.


> 3.
> +_bt_clear_items(Page page, OffsetNumber *clearitemnos, uint16 nclearitems)
> +void _hash_clear_items(Page page, OffsetNumber *clearitemnos,
> +   uint16 nclearitems)
>
> Both the above functions look exactly same, isn't it better to have a
> single function like page_clear_items?  If you want separation for
> different index types, then we can have one common function which can
> be called from different index types.
>
>
Yes, makes sense. Moved that to bufpage.c. The reason I originally had
index-specific versions is that I started by putting the WARM flag in the
IndexTuple header. But since hash indexes don't have that bit free, I moved
everything to a TID bit flag. I still left index-specific wrappers, but they
just call PageIndexClearWarmTuples.


> 4.
> - if (callback(htup, callback_state))
> + flags = ItemPointerGetFlags(&itup->t_tid);
> + is_warm = ((flags & BTREE_INDEX_WARM_POINTER) != 0);
> +
> + if (is_warm)
> + stats->num_warm_pointers++;
> + else
> + stats->num_clear_pointers++;
> +
> + result = callback(htup, is_warm, callback_state);
> + if (result == IBDCR_DELETE)
> + {
> + if (is_warm)
> + stats->warm_pointers_removed++;
> + else
> + stats->clear_pointers_removed++;
>
> The patch looks to be inconsistent in collecting stats for btree and
> hash.  I don't see above stats are getting updated in hash index code.
>
>
Fixed. The hashbucketcleanup signature is just getting a bit too long. Maybe
we should move some of these counters into a structure and pass that
around. Not done here though.


> 5.
> +btrecheck(Relation indexRel, IndexInfo *indexInfo, IndexTuple indexTuple,
> + Relation heapRel, HeapTuple heapTuple)
> {
> ..
> + if (!datumIsEqual(values[i - 1], indxvalue, att->attbyval,
> + att->attlen))
> ..
> }
>
> Will this work if the index is using non-default collation?
>
>
Not sure I understand that. Can you please elaborate?


> 6.
> +++ b/src/backend/access/nbtree/nbtxlog.c
> @@ -390,83 +390,9 @@ btree_xlog_vacuum(XLogReaderState *record)
> -#ifdef UNUSED
>   xl_btree_vacuum *xlrec = (xl_btree_vacuum *) XLogRecGetData(record);
>
>   /*
> - * This section of code is thought to be no longer needed, after analysis
> - * of the calling paths. It is retained to allow the code to be reinstated
> - * if a flaw is revealed in that thinking.
> - *
> ..
>
> Why does this patch need to remove the above code under #ifdef UNUSED
>
>
Yeah, it isn't strictly necessary. But that dead code was getting in the way
and hence I decided to strip it out. I can put it back if it's an issue, or
remove it as a separate commit first.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-23 Thread Pavan Deolasee
On Thu, Mar 23, 2017 at 3:02 PM, Amit Kapila 
wrote:

>
>
> That sounds like you are dodging the actual problem.  I mean you can
> put that same PageIsFull() check in master code as well and then you
> will most probably again see the same regression.


Well, I don't see it that way. There was a specific concern about a specific
workload that WARM might regress. I think this change addresses that. Sure,
if you pick that one piece, put it in master first and then compare against
the rest of the WARM code, you will see a regression. But I thought what we
were worried about was WARM causing a regression for some existing user, who
might see her workload running 10% slower, and this change mitigates that.


> Also, I think if we
> test at fillfactor 80 or 75 (which is not unrealistic considering an
> update-intensive workload), then we might again see regression.


Yeah, we might, but it will be less than before, maybe 2% instead of
10%. And by doing this we are further narrowing an already narrow test
case. I think we need to see things in totality and weigh the cost-benefit
trade-offs. There are numbers for very common workloads where WARM may
provide 20%, 30% or even more than 100% improvement.

Andres and Alvaro already have other ideas to address this problem even
further. And as I said, we can pass in index-specific information and make
that routine bail out even earlier. We need to accept that WARM will need
to do more attribute checks than master, especially when there is more than
one index on the table, and sometimes those checks will go to waste. I am OK
with providing a table-specific knob to disable WARM, but not sure if
others would like that idea.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] createlang/droplang deprecated

2017-03-23 Thread Daniel Gustafsson
> On 20 Mar 2017, at 01:37, Peter Eisentraut  
> wrote:
> 
> On 3/18/17 09:00, Peter Eisentraut wrote:
>> I just noticed that createlang and droplang have been listed as
>> deprecated since PG 9.1.
>> 
>> Do we dare remove them?
> 
> Patch

LGTM, +1

cheers ./daniel



Re: [HACKERS] Monitoring roles patch

2017-03-23 Thread Dave Page
On Wed, Mar 22, 2017 at 3:46 PM, Dave Page  wrote:
> On Wed, Mar 22, 2017 at 1:15 PM, Stephen Frost  wrote:
>>
>> I did specifically ask for explicit roles to be made to enable such
>> capability and that the pg_monitor role be GRANT'd those roles instead
>> of hacking the pg_monitor OID into those checks, but it seems like
>> that's not been done yet.
>
> Yeah, sorry - I missed that for pg_stat_activity. I'll update the patch.

Updated patch attached. This changes pg_read_all_gucs to
pg_read_all_settings, and adds pg_read_all_stats which it grants to
pg_monitor. pg_read_all_stats has full access to the pg_stat_ views
(as pg_monitor did previously), and is used in the various contrib
modules in place of pg_monitor.

Thanks.

-- 
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
diff --git a/contrib/pg_buffercache/Makefile b/contrib/pg_buffercache/Makefile
index 497dbeb229..18f7a87452 100644
--- a/contrib/pg_buffercache/Makefile
+++ b/contrib/pg_buffercache/Makefile
@@ -4,8 +4,9 @@ MODULE_big = pg_buffercache
 OBJS = pg_buffercache_pages.o $(WIN32RES)
 
 EXTENSION = pg_buffercache
-DATA = pg_buffercache--1.2.sql pg_buffercache--1.1--1.2.sql \
-   pg_buffercache--1.0--1.1.sql pg_buffercache--unpackaged--1.0.sql
+DATA = pg_buffercache--1.2.sql pg_buffercache--1.2--1.3.sql \
+   pg_buffercache--1.1--1.2.sql pg_buffercache--1.0--1.1.sql \
+   pg_buffercache--unpackaged--1.0.sql
 PGFILEDESC = "pg_buffercache - monitoring of shared buffer cache in real-time"
 
 ifdef USE_PGXS
diff --git a/contrib/pg_buffercache/pg_buffercache.control 
b/contrib/pg_buffercache/pg_buffercache.control
index a4d664f3fa..8c060ae9ab 100644
--- a/contrib/pg_buffercache/pg_buffercache.control
+++ b/contrib/pg_buffercache/pg_buffercache.control
@@ -1,5 +1,5 @@
 # pg_buffercache extension
 comment = 'examine the shared buffer cache'
-default_version = '1.2'
+default_version = '1.3'
 module_pathname = '$libdir/pg_buffercache'
 relocatable = true
diff --git a/contrib/pg_freespacemap/Makefile b/contrib/pg_freespacemap/Makefile
index 7bc0e9555d..0a2f000ec6 100644
--- a/contrib/pg_freespacemap/Makefile
+++ b/contrib/pg_freespacemap/Makefile
@@ -4,8 +4,8 @@ MODULE_big = pg_freespacemap
 OBJS = pg_freespacemap.o $(WIN32RES)
 
 EXTENSION = pg_freespacemap
-DATA = pg_freespacemap--1.1.sql pg_freespacemap--1.0--1.1.sql \
-   pg_freespacemap--unpackaged--1.0.sql
+DATA = pg_freespacemap--1.1.sql pg_freespacemap--1.1--1.2.sql \
+   pg_freespacemap--1.0--1.1.sql pg_freespacemap--unpackaged--1.0.sql
 PGFILEDESC = "pg_freespacemap - monitoring of free space map"
 
 ifdef USE_PGXS
diff --git a/contrib/pg_freespacemap/pg_freespacemap.control 
b/contrib/pg_freespacemap/pg_freespacemap.control
index 764db30d18..ac8fc5050a 100644
--- a/contrib/pg_freespacemap/pg_freespacemap.control
+++ b/contrib/pg_freespacemap/pg_freespacemap.control
@@ -1,5 +1,5 @@
 # pg_freespacemap extension
 comment = 'examine the free space map (FSM)'
-default_version = '1.1'
+default_version = '1.2'
 module_pathname = '$libdir/pg_freespacemap'
 relocatable = true
diff --git a/contrib/pg_stat_statements/Makefile 
b/contrib/pg_stat_statements/Makefile
index 298951a5f5..39b368b70e 100644
--- a/contrib/pg_stat_statements/Makefile
+++ b/contrib/pg_stat_statements/Makefile
@@ -4,9 +4,10 @@ MODULE_big = pg_stat_statements
 OBJS = pg_stat_statements.o $(WIN32RES)
 
 EXTENSION = pg_stat_statements
-DATA = pg_stat_statements--1.4.sql pg_stat_statements--1.3--1.4.sql \
-   pg_stat_statements--1.2--1.3.sql pg_stat_statements--1.1--1.2.sql \
-   pg_stat_statements--1.0--1.1.sql pg_stat_statements--unpackaged--1.0.sql
+DATA = pg_stat_statements--1.4.sql pg_stat_statements--1.4--1.5.sql \
+   pg_stat_statements--1.3--1.4.sql pg_stat_statements--1.2--1.3.sql \
+   pg_stat_statements--1.1--1.2.sql pg_stat_statements--1.0--1.1.sql \
+   pg_stat_statements--unpackaged--1.0.sql
 PGFILEDESC = "pg_stat_statements - execution statistics of SQL statements"
 
 LDFLAGS_SL += $(filter -lm, $(LIBS))
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c 
b/contrib/pg_stat_statements/pg_stat_statements.c
index 221ac98d4a..cec94d5896 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -62,6 +62,7 @@
 #include 
 
 #include "access/hash.h"
+#include "catalog/pg_authid.h"
 #include "executor/instrument.h"
 #include "funcapi.h"
 #include "mb/pg_wchar.h"
@@ -1386,7 +1387,7 @@ pg_stat_statements_internal(FunctionCallInfo fcinfo,
MemoryContext per_query_ctx;
MemoryContext oldcontext;
Oid userid = GetUserId();
-   boolis_superuser = superuser();
+   boolis_superuser = false;
char   *qbuffer = NULL;
Sizeqbuffer_size = 0;
Sizeextent = 0;
@@ -1394,6 +1395,9 @@ pg_stat_statements_internal(FunctionCall

Re: [HACKERS] Refactor handling of database attributes between pg_dump and pg_dumpall

2017-03-23 Thread Andreas Karlsson

On 03/21/2017 08:02 AM, Haribabu Kommi wrote:

Solution 1) Just skip dumping these CREATE DATABASE commands and
document that the user should create the "postgres" and "template1"
databases in the target if they don't exist, given that such cases are
very rare.

Solution 2) Add a new command-line option (or some other setting) to
indicate that the pg_dump execution is from pg_dumpall and follow the
current refactored behavior; otherwise follow the earlier pg_dump
behavior in handling CREATE DATABASE commands for the "postgres" and
"template1" databases.


I am leaning towards (2) since I feel having pg_dump act differently
depending on the name of the database is quite surprising behavior. It
makes more sense to let a tool like pg_dumpall handle logic like that.



2.  In the dumpDatabases function, before the refactoring, it used to
connect to the database and dump "SET default_transaction_read_only = off;"
before calling the runPgDump command, to prevent accidental overwrites of
the target.

I fixed it in the attached patch by removing the connection and dumping
the SET command.

Does it need the same approach as solution 2) in the previous problem,
i.e. handle dumping the "SET default_transaction_read_only = off;" whenever
the CREATE DATABASE and \connect commands are issued?


Hm, that is a bit annoying. I do not think we want to change any 
behavior here, either of pg_dump or pg_dumpall, but I also do not like 
having to add two new flags to pg_dump (one for including the ALTER 
DATABASE commands but not CREATE DATABASE, and another flag for 
default_transaction_read_only) or a special flag similar to 
--binary-upgrade.


None of these options seem optimal to me, and I do not have any strong 
preference other than that we should avoid breaking pg_dump or changing 
behavior not related to the database attributes.


Andreas




Re: [HACKERS] segfault in hot standby for hash indexes

2017-03-23 Thread Ashutosh Sharma
On Thu, Mar 23, 2017 at 9:17 AM, Amit Kapila  wrote:
>
> On Thu, Mar 23, 2017 at 8:43 AM, Amit Kapila  wrote:
> > On Wed, Mar 22, 2017 at 3:39 PM, Ashutosh Sharma  
> > wrote:
> >> Hi,
> >>
> >> On Wed, Mar 22, 2017 at 8:41 AM, Amit Kapila  
> >> wrote:
> >>
> >> To fix this, I think we should pass 'REGBUF_KEEP_DATA' while
> >> registering the buffer. Something like this,
> >>
> >> -   XLogRegisterBuffer(0, buf, REGBUF_STANDARD);
> >> +   XLogRegisterBuffer(0, buf, REGBUF_STANDARD |
> >> REGBUF_KEEP_DATA);
> >>
> >> Attached is the patch that fixes this issue.
> >>
> >
> > I think this will work, but not sure if there is a merit to deviate
> > from what btree does to handle this case.   One thing I find slightly
> > awkward in hash_xlog_vacuum_get_latestRemovedXid() is that you are
> > using a number of tuples registered as part of fixed data
> > (xl_hash_vacuum_one_page) to traverse the data registered as buf data.
> > I think it will be better if we register offsets also in fixed part of
> > data as we are doing btree case.

Agreed. I have made the changes accordingly. Please check the attached v2 patch.

>
> >
> >
>
> Also another small point in this regard, do we need two separate
> variables to track number of deleted items in below code?  I think one
> variable is sufficient.
>
> _hash_vacuum_one_page()
> {
> ..
> deletable[ndeletable++] = offnum;
> tuples_removed += 1;
> ..
> }
>

Yes, I think 'ndeletable' alone should be fine.

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com
diff --git a/src/backend/access/hash/hash_xlog.c b/src/backend/access/hash/hash_xlog.c
index de7522e..d9ac42c 100644
--- a/src/backend/access/hash/hash_xlog.c
+++ b/src/backend/access/hash/hash_xlog.c
@@ -957,8 +957,6 @@ hash_xlog_vacuum_get_latestRemovedXid(XLogReaderState *record)
 	OffsetNumber	hoffnum;
 	TransactionId	latestRemovedXid = InvalidTransactionId;
 	int		i;
-	char *ptr;
-	Size len;
 
 	xlrec = (xl_hash_vacuum_one_page *) XLogRecGetData(record);
 
@@ -977,12 +975,20 @@ hash_xlog_vacuum_get_latestRemovedXid(XLogReaderState *record)
 		return latestRemovedXid;
 
 	/*
+	 * Check if WAL replay has reached a consistent database state. If not,
+	 * we must PANIC. See the definition of btree_xlog_delete_get_latestRemovedXid
+	 * for more details.
+	 */
+	if (!reachedConsistency)
+		elog(PANIC, "hash_xlog_vacuum_get_latestRemovedXid: cannot operate with inconsistent data");
+
+	/*
 	 * Get index page.  If the DB is consistent, this should not fail, nor
 	 * should any of the heap page fetches below.  If one does, we return
 	 * InvalidTransactionId to cancel all HS transactions.  That's probably
 	 * overkill, but it's safe, and certainly better than panicking here.
 	 */
-	XLogRecGetBlockTag(record, 1, &rnode, NULL, &blkno);
+	XLogRecGetBlockTag(record, 0, &rnode, NULL, &blkno);
 	ibuffer = XLogReadBufferExtended(rnode, MAIN_FORKNUM, blkno, RBM_NORMAL);
 
 	if (!BufferIsValid(ibuffer))
@@ -994,9 +1000,7 @@ hash_xlog_vacuum_get_latestRemovedXid(XLogReaderState *record)
 	 * Loop through the deleted index items to obtain the TransactionId from
 	 * the heap items they point to.
 	 */
-	ptr = XLogRecGetBlockData(record, 1, &len);
-
-	unused = (OffsetNumber *) ptr;
+	unused = (OffsetNumber *) ((char *) xlrec + SizeOfHashVacuumOnePage);
 
 	for (i = 0; i < xlrec->ntuples; i++)
 	{
@@ -1121,23 +1125,15 @@ hash_xlog_vacuum_one_page(XLogReaderState *record)
 
 	if (action == BLK_NEEDS_REDO)
 	{
-		char *ptr;
-		Size len;
-
-		ptr = XLogRecGetBlockData(record, 0, &len);
-
 		page = (Page) BufferGetPage(buffer);
 
-		if (len > 0)
+		if (XLogRecGetDataLen(record) > SizeOfHashVacuumOnePage)
 		{
 			OffsetNumber *unused;
-			OffsetNumber *unend;
 
-			unused = (OffsetNumber *) ptr;
-			unend = (OffsetNumber *) ((char *) ptr + len);
+			unused = (OffsetNumber *) ((char *) xldata + SizeOfHashVacuumOnePage);
 
-			if ((unend - unused) > 0)
-PageIndexMultiDelete(page, unused, unend - unused);
+			PageIndexMultiDelete(page, unused, xldata->ntuples);
 		}
 
 		/*
diff --git a/src/backend/access/hash/hashinsert.c b/src/backend/access/hash/hashinsert.c
index 8640e85..8699b5b 100644
--- a/src/backend/access/hash/hashinsert.c
+++ b/src/backend/access/hash/hashinsert.c
@@ -344,7 +344,6 @@ _hash_vacuum_one_page(Relation rel, Buffer metabuf, Buffer buf,
 	Page	page = BufferGetPage(buf);
 	HashPageOpaque	pageopaque;
 	HashMetaPage	metap;
-	double tuples_removed = 0;
 
 	/* Scan each tuple in page to see if it is marked as LP_DEAD */
 	maxoff = PageGetMaxOffsetNumber(page);
@@ -355,10 +354,7 @@ _hash_vacuum_one_page(Relation rel, Buffer metabuf, Buffer buf,
 		ItemId	itemId = PageGetItemId(page, offnum);
 
 		if (ItemIdIsDead(itemId))
-		{
 			deletable[ndeletable++] = offnum;
-			tuples_removed += 1;
-		}
 	}
 
 	if (ndeletable > 0)
@@ -386,7 +382,7 @@ _hash_vacuum_one_page(Relation rel, Buffer metabuf, Buffer

Re: [HACKERS] Parallel Append implementation

2017-03-23 Thread Amit Khandekar
On 23 March 2017 at 05:55, Robert Haas  wrote:
> On Wed, Mar 22, 2017 at 4:49 AM, Amit Khandekar  
> wrote:
>> Attached is the updated patch that handles the changes for all the
>> comments except the cost changes part. Details about the specific
>> changes are after the cost-related points discussed below.
>>
>> For non-partial paths, I was checking following 3 options :
>>
>> Option 1. Just take the sum of total non-partial child costs and
>> divide it by number of workers. It seems to be getting close to the
>> actual cost.
>
> If the costs for all children are about equal, then that works fine.
> But when they are very unequal, then it's highly misleading.
>
>> Option 2. Calculate exact cost by an algorithm which I mentioned
>> before, which is pasted below for reference :
>> Per­subpath cost : 20 16 10 8 3 1, with 3 workers.
>> After 10 time units (this is minimum of first 3 i.e. 20, 16, 10), the
>> times remaining are :
>> 10  6  0 8 3 1
>> After 6 units (minimum of 10, 06, 08), the times remaining are :
>> 4  0  0 2 3 1
>> After 2 units (minimum of 4, 2, 3), the times remaining are :
>>  2  0  0 0 1 1
>> After 1 units (minimum of 2, 1, 1), the times remaining are :
>>  1  0  0 0 0 0
>> After 1 units (minimum of 1, 0 , 0), the times remaining are :
>>  0  0  0 0 0 0
>> Now add up above time chunks : 10 + 6 + 2 + 1 + 1 = 20
>

> This gives the same answer as what I was proposing

Ah I see.

> but I believe it's more complicated to compute.
Yes, a bit, particularly because in my algorithm I would have to do
'n' subtractions each time in the case of 'n' workers. But it looked more
natural because it follows exactly the way we would calculate manually.

> The way my proposal would work in this
> case is that we would start with an array C[3] (since there are three
> workers], with all entries 0.  Logically C[i] represents the amount of
> work to be performed by worker i.  We add each path in turn to the
> worker whose array entry is currently smallest; in the case of a tie,
> just pick the first such entry.
>
> So in your example we do this:
>
> C[0] += 20;
> C[1] += 16;
> C[2] += 10;
> /* C[2] is smaller than C[0] or C[1] at this point, so we add the next
> path to C[2] */
> C[2] += 8;
> /* after the previous line, C[1] is now the smallest, so add to that
> entry next */
> C[1] += 3;
> /* now we've got C[0] = 20, C[1] = 19, C[2] = 18, so add to C[2] */
> C[2] += 1;
> /* final result: C[0] = 20, C[1] = 19, C[2] = 19 */
>
> Now we just take the highest entry that appears in any array, which in
> this case is C[0], as the total cost.

Wow. The way your final result exactly tallies with my algorithm's
result is very interesting. This looks like some maths or computer
science theory that I am not aware of.
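(For the record, this seems to be the classic greedy "list scheduling"
heuristic for minimizing the makespan of jobs across identical machines.
Below is a tiny standalone sketch of it, purely as an illustration; this is
not code from the patch and the function name is made up.)

#include <stdio.h>
#include <stdlib.h>

/* Greedy assignment: give each subpath to the currently least-loaded
 * worker; the most-loaded worker determines the Append's cost. */
static double
append_nonpartial_cost(const double *subpath_costs, int npaths, int nworkers)
{
    double     *worker_cost = calloc(nworkers, sizeof(double));
    double      max_cost = 0.0;
    int         i, w;

    if (worker_cost == NULL)
        return 0.0;             /* out of memory; irrelevant to the sketch */

    /* Subpath costs are assumed sorted in descending order, as above. */
    for (i = 0; i < npaths; i++)
    {
        int         min_w = 0;

        for (w = 1; w < nworkers; w++)
            if (worker_cost[w] < worker_cost[min_w])
                min_w = w;
        worker_cost[min_w] += subpath_costs[i];
    }

    for (w = 0; w < nworkers; w++)
        if (worker_cost[w] > max_cost)
            max_cost = worker_cost[w];

    free(worker_cost);
    return max_cost;
}

int
main(void)
{
    double      costs[] = {20, 16, 10, 8, 3, 1};

    printf("%g\n", append_nonpartial_cost(costs, 6, 3));   /* prints 20 */
    return 0;
}

With the example costs {20, 16, 10, 8, 3, 1} and 3 workers it prints 20,
matching both calculations above.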

I am currently coding the algorithm using your method. Meanwhile
attached is a patch that takes care of your other comments, details of
which are below...

>
> In my previous review, I said that you should "define and document a
> new builtin tranche ID"; you did the first but not the second.  See
> the table in monitoring.sgml.

Yeah, I tried to search for how TBM did it in the source, but I guess I
didn't run the "git grep" commands correctly, so the results did not include
monitoring.sgml, and I thought maybe you meant something else by
"document".

Added changes in monitoring.sgml now.

>
> Definition of exec_append_goto_next_plan should have a line break
> after the return type, per usual PostgreSQL style rules.

Oops. Done.

>
> - * initialize to scan first subplan
> + * In case it's a sequential Append, initialize to scan first subplan.
>
> This comment is confusing because the code is executed whether it's
> parallel or not.  I think it might be better to write something like
> "initialize to scan first subplan (but note that we'll override this
> later in the case of a parallel append)"
Done.

>
>  /*
> + * Check if we are already finished plans from parallel append. This
> + * can happen if all the subplans are finished when this worker
> + * has not even started returning tuples.
> + */
> +if (node->as_padesc && node->as_whichplan == PA_INVALID_PLAN)
> +return ExecClearTuple(node->ps.ps_ResultTupleSlot);
>
> There seems to be no reason why this couldn't be hoisted out of the
> loop.  Actually, I think Ashutosh pointed this out before, but I
> didn't understand at that time what his point was.  Looking back, I
> see that he also pointed out that the as_padesc test isn't necessary,
> which is also true.

I am assuming both your and Ashutosh's concern is that this check
will be executed for *each* tuple returned, which needs to be
avoided. Actually, just moving it out of the loop is not going to
solve the runs-for-each-tuple issue; it will still execute for each
tuple. But on second thought, I now agree it can be taken out of the
loop anyway, not to solve the per-tuple issue, but because it need
not be run on each iteration of the loop.

Re: [HACKERS] pgbench - allow to store select results into variables

2017-03-23 Thread Rafia Sabih
On Thu, Mar 16, 2017 at 12:45 AM, Fabien COELHO  wrote:
>
> Hello Rafia,
>
>> I was reviewing v7 of this patch, to start with I found following white
>> space errors when applying with git apply,
>> /home/edb/Desktop/patches/others/pgbench-into-7.patch:66: trailing
>> whitespace.
>
>
> Yep.
>
> I do not know why "git apply" sometimes complains. All is fine for me both
> with "git apply" and "patch".
>
> Last time it was because my mailer uses text/x-diff for the mime type, as
> define by the system in "/etc/mime.types", which some mailer then interpret
> as a license to change eol-style when saving, resulting in this kind of
> behavior. Could you tell your mailer just to save the file as is?
>
>> Apart from that, on executing SELECT 1 AS a \gset \set i debug(:a) SELECT
>> 2
>> AS a \gcset SELECT 3; given in your provided script gset-1.sql. it is
>> giving error Invalid command \gcset.
>
>
> Are you sure that you are using the compiled pgbench, not a previously
> installed one?
>
>   bin/pgbench> pgbench -t 1 -f SQL/gset-1.sql
> SQL/gset-1.sql:1: invalid command in command "gset"
> \gset
>
>   bin/pgbench> ./pgbench -t 1 -f SQL/gset-1.sql
> starting vacuum...end.
> debug(script=0,command=2): int 1
> debug(script=0,command=4): int 2
> ...
>
>> Not sure what is the intention of this script anyway?
>
>
> The intention is to test that gset & gcset work as expected in various
> settings, especially with combined queries (\;) the right result must be
> extracted in the sequence.
>
>> Also, instead of so many different files for error why don't you combine
>> it into one.
>
>
> Because a pgbench scripts stops on the first error, and I wanted to test
> what happens with several kind of errors.
>
if (my_command->argc > 2)
+ syntax_error(source, lineno, my_command->line, my_command->argv[0],
+ "at most on argument expected", NULL, -1);

I suppose you mean 'one' argument here.
Apart from that indentation is not correct as per pgindent, please check.

-- 
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Partition-wise aggregation/grouping

2017-03-23 Thread Jeevan Chalke
On Tue, Mar 21, 2017 at 1:47 PM, Antonin Houska  wrote:

> Jeevan Chalke  wrote:
>
> > Declarative partitioning is supported in PostgreSQL 10 and work is
> > already in progress to support partition-wise joins. Here is a proposal
> > for partition-wise aggregation/grouping. Our initial performance
> > measurement has shown a 7x performance improvement when partitions are
> > on foreign servers and approximately 15% when partitions are local.
> >
> > Partition-wise aggregation/grouping computes aggregates for each
> > partition separately. If the group clause contains the partition key,
> > all the rows belonging to a given group come from one partition, thus
> > allowing aggregates to be computed completely for each partition.
> > Otherwise, partial aggregates computed for each partition are combined
> > across the partitions to produce the final aggregates. This technique
> > improves performance because:
>
> > i. When partitions are located on a foreign server, we can push down the
> > aggregate to the foreign server.
>
> > ii. If the hash table for each partition fits in memory, but that for
> > the whole relation does not, each partition-wise aggregate can use an
> > in-memory hash table.
>
> > iii. Aggregation at the level of partitions can exploit properties of
> > partitions like indexes, their storage etc.
>
> I suspect this overlaps with
>
> https://www.postgresql.org/message-id/29111.1483984605%40localhost
>
> I'm working on the next version of the patch, which will be able to
> aggregate the result of both base relation scans and joins. I'm trying
> hard to make the next version available before an urgent vacation that
> I'll have to take at a random date between today and early April. I
> suggest that we coordinate the effort, it's a lot of work in any case.
>

IIUC, it seems that you are trying to push down the aggregation into the
joining relations. So basically you are converting
Agg -> Join -> {scan1, scan2} into
FinalAgg -> Join -> {PartialAgg -> scan1, PartialAgg -> scan2}.
In addition to that, your patch pushes aggregates on a base rel down to
its children, if any.

Whereas what I propose here is pushing down aggregation below the Append
node, keeping the join/scan as is. So basically I am converting
Agg -> Append -> Join -> {scan1, scan2} into
Append -> Agg -> Join -> {scan1, scan2}.
This will require partition-wise join as posted in [1].
But I am planning to make this work for partitioned relations and not for
generic inheritance.

I treat these two as separate strategies/paths to be considered while
planning.

Our work will overlap when we are pushing down the aggregate on a
partitioned base relation to its children/partitions.

I think you should continue working on pushing down aggregates onto the
joins/scans, whereas I will continue my work on pushing down aggregates to
partitions (joins as well as single tables). Once we are done with these
tasks, then we may need to find a way to integrate them.

[1]
https://www.postgresql.org/message-id/flat/CAFjFpRfQ8GrQvzp3jA2wnLqrHmaXna-urjm_UY9BqXj=ead...@mail.gmail.com#CAFjFpRfQ8GrQvzp3jA2wnLqrHmaXna-urjm_UY9BqXj=ead...@mail.gmail.com



>
> --
> Antonin Houska
> Cybertec Schönig & Schönig GmbH
> Gröhrmühlgasse 26
> A-2700 Wiener Neustadt
> Web: http://www.postgresql-support.de, http://www.cybertec.at
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>



-- 
Jeevan Chalke
Principal Software Engineer, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company


Re: [HACKERS] exposing wait events for non-backends (was: Tracking wait event for latches)

2017-03-23 Thread Kuntal Ghosh
On Wed, Mar 22, 2017 at 9:54 PM, Robert Haas  wrote:
> On Wed, Mar 22, 2017 at 12:20 PM, Robert Haas  wrote:
>> On Wed, Mar 22, 2017 at 1:31 AM, Michael Paquier
>>  wrote:
>>> Okay, switched as ready for committer. One note for the committer
>>> though: keeping the calls of pgstat_bestart() out of
>>> BackgroundWorkerInitializeConnection() and
>>> BackgroundWorkerInitializeConnectionByOid() keeps users the
>>> possibility to not have backends connected to the database show up in
>>> pg_stat_activity. This may matter for some users (cloud deployment for
>>> example). I am as well in favor in keeping the work of those routines
>>> minimal, without touching at pgstat.
>>
>> I think that's just inviting bugs of omission, in both core and
>> extension code.  I think it'd be much better to do this in a
>> centralized place.
>
> I mean, your argument boils down to "somebody might want to
> deliberately hide things from pg_stat_activity".  But that's not
> really a mode we support in general, and supporting it only for
> certain cases doesn't seem like something that this patch should be
> about.  We could add an option to BackgroundWorkerInitializeConnection
> and BackgroundWorkerInitializeConnectionByOid to suppress it, if it's
> something that somebody wants, but actually I'd be more inclined to
> think that everybody (who has a shared memory connection) should go
> into the machinery and then security-filtering should be left to some
> higher-level facility that can make policy decisions rather than being
> hard-coded in the individual modules.
>
> But I'm slightly confused as to how this even arises.  Background
> workers already show up in pg_stat_activity output, or at least I sure
> think they do.  So why does this patch need to make any change to that
> case at all?
When we initialize a process through InitPostgres, control returns
from the function at different locations based on the process's type (as
soon as its relevant information is initialized).  The order in which
processes return from InitPostgres is:

IsAutoVacuumLauncherProcess
walsender not connected to a DB
bgworker not connected to a DB
Other processes using InitPostgres

Before the patch, not all of these processes were shown in
pg_stat_activity.  Hence, we only needed to call pgstat_bestart() at two
locations in InitPostgres to initialize the entry for the respective
process.  But since we're increasing the kinds of processes shown in
pg_stat_activity, it seems reasonable to move the pgstat_bestart() call
to each process's main entry point.  I've followed the same approach for
auxiliary processes as well.

AutovacuumLauncher - AutoVacLauncherMain()
bgwriter - BackgroundWriterMain()
checkpointer - CheckpointerMain()
startup - StartupProcessMain()
walwriter - WalWriterMain()
walreceiver - WalReceiverMain()
walsender - InitWalSender()

Hence, to be consistent with others, bgworker processes can be
initialized from BackgroundWorkerInitializeConnectionBy[Oid].

I've attached the updated patches which reflect the above change. PFA.

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com


0001-Infra-to-expose-all-backend-processes-in-pg_stat_get.patch
Description: binary/octet-stream


0002-Expose-stats-for-all-backends.patch
Description: binary/octet-stream


0003-Add-backend_type-column-in-pg_stat_get_activity.patch
Description: binary/octet-stream

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] exposing wait events for non-backends (was: Tracking wait event for latches)

2017-03-23 Thread Michael Paquier
On Thu, Mar 23, 2017 at 8:19 PM, Kuntal Ghosh
 wrote:
> Hence, to be consistent with others, bgworker processes can be
> initialized from BackgroundWorkerInitializeConnectionBy[Oid].

Yeah, I am fine with that. Thanks for the updated versions.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-23 Thread Pavan Deolasee
On Thu, Mar 23, 2017 at 4:08 PM, Pavan Deolasee 
wrote:

>
>
> On Thu, Mar 23, 2017 at 3:02 PM, Amit Kapila 
> wrote:
>
>>
>>
>> That sounds like you are dodging the actual problem.  I mean you can
>> put that same PageIsFull() check in master code as well and then you
>> will most probably again see the same regression.
>
>
> Well I don't see it that way. There was a specific concern about a
> specific workload that WARM might regress. I think this change addresses
> that. Sure if you pick that one piece, put it in master first and then
> compare against rest of the WARM code, you will see a regression.
>

BTW the PageIsFull() check may not help as much in master as it does with
WARM. In master we bail out early anyway after a couple of column checks. In
master it may help to reduce the 10% drop that we see while updating the last
index column, but if we compare master and WARM with the patch applied, the
regression should be quite nominal.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


[HACKERS] GSOC'17 project introduction: Parallel COPY execution with errors handling

2017-03-23 Thread Alexey Kondratov
Hi pgsql-hackers,

I'm planning to apply to GSOC'17 and my proposal consists currently of two 
parts:

(1) Add errors handling to COPY as a minimum program

Motivation: Using PG on a daily basis for years, I have found that there are
some cases when you need to load (e.g. for further analytics) a bunch of not
entirely consistent records with occasional type/column mismatches. Since PG
throws an exception on the first error, currently the only solution is to
preformat your data with some other tool and then load it into PG. However,
frequently it is easier to drop certain records instead of doing such
preprocessing for every data source you have.

I have done some research and found the item in PG's TODO
https://wiki.postgresql.org/wiki/Todo#COPY, and a previous attempt to push a
similar patch
https://www.postgresql.org/message-id/flat/603c8f070909141218i291bc983t501507ebc996a531%40mail.gmail.com#603c8f070909141218i291bc983t501507ebc996a...@mail.gmail.com.
There were no negative responses to this patch and it seems that it was
just forgotten and never finalized.

As an example of the general idea I can point to the read_csv method of the
python package pandas
(http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html).
It uses a C parser which throws an error on the first column mismatch.
However, it has two flags, error_bad_lines and warn_bad_lines, which, when
set to False, help to drop bad lines or even hide the warning messages about
them.


(2) Parallel COPY execution as a maximum program

I guess there is not much to say about motivation; it just should be faster
on multicore CPUs.

There is also a record about parallel COPY in PG's wiki
https://wiki.postgresql.org/wiki/Parallel_Query_Execution. There are some
external extensions, e.g. https://github.com/ossc-db/pg_bulkload, but it is
always better to have well-performing core functionality out of the box.


My main concerns here are:

1) Is there anyone in the PG community who would be interested in such a
project and could be a mentor?
2) These two points share a general idea – to simplify work with large
amounts of data from different sources – but maybe it would be better to
focus on a single task?
3) Is it realistic to mostly finish both parts during the 3+ months of
almost full-time work, or am I too presumptuous?

I would very much appreciate any comments and criticism.


P.S. I know about the very interesting ready-made projects from the PG
community https://wiki.postgresql.org/wiki/GSoC_2017, but it is always more
interesting to solve your own problems, issues and questions, which are the
product of your experience with software. That's why I dare to propose my
own project.

P.P.S. A few words about me: I'm a PhD student in theoretical physics from
Moscow, Russia, and have been highly involved in software development since
2010. I believe I have good skills in Python, Ruby, JavaScript, MATLAB, C,
and Fortran development and a basic understanding of algorithm design and
analysis.


Best regards,

Alexey

Re: [HACKERS] Create replication slot in pg_basebackup if requested and not yet present

2017-03-23 Thread Magnus Hagander
On Tue, Mar 21, 2017 at 8:34 PM, Robert Haas  wrote:

> On Sun, Mar 19, 2017 at 12:01 PM, Magnus Hagander 
> wrote:
> > I think maybe we should output a message when the slot is created, at
> least
> > in verbose mode, to make sure people realize that happened. Does that
> seem
> > reasonable?
>
> Slots are great until you leave one lying around by accident.  I'm
> afraid that no matter what we do, we're going to start getting
> complaints from people who mess that up.  For example, somebody
> creates a replica using the new super-easy method, and then blows it
> away without dropping the slot from the master, and then days or weeks
> later pg_wal fills up and takes the server down.  The user says, oh,
> these old write-ahead log files should have gotten removed, and
> removes them all.  Oops.


> So I tend to think that there should always be some explicit user
> action to cause the creation of a slot, like --create-slot-if-needed
> or --create-slot=name.  That still won't prevent careless use of that
> option but it's less dangerous than assuming that a user who refers to
> a nonexistent slot intended to create it when, perhaps, they just
> typo'd it.
>

Well, the explicit user action would be "--slot". But sure, I can
definitely live with a --create-if-not-exists. The important thing, to me,
is that you can run it as a single command, which makes it a lot easier to
work with. (Unlike what we currently have for pg_receivewal, which requires
a separate command to create the slot.)

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] Monitoring roles patch

2017-03-23 Thread Stephen Frost
Dave,

* Dave Page (dp...@pgadmin.org) wrote:
> On Wed, Mar 22, 2017 at 3:46 PM, Dave Page  wrote:
> > On Wed, Mar 22, 2017 at 1:15 PM, Stephen Frost  wrote:
> >> I did specifically ask for explicit roles to be made to enable such
> >> capability and that the pg_monitor role be GRANT'd those roles instead
> >> of hacking the pg_monitor OID into those checks, but it seems like
> >> that's not been done yet.
> >
> > Yeah, sorry - I missed that for pg_stat_activity. I'll update the patch.
> 
> Updated patch attached. This changes pg_read_all_gucs to
> pg_read_all_settings, and adds pg_read_all_stats which it grants to
> pg_monitor. pg_read_all_stats has full access to the pg_stat_ views
> (as pg_monitor did previously), and is used in the various contrib
> modules in place of pg_monitor.

Thanks!  I've started looking through it.

> diff --git a/contrib/pg_buffercache/Makefile b/contrib/pg_buffercache/Makefile
> index 497dbeb229..18f7a87452 100644
> --- a/contrib/pg_buffercache/Makefile
> +++ b/contrib/pg_buffercache/Makefile
> @@ -4,8 +4,9 @@ MODULE_big = pg_buffercache
>  OBJS = pg_buffercache_pages.o $(WIN32RES)
>  
>  EXTENSION = pg_buffercache
> -DATA = pg_buffercache--1.2.sql pg_buffercache--1.1--1.2.sql \
> - pg_buffercache--1.0--1.1.sql pg_buffercache--unpackaged--1.0.sql
> +DATA = pg_buffercache--1.2.sql pg_buffercache--1.2--1.3.sql \
> + pg_buffercache--1.1--1.2.sql pg_buffercache--1.0--1.1.sql \
> + pg_buffercache--unpackaged--1.0.sql
>  PGFILEDESC = "pg_buffercache - monitoring of shared buffer cache in 
> real-time"
>  
>  ifdef USE_PGXS

Did you forget to 'add' the new files..?  I don't see the
pg_buffercache--1.2--1.3.sql file.

> diff --git a/contrib/pg_freespacemap/Makefile 
> b/contrib/pg_freespacemap/Makefile
> index 7bc0e9555d..0a2f000ec6 100644
> --- a/contrib/pg_freespacemap/Makefile
> +++ b/contrib/pg_freespacemap/Makefile
> @@ -4,8 +4,8 @@ MODULE_big = pg_freespacemap
>  OBJS = pg_freespacemap.o $(WIN32RES)
>  
>  EXTENSION = pg_freespacemap
> -DATA = pg_freespacemap--1.1.sql pg_freespacemap--1.0--1.1.sql \
> - pg_freespacemap--unpackaged--1.0.sql
> +DATA = pg_freespacemap--1.1.sql pg_freespacemap--1.1--1.2.sql \
> + pg_freespacemap--1.0--1.1.sql pg_freespacemap--unpackaged--1.0.sql
>  PGFILEDESC = "pg_freespacemap - monitoring of free space map"
>  
>  ifdef USE_PGXS

Same here.

> diff --git a/contrib/pg_stat_statements/Makefile 
> b/contrib/pg_stat_statements/Makefile
> index 298951a5f5..39b368b70e 100644
> --- a/contrib/pg_stat_statements/Makefile
> +++ b/contrib/pg_stat_statements/Makefile
> @@ -4,9 +4,10 @@ MODULE_big = pg_stat_statements
>  OBJS = pg_stat_statements.o $(WIN32RES)
>  
>  EXTENSION = pg_stat_statements
> -DATA = pg_stat_statements--1.4.sql pg_stat_statements--1.3--1.4.sql \
> - pg_stat_statements--1.2--1.3.sql pg_stat_statements--1.1--1.2.sql \
> - pg_stat_statements--1.0--1.1.sql pg_stat_statements--unpackaged--1.0.sql
> +DATA = pg_stat_statements--1.4.sql pg_stat_statements--1.4--1.5.sql \
> + pg_stat_statements--1.3--1.4.sql pg_stat_statements--1.2--1.3.sql \
> + pg_stat_statements--1.1--1.2.sql pg_stat_statements--1.0--1.1.sql \
> + pg_stat_statements--unpackaged--1.0.sql
>  PGFILEDESC = "pg_stat_statements - execution statistics of SQL statements"
>  
>  LDFLAGS_SL += $(filter -lm, $(LIBS))

And here.

> diff --git a/contrib/pg_stat_statements/pg_stat_statements.c 
> b/contrib/pg_stat_statements/pg_stat_statements.c
> index 221ac98d4a..cec94d5896 100644
> --- a/contrib/pg_stat_statements/pg_stat_statements.c
> +++ b/contrib/pg_stat_statements/pg_stat_statements.c
> @@ -1386,7 +1387,7 @@ pg_stat_statements_internal(FunctionCallInfo fcinfo,
>   MemoryContext per_query_ctx;
>   MemoryContext oldcontext;
>   Oid userid = GetUserId();
> - bool    is_superuser = superuser();
> + bool    is_superuser = false;
>   char   *qbuffer = NULL;
>   Size    qbuffer_size = 0;
>   Size    extent = 0;
> @@ -1394,6 +1395,9 @@ pg_stat_statements_internal(FunctionCallInfo fcinfo,
>   HASH_SEQ_STATUS hash_seq;
>   pgssEntry  *entry;
>  
> + /* For the purposes of this module, we consider pg_read_all_stats 
> members to be superusers */
> + is_superuser = (superuser() || is_member_of_role(GetUserId(), 
> DEFAULT_ROLE_READ_ALL_STATS));

Whoa, let's *not* set an 'is_superuser' flag for a case where the
user is not, actually, a superuser.  This should be 'allowed_role' or
something along those lines, I'm thinking.

> diff --git a/contrib/pg_visibility/Makefile b/contrib/pg_visibility/Makefile
> index bc42944426..21d787ddf7 100644
> --- a/contrib/pg_visibility/Makefile
> +++ b/contrib/pg_visibility/Makefile
> @@ -4,7 +4,8 @@ MODULE_big = pg_visibility
>  OBJS = pg_visibility.o $(WIN32RES)
>  
>  EXTENSION = pg_visibility
> -DATA = pg_visibility--1.1.sql pg_visibility--1.0--1.1.sql
> +DATA = pg_visibility--1.1.sql pg_visibility--1.1--1.2.sql 

Re: [HACKERS] [GSoC] Push-based query executor discussion

2017-03-23 Thread sher-ars

Oleg Bartunov  writes:

I don't know exactly if the results of GSoC project should be 
committed,


Technically, they are not required:
https://developers.google.com/open-source/gsoc/faq


Are mentoring organizations required to use the code produced by
students?



No. While we hope that all the code that comes out of this program will
find a happy home, we don’t require organizations to use the students'
code.



--
Arseny Sher



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw bug in 9.6

2017-03-23 Thread Etsuro Fujita

On 2017/03/21 18:40, Etsuro Fujita wrote:

Ok, I'll update the patch.  One thing I'd like to revise in addition to
that is (1) add to JoinPathExtraData a flag member to indicate whether
to give the FDW a chance to consider a remote join, which will be set to
true if the joinrel's fdwroutine is not NULL and the fdwroutine's
GetForeignJoinPaths is not NULL, and (2) if the flag is true, save info
to create an alternative local join path, such as hashclauses and
mergeclauses proposed in the patch, into JoinPathExtraData in
add_paths_to_joinrel.  This would avoid useless overhead in saving such
info into JoinPathExtraData when we don't give the FDW that chance.


Done.  Attached is a new version of the patch.

Best regards,
Etsuro Fujita


epqpath-for-foreignjoin-6.patch
Description: binary/octet-stream

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSOC'17 project introduction: Parallel COPY execution with errors handling

2017-03-23 Thread Pavel Stehule
Hi

2017-03-23 12:33 GMT+01:00 Alexey Kondratov :

> Hi pgsql-hackers,
>
> I'm planning to apply to GSOC'17 and my proposal consists currently of two
> parts:
>
> (1) Add errors handling to COPY as a minimum program
>
> Motivation: Using PG on the daily basis for years I found that there are
> some cases when you need to load (e.g. for a further analytics) a bunch of
> not well consistent records with rare type/column mismatches. Since PG
> throws exception on the first error, currently the only one solution is to
> preformat your data with any other tool and then load to PG. However,
> frequently it is easier to drop certain records instead of doing such
> preprocessing for every data source you have.
>
> I have done a small research and found the item in PG's TODO
> https://wiki.postgresql.org/wiki/Todo#COPY, previous attempt to push
> similar patch https://www.postgresql.org/message-id/flat/
> 603c8f070909141218i291bc983t501507ebc996a531%40mail.gmail.com#
> 603c8f070909141218i291bc983t501507ebc996a...@mail.gmail.com. There were
> no negative responses against this patch and it seams that it was just
> forgoten and have not been finalized.
>
> As an example of a general idea I can provide *read_csv* method of python
> package – *pandas* (http://pandas.pydata.org/pandas-docs/stable/generated/
> pandas.read_csv.html). It uses C parser which throws error on first
> columns mismatch. However, it has two flags *error_bad_lines* and
> *warn_bad_lines*, which being set to False helps to drop bad lines or
> even hide warn messages about them.
>
>
> (2) Parallel COPY execution as a maximum program
>
> I guess that there is nothing necessary to say about motivation, it just
> should be faster on multicore CPUs.
>
> There is also an record about parallel COPY in PG's wiki
> https://wiki.postgresql.org/wiki/Parallel_Query_Execution. There are some
> side extensions, e.g. https://github.com/ossc-db/pg_bulkload, but it
> always better to have well-performing core functionality out of the box.
>
>
> My main concerns here are:
>
> 1) Is there anyone out of PG comunity who will be interested in such
> project and can be a menthor?
> 2) These two points have a general idea – to simplify work with a large
> amount of data from a different sources, but mybe it would be better to
> focus on the single task?
>

I spent a lot of time on an implementation of @1 - maybe I can find a patch
somewhere. Both tasks have something in common - you have to divide the
import into batches.



> 3) Is it realistic to mostly finish both parts during the 3+ months of
> almost full-time work or I am too presumptuous?
>

I think it is possible - I am not sure about all the details,
but a basic implementation can be done in 3 months.


>
> I will be very appreciate to any comments and criticism.
>
>
> P.S. I know about very interesting ready projects from the PG's comunity
> https://wiki.postgresql.org/wiki/GSoC_2017, but it always more
> interesting to solve your own problems, issues and questions, which are the
> product of you experience with software. That's why I dare to propose my
> own project.
>
> P.P.S. A few words about me: I'm a PhD stident in Theoretical physics from
> Moscow, Russia, and highly involved in software development since 2010. I
> guess that I have good skills in Python, Ruby, JavaScript, MATLAB, C,
> Fortran development and basic understanding of algorithms design and
> analysis.
>
>
> Best regards,
>
> Alexey
>


Re: [HACKERS] [PATCH v1] Add and report the new "in_hot_standby" GUC pseudo-variable.

2017-03-23 Thread Elvis Pranskevichus
On Thursday, March 23, 2017 4:50:20 AM EDT Magnus Hagander wrote:
> We should probably consider if there is some way we can implement
> these two things the same way. If we're inventing a new variable that
> gets pushed on each connection, perhaps we can use that one and avoid
> the SHOW command? Is the read-write thing really interesting in the
> aspect of the general case, or is it more about detectinv readonly
> standbys as well? Or to flip that, would sending the
> transaction_read_only parameter be enough for the usecase in this
> thread, without having to invent a new variable?

Hot standby differs from regular read-only transactions in a number of 
ways.  Most importantly, it prohibits LISTEN/NOTIFY.  Grep for 
PreventCommandDuringRecovery() if you're interested in the scope.
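
For instance, NOTIFY is blocked during recovery roughly like this in
standard_ProcessUtility() (reproduced from memory, so treat it as a sketch
rather than an exact excerpt):

case T_NotifyStmt:
    {
        NotifyStmt *stmt = (NotifyStmt *) parsetree;

        /* Not allowed on a hot standby, even though it is "read only". */
        PreventCommandDuringRecovery("NOTIFY");
        Async_Notify(stmt->conditionname, stmt->payload);
    }
    break;

LISTEN, UNLISTEN and a few other commands are guarded the same way, which is
why transaction_read_only alone doesn't tell the whole story.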

   Elvis



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSOC'17 project introduction: Parallel COPY execution with errors handling

2017-03-23 Thread Pavel Stehule
>> 1) Is there anyone out of PG comunity who will be interested in such
>> project and can be a menthor?
>> 2) These two points have a general idea – to simplify work with a large
>> amount of data from a different sources, but mybe it would be better to
>> focus on the single task?
>>
>
> I spent lot of time on implementation @1 - maybe I found somewhere a
> patch. Both tasks has some common - you have to divide import to more
> batches.
>

The patch is in /dev/null :( - my implementation was based on subtransactions
of 1000 rows each. When some check failed, I threw the subtransaction away and
imported every row from that block in its own subtransaction. It was a
prototype - I didn't look for a smarter implementation.

>
>
>
>> 3) Is it realistic to mostly finish both parts during the 3+ months of
>> almost full-time work or I am too presumptuous?
>>
>
> It is possible, I am thinking - I am not sure about all possible details,
> but basic implementation can be done in 3 months.
>

Some data and some checks depend on ordering - that can be a problem for
parallel processing - you should define the corner cases.


>
>
>>
>> I will be very appreciate to any comments and criticism.
>>
>>
>> P.S. I know about very interesting ready projects from the PG's comunity
>> https://wiki.postgresql.org/wiki/GSoC_2017, but it always more
>> interesting to solve your own problems, issues and questions, which are the
>> product of you experience with software. That's why I dare to propose my
>> own project.
>>
>> P.P.S. A few words about me: I'm a PhD stident in Theoretical physics
>> from Moscow, Russia, and highly involved in software development since
>> 2010. I guess that I have good skills in Python, Ruby, JavaScript, MATLAB,
>> C, Fortran development and basic understanding of algorithms design and
>> analysis.
>>
>>
>> Best regards,
>>
>> Alexey
>>
>
>


Re: [HACKERS] GSOC'17 project introduction: Parallel COPY execution with errors handling

2017-03-23 Thread Craig Ringer
On 23 March 2017 at 19:33, Alexey Kondratov  wrote:

> (1) Add errors handling to COPY as a minimum program

Huge +1 if you can do it in an efficient way.

I think the main barrier to doing so is that the naïve approach
creates a subtransaction for every row, which is pretty dire in
performance terms and burns transaction IDs very rapidly.

Most of our datatype I/O functions, etc, have no facility for being
invoked in a mode where they fail nicely and clean up after
themselves. We rely on unwinding the subtransaction's memory context
for error handling, for releasing any LWLocks that were taken, etc.
There's no try_timestamptz_in function or anything, just
timestamptz_in, and it ERROR's if it doesn't like its input. You
cannot safely PG_TRY / PG_CATCH such an exception and continue
processing to, say, write another row.

Currently we also don't have a way to differentiate between

* "this row is structurally invalid" (wrong number of columns, etc)
* "this row is structually valid but has fields we could not parse
into their data types"
* "this row looks structurally valid and has data types we could
parse, but does not satisfy a constraint on the destination table"

Nor do we have a way to write to any kind of failure-log table in the
database, since a simple approach relies on aborting subtransactions
to clean up failed inserts so it can't write anything for failed rows.
Not without starting a 2nd subxact to record the failure, anyway.

So, having said why it's hard, I don't really have much for you in
terms of suggestions for ways forward. User-defined data types,
user-defined constraints and triggers, etc mean anything involving
significant interface changes will be a hard sell, especially in
something pretty performance-sensitive like COPY.

I guess it'd be worth setting out your goals first. Do you want to
handle all the kinds of problems above? Malformed  rows, rows with
malformed field values, and rows that fail to satisfy a constraint? or
just some subset?



-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw: support parameterized foreign joins

2017-03-23 Thread Etsuro Fujita

On 2017/03/21 18:38, Etsuro Fujita wrote:

On 2017/03/16 22:23, Arthur Zakirov wrote:

Can you rebase the patch? It is not applied now.



Ok, will do.  Thanks for the report!


Done.  Also, I added regression tests and revised code and comments a 
bit.  (As for create_foreignscan_path(), I haven't done anything about 
that yet.)  Please find attached a new version created on top of [1].


Best regards,
Etsuro Fujita

[1] 
https://www.postgresql.org/message-id/79b98e30-4d38-7deb-f1fb-bc0bc589a958%40lab.ntt.co.jp


postgres-fdw-param-foreign-joins-2.patch
Description: binary/octet-stream

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logical replication existing data copy

2017-03-23 Thread Peter Eisentraut
On 3/21/17 21:38, Peter Eisentraut wrote:
> This patch is looking pretty good to me, modulo the failing pg_dump tests.
> 
> Attached is a fixup patch.  I have mainly updated some comments and
> variable naming for (my) clarity.  No functional changes.

Committed all that.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pageinspect and hash indexes

2017-03-23 Thread Ashutosh Sharma
Hi Amit,

>>> +
>>> + /* Check if it is an unused hash page. */
>>> + if (pageopaque->hasho_flag == LH_UNUSED_PAGE)
>>> + return page;
>>>
>>>
>>> I don't see we need to have a separate check for unused page, why
>>> can't it be checked with other page types in below check.
>>>
>>> if (pagetype != LH_OVERFLOW_PAGE && pagetype != LH_BUCKET_PAGE &&
>>> pagetype != LH_BITMAP_PAGE && pagetype != LH_META_PAGE)
>>
>> That is because an UNUSED page is as good as an empty page except that an
>> empty page doesn't have any pagetype. If we add a condition for checking an
>> UNUSED page to the above if check, it will never show an unused page as an
>> unused page; rather it will always show it as an empty page.
>>
>
> Oh, okay, but my main objection was that we should not check hash page
> type (hasho_flag) without ensuring whether it is a hash page.  Can you
> try to adjust the above code so that this check can be moved after
> hasho_page_id check?

Yes, I got your point. I have done that, but then I had to remove the
check for PageIsEmpty(). Anyway, I think the PageIsEmpty() condition will
only be true for one page in the entire hash index and can be
ignored. If you wish, I could mention it in the documentation.

>
>> To avoid
>> this, at the start of verify_hash_page function itself if we recognise
>> page as UNUSED page we return immediately.
>>
>>>
>>> 2.
>>> + /* Check if it is an empty hash page. */
>>> + if (PageIsEmpty(page))
>>> + ereport(ERROR,
>>> + (errcode(ERRCODE_INDEX_CORRUPTED),
>>> + errmsg("index table contains empty page")));
>>>
>>>
>>> Do we want to give a separate message for EMPTY and NEW pages?  Isn't
>>> it better that the same error message can be given for both of them as
>>> from user perspective there is not much difference between both the
>>> messages?
>>
>> I think we should show separate message because they are two different
>> type of pages. In the sense like, one is initialised whereas other is
>> completely zero.
>>
>
> I understand your point, but not sure if it makes any difference to user.
>

Okay, I have now removed the check for PageIsEmpty anyway. Please
refer to the attached
'0002 allow_pageinspect_handle_UNUSED_hash_pages.patch'.
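
In essence, after the usual special-space size check, the reordered checks
now look roughly like this (a sketch of the idea only, assuming the
LH_PAGE_TYPE mask from hash.h; the attached patch contains the actual hunk):

    pageopaque = (HashPageOpaque) PageGetSpecialPointer(page);

    /* Validate the page id first, so we only trust the hash-specific
     * flags once we know this really is a hash page. */
    if (pageopaque->hasho_page_id != HASHO_PAGE_ID)
        ereport(ERROR,
                (errcode(ERRCODE_INDEX_CORRUPTED),
                 errmsg("page is not a hash page")));

    /* An unused page is still a valid hash page; just hand it back. */
    if (pageopaque->hasho_flag == LH_UNUSED_PAGE)
        return page;

    pagetype = pageopaque->hasho_flag & LH_PAGE_TYPE;
    if (pagetype != LH_OVERFLOW_PAGE && pagetype != LH_BUCKET_PAGE &&
        pagetype != LH_BITMAP_PAGE && pagetype != LH_META_PAGE)
        ereport(ERROR,
                (errcode(ERRCODE_INDEX_CORRUPTED),
                 errmsg("page is not a hash page of expected type")));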


Also, I have attached
'0001-Mark-freed-overflow-page-as-UNUSED-pagetype-v2.patch' that
handles your comment mentioned in [1].

[1] - 
https://www.postgresql.org/message-id/CAA4eK1%2BVE_TDRLWpyeOf%2B7%2B6if68kgPNwO4guKo060rm_t3O5w%40mail.gmail.com

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com
From 86eb711ea80b495c9d2f5228558682d0e95c41eb Mon Sep 17 00:00:00 2001
From: ashu 
Date: Thu, 23 Mar 2017 16:47:17 +0530
Subject: [PATCH] Mark freed overflow page as UNUSED pagetype v2

---
 src/backend/access/hash/hash_xlog.c |  9 +
 src/backend/access/hash/hashovfl.c  | 13 -
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/src/backend/access/hash/hash_xlog.c b/src/backend/access/hash/hash_xlog.c
index d9ac42c..2ccaf46 100644
--- a/src/backend/access/hash/hash_xlog.c
+++ b/src/backend/access/hash/hash_xlog.c
@@ -697,11 +697,20 @@ hash_xlog_squeeze_page(XLogReaderState *record)
 	if (XLogReadBufferForRedo(record, 2, &ovflbuf) == BLK_NEEDS_REDO)
 	{
 		Page		ovflpage;
+		HashPageOpaque ovflopaque;
 
 		ovflpage = BufferGetPage(ovflbuf);
 
 		_hash_pageinit(ovflpage, BufferGetPageSize(ovflbuf));
 
+		ovflopaque = (HashPageOpaque) PageGetSpecialPointer(ovflpage);
+
+		ovflopaque->hasho_prevblkno = InvalidBlockNumber;
+		ovflopaque->hasho_nextblkno = InvalidBlockNumber;
+		ovflopaque->hasho_bucket = -1;
+		ovflopaque->hasho_flag = LH_UNUSED_PAGE;
+		ovflopaque->hasho_page_id = HASHO_PAGE_ID;
+
 		PageSetLSN(ovflpage, lsn);
 		MarkBufferDirty(ovflbuf);
 	}
diff --git a/src/backend/access/hash/hashovfl.c b/src/backend/access/hash/hashovfl.c
index a3cae21..d3eaee0 100644
--- a/src/backend/access/hash/hashovfl.c
+++ b/src/backend/access/hash/hashovfl.c
@@ -591,9 +591,20 @@ _hash_freeovflpage(Relation rel, Buffer bucketbuf, Buffer ovflbuf,
 	/*
 	 * Initialize the freed overflow page.  Just zeroing the page won't work,
 	 * because WAL replay routines expect pages to be initialized. See
-	 * explanation of RBM_NORMAL mode atop XLogReadBufferExtended.
+	 * explanation of RBM_NORMAL mode atop XLogReadBufferExtended. Also, mark
+	 * the page as UNUSED type and retain it's page id. This allows the tools
+	 * like pageinspect to consider it as a hash page.
 	 */
 	_hash_pageinit(ovflpage, BufferGetPageSize(ovflbuf));
+
+	ovflopaque = (HashPageOpaque) PageGetSpecialPointer(ovflpage);
+
+	ovflopaque->hasho_prevblkno = InvalidBlockNumber;
+	ovflopaque->hasho_nextblkno = InvalidBlockNumber;
+	ovflopaque->hasho_bucket = -1;
+	ovflopaque->hasho_flag = LH_UNUSED_PAGE;
+	ovflopaque->hasho_page_id = HASHO_PAGE_ID;
+
 	MarkBufferDirty(ovflbuf);
 
 	if (BufferIsValid(prevbuf))
-- 
1.8.3.1

diff --git a/contrib/pageinspect/hashfuncs.c b/contrib/pageinspect/hashfuncs.c
index 812a03f..6ba8847 100644
--- a/contrib/pageinspect/hashfuncs.c
+

Re: [HACKERS] Monitoring roles patch

2017-03-23 Thread Dave Page
Hi

On Thu, Mar 23, 2017 at 12:23 PM, Stephen Frost  wrote:
>
>> diff --git a/contrib/pg_stat_statements/pg_stat_statements.c 
>> b/contrib/pg_stat_statements/pg_stat_statements.c
>> index 221ac98d4a..cec94d5896 100644
>> --- a/contrib/pg_stat_statements/pg_stat_statements.c
>> +++ b/contrib/pg_stat_statements/pg_stat_statements.c
>> @@ -1386,7 +1387,7 @@ pg_stat_statements_internal(FunctionCallInfo fcinfo,
>>   MemoryContext per_query_ctx;
>>   MemoryContext oldcontext;
>>   Oid userid = GetUserId();
>> - bool    is_superuser = superuser();
>> + bool    is_superuser = false;
>>   char   *qbuffer = NULL;
>>   Size    qbuffer_size = 0;
>>   Size    extent = 0;
>> @@ -1394,6 +1395,9 @@ pg_stat_statements_internal(FunctionCallInfo fcinfo,
>>   HASH_SEQ_STATUS hash_seq;
>>   pgssEntry  *entry;
>>
>> + /* For the purposes of this module, we consider pg_read_all_stats 
>> members to be superusers */
>> + is_superuser = (superuser() || is_member_of_role(GetUserId(), 
>> DEFAULT_ROLE_READ_ALL_STATS));
>
> Whoa, let's *not* set an 'is_superuser' flag for a case where the
> user is not, actually, a superuser.  This should be 'allowed_role' or
> something along those lines, I'm thinking.

Fixed.

>> diff --git a/contrib/pg_visibility/Makefile b/contrib/pg_visibility/Makefile
>> index bc42944426..21d787ddf7 100644
>> --- a/contrib/pg_visibility/Makefile
>> +++ b/contrib/pg_visibility/Makefile
>> @@ -4,7 +4,8 @@ MODULE_big = pg_visibility
>>  OBJS = pg_visibility.o $(WIN32RES)
>>
>>  EXTENSION = pg_visibility
>> -DATA = pg_visibility--1.1.sql pg_visibility--1.0--1.1.sql
>> +DATA = pg_visibility--1.1.sql pg_visibility--1.1--1.2.sql \
>> + pg_visibility--1.0--1.1.sql
>>  PGFILEDESC = "pg_visibility - page visibility information"
>>
>>  REGRESS = pg_visibility
>
> Missing file here also.

Yeah, I forgot to add this and the others. Sorry about - fixed now.

>> diff --git a/contrib/pgrowlocks/pgrowlocks.c 
>> b/contrib/pgrowlocks/pgrowlocks.c
>> index db9e0349a0..c9194d8c0d 100644
>> --- a/contrib/pgrowlocks/pgrowlocks.c
>> +++ b/contrib/pgrowlocks/pgrowlocks.c
>> @@ -28,6 +28,7 @@
>>  #include "access/relscan.h"
>>  #include "access/xact.h"
>>  #include "catalog/namespace.h"
>> +#include "catalog/pg_authid.h"
>>  #include "funcapi.h"
>>  #include "miscadmin.h"
>>  #include "storage/bufmgr.h"
>> @@ -98,9 +99,11 @@ pgrowlocks(PG_FUNCTION_ARGS)
>>   relrv = 
>> makeRangeVarFromNameList(textToQualifiedNameList(relname));
>>   rel = heap_openrv(relrv, AccessShareLock);
>>
>> - /* check permissions: must have SELECT on table */
>> - aclresult = pg_class_aclcheck(RelationGetRelid(rel), 
>> GetUserId(),
>> -   
>> ACL_SELECT);
>> + /* check permissions: must have SELECT on table or be in 
>> pg_read_all_stats */
>> + aclresult = (pg_class_aclcheck(RelationGetRelid(rel), 
>> GetUserId(),
>> +   
>> ACL_SELECT) ||
>> + is_member_of_role(GetUserId(), 
>> DEFAULT_ROLE_READ_ALL_STATS));
>> +
>>   if (aclresult != ACLCHECK_OK)
>>   aclcheck_error(aclresult, ACL_KIND_CLASS,
>>  
>> RelationGetRelationName(rel));
>
> Doesn't this go beyond just pg_stat_X, but actually allows running
> pgrowlocks against any relation?  That seems like it should be
> independent, though it actually fits the "global-read-metadata" role
> case that has been discussed previously as being what pg_visibility,
> pgstattuple, and other similar contrib modules need.  I don't have a
> great name for that role, but it seems different to me from
> 'pg_read_all_stats', though I don't hold that position very strongly as
> I can see an argument that these kind of are stats.

That was the way I was leaning, on the grounds that the distinction
between two separate roles might end up being confusing for users.
It's simple enough to change if that's the consensus, and we can come
up with a suitable name.

> I see you did change pgstattuple too, and perhaps these are the changes
> you intended for pg_visibility too?  In any case, if so, we need to make
> it clear in the docs that pg_read_all_stats grants access to more than
> just pg_stat_X.

The pg_read_all_stats role docs have:

Read all pg_stat_* views and use various statistics related
extensions, even those normally visible only to superusers.

And then I updated the docs for each contrib module to note where
pg_read_all_stats can use functionality.

>> diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
>> index df0435c3f0..5472ff76a7 100644
>> --- a/doc/src/sgml/catalogs.sgml
>> +++ b/doc/src/sgml/catalogs.sgml
>> @@ -10027,15 +10027,17 @@ SELECT * FROM pg_locks pl LEFT JOIN 
>> pg_prepared

Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-23 Thread Amit Kapila
On Thu, Mar 23, 2017 at 3:44 PM, Pavan Deolasee
 wrote:
> On Wed, Mar 22, 2017 at 4:06 PM, Amit Kapila 
> wrote:
>>
>
>> 3.
>> + /*
>> + * HASH indexes compute a hash value of the key and store that in the
>> + * index. So
>> we must first obtain the hash of the value obtained from the
>> + * heap and then do a comparison
>> +
>>  */
>> + _hash_convert_tuple(indexRel, values, isnull, values2, isnull2);
>>
>> I think here, you need to handle the case where heap has a NULL value
>> as the hash index doesn't contain NULL values, otherwise, the code in
>> below function can return true which is not right.
>>
>
> I think we can simply conclude hashrecheck has failed the equality if the
> heap has NULL value because such a tuple should not have been reached via
> hash index unless a non-NULL hash key was later updated to a NULL key,
> right?
>

Right.

>>
>>
>> 6.
>> + *stats = index_bulk_delete(&ivinfo, *stats,
>> +lazy_indexvac_phase1, (void *) vacrelstats);
>> + ereport(elevel,
>> +(errmsg("scanned index \"%s\" to remove %d row version, found "
>> +"%0.f warm pointers, %0.f clear pointers, removed "
>> +"%0.f warm pointers, removed %0.f clear pointers",
>> +RelationGetRelationName(indrel),
>> + vacrelstats->num_dead_tuples,
>> + (*stats)->num_warm_pointers,
>> +(*stats)->num_clear_pointers,
>> +(*stats)->warm_pointers_removed,
>> + (*stats)->clear_pointers_removed)));
>> +
>> + (*stats)->num_warm_pointers = 0;
>> + (*stats)->num_clear_pointers = 0;
>> + (*stats)->warm_pointers_removed = 0;
>> + (*stats)->clear_pointers_removed = 0;
>> + (*stats)->pointers_cleared = 0;
>> +
>> + *stats =index_bulk_delete(&ivinfo, *stats,
>> + lazy_indexvac_phase2, (void *)vacrelstats);
>>
>> To convert WARM chains, we need to do two index passes for all the
>> indexes.  I think it can substantially increase the random I/O. I
>> think this can help us in doing more WARM updates, but I don't see how
>> the downside of that (increased random I/O) will be acceptable for all
>> kind of cases.
>>
>
> Yes, this is a very fair point. The way I proposed to address this upthread
> is by introducing a set of threshold/scale GUCs specific to WARM. So users
> can control when to invoke WARM cleanup. Only if the WARM cleanup is
> required, we do 2 index scans. Otherwise vacuum will work the way it works
> today without any additional overhead.
>

I am not sure on what basis a user can set such parameters; it will be
quite difficult to tune them.  I think the point is that
whatever threshold we keep, once it is crossed, vacuum will perform two
scans of all the indexes.  IIUC, this conversion of WARM chains is
required so that future updates can be WARM, or is there any other
reason?  I see this as a big penalty for future updates.

> We already have some intelligence to skip the second index scan if we did
> not find any WARM candidate chains during the first heap scan. This should
> take care of majority of the users who never update their indexed columns.
> For others, we need either a knob or some built-in way to deduce whether to
> do WARM cleanup or not.
>
> Does that seem worthwhile?
>

Is there any consensus on your proposal?  I feel this needs somewhat
broader discussion; you and I can't take a call on this point by
ourselves.  I request others to share their opinions on this point as well.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSOC'17 project introduction: Parallel COPY execution with errors handling

2017-03-23 Thread Stas Kelvich

> On 23 Mar 2017, at 15:53, Craig Ringer  wrote:
> 
> On 23 March 2017 at 19:33, Alexey Kondratov  
> wrote:
> 
>> (1) Add errors handling to COPY as a minimum program
> 
> Huge +1 if you can do it in an efficient way.
> 
> I think the main barrier to doing so is that the naïve approach
> creates a subtransaction for every row, which is pretty dire in
> performance terms and burns transaction IDs very rapidly.
> 
> Most of our datatype I/O functions, etc, have no facility for being
> invoked in a mode where they fail nicely and clean up after
> themselves. We rely on unwinding the subtransaction's memory context
> for error handling, for releasing any LWLocks that were taken, etc.
> There's no try_timestamptz_in function or anything, just
> timestamptz_in, and it ERROR's if it doesn't like its input. You
> cannot safely PG_TRY / PG_CATCH such an exception and continue
> processing to, say, write another row.
> 
> Currently we also don't have a way to differentiate between
> 
> * "this row is structurally invalid" (wrong number of columns, etc)
> * "this row is structually valid but has fields we could not parse
> into their data types"
> * "this row looks structurally valid and has data types we could
> parse, but does not satisfy a constraint on the destination table"
> 
> Nor do we have a way to write to any kind of failure-log table in the
> database, since a simple approach relies on aborting subtransactions
> to clean up failed inserts so it can't write anything for failed rows.
> Not without starting a 2nd subxact to record the failure, anyway.

If we are optimising COPY for the case with a small number of bad rows,
then a 2nd subtransaction doesn't seem like a bad idea. We can try to
apply the batch in a subtransaction; if it fails on some row N, then insert
rows [1, N) in the next subtransaction, report an error, and commit that
subtransaction. Row N+1 can be treated as the beginning of the next batch.
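
Roughly, such a retry loop could look like the following. This is purely
illustrative, not code from any patch: Row, insert_one_row() and
report_bad_row() are made-up helpers, and real code would also capture the
error with CopyErrorData() before flushing it.

#include "postgres.h"
#include "access/xact.h"
#include "utils/resowner.h"

typedef struct Row Row;                 /* hypothetical parsed-row type */
extern void insert_one_row(Row *row);   /* hypothetical; may ERROR */
extern void report_bad_row(Row *row);   /* hypothetical failure logging */

static void
copy_batch_with_retry(Row *rows, int nrows)
{
    int         done = 0;

    while (done < nrows)
    {
        MemoryContext oldcontext = CurrentMemoryContext;
        ResourceOwner oldowner = CurrentResourceOwner;
        volatile int i = done;      /* volatile: modified inside PG_TRY */

        BeginInternalSubTransaction(NULL);
        MemoryContextSwitchTo(oldcontext);

        PG_TRY();
        {
            for (; i < nrows; i++)
                insert_one_row(&rows[i]);

            /* The whole remaining batch went in; keep it. */
            ReleaseCurrentSubTransaction();
            MemoryContextSwitchTo(oldcontext);
            CurrentResourceOwner = oldowner;
            done = nrows;
        }
        PG_CATCH();
        {
            /* Row i failed: throw the subtransaction away. */
            MemoryContextSwitchTo(oldcontext);
            FlushErrorState();
            RollbackAndReleaseCurrentSubTransaction();
            MemoryContextSwitchTo(oldcontext);
            CurrentResourceOwner = oldowner;

            /* Re-insert rows [done, i), which are known to be good. */
            if (i > done)
            {
                int         j;

                BeginInternalSubTransaction(NULL);
                MemoryContextSwitchTo(oldcontext);
                for (j = done; j < i; j++)
                    insert_one_row(&rows[j]);
                ReleaseCurrentSubTransaction();
                MemoryContextSwitchTo(oldcontext);
                CurrentResourceOwner = oldowner;
            }

            /* Log/skip the bad row and resume right after it. */
            report_bad_row(&rows[i]);
            done = i + 1;
        }
        PG_END_TRY();
    }
}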


But if there are problems with handling everything via subtransactions,
and since parallelism is mentioned anyway, what about starting a bgworker
that performs the data insertion and is controlled by the backend?

For example, the backend can do the following:

* Start a bgworker (or even a parallel worker).
* Get a chunk of rows out of the file and try to apply them as a batch
  in a subtransaction in the bgworker.
* If that fails, then we can open a transaction in the backend itself and
  raise a notice / move the failed rows to a special errors table.

> So, having said why it's hard, I don't really have much for you in
> terms of suggestions for ways forward. User-defined data types,
> user-defined constraints and triggers, etc mean anything involving
> significant interface changes will be a hard sell, especially in
> something pretty performance-sensitive like COPY.
> 
> I guess it'd be worth setting out your goals first. Do you want to
> handle all the kinds of problems above? Malformed  rows, rows with
> malformed field values, and rows that fail to satisfy a constraint? or
> just some subset?
> 
> 
> 
> -- 
> Craig Ringer   http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
> 
> 
> -- 
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Partitioned tables and relfilenode

2017-03-23 Thread Maksim Milyutin

Hi!

I have noticed that unlinking of nonexistent physical storage is
scheduled for a partitioned table when we execute a DROP TABLE statement
on it. Though this action doesn't generate any error under typical
postgres behavior, because the missing storage is caught by an
if-statement [1], I think it is not safe.


My patch fixes this issue.

1. src/backend/storage/smgr/md.c:1385

--
Maksim Milyutin
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index 3999e6e..36917c8 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -1823,7 +1823,8 @@ heap_drop_with_catalog(Oid relid)
 	 */
 	if (rel->rd_rel->relkind != RELKIND_VIEW &&
 		rel->rd_rel->relkind != RELKIND_COMPOSITE_TYPE &&
-		rel->rd_rel->relkind != RELKIND_FOREIGN_TABLE)
+		rel->rd_rel->relkind != RELKIND_FOREIGN_TABLE &&
+		rel->rd_rel->relkind != RELKIND_PARTITIONED_TABLE)
 	{
 		RelationDropStorage(rel);
 	}

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Schedule and Release Management Team for PG10

2017-03-23 Thread Tom Lane
The Postgres release team has decided that last year's appointment
of a Release Management Team worked pretty well, so we're going to
run the version-10 release cycle the same way.  The members of the
RMT for this year will be Peter Eisentraut, Robert Haas, and Noah
Misch.

As previously agreed at the PGCon 2016 developers meeting, we'll
institute v10 feature freeze at the completion of the current
commitfest (end of this month).  The RMT will target making the
10beta1 release in mid-May and 10.0 final release in September.

Attached for reference is the RMT charter as agreed to by the
pgsql-release list.

regards, tom lane


== Proposed Release Management Team Charter

This charter takes effect on 2017-03-31 and expires three weeks after the
PostgreSQL 10 final release.

Duties:
- Meet the deadlines pgsql-release@ sets for beta1 and final release.
- When the open items list has an unowned item, promptly inform the associated
  committer that he owns that item.
- Approve the creation of REL10_STABLE, the welcoming of PostgreSQL 11
  submissions, and the resumption of CommitFests.
- Nag item owners to ensure forward progress.

Powers, exercisable by majority vote of the RMT, even in the face of no
consensus or contrary consensus:
- Approve the commit of a patch for the purpose of resolving an open item,
  including reversion commits.
- Add or remove open items.
- Enact a feature freeze.

Before using one of those powers, the Release Management Team (RMT) should
make more than zero effort to let normal community processes arrive at a
sufficient answer.  Given the aggressive schedule, the RMT may nonetheless use
these powers quickly and often.

pgsql-release@ appoints the RMT and has the power to dissolve it.

Members are Peter Eisentraut, Robert Haas, and Noah Misch.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Partitioned tables and relfilenode

2017-03-23 Thread Amit Langote
Hi,

On Thu, Mar 23, 2017 at 11:27 PM, Maksim Milyutin
 wrote:
> Hi!
>
> I have noticed that when we execute DROP TABLE on a partitioned table,
> unlinking of its (nonexistent) physical storage is scheduled. Although this
> does not normally raise an error, because the missing storage is silently
> skipped by the if-statement at [1], I do not think it is safe.
>
> My patch fixes this issue.
>
> 1. src/backend/storage/smgr/md.c:1385

Good catch, will incorporate that in the main patch.

Thanks,
Amit




Re: [HACKERS] Page Scan Mode in Hash Index

2017-03-23 Thread Jesper Pedersen

Hi,

On 03/22/2017 09:32 AM, Ashutosh Sharma wrote:

Done. Please refer to the attached v2 version of patch.



Thanks.


1) 0001-Rewrite-hash-index-scans-to-work-a-page-at-a-time.patch: this
patch rewrites the hash index scan module to work in page-at-a-time
mode. It basically introduces two new functions -- _hash_readpage() and
_hash_saveitem(). The former is used to load all the qualifying tuples
from a target bucket or overflow page into an items array. The latter
is used by _hash_readpage() to save all the qualifying tuples found
in a page into that items array. Apart from that, this patch basically
cleans up _hash_first(), _hash_next() and hashgettuple().



0001v2:

In hashgettuple() you can remove the 'currItem' and 'offnum' from the 
'else' part, and do the assignment inside


  if (so->numKilled < MaxIndexTuplesPerPage)

instead.


No new comments for 0002 and 0003.

Best regards,
 Jesper





[HACKERS] minor spelling error fix (btis -> bits)

2017-03-23 Thread Jon Nelson
Attached please find a minor spelling error fix, changing "btis" to "bits".




-- 
Jon
From f590a6dce6677bc5b8a409d40fd651ecb69b27bb Mon Sep 17 00:00:00 2001
From: Jon Nelson 
Date: Sun, 23 Mar 2014 08:23:48 -0500
Subject: [PATCH] - fix minor spelling error

---
 src/backend/utils/adt/network.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/backend/utils/adt/network.c b/src/backend/utils/adt/network.c
index 1f8469a..b126494 100644
--- a/src/backend/utils/adt/network.c
+++ b/src/backend/utils/adt/network.c
@@ -1136,7 +1136,7 @@ network_scan_first(Datum in)
 
 /*
  * return "last" IP on a given network. It's the broadcast address,
- * however, masklen has to be set to its max btis, since
+ * however, masklen has to be set to its max bits, since
  * 192.168.0.255/24 is considered less than 192.168.0.255/32
  *
  * inet_set_masklen() hacked to max out the masklength to 128 for IPv6
-- 
2.10.2




Re: [HACKERS] Guidelines for GSoC student proposals / Eliminate O(N^2) scaling from rw-conflict tracking in serializable transactions

2017-03-23 Thread Kevin Grittner
On Wed, Mar 22, 2017 at 2:24 AM, Mengxing Liu
 wrote:

> I've finished a draft proposal for "Eliminate O(N^2) scaling from
> rw-conflict tracking in serializable transactions".  You can find
> it from GSOC website or by the link below.
> https://docs.google.com/document/d/17TAs3EJIokwPU7UTUmnlVY3ElB-VHViyX1zkQJmrD1A/edit?usp=sharing
>
> I was wondering if you have time to review the proposal and give me some 
> comments?

Will take a look and give you an initial review in a day or two.

--
Kevin Grittner




Re: [HACKERS] minor spelling error fix (btis -> bits)

2017-03-23 Thread Simon Riggs
On 23 March 2017 at 15:05, Jon Nelson  wrote:
> Attached please find a minor spelling error fix, changing "btis" to "bits".

Applied, thanks.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] [COMMITTERS] pgsql: Add missing support for new node fields

2017-03-23 Thread Peter Eisentraut
Another idea here:

Instead of making COPY_PARSE_PLAN_TREES a compile-time option, make it a
run-time option, and make that somehow selectable while running the test
suite.  It would be much easier to run the test suite with an option on
the command line, instead of having to manually edit a header file,
recompile, and then switch it back, etc.
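
As a rough illustration of what I have in mind -- a hedged sketch only, not a
patch, and the GUC name debug_copy_parse_plan_trees is made up -- the
copyObject() calls currently hidden behind the #ifdef could be driven by a
developer GUC instead:

    /* hypothetical boolean GUC, registered in guc.c under DEVELOPER_OPTIONS */
    bool		debug_copy_parse_plan_trees = false;

    /* where the code currently has "#ifdef COPY_PARSE_PLAN_TREES ... #endif" */
    if (debug_copy_parse_plan_trees)
        query = copyObject(query);	/* exercise copyObject() on every tree */

The test suite could then switch it on without recompiling, for example with
something like PGOPTIONS='-c debug_copy_parse_plan_trees=on' make installcheck.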

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] perlcritic

2017-03-23 Thread Daniel Gustafsson
> On 21 Mar 2017, at 19:20, David Steele  wrote:
> 
> On 3/6/17 12:02 PM, Dagfinn Ilmari Mannsåker wrote:
>> ilm...@ilmari.org (Dagfinn Ilmari Mannsåker) writes:
>> 
>>> Hi Peter,
>>> 
>>> Peter Eisentraut  writes:
>>> 
 I posted this about 18 months ago but then ran out of steam. [ ] Here
 is an updated patch.  The testing instructions below still apply.
 Especially welcome would be ideas on how to address some of the places
 I have marked with ## no critic.
>>> 
>>> Attached is a patch on top of yours that addresses all the ## no critic
>>> annotations except RequireFilenameMatchesPackage, which can't be fixed
>>> without more drastic reworking of the plperl build process.
>>> 
>>> Tested on perl 5.8.1 and 5.24.0 by configuring with --with-perl and
>>> --enable-tap-tests followed by make check-world, and running pgindent
>>> --build.
>> 
>> Attached is an updated version of the patch, in which
>> src/tools/msvc/gendef.pl actually compiles.  If someone on Windows could
>> test it, that would be great.
> 
> You are signed up to review this patch.  Do you know when you'll have a 
> chance to do that?

Below is a review of the two patches attached to the commitfest entry:

The v2-0001-Clean-up-Perl-code-according-to-perlcritic-severi.patch didn’t
apply cleanly due to later commits, but the fixes to get there were trivial.
The followup 0001-Fix-most-perlcritic-exceptions-v2.patch applied clean on top
of that.  The attached patch contains these two patches, rebased on top of
current master, with the below small nitpicks.

Since the original submission, most things have been addressed already, leaving
this patch mostly with conversions to three-argument open.  The no critic exceptions
left are quite harmless: two cases of RequireFilenameMatchesPackage and one
ProhibitStringyEval.  All three could be fixed at the expense of complicating
things without much (or any) benefit (as mentioned up-thread by Dagfinn Ilmari
Mannsåker), so I’m fine with leaving them in.

A few small nitpicks on the patch:

## In src/interfaces/libpq/test/regress.pl:

-open(REGRESS_IN, "<", $regress_in)
+open(my $regress_in_fh, "<", $regress_in)

Reading and closing this file was still using REGRESS_IN, fixed in the attached
updated patch.

## In src/test/locale/sort-test.pl:

-open(INFILE, "<$ARGV[0]");
-chop(my (@words) = );
-close(INFILE);
+chop(my (@words) = <>);

While this hunk does provide the same functionality due to the magic handling
of ARGV in <>, it also carries the side “benefit” that arbitrary applications
can be executed by using a | to read the output from a program:

  $ src/test/locale/sort-test.pl "rm README |"

  $ cat README
  cat: README: No such file or directory

A silly example for sure, but since the intent of the patch is to apply best
practices and safe practices, I’d argue that a normal three-argument open is
safer here.  The risk for misuse is very low, but it also makes the code less
magic and more readable IMO.

Reading the thread, most of the discussion was around the use of three-argument
open instead of the older two-argument form.  Without diving into the arguments,
there are a few places where we should use three-argument open, so simply applying
it everywhere rather than on a case-by-case basis seems reasonable to me.
Consistency across the codebase helps when reading the code.

There is no measurable performance impact from the changes, and no user-visible
changes to functionality.  With this applied, make check-world passes and
perlcritic returns a clean run (except on the autogenerated Gen_dummy_probes.pl,
which is kept out of this work).  The intent of the patch is to make the code
consistent and readable, and it achieves that.  I see no reason not to go ahead
with these changes, if only to keep the codebase consistent with what modern
Perl code is expected to look like.

Given the nitpick nature of the comments, bumping status to ready for
committer.

cheers ./daniel



perlcritic-with-review-comments.patch
Description: Binary data




[HACKERS] Fwd: GSOC 2017 project ideas

2017-03-23 Thread Charles Cui
Given the low activity on pgsql-students, I am forwarding this here to get
more feedback.

-- Forwarded message --
From: Charles Cui 
Date: 2017-03-23 8:58 GMT-07:00
Subject: GSOC 2017 project ideas
To: pgsql-stude...@postgresql.org


Hi guys,
This is Charles Cui from Tsinghua University, China. I am quite interested in
the ideas listed on the PostgreSQL GSoC 2017 page, especially the idea
"Eliminate O(N^2) scaling from rw-conflict tracking in serializable
transactions". In my PhD program at Tsinghua University, I focus on operating
system scalability and performance on multicore processors. Part of my work is
designing benchmark suites that stress various operating system interfaces
from user level and analyzing their performance; based on that, I propose
solutions to the problems found. For example, I have published papers on
improving synchronization primitives in the Linux kernel, and I am quite
familiar with various kinds of locks and lock-free techniques. I think this
project is similar to my previous work, and I am confident that I can do it
well. Besides, I speak fluent English and am fully available for the whole
summer. I am wondering whether I can join this project and work with you? In
case this project is allocated to someone else, I am also open to picking
another PostgreSQL project. Let me know if you have any questions or feedback.


Thanks, Charles.


Re: [HACKERS] Other formats in pset like markdown, rst, mediawiki

2017-03-23 Thread Pierre Ducroquet
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   tested, passed
Documentation:tested, passed

Hi

This is my first review (Magnus said in his presentation at PGDay Paris that
volunteers should just come and help, so here I am), so please let me know about
any mistakes I make when using the review tools...

The feature seems to work as expected, but I don't claim to be a markdown and 
rst expert.
Some minor issues with the code itself:
- some indentation issues (documentation and code mix space-based and tab-based
indentation) and a few trailing spaces in the code
- typographic issues in the documentation:
  - "The html, asciidoc, latex, latex-longtable, troff-ms, and markdown and rst 
formats" ==> duplicated "and"
  - "Sets the output format to one of unaligned, aligned, wrapped, html, 
asciidoc, latex (uses tabular), latex-longtable, rst, markdown, or troff-ms." 
==> extra comma at the end of the list
- the comment "dont add line after last row, because line is added after every 
row" is misleading; it should warn that this applies only to rst
- there is a block of commented out code left
- in the print_aligned_vertical function, there is a mix between 
"cont->opt->format == PRINT_RST" and "format == &pg_rst" and I don't see any 
obvious reason for that
- the documentation doesn't mention (but ok, it's kind of obvious) that the 
linestyle option will not work with rst and markdown

Thanks!

The new status of this patch is: Waiting on Author



[HACKERS] Order-preserving function transforms and EquivalenceClass

2017-03-23 Thread Mat Arye
Hi,

I am on a team developing an open-source extension for time-series data
storage in PostgreSQL (https://github.com/timescaledb/timescaledb).

We are trying to extend/hook into the planner so that it understands that
date_trunc('minute', time) has the same ordering as time (or rather that a
sort ordering on the latter is always a valid sort ordering on the former).
But this question really applies to any order-preserving transform such as
(time+1) vs (time).

Let's assume we can detect such cases. How do we tell the planner about it?

I see two ways of doing this:

(1) Having an EquivalenceClass with two members - the "time" and
"date_trunc" member. Then the pathkey for the sort would reference this
member and understand the equivalence. I think this is a pretty clean way
of doing things. But this equivalence between "time" and "date_trunc" only
applies to sorts. It should not apply to joins (for which EquivalenceClass
is also used) because these values are not actually equivalent. They are
only equivalent for sorting purposes. I know an EquivalenceClass has
an ec_has_volatile field which marks the EquivalenceClass as usable only for
sorts and not for joins. But is it an incorrect hack to set this for cases where
the EquivalenceMembers are not volatile? Is there some other way of marking
an EquivalenceClass as sort-only that I am missing? Another wrinkle is
that while a date_trunc sort can safely be represented by a time sort, the
reverse isn't true. Has there been any discussion on supporting such cases
in EquivalenceClasses? I'd be happy to submit a patch to the core if people
think this is worthwhile.

(2) I can create new EquivalenceClass objects and pathkeys and use planner
hooks to substitute the appropriate pathkeys for doing things like finding
indexes etc. I have a prototype where this works, but I just think approach
1 is much cleaner.

Any thoughts would be greatly appreciated. I am pretty new to the planner
codebase and just want to avoid any obvious pitfalls.

Thanks,
Matvey Arye


Re: [HACKERS] Review: GIN non-intrusive vacuum of posting tree

2017-03-23 Thread Teodor Sigaev

Thank you, pushed

Andrew Borodin wrote:

2017-03-22 22:48 GMT+05:00 Teodor Sigaev :

hasEmptyChild? and hasNonEmptyChild (BTW, isAnyNonempy has missed 't')


Yes, I think this naming is good. It's clear what's in common in these
flags and what's different.


And if the whole posting tree is empty, then we could mark the root page as a
leaf and remove all other pages in the tree without any locking. Although, it
could be a task for a separate patch.


From the performance point of view, this is a very good idea, for both the
performance of VACUUM and the performance of scans. But doing so we risk
leaving some garbage pages behind in case of a crash, and I do not see how
to avoid that without unlinking pages one by one. I agree that
leaving this trick for a separate patch is quite reasonable.

Best regards, Andrey Borodin.




--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/




Re: [HACKERS] Backend crash on non-exclusive backup cancel

2017-03-23 Thread Teodor Sigaev

Hi!

I believe the patch looks good and is ready to commit. As I understand it, it
fixes a bug introduced by

commit 7117685461af50f50c03f43e6a622284c8d54694
Date:   Tue Apr 5 20:03:49 2016 +0200

Implement backup API functions for non-exclusive backups

And, I suppose, it should be back-patched to 9.6?

--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/




[HACKERS] parallel "return query" is no good

2017-03-23 Thread Robert Haas
Commit 7aea8e4f2daa4b39ca9d1309a0c4aadb0f7ed81b allowed a parallel
plan to be generated for a RETURN QUERY or RETURN QUERY EXECUTE
statement in a PL/pgsql block.  As it turns out, the analysis that led
to this decision was totally wrong-headed, because the plan will
always be executed using SPI_cursor_fetch(portal, true, 50), which
will cause ExecutePlan() to get invoked with a count of 50, which will
cause it to run the parallel plan serially, without workers.
Therefore, passing CURSOR_OPT_PARALLEL_OK is a bad idea here; all it
can do is cause us to pick a parallel plan that's slow when executed
serially instead of the best serial plan.
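
For reference, the guard in question sits near the top of ExecutePlan();
paraphrasing from memory rather than quoting the tree, it behaves roughly
like this:

    /* a nonzero tuple count forces serial execution, since we might stop early */
    if (numberTuples)
        use_parallel_mode = false;

    if (use_parallel_mode)
        EnterParallelMode();

So with the count of 50 coming from SPI_cursor_fetch(), parallel mode is never
entered and the Gather node runs without launching any workers, as described
above.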

The attached patch fixes it.  I plan to commit this and back-patch it
to 9.6, barring objections or better ideas.

I previously remarked on this in
http://postgr.es/m/ca+tgmobxehvhbjtwdupzm9bvslitj-kshxqj2um5gpdze9f...@mail.gmail.com
but I wasn't quite so clear what the whole picture was in that email
as I am now.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


no-parallel-return-query.patch
Description: Binary data



Re: [HACKERS] parallel "return query" is no good

2017-03-23 Thread Robert Haas
On Thu, Mar 23, 2017 at 12:50 PM, Robert Haas  wrote:
> Commit 7aea8e4f2daa4b39ca9d1309a0c4aadb0f7ed81b allowed a parallel
> plan to be generated for a RETURN QUERY or RETURN QUERY EXECUTE
> statement in a PL/pgsql block.  As it turns out, the analysis that led
> to this decision was totally wrong-headed, because the plan will
> always be executed using SPI_cursor_fetch(portal, true, 50), which
> will cause ExecutePlan() to get invoked with a count of 50, which will
> cause it to run the parallel plan serially, without workers.
> Therefore, passing CURSOR_OPT_PARALLEL_OK is a bad idea here; all it
> can do is cause us to pick a parallel plan that's slow when executed
> serially instead of the best serial plan.
>
> The attached patch fixes it.  I plan to commit this and back-patch it
> to 9.6, barring objections or better ideas.

I guess the downside of back-patching this is that it could cause a
plan change for somebody which ends up being worse.  On the whole,
serial execution of queries intended to be run in parallel isn't
likely to work out well, but it's always possible somebody has a case
where it happens to be winning, and this could break it.  So maybe I
should do this only in master?  Thoughts?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] parallel "return query" is no good

2017-03-23 Thread Stephen Frost
* Robert Haas (robertmh...@gmail.com) wrote:
> On Thu, Mar 23, 2017 at 12:50 PM, Robert Haas  wrote:
> > Commit 7aea8e4f2daa4b39ca9d1309a0c4aadb0f7ed81b allowed a parallel
> > plan to be generated when for a RETURN QUERY or RETURN QUERY EXECUTE
> > statement in a PL/pgsql block.  As it turns out, the analysis that led
> > to this decision was totally wrong-headed, because the plan will
> > always be executed using SPI_cursor_fetch(portal, true, 50), which
> > will cause ExecutePlan() to get invoked with a count of 50, which will
> > cause it to run the parallel plan serially, without workers.
> > Therefore, passing CURSOR_OPT_PARALLEL_OK is a bad idea here; all it
> > can do is cause us to pick a parallel plan that's slow when executed
> > serially instead of the best serial plan.
> >
> > The attached patch fixes it.  I plan to commit this and back-patch it
> > to 9.6, barring objections or better ideas.
> 
> I guess the downside of back-patching this is that it could cause a
> plan change for somebody which ends up being worse.  On the whole,
> serial execution of queries intended to be run in parallel isn't
> likely to work out well, but it's always possible somebody has a case
> where it happens to be winning, and this could break it.  So maybe I
> should do this only in master?  Thoughts?

For my 2c, I'd back-patch it.

Thanks!

Stephen




Re: [HACKERS] parallel "return query" is no good

2017-03-23 Thread Andres Freund
On 2017-03-23 13:03:19 -0400, Robert Haas wrote:
> On Thu, Mar 23, 2017 at 12:50 PM, Robert Haas  wrote:
> > Commit 7aea8e4f2daa4b39ca9d1309a0c4aadb0f7ed81b allowed a parallel
> > plan to be generated for a RETURN QUERY or RETURN QUERY EXECUTE
> > statement in a PL/pgsql block.  As it turns out, the analysis that led
> > to this decision was totally wrong-headed, because the plan will
> > always be executed using SPI_cursor_fetch(portal, true, 50), which
> > will cause ExecutePlan() to get invoked with a count of 50, which will
> > cause it to run the parallel plan serially, without workers.
> > Therefore, passing CURSOR_OPT_PARALLEL_OK is a bad idea here; all it
> > can do is cause us to pick a parallel plan that's slow when executed
> > serially instead of the best serial plan.
> >
> > The attached patch fixes it.  I plan to commit this and back-patch it
> > to 9.6, barring objections or better ideas.
> 
> I guess the downside of back-patching this is that it could cause a
> plan change for somebody which ends up being worse.  On the whole,
> serial execution of queries intended to be run in parallel isn't
> likely to work out well, but it's always possible somebody has a case
> where it happens to be winning, and this could break it.  So maybe I
> should do this only in master?  Thoughts?

I'm +0.5 for backpatching.

- Andres




Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-23 Thread Tom Lane
Looking at some of the coding choices you made here, I see that you're
making a hard assumption that called functions never scribble on their
fcinfo->arg/argnull arrays.  I do not believe that we've ever had such
an assumption before.  Are we comfortable with that?  If so, we'd
better document it.

regards, tom lane




Re: [HACKERS] parallel "return query" is no good

2017-03-23 Thread Joshua D. Drake

On 03/23/2017 10:03 AM, Robert Haas wrote:

On Thu, Mar 23, 2017 at 12:50 PM, Robert Haas  wrote:

Commit 7aea8e4f2daa4b39ca9d1309a0c4aadb0f7ed81b allowed a parallel
plan to be generated for a RETURN QUERY or RETURN QUERY EXECUTE
statement in a PL/pgsql block.  As it turns out, the analysis that led
to this decision was totally wrong-headed, because the plan will
always be executed using SPI_cursor_fetch(portal, true, 50), which
will cause ExecutePlan() to get invoked with a count of 50, which will
cause it to run the parallel plan serially, without workers.
Therefore, passing CURSOR_OPT_PARALLEL_OK is a bad idea here; all it
can do is cause us to pick a parallel plan that's slow when executed
serially instead of the best serial plan.

The attached patch fixes it.  I plan to commit this and back-patch it
to 9.6, barring objections or better ideas.


I guess the downside of back-patching this is that it could cause a
plan change for somebody which ends up being worse.  On the whole,
serial execution of queries intended to be run in parallel isn't
likely to work out well, but it's always possible somebody has a case
where it happens to be winning, and this could break it.  So maybe I
should do this only in master?  Thoughts?


I think the greater good of a fix applies here. +1 to 9.6.






--
Command Prompt, Inc.  http://the.postgres.company/
+1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.
Unless otherwise stated, opinions are my own.




Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-23 Thread Andres Freund
Hi,

On 2017-03-23 13:14:59 -0400, Tom Lane wrote:
> Looking at some of the coding choices you made here, I see that you're
> making a hard assumption that called functions never scribble on their
> fcinfo->arg/argnull arrays.  I do not believe that we've ever had such
> an assumption before.

I think we did that before, e.g. ExecEvalScalarArrayOp(). Think there's
others too.


> Are we comfortable with that?  If so, we'd better document it.

I think it's ok, but we indeed should document it. I recall a note
somewhere... Can't find it anywhere however, might have misremembered a
note about pass-by-ref arguments.  fmgr/README? A note in
FunctionCallInfoData's definition?
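
To make the assumption concrete, here is a hedged sketch of the calling
pattern that relies on it (illustration only, not code from the patch;
flinfo is assumed to have already been filled in with fmgr_info(), and
const_datum / next_row_datum() are made-up stand-ins):

    FmgrInfo	flinfo;		/* assumed set up via fmgr_info() */
    FunctionCallInfoData fcinfo;
    Datum		result;

    InitFunctionCallInfoData(fcinfo, &flinfo, 2, InvalidOid, NULL, NULL);

    fcinfo.arg[0] = const_datum;	/* constant argument, set up once */
    fcinfo.argnull[0] = false;

    for (;;)
    {
        fcinfo.arg[1] = next_row_datum();	/* only the varying argument is refreshed */
        fcinfo.argnull[1] = false;
        fcinfo.isnull = false;
        result = FunctionCallInvoke(&fcinfo);
        /* if the callee scribbled on arg[0]/argnull[0], later calls see garbage */
    }

If called functions were allowed to overwrite their arg/argnull entries, the
caller would have to re-fill the whole array before every invocation, which is
presumably exactly the cost the new code is trying to avoid.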

Greetings,

Andres Freund




Re: [HACKERS] parallel "return query" is no good

2017-03-23 Thread Alvaro Herrera
Robert Haas wrote:

> I guess the downside of back-patching this is that it could cause a
> plan change for somebody which ends up being worse.  On the whole,
> serial execution of queries intended to be run in parallel isn't
> likely to work out well, but it's always possible somebody has a case
> where it happens to be winning, and this could break it.  So maybe I
> should do this only in master?  Thoughts?

I think that the chances of someone depending on a parallel plan that, run
serially by accident, happens to beat the non-parallel plan are pretty slim.

+1 for back-patching.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Add pgstathashindex() to get hash index table statistics.

2017-03-23 Thread Ashutosh Sharma
Hi,

On Tue, Feb 7, 2017 at 9:23 AM, Robert Haas  wrote:
> On Mon, Feb 6, 2017 at 10:40 PM, Amit Kapila  wrote:
>>> Maybe we should call them "unused pages".
>>
>> +1.  If we consider some more names for that column then probably one
>> alternative could be "empty pages".
>
> Yeah, but I think "unused" might be better.  Because a page could be
> in use (as an overflow page or primary bucket page) and still be
> empty.
>

Based on the earlier discussions, I have prepared a patch that allows
pgstathashindex() to show the number of unused pages in a hash index.
Please find the attached patch. Thanks.

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com
From e3b59fa85f16d6d15be5360e85b7faf63e8683a9 Mon Sep 17 00:00:00 2001
From: ashu 
Date: Thu, 23 Mar 2017 23:02:26 +0530
Subject: [PATCH] Allow pgstathashindex to show unused pages v1

---
 contrib/pgstattuple/expected/pgstattuple.out  | 12 ++--
 contrib/pgstattuple/pgstatindex.c | 19 ---
 contrib/pgstattuple/pgstattuple--1.4--1.5.sql |  1 +
 doc/src/sgml/pgstattuple.sgml | 17 -
 4 files changed, 31 insertions(+), 18 deletions(-)

diff --git a/contrib/pgstattuple/expected/pgstattuple.out b/contrib/pgstattuple/expected/pgstattuple.out
index 2c3515b..1f1ff46 100644
--- a/contrib/pgstattuple/expected/pgstattuple.out
+++ b/contrib/pgstattuple/expected/pgstattuple.out
@@ -132,9 +132,9 @@ select * from pgstatginindex('test_ginidx');
 
 create index test_hashidx on test using hash (b);
 select * from pgstathashindex('test_hashidx');
- version | bucket_pages | overflow_pages | bitmap_pages | zero_pages | live_items | dead_items | free_percent 
--+--++--++++--
-   2 |4 |  0 |1 |  0 |  0 |  0 |  100
+ version | bucket_pages | overflow_pages | bitmap_pages | zero_pages | unused_pages | live_items | dead_items | free_percent 
+-+--++--++--+++--
+   2 |4 |  0 |1 |  0 |0 |  0 |  0 |  100
 (1 row)
 
 -- these should error with the wrong type
@@ -233,9 +233,9 @@ select pgstatindex('test_partition_idx');
 (1 row)
 
 select pgstathashindex('test_partition_hash_idx');
-   pgstathashindex   
--
- (2,8,0,1,0,0,0,100)
+pgstathashindex
+---
+ (2,8,0,1,0,0,0,0,100)
 (1 row)
 
 drop table test_partitioned;
diff --git a/contrib/pgstattuple/pgstatindex.c b/contrib/pgstattuple/pgstatindex.c
index d448e9e..6fc41d6 100644
--- a/contrib/pgstattuple/pgstatindex.c
+++ b/contrib/pgstattuple/pgstatindex.c
@@ -120,6 +120,7 @@ typedef struct HashIndexStat
 	BlockNumber overflow_pages;
 	BlockNumber bitmap_pages;
 	BlockNumber zero_pages;
+	BlockNumber unused_pages;
 
 	int64	live_items;
 	int64	dead_items;
@@ -588,8 +589,8 @@ pgstathashindex(PG_FUNCTION_ARGS)
 	BufferAccessStrategy bstrategy;
 	HeapTuple	tuple;
 	TupleDesc	tupleDesc;
-	Datum		values[8];
-	bool		nulls[8];
+	Datum		values[9];
+	bool		nulls[9];
 	Buffer		metabuf;
 	HashMetaPage	metap;
 	float8		free_percent;
@@ -667,6 +668,8 @@ pgstathashindex(PG_FUNCTION_ARGS)
 			}
 			else if (opaque->hasho_flag & LH_BITMAP_PAGE)
 stats.bitmap_pages++;
+			else if (PageIsEmpty(page))
+stats.unused_pages++;
 			else
 ereport(ERROR,
 		(errcode(ERRCODE_INDEX_CORRUPTED),
@@ -680,8 +683,9 @@ pgstathashindex(PG_FUNCTION_ARGS)
 	/* Done accessing the index */
 	index_close(rel, AccessShareLock);
 
-	/* Count zero pages as free space. */
-	stats.free_space += stats.zero_pages * stats.space_per_page;
+	/* Count zero and unused pages as free space. */
+	stats.free_space += (stats.zero_pages + stats.unused_pages) *
+		 stats.space_per_page;
 
 	/*
 	 * Total space available for tuples excludes the metapage and the bitmap
@@ -711,9 +715,10 @@ pgstathashindex(PG_FUNCTION_ARGS)
 	values[2] = Int64GetDatum((int64) stats.overflow_pages);
 	values[3] = Int64GetDatum((int64) stats.bitmap_pages);
 	values[4] = Int64GetDatum((int64) stats.zero_pages);
-	values[5] = Int64GetDatum(stats.live_items);
-	values[6] = Int64GetDatum(stats.dead_items);
-	values[7] = Float8GetDatum(free_percent);
+	values[5] = Int64GetDatum((int64) stats.unused_pages);
+	values[6] = Int64GetDatum(stats.live_items);
+	values[7] = Int64GetDatum(stats.dead_items);
+	values[8] = Float8GetDatum(free_percent);
 	tuple = heap_form_tuple(tupleDesc, values, nulls);
 
 	PG_RETURN_DATUM(HeapTupleGetDatum(tuple));
diff --git a/contrib/pgstattuple/pgstattuple--1.4--1.5.sql b/contrib/pgstattuple/pgstattuple--1.4--1.5.sql
index 84e112e..eda719a 100644
--- a/contrib/pgstattuple/pgstattuple--1.4--1.5.sql
+++ b/contrib/pgstattuple/pgstattuple--1.4--1.5.sql
@@ -118,6 +118,7 @@ CREATE OR 

Re: [HACKERS] Enabling parallelism for queries coming from SQL or other PL functions

2017-03-23 Thread Robert Haas
On Thu, Mar 23, 2017 at 12:41 AM, Rafia Sabih
 wrote:
> Agree, done.

OK, committed execute-once-v3.patch after some further study.  See
also 
https://www.postgresql.org/message-id/CA%2BTgmoZ_ZuH%2BauEeeWnmtorPsgc_SmP%2BXWbDsJ%2BcWvWBSjNwDQ%40mail.gmail.com
which resulted from that study.

Regarding pl_parallel_opt_support_v3.patch, the change to
init_execution_state looks good.  Whether or not it's safe to use
parallel query doesn't depend on whether the function is marked
volatile, so the current code is just silly (and, arguably,
readonly_func is misnamed).  The changes to spi.c also seem fine; both
of those functions execute the plan to completion and don't allow
cursor operations, so we're good.

The changes to the plpgsql code don't look so good to me.  The change
to exec_stmt_return_query fixes the same bug that I mentioned in the
email linked above, but only half of it -- it corrects the RETURN
QUERY EXECUTE case but not the RETURN QUERY case.  And it's clearly a
separate change; that part is a bug fix, not an enhancement.  Some of
the other changes depend on whether we're in a trigger, which seems
irrelevant to whether we can use parallelism. Even if the outer query
is doing writes, we can still use parallelism for queries inside the
trigger function if warranted.  It's probably a rare case to have
queries inside a trigger that are expensive enough to justify such
handling but I don't see the value of putting in special-case logic to
prevent it.

I suspect that code fails to achieve its goals anyway.  At the top of
exec_eval_expr(), you call exec_prepare_plan() and unconditionally
pass CURSOR_OPT_PARALLEL_OK, so when that function returns, expr->plan
might now be a parallel plan.  If we reach the call to
exec_run_select() further down in that function, and if we happen to
pass false, it's not going to matter, because exec_run_select() is
going to find the plan already initialized.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Page Scan Mode in Hash Index

2017-03-23 Thread Ashutosh Sharma
On Thu, Mar 23, 2017 at 8:29 PM, Jesper Pedersen
 wrote:
> Hi,
>
> On 03/22/2017 09:32 AM, Ashutosh Sharma wrote:
>>
>> Done. Please refer to the attached v2 version of patch.
>>
>
> Thanks.
>
 1) 0001-Rewrite-hash-index-scans-to-work-a-page-at-a-time.patch: this
 patch rewrites the hash index scan module to work in page-at-a-time
 mode. It basically introduces two new functions -- _hash_readpage() and
 _hash_saveitem(). The former is used to load all the qualifying tuples
 from a target bucket or overflow page into an items array. The latter
 is used by _hash_readpage() to save all the qualifying tuples found
 in a page into that items array. Apart from that, this patch basically
 cleans up _hash_first(), _hash_next() and hashgettuple().

>
> 0001v2:
>
> In hashgettuple() you can remove the 'currItem' and 'offnum' from the 'else'
> part, and do the assignment inside
>
>   if (so->numKilled < MaxIndexTuplesPerPage)
>
> instead.
>

Done. Please have a look into the attached v3 patch.

>
> No new comments for 0002 and 0003.

okay. Thanks.

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com
From 4e953c35da2274165b00d763500b83e0f3f9e2a9 Mon Sep 17 00:00:00 2001
From: ashu 
Date: Thu, 23 Mar 2017 23:36:05 +0530
Subject: [PATCH] Rewrite hash index scans to work a page at a timev3

Patch by Ashutosh Sharma
---
 src/backend/access/hash/README   |   9 +-
 src/backend/access/hash/hash.c   | 121 +++--
 src/backend/access/hash/hashpage.c   |  14 +-
 src/backend/access/hash/hashsearch.c | 330 ++-
 src/backend/access/hash/hashutil.c   |  23 ++-
 src/include/access/hash.h|  44 +
 6 files changed, 385 insertions(+), 156 deletions(-)

diff --git a/src/backend/access/hash/README b/src/backend/access/hash/README
index 1541438..f0a7bdf 100644
--- a/src/backend/access/hash/README
+++ b/src/backend/access/hash/README
@@ -243,10 +243,11 @@ The reader algorithm is:
 -- then, per read request:
 	reacquire content lock on current page
 	step to next page if necessary (no chaining of content locks, but keep
- the pin on the primary bucket throughout the scan; we also maintain
- a pin on the page currently being scanned)
-	get tuple
-	release content lock
+ the pin on the primary bucket throughout the scan)
+save all the matching tuples from current index page into an items array
+release pin and content lock (but if it is primary bucket page retain
+it's pin till the end of scan)
+get tuple from an item array
 -- at scan shutdown:
 	release all pins still held
 
diff --git a/src/backend/access/hash/hash.c b/src/backend/access/hash/hash.c
index 34cc08f..8c28fbd 100644
--- a/src/backend/access/hash/hash.c
+++ b/src/backend/access/hash/hash.c
@@ -268,66 +268,23 @@ bool
 hashgettuple(IndexScanDesc scan, ScanDirection dir)
 {
 	HashScanOpaque so = (HashScanOpaque) scan->opaque;
-	Relation	rel = scan->indexRelation;
-	Buffer		buf;
-	Page		page;
 	OffsetNumber offnum;
-	ItemPointer current;
 	bool		res;
+	HashScanPosItem	*currItem;
 
 	/* Hash indexes are always lossy since we store only the hash code */
 	scan->xs_recheck = true;
 
 	/*
-	 * We hold pin but not lock on current buffer while outside the hash AM.
-	 * Reacquire the read lock here.
-	 */
-	if (BufferIsValid(so->hashso_curbuf))
-		LockBuffer(so->hashso_curbuf, BUFFER_LOCK_SHARE);
-
-	/*
 	 * If we've already initialized this scan, we can just advance it in the
 	 * appropriate direction.  If we haven't done so yet, we call a routine to
 	 * get the first item in the scan.
 	 */
-	current = &(so->hashso_curpos);
-	if (ItemPointerIsValid(current))
+	if (!HashScanPosIsValid(so->currPos))
+		res = _hash_first(scan, dir);
+	else
 	{
 		/*
-		 * An insertion into the current index page could have happened while
-		 * we didn't have read lock on it.  Re-find our position by looking
-		 * for the TID we previously returned.  (Because we hold a pin on the
-		 * primary bucket page, no deletions or splits could have occurred;
-		 * therefore we can expect that the TID still exists in the current
-		 * index page, at an offset >= where we were.)
-		 */
-		OffsetNumber maxoffnum;
-
-		buf = so->hashso_curbuf;
-		Assert(BufferIsValid(buf));
-		page = BufferGetPage(buf);
-
-		/*
-		 * We don't need test for old snapshot here as the current buffer is
-		 * pinned, so vacuum can't clean the page.
-		 */
-		maxoffnum = PageGetMaxOffsetNumber(page);
-		for (offnum = ItemPointerGetOffsetNumber(current);
-			 offnum <= maxoffnum;
-			 offnum = OffsetNumberNext(offnum))
-		{
-			IndexTuple	itup;
-
-			itup = (IndexTuple) PageGetItem(page, PageGetItemId(page, offnum));
-			if (ItemPointerEquals(&(so->hashso_heappos), &(itup->t_tid)))
-break;
-		}
-		if (offnum > maxoffnum)
-			elog(ERROR, "failed to re-find scan position within index \"%s\"",
- RelationGetRelationName(rel));
-		ItemPointerSetOffsetNumber(current, offnum);
-
-		/*
 		 * Check to see

Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-23 Thread Mithun Cy
Hi Pavan,
On Thu, Mar 23, 2017 at 12:19 AM, Pavan Deolasee
 wrote:
> Ok, no problem. I did some tests on AWS i2.xlarge instance (4 vCPU, 30GB
> RAM, attached SSD) and results are shown below. But I think it is important
> to get independent validation from your side too, just to ensure I am not
> making any mistake in measurement. I've attached naively put together
> scripts which I used to run the benchmark. If you find them useful, please
> adjust the paths and run on your machine.

I did a similar test. Your v19 looks fine to me; it does not
cause any regression. On the other hand, I also ran tests reducing the
table fillfactor to 80, and there I can see a small regression of 2-3% on
average when updating col2; when updating col9 I again do not see any
regression.

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


WARM_test_02.ods
Description: application/vnd.oasis.opendocument.spreadsheet



Re: [HACKERS] Page Scan Mode in Hash Index

2017-03-23 Thread Jesper Pedersen

Hi,

On 03/23/2017 02:11 PM, Ashutosh Sharma wrote:

On Thu, Mar 23, 2017 at 8:29 PM, Jesper Pedersen
 wrote:

0001v2:

In hashgettuple() you can remove the 'currItem' and 'offnum' from the 'else'
part, and do the assignment inside

  if (so->numKilled < MaxIndexTuplesPerPage)

instead.



Done. Please have a look into the attached v3 patch.



No new comments for 0002 and 0003.


okay. Thanks.



I'll keep the entry in 'Needs Review' if Alexander, or others, want to 
add their feedback.


(Best to post the entire patch series each time)

Best regards,
 Jesper





Re: [HACKERS] Hash support for grouping sets

2017-03-23 Thread Andrew Gierth
> "Mark" == Mark Dilger  writes:

 Mark> You define DiscreteKnapsack to take integer weights and double
 Mark> values, and perform the usual Dynamic Programming algorithm to
 Mark> solve.  But the only place you call this, you pass in NULL for
 Mark> the values, indicating that all the values are equal.  I'm
 Mark> confused why in this degenerate case you bother with the DP at
 Mark> all.  Can't you just load the knapsack from lightest to heaviest
 Mark> after sorting, reducing the runtime complexity to O(nlogn)?  It's
 Mark> been a long day.  Sorry if I am missing something obvious.

That's actually a very good question.

(Though in passing I should point out that the runtime cost is going to
be negligible in all practical cases. Even an 8-way cube (256 grouping
sets) has only 70 rollups, and a 4-way cube has only 6; the limit of
4096 grouping sets was only chosen because it was a nice round number
that was larger than comparable limits in other database products. Other
stuff in the grouping sets code has worse time bounds; the
bipartite-match code used to minimize the number of rollups is
potentially O(n^2.5) in the number of grouping sets.)

Thinking about it, it's not at all obvious what we should be optimizing
for here in the absence of accurate sort costs. Right now what matters
is saving as many sort steps as possible, since that's a saving likely
to be many orders of magnitude greater than the differences between two
sorts. The one heuristic that might be useful is to prefer the smaller
estimated size if other factors are equal, just on memory usage grounds,
but even that seems a bit tenuous.

I'm inclined to leave things as they are in the current patch, and maybe
revisit it during beta if we get any relevant feedback from people
actually using grouping sets?
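
For illustration only (this is not the patch's DiscreteKnapsack(), and the
names here are made up), the greedy loading Mark describes for the
all-values-equal case would look something like this:

    #include <stdlib.h>

    typedef struct KnapItem
    {
        int		weight;
        int		id;
    } KnapItem;

    static int
    knapitem_cmp(const void *a, const void *b)
    {
        return ((const KnapItem *) a)->weight - ((const KnapItem *) b)->weight;
    }

    /* Pack as many items as possible into 'capacity'; returns how many fit. */
    static int
    greedy_fill(KnapItem *items, int nitems, int capacity, int *chosen_ids)
    {
        int		nchosen = 0;
        int		i;

        qsort(items, nitems, sizeof(KnapItem), knapitem_cmp);
        for (i = 0; i < nitems; i++)
        {
            if (items[i].weight > capacity)
                break;			/* everything after this is at least as heavy */
            capacity -= items[i].weight;
            chosen_ids[nchosen++] = items[i].id;
        }
        return nchosen;
    }

When all values are equal, maximizing total value is the same as maximizing
the number of items packed, which is exactly what this lightest-first greedy
fill achieves in O(n log n).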

 Mark> I'm guessing you intend to extend the code at some later date to
 Mark> have different values for items.  Is that right?

Well, as I mentioned in a reply to Andres, we probably should eventually
optimize for greatest reduction in cost, and the code as it stands would
allow that to be added easily, but that would require having more
accurate cost info than we have now. cost_sort doesn't take into
consideration the number or types of sort columns or the costs of their
comparison functions, so all sorts of the same input data are costed
equally.

-- 
Andrew (irc:RhodiumToad)




Re: [HACKERS] createlang/droplang deprecated

2017-03-23 Thread Peter Eisentraut
On 3/23/17 06:41, Daniel Gustafsson wrote:
>> On 20 Mar 2017, at 01:37, Peter Eisentraut 
>>  wrote:
>>
>> On 3/18/17 09:00, Peter Eisentraut wrote:
>>> I just noticed that createlang and droplang have been listed as
>>> deprecated since PG 9.1.
>>>
>>> Do we dare remove them?
>>
>> Patch
> 
> LGTM, +1

Committed.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] [PATCH] Transaction traceability - txid_status(bigint)

2017-03-23 Thread Robert Haas
On Tue, Mar 21, 2017 at 11:35 PM, Craig Ringer  wrote:
> Changes made per discussion.

Committed 0001.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Page Scan Mode in Hash Index

2017-03-23 Thread Ashutosh Sharma
> Hi,
>
> On 03/23/2017 02:11 PM, Ashutosh Sharma wrote:
>>
>> On Thu, Mar 23, 2017 at 8:29 PM, Jesper Pedersen
>>  wrote:
>>>
>>> 0001v2:
>>>
>>> In hashgettuple() you can remove the 'currItem' and 'offnum' from the
>>> 'else'
>>> part, and do the assignment inside
>>>
>>>   if (so->numKilled < MaxIndexTuplesPerPage)
>>>
>>> instead.
>>>
>>
>> Done. Please have a look into the attached v3 patch.
>>
>>>
>>> No new comments for 0002 and 0003.
>>
>>
>> okay. Thanks.
>>
>
> I'll keep the entry in 'Needs Review' if Alexander, or others, want to add
> their feedback.

okay. Thanks.

>
> (Best to post the entire patch series each time)

I'll take your suggestion. Please find the attachments.

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com
From 4e953c35da2274165b00d763500b83e0f3f9e2a9 Mon Sep 17 00:00:00 2001
From: ashu 
Date: Thu, 23 Mar 2017 23:36:05 +0530
Subject: [PATCH] Rewrite hash index scans to work a page at a timev3

Patch by Ashutosh Sharma
---
 src/backend/access/hash/README   |   9 +-
 src/backend/access/hash/hash.c   | 121 +++--
 src/backend/access/hash/hashpage.c   |  14 +-
 src/backend/access/hash/hashsearch.c | 330 ++-
 src/backend/access/hash/hashutil.c   |  23 ++-
 src/include/access/hash.h|  44 +
 6 files changed, 385 insertions(+), 156 deletions(-)

diff --git a/src/backend/access/hash/README b/src/backend/access/hash/README
index 1541438..f0a7bdf 100644
--- a/src/backend/access/hash/README
+++ b/src/backend/access/hash/README
@@ -243,10 +243,11 @@ The reader algorithm is:
 -- then, per read request:
 	reacquire content lock on current page
 	step to next page if necessary (no chaining of content locks, but keep
- the pin on the primary bucket throughout the scan; we also maintain
- a pin on the page currently being scanned)
-	get tuple
-	release content lock
+ the pin on the primary bucket throughout the scan)
+save all the matching tuples from current index page into an items array
+release pin and content lock (but if it is primary bucket page retain
+it's pin till the end of scan)
+get tuple from an item array
 -- at scan shutdown:
 	release all pins still held
 
diff --git a/src/backend/access/hash/hash.c b/src/backend/access/hash/hash.c
index 34cc08f..8c28fbd 100644
--- a/src/backend/access/hash/hash.c
+++ b/src/backend/access/hash/hash.c
@@ -268,66 +268,23 @@ bool
 hashgettuple(IndexScanDesc scan, ScanDirection dir)
 {
 	HashScanOpaque so = (HashScanOpaque) scan->opaque;
-	Relation	rel = scan->indexRelation;
-	Buffer		buf;
-	Page		page;
 	OffsetNumber offnum;
-	ItemPointer current;
 	bool		res;
+	HashScanPosItem	*currItem;
 
 	/* Hash indexes are always lossy since we store only the hash code */
 	scan->xs_recheck = true;
 
 	/*
-	 * We hold pin but not lock on current buffer while outside the hash AM.
-	 * Reacquire the read lock here.
-	 */
-	if (BufferIsValid(so->hashso_curbuf))
-		LockBuffer(so->hashso_curbuf, BUFFER_LOCK_SHARE);
-
-	/*
 	 * If we've already initialized this scan, we can just advance it in the
 	 * appropriate direction.  If we haven't done so yet, we call a routine to
 	 * get the first item in the scan.
 	 */
-	current = &(so->hashso_curpos);
-	if (ItemPointerIsValid(current))
+	if (!HashScanPosIsValid(so->currPos))
+		res = _hash_first(scan, dir);
+	else
 	{
 		/*
-		 * An insertion into the current index page could have happened while
-		 * we didn't have read lock on it.  Re-find our position by looking
-		 * for the TID we previously returned.  (Because we hold a pin on the
-		 * primary bucket page, no deletions or splits could have occurred;
-		 * therefore we can expect that the TID still exists in the current
-		 * index page, at an offset >= where we were.)
-		 */
-		OffsetNumber maxoffnum;
-
-		buf = so->hashso_curbuf;
-		Assert(BufferIsValid(buf));
-		page = BufferGetPage(buf);
-
-		/*
-		 * We don't need test for old snapshot here as the current buffer is
-		 * pinned, so vacuum can't clean the page.
-		 */
-		maxoffnum = PageGetMaxOffsetNumber(page);
-		for (offnum = ItemPointerGetOffsetNumber(current);
-			 offnum <= maxoffnum;
-			 offnum = OffsetNumberNext(offnum))
-		{
-			IndexTuple	itup;
-
-			itup = (IndexTuple) PageGetItem(page, PageGetItemId(page, offnum));
-			if (ItemPointerEquals(&(so->hashso_heappos), &(itup->t_tid)))
-break;
-		}
-		if (offnum > maxoffnum)
-			elog(ERROR, "failed to re-find scan position within index \"%s\"",
- RelationGetRelationName(rel));
-		ItemPointerSetOffsetNumber(current, offnum);
-
-		/*
 		 * Check to see if we should kill the previously-fetched tuple.
 		 */
 		if (scan->kill_prior_tuple)
@@ -346,9 +303,11 @@ hashgettuple(IndexScanDesc scan, ScanDirection dir)
 
 			if (so->numKilled < MaxIndexTuplesPerPage)
 			{
-so->killedItems[so->numKilled].heapTid = so->hashso_heappos;
-so->killedItems[so->numKilled].indexOffset =
-			ItemPointerGetOffsetNumber(&(so->hashso_curpos));
+	

Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-23 Thread Pavan Deolasee
On Thu, Mar 23, 2017 at 11:44 PM, Mithun Cy 
wrote:

> Hi Pavan,
> On Thu, Mar 23, 2017 at 12:19 AM, Pavan Deolasee
>  wrote:
> > Ok, no problem. I did some tests on AWS i2.xlarge instance (4 vCPU, 30GB
> > RAM, attached SSD) and results are shown below. But I think it is
> important
> > to get independent validation from your side too, just to ensure I am not
> > making any mistake in measurement. I've attached naively put together
> > scripts which I used to run the benchmark. If you find them useful,
> please
> > adjust the paths and run on your machine.
>
> I did a similar test appears. Your v19 looks fine to me, it does not
> cause any regression, On the other hand, I also ran tests reducing
> table fillfactor to 80 there I can see a small regression 2-3% in
> average when updating col2 and on updating col9 again I do not see any
> regression.
>
>
Thanks Mithun for repeating the tests and confirming that the v19 patch
looks ok.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Hash support for grouping sets

2017-03-23 Thread Andres Freund
On 2017-03-23 03:43:57 +, Andrew Gierth wrote:
> > "Andres" == Andres Freund  writes:
> 
>  Andres> Changes to advance_aggregates() are, in my experience, quite
>  Andres> likely to have performance effects.  This needs some
>  Andres> performance tests.
>  [...]
>  Andres> Looks like it could all be noise, but it seems worthwhile to
>  Andres> look into some.
> 
> Trying to sort out signal from noise when dealing with performance
> impacts of no more than a few percent is _extremely hard_ these days.

Indeed. But that doesn't mean we needn't try.  With some determination
and profiling you can often sepearate signal from noise - I managed to
track down 0.12% regressions in the expression evaluation work...


> I will go ahead and do this, out of sheer curiosity if nothing else,
> but the preliminary results suggest there's probably nothing worth
> holding up the patch for.

Agreed. I'd want to run one more profile, checking whether the profiles
indicate new hotspots, but other than that...

Greetings,

Andres Freund




Re: [HACKERS] Hash support for grouping sets

2017-03-23 Thread Mark Dilger

> On Mar 23, 2017, at 11:22 AM, Andrew Gierth  
> wrote:
> 
>> "Mark" == Mark Dilger  writes:
> 
> Mark> You define DiscreteKnapsack to take integer weights and double
> Mark> values, and perform the usual Dynamic Programming algorithm to
> Mark> solve.  But the only place you call this, you pass in NULL for
> Mark> the values, indicating that all the values are equal.  I'm
> Mark> confused why in this degenerate case you bother with the DP at
> Mark> all.  Can't you just load the knapsack from lightest to heaviest
> Mark> after sorting, reducing the runtime complexity to O(nlogn)?  It's
> Mark> been a long day.  Sorry if I am missing something obvious.
> 
> That's actually a very good question.
> 
> (Though in passing I should point out that the runtime cost is going to
> be negligible in all practical cases. Even an 8-way cube (256 grouping
> sets) has only 70 rollups, and a 4-way cube has only 6; the limit of
> 4096 grouping sets was only chosen because it was a nice round number
> that was larger than comparable limits in other database products. Other
> stuff in the grouping sets code has worse time bounds; the
> bipartite-match code used to minimize the number of rollups is
> potentially O(n^2.5) in the number of grouping sets.)
> 
> Thinking about it, it's not at all obvious what we should be optimizing
> for here in the absence of accurate sort costs.

Is there a performance test case where this patch should shine
brightest?  I'd like to load a schema with lots of data, and run
a grouping sets query, both before and after applying the patch,
to see what the performance advantage is.

mark



Re: [HACKERS] Hash support for grouping sets

2017-03-23 Thread Andres Freund
On 2017-03-23 05:09:46 +, Andrew Gierth wrote:
> > "Andres" == Andres Freund  writes:
> 
>  >> - Assert(newphase == 0 || newphase == aggstate->current_phase + 1);
>  >> + Assert(newphase <= 1 || newphase == aggstate->current_phase + 1);
> 
>  Andres> I think this somewhere in the file header needs an expanded
>  Andres> explanation about what these "special" phases mean.
> 
> I could move most of the block comment for ExecInitAgg up into the file
> header; would that be better?

Yes, I think so.


>  >> + values = palloc((1 + max_weight) * sizeof(double));
>  >> + sets = palloc((1 + max_weight) * sizeof(Bitmapset *));
> 
>  Andres> We usually cast the result of palloc.
> 
> Rough count in the backend has ~400 without casts to ~1350 with, so this
> doesn't seem to have been consistently enforced.

Yea, but we're still trying.

>  >> + RelOptInfo *grouped_rel,
>  >> + Path *path,
>  >> + bool is_sorted,
>  >> + bool can_hash,
>  >> + PathTarget *target,
>  >> + grouping_sets_data *gd,
>  >> + const AggClauseCosts 
> *agg_costs,
>  >> + double dNumGroups)
>  >> +{
>  >> + Query  *parse = root->parse;
>  >> +
>  >> + /*
>  >> +  * If we're not being offered sorted input, then only consider plans 
> that
>  >> +  * can be done entirely by hashing.
>  >> +  *
>  >> +  * We can hash everything if it looks like it'll fit in work_mem. But if
>  >> +  * the input is actually sorted despite not being advertised as such, we
>  >> +  * prefer to make use of that in order to use less memory.
>  >> +  * If none of the grouping sets are sortable, then ignore the work_mem
>  >> +  * limit and generate a path anyway, since otherwise we'll just fail.
>  >> +  */
>  >> + if (!is_sorted)
>  >> + {
>  >> + List   *new_rollups = NIL;
>  >> + RollupData *unhashable_rollup = NULL;
>  >> + List   *sets_data;
>  >> + List   *empty_sets_data = NIL;
>  >> + List   *empty_sets = NIL;
>  >> + ListCell   *lc;
>  >> + ListCell   *l_start = list_head(gd->rollups);
>  >> + AggStrategy strat = AGG_HASHED;
>  >> + Sizehashsize;
>  >> + double  exclude_groups = 0.0;
>  >> +
>  >> + Assert(can_hash);
>  >> +
>  >> + if (pathkeys_contained_in(root->group_pathkeys, path->pathkeys))
>  >> + {
>  >> + unhashable_rollup = lfirst(l_start);
> 
>  Andres> a) cast result of lfirst/lnext/whatnot.
> 
> casting lfirst is defensible (and only about 10% of existing uses don't
> cast it), but why would you ever cast the result of lnext?

Obviously lnext is bogus, was thinking of linitial...

>  >> + if (can_hash && gd->any_hashable)
>  >> + {
>  >> + List   *rollups = NIL;
>  >> + List   *hash_sets = list_copy(gd->unsortable_sets);
>  >> + double  availspace = (work_mem * 1024.0);
> 
>  Andres> Why the parens, and why the .0?
> 
> The .0 is because the * should be done as a double, not an int.

I was just missing why - but now that I'm not running on 20 hours without
sleep, 8 of those crammed into a plane, it's obviously beneficial
because of overflow concerns.


>  Andres> I think you need a higher level explanation of how you're
>  Andres> mapping the work_mem issue to knapsack somewhere.
> 
> The general overview ("work_mem is the knapsack size, and we need to
> figure out how best to pack it with hashtables"), or the specific issue
> of the scale factor for discrete chunking of the size?

The general overview - the scaling detail doesn't seem that important for
understanding the approach.  Briefly explain what we're trying to minimize
(sort steps), what the constraint is (memory), and that the problem is
representable as the classic "knapsack problem".


>  Andres> Hm.  So we're solely optimizing for the number of sorts, rather
>  Andres> than the cost of the sorts.  Probably an acceptable tradeoff.
> 
> The existing cost model for sorts doesn't actually seem to help us here;
> all of our sorts sort the same data, just with different comparison
> columns, but cost_sort doesn't actually account for that (all its
> callers except the CLUSTER planning code pass in 0.0 for
> comparison_cost).
> 
> If the sort costs were correct, we could compute a "value" for each
> rollup that reflected the cost difference between the sort+unique
> comparisons for it, and the hash functions that would be called if we
> hashed instead; and just pass that to the knapsack function.

That'd indeed be desirable - but I think we can punt on that for now.

Greetings,

Andres Freund



Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-23 Thread Pavan Deolasee
On Thu, Mar 23, 2017 at 7:53 PM, Amit Kapila 
wrote:

> On Thu, Mar 23, 2017 at 3:44 PM, Pavan Deolasee
>
> >
> > Yes, this is a very fair point. The way I proposed to address this upthread
> > is by introducing a set of threshold/scale GUCs specific to WARM. So users
> > can control when to invoke WARM cleanup. Only if the WARM cleanup is
> > required, we do 2 index scans. Otherwise vacuum will work the way it works
> > today without any additional overhead.
> >
>
> I am not sure on what basis a user can set such parameters; it will be
> quite difficult to tune them.  I think the point is that whatever
> threshold we keep, once it is crossed, it will perform two scans for
> all the indexes.


Well, that applies even to the vacuum parameters, no? The general sense I've
gotten here is that we're OK with pushing some work into the background if it
helps the real-time queries, and I kind of agree with that. If WARM improves
things in a significant manner even with this additional maintenance work,
it's still worth doing.

Having said that, I see many ways we can improve on this later. For example,
we could track information elsewhere about tuples which may have received WARM
updates (I think it will need to be a per-index bitmap or something similar)
and use that to do WARM chain conversion in a single index pass. But this is
clearly not PG 10 material.


>   IIUC, this conversion of WARM chains is
> required so that future updates can be WARM or is there any other
> reason?  I see this as a big penalty for future updates.
>

It's also necessary for index-only scans. But I don't see this as a big
penalty for future updates, because if there are indeed significant WARM
updates, then not preparing for them will result in write amplification,
which is exactly what we are trying to solve here and where we seem to be
showing good gains.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] [PATCH] Move all am-related reloption code into src/backend/access/[am-name] and get rid of relopt_kind for custom AM

2017-03-23 Thread Alvaro Herrera
Copying Fabrízio Mello here, who spent some time trying to work on
reloptions too.  He may have something to say about the new
functionality that this patch provides.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Parallel Append implementation

2017-03-23 Thread Amit Khandekar
On 23 March 2017 at 16:26, Amit Khandekar  wrote:
> On 23 March 2017 at 05:55, Robert Haas  wrote:
>> On Wed, Mar 22, 2017 at 4:49 AM, Amit Khandekar  wrote:
>>> Attached is the updated patch that handles the changes for all the
>>> comments except the cost changes part. Details about the specific
>>> changes are after the cost-related points discussed below.
>>>
>>> For non-partial paths, I was checking following 3 options :
>>>
>>> Option 1. Just take the sum of total non-partial child costs and
>>> divide it by number of workers. It seems to be getting close to the
>>> actual cost.
>>
>> If the costs for all children are about equal, then that works fine.
>> But when they are very unequal, then it's highly misleading.
>>
>>> Option 2. Calculate exact cost by an algorithm which I mentioned
>>> before, which is pasted below for reference :
>>> Per-subpath cost : 20 16 10 8 3 1, with 3 workers.
>>> After 10 time units (this is minimum of first 3 i.e. 20, 16, 10), the
>>> times remaining are :
>>> 10  6  0 8 3 1
>>> After 6 units (minimum of 10, 06, 08), the times remaining are :
>>> 4  0  0 2 3 1
>>> After 2 units (minimum of 4, 2, 3), the times remaining are :
>>>  2  0  0 0 1 1
>>> After 1 units (minimum of 2, 1, 1), the times remaining are :
>>>  1  0  0 0 0 0
>>> After 1 units (minimum of 1, 0 , 0), the times remaining are :
>>>  0  0  0 0 0 0
>>> Now add up above time chunks : 10 + 6 + 2 + 1 + 1 = 20
>>
>
>> This gives the same answer as what I was proposing
>
> Ah I see.
>
>> but I believe it's more complicated to compute.
> Yes, a bit, particularly because in my algorithm I would have to do
> 'n' subtractions each time for 'n' workers. But it looked more natural
> because it follows exactly the way we would calculate it manually.
>
>> The way my proposal would work in this
>> case is that we would start with an array C[3] (since there are three
>> workers), with all entries 0.  Logically C[i] represents the amount of
>> work to be performed by worker i.  We add each path in turn to the
>> worker whose array entry is currently smallest; in the case of a tie,
>> just pick the first such entry.
>>
>> So in your example we do this:
>>
>> C[0] += 20;
>> C[1] += 16;
>> C[2] += 10;
>> /* C[2] is smaller than C[0] or C[1] at this point, so we add the next
>> path to C[2] */
>> C[2] += 8;
>> /* after the previous line, C[1] is now the smallest, so add to that
>> entry next */
>> C[1] += 3;
>> /* now we've got C[0] = 20, C[1] = 19, C[2] = 18, so add to C[2] */
>> C[2] += 1;
>> /* final result: C[0] = 20, C[1] = 19, C[2] = 19 */
>>
>> Now we just take the highest entry that appears in any array, which in
>> this case is C[0], as the total cost.
>
> Wow. The way your final result exactly tallies with my algorithm's
> result is very interesting. This looks like some maths or computer
> science theory that I am not aware of.
>
> I am currently coding the algorithm using your method.

While I was coding this, I was considering whether Path->rows should also
be calculated the same way as the total cost for non-partial subpaths and
the total cost for partial subpaths. I think for rows we can just take
total_rows divided by the number of workers for non-partial paths, and that
approximation should suffice; it seems excessive to treat rows with the
same algorithm we chose for the total cost of non-partial paths.

Meanwhile, attached is a WIP patch v10. The only change in this patch
w.r.t. the last patch (v9) is that this one has a new function,
append_nonpartial_cost(). I'm just sending this to show what the algorithm
looks like; the function isn't called yet.
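
For reference while reading that function, here is a standalone sketch of the
greedy scheme Robert described upthread: add each non-partial subpath to the
currently least-loaded worker and report the maximum per-worker total.  The
names are invented and this is not the patch's append_nonpartial_cost(); it
just reproduces the 20/16/10/8/3/1 example from this thread.

/*
 * Standalone illustration of the greedy scheme described above: assign each
 * non-partial subpath to the worker with the least accumulated cost, then
 * report the maximum per-worker total.  Not the patch's code.
 */
#include <stdio.h>

#define NWORKERS 3

static double
nonpartial_cost(const double *subpath_cost, int npaths)
{
	double	worker[NWORKERS] = {0.0};
	double	max_cost = 0.0;
	int		i, j;

	for (i = 0; i < npaths; i++)
	{
		int		min_idx = 0;

		/* find the currently least-loaded worker */
		for (j = 1; j < NWORKERS; j++)
			if (worker[j] < worker[min_idx])
				min_idx = j;

		worker[min_idx] += subpath_cost[i];
	}

	for (j = 0; j < NWORKERS; j++)
		if (worker[j] > max_cost)
			max_cost = worker[j];

	return max_cost;
}

int
main(void)
{
	double	costs[] = {20, 16, 10, 8, 3, 1};	/* example from the thread */

	printf("estimated non-partial cost: %.0f\n",
		   nonpartial_cost(costs, 6));			/* prints 20 */
	return 0;
}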


ParallelAppend_v10_WIP.patch
Description: Binary data



Re: [HACKERS] [PATCH] Move all am-related reloption code into src/backend/access/[am-name] and get rid of relopt_kind for custom AM

2017-03-23 Thread Fabrízio de Royes Mello
On Thu, Mar 23, 2017 at 3:58 PM, Alvaro Herrera 
wrote:
>
> Copying Fabrízio Mello here, who spent some time trying to work on
> reloptions too.  He may have something to say about the new
> functionality that this patch provides.
>

Thanks Álvaro, I'll look at the patch and try to help in some way.

Regards,

--
   Fabrízio de Royes Mello Timbira - http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento


[HACKERS] Privilege checks on array coercions

2017-03-23 Thread Tom Lane
There is a test in privileges.sql (currently lines 589-625 in
privileges.out) that seems to be dependent on the fact that the
ArrayCoerceExpr logic doesn't check for EXECUTE privilege on the
per-element type coercion function if it's dealing with a NULL input
array.

While fooling with Andres' faster-expressions patch, I moved the
pg_proc_aclcheck call for this into expression compilation, causing
that privileges.sql test to fail.

Since Andres' patch moves ACL checks for regular function calls into
expression compilation, I think it would be weird and inconsistent not
to do so for ArrayCoerceExpr as well.  Does anyone want to defend this
privileges test case as testing for some behavior that users expect?

regards, tom lane




Re: [HACKERS] ICU integration

2017-03-23 Thread Peter Eisentraut
On 3/23/17 05:34, Andreas Karlsson wrote:
> I am fine with this version of the patch. The issues I have with it, 
> which I mentioned earlier in this thread, seem to be issues with ICU 
> rather than with this patch. For example there seems to be no way for 
> ICU to validate the syntax of the BCP 47 locales (or ICU's old format). 
> But I think we will just have to accept the weirdness of how ICU handles 
> locales.
> 
> I think this patch is ready to be committed.
> 
> Found a typo in the documentation:
> 
> "The inspect the currently available locales" should be "To inspect the 
> currently available locales".

Committed.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Privilege checks on array coercions

2017-03-23 Thread Andres Freund
On 2017-03-23 15:26:51 -0400, Tom Lane wrote:
> There is a test in privileges.sql (currently lines 589-625 in
> privileges.out) that seems to be dependent on the fact that the
> ArrayCoerceExpr logic doesn't check for EXECUTE privilege on the
> per-element type coercion function if it's dealing with a NULL input
> array.
> 
> While fooling with Andres' faster-expressions patch, I moved the
> pg_proc_aclcheck call for this into expression compilation, causing
> that privileges.sql test to fail.
> 
> Since Andres' patch moves ACL checks for regular function calls into
> expression compilation, I think it would be weird and inconsistent not
> to do so for ArrayCoerceExpr as well.  Does anyone want to defend this
> privileges test case as testing for some behavior that users expect?

Not me - that seems quite sensible to change.

Andres




Re: [HACKERS] [POC] A better way to expand hash indexes.

2017-03-23 Thread Mithun Cy
Hi Amit, please find the new patch attached.

On Tue, Mar 21, 2017 at 8:16 AM, Amit Kapila  wrote:
> It is not only about the above calculation, but also about what the patch is
> doing in the function _hash_get_tbuckets().  By the way, the function name
> also seems unclear (mainly the *tbuckets* part of the name).

Fixed. I have introduced some macros for readability and added more
comments to explain why some calculations are made the way they are.
Please let me know if you think more improvements are needed.

>spelling.
>/forth/fourth
>/at time/at a time
Thanks, fixed.

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


expand_hashbucket_efficiently_04.patch
Description: Binary data



Re: [HACKERS] Hash support for grouping sets

2017-03-23 Thread Andrew Gierth
> "Mark" == Mark Dilger  writes:

 Mark> Is there a performance test case where this patch should shine
 Mark> brightest?  I'd like to load a schema with lots of data, and run
 Mark> a grouping sets query, both before and after applying the patch,
 Mark> to see what the performance advantage is.

The area which I think is most important for performance is the handling
of small cubes; without this patch, a 2d cube needs 2 full sorts, a 3d
one needs 3, and a 4d one needs 6. In many real-world data sets these
would all hash entirely in memory.

So here's a very simple example (deliberately using integers for
grouping to minimize the advantage; grouping by text columns in a non-C
locale would show a much greater speedup for the patch):

create table sales (
  id serial,
  product_id integer,
  store_id integer,
  customer_id integer,
  qty integer);

-- random integer function
create function d(integer) returns integer language sql
 as $f$ select floor(random()*$1)::integer + 1; $f$;

-- 10 million random rows
insert into sales (product_id,store_id,customer_id,qty)
  select d(20), d(6), d(10), d(100) from generate_series(1,10000000);

-- example 2d cube:
select product_id, store_id, count(*), sum(qty)
  from sales
 group by cube(product_id, store_id);

-- example 3d cube:
select product_id, store_id, customer_id, count(*), sum(qty)
  from sales
 group by cube(product_id, store_id, customer_id);

-- example 4d cube with a computed column:
select product_id, store_id, customer_id, (qty / 10), count(*), sum(qty)
  from sales
 group by cube(product_id, store_id, customer_id, (qty / 10));

On my machine, the 2d cube is about 3.6 seconds with the patch, and
about 8 seconds without it; the 4d is about 18 seconds with the patch
and about 32 seconds without it (all with work_mem=1GB, compiled with
-O2 and assertions off).

-- 
Andrew (irc:RhodiumToad)




Re: [HACKERS] ICU integration

2017-03-23 Thread Jeff Janes
On Thu, Mar 23, 2017 at 12:34 PM, Peter Eisentraut <
peter.eisentr...@2ndquadrant.com> wrote:

> On 3/23/17 05:34, Andreas Karlsson wrote:
> > I am fine with this version of the patch. The issues I have with it,
> > which I mentioned earlier in this thread, seem to be issues with ICU
> > rather than with this patch. For example there seems to be no way for
> > ICU to validate the syntax of the BCP 47 locales (or ICU's old format).
> > But I think we will just have to accept the weirdness of how ICU handles
> > locales.
> >
> > I think this patch is ready to be committed.
> >
> > Found a typo in the documentation:
> >
> > "The inspect the currently available locales" should be "To inspect the
> > currently available locales".
>
> Committed.
>

This has broken the C locale, and the build farm.

if (pg_database_encoding_max_length() > 1 || locale->provider == COLLPROVIDER_ICU)

segfaults because locale is null (locale_is_c is true).

Cheers,

Jeff


Re: [HACKERS] Potential data loss of 2PC files

2017-03-23 Thread Teodor Sigaev

Hmm, it doesn't work (but applies) on current HEAD:

% uname -a
FreeBSD *** 11.0-RELEASE-p8 FreeBSD 11.0-RELEASE-p8 #0 r315651: Tue Mar 21 
02:44:23 MSK 2017 teodor@***:/usr/obj/usr/src/sys/XOR  amd64

% pg_config --configure
'--enable-depend' '--enable-cassert' '--enable-debug' '--enable-tap-tests' 
'--with-perl' 'CC=clang' 'CFLAGS=-O0'

% rm -rf /spool/pg_data/*
% initdb -n -E UTF8 --locale ru_RU.UTF-8 --lc-messages C --lc-monetary C 
--lc-numeric C --lc-time C -D /spool/pg_data

Running in no-clean mode.  Mistakes will not be cleaned up.
The files belonging to this database system will be owned by user "teodor".
This user must also own the server process.

The database cluster will be initialized with locales
  COLLATE:  ru_RU.UTF-8
  CTYPE:ru_RU.UTF-8
  MESSAGES: C
  MONETARY: C
  NUMERIC:  C
  TIME: C
The default text search configuration will be set to "russian".

Data page checksums are disabled.

fixing permissions on existing directory /spool/pg_data ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting dynamic shared memory implementation ... posix
creating configuration files ... ok
running bootstrap script ... 2017-03-23 23:07:14.705 MSK [40286] FATAL:  could 
not open file "pg_clog": No such file or directory

child process exited with exit code 1
initdb: data directory "/spool/pg_data" not removed at user's request


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/




Re: [HACKERS] Other formats in pset like markdown, rst, mediawiki

2017-03-23 Thread Jan Michálek
2017-03-23 17:26 GMT+01:00 Pierre Ducroquet :

> The following review has been posted through the commitfest application:
> make installcheck-world:  tested, passed
> Implements feature:   tested, passed
> Spec compliant:   tested, passed
> Documentation:tested, passed
>
> Hi
>
> This is my first review (Magnus said in his presentation in PGDay Paris
> that volunteers should just come and help, so here I am), so please notify
> me for any mistake I do when using the review tools...
>
> The feature seems to work as expected, but I don't claim to be a markdown
> and rst expert.
> Some minor issues with the code itself :
> - some indentation issues (documentation and code itself with mix between
> space based and tab based indentation) and a few trailing spaces in code
> - typographic issues in the documentation :
>   - "The html, asciidoc, latex, latex-longtable, troff-ms, and markdown
> and rst formats" ==> duplicated and
>   - "Sets the output format to one of unaligned, aligned, wrapped, html,
> asciidoc, latex (uses tabular), latex-longtable, rst, markdown, or
> troff-ms." ==> extra comma at the end of the list
> - the comment " dont add line after last row, because line is added after
> every row" is misleading; it should warn that this applies only to rst
> - there is a block of commented out code left
> - in the print_aligned_vertical function, there is a mix between
> "cont->opt->format == PRINT_RST" and "format == &pg_rst" and I don't see
> any obvious reason for that
> - the documentation doesn't mention (but ok, it's kind of obvious) that
> the linestyle option will not work with rst and markdown
>
> Thanks !
>

Thanks!
I will work on it this weekend. I need to adapt it to the current master, and I
will fix the indentation issues along the way.
I also need to add \x to the markdown format and address some points about
titles from older posts in this thread.

Nice Day Je;



>
> The new status of this patch is: Waiting on Author
>



-- 
Jelen
Starší čeledín datovýho chlíva


Re: [HACKERS] Re: [BUGS] Problem in using pgbench's --connect(-C) and --rate=rate(-R rate) options together.

2017-03-23 Thread Teodor Sigaev

Hi, the patch looks good, except: why do you remove the initialization of
is_throttled?  I suppose it's just a typo?

--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -1967,7 +1967,6 @@ top:
st->listen = false;
st->sleeping = false;
st->throttling = false;
-   st->is_throttled = false;
memset(st->prepared, 0, sizeof(st->prepared));
}


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/




Re: [HACKERS] increasing the default WAL segment size

2017-03-23 Thread Peter Eisentraut
On 3/22/17 17:33, David Steele wrote:
> I think if we don't change the default size it's very unlikely I would 
> use alternate WAL segment sizes or recommend that anyone else does, at 
> least in v10.
> 
> I simply don't think it would get the level of testing required to be 
> production worthy

I think we could tweak the test harnesses to run all the tests with
different segment sizes.  That would get pretty good coverage.

More generally, the argument that we should not add an option unless we
also change the default (because the option would otherwise not get enough
testing) is a bit dubious.

> and I doubt that most tool writers would be quick to 
> add support for a feature that very few people (if any) use.

I'm not one of those tool writers, although I have written my share of
DBA scripts over the years, but I wonder why those tools would really
care.  They are handed files with predetermined names to archive, and
for restore, files with predetermined names are requested back from them.
What else do they need?  If something is missing that requires them to
parse file names, then maybe that should be added.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] ICU integration

2017-03-23 Thread Thomas Munro
On Fri, Mar 24, 2017 at 8:34 AM, Peter Eisentraut
 wrote:
> Committed.

On a macOS system using MacPorts, I have "icu" installed.  MacPorts is
a package manager that installs stuff under /opt/local.  I have
/opt/local/bin in $PATH so that pkg-config can be found.  I run
./configure with --with-icu but without mentioning any special paths
and it says:

checking whether to build with ICU support... yes
checking for pkg-config... /opt/local/bin/pkg-config
checking pkg-config is at least version 0.9.0... yes
checking for icu-uc icu-i18n... yes

... but then building fails, because there are no headers in the search path:

gcc -Wall -Wmissing-prototypes -Wpointer-arith
-Wdeclaration-after-statement -Wendif-labels
-Wmissing-format-attribute -Wformat-security -fno-strict-aliasing
-fwrapv -Wno-unused-command-line-argument -g -O2 -Wall -Werror
-I../../../src/include/snowball
-I../../../src/include/snowball/libstemmer -I../../../src/include   -c
-o dict_snowball.o dict_snowball.c -MMD -MP -MF .deps/dict_snowball.Po
...
../../../src/include/utils/pg_locale.h:19:10: fatal error:
'unicode/ucol.h' file not found

But pkg-config does know where to find those headers:

$ pkg-config --cflags icu-i18n
-I/opt/local/include

... and it's not wrong:

$ ls /opt/local/include/unicode/ucol.h
/opt/local/include/unicode/ucol.h

So I think there may be a bug in the configure script: I think it
should be putting pkg-config's --cflags output into one of the
CFLAGS-like variables.

Or am I misunderstanding what pkg-config is supposed to be doing for us here?

-- 
Thomas Munro
http://www.enterprisedb.com




Re: [HACKERS] ICU integration

2017-03-23 Thread Peter Eisentraut
On 3/23/17 16:45, Thomas Munro wrote:
> On Fri, Mar 24, 2017 at 8:34 AM, Peter Eisentraut
>  wrote:
>> Committed.
> 
> On a macOS system using MacPorts, I have "icu" installed.  MacPorts is
> a package manager that installs stuff under /opt/local.  I have
> /opt/local/bin in $PATH so that pkg-config can be found.  I run
> ./configure with --with-icu but without mentioning any special paths
> and it says:
> 
> checking whether to build with ICU support... yes
> checking for pkg-config... /opt/local/bin/pkg-config
> checking pkg-config is at least version 0.9.0... yes
> checking for icu-uc icu-i18n... yes
> 
> ... but then building fails, because there are no headers in the search path:

Fixed.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] increasing the default WAL segment size

2017-03-23 Thread Stephen Frost
Peter,

* Peter Eisentraut (peter.eisentr...@2ndquadrant.com) wrote:
> On 3/22/17 17:33, David Steele wrote:
> > and I doubt that most tool writers would be quick to 
> > add support for a feature that very few people (if any) use.
> 
> I'm not one of those tool writers, although I have written my share of
> DBA scripts over the years, but I wonder why those tools would really
> care.  They are handed files with predetermined names to archive, and
> for restore files with predetermined names are requested back from them.
>  What else do they need?  If something is missing that requires them to
> parse file names, then maybe that should be added.

PG backup technology has come a fair ways from that simple
characterization of it. :)

The backup tools need to also get the LSN from pg_stop_backup and
verify that they have the WAL file associated with that LSN.  They also
need to make sure that they have all of the WAL files between the
starting LSN and the ending LSN.  Doing that requires understanding how
the files are named, to make sure none are missing.
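
For what it's worth, the dependence on segment size shows up directly in the
name arithmetic.  The standalone sketch below derives a segment file name from
an LSN using the usual 24-hex-digit layout (timeline, then the segment number
split on 4GB boundaries); it is illustrative only and not taken from any of
the tools being discussed.

/*
 * Standalone sketch of WAL segment file name arithmetic for a given LSN
 * and segment size.  Illustrative only.
 */
#include <stdio.h>
#include <stdint.h>

static void
wal_file_name(char *buf, size_t len, unsigned tli, uint64_t lsn, uint64_t seg_size)
{
	uint64_t	segno = lsn / seg_size;
	uint64_t	segs_per_xlogid = UINT64_C(0x100000000) / seg_size;

	snprintf(buf, len, "%08X%08X%08X",
			 tli,
			 (unsigned) (segno / segs_per_xlogid),
			 (unsigned) (segno % segs_per_xlogid));
}

int
main(void)
{
	char		name[64];
	uint64_t	lsn = UINT64_C(0x170000000);	/* example stop LSN */

	wal_file_name(name, sizeof(name), 1, lsn, 16 * 1024 * 1024);
	printf("16MB segments: %s\n", name);	/* 000000010000000100000070 */

	wal_file_name(name, sizeof(name), 1, lsn, 64 * 1024 * 1024);
	printf("64MB segments: %s\n", name);	/* 00000001000000010000001C */

	return 0;
}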

David will probably point out other reasons that the backup tools need
to understand the file naming, but those are ones I know of off-hand.

Thanks!

Stephen




Re: [HACKERS] Measuring replay lag

2017-03-23 Thread Thomas Munro
On Thu, Mar 23, 2017 at 10:50 PM, Simon Riggs  wrote:
>> Second thoughts... I'll just make LagTrackerWrite externally
>> available, so a plugin can send anything it wants to the tracker.
>> Which means I'm explicitly removing the "logical replication support"
>> from this patch.
>
> Done.
>
> Here's the patch I'm looking to commit, with some docs and minor code
> changes as discussed.

Thank you for committing this.  Time-based replication lag tracking
seems to be a regular topic on mailing lists and IRC, so I hope that
this will provide what people are looking for and not simply replace
that discussion with a new discussion about what lag really means :-)

Many thanks to Simon and Fujii-san for convincing me to move the
buffer to the sender (which now seems so obviously better), to
Fujii-san for the idea of tracking write and flush lag too, and to
Abhijit, Sawada-san, Ian, Craig and Robert for valuable feedback.

-- 
Thomas Munro
http://www.enterprisedb.com




Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-23 Thread Tom Lane
Stylistic thought ... I am wondering if it wouldn't be a good idea
to replace EEOP_CASE_WHEN_STEP, EEOP_CASE_THEN_STEP, EEOP_COALESCE,
and EEOP_ARRAYREF_CHECKINPUT with instructions defined in a less
usage-dependent way as

EEOP_JUMP   unconditional jump
EEOP_JUMP_IF_NULL   jump if step result is null
EEOP_JUMP_IF_NOT_NULL   jump if step result isn't null
EEOP_JUMP_IF_NOT_TRUE   jump if step result isn't TRUE

One could imagine later filling out this set with the other BoolTest
condition types, but that seems to be all we need right now.

These are basically just renamings of the step types that exist now,
although EEOP_ARRAYREF_CHECKINPUT would have to drop its not-very-
necessary Assert(!op->d.arrayref.state->isassignment).  Well, I guess
I should say that they're renamings of the semantics that I have
for these steps in my working copy; for instance, I got rid of
casewhen.value/casewhen.isnull in favor of letting CASE WHEN expressions
evaluate into the CASE's final output variable.

At least to me, I think the compiling code would be more readable
this way.  I find WHEN_STEP and THEN_STEP a bit odd because they are
emitted after, not before, the expressions you'd think they control.
ARRAYREF_CHECKINPUT is pretty vaguely named, too.
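
To make that concrete, here is a deliberately miniature, self-contained
interpreter showing how a CASE WHEN could compile down to generic jump steps
of that shape.  The opcode names echo the proposal, but the structures and the
little program are invented for illustration and are not the executor code.

/*
 * Miniature, self-contained illustration of jump-style steps for a
 * CASE WHEN a > 10 THEN 100 ELSE 200 END expression.  The opcode names echo
 * the proposal; the structures and program are invented for this example.
 */
#include <stdio.h>
#include <stdbool.h>

typedef enum
{
	OP_EVAL_GT10,				/* result = (a > 10) */
	OP_CONST,					/* result = step's constant */
	OP_JUMP,					/* unconditional jump */
	OP_JUMP_IF_NOT_TRUE,		/* jump if result is null or false */
	OP_DONE
} OpCode;

typedef struct
{
	OpCode	op;
	int		target;				/* jump destination, when relevant */
	int		constval;			/* constant, when relevant */
} Step;

static int
eval(const Step *prog, int a)
{
	int		result = 0;
	bool	resnull = false;
	int		pc = 0;

	for (;;)
	{
		const Step *s = &prog[pc];

		switch (s->op)
		{
			case OP_EVAL_GT10:
				result = (a > 10);
				resnull = false;
				pc++;
				break;
			case OP_CONST:
				result = s->constval;
				resnull = false;
				pc++;
				break;
			case OP_JUMP:
				pc = s->target;
				break;
			case OP_JUMP_IF_NOT_TRUE:
				pc = (resnull || !result) ? s->target : pc + 1;
				break;
			case OP_DONE:
				return result;
		}
	}
}

int
main(void)
{
	/* CASE WHEN a > 10 THEN 100 ELSE 200 END */
	Step	prog[] = {
		{OP_EVAL_GT10, 0, 0},
		{OP_JUMP_IF_NOT_TRUE, 4, 0},	/* WHEN failed: jump to ELSE */
		{OP_CONST, 0, 100},
		{OP_JUMP, 5, 0},				/* skip over the ELSE arm */
		{OP_CONST, 0, 200},
		{OP_DONE, 0, 0}
	};

	printf("a=15 -> %d, a=5 -> %d\n", eval(prog, 15), eval(prog, 5));
	return 0;
}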

regards, tom lane




Re: [HACKERS] Potential data loss of 2PC files

2017-03-23 Thread Michael Paquier
On Fri, Mar 24, 2017 at 5:08 AM, Teodor Sigaev  wrote:
> Hmm, it doesn't work (but appplies) on current HEAD:
> [...]
> Data page checksums are disabled.
>
> fixing permissions on existing directory /spool/pg_data ... ok
> creating subdirectories ... ok
> selecting default max_connections ... 100
> selecting default shared_buffers ... 128MB
> selecting dynamic shared memory implementation ... posix
> creating configuration files ... ok
> running bootstrap script ... 2017-03-23 23:07:14.705 MSK [40286] FATAL:
> could not open file "pg_clog": No such file or directory
> child process exited with exit code 1
> initdb: data directory "/spool/pg_data" not removed at user's request

And the renaming of pg_clog to pg_xact is also my fault. Attached is
an updated patch.
-- 
Michael
diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index 2d33510930..7a007a6ba5 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -577,6 +577,13 @@ ShutdownCLOG(void)
 	/* Flush dirty CLOG pages to disk */
 	TRACE_POSTGRESQL_CLOG_CHECKPOINT_START(false);
 	SimpleLruFlush(ClogCtl, false);
+
+	/*
+	 * fsync pg_xact to ensure that any files flushed previously are durably
+	 * on disk.
+	 */
+	fsync_fname("pg_xact", true);
+
 	TRACE_POSTGRESQL_CLOG_CHECKPOINT_DONE(false);
 }
 
@@ -589,6 +596,13 @@ CheckPointCLOG(void)
 	/* Flush dirty CLOG pages to disk */
 	TRACE_POSTGRESQL_CLOG_CHECKPOINT_START(true);
 	SimpleLruFlush(ClogCtl, true);
+
+	/*
+	 * fsync pg_xact to ensure that any files flushed previously are durably
+	 * on disk.
+	 */
+	fsync_fname("pg_xact", true);
+
 	TRACE_POSTGRESQL_CLOG_CHECKPOINT_DONE(true);
 }
 
diff --git a/src/backend/access/transam/commit_ts.c b/src/backend/access/transam/commit_ts.c
index 8e1df6e0ea..03ffa20908 100644
--- a/src/backend/access/transam/commit_ts.c
+++ b/src/backend/access/transam/commit_ts.c
@@ -746,6 +746,12 @@ ShutdownCommitTs(void)
 {
 	/* Flush dirty CommitTs pages to disk */
 	SimpleLruFlush(CommitTsCtl, false);
+
+	/*
+	 * fsync pg_commit_ts to ensure that any files flushed previously are durably
+	 * on disk.
+	 */
+	fsync_fname("pg_commit_ts", true);
 }
 
 /*
@@ -756,6 +762,12 @@ CheckPointCommitTs(void)
 {
 	/* Flush dirty CommitTs pages to disk */
 	SimpleLruFlush(CommitTsCtl, true);
+
+	/*
+	 * fsync pg_commit_ts to ensure that any files flushed previously are durably
+	 * on disk.
+	 */
+	fsync_fname("pg_commit_ts", true);
 }
 
 /*
diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index 4b4999fd7b..831693 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -1650,6 +1650,14 @@ CheckPointTwoPhase(XLogRecPtr redo_horizon)
 	}
 	LWLockRelease(TwoPhaseStateLock);
 
+	/*
+	 * Flush unconditionally the parent directory to make any information
+	 * durable on disk.  Two-phase files could have been removed and those
+	 * removals need to be made persistent as well as any files newly created
+	 * previously since the last checkpoint.
+	 */
+	fsync_fname(TWOPHASE_DIR, true);
+
 	TRACE_POSTGRESQL_TWOPHASE_CHECKPOINT_DONE();
 
 	if (log_checkpoints && serialized_xacts > 0)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index b99ded5df6..11b1929c94 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -3469,7 +3469,7 @@ InstallXLogFileSegment(XLogSegNo *segno, char *tmppath,
 	if (!find_free)
 	{
 		/* Force installation: get rid of any pre-existing segment file */
-		unlink(path);
+		durable_unlink(path, DEBUG1);
 	}
 	else
 	{
@@ -4020,16 +4020,13 @@ RemoveXlogFile(const char *segname, XLogRecPtr PriorRedoPtr, XLogRecPtr endptr)
 	  path)));
 			return;
 		}
-		rc = unlink(newpath);
+		rc = durable_unlink(newpath, LOG);
 #else
-		rc = unlink(path);
+		rc = durable_unlink(path, LOG);
 #endif
 		if (rc != 0)
 		{
-			ereport(LOG,
-	(errcode_for_file_access(),
-			   errmsg("could not remove old transaction log file \"%s\": %m",
-	  path)));
+			/* Message already logged by durable_unlink() */
 			return;
 		}
 		CheckpointStats.ckpt_segs_removed++;
@@ -10752,17 +10749,13 @@ do_pg_stop_backup(char *labelfile, bool waitforarchive, TimeLineID *stoptli_p)
 		(errcode_for_file_access(),
 		 errmsg("could not read file \"%s\": %m",
 BACKUP_LABEL_FILE)));
-			if (unlink(BACKUP_LABEL_FILE) != 0)
-ereport(ERROR,
-		(errcode_for_file_access(),
-		 errmsg("could not remove file \"%s\": %m",
-BACKUP_LABEL_FILE)));
+			durable_unlink(BACKUP_LABEL_FILE, ERROR);
 
 			/*
 			 * Remove tablespace_map file if present, it is created only if there
 			 * are tablespaces.
 			 */
-			unlink(TABLESPACE_MAP);
+			durable_unlink(TABLESPACE_MAP, DEBUG1);
 		}
 		PG_END_ENSURE_ERROR_CLEANUP(pg_stop_backup_callback, (Datum) BoolGetDatum(exclusive));
 	}
diff --git a/src/backend/storage/file/fd.c b/src/backend/storage/file/fd.c
index f0ed2e9b5f..b1
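
As background for the hunks above: an unlink or create only becomes durable
once the containing directory is itself fsync'd, which is what the added
fsync_fname() calls provide.  A standalone POSIX illustration of that pattern
(paths invented, not PostgreSQL code):

/*
 * Standalone POSIX sketch: fsync the parent directory after creating or
 * unlinking a file so that the directory change itself survives a crash.
 * Paths are invented for the example; error handling is minimal.
 */
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

static int
fsync_dir(const char *path)
{
	int		fd = open(path, O_RDONLY);
	int		rc;

	if (fd < 0)
		return -1;
	rc = fsync(fd);
	close(fd);
	return rc;
}

int
main(void)
{
	const char *dir = "twophase_demo";
	const char *file = "twophase_demo/00000A01";
	int			fd;

	mkdir(dir, 0700);						/* scratch directory */

	fd = open(file, O_CREAT | O_WRONLY, 0600);
	if (fd >= 0)
	{
		fsync(fd);							/* make the file contents durable */
		close(fd);
	}
	fsync_dir(dir);							/* ... and its creation durable */

	unlink(file);
	if (fsync_dir(dir) != 0)				/* ... and its removal durable too */
		perror("fsync directory");

	return 0;
}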

Re: [HACKERS] pg_dump, pg_dumpall and data durability

2017-03-23 Thread Michael Paquier
On Wed, Mar 22, 2017 at 7:00 AM, Michael Paquier
 wrote:
> On Wed, Mar 22, 2017 at 6:24 AM, Andrew Dunstan
>  wrote:
>> This is really a pretty small patch all things considered, and pretty
>> low-risk (although I haven;t been threough the code in fine detail yet).
>> In the end I'm persuaded by Andres' point that there's actually no
>> practical alternative way to make sure the data is actually synced to disk.
>>
>> If nobody else wants to pick it up I will, unless there is a strong
>> objection.
>
> Thanks!

Thanks Andrew, I can see that this has been committed as 96a7128b.

I also saw that:
https://www.postgresql.org/message-id/75e1b6ff-c3d5-9a26-e38b-3cb22a099...@2ndquadrant.com
I'll send a patch in a bit for the regression tests.
-- 
Michael




Re: [HACKERS] [PATCH] few fts functions for jsonb

2017-03-23 Thread Andrew Dunstan


On 03/21/2017 06:28 PM, Dmitry Dolgov wrote:
> > On 21 March 2017 at 03:03, Andrew Dunstan
>  > wrote:
> >
> > However, I think it should probably be broken up into a couple of
> pieces -
> > one for the generic json/jsonb transforms infrastructure (which probably
> > needs some more comments) and one for the FTS functions that will
> use it.
>
> Sure, here are two patches with separated functionality and a bit more
> commentaries for the transform functions.

I'm not through looking at this yet. However, here are a few preliminary
comments:

  * we might need to rationalize the header locations a bit
  * iterate_json(b) and transform_json(b) are a bit too generally named.
Really what they do is iterate over or transform string values in
the json(b). They ignore / preserve the structure, keys, and
non-string scalar values in the json(b). A general iterate or
transform function would be called in effect with a stream of all
the elements in the json, not just scalar strings.
  * Unless I'm missing something, the iterate_json(b)_values return value
    is ignored. Instead of returning the state, it looks to me like it
    should return nothing and be declared as void instead of void *
    (a toy sketch of that shape follows after this list)
  * transform_jsonb and transform_json are somewhat asymmetrical. The
latter should probably return a text* instead of a StringInfo, to be
consistent with the former.
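
A toy sketch of the signature shape being suggested, with an opaque state
pointer and a void return; the types and names here are invented stand-ins,
not the patch's declarations:

/*
 * Toy illustration of the suggested shape: the iterator walks only the
 * scalar string values, hands each one to an action callback along with
 * opaque caller state, and returns nothing itself.  Invented types; not
 * the patch's code.
 */
#include <stdio.h>
#include <string.h>

typedef void (*StringValueAction) (void *state, const char *elem_value, int elem_len);

/* Stand-in for "iterate over the scalar string values of a json(b) datum". */
static void
iterate_string_values(const char **values, int nvalues,
					  void *state, StringValueAction action)
{
	int		i;

	for (i = 0; i < nvalues; i++)
		action(state, values[i], (int) strlen(values[i]));
}

static void
count_chars(void *state, const char *elem_value, int elem_len)
{
	(void) elem_value;			/* only the length matters here */
	*(int *) state += elem_len;
}

int
main(void)
{
	const char *vals[] = {"quick", "brown", "fox"};
	int			total = 0;

	iterate_string_values(vals, 3, &total, count_chars);
	printf("total string length: %d\n", total);		/* prints 13 */
	return 0;
}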

cheers

andrew

-- 
Andrew Dunstanhttps://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: [HACKERS] Logical replication existing data copy

2017-03-23 Thread Michael Banck
Hi,

On Thu, Mar 23, 2017 at 09:00:16AM -0400, Peter Eisentraut wrote:
> On 3/21/17 21:38, Peter Eisentraut wrote:
> > This patch is looking pretty good to me, modulo the failing pg_dump tests.
> > 
> > Attached is a fixup patch.  I have mainly updated some comments and
> > variable naming for (my) clarity.  No functional changes.
> 
> Committed all that.

Maybe I'm doing something wrong, but I'm getting a segfault trying to
start logical replication since that patch went in.

I've blown away my build tree and started over with this minimal
example; are there any obvious mistakes here?  Quoting the publisher/subscriber
names doesn't seem to change things:

$ initdb --pgdata=data_pg1
$ sed -i -e 's/^#wal_level.=.replica/wal_level = logical/' data_pg1/postgresql.conf
$ pg_ctl --pgdata=data_pg1 -l pg1.log start
$ pg_basebackup --pgdata=data_pg2
$ sed -i -e 's/^#port.=.5432/port = 5433/' data_pg2/postgresql.conf
$ pg_ctl --pgdata=data_pg2 -l pg2.log start
$ psql -c "CREATE PUBLICATION pub1;"
$ psql --port=5433 -c "CREATE SUBSCRIPTION sub1 CONNECTION 'dbname=postgres' PUBLICATION pub1;"

Backtrace:

Program received signal SIGSEGV, Segmentation fault.
ExecSetSlotDescriptor (slot=slot@entry=0xfc4d38, tupdesc=tupdesc@entry=0x0)
at 
/home/mba/Projekte/OSS/PostgreSQL/git/postgresql/build/../src/backend/executor/execTuples.c:269
269 PinTupleDesc(tupdesc);
(gdb) bt
#0  ExecSetSlotDescriptor (slot=slot@entry=0xfc4d38, tupdesc=tupdesc@entry=0x0)
at 
/home/mba/Projekte/OSS/PostgreSQL/git/postgresql/build/../src/backend/executor/execTuples.c:269
#1  0x005dc4fc in MakeSingleTupleTableSlot (tupdesc=0x0)
at 
/home/mba/Projekte/OSS/PostgreSQL/git/postgresql/build/../src/backend/executor/execTuples.c:203
#2  0x005a16ff in fetch_table_list (wrconn=wrconn@entry=0xc0030001,
publications=publications@entry=0xffb448)
at 
/home/mba/Projekte/OSS/PostgreSQL/git/postgresql/build/../src/backend/commands/subscriptioncmds.c:996
#3  0x005a1efa in CreateSubscription (stmt=0x101cd40, 
isTopLevel=)
at 
/home/mba/Projekte/OSS/PostgreSQL/git/postgresql/build/../src/backend/commands/subscriptioncmds.c:396
#4  0x006ecf96 in ProcessUtilitySlow (pstate=0x0, 
pstate@entry=0xfc4818, pstmt=0x0,
pstmt@entry=0x101d080, queryString=0xf9d248 ,
queryString@entry=0x101c0b8 "CREATE SUBSCRIPTION \"sub1\" CONNECTION 
'dbname=postgres' PUBLICATION pub1 WITH (COPY DATA);", context=(unknown: 
16534992), context@entry=PROCESS_UTILITY_TOPLEVEL,
params=0x7ffd7e6263d0, params@entry=0x0,
completionTag=0x45 ,
completionTag@entry=0x7ffd7e626b00 "", dest=)
at 
/home/mba/Projekte/OSS/PostgreSQL/git/postgresql/build/../src/backend/tcop/utility.c:1612
#5  0x006ec4e9 in standard_ProcessUtility (pstmt=0x101d080,
queryString=0x101c0b8 "CREATE SUBSCRIPTION \"sub1\" CONNECTION 
'dbname=postgres' PUBLICATION pub1 WITH (COPY DATA);", 
context=PROCESS_UTILITY_TOPLEVEL, params=0x0, dest=0x101d160,
completionTag=0x7ffd7e626b00 "")
at 
/home/mba/Projekte/OSS/PostgreSQL/git/postgresql/build/../src/backend/tcop/utility.c:922
#6  0x006e9f5f in PortalRunUtility (portal=0xfbb9d8, pstmt=0x101d080,
isTopLevel=, setHoldSnapshot=, 
dest=,
completionTag=0x7ffd7e626b00 "")
at 
/home/mba/Projekte/OSS/PostgreSQL/git/postgresql/build/../src/backend/tcop/pquery.c:1165
#7  0x006ea977 in PortalRunMulti (portal=portal@entry=0xfbb9d8,
isTopLevel=isTopLevel@entry=1 '\001', 
setHoldSnapshot=setHoldSnapshot@entry=0 '\000',
dest=dest@entry=0x101d160, altdest=altdest@entry=0x101d160,
completionTag=completionTag@entry=0x7ffd7e626b00 "")
at 
/home/mba/Projekte/OSS/PostgreSQL/git/postgresql/build/../src/backend/tcop/pquery.c:1315
#8  0x006eb4cb in PortalRun (portal=portal@entry=0xfbb9d8,
count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=1 '\001',
dest=dest@entry=0x101d160, altdest=altdest@entry=0x101d160,
completionTag=completionTag@entry=0x7ffd7e626b00 "")
at 
/home/mba/Projekte/OSS/PostgreSQL/git/postgresql/build/../src/backend/tcop/pquery.c:788
#9  0x006e7868 in exec_simple_query (
query_string=0x101c0b8 "CREATE SUBSCRIPTION \"sub1\" CONNECTION 
'dbname=postgres' PUBLICATION pub1 WITH (COPY DATA);")
at 
/home/mba/Projekte/OSS/PostgreSQL/git/postgresql/build/../src/backend/tcop/postgres.c:1101
#10 0x006e94a9 in PostgresMain (argc=1, argv=0x101c0b8, dbname=0xfc8478 
"postgres",
username=0xfc8458 "postgres")
at 
/home/mba/Projekte/OSS/PostgreSQL/git/postgresql/build/../src/backend/tcop/postgres.c:4069
#11 0x004770ad in BackendRun (port=0xfc2970)
at 
/home/mba/Projekte/OSS/PostgreSQL/git/postgresql/build/../src/backend/postmaster/postmaster.c:4317
#12 BackendStartup (port=0xfc2970)
at 
/home/mba/Projekte/OSS/PostgreSQL/git/postgresql/build/../src/backend/postmaster/postmaster.c:3989
#13 ServerLoop ()
at 
/home/mba/Projekte/OSS/PostgreSQL/git/postgresql/build/../src/backe
