Re: [HACKERS] Parallel bitmap heap scan

2017-03-27 Thread Rafia Sabih
On Wed, Mar 8, 2017 at 11:52 PM, Robert Haas  wrote:
> On Wed, Mar 8, 2017 at 1:18 PM, Tom Lane  wrote:
>> Robert Haas  writes:
>>> What I'm using is:
>>
>>> Configured with:
>>> --prefix=/Applications/Xcode.app/Contents/Developer/usr
>>> --with-gxx-include-dir=/usr/include/c++/4.2.1
>>> Apple LLVM version 7.0.2 (clang-700.1.81)
>>> Target: x86_64-apple-darwin14.5.0
>>> Thread model: posix
>>
>> Hm.  I noticed that longfin didn't spit up on it either, despite having
>> -Werror turned on.  That's a slightly newer version, but still Apple's
>> clang:
>>
>> $ gcc -v
>> Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr 
>> --with-gxx-include-dir=/usr/include/c++/4.2.1
>> Apple LLVM version 8.0.0 (clang-800.0.42.1)
>> Target: x86_64-apple-darwin16.4.0
>> Thread model: posix
>> InstalledDir: 
>> /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
>
> Yeah.  I think on my previous MacBook Pro, you could do this without
> generating a warning:
>
> int x;
> printf("%d\n", x);
>
> The compiler on this one detects that case, but that seems to be about
> as far as it goes.

Recently, while testing TPC-H at 300 scale factor I stumbled onto an error
for Q6. The test environment is as follows:
work_mem = 1GB
shared_buffers = 10GB
effective_cache_size = 10GB
random_page_cost = seq_page_cost = 0.1

The error message is: ERROR:  invalid DSA memory alloc request size 1610612740
In case it is required, the stack trace is as follows:

#0  errfinish (dummy=0) at elog.c:415
#1  0x10738820 in elog_finish (elevel=20, fmt=0x109115d0
"invalid DSA memory alloc request size %zu") at elog.c:1378
#2  0x10778824 in dsa_allocate_extended (area=0x1001b53b138,
size=1610612740, flags=4) at dsa.c:676
#3  0x103a3e60 in pagetable_allocate (pagetable=0x1001b52a590,
size=1610612736) at tidbitmap.c:1521
#4  0x103a0488 in pagetable_grow (tb=0x1001b52a590,
newsize=33554432) at ../../../src/include/lib/simplehash.h:379
#5  0x103a07b0 in pagetable_insert (tb=0x1001b52a590,
key=34962730, found=0x3fffc705ba48 "") at
../../../src/include/lib/simplehash.h:504
#6  0x103a3354 in tbm_get_pageentry (tbm=0x1001b526fa0,
pageno=34962730) at tidbitmap.c:1235
#7  0x103a1614 in tbm_add_tuples (tbm=0x1001b526fa0,
tids=0x1001b51d22a, ntids=1, recheck=0 '\000') at tidbitmap.c:425
#8  0x100fdf24 in btgetbitmap (scan=0x1001b51c4a0,
tbm=0x1001b526fa0) at nbtree.c:460
#9  0x100f2c48 in index_getbitmap (scan=0x1001b51c4a0,
bitmap=0x1001b526fa0) at indexam.c:726
#10 0x1033c7d8 in MultiExecBitmapIndexScan
(node=0x1001b51c180) at nodeBitmapIndexscan.c:91
#11 0x1031c0b4 in MultiExecProcNode (node=0x1001b51c180) at
execProcnode.c:611
#12 0x1033a8a8 in BitmapHeapNext (node=0x1001b5187a8) at
nodeBitmapHeapscan.c:143
#13 0x1032a094 in ExecScanFetch (node=0x1001b5187a8,
accessMtd=0x1033a6c8 , recheckMtd=0x1033bab8
) at execScan.c:95
#14 0x1032a194 in ExecScan (node=0x1001b5187a8,
accessMtd=0x1033a6c8 , recheckMtd=0x1033bab8
) at execScan.c:162
#15 0x1033bb88 in ExecBitmapHeapScan (node=0x1001b5187a8) at
nodeBitmapHeapscan.c:667

Please feel free to ask if any more information is required for this.
-- 
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/




Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2017-03-27 Thread Michael Paquier
On Sat, Mar 25, 2017 at 9:50 PM, Craig Ringer  wrote:
> I'm fairly confident that I overlooked single row mode entirely in the
> original patch, though it's long enough ago that it's hard for me to
> remember exactly.
>
> I don't really have much of an opinion on the best handling of it.
>
> I would expect to be setting single-row mode just before I requested
> the *result* from the next pending query, since it relates to result
> processing rather than query dispatch. But that's about all the
> opinion I have here.

Yeah, I think that it makes sense to allow users to switch to
single-row mode before requesting a result in the queue. It seems to
me that this should also be effective only while fetching one single
result set. When the client moves on to the next item in the queue, a
new call to PQsetSingleRowMode() should be required again.
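
For reference, that matches how single-row mode already works with the plain
asynchronous API: it must be activated after the query has been dispatched
but before the first PQgetResult() call for that query, and it lapses once
that result set has been consumed. A minimal sketch using only the existing
libpq calls (the batch-mode entry points themselves are left out here, since
their names are part of the patch under review):

/* Hedged sketch: single-row retrieval with the existing async libpq API.
 * Error handling is omitted for brevity. */
#include <stdio.h>
#include <libpq-fe.h>

static void
fetch_in_single_row_mode(PGconn *conn)
{
    PGresult   *res;

    PQsendQuery(conn, "SELECT generate_series(1, 3)");

    /* Must be set after dispatch, before the first PQgetResult(). */
    PQsetSingleRowMode(conn);

    while ((res = PQgetResult(conn)) != NULL)
    {
        if (PQresultStatus(res) == PGRES_SINGLE_TUPLE)
            printf("row: %s\n", PQgetvalue(res, 0, 0));
        /* A final PGRES_TUPLES_OK with zero rows ends the result set. */
        PQclear(res);
    }
    /* The next query would need PQsetSingleRowMode() again. */
}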

Regarding the patch, I have spotted the following things in the last version:
- src/test/modules/Makefile should include test_libpq.
- Regression tests from 0002 are failing:
ok 1 - testlibpqbatch disallowed_in_batch
ok 2 - testlibpqbatch simple_batch
ok 3 - testlibpqbatch multi_batch
ok 4 - testlibpqbatch batch_abort
ok 5 - testlibpqbatch timings
not ok 6 - testlibpqbatch copyfailure
-- 
Michael




Re: [HACKERS] UPDATE of partition key

2017-03-27 Thread Amit Khandekar
On 25 March 2017 at 01:34, Amit Khandekar  wrote:
> I am yet to handle all of your comments, but meanwhile, attached is
> an updated patch that handles RETURNING.
>
> Earlier it was not working because ExecInsert() did not return any
> RETURNING result. This is because the setup needed to create RETURNING
> projection info for leaf partitions is done in ExecInitModifyTable()
> only in case of INSERT. But because it is an UPDATE operation, we have
> to do this explicitly as a one-time operation when it is determined
> that row-movement is required. This is similar to how we do one-time
> setup of mt_partition_dispatch_info. So in the patch, I have moved
> this code into a new function ExecInitPartitionReturningProjection(),
> and now this is called in ExecInitModifyTable() as well as during row
> movement for ExecInsert() processing the returning clause.

> Basically we need to do all that is done in ExecInitModifyTable() for
> INSERT. There are a couple of other things that I suspect that might
> need to be done as part of the missing initialization for Execinsert()
> during row-movement :
> 1. Junk filter handling
> 2. WITH CHECK OPTION

Attached is another updated patch, v4, which does the WITH CHECK OPTION
related initialization.

So we now have the below two function calls during row movement:
/* Build WITH CHECK OPTION constraints for leaf partitions */
ExecInitPartitionWithCheckOptions(mtstate, root_rel);

/* Build a projection for each leaf partition rel. */
ExecInitPartitionReturningProjection(mtstate, root_rel);

And these functions are now re-used in two places: in
ExecInitModifyTable() and in the row-movement code.
Basically, whatever was not being initialized in ExecInitModifyTable()
is now done in the row-movement code.

I have added relevant scenarios in sql/update.sql.

I checked the junk filter handling. I think there isn't anything that
needs to be done, because for INSERT, all that is needed is
ExecCheckPlanOutput(). And this function is called in
ExecInitModifyTable() anyway, even for UPDATE, so we don't have to
initialize anything additional.

> Yet, ExecDelete() during row-movement is still returning the RETURNING
> result redundantly, which I am yet to handle.
Done above. Now we have a new parameter in ExecDelete() which tells it
whether to skip RETURNING.

On 23 March 2017 at 07:04, Amit Langote  wrote:
> Would it be better to have at least some new tests?
Added some more scenarios in update.sql. Also have included scenarios
for WITH-CHECK-OPTION for updatable views.


> Also, there are a few places in the documentation mentioning that such
> updates cause an error,
> which will need to be updated.  Perhaps also add some explanatory notes
> about the mechanism (delete+insert), trigger behavior, caveats, etc.
> There were some points discussed upthread that could be mentioned in the
> documentation.
Yeah, I agree. Documentation needs some important changes. I am still
working on them.

> +if (!mtstate->mt_partition_dispatch_info)
> +{
>
> The if (pointer == NULL) style is better perhaps.
>
> +/* root table RT index is at the head of partitioned_rels */
> +if (node->partitioned_rels)
> +{
> +Index   root_rti;
> +Oid root_oid;
> +
> +root_rti = linitial_int(node->partitioned_rels);
> +root_oid = getrelid(root_rti, estate->es_range_table);
> +root_rel = heap_open(root_oid, NoLock); /* locked by
> InitPlan */
> +}
> +else
> +root_rel = mtstate->resultRelInfo->ri_RelationDesc;
>
> Some explanatory comments here might be good, for example, explain in what
> situations node->partitioned_rels would not have been set and/or vice versa.
Added some more comments in the relevant if conditions.

>
>> Now, for
>> UPDATE, ExecSetupPartitionTupleRouting() will be called only if row
>> movement is needed.
>>
>> We have to open an extra relation for the root partition, and keep it
>> opened and its handle stored in
>> mt_partition_dispatch_info[0]->reldesc. So ExecEndModifyTable() closes
>> this if it is different from node->resultRelInfo->ri_RelationDesc. If
>> it is same as node->resultRelInfo, it should not be closed because it
>> gets closed as part of ExecEndPlan().
>
> I guess you're referring to the following hunk.  Some comments:
>
> @@ -2154,10 +2221,19 @@ ExecEndModifyTable(ModifyTableState *node)
>   * Close all the partitioned tables, leaf partitions, and their indices
>   *
>   * Remember node->mt_partition_dispatch_info[0] corresponds to the root
> - * partitioned table, which we must not try to close, because it is the
> - * main target table of the query that will be closed by ExecEndPlan().
> - * Also, tupslot is NULL for the root partitioned table.
> + * partitioned table, which should not be closed if it is the main target
> + * table of t

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2017-03-27 Thread Craig Ringer
On 27 March 2017 at 15:26, Michael Paquier  wrote:
> On Sat, Mar 25, 2017 at 9:50 PM, Craig Ringer  wrote:
>> I'm fairly confident that I overlooked single row mode entirely in the
>> original patch, though it's long enough ago that it's hard for me to
>> remember exactly.
>>
>> I don't really have much of an opinion on the best handling of it.
>>
>> I would expect to be setting single-row mode just before I requested
>> the *result* from the next pending query, since it relates to result
>> processing rather than query dispatch. But that's about all the
>> opinion I have here.
>
> Yeah, I think that it makes sense to allow users to switch to
> single-row mode before requesting a result in the queue. It seems to
> me that this should also be effective only while fetching one single
> result set. When the client moves on to the next item in the queue, a
> new call to PQsetSingleRowMode() should be required again.
>
> Regarding the patch, I have spotted the following things in the last version:
> - src/test/modules/Makefile should include test_libpq.
> - Regression tests from 0002 are failing:
> ok 1 - testlibpqbatch disallowed_in_batch
> ok 2 - testlibpqbatch simple_batch
> ok 3 - testlibpqbatch multi_batch
> ok 4 - testlibpqbatch batch_abort
> ok 5 - testlibpqbatch timings
> not ok 6 - testlibpqbatch copyfailure

There are only a few more days left of this commit fest.

Things are sounding pretty ready here, with some final cleanups
pending. It'd be cool to get this into Pg 10 :)

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] free space map and visibility map

2017-03-27 Thread Masahiko Sawada
On Mon, Mar 27, 2017 at 2:38 PM, Kyotaro HORIGUCHI
 wrote:
> At Sat, 25 Mar 2017 19:53:47 -0700, Jeff Janes wrote:
>> On Thu, Mar 23, 2017 at 7:01 PM, Kyotaro HORIGUCHI <
>> horiguchi.kyot...@lab.ntt.co.jp> wrote:
>>
>> > At Wed, 22 Mar 2017 02:15:26 +0900, Masahiko Sawada wrote:
>> > > On Mon, Mar 20, 2017 at 11:28 PM, Robert Haas 
>> > wrote:
>> > > > On Sat, Mar 18, 2017 at 5:42 PM, Jeff Janes 
>> > wrote:
>> > > >> Isn't HEAP2_CLEAN only issued before an intended HOT update?  (Which
>> > then
>> > > >> can't leave the block as all visible or all frozen).  I think the
>> > issue is
>> > > >> here is HEAP2_VISIBLE or HEAP2_FREEZE_PAGE.  Am I reading this
>> > correctly,
>> > > >> that neither of those ever update the FSM, regardless of FPI?
>> > > >
>> > > > Yes, updates to the FSM are never logged.  Forcing replay of
>> > > > HEAP2_FREEZE_PAGE to update the FSM might be a good idea.
>> > > >
>> > >
>> > > I think I was missing something. I imaged your situation is that FPI
>> > > is replayed during crash recovery after the crashed server vacuums the
>> > > page and marked it as all-frozen. But this situation is also resolved
>> > > by that solution.
>> >
>> > # HEAP2_CLEAN is issued in lazy_vacuum_page
>> >
>> > It will work, but I'm not sure it is the right direction for
>> > HEAP2_FREEZE_PAGE to touch the FSM.
>> >
>> > As Masahiko said, the situation must be created by HEAP2_VISIBLE
>> > without preceding HEAP2_CLEAN, or with HEAP2_CLEAN with FPI. I
>> > think only the latter can happen. The comment in heap_xlog_clean
>> > below is generally right, but if a page filled with tuples becomes
>> > almost empty and freezable by this cleanup, a problematic
>> > situation like this occurs.
>> >
>>
>> I now think this is not the cause of the problem I am seeing.  I made the
>> replay of FREEZE_PAGE update the FSM (both with and without FPI), but that
>> did not fix it.  With frequent crashes, it still accumulated a lot of
>> frozen and empty (but full according to FSM) pages.  I also set up replica
>> streaming and turned off crashing on the master, and the FSM of the replica
>> stays accurate, so the WAL stream and replay logic is doing the right thing
>> on the replica.
>>
>> I now think the dirtied FSM pages are somehow not getting marked as dirty,
>> or are getting marked as dirty but somehow the checkpoint is skipping
>> them.  It looks like MarkBufferDirtyHint does do some operations unlocked
>> which could explain lost update, but it seems unlikely that that would
>> happen often enough to see the amount of lost updates I am seeing.
>
> Hmm. Clearing the dirty hint already seems to be protected by an
> exclusive lock. And I think it can occur without a lock failure.
>
> Other than by FPI, the FSM update is omitted when the record LSN is
> older than the page LSN. If the heap page is evicted but the FSM page
> is not, after vacuuming and before a power cut, replaying HEAP2_CLEAN
> skips the FSM update even though no FPI is attached. Of course this
> cannot occur on a standby. One FSM page covers about 4k heap pages,
> so an FSM page can stay in cache far longer than the heap pages it
> covers.
>
> ALL_FROZEN is also set by records other than HEAP2_FREEZE_PAGE. When
> a page is already empty when entering lazy_scan_heap, or a page of a
> non-indexed heap is emptied in lazy_scan_heap, HEAP2_VISIBLE is
> issued to set ALL_FROZEN.
>
> Perhaps the problem will be fixed by forcing heap_xlog_visible to
> update the FSM (in addition to FREEZE_PAGE), or the same in
> heap_xlog_clean. (As mentioned in the previous mail, I prefer the
> latter.)

Maybe it's enough just to make both heap_xlog_visible and
heap_xlog_freeze_page forcibly update the FSM (heap_xlog_freeze_page
might be unnecessary), because the problem happens on pages that are
full according to the FSM but are actually empty and marked as
all-visible or all-frozen. Though heap_xlog_clean loads the heap page
into memory for the redo operation, forcing heap_xlog_clean to update
the FSM might be overkill for this problem, because it would apply to
every page that is not marked all-visible or all-frozen, and 100%
accuracy of the FSM is basically not required. On the other hand, if
we make heap_xlog_visible update the FSM, it has to load both the heap
page and the FSM page, which can also be an overhead. Another idea is
to have the record replayed by heap_xlog_visible carry the free space
of the corresponding heap page, and then update the FSM from it during
recovery.
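
To make the first variant concrete, here is a rough sketch (not an actual
patch) of what forcing the redo routine to refresh the FSM could look like,
modeled on what heap_xlog_clean() already does; the exact placement inside
heap_xlog_visible() is an assumption here:

/* Rough sketch: refresh the FSM for the heap block while replaying an
 * XLOG_HEAP2_VISIBLE record, mirroring the tail of heap_xlog_clean(). */
if (BufferIsValid(buffer))
{
	Page		page = BufferGetPage(buffer);
	Size		freespace = PageGetHeapFreeSpace(page);
	RelFileNode rnode;
	ForkNumber	forknum;
	BlockNumber blkno;

	BufferGetTag(buffer, &rnode, &forknum, &blkno);
	UnlockReleaseBuffer(buffer);

	/*
	 * As in heap_xlog_clean(), update the FSM after releasing the buffer
	 * lock; the FSM does not need to be perfectly accurate anyway.
	 */
	XLogRecordPageWithFreeSpace(rnode, blkno, freespace);
}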

>
>> > > /*
>> > >  * Update the FSM as well.
>> > >  *
>> > >  * XXX: Don't do this if the page was restored from full page image. We
>> > >  * don't bother to update the FSM in that case, it doesn't need to be
>> > >  * totally accurate anyway.
>> > >  */
>> >
>>
>> What does that save us?  If we restored from FPI, we already have the block
>> in memory (we don't need to see the old version, just the new one), so it
>> doesn't save us a random read IO.
>
> Updates on random pages can cause visits to many unloaded FSM
> pages. It may be intended to avoid that. Or, especially for
> INSERT, successive o

Re: [HACKERS] Monitoring roles patch

2017-03-27 Thread Simon Riggs
On 25 March 2017 at 16:30, Dave Page  wrote:

> I believe this and other reasons we've described are exactly why other DBMS' 
> do what we're proposing.

It would help review if you could show some links and give a
commentary on what you think others do, what they get right and what
they get wrong, so we can be sure we are providing something people
actually want and/or expect. POLA needed. I don't want to be reading
various blogs about what those numpties on the Postgres project did in
v10. Thanks

-- 
Simon Riggs   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Logical decoding on standby

2017-03-27 Thread Craig Ringer
On 27 March 2017 at 14:08, Craig Ringer  wrote:

> So this patch makes ReplicationSlotAcquire check that the slot
> database matches the current database and refuse to acquire the slot
> if it does not.

New patch attached that drops the above requirement, so slots can still be
dropped from any DB.

This introduces a narrow race window where DROP DATABASE may ERROR if
somebody connects to a different database and runs a
pg_drop_replication_slot(...) for one of the slots being dropped by
DROP DATABASE after we check for active slots but before we've dropped
the slot. But it's hard to hit and it's pretty harmless; the worst
possible result is dropping one or more of the slots before we ERROR
out of the DROP. But you clearly didn't want them anyway, since you
were dropping the DB and dropping some slots at the same time.

I think this one's ready to go.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From 99d5313d3a265bcc57ca6845230b9ec49d188710 Mon Sep 17 00:00:00 2001
From: Craig Ringer 
Date: Wed, 22 Mar 2017 13:21:09 +0800
Subject: [PATCH] Make DROP DATABASE drop logical slots for the DB

Automatically drop all logical replication slots associated with a
database when the database is dropped.
---
 doc/src/sgml/func.sgml |  3 +-
 doc/src/sgml/protocol.sgml |  2 +
 src/backend/commands/dbcommands.c  | 32 +---
 src/backend/replication/slot.c | 88 ++
 src/include/replication/slot.h |  1 +
 src/test/recovery/t/006_logical_decoding.pl| 40 +-
 .../recovery/t/010_logical_decoding_timelines.pl   | 30 +++-
 7 files changed, 182 insertions(+), 14 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index ba6f8dd..78508d7 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -18876,7 +18876,8 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup());

 Drops the physical or logical replication slot
 named slot_name. Same as replication protocol
-command DROP_REPLICATION_SLOT.
+command DROP_REPLICATION_SLOT. For logical slots, this must
+be called when connected to the same database the slot was created on.

   
 
diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index b3a5026..5f97141 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -2034,6 +2034,8 @@ The commands accepted in walsender mode are:
  
   Drops a replication slot, freeing any reserved server-side resources. If
   the slot is currently in use by an active connection, this command fails.
+  If the slot is a logical slot that was created in a database other than
+  the database the walsender is connected to, this command fails.
  
  
   
diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index 5a63b1a..c0ba2b4 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -845,19 +845,22 @@ dropdb(const char *dbname, bool missing_ok)
  errmsg("cannot drop the currently open database")));
 
 	/*
-	 * Check whether there are, possibly unconnected, logical slots that refer
-	 * to the to-be-dropped database. The database lock we are holding
-	 * prevents the creation of new slots using the database.
+	 * Check whether there are active logical slots that refer to the
+	 * to-be-dropped database. The database lock we are holding prevents the
+	 * creation of new slots using the database or existing slots becoming
+	 * active.
 	 */
-	if (ReplicationSlotsCountDBSlots(db_id, &nslots, &nslots_active))
+	(void) ReplicationSlotsCountDBSlots(db_id, &nslots, &nslots_active);
+	if (nslots_active)
+	{
 		ereport(ERROR,
 (errcode(ERRCODE_OBJECT_IN_USE),
-			  errmsg("database \"%s\" is used by a logical replication slot",
+			  errmsg("database \"%s\" is used by an active logical replication slot",
 	 dbname),
- errdetail_plural("There is %d slot, %d of them active.",
-  "There are %d slots, %d of them active.",
-  nslots,
-  nslots, nslots_active)));
+ errdetail_plural("There is %d active slot",
+  "There are %d active slots",
+  nslots_active, nslots_active)));
+	}
 
 	/*
 	 * Check for other backends in the target database.  (Because we hold the
@@ -915,6 +918,11 @@ dropdb(const char *dbname, bool missing_ok)
 	dropDatabaseDependencies(db_id);
 
 	/*
+	 * Drop db-specific replication slots.
+	 */
+	ReplicationSlotsDropDBSlots(db_id);
+
+	/*
 	 * Drop pages for this database that are in the shared buffer cache. This
 	 * is important to ensure that no remaining backend tries to write out a
 	 * dirty buffer to the dead database later...
@@ -2124,11 +2132,17 @@ dbase_redo(XLogReaderState *record)
 			 * InitPostgres() cannot fully re-execute concurrentl

Re: [HACKERS] increasing the default WAL segment size

2017-03-27 Thread Simon Riggs
On 25 March 2017 at 17:02, Peter Eisentraut
 wrote:
> At this point, I suggest splitting this patch up into several
> potentially less controversial pieces.
>
> One big piece is that we currently don't support segment sizes larger
> than 64 GB, for various internal arithmetic reasons.  Your patch appears
> to address that.  So I suggest isolating that.  Assuming it works
> correctly, I think there would be no great concern about it.

+1

> The next piece would be making the various tools aware of varying
> segment sizes without having to rely on a built-in value.

Hmm

> The third piece would then be the rest that allows you to set the size
> at initdb
>
> If we take these in order, we would make it easier to test various sizes
> and see if there are any more unforeseen issues when changing sizes.  It
> would also make it easier to do performance testing so we can address
> the original question of what the default size should be.
>
> One concern I have is that your patch does not contain any tests.  There
> should probably be lots of tests.

This is looking like a reject in its current form.

Changing the WAL filename to a new form seems the best plan, but we don't
have time to do that and get it right, especially with no tests.

My summary of useful requirements would be
* Files smaller than 16MB and larger than 16MB are desirable
* LSN <-> filename mapping must be clear
* New filename format best for debugging and clarity

My proposal from here is that we allow only one new size in this
release, to minimize the splash zone. Keep the filename format as it
is now, using David's suggestion. Suggest adding 1GB as the only
additional option, which continues the idea of having 1GB as the max
filesize.

New filename format can come in PG11 allowing much wider range of WAL
filesizes, bigger and smaller.

-- 
Simon Riggs   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2017-03-27 Thread Michael Paquier
On Mon, Mar 27, 2017 at 4:42 PM, Craig Ringer  wrote:
> On 27 March 2017 at 15:26, Michael Paquier  wrote:
>> On Sat, Mar 25, 2017 at 9:50 PM, Craig Ringer  wrote:
>>> I'm fairly confident that I overlooked single row mode entirely in the
>>> original patch, though it's long enough ago that it's hard for me to
>>> remember exactly.
>>>
>>> I don't really have much of an opinion on the best handling of it.
>>>
>>> I would expect to be setting single-row mode just before I requested
>>> the *result* from the next pending query, since it relates to result
>>> processing rather than query dispatch. But that's about all the
>>> opinion I have here.
>>
>> Yeah, I think that it makes sense to allow users to switch to
>> single-row mode before requesting a result in the queue. It seems to
>> me that this should also be effective only while fetching one single
>> result set. When the client moves on to the next item in the queue, a
>> new call to PQsetSingleRowMode() should be required again.
>>
>> Regarding the patch, I have spotted the following things in the last version:
>> - src/test/modules/Makefile should include test_libpq.
>> - Regression tests from 0002 are failing:
>> ok 1 - testlibpqbatch disallowed_in_batch
>> ok 2 - testlibpqbatch simple_batch
>> ok 3 - testlibpqbatch multi_batch
>> ok 4 - testlibpqbatch batch_abort
>> ok 5 - testlibpqbatch timings
>> not ok 6 - testlibpqbatch copyfailure
>
> There are only a few more days left of this commit fest.
>
> Things are sounding pretty ready here, with some final cleanups
> pending. It'd be cool to get this into Pg 10 :)

Yes, that's one of the items I'd like to help with by the end of the CF.
-- 
Michael




Re: [HACKERS] Logical decoding on standby

2017-03-27 Thread Simon Riggs
On 27 March 2017 at 09:03, Craig Ringer  wrote:

> I think this one's ready to go.

Looks like something I could commit. Full review by me while offline
today, aiming to commit tomorrow barring issues raised.

-- 
Simon Riggs   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Proposal for changes to recovery.conf API

2017-03-27 Thread Simon Riggs
On 7 March 2017 at 23:31, Josh Berkus  wrote:
> On 03/02/2017 07:13 AM, David Steele wrote:
>> Hi Simon,
>>
>> On 2/25/17 2:43 PM, Simon Riggs wrote:
>>> On 25 February 2017 at 13:58, Michael Paquier  
>>> wrote:
>>>
 - trigger_file is removed.
 FWIW, my only complaint is about the removal of trigger_file; this is
 useful to detect a trigger file on a different partition than PGDATA!
 Keeping it also costs nothing.
>>>
>>> Sorry, that is just an error of implementation, not intention. I had
>>> it on my list to keep, at your request.
>>>
>>> New version coming up.
>>
>> Do you have an idea when the new version will be available?
>
> Please?  Having yet another PostgreSQL release go by without fixing
> recovery.conf would make me very sad.

I share your pain, but there are various things about this patch that
make me uncomfortable. I believe we are looking for an improved design
not just a different design.

I think the best time to commit such a patch is at the beginning of a
new cycle, so people have a chance to pick out pieces they don't like
and incrementally propose changes.

So I am going to mark this MovedToNextCF, barring objections from
committers willing to make it happen in this release.

-- 
Simon Riggs   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Crash on promotion when recovery.conf is renamed

2017-03-27 Thread Tsunakawa, Takayuki
From: pgsql-hackers-ow...@postgresql.org
> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Michael Paquier
> Moved to CF 2017-03. Both patches still apply.

Sorry to be late in reviewing this, but it is done now.  The patch applied,
make check passed, and the code looks almost good.  I could successfully
perform a simple archive recovery.  Finally, I broke the 2PC state file while
the server was down, and could confirm that the server failed to start as
expected, emitting a FATAL message.  Worked nicely.

Just two cosmetic points:

(1)
Other places use "two-phase state file", not "two-phase file".


(2)
All other places in twophase.c and most places in other files put ereport() and 
errmsg() on separate lines.  I think it would be better to align with 
surrounding code.

+   ereport(FATAL, (errmsg("corrupted two-phase file \"%s\"",
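
For reference, the surrounding style and the suggested wording would look
roughly like this (the variable name "path" is only an assumption for
illustration):

	ereport(FATAL,
			(errmsg("corrupted two-phase state file \"%s\"",
					path)));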


Regards
Takayuki Tsunakawa






[HACKERS] Column widening without outage

2017-03-27 Thread Aniruddha Deshpande
Hi All,


We want to extend/widen a column without an outage. But as column widening
takes an ACCESS EXCLUSIVE lock, we have seen a noticeable pause on
SELECT/INSERT. This behavior was more noticeable in tables which have
composite foreign keys. We tried doing it as below, which minimized the
outage, but a noticeable pause for INSERT/UPDATE can still be seen.

Environment Details :
column_test=# select version();
version
---
 EnterpriseDB 9.5.5.10 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.4.7
20120313 (Red Hat 4.4.7-16), 64-bit
(1 row)

column_test=#



Steps followed:

a.  ALTER TABLE FINTRANS DROP CONSTRAINT FK_FINTRANS_SHOPTRANS;

b.  ALTER TABLE FINTRANS ADD CONSTRAINT FK_FINTRANS_SHOPTRANS
      FOREIGN KEY (MERCHANTID, SHOPTXNO)
      REFERENCES SHOPTRANS (MERCHANTID, SHOPTXNO)
      MATCH SIMPLE ON UPDATE NO ACTION ON DELETE NO ACTION NOT VALID;

c.  ALTER TABLE fintrans ALTER COLUMN merchantid TYPE VARCHAR(255);

d.  ALTER TABLE shoptrans ALTER COLUMN merchantid TYPE VARCHAR(255);

e.  ALTER TABLE FINTRANS VALIDATE CONSTRAINT FK_FINTRANS_SHOPTRANS;


I have a few queries on the above problem:

1. Is there any way by which we can widen the column without an outage?

2. Does ALTER TABLE ... ALTER COLUMN revalidate all foreign keys again?

3. In this section of the PostgreSQL documentation
https://www.postgresql.org/docs/9.2/static/release-9-2.html
it says:

E.21.3.4.2. ALTER

* Reduce need to rebuild tables and indexes for certain ALTER TABLE ...
ALTER COLUMN TYPE operations (Noah Misch)

Increasing the length limit for a varchar or varbit column, or removing the
limit altogether, no longer requires a table rewrite. Similarly, increasing
the allowable precision of a numeric column, or changing a column from
constrained numeric to unconstrained numeric, no longer requires a table
rewrite. Table rewrites are also avoided in similar cases involving the
interval, timestamp, and timestamptz types.

* Avoid having ALTER TABLE revalidate foreign key constraints in some cases
where it is not necessary (Noah Misch)

So, in what circumstances will ALTER TABLE avoid revalidating foreign keys?


Your help will be much appreciated.



Regards,

Aniruddha


Re: [HACKERS] New CORRESPONDING clause design

2017-03-27 Thread Pavel Stehule
Hi

Fresh update - I enhanced the Value node with a location field, as per Tom's
proposal.

A few more regression tests.

But I found a significant issue that needs a bigger fix - Surafel, please,
can you fix it?

It crashes on:

SELECT 0 AS x1, 1 AS a, 0 AS x2, 2 AS b, 0 AS x3, -1 AS x3
UNION ALL CORRESPONDING SELECT 4 AS b, 0 AS x4, 3 AS a, 0 AS x6, -1 AS x6
UNION ALL CORRESPONDING SELECT 0 AS x8, 6 AS b, -100 AS x9;

I'll mark this patch as waiting on author

Regards

Pavel
diff --git a/doc/src/sgml/queries.sgml b/doc/src/sgml/queries.sgml
index 30792f45f1..2d60718ff1 100644
--- a/doc/src/sgml/queries.sgml
+++ b/doc/src/sgml/queries.sgml
@@ -1601,6 +1601,9 @@ SELECT DISTINCT ON (expression , EXCEPT
   
   
+   CORRESPONDING
+  
+  
set union
   
   
@@ -1617,9 +1620,9 @@ SELECT DISTINCT ON (expression , 
-query1 UNION ALL query2
-query1 INTERSECT ALL query2
-query1 EXCEPT ALL query2
+query1 UNION ALL CORRESPONDING BY query2
+query1 INTERSECT ALL CORRESPONDING BY query2
+query1 EXCEPT ALL CORRESPONDING BY query2
 
query1 and
query2 are queries that can use any of
@@ -1659,14 +1662,31 @@ SELECT DISTINCT ON (expression , 
 
   
-   In order to calculate the union, intersection, or difference of two
-   queries, the two queries must be union compatible,
-   which means that they return the same number of columns and
-   the corresponding columns have compatible data types, as
-   described in .
+   EXCEPT returns all rows that are in the result of
+   query1 but not in the result of
+   query2.  (This is sometimes called the
+   difference between two queries.)  Again, duplicates
+   are eliminated unless EXCEPT ALL is used.
   
- 
 
+  
+   CORRESPONDING returns all columns that are in both 
+   query1 and query2 with the same name.
+  
+
+  
+   CORRESPONDING BY returns all columns in the column list 
+   that are also in both query1 and 
+   query2 with the same name. The names in column list
+   must be unique.
+  
+
+  
+   The names of columns in result when CORRESPONDING or
+   CORRESPONDING BY clause is used must be unique in
+   query1 and query2.
+  
+ 
 
  
   Sorting Rows
diff --git a/doc/src/sgml/sql.sgml b/doc/src/sgml/sql.sgml
index 57396d7c24..f98c22e696 100644
--- a/doc/src/sgml/sql.sgml
+++ b/doc/src/sgml/sql.sgml
@@ -859,7 +859,7 @@ SELECT [ ALL | DISTINCT [ ON ( expressioncondition ]
 [ GROUP BY expression [, ...] ]
 [ HAVING condition [, ...] ]
-[ { UNION | INTERSECT | EXCEPT } [ ALL ] select ]
+[ { UNION | INTERSECT | EXCEPT } [ ALL ] [ CORRESPONDING [ BY ( expression ) ] ] select ]
 [ ORDER BY expression [ ASC | DESC | USING operator ] [ NULLS { FIRST | LAST } ] [, ...] ]
 [ LIMIT { count | ALL } ]
 [ OFFSET start ]
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index c23d5c5285..93cd9f0ed5 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -2995,6 +2995,7 @@ _copySelectStmt(const SelectStmt *from)
 	COPY_NODE_FIELD(withClause);
 	COPY_SCALAR_FIELD(op);
 	COPY_SCALAR_FIELD(all);
+	COPY_NODE_FIELD(correspondingClause);
 	COPY_NODE_FIELD(larg);
 	COPY_NODE_FIELD(rarg);
 
@@ -3010,6 +3011,8 @@ _copySetOperationStmt(const SetOperationStmt *from)
 	COPY_SCALAR_FIELD(all);
 	COPY_NODE_FIELD(larg);
 	COPY_NODE_FIELD(rarg);
+	COPY_NODE_FIELD(correspondingColumns);
+	COPY_SCALAR_FIELD(hasCorrespondingBy);
 	COPY_NODE_FIELD(colTypes);
 	COPY_NODE_FIELD(colTypmods);
 	COPY_NODE_FIELD(colCollations);
@@ -4588,6 +4591,8 @@ _copyValue(const Value *from)
  (int) from->type);
 			break;
 	}
+	COPY_LOCATION_FIELD(location);
+
 	return newnode;
 }
 
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 5941b7a2bf..dd6598d85b 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -1050,6 +1050,7 @@ _equalSelectStmt(const SelectStmt *a, const SelectStmt *b)
 	COMPARE_NODE_FIELD(withClause);
 	COMPARE_SCALAR_FIELD(op);
 	COMPARE_SCALAR_FIELD(all);
+	COMPARE_NODE_FIELD(correspondingClause);
 	COMPARE_NODE_FIELD(larg);
 	COMPARE_NODE_FIELD(rarg);
 
@@ -1063,6 +1064,8 @@ _equalSetOperationStmt(const SetOperationStmt *a, const SetOperationStmt *b)
 	COMPARE_SCALAR_FIELD(all);
 	COMPARE_NODE_FIELD(larg);
 	COMPARE_NODE_FIELD(rarg);
+	COMPARE_NODE_FIELD(correspondingColumns);
+	COMPARE_SCALAR_FIELD(hasCorrespondingBy);
 	COMPARE_NODE_FIELD(colTypes);
 	COMPARE_NODE_FIELD(colTypmods);
 	COMPARE_NODE_FIELD(colCollations);
@@ -2935,6 +2938,8 @@ _equalValue(const Value *a, const Value *b)
 			break;
 	}
 
+	COMPARE_LOCATION_FIELD(location);
+
 	return true;
 }
 
diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c
index 6e52eb7231..7102ea96c2 100644
--- a/src/backend/nodes/nodeFuncs.c
+++ b/src/backend/nodes/nodeFuncs.c
@@ -3444,6 +3444,8 @@ raw_expression_tree_walker(Node *node,
 	return true;
 if (walker(stmt->lockingClause, context))
 	return true;
+if (walker(stmt->correspondingClause, context))
+	return true;
 if (walker(stmt->withC

Re: [HACKERS] Parallel bitmap heap scan

2017-03-27 Thread Dilip Kumar
On Mon, Mar 27, 2017 at 12:53 PM, Rafia Sabih
 wrote:
> Recently, while testing TPC-H at 300 scale factor I stumbled onto an error
> for Q6. The test environment is as follows:
> work_mem = 1GB
> shared_buffers = 10GB
> effective_cache_size = 10GB
> random_page_cost = seq_page_cost = 0.1
>
> The error message is: ERROR:  invalid DSA memory alloc request size 1610612740
> In case it is required, the stack trace is as follows:

Thanks for reporting.  In pagetable_allocate, the DSA_ALLOC_HUGE flag was
missing while allocating memory from the DSA.  I have also handled a NULL
pointer return from dsa_get_address.

Please verify with the attached patch.
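
For anyone following along, the essence of the fix is presumably something
like the following in tidbitmap.c's pagetable_allocate() (a hedged sketch of
the idea, not the attached patch itself; the non-shared branch is elided and
the exact struct member names are assumptions):

/*
 * Sketch: when the simplehash page table of a shared TIDBitmap grows past
 * 1 GB, a plain allocation hits the "invalid DSA memory alloc request size"
 * check.  Passing DSA_ALLOC_HUGE lifts that limit, and DSA_ALLOC_ZERO
 * zeroes the new chunk as simplehash expects.
 */
static inline void *
pagetable_allocate(pagetable_hash *pagetable, Size size)
{
	TIDBitmap  *tbm = (TIDBitmap *) pagetable->private_data;

	/* ... non-shared case elided ... */

	tbm->dsapagetable = dsa_allocate_extended(tbm->dsa, size,
											  DSA_ALLOC_HUGE | DSA_ALLOC_ZERO);

	return dsa_get_address(tbm->dsa, tbm->dsapagetable);
}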


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


bitmap_hugepage_fix.patch
Description: Binary data



Re: [HACKERS] exposing wait events for non-backends (was: Tracking wait event for latches)

2017-03-27 Thread Kuntal Ghosh
Thank you, Robert, for committing the patch.

commit fc70a4b0df38bda6a13941f1581f25fbb643c7f3

I've changed the status to Committed.

On Mon, Mar 27, 2017 at 6:09 AM, Michael Paquier
 wrote:
> On Sat, Mar 25, 2017 at 5:26 PM, Kuntal Ghosh
>  wrote:
>> On Fri, Mar 24, 2017 at 9:23 PM, Robert Haas  wrote:
>>> I think this is still not good.  The places where pgstat_bestart() has
>>> been added are not even correct.  For example, the call added to
>>> BackgroundWriterMain() occurs after the section that does
>>> error-recovery, so it would get repeated every time the background
>>> writer recovers from an error.  There are similar problems elsewhere.
>>> Furthermore, although in theory there's an idea here that we're making
>>> it no longer the responsibility of InitPostgres() to call
>>> pgstat_bestart(), the patch as proposed only removes one of the two
>>> calls, so we really don't even have a consistent practice.  I think
>>> it's better to go with the idea of having InitPostgres() be
>>> responsible for calling this for regular backends, and
>>> AuxiliaryProcessMain() for auxiliary backends.  That involves
>>> substantially fewer calls to pgstat_bestart() and they are spread
>>> across only two functions, which IMHO makes fewer bugs of omission a
>>> lot less likely.
>>
>> Agreed. Calling it from InitPostgres() and AuxiliaryProcessMain()
>> seems correct for the following two reasons, as you've mentioned
>> upthread:
>> 1. security-filtering should be left to some higher-level facility
>> that can make policy decisions rather than being hard-coded in the
>> individual modules.
>> 2. makes fewer bugs of omission a lot less likely.
>
> Okay, fine for me.
>
>>> - I modified the code to tolerate a NULL return from
>>> AuxiliaryPidGetProc().  I am pretty sure that without that there's a
>>> race condition that could lead to a crash if somebody tried to call
>>> this function just as an auxiliary process was terminating.
>>
>> Wow. Haven't thought of that. If it's called after
>> AuxiliaryProcKill(), a crash is evident.
>
> This one is a good catch.
> --
> Michael



-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] logical decoding of two-phase transactions

2017-03-27 Thread Craig Ringer
On 27 March 2017 at 09:31, Craig Ringer  wrote:

> We're in the last week of the CF. If you have a patch that's nearly
> ready or getting there, now would be a good time to post it for help
> and input from others.
>
> I would really like to get this in, but we're running out of time.
>
> Even if you just post your snapshot management work, with the cosmetic
> changes discussed above, that would be a valuable start.

I'm going to pick up the last patch and:


* Ensure we only add the GID to xact records for 2pc commits and aborts

* Add separate callbacks for prepare, abort prepared, and commit
prepared (of xacts already processed during prepare), so we aren't
overloading the "commit" callback and don't have to create fake empty
transactions to pass to the commit callback;

* Add another callback to determine whether an xact should be
processed at PREPARE TRANSACTION or COMMIT PREPARED time.

* Rename the snapshot builder faux-commit stuff in the current patch
so it's clearer what's going on.

* Write tests covering DDL, abort-during-decode, etc


Some special care is needed for the callback that decides whether to
process a given xact as 2PC or not. It's called before PREPARE
TRANSACTION to decide whether to decode any given xact at prepare time
or wait until it commits. It's called again at COMMIT PREPARED time if
we crashed after we processed PREPARE TRANSACTION and advanced our
confirmed_flush_lsn such that we won't re-process the PREPARE
TRANSACTION again. Our restart_lsn might've advanced past it so we
never even decode it, so we can't rely on seeing it at all. It has
access to the xid, gid and invalidations, all of which we have at both
prepare and commit time, to make its decision from. It must have the
same result at prepare and commit time for any given xact. We can
probably use a cache in the reorder buffer to avoid the 2nd call on
commit prepared if we haven't crashed/reconnected between the two.

This proposal does not provide a way to safely decode a 2pc xact that
made catalog changes which may be aborted while being decoded. The
plugin must lock such an xact so that it can't be aborted while being
processed, or defer decoding until commit prepared. It can use the
invalidations for the commit to decide.
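
To make the shape of that decision callback concrete, here is a purely
illustrative sketch; the callback name, its signature and its registration
are assumptions for illustration only, not the API the patch actually
defines:

#include "replication/logical.h"
#include "replication/reorderbuffer.h"

/*
 * Illustrative output-plugin callback deciding whether a transaction
 * should be decoded at PREPARE TRANSACTION time or deferred until
 * COMMIT PREPARED.  It must return the same answer when called again at
 * COMMIT PREPARED time after a crash, so it should consult only stable
 * inputs such as the GID or the catalog-change flag.
 */
static bool
demo_filter_prepare(LogicalDecodingContext *ctx,
					ReorderBufferTXN *txn,
					const char *gid)
{
	/* Defer xacts with catalog changes until COMMIT PREPARED. */
	if (txn->has_catalog_changes)
		return false;

	/* Otherwise decode the transaction at PREPARE TRANSACTION time. */
	return true;
}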

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Adding the optional clause 'AS' in CREATE TRIGGER

2017-03-27 Thread Okano, Naoki
Hi Surafel,
Thank you for your review!
But I have not finished fixing the patch yet.

> v  Use  tag in documentation
ok. 

> v  Don’t modify existing test cases; add new ones instead
The existing tests I modified are the results of commands like "SELECT
pg_get_triggerdef(oid, true) ...".
I modified them in order to match pg_get_functiondef():
the keyword 'OR REPLACE' is always added in CREATE FUNCTION.
Doesn't that make sense?

> v  Comment in pg_constraint.c is too long; make it shorter
> v  Error message could give more guidance if it mentioned the general rule
ok. 

> v  Wrong result in concurrency case
Thank you for the detailed example case. I appreciate it.
The issue does not seem to be easy to resolve, because I have to
modify pg_get_constraintdef_worker() in ruleutils.c, and that will
take some more time.

So, I am moving this patch to "Returned with Feedback".

Regards,
Okano Naoki
Fujitsu




Re: [HACKERS] [sqlsmith] Failed assertion in _hash_kill_items/MarkBufferDirtyHint

2017-03-27 Thread Ashutosh Sharma
Hi,

> testing with master as of cf366e97ff, sqlsmith occasionally triggers the
> following assertion:
>
> TRAP: FailedAssertion("!(LWLockHeldByMe(((LWLock*) 
> (&(bufHdr)->content_lock", File: "bufmgr.c", Line: 3397)
>
> Backtraces always look like the one below.  It is reproducible on a
> cluster once it happens.  I could provide a tarball if needed.
>
> regards,
> Andreas
>
> #2  0x008324b1 in ExceptionalCondition 
> (conditionName=conditionName@entry=0x9e4e28 "!(LWLockHeldByMe(((LWLock*) 
> (&(bufHdr)->content_lock", errorType=errorType@entry=0x87b03d 
> "FailedAssertion", fileName=fileName@entry=0x9e5856 "bufmgr.c", 
> lineNumber=lineNumber@entry=3397) at assert.c:54
> #3  0x00706971 in MarkBufferDirtyHint (buffer=2844, 
> buffer_std=buffer_std@entry=1 '\001') at bufmgr.c:3397
> #4  0x004b3ecd in _hash_kill_items (scan=scan@entry=0x66dcf70) at 
> hashutil.c:514
> #5  0x004a9c1b in hashendscan (scan=0x66dcf70) at hash.c:512
> #6  0x004cf17a in index_endscan (scan=0x66dcf70) at indexam.c:353
> #7  0x0061fa51 in ExecEndIndexScan (node=0x3093f30) at 
> nodeIndexscan.c:852
> #8  0x00608e59 in ExecEndNode (node=) at 
> execProcnode.c:715
> #9  0x006045b8 in ExecEndPlan (estate=0x3064000, planstate= out>) at execMain.c:1540
> #10 standard_ExecutorEnd (queryDesc=0x30cb880) at execMain.c:487
> #11 0x005c87b0 in PortalCleanup (portal=0x1a60060) at portalcmds.c:302
> #12 0x0085cbb3 in PortalDrop (portal=0x1a60060, 
> isTopCommit=) at portalmem.c:489
> #13 0x00736ed2 in exec_simple_query (query_string=0x315b7a0 "...") at 
> postgres.c:
> #14 0x00738b51 in PostgresMain (argc=, 
> argv=argv@entry=0x1a6c6c8, dbname=, username=) 
> at postgres.c:4071
> #15 0x00475fef in BackendRun (port=0x1a65b90) at postmaster.c:4317
> #16 BackendStartup (port=0x1a65b90) at postmaster.c:3989
> #17 ServerLoop () at postmaster.c:1729
> #18 0x006c8662 in PostmasterMain (argc=argc@entry=4, 
> argv=argv@entry=0x1a3f540) at postmaster.c:1337
> #19 0x0047729d in main (argc=4, argv=0x1a3f540) at main.c:228
>


Thanks for reporting this problem. Could you please let me know for
how long you ran sqlsmith to get this crash? In any case, I have found
the reason for this crash. It basically happens when trying to
retrieve tuples using a cursor. Currently, a hash index scan works
tuple-at-a-time, which means that once it finds a tuple on a page, it
releases the lock on the page but keeps the pin on it and returns the
tuple. Once the requested number of tuples have been processed, there
is no lock on the page that was being scanned, but there is still a
pin on it. Finally, when trying to close the cursor at the end of the
scan, if any killed tuples have been identified we first try to mark
those items as dead with the help of _hash_kill_items(). But since we
only hold a pin on the page, the assertion check 'LWLockHeldByMe()'
fails.

When scanning tuples using a normal SELECT * statement, before moving
to the next page in a bucket we first deal with all the killed items,
but we do so without releasing the lock and pin on the current page.
Hence, with SELECT queries this crash is not visible.

The attached patch fixes this. But please note that all these changes
will be removed by the patch for page scan mode [1].

[1] - 
https://www.postgresql.org/message-id/CA%2BTgmobYTvexcjqMhXoNCyEUHChzmdC_2xVGgj7eqaYVgoJA%2Bg%40mail.gmail.com

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com
diff --git a/src/backend/access/hash/hash.c b/src/backend/access/hash/hash.c
index 34cc08f..0bacef8 100644
--- a/src/backend/access/hash/hash.c
+++ b/src/backend/access/hash/hash.c
@@ -478,7 +478,7 @@ hashrescan(IndexScanDesc scan, ScanKey scankey, int nscankeys,
 
 	/* Before leaving current page, deal with any killed items */
 	if (so->numKilled > 0)
-		_hash_kill_items(scan);
+		_hash_kill_items(scan, false);
 
 	_hash_dropscanbuf(rel, so);
 
@@ -509,7 +509,7 @@ hashendscan(IndexScanDesc scan)
 
 	/* Before leaving current page, deal with any killed items */
 	if (so->numKilled > 0)
-		_hash_kill_items(scan);
+		_hash_kill_items(scan, false);
 
 	_hash_dropscanbuf(rel, so);
 
diff --git a/src/backend/access/hash/hashsearch.c b/src/backend/access/hash/hashsearch.c
index 2d92049..414cc6a 100644
--- a/src/backend/access/hash/hashsearch.c
+++ b/src/backend/access/hash/hashsearch.c
@@ -467,7 +467,7 @@ _hash_step(IndexScanDesc scan, Buffer *bufP, ScanDirection dir)
 
 	/* Before leaving current page, deal with any killed items */
 	if (so->numKilled > 0)
-		_hash_kill_items(scan);
+		_hash_kill_items(scan, true);
 
 	/*
 	 * ran off the end of this page, try the next
@@ -524,7 +524,7 @@ _hash_step(IndexScanDesc scan, Buffer *bufP, ScanDirection dir)
 
 	/* Before leaving current page, deal with any killed items */
 	if (so->numKilled > 0)
-		_hash_kill_items(scan);
+		_hash_kill_items(scan

Re: [HACKERS] logical decoding of two-phase transactions

2017-03-27 Thread Stas Kelvich

> On 27 Mar 2017, at 12:26, Craig Ringer  wrote:
> 
> On 27 March 2017 at 09:31, Craig Ringer  wrote:
> 
>> We're in the last week of the CF. If you have a patch that's nearly
>> ready or getting there, now would be a good time to post it for help
>> and input from others.
>> 
>> I would really like to get this in, but we're running out of time.
>> 
>> Even if you just post your snapshot management work, with the cosmetic
>> changes discussed above, that would be a valuable start.
> 
> I'm going to pick up the last patch and:

I heavily underestimated the amount of changes there, but I'm almost
finished and will send an updated patch in several hours.

> * Ensure we only add the GID to xact records for 2pc commits and aborts

And only when wal_level >= logical. Done.
The patch also adds origin info to prepares and aborts.

> * Add separate callbacks for prepare, abort prepared, and commit
> prepared (of xacts already processed during prepare), so we aren't
> overloading the "commit" callback and don't have to create fake empty
> transactions to pass to the commit callback;

Done.

> * Add another callback to determine whether an xact should be
> processed at PREPARE TRANSACTION or COMMIT PREPARED time.

Also done.

> * Rename the snapshot builder faux-commit stuff in the current patch
> so it's clearer what's going on.

Hm. Okay, I'll leave that part to you.

> * Write tests covering DDL, abort-during-decode, etc

I've extended the tests, but it would be good to have some more.

> Some special care is needed for the callback that decides whether to
> process a given xact as 2PC or not. It's called before PREPARE
> TRANSACTION to decide whether to decode any given xact at prepare time
> or wait until it commits. It's called again at COMMIT PREPARED time if
> we crashed after we processed PREPARE TRANSACTION and advanced our
> confirmed_flush_lsn such that we won't re-process the PREPARE
> TRANSACTION again. Our restart_lsn might've advanced past it so we
> never even decode it, so we can't rely on seeing it at all. It has
> access to the xid, gid and invalidations, all of which we have at both
> prepare and commit time, to make its decision from. It must have the
> same result at prepare and commit time for any given xact. We can
> probably use a cache in the reorder buffer to avoid the 2nd call on
> commit prepared if we haven't crashed/reconnected between the two.

Good point. I didn't think about restart_lsn in the case when we are skipping
this particular prepare (filter_prepared() -> true, in my terms). I think that
should work properly as it uses the same code path as before, but I'll look at
it.

> This proposal does not provide a way to safely decode a 2pc xact that
> made catalog changes which may be aborted while being decoded. The
> plugin must lock such an xact so that it can't be aborted while being
> processed, or defer decoding until commit prepared. It can use the
> invalidations for the commit to decide.

I had played with that two-pass catalog scan and it seems to be
working, but after some time I realised that it is not useful for the main
case, where the commit/abort is generated only after the receiver side has
answered the prepare. Also, that two-pass scan is a massive change to
relcache.c and genam.c (FWIW there were no problems with the cache, but
there were some problems with index scans and with handling one-to-many
queries to the catalog, e.g. a table with its fields).

Finally I decided to throw it away and switched to a filter_prepare
callback, passing the txn structure there to allow access to the
has_catalog_changes field.


Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company






Re: [HACKERS] Radix tree for character conversion

2017-03-27 Thread Kyotaro HORIGUCHI
Hmm, things are a bit different.

At Thu, 23 Mar 2017 12:13:07 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI 
 wrote in 
<20170323.121307.241436413.horiguchi.kyot...@lab.ntt.co.jp>
> > Ok, I'll write a small script to generate a set of "conversion
> > dump" and try to write README.sanity_check describing how to use
> > it.
> 
> I found that there's no way to identify the character domain of a
> conversion on the SQL interface. Unconditionally giving from 0 to
> 0x as a bytea string yields a too-bloated result containing
> many bogus lines.  (If \x40 is a character, convert() also
> accepts \x4040, \x404040 and \x40404040.)
> 
> One more annoyance is the fact that mappings and conversion
> procedures are not in one-to-one correspondence. The
> correspondence is hidden in the conversion_procs/*.c files, so we
> would have to extract it from them or provide it as knowledge.
> Neither seems good.
> 
> Finally, it seems that I have no choice but to resurrect
> map_checker. Exactly the same one no longer works, but a
> map_dumper.c with almost the same structure will work.
> 
> If no one objects to adding map_dumper.c and
> gen_mapdumper_header.pl (tentative name, of course), I'll make a
> patch to do that.

The script or executable should be compatible between versions,
but pg_mb_radix_conv is not. On the other hand, the higher-level
API requires server-side stuff.

Finally I made an extension that dumps encoding conversions.

encoding_dumper('SJIS', 'UTF-8') or encoding_dumper(35, 6)

It then returns the following output, consisting of two BYTEAs.

 srccode | dstcode
---------+----------
 \x01    | \x01
 \x02    | \x02
...
 \xfc4a  | \xe9b899
 \xfc4b  | \xe9bb91
(7914 rows)

This returns in a very short time, but not when srccode
extends to 4 bytes. As an extreme example, the following

> =# select * from encoding_dumper('UTF-8', 'LATIN1');

takes over 2 minutes to return only 255 rows. We cannot determine
the exact domain without looking into the map data, so the function
cannot do anything other than loop through all the four-byte values.
Providing a function that gives the domain for a conversion was a
mess, especially for arithmetic conversions. The following query
took 94 minutes to produce 25M lines/125MB.  In short, that's
crap. (the first attachment)

SELECT x.conname, y.srccode, y.dstcode
FROM
 (
SELECT conname, conforencoding, contoencoding
FROM pg_conversion c
WHERE pg_char_to_encoding('UTF-8') IN (c.conforencoding, c.contoencoding)
  AND pg_char_to_encoding('SQL_ASCII')
  NOT IN (c.conforencoding, c.contoencoding)) as x,
 LATERAL (
   SELECT srccode, dstcode
   FROM  encoding_dumper(x.conforencoding, x.contoencoding)) as y
ORDER BY x.conforencoding, x.contoencoding, y.srccode;


As another way, I added a means to generate plain mapping
lists corresponding to the .map files (similar to the old maps but
simpler), and this finishes the work within a second.

$ make mapdumps

If we are not going to change the framework of mapped character
conversion shortly, the dumper program may be useful, but I'm not sure
this is reasonable as a sanity check for future modifications.  In
the PoC, pg_mb_radix_tree() is copied into map_checker.c, but this
needs to be a separate file again.  (the second attachment)


regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From 3763d91d99521f0cfc305bd2199fcb5a263d758d Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Mon, 27 Mar 2017 09:46:04 +0900
Subject: [PATCH 1/2] encoding_dumper

---
 contrib/encoding_dumper/Makefile |  17 +++
 contrib/encoding_dumper/encoding_dumper--0.1.sql |  23 
 contrib/encoding_dumper/encoding_dumper.c| 167 +++
 contrib/encoding_dumper/encoding_dumper.control  |   5 +
 4 files changed, 212 insertions(+)
 create mode 100755 contrib/encoding_dumper/Makefile
 create mode 100755 contrib/encoding_dumper/encoding_dumper--0.1.sql
 create mode 100755 contrib/encoding_dumper/encoding_dumper.c
 create mode 100755 contrib/encoding_dumper/encoding_dumper.control

diff --git a/contrib/encoding_dumper/Makefile b/contrib/encoding_dumper/Makefile
new file mode 100755
index 000..54e9cfc
--- /dev/null
+++ b/contrib/encoding_dumper/Makefile
@@ -0,0 +1,17 @@
+# src/backend/util/mb/Unicode/encoding_dumper/Makefile
+
+MODULES = encoding_dumper
+EXTENSION = encoding_dumper
+DATA = encoding_dumper--0.1.sql
+PGFILEDESC = "encoding_dumper - return encoding dump"
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/encoding_dumper
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/encoding_dumper/encoding_dumper--0.1.sql b/contrib/encoding_dumper/encoding_dumper--0.1.sql
new file mode 100755
index 000..4f67946
--- /dev/null
+++ b/contrib/encoding_dumper/encoding_dumper--0.1.sql
@@ -0,0 +1,23 @@
+/* .sql */
+
+-- complain if script is sourced in psql, rather than vi

Re: [HACKERS] pgbench - allow to store select results into variables

2017-03-27 Thread Rafia Sabih
On Fri, Mar 24, 2017 at 8:59 PM, Fabien COELHO  wrote:
>
> Hello Rafia,
>
>> if (my_command->argc > 2)
>> + syntax_error(source, lineno, my_command->line, my_command->argv[0],
>> + "at most on argument expected", NULL, -1);
>>
>> I suppose you mean 'one' argument here.
>
>
> Indeed.
>
>> Apart from that indentation is not correct as per pgindent, please check.
>
>
> I guess that you are referring to switch/case indentation which my emacs does
> not do as expected.
>
> Please find attached a v8 which hopefully fixes these two issues.
Looks good to me, marking as ready for committer.
-- 
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] example for xmltable with XMLNAMESPACES

2017-03-27 Thread Arjen Nienhuis
It wasn't completely clear for me how to use namespaces in xmltable().
Maybe add this to the documentation. It shows the default namespace
and quoting the namespace name.

WITH xmldata(data) AS (VALUES ('
<example xmlns="http://example.com/myns" xmlns:B="http://example.com/b">
 <item foo="1" B:bar="2"/>
 <item foo="3" B:bar="4"/>
 <item foo="4" B:bar="5"/>
</example>'::xml)
)
SELECT xmltable.*
  FROM XMLTABLE(XMLNAMESPACES('http://example.com/myns' AS x,
                              'http://example.com/b' AS "B"),
                '/x:example/x:item'
                PASSING (SELECT data FROM xmldata)
                COLUMNS foo int PATH '@foo',
                        bar int PATH '@B:bar');
 foo | bar
-----+-----
   1 |   2
   3 |   4
   4 |   5
(3 rows)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [POC] A better way to expand hash indexes.

2017-03-27 Thread Amit Kapila
On Mon, Mar 27, 2017 at 11:21 AM, Amit Kapila  wrote:
> On Sun, Mar 26, 2017 at 11:26 AM, Mithun Cy  
> wrote:
>> Thanks, Amit for the review.
>> On Sat, Mar 25, 2017 at 7:03 PM, Amit Kapila  wrote:
>>>
>>> I think one-dimensional patch has fewer places to touch, so that looks
>>> better to me.  However, I think there is still hard coding and
>>> assumptions in code which we should try to improve.
>>
>> Great!, I will continue with spares 1-dimensional improvement.
>>
>
> @@ -563,18 +563,20 @@ _hash_init_metabuffer(Buffer buf, double
> num_tuples, RegProcedure procid,\
> {
> ..
>   else
> - num_buckets = ((uint32) 1) << _hash_log2((uint32) dnumbuckets);
> + num_buckets = _hash_get_totalbuckets(_hash_spareindex(dnumbuckets));
> ..
> ..
> - metap->hashm_maxbucket = metap->hashm_lowmask = num_buckets - 1;
> - metap->hashm_highmask = (num_buckets << 1) - 1;
> + metap->hashm_maxbucket = num_buckets - 1;
> +
> + /* set hishmask, which should be sufficient to cover num_buckets. */
> + metap->hashm_highmask = (1 << (_hash_log2(num_buckets))) - 1;
> + metap->hashm_lowmask = (metap->hashm_highmask >> 1);
> }
>
> I think we can't change the number of buckets to be created or lowmask
> and highmask calculation here without modifying _h_spoolinit() because
> it sorts the data to be inserted based on hashkey which in turn
> depends on the number of buckets that we are going to create during
> create index operation.  We either need to allow create index
> operation to still always create buckets in power-of-two fashion or we
> need to update _h_spoolinit according to new computation.  One minor
> drawback of using power-of-two scheme for creation of buckets during
> create index is that it can lead to wastage of space and will be
> inconsistent with what the patch does during split operation.
>

Few more comments:

1.
@@ -149,6 +149,84 @@ _hash_log2(uint32 num)
  return i;
 }

+#define SPLITPOINT_PHASES_PER_GRP 4
+#define SPLITPOINT_PHASE_MASK (SPLITPOINT_PHASES_PER_GRP - 1)
+#define Buckets_First_Split_Group 4

The defines should be at the beginning of file.

2.
+/*
+ * At splitpoint group 0 we have 2 ^ (0 + 2) = 4 buckets, then at splitpoint
+ * group 1 we have 2 ^ (1 + 2) = 8 total buckets. As the doubling continues at
+ * splitpoint group "x" we will have 2 ^ (x + 2) total buckets. Total buckets
+ * before x splitpoint group will be 2 ^ (x + 1). At each phase of allocation
+ * within splitpoint group we add 2 ^ (x - 1) buckets, as we have to divide the
+ * task of allocation of 2 ^ (x + 1) buckets among 4 phases.
+ *
+ * Also, splitpoint group of a given bucket can be taken as
+ * (_hash_log2(bucket) - 2). Subtracted by 2 because each group have
+ * 2 ^ (x + 2) buckets.
+ */
..

+#define BUCKET_TO_SPLITPOINT_GRP(num_bucket) \
+ ((num_bucket <= Buckets_First_Split_Group) ? 0 : \
+ (_hash_log2(num_bucket) - 2))

In the above computation, the +2 and -2 still bother me.  I think you
need to do this because you have defined split group zero to have four
buckets; how about not forcing that, and instead defining split point
phases only from the split point which has four or more buckets?
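
To make the arithmetic above concrete, here is a small standalone
illustration (my own sketch, not code from the patch) of how many buckets
each splitpoint group adds and how they would be spread over the four
phases:

#include <stdio.h>
#include <stdint.h>

int
main(void)
{
	/* group 0 is special-cased to 4 buckets, since 2^(x-1) is fractional there */
	for (int x = 1; x <= 5; x++)
	{
		uint32_t	before = UINT32_C(1) << (x + 1);	/* buckets before group x */
		uint32_t	after = UINT32_C(1) << (x + 2);		/* buckets after group x */
		uint32_t	per_phase = UINT32_C(1) << (x - 1);	/* buckets added per phase */

		printf("group %d: %u -> %u buckets, +%u per phase over 4 phases\n",
			   x, before, after, per_phase);
	}
	return 0;
}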

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Adding support for Default partition in partitioning

2017-03-27 Thread Jeevan Ladhe
Hi Rahila,

IIUC, your default_partition_v3.patch is trying to implement an error if a new
partition is added to a table that already has a default partition.

I too tried to run the test and similar to Rushabh, I see the server is
crashing
with the given test.

However, if I reverse the order of creating the partitions, i.e. if I
create the list partition first and later create the default partition,
the crash does not occur.

The reason is that, while defining a new relation, DefineRelation() checks for
overlapping partitions by calling check_new_partition_bound(), which in the
case of list partitions assumes that ndatums should be > 0, but for the
default partition it is 0.

The crash here seems to be coming because the following assertion fails
in function check_new_partition_bound():


Assert(boundinfo &&
  boundinfo->strategy == PARTITION_STRATEGY_LIST &&
  (boundinfo->ndatums > 0 || boundinfo->has_null));


So, I think the error you have added needs to be moved before this
assertion:


@@ -690,6 +715,12 @@ check_new_partition_bound(char *relname, Relation
parent, Node *bound)
boundinfo->strategy == PARTITION_STRATEGY_LIST &&
(boundinfo->ndatums > 0 || boundinfo->has_null));

+ if (boundinfo->has_def)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+ errmsg("parent table \"%s\" has a default partition",
+ RelationGetRelationName(parent))));


If I do so, the server does not crash, and instead throws an error:

postgres=# CREATE TABLE list_partitioned (
a int
) PARTITION BY LIST (a);
CREATE TABLE
postgres=# CREATE TABLE part_default PARTITION OF list_partitioned FOR
VALUES IN (DEFAULT);
CREATE TABLE
postgres=# CREATE TABLE part_1 PARTITION OF list_partitioned FOR VALUES IN
(4,5);
ERROR:  parent table "list_partitioned" has a default partition

Regards,
Jeevan Ladhe

On Mon, Mar 27, 2017 at 12:10 PM, Rushabh Lathia 
wrote:

> I applied the patch and was trying to perform some testing, but its
> ending up with server crash with the test shared by you in your starting
> mail:
>
> postgres=# CREATE TABLE list_partitioned (
> postgres(# a int
> postgres(# ) PARTITION BY LIST (a);
> CREATE TABLE
> postgres=#
> postgres=# CREATE TABLE part_default PARTITION OF list_partitioned FOR
> VALUES IN (DEFAULT);
> CREATE TABLE
>
> postgres=# CREATE TABLE part_1 PARTITION OF list_partitioned FOR VALUES IN
> (4,5);
> server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> The connection to the server was lost. Attempting reset: Failed.
>
> Apart from this, few more explanation in the patch is needed to explain the
> changes for the DEFAULT partition. Like I am not quite sure what exactly
> the
> latest version of patch supports, like does that support the tuple row
> movement,
> or adding new partition will be allowed having partition table having
> DEFAULT
> partition, which is quite difficult to understand through the code changes.
>
> Another part which is missing in the patch is the test coverage, adding
> proper test coverage, which explain what is supported and what's not.
>
> Thanks,
>
> On Fri, Mar 24, 2017 at 3:25 PM, Rahila Syed 
> wrote:
>
>> Hello Rushabh,
>>
>> Thank you for reviewing.
>> Have addressed all your comments in the attached patch. The attached
>> patch currently throws an
>> error if a new partition is added after default partition.
>>
>> >Rather then adding check for default here, I think this should be
>> handle inside
>> >get_qual_for_list().
>> Have moved the check inside get_qual_for_partbound() as needed to do some
>> operations
>> before calling get_qual_for_list() for default partitions.
>>
>> Thank you,
>> Rahila Syed
>>
>> On Tue, Mar 21, 2017 at 11:36 AM, Rushabh Lathia <
>> rushabh.lat...@gmail.com> wrote:
>>
>>> I picked this for review and noticed that patch is not getting
>>> cleanly complied on my environment.
>>>
>>> partition.c: In function ‘RelationBuildPartitionDesc’:
>>> partition.c:269:6: error: ISO C90 forbids mixed declarations and code
>>> [-Werror=declaration-after-statement]
>>>   Const*val = lfirst(c);
>>>   ^
>>> partition.c: In function ‘generate_partition_qual’:
>>> partition.c:1590:2: error: ISO C90 forbids mixed declarations and code
>>> [-Werror=declaration-after-statement]
>>>   PartitionBoundSpec *spec = (PartitionBoundSpec *)bound;
>>>   ^
>>> partition.c: In function ‘get_partition_for_tuple’:
>>> partition.c:1820:5: error: array subscript has type ‘char’
>>> [-Werror=char-subscripts]
>>>  result = parent->indexes[partdesc->boundinfo->def_index];
>>>  ^
>>> partition.c:1824:16: error: assignment makes pointer from integer
>>> without a cast [-Werror]
>>>  *failed_at = RelationGetRelid(parent->reldesc);
>>> ^
>>> cc1: all warnings being treated as errors
>>>
>>> Apart from this, I was reading patch here are few more comments:
>>>
>>> 1) Variable initializing happening at two place.
>>>
>>> @@ -166,6 +170,8 

Re: [HACKERS] Monitoring roles patch

2017-03-27 Thread Simon Riggs
On 23 March 2017 at 13:16, Dave Page  wrote:

> Thanks - updated patch attached.

No problems with the patch so far.

I'd like some tests though...

-- 
Simon Riggshttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-27 Thread Amit Kapila
On Sat, Mar 25, 2017 at 1:24 PM, Amit Kapila  wrote:
> On Fri, Mar 24, 2017 at 11:49 PM, Pavan Deolasee
>  wrote:
>>
>> On Fri, Mar 24, 2017 at 6:46 PM, Amit Kapila 
>> wrote:
>>>
>
>> While looking at this problem, it occurred to me that the assumptions made
>> for hash indexes are also wrong :-( Hash index has the same problem as
>> expression indexes have. A change in heap value may not necessarily cause a
>> change in the hash key. If we don't detect that, we will end up having two
>> identical hash keys with the same TID pointer. This will cause the
>> duplicate key scans problem since hashrecheck will return true for both the
>> hash entries.

Isn't it possible to detect duplicate keys in hashrecheck if we
compare both hashkey and tid stored in index tuple with the
corresponding values from heap tuple?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Index usage for elem-contained-by-const-range clauses

2017-03-27 Thread Alexander Korotkov
On Sat, Mar 18, 2017 at 12:41 AM, Pritam Baral 
wrote:

> On Friday 10 March 2017 07:59 PM, Alexander Korotkov wrote:
>
>> Hi, Pritam!
>>
>> I've assigned to review this patch.
>>
>> On Thu, Feb 23, 2017 at 2:17 AM, Pritam Baral  wrote:
>>> The topic has been previously discussed[0] on the -performance mailing
>>> list, about four years ago.
>>>
>>> In that thread, Tom suggested[0] the planner could be made to "expand
>>> "intcol <@ 'x,y'::int4range" into "intcol between x and y", using
>>> something similar to the index LIKE optimization (ie, the "special
>>> operator" stuff in indxpath.c)".
>>
>> That's cool idea.  But I would say more.  Sometimes it's useful to
>> transform "intcol between x and y" into "intcol <@ 'x,y'::int4range".
>> btree_gin supports "intcol between x and y" as overlap of "intcol >= x"
>> and "intcol <= y".  That is very inefficient.  But if this clause would
>> be transformed into "intcol <@ 'x,y'::int4range", btree_gin could handle
>> this very efficiently.
>>
>>> This patch tries to do exactly that. It's not tied to any specific
>>> datatype, and has been tested with both builtin types and custom range
>>> types. Most of the checking for proper datatypes, operators, and btree
>>> index happens before this code, so I haven't run into any issues yet in
>>> my testing. But I'm not familiar enough with the internals to be able to
>>> confidently say it can handle all cases just yet.
>>
>> I've tried this patch.  It applies cleanly, but doesn't compile.
>>
>> indxpath.c:4252:1: error: conflicting types for 'range_elem_contained_quals'
>> range_elem_contained_quals(Node *leftop, Datum rightop)
>> ^
>> indxpath.c:192:14: note: previous declaration is here
>> static List *range_elem_contained_quals(Node *leftop, Oid expr_op, Oid opfamily,
>>              ^
>> Could you please recheck that you published right version of patch?
>
> So sorry. I'm attaching the correct version of the original with this,
> in case you want to test the limited implementation, because I still
> have to go through Tom's list of suggestions.
>
> BTW, the patch is for applying on top of REL9_6_2, and while I
> suspect it may work on master too, I haven't tested it since the
> original submission (Feb 23).


What is the idea behind basing the patch on REL9_6_2?
This patch implements new functionality, and it's definitely not going to be
considered for commit to stable release branches.
If you are interested in getting this patch into master, please rebase it
on the master branch.  If not, please clarify the purpose of this submission.

Also, please include some numbering to the patch name, so that we could
distinguish one version of patch from another.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] crashes due to setting max_parallel_workers=0

2017-03-27 Thread Rushabh Lathia
On Mon, Mar 27, 2017 at 10:59 AM, Rushabh Lathia 
wrote:

>
>
> On Mon, Mar 27, 2017 at 3:43 AM, Tomas Vondra <
> tomas.von...@2ndquadrant.com> wrote:
>
>> On 03/25/2017 05:18 PM, Rushabh Lathia wrote:
>>
>>>
>>>
>>> On Sat, Mar 25, 2017 at 7:01 PM, Peter Eisentraut wrote:
>>>
>>> On 3/25/17 09:01, David Rowley wrote:
>>> > On 25 March 2017 at 23:09, Rushabh Lathia <
>>> rushabh.lat...@gmail.com > wrote:
>>> >> Also another point which I think we should fix is, when someone
>>> set
>>> >> max_parallel_workers = 0, we should also set the
>>> >> max_parallel_workers_per_gather
>>> >> to zero. So that way it we can avoid generating the gather path
>>> with
>>> >> max_parallel_worker = 0.
>>> > I see that it was actually quite useful that it works the way it
>>> does.
>>> > If it had worked the same as max_parallel_workers_per_gather, then
>>> > likely Tomas would never have found this bug.
>>>
>>> Another problem is that the GUC system doesn't really support cases
>>> where the validity of one setting depends on the current value of
>>> another setting.  So each individual setting needs to be robust
>>> against
>>> cases of related settings being nonsensical.
>>>
>>>
>>> Okay.
>>>
>>> About the original issue reported by Tomas, I did more debugging and
>>> found that - problem was gather_merge_clear_slots() was not returning
>>> the clear slot when nreader is zero (means nworkers_launched = 0).
>>> Due to the same scan was continue even all the tuple are exhausted,
>>> and then end up with server crash at gather_merge_getnext(). In the patch
>>> I also added the Assert into gather_merge_getnext(), about the index
>>> should be less then the nreaders + 1 (leader).
>>>
>>> PFA simple patch to fix the problem.
>>>
>>>
>> I think there are two issues at play, here - the first one is that we
>> still produce parallel plans even with max_parallel_workers=0, and the
>> second one is the crash in GatherMerge when nworkers=0.
>>
>> Your patch fixes the latter (thanks for looking into it), which is
>> obviously a good thing - getting 0 workers on a busy system is quite
>> possible, because all the parallel workers can be already chewing on some
>> other query.
>>
>>
> Thanks.
>
>

I was doing more testing with the patch and I found one more server
crash with the patch around same area, when we forced the gather
merge for the scan having zero rows.

create table dept ( deptno numeric, dname varchar(20));
set parallel_tuple_cost =0;
set parallel_setup_cost =0;
set min_parallel_table_scan_size =0;
set min_parallel_index_scan_size =0;
set force_parallel_mode=regress;
explain analyze select * from dept order by deptno;

This is because we don't initialize the leader's slot in gm_slots. So in
the case where the number of launched workers is zero and the table has
zero rows, we end up with a NULL slot in the gm_slots array.

Currently gather_merge_clear_slots() clears out the tuple table slot for
each gather merge input and returns a cleared slot. In the patch I modified
gather_merge_clear_slots() to just clear out the tuple table slots and
always return NULL when all the queues and the heap are exhausted.
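
For anyone following along, here is a minimal sketch of the shape of that
change (my illustration only, not the attached patch; it assumes the
GatherMergeState fields gm_slots[] and nreaders mentioned above and omits
the read-buffer cleanup the real function also has to do):

static TupleTableSlot *
gather_merge_clear_slots(GatherMergeState *gm_state)
{
	int			i;

	/* clear the tuple table slot used for each worker's queue */
	for (i = 0; i < gm_state->nreaders; i++)
		ExecClearTuple(gm_state->gm_slots[i]);

	/* all the queues and the heap are exhausted, so there is no slot to return */
	return NULL;
}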



> But it seems a bit futile to produce the parallel plan in the first place,
>> because with max_parallel_workers=0 we can't possibly get any parallel
>> workers ever. I wonder why compute_parallel_worker() only looks at
>> max_parallel_workers_per_gather, i.e. why shouldn't it do:
>>
>>parallel_workers = Min(parallel_workers, max_parallel_workers);
>>
>>
> I agree with you here. Producing the parallel plan when
> max_parallel_workers = 0 is wrong. But rather then your suggested fix, I
> think that we should do something like:
>
> /*
>  * In no case use more than max_parallel_workers_per_gather or
>  * max_parallel_workers.
>  */
> parallel_workers = Min(parallel_workers, Min(max_parallel_workers,
> max_parallel_workers_per_gather));
>
>
>
>> Perhaps this was discussed and is actually intentional, though.
>>
>>
> Yes, I am not quite sure about this.
>
> Regarding handling this at the GUC level - I agree with Peter that that's
>> not a good idea. I suppose we could deal with checking the values in the
>> GUC check/assign hooks, but what we don't have is a way to undo the changes
>> in all the GUCs. That is, if I do
>>
>>SET max_parallel_workers = 0;
>>SET max_parallel_workers = 16;
>>
>> I expect to end up with just max_parallel_workers GUC changed and nothing
>> else.
>>
>> regards
>>
>> --
>> Tomas Vondra  http://www.2ndQuadrant.com
>> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>>
>
>
>
> --
> Rushabh Lathia
>



Regards,
Rushabh Lathia
www.EnterpriseDB.com
diff --git a/src/backend/executor/nodeGatherMerge.c b/src/backend/executor/nodeGatherMerge.c
index 3f0c3ee..62c399e 100644
--- a/src/backend/executor/nodeGatherMerge.c
+++ b/

[HACKERS] On How To Shorten The Steep Learning Curve Towards PG Hacking

2017-03-27 Thread Kang Yuzhe
Dear PG Hacker/Experts,


I am newbie to PG Hacking.
I have been reading the PG code base to find my space in it but without
success.

There are hundreds of Hands-on with PG Application development on the web.
Alas, there is almost none in PG hacking.

I have found PG source code reading and hacking to be one of the most
frustrating experiences in my life.  I believe that PG hacking should not
be a painful journey but an enjoyable one!

It is my strong belief that out of my PG hacking frustrations, there may
come insights for the PG experts on ways how to devise hands-on with PG
internals so that new comers will be great coders as quickly as possible.

I also believe that we should spend our time reading great Papers and Books
related to Data Management problems BUT not PG code base.

Here are my suggestions for the experts to devise ways to shorten the steep
learning curve towards PG Hacking.

1. Prepare Hands-on with PG internals

 For example, a complete Hands-on  with SELECT/INSERT SQL Standard PG
internals. The point is the experts can pick one fairly complex feature and
walk it from Parser to Executor in a hands-on manner explaining step by
step every technical detail.

2. Write a book on PG Internals.

There is one book on PG internals. Unfortunately, it's in Chinese.
Why not in English??
It is my strong belief that if there were a great book on PG Internals
with hands-on with some of the basic features of PG internals machinery, PG
hacking would be almost as easy as PG application development.

If the experts make the newbie understand the PG code base as quickly as
possible, that means more reviewers, more contributors and more users of PG
which in turn means more PG usability, more PG popularity, stronger PG
community.

These are my personal feelings, and I am ready to be corrected and advised
on the right way towards PG hacking.

Regards,
Zeray








Re: [HACKERS] crashes due to setting max_parallel_workers=0

2017-03-27 Thread Robert Haas
On Mon, Mar 27, 2017 at 1:29 AM, Rushabh Lathia
 wrote:
>
>> But it seems a bit futile to produce the parallel plan in the first place,
>> because with max_parallel_workers=0 we can't possibly get any parallel
>> workers ever. I wonder why compute_parallel_worker() only looks at
>> max_parallel_workers_per_gather, i.e. why shouldn't it do:
>>
>>parallel_workers = Min(parallel_workers, max_parallel_workers);
>>
>> Perhaps this was discussed and is actually intentional, though.
>>
>
> Yes, I am not quite sure about this.

It was intentional.  See the last paragraph of

https://www.postgresql.org/message-id/ca%2btgmoamsn6a1780vutfsarcu0lcr%3dco2yi4vluo-jqbn4y...@mail.gmail.com

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: proposal - psql: possibility to specify sort for describe commands, when size is printed

2017-03-27 Thread Alexander Korotkov
On Fri, Mar 10, 2017 at 6:06 PM, Pavel Stehule 
wrote:

> 2017-03-10 16:00 GMT+01:00 Alexander Korotkov :
>
>> On Fri, Mar 10, 2017 at 5:16 PM, Stephen Frost 
>> wrote:
>>
>>> * Peter Eisentraut (peter.eisentr...@2ndquadrant.com) wrote:
>>> > On 2/24/17 16:32, Pavel Stehule wrote:
>>> > > set EXTENDED_DESCRIBE_SORT size_desc
>>> > > \dt+
>>> > > \l+
>>> > > \di+
>>> > >
>>> > > Possible variants: schema_table, table_schema, size_desc,
>>> size_asc
>>> >
>>> > I can see this being useful, but I think it needs to be organized a
>>> > little better.
>>> >
>>> > Sort key and sort direction should be separate settings.
>>> >
>>> > I'm not sure why we need to have separate settings to sort by schema
>>> > name and table name.  But if we do, then we should support that for all
>>> > object types.  I think maybe that's something we shouldn't get into
>>> > right now.
>>> >
>>> > So I would have one setting for sort key = {name|size} and on for sort
>>> > direction = {asc|desc}.
>>>
>>> Perhaps I'm trying to be overly cute here, but why not let the user
>>> simply provide a bit of SQL to be put at the end of the query?
>>>
>>> That is, something like:
>>>
>>> \pset EXTENDED_DESCRIBE_ORDER_LIMIT 'ORDER BY 5 DESC LIMIT 10'
>>>
>>
>> I think that's the question of usability.  After all, one can manually
>> type corresponding SQL instead of \d* commands.  However, it's quite
>> cumbersome to do this every time.
>> I found quite useful to being able to switch between different sortings
>> quickly.  For instance, after seeing tables sorted by name, user can
>> require them sorted by size descending, then sorted by size ascending,
>> etc...
>> Therefore, I find user-defined SQL clause to be cumbersome.  Even psql
>> variable itself seems to be cumbersome for me.
>> I would propose to add sorting as second optional argument to \d*
>> commands.  Any thoughts?
>>
>
> This proposal was here already - maybe two years ago. The psql command
> parser doesn't allow any complex syntax - more - the more parameters in one
> psql commands is hard to remember, hard to read.
>

Could you please provide a link to this discussion?  Probably working with
multiple parameters in psql commands requires some rework, but that's
definitely doable.


> With my proposal, and patch I would to cover following use case. It is
> real case. Anytime when we used \dt+ in psql we needed sort by size desc.
> When we should to see a size, then the top is interesting. This case is not
> absolute, but very often, so I would to create some simple way, how to do
> some parametrization (really simple).
>

We could combine both approaches: add parameters to psql commands and add
psql DEFAULT_(SORT_COLUMNS|DIRECTION|LIMIT) parameters.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


[HACKERS] On How To Shorten the Steep Learning Curve Towards PG Hacking...

2017-03-27 Thread Kang Yuzhe
Dear PG Hackers/Experts,

I am newbie to PG Hacking.
I have been reading the PG code base to find my space in it but without
success.

There are hundreds of Hands-on with PG Application development on the web.
Alas, there is almost none in PG hacking.

I have found PG source code reading and hacking to be one of the most
frustrating experiences in my life.  I believe that PG hacking should not
be a painful journey but an enjoyable one!

It is my strong belief that out of my PG hacking frustrations, there may
come insights for the PG experts on ways how to devise hands-on with PG
internals so that new comers will be great coders as quickly as possible.

I also believe that we should spend our time reading great Papers and Books
related to Data Management problems BUT not PG code base.

Here are my suggestions for the experts to devise ways to shorten the steep
learning curve towards PG Hacking.

1. Prepare Hands-on with PG internals

 For example, a complete Hands-on  with SELECT/INSERT SQL Standard PG
internals. The point is the experts can pick one fairly complex feature and
walk it from Parser to Executor in a hands-on manner explaining step by
step every technical detail.

2. Write a book on PG Internals.

There is one book on PG internals. Unfortunately, it's in Chinese.
Why not in English??
It is my strong belief that if there were a great book on PG Internals
with hands-on with some of the basic features of PG internals machinery, PG
hacking would be almost as easy as PG application development.

If the experts make the newbie understand the PG code base as quickly as
possible, that means more reviewers, more contributors and more users of PG
which in turn means more PG usability, more PG popularity, stronger PG
community.

These are my personal feelings, and I am ready to be corrected and advised
on the right way towards PG hacking.

Regards,
Zeray


Re: [HACKERS] [COMMITTERS] pgsql: Improve access to parallel query from procedural languages.

2017-03-27 Thread Robert Haas
On Mon, Mar 27, 2017 at 1:48 AM, Rafia Sabih
 wrote:
> This is caused because trigger related functions are marked safe and
> using global variables, hence when executed in parallel are giving
> incorrect  output. Attached patch fixes this. I have modified only
> those functions that are showing incorrect behaviour in the regress
> output and checked regression test with force_parallel_mode = regress
> and all testcases are passing now.

If it's just that they are relying on unsynchronized global variables,
then it's sufficient to mark them parallel-restricted ('r').  Do we
really need to go all the way to parallel-unsafe ('u')?

> This concerns me that should we be checking all the system defined
> functions once again if they are actually parallel safe...?

It's certainly possible that we've made mistakes in other places, too.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel bitmap heap scan

2017-03-27 Thread Robert Haas
On Mon, Mar 27, 2017 at 5:02 AM, Dilip Kumar  wrote:
> On Mon, Mar 27, 2017 at 12:53 PM, Rafia Sabih
>  wrote:
>> Recently, on testing TPC-H 300 scale factor I stumbled on to a error
>> for Q6, the test environment is as follows,
>> work_mem = 1GB,
>> shared_buffers = 10 GB,
>> Effective_cache_size = 10GB
>> random_page_cost = seq+page_cost =0.1
>>
>> The error message is, ERROR:  invalid DSA memory alloc request size 
>> 1610612740
>> In case required, stack trace is as follows,
>
> Thanks for reporting.  In pagetable_allocate, DSA_ALLOC_HUGE flag were
> missing while allocating the memory from the DSA.  I have also handled
> the NULL pointer return from dsa_get_address.
>
> Please verify with the attached patch.

Failing to pass DSA_ALLOC_HUGE is an obvious oversight, but I think
the ptbase == NULL check shouldn't be needed, because we're not
passing DSA_ALLOC_NO_OOM.  And that's good, because this is going to
be called from SH_CREATE, which isn't prepared for a NULL return
anyway.

Am I all wet?
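
For context, here is a rough sketch of what passing the flag in the
allocator callback could look like (my illustration based on the stack
trace above, not the posted patch; the local non-DSA branch and error
handling are omitted, and the surrounding code in tidbitmap.c may differ):

static inline void *
pagetable_allocate(pagetable_hash *pagetable, Size size)
{
	TIDBitmap  *tbm = (TIDBitmap *) pagetable->private_data;
	PTEntryArray *ptbase;

	/* allow requests beyond the 1GB palloc limit for the shared page table */
	tbm->dsapagetable = dsa_allocate_extended(tbm->dsa,
											  sizeof(PTEntryArray) + size,
											  DSA_ALLOC_HUGE | DSA_ALLOC_ZERO);
	ptbase = dsa_get_address(tbm->dsa, tbm->dsapagetable);
	return ptbase->ptentry;
}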

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] perlcritic

2017-03-27 Thread Peter Eisentraut
On 3/1/17 11:21, Dagfinn Ilmari Mannsåker wrote:
> diff --git a/src/pl/plperl/plc_perlboot.pl b/src/pl/plperl/plc_perlboot.pl
> index 292c9101c9..b4212f5ab2 100644
> --- a/src/pl/plperl/plc_perlboot.pl
> +++ b/src/pl/plperl/plc_perlboot.pl
> @@ -81,18 +81,15 @@ sub ::encode_array_constructor
>   } sort keys %$imports;
>   $BEGIN &&= "BEGIN { $BEGIN }";
>  
> - return qq[ package main; sub { $BEGIN $prolog $src } ];
> + # default no strict and no warnings
> + return qq[ package main; sub { no strict; no warnings; $BEGIN 
> $prolog $src } ];
>   }
>  
>   sub mkfunc
>   {
> - ## no critic (ProhibitNoStrict, ProhibitStringyEval);
> - no strict;  # default to no strict for the eval
> - no warnings;# default to no warnings for the eval
> - my $ret = eval(mkfuncsrc(@_));
> + my $ret = eval(mkfuncsrc(@_)); ## no critic 
> (ProhibitStringyEval);
>   $@ =~ s/\(eval \d+\) //g if $@;
>   return $ret;
> - ## use critic
>   }
>  
>   1;

I have no idea what this code does or how to test it, so I didn't touch it.

> diff --git a/src/tools/msvc/gendef.pl b/src/tools/msvc/gendef.pl
> index 64227c2dce..e2653f11d8 100644
> --- a/src/tools/msvc/gendef.pl
> +++ b/src/tools/msvc/gendef.pl
> @@ -174,7 +174,7 @@ sub usage
>  
>  my %def = ();
>  
> -while (<$ARGV[0]/*.obj>)  ## no critic (RequireGlobFunction);
> +while (glob($ARGV[0]/*.obj))
>  {
>   my $objfile = $_;
>   my $symfile = $objfile;

I think what this code is meant to do might be better written as a
foreach loop.  Again, can't test it.

> diff --git a/src/tools/pgindent/pgindent b/src/tools/pgindent/pgindent
> index a6b24b5348..51d6a28953 100755
> --- a/src/tools/pgindent/pgindent
> +++ b/src/tools/pgindent/pgindent
> @@ -159,8 +159,7 @@ sub process_exclude
>   while (my $line = <$eh>)
>   {
>   chomp $line;
> - my $rgx;
> - eval " \$rgx = qr!$line!;";  ## no critic 
> (ProhibitStringyEval);
> + my $rgx = eval { qr!$line! };
>   @files = grep { $_ !~ /$rgx/ } @files if $rgx;
>   }
>   close($eh);

After further thinking, I changed this to just

my $rgx = qr!$line!;

which works just fine.

> @@ -435,7 +434,8 @@ sub diff
>  
>  sub run_build
>  {
> - eval "use LWP::Simple;";  ## no critic (ProhibitStringyEval);
> + require LWP::Simple;
> + LWP::Simple->import(qw(getstore is_success));
>  
>   my $code_base = shift || '.';
>   my $save_dir = getcwd();

I think this is meant to not fail compilation if you don't have that
module, so I left it as is.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] perlcritic

2017-03-27 Thread Peter Eisentraut
On 3/23/17 11:58, Daniel Gustafsson wrote:
> Given the nitpick nature of the comments, bumping status to ready for
> committer.

Committed, with your changes.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] On How To Shorten the Steep Learning Curve Towards PG Hacking...

2017-03-27 Thread Michael Paquier
On Mon, Mar 27, 2017 at 9:00 PM, Kang Yuzhe  wrote:
> 1. Prepare Hands-on with PG internals
>
>  For example, a complete Hands-on  with SELECT/INSERT SQL Standard PG
> internals. The point is the experts can pick one fairly complex feature and
> walk it from Parser to Executor in a hands-on manner explaining step by step
> every technical detail.

There are resources on the net, in English as well. Take for example
this manual explaining the internals of Postgres by Hironobu Suzuki:
http://www.interdb.jp/pg/
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch proposal

2017-03-27 Thread David Steele
On 3/26/17 7:34 PM, Venkata B Nagothi wrote:
> Hi David, 
> 
> On Thu, Mar 23, 2017 at 4:21 AM, David Steele wrote:
> 
> On 3/21/17 8:45 PM, Venkata B Nagothi wrote:
> 
> On Tue, Mar 21, 2017 at 8:46 AM, David Steele wrote:
> Unfortunately, I don't think the first patch
> (recoveryStartPoint)
> will work as currently implemented.  The problem I see is
> that the
> new function recoveryStartsHere() depends on pg_control
> containing a
> checkpoint right at the end of the backup.  There's no guarantee
> that this is true, even if pg_control is copied last.  That
> means a
> time, lsn, or xid that occurs somewhere in the middle of the
> backup
> can be selected without complaint from this code depending
> on timing.
> 
> 
> Yes, that is true.  The latest best position, checkpoint
> position, xid
> and timestamp of the restored backup of the backup is shown up
> in the
> pg_controldata, which means, that is the position from which the
> recovery would start.
> 
> 
> Backup recovery starts from the checkpoint in the backup_label, not
> from the checkpoint in pg_control.  The original checkpoint that
> started the backup is generally overwritten in pg_control by the end
> of the backup.
> 
> 
> Yes, I totally agree. My initial intention was to compare the recovery
> target position(s) with the contents in the backup_label, but, then, the
> checks would fail if the recovery is performed without the backup_label
> file. Then, I decided to compare the recovery target positions with the
> contents in the pg_control file.
> 
> 
> Which in-turn means, WALs start getting replayed
> from that position towards --> minimum recovery position (which
> is the
> end backup, which again means applying WALs generated between
> start and
> the end backup) all the way through to  --> recovery target
> position.
> 
> 
> minRecoveryPoint is only used when recovering a backup that was made
> from a standby.  For backups taken on the master, the backup end WAL
> record is used.
> 
> The best start position to check with would the position shown
> up in the
> pg_control file, which is way much better compared to the current
> postgresql behaviour.
> 
> 
> I don't agree, for the reasons given previously.
> 
> 
> As explained above, my intention was to ensure that the recovery start
> positions checks are successfully performed irrespective of the presence
> of the backup_label file.
> 
> I did some analysis before deciding to use pg_controldata's output
> instead of backup_label file contents.
> 
> Comparing the output of the pg_controldata with the contents of
> backup_label contents.
> 
> *Recovery Target LSN*
> 
> START WAL LOCATION (which is 0/9C28) in the backup_label =
> pg_controldata's latest checkpoint's REDO location (Latest
> checkpoint's REDO location:  0/9C28)
> 
> *Recovery Target TIME*
> 
> backup start time in the backup_label (START TIME: 2017-03-26
> 11:55:46 AEDT) = pg_controldata's latest checkpoint time (Time of
> latest checkpoint :  Sun 26 Mar 2017 11:55:46 AM AEDT)
> 
> *Recovery Target XID*
> 
> To begin with, backup_label does not contain any start XID. So, the only
> option is to depend on pg_controldata's output. 
> After a few quick tests and thorough observation, i do notice that,
> the pg_control file information is copied as it is to the backup
> location at the pg_start_backup. I performed some quick tests by
> running few transactions between pg_start_backup and pg_stop_backup.
> So, i believe, this is ideal start point for WAL replay.
> 
> Am i missing anything here ?

You are making assumptions about the contents of pg_control vs.
backup_label based on trivial tests.  With PG defaults, the backup must
run about five minutes before the values in pg_control and backup_label
will diverge.  Even if pg_control and backup_label do match, those are
the wrong values to use, and will get more incorrect the longer the
backup runs.

I believe there is a correct way to go about this, at least for time and
LSN, and I don't think your very approximate solution will pass muster
with a committer.

Since we are nearly at the end of the CF, I have marked this submission
"Returned with Feedback".

-- 
-David
da...@pgmasters.net


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Monitoring roles patch

2017-03-27 Thread Dave Page
On Mon, Mar 27, 2017 at 3:51 AM, Simon Riggs  wrote:
> On 25 March 2017 at 16:30, Dave Page  wrote:
>
>> I believe this and other reasons we've described are exactly why other DBMS' 
>> do what we're proposing.
>
> It would help review if you could show some links and give a
> commentary on what you think others do, what they get right and what
> they get wrong, so we can be sure we are providing something people
> actually want and/or expect. POLA needed. I don't want to be reading
> various blogs about what those numpties on the Postgres project did in
> v10. Thanks

Most other DBMSs seem to provide either capabilities (or privileges,
whatever they may be called by the vendor) that can be assigned to
roles, or pre-defined roles with capabilities, or some combination of
the two.

SQL Server provides a number of server and database level roles that
are pre-configured for specific tasks, with set of capabilities. See
https://msdn.microsoft.com/en-us/library/ms189612.aspx for example.

DB2 appears to provide capabilities that can be assigned to roles. See
https://www.ibm.com/support/knowledgecenter/SSEPGG_10.5.0/com.ibm.db2.luw.admin.sec.doc/doc/c0050531.html

Oracle has something of a mix or roles and capabilities, eg. the DBA
role and SYSOPER privileges, e.g.
https://docs.oracle.com/cd/B28359_01/server.111/b28310/dba005.htm#ADMIN11040

What is being proposed here is a similar system, but focussing on
pre-defined roles. These make it easy to grant privileges for specific
purposes en-masse, without requiring the user to use them, i.e.
they're free to ignore them if they wish. As they are roles, they also
have the freedom to extend or restrict them in cases where privileges
are acquired through GRANT.

I believe this offers both the greatest flexibility and the most
straightforward and easy to use interface for the end user - the
ability to customise is maximised, whilst the default roles will be
both safe to use and should work out of the box for the majority of
monitoring scenarios.

The most important thing is that we'll be able to stop users having to
grant superuser privileges to their monitoring roles.

-- 
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Shorten PG Hacking Steep Learning Curve....

2017-03-27 Thread Kang Yuzhe
Thanks Michael for your informative reply.

And thanks for be being patient with my some typos/duplication in the
message I sent to pg-hackers mailing list.

Regards,
Zeray


Re: [HACKERS] Monitoring roles patch

2017-03-27 Thread Robert Haas
On Sat, Mar 25, 2017 at 12:30 PM, Dave Page  wrote:
> Right - and if a user doesn't like it, they can easily just not use the 
> default role and create their own alternative instead.

OK, I see the points that both of you are making and I admit that
they've got some validity to them.  Let me make a modest proposal.
Suppose we define the pg_monitor role as strictly a bundle of
privileges that could be granted individually if desired, and
similarly define pg_read_all_settings and pg_read_all_stats as roles
that are strictly recognized by the code, without any grants.  So
instead of this:

GRANT EXECUTE ON FUNCTION pgstatindex(regclass) TO pg_read_all_stats;

We'd instead do this:

GRANT EXECUTE ON FUNCTION pgstatindex(regclass) TO pg_monitor;

(and similarly for all other GRANT statements that appear in the patch)

So, if you want, you can give somebody pg_read_all_stats, enhancing
their access to functions to which they already have access, but deny
them execute access on some of the individual functions.  The
permissions with which pg_monitor ends up are unchanged, but somebody
creating their own role has a bit more freedom to customize what it
can do.

I still don't have much confidence in the theory that every monitoring
solution ever will want the same privileges, but if we unbundle things
as much as we reasonably can, then the worst thing that happens if
pg_monitor turns out to be not what everyone wants is that some people
do it a different way instead.  Which as you say is not that bad.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel bitmap heap scan

2017-03-27 Thread Dilip Kumar
On Mon, Mar 27, 2017 at 5:58 PM, Robert Haas  wrote:
> Failing to pass DSA_ALLOC_HUGE is an obvious oversight, but I think
> the ptbase == NULL check shouldn't be needed, because we're not
> passing DSA_ALLOC_NO_OOM.  And that's good, because this is going to
> be called from SH_CREATE, which isn't prepared for a NULL return
> anyway.
>
> Am I all wet?

Yes, you are right that we are not passing DSA_ALLOC_NO_OOM, so
dsa_allocate_extended will raise an error if the allocation fails.


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Improve access to parallel query from procedural languages.

2017-03-27 Thread Tom Lane
Robert Haas  writes:
> On Mon, Mar 27, 2017 at 1:48 AM, Rafia Sabih
>  wrote:
>> This is caused because trigger related functions are marked safe and
>> using global variables, hence when executed in parallel are giving
>> incorrect  output.

> If it's just that they are relying on unsynchronized global variables,
> then it's sufficient to mark them parallel-restricted ('r').  Do we
> really need to go all the way to parallel-unsafe ('u')?

Color me confused, but under what circumstances would triggers get
executed by a parallel worker at all?  I thought we did not allow
updating queries to be parallelized.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical decoding of two-phase transactions

2017-03-27 Thread Craig Ringer
On 27 March 2017 at 17:53, Stas Kelvich  wrote:

> I’m heavily underestimated amount of changes there, but almost finished
> and will send updated patch in several hours.

Oh, brilliant! Please post whatever you have before you knock off for
the day anyway, even if it's just a WIP, so I can pick it up tomorrow
my time and poke at its tests etc.

I'm in Western Australia +0800 time, significantly ahead of you.

> Done.
[snip]
> Also done.

Great, time is short so that's fantastic.

> I’ve extended test, but it is good to have some more.

I don't mind writing tests and I've done quite a bit with TAP now, so
happy to help there.

>> Some special care is needed for the callback that decides whether to
>> process a given xact as 2PC or not. It's called before PREPARE
>> TRANSACTION to decide whether to decode any given xact at prepare time
>> or wait until it commits. It's called again at COMMIT PREPARED time if
>> we crashed after we processed PREPARE TRANSACTION and advanced our
>> confirmed_flush_lsn such that we won't re-process the PREPARE
>> TRANSACTION again. Our restart_lsn might've advanced past it so we
>> never even decode it, so we can't rely on seeing it at all. It has
>> access to the xid, gid and invalidations, all of which we have at both
>> prepare and commit time, to make its decision from. It must have the
>> same result at prepare and commit time for any given xact. We can
>> probably use a cache in the reorder buffer to avoid the 2nd call on
>> commit prepared if we haven't crashed/reconnected between the two.
>
> Good point. Didn’t think about restart_lsn in case when we are skipping this
> particular prepare (filter_prepared() -> true, in my terms). I think that 
> should
> work properly as it use the same code path as it was before, but I’ll look at 
> it.

I suspect that's going to be fragile in the face of interleaving of
xacts if we crash between prepare and commit prepared. (Apologies if
the below is long or disjointed, it's been a long day but trying to
sort thoughts out).


Consider ("SSU" = "standby status update"):

0/050 xid 1 BEGIN
0/060 xid 1 INSERT ...

0/070 xid 2 BEGIN
0/080 xid 2 INSERT ...

0/090 xid 3 BEGIN
0/095 xid 3 INSERT ...
0/100 xid 3 PREPARE TRANSACTION 'x' => sent to client [y/n]?
SSU: confirmed_flush_lsn = 0/100, restart_lsn 0/050 (if we sent to client)

0/200 xid 2 COMMIT => sent to client
SSU: confirmed_flush_lsn = 0/200, restart_lsn 0/050

0/250 xl_running_xacts logged, xids = [1,3]

[CRASH or disconnect/reconnect]

Restart decoding at 0/050.

skip output of xid 3 PREPARE TRANSACTION @ 0/100: is <= confirmed_flush_lsn
skip output of xid 2 COMMIT @ 0/200: is <= confirmed_flush_lsn

0/300 xid 3 COMMIT PREPARED 'x' => sent to client, as its lsn
is > confirmed_flush_lsn



In the above, our problem is that restart_lsn is held down by some
other xact, so we can't rely on it to tell us if we replayed xid 3 to
the output plugin or not. We can't use confirmed_flush_lsn either,
since it'll advance at xid 2's commit whether or not we replayed xid
3's prepare to the client.

Since xid 3 will still be in xl_running_xacts when prepared, when we
recover SnapBuildProcessChange will return true for its changes and
we'll (re)buffer them, whether or not we landed up sending to the
client at prepare time. Nothing much to be done about that, we'll just
discard them when we process the prepare or the commit prepared,
depending on where we consult our filter callback again.

We MUST ask our filter callback again though, before we test
SnapBuildXactNeedsSkip when processing the PREPARE TRANSACTION again.
Otherwise we'll discard the buffered changes, and if we *didn't* send
them to the client already ... splat.

We can call the filter callback again on xid 3's prepare to find out
"would you have replayed it when we passed it last time". Or we can
call it when we get to the commit instead, to ask "when called last
time at prepare, did you replay or not?" But we have to consult the
callback. By default we'd just skip ReorderBufferCommit processing for
xid 3 entirely, which we'll do via the SnapBuildXactNeedsSkip call in
DecodeCommit when we process the COMMIT PREPARED.

If there was no other running xact when we decoded the PREPARE
TRANSACTION the first time around (i.e. xid 1 and 2 didn't exist in
the above), and if we do send it to the client at prepare time, I
think we can safely advance restart_lsn to the most recent
xl_running_xacts once we get replay confirmation. So we can pretend we
already committed at PREPARE TRANSACTION time for restart purposes if
we output at PREPARE TRANSACTION time, it just doesn't help us with
deciding whether to send the buffer contents at COMMIT PREPARED time
or not.

TL;DR: we can't rely on restart_lsn or confirmed_flush_lsn or
xl_running_xacts, we must ask the filter callback when we (re)decode
the PREPARE TRANSACTION record and/or at COMMIT PREPARED time.

This isn't a big deal. We just have to make sure we consult the filter
callback again.
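
To illustrate that requirement with purely hypothetical code (the
callback's actual signature is part of the patch under discussion): the
decision has to be a deterministic function of information available both
at PREPARE and at COMMIT PREPARED time, such as the GID, so both decode
passes reach the same answer.

#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical filter callback: decode at PREPARE time only transactions
 * whose GID carries an agreed-upon prefix.  Because the answer depends only
 * on the GID, asking again at COMMIT PREPARED time after a crash or
 * reconnect necessarily yields the same decision.
 */
static bool
filter_prepare_by_gid(const char *gid)
{
	return strncmp(gid, "twophase:", 9) == 0;
}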

Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-27 Thread Tom Lane
Andres Freund  writes:
>> I personally find per-function annotation ala
>> __attribute__((optimize("no-crossjumping")))
>> cleaner anyway.  I tested that, and it seems to work.
>> 
>> Obviously we'd have to hide that behind a configure test.  Could also do
>> tests based on __GNUC__ / __GNUC_MINOR__, but that seems uglier.

Agreed.

> Checking for this isn't entirely pretty - see my attached attempt at
> doing so.  I considered hiding
> __attribute__((optimize("no-crossjumping"))) in execInterpExpr.c behind
> a macro (like PG_DISABLE_CROSSJUMPING), but I don't really think that
> makes things better.

I think it would, primarily because if we find out that some other compiler
spells this differently, we could handle it totally within configure.

Isn't our practice to put __attribute__ at the end of a function
declaration or definition, not in the middle someplace?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2017-03-27 Thread Daniel Verite
Vaishnavi Prabakaran wrote:

> Please let me know if you think this is not enough but wanted to update
> section 33.6 also?

Yes, if the right place to call PQsetSingleRowMode() is immediately
after PQbatchQueueProcess(), I think we need to update
"33.6. Retrieving Query Results Row-By-Row"
with that information, otherwise what it says is just wrong
when in batch mode.

Also, in 33.5, I suggest to not use the "currently executing
query" as a synonym for the "query currently being processed"
to avoid any confusion between what the backend is executing
and a prior query whose results are being collected by the client
at the same moment.


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Page Scan Mode in Hash Index

2017-03-27 Thread Ashutosh Sharma
Hi,

> I think you should consider refactoring this so that it doesn't need
> to use goto.  Maybe move the while (offnum <= maxoff) logic into a
> helper function and have it return itemIndex.  If itemIndex == 0, you
> can call it again.

Okay, I have added a helper function for _hash_readpage(). Please check the v4
patch attached to this mail.

> Notice that the code in the "else" branch of the
> if (itemIndex == 0) stuff could actually be removed from the else
> block without changing anything, because the "if" block always either
> returns or does a goto.  That's worthy of a little more work to try to
> make things simple and clear.

Corrected.

>
> + *  We find the first item(or, if backward scan, the last item) in
>
> Missing space.
>

Corrected.

> In _hash_dropscanbuf(), the handling of hashso_bucket_buf is now
> inconsistent with the handling of hashso_split_bucket_buf, which seems
> suspicious.

I have corrected it.

> Suppose we enter this function with so->hashso_bucket_buf
> and so->currPos.buf both being valid buffers, but not the same one.
> It looks to me as if we'll release the first pin but not the second
> one.

Yes, that is because we do not need to release the pin on currPos.buf if it
is an overflow page. In page-scan mode, once we have saved all the
qualifying tuples from the current page (which could be an overflow page)
into the items array, we release both the pin and the lock on that overflow
page. This was not true earlier: consider a case where a cursor is supposed
to fetch only a fixed number of tuples from a page; once that number of
tuples has been fetched, we simply return without releasing the pin on the
page being scanned. If that page is an overflow page, then by the end of
the scan we end up holding pins on two buffers, i.e. the bucket buffer and
the current buffer, which is the overflow buffer.

> My guess (which could be wrong) is that so->hashso_bucket_buf =
> InvalidBuffer should be moved back up higher in the function where it
> was before, just after the first if statement, and that the new
> condition so->hashso_bucket_buf == so->currPos.buf on the last
> _hash_dropbuf() should be removed.  If that's not right, then I think
> I need somebody to explain why not.

Okay, as I mentioned above, in page-scan mode we only keep a pin on the
bucket buffer. There is no case where we would still be holding a pin on an
overflow buffer at the end of the scan. So, basically, _hash_dropscanbuf()
should only attempt to release the pin on the bucket buffer, and an attempt
to release the pin on so->currPos.buf should only happen when
'so->hashso_bucket_buf == so->currPos.buf' or when
'so->hashso_split_bucket_buf == so->currPos.buf'.

>
> I am suspicious that _hash_kill_items() is going to have problems if
> the overflow page is freed before it reacquires the lock.
> _btkillitems() contains safeguards against similar cases.

I have added a similar check to _hash_kill_items() as well.

>
> This is not a full review, but I'm out of time for the moment.

No worries. I will be ready for your further review comments any time.
Thanks for the review.

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com
From 3d0273f503d1645d6289bda78946a0af4b9e9f3a Mon Sep 17 00:00:00 2001
From: ashu 
Date: Mon, 27 Mar 2017 18:22:15 +0530
Subject: [PATCH] Rewrite hash index scans to work a page at a timev4

Patch by Ashutosh Sharma
---
 src/backend/access/hash/README   |   9 +-
 src/backend/access/hash/hash.c   | 124 ++
 src/backend/access/hash/hashpage.c   |  17 +-
 src/backend/access/hash/hashsearch.c | 445 +++
 src/backend/access/hash/hashutil.c   |  42 +++-
 src/include/access/hash.h|  46 +++-
 6 files changed, 515 insertions(+), 168 deletions(-)

diff --git a/src/backend/access/hash/README b/src/backend/access/hash/README
index 1541438..f0a7bdf 100644
--- a/src/backend/access/hash/README
+++ b/src/backend/access/hash/README
@@ -243,10 +243,11 @@ The reader algorithm is:
 -- then, per read request:
 	reacquire content lock on current page
 	step to next page if necessary (no chaining of content locks, but keep
- the pin on the primary bucket throughout the scan; we also maintain
- a pin on the page currently being scanned)
-	get tuple
-	release content lock
+ the pin on the primary bucket throughout the scan)
+save all the matching tuples from current index page into an items array
+release pin and content lock (but if it is primary bucket page retain
+it's pin till the end of scan)
+get tuple from an item array
 -- at scan shutdown:
 	release all pins still held
 
diff --git a/src/backend/access/hash/hash.c b/src/backend/access/hash/hash.c
index 0bacef8..bd2827a 100644
--- a/src/backend/access/hash/hash.c
+++ b/src/backend/access/hash/hash.c
@@ -268,66 +268,23 @@ bool
 hashgettuple(IndexScanDesc scan, ScanDirection dir)
 {
 	HashScanOpaque so = (HashScanOpaque) scan->opaque;
-	Relation	rel = scan->indexRelation;
-	Buffer		buf;
-	Page		page;
 	Offse

Re: [HACKERS] Crash on promotion when recovery.conf is renamed

2017-03-27 Thread Tom Lane
"Tsunakawa, Takayuki"  writes:
> All other places in twophase.c and most places in other files put ereport() 
> and errmsg() on separate lines.  I think it would be better to align with 
> surrounding code.

> + ereport(FATAL, (errmsg("corrupted two-phase file \"%s\"",

Actually, the *real* problem with that coding is it lacks a SQLSTATE
(errcode call).  The only places where it's acceptable to leave that
out are for internal "can't happen" cases, which this surely isn't.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Improve access to parallel query from procedural languages.

2017-03-27 Thread Robert Haas
On Mon, Mar 27, 2017 at 9:25 AM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Mon, Mar 27, 2017 at 1:48 AM, Rafia Sabih
>>  wrote:
>>> This is caused because trigger related functions are marked safe and
>>> using global variables, hence when executed in parallel are giving
>>> incorrect  output.
>
>> If it's just that they are relying on unsynchronized global variables,
>> then it's sufficient to mark them parallel-restricted ('r').  Do we
>> really need to go all the way to parallel-unsafe ('u')?
>
> Color me confused, but under what circumstances would triggers get
> executed by a parallel worker at all?  I thought we did not allow
> updating queries to be parallelized.

Right, we don't.  But if the updating query fires a trigger, and the
trigger runs an SQL statement that is itself safe for parallelism,
*that* statement could be run in parallel.  In almost all cases it
won't make sense because triggers aren't likely to contain SQL
statements that are expensive enough to justify running them in
parallel, but there's no a priori reason to disallow it.

(And, indeed, I think this commit and the subsequent breakage shows
the value of making sure that parallel query is allowed in as many
places as possible.  Running the regression tests with
force_parallel_mode=regress is the best automated tool we have to find
cases where functions are labeled as being more safe than they
actually are, but those tests only try inserting the invisible
single-copy Gather node in cases where parallel query is allowable at
all.)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] perlcritic

2017-03-27 Thread Dagfinn Ilmari Mannsåker
Peter Eisentraut  writes:

> On 3/1/17 11:21, Dagfinn Ilmari Mannsåker wrote:
>> diff --git a/src/pl/plperl/plc_perlboot.pl b/src/pl/plperl/plc_perlboot.pl
>> index 292c9101c9..b4212f5ab2 100644
>> --- a/src/pl/plperl/plc_perlboot.pl
>> +++ b/src/pl/plperl/plc_perlboot.pl
>> @@ -81,18 +81,15 @@ sub ::encode_array_constructor
>>  } sort keys %$imports;
>>  $BEGIN &&= "BEGIN { $BEGIN }";
>>  
>> -return qq[ package main; sub { $BEGIN $prolog $src } ];
>> +# default no strict and no warnings
>> +return qq[ package main; sub { no strict; no warnings; $BEGIN 
>> $prolog $src } ];
>>  }
>>  
>>  sub mkfunc
>>  {
>> -## no critic (ProhibitNoStrict, ProhibitStringyEval);
>> -no strict;  # default to no strict for the eval
>> -no warnings;# default to no warnings for the eval
>> -my $ret = eval(mkfuncsrc(@_));
>> +my $ret = eval(mkfuncsrc(@_)); ## no critic 
>> (ProhibitStringyEval);
>>  $@ =~ s/\(eval \d+\) //g if $@;
>>  return $ret;
>> -## use critic
>>  }
>>  
>>  1;
>
> I have no idea what this code does or how to test it, so I didn't touch it.

This code compiles a string of perl source into a subroutine reference.
It is called by plperl_create_sub() in src/pl/plperl/plperl.c, which
is called whenever a plperl function needs to be compiled, i.e. during
CREATE FUNCTION (unless check_function_bodies is off) and when the
function is executed and the compiled form is not already cached in
plperl_proc_hash.

The change reduces the scope of the stricture and warning disablement to
just the compiled code, instead of the surrounding compiling code too.
Putting them inside the sub block has no runtime overhead, since they're
compile-time directives, not runtime statements.

It can be tested by creating a plperl function with a construct that
would fall foul of warnings or strictures, which
src/pl/plperl/sql/plperl_elog.sql does.

>> diff --git a/src/tools/msvc/gendef.pl b/src/tools/msvc/gendef.pl
>> index 64227c2dce..e2653f11d8 100644
>> --- a/src/tools/msvc/gendef.pl
>> +++ b/src/tools/msvc/gendef.pl
>> @@ -174,7 +174,7 @@ sub usage
>>  
>>  my %def = ();
>>  
>> -while (<$ARGV[0]/*.obj>)  ## no critic (RequireGlobFunction);
>> +while (glob($ARGV[0]/*.obj))
>>  {
>>  my $objfile = $_;
>>  my $symfile = $objfile;
>
> I think what this code is meant to do might be better written as a
> foreach loop.  Again, can't test it.

glob("...") is exactly equivalent to <...> (except when <...> parses as
readline, which is why Perl::Critic complains).

Writing it as 'for my $objfile (glob("$ARGV[0]/*.obj")) { ... }' would
be neater, I agree.

>> diff --git a/src/tools/pgindent/pgindent b/src/tools/pgindent/pgindent
>> index a6b24b5348..51d6a28953 100755
>> --- a/src/tools/pgindent/pgindent
>> +++ b/src/tools/pgindent/pgindent
>> @@ -159,8 +159,7 @@ sub process_exclude
>>  while (my $line = <$eh>)
>>  {
>>  chomp $line;
>> -my $rgx;
>> -eval " \$rgx = qr!$line!;";  ## no critic 
>> (ProhibitStringyEval);
>> +my $rgx = eval { qr!$line! };
>>  @files = grep { $_ !~ /$rgx/ } @files if $rgx;
>>  }
>>  close($eh);
>
> After further thinking, I changed this to just
>
> my $rgx = qr!$line!;
>
> which works just fine.

That changes the behaviour from silently skipping invalid regular
expressions in the exclude file to dying on encountering one.  That
might be desirable, but should be done deliberately.

>> @@ -435,7 +434,8 @@ sub diff
>>  
>>  sub run_build
>>  {
>> -eval "use LWP::Simple;";  ## no critic (ProhibitStringyEval);
>> +require LWP::Simple;
>> +LWP::Simple->import(qw(getstore is_success));
>>  
>>  my $code_base = shift || '.';
>>  my $save_dir = getcwd();
>
> I think this is mean to not fail compilation if you don't have that
> module, so I left it as is.

Yes, it's using string eval to defer the compilation of the "use"
statement to runtime.  The require+import does exactly the same thing,
since they are run-time already, so won't be called until run_build is.

While looking at this again, I realised that the 'do' statement in
src/tools/msvc/install.pl will break on the upcoming perl 5.26, which
doesn't include '.' in @INC (the search path for 'require' and 'do') by
default.

if (-e "src/tools/msvc/buildenv.pl")
{
do "src/tools/msvc/buildenv.pl";
}

Attached is a final patch with the above changes, which I think should
be applied before this can be considered complete.

From 1d388d13d572912df2faa7d1c4004a635f956306 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Dagfinn=20Ilmari=20Manns=C3=A5ker?= 
Date: Wed, 1 Mar 2017 15:32:45 +
Subject: [PATCH] Fix most remaining perlcritic exceptions

The ProhibitStringyEval one is unavoidable when compili

Re: [HACKERS] Refactor handling of database attributes between pg_dump and pg_dumpall

2017-03-27 Thread Andreas Karlsson

Hi,

Here is my review. I agree with the goal of the refactoring, since we want
to make it easier to dump all the properties of the database object.
But I think we need to solve the issues with the special-casing of
postgres and template1, which I personally would find very surprising if
pg_dump -C did it. On the other hand, I think we cannot get away from
having pg_dumpall give them special treatment.


The nitpicking section is for minor code style errors.

= Functional review

I have not done an in depth functional review due to the discussion 
about how postgres and template1 should be handled.


- The patch does not apply cleanly anymore

- I do not like the change in behavior which causes "pg_dump -C 
postgres" to no longer include CREATE DATABASE. Special treatment of 
specific databases based on name makes sense in pg_dumpall, but not in 
pg_dump.


- There are test failures in the pg_dump tests. It seems like some could
be related to the fact that you do not include CREATE DATABASE postgres in
the dumps, but I also get errors like 'ERROR:  syntax error at or near
"fault_tablespace"'.


not ok 691 - createdb: dumps CREATE DATABASE postgres
not ok 3003 - pg_dumpall_dbprivs: dumps CREATE DATABASE dump_test
not ok 11 - restore full dump using environment variables for connection 
parameters

not ok 12 - no dump errors
not ok 13 - restore full dump with command-line options for connection 
parameters

not ok 14 - no dump errors

= Code review

- As a response to "TBD -- is it necessary to get the default encoding": 
I think so, but either way changing this seems unrelated to this patch.


- I know it is taken from the old pg_dumpall code, but the way the
database owner is handled seems wrong. I think we should set it like the
owner for other objects. And more importantly it should respect --no-owner.


- The logic for switching database when setting the default table space 
is broken. You generate "\ connect" rather than "\connect".


- I saw the comment "Note that we do not support initial privileges 
(pg_init_privs) on databases." and wondered: why not? I definitely think 
that we should support this.


= Nitpicking

- You should probably use SGML style  over  and 
 for inline tags.


- In "database-level properties such as Ownership, ACLs, [...]" I do not 
think that "Ownerships" shuld be capitalized.


- There are two extra spaces on the lines below, and a space is missing 
after the closing tag.


 ALTER ROLE IN DATABASE ...  SET commands.

with --create option to dump  ALTER ROLE IN DATABASE ...  SET 



- On the following comment ".." should be "...", since that is the 
correct way to write an ellipsis.


* Frame the ALTER .. SET .. commands and fill it in buf.

- Rename arrayitem to configitem in makeAlterConfigCommand().

- In makeAlterConfigCommand() you should do "*pos++ = '\0';" rather than 
"*pos = 0;" and then remove the later + 1 so our code matches with the 
code in dumpFunc(). Either is correct, but it would be nice if both 
pieces of code looked more similar.


- You removed an empty line in pg_backup_utils.h between global
variables and function declarations which I think should be left there.
It should be directly after g_verbose.


- There is something wrong with the indentation of the query for 
collecting info about databases in dumpDatabase() for PG >= 9.6.


- Missing space before "'' as rdatacl" in dumpDatabase(), and a missing 
space at the end of the string.


- Double space in 'FROM pg_database  "' in dumpDatabase().

Andreas



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] \if, \elseif, \else, \endif (was Re: PSQL commands: \quit_if, \quit_unless)

2017-03-27 Thread Fabien COELHO



>> ISTM that PQExpBuffer is partially a memory leak. Something should need
>> to be freed?

> I copied that pattern from somewhere else, so yeah, I duplicated whatever
> leak was there.

Hmmm. Indeed some commands do not free, but there is a single use and the
command exits afterwards, eg "createuser".

I think that you could use another pattern where you init the
PQExpBufferData structure instead of create it, so that only the string is
malloced.
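
As an illustration, a minimal sketch of that pattern (hypothetical code, not
taken from the patch under review): the PQExpBufferData struct lives on the
stack and only the string it manages is allocated on the heap.

    PQExpBufferData buf;            /* struct itself is on the stack */

    initPQExpBuffer(&buf);          /* only buf.data is malloc'ed */
    appendPQExpBufferStr(&buf, "some text");
    appendPQExpBufferChar(&buf, '!');

    /* ... use buf.data ... */

    termPQExpBuffer(&buf);          /* frees the string, not the struct */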



>> I think that you should use appendPQExpBufferChar and Str instead of
>> relying on the format variant which is probably expensive. Something like:
>>
>>   if (num_options > 0)
>>     append...Char(buf, ' ');
>>   append...Str(buf, ...);

> All flavors of appendPQExpBuffer*() I can find have a const *char format
> string, so no way to append a naked string. If you know differently, I'm
> listening. Not fixed.

These prototypes are from "pqexpbuffer.h", and do not seem to rely on a
format:

 extern void appendPQExpBufferChar(PQExpBuffer str, char ch);
 extern void appendPQExpBufferStr(PQExpBuffer str, const char *data);

--
Fabien


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] crashes due to setting max_parallel_workers=0

2017-03-27 Thread Tom Lane
Robert Haas  writes:
> On Mon, Mar 27, 2017 at 1:29 AM, Rushabh Lathia
>  wrote:
>> But it seems a bit futile to produce the parallel plan in the first place,
>> because with max_parallel_workers=0 we can't possibly get any parallel
>> workers ever. I wonder why compute_parallel_worker() only looks at
>> max_parallel_workers_per_gather, i.e. why shouldn't it do:
>> parallel_workers = Min(parallel_workers, max_parallel_workers);
>> Perhaps this was discussed and is actually intentional, though.

> It was intentional.  See the last paragraph of
> https://www.postgresql.org/message-id/ca%2btgmoamsn6a1780vutfsarcu0lcr%3dco2yi4vluo-jqbn4y...@mail.gmail.com

Since this has now come up twice, I suggest adding a comment there
that explains why we're intentionally ignoring max_parallel_workers.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logical decoding on standby

2017-03-27 Thread Craig Ringer
On 27 March 2017 at 16:20, Simon Riggs  wrote:
> On 27 March 2017 at 09:03, Craig Ringer  wrote:
>
>> I think this one's ready to go.
>
> Looks like something I could commit. Full review by me while offline
> today, aiming to commit tomorrow barring issues raised.

Great.

Meanwhile I'm going to be trying to work with Stas on 2PC logical
decoding, while firming up the next patches in this series to see if
we can progress a bit further.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Crash on promotion when recovery.conf is renamed

2017-03-27 Thread Alexander Korotkov
On Mon, Mar 27, 2017 at 4:48 PM, Tom Lane  wrote:

> "Tsunakawa, Takayuki"  writes:
> > All other places in twophase.c and most places in other files put
> ereport() and errmsg() on separate lines.  I think it would be better to
> align with surrounding code.
>
> > + ereport(FATAL, (errmsg("corrupted two-phase file \"%s\"",
>
> Actually, the *real* problem with that coding is it lacks a SQLSTATE
> (errcode call).  The only places where it's acceptable to leave that
> out are for internal "can't happen" cases, which this surely isn't.
>

Surrounding code also has ereports lacking a SQLSTATE.  And those aren't
"can't happen" cases either.

if (ControlFile->backupEndRequired)
> ereport(FATAL,
> (errmsg("WAL ends before end of online backup"),
>  errhint("All WAL generated while online backup was taken must be
> available at recovery.")));
> else if (!XLogRecPtrIsInvalid(ControlFile->backupStartPoint))
> ereport(FATAL,
> (errmsg("WAL ends before end of online backup"),
>  errhint("Online backup started with pg_start_backup() must be ended with
> pg_stop_backup(), and all WAL up to that point must be available at
> recovery.")));
> else
> ereport(FATAL,
>   (errmsg("WAL ends before consistent recovery point")));


Should we consider fixing them?

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] standardized backwards incompatibility tag for commits

2017-03-27 Thread Robert Haas
On Sat, Mar 25, 2017 at 4:15 PM, Tom Lane  wrote:
> Andres Freund  writes:
>> Seems like it'd be good to standardize how we're declaring that a commit
>> contains backwards incompatible changes.  I've seen
>> - 'BACKWARDS INCOMPATIBLE CHANGE'
>> - 'BACKWARD INCOMPATIBILITY'
>> - a lot of free-flow text annotations like "as a
>>   backward-incompatibility", "This makes a backwards-incompatible change"
>
>> Especially the latter are easy to miss when looking through the commit
>> log and I'd bet some get missed when generating the release notes.
>
> Bruce might have a different opinion, but for my own part I do not think
> it would make any difference in creating the release notes.  The important
> thing is that the information be there in the log entry, not exactly how
> it's spelled.

The other thing is that not all incompatibilities are equally
important.  I think there's pretty much a continuous spectrum from
things that are laughably unlikely to cause problems for anyone in
practice ("hey, you fixed the server crash that I was relying on to
bounce my server!") to things that don't intentionally change
user-visible behavior but might easily still create problems for
people ("hey, your new expression evaluation engine is prohibitively
slow on my test case!") to things that create small but noticeable
user-visible changes ("hey, SELECT 1 the_new_keyword_you_just_added
now fails!") to things that break user SQL in a way that requires it
to be updated ("hey, you changed the definition of pg_stat_activity!"
or much more significantly "hey, i have to go add explicit casts to a
bunch of things that didn't use to need them") to things that cause
the same syntax to have frighteningly different behavior ("hey,
TRUNCATE now recurses to child tables!").  When people mention that
there is a backward-incompatibility in the commit message, they're
just saying that they think that behavior change from that commit is
particularly noteworthy.  But I don't think there's any real
consistency among committers about which changes rise to the level of
requiring such an annotation.  I doubt that "Backward-Incompatibility:
{Yes|No}" is granular enough to be useful even if we all agreed on
what it meant.  "Backward-Incompatibility: TYPE_OF_INCOMPATIBILITY"
might be useful, but agreeing on a set of definitions and getting
everyone to remember to apply the tag in relevant cases might be hard.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] \if, \elseif, \else, \endif (was Re: PSQL commands: \quit_if, \quit_unless)

2017-03-27 Thread Corey Huinker
>
>
> I think that you could use another pattern where you init the
> PQExpBufferData structure instead of create it, so that only the string is
> malloced.


In v26, I have the functions return PQExpBuffer. The two calling functions
then free it, which should solve any leak.


>
>
> I think that you should use appendPQExpBufferChar and Str instead of
>>> relying on the format variant which is probably expensive. Something
>>> like:
>>>
>>>   if (num_options > 0)
>>> append...Char(buf, ' ');
>>>   append...Str(buf, ...);
>>>
>>
>> All flavors of appendPQExpBuffer*() I can find have a const *char format
>> string, so no way to append a naked string. If you know differently, I'm
>> listening. Not fixed.
>>
>
> These prototypes are from "pqexpbuffer.h", and do not seem to rely on a
> format:
>
>
Here's an addendum that does that. I can combine them in v27, but figured
this was quicker.
From 2fa083eb0115278f817a2d1d0439a47951a9c48b Mon Sep 17 00:00:00 2001
From: Corey Huinker 
Date: Mon, 27 Mar 2017 10:07:36 -0400
Subject: [PATCH 2/2] simpler append

---
 src/bin/psql/command.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index aa9dad4..a6168df 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -218,10 +218,9 @@ gather_boolean_expression(PsqlScanState scan_state, bool expansion, bool warn)
 	while ((value = psql_scan_slash_option(scan_state, slash_opt, NULL, false)))
 	{
 		/* add a space in between subsequent tokens */
-		if (num_options == 0)
-			appendPQExpBuffer(exp_buf, "%s", value);
-		else
-			appendPQExpBuffer(exp_buf, " %s", value);
+		if (num_options > 0)
+			appendPQExpBufferChar(exp_buf, ' ');
+		appendPQExpBufferStr(exp_buf, value);
 		num_options++;
 	}
 
-- 
2.7.4


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Time to drop old-style (V0) functions?

2017-03-27 Thread Tom Lane
Craig Ringer  writes:
> I didn't have any way to make

> seg_l = (SEG *) DatumGetPointer(DirectFunctionCall2(seg_union,
> PointerGetDatum(seg_l),
> PointerGetDatum(sort_items[i].data)));

> pretty, but *shrug*.

For the builtin types, fmgr.h generally defines a datum-to-specific-
pointer-type macro, eg it has these for bytea:

#define DatumGetByteaP(X)   ((bytea *) PG_DETOAST_DATUM(X))
#define PG_GETARG_BYTEA_P(n)DatumGetByteaP(PG_GETARG_DATUM(n))

(A type that doesn't have toast support would just use DatumGetPointer
instead of PG_DETOAST_DATUM.)  Replacing "(SEG *) DatumGetPointer(...)"
with "DatumGetSegP(...)" would help a little here.

I wouldn't necessarily do that if this is the only place that would
benefit, but there might be more scope for upgrading seg.c's notation
than just here.
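
For illustration, such macros for seg might look like this (hypothetical
names that just mirror the bytea pattern above; SEG has no toast support,
so plain DatumGetPointer suffices):

    #define DatumGetSegP(X)     ((SEG *) DatumGetPointer(X))
    #define PG_GETARG_SEG_P(n)  DatumGetSegP(PG_GETARG_DATUM(n))

    /* the call site quoted above then reads a little better: */
    seg_l = DatumGetSegP(DirectFunctionCall2(seg_union,
                                             PointerGetDatum(seg_l),
                                             PointerGetDatum(sort_items[i].data)));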

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Crash on promotion when recovery.conf is renamed

2017-03-27 Thread Tom Lane
Alexander Korotkov  writes:
> On Mon, Mar 27, 2017 at 4:48 PM, Tom Lane  wrote:
>> Actually, the *real* problem with that coding is it lacks a SQLSTATE
>> (errcode call).  The only places where it's acceptable to leave that
>> out are for internal "can't happen" cases, which this surely isn't.

> Surrounding code also has ereports lacking SQLSTATE.  And that isn't "can't
> happen" case as well.
> Should we consider fixing them?

Yup.  Just remember that the default is
XX000    E    ERRCODE_INTERNAL_ERROR    internal_error

If that's not how you want the error case reported, you need an errcode()
call.

We might need more ERRCODEs than are there now, if none of the existing
ones seem to fit these cases.  There's already ERRCODE_DATA_CORRUPTED
and ERRCODE_INDEX_CORRUPTED; maybe we need ERRCODE_WAL_CORRUPTED, for
example?
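
For illustration, adding an errcode to one of the ereports quoted upthread
might look like this (a sketch only; the message text comes from the thread,
the SQLSTATE chosen is just an example, and "path" stands for whatever
file-name variable the caller has):

    ereport(FATAL,
            (errcode(ERRCODE_DATA_CORRUPTED),
             errmsg("corrupted two-phase file \"%s\"", path)));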

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Partitioned tables and relfilenode

2017-03-27 Thread Robert Haas
On Thu, Mar 23, 2017 at 8:54 PM, Amit Langote
 wrote:
> On 2017/03/23 23:47, Amit Langote wrote:
>> On Thu, Mar 23, 2017 at 11:27 PM, Maksim Milyutin
>>  wrote:
>>> Hi!
>>>
>>> I have noticed that there is scheduled unlinking of nonexistent physical
>>> storage under partitioned table when we execute DROP TABLE statement on this
>>> partitioned table. Though this action doesn't generate any error under
>>> typical behavior of postgres because the error of storage's lack is caught
>>> through if-statement [1] I think it is not safe.
>>>
>>> My patch fixes this issue.
>>>
>>> 1. src/backend/storage/smgr/md.c:1385
>>
>> Good catch, will incorporate that in the main patch.
>
> And here is the updated patch.

I think you should go back to the earlier strategy of disallowing heap
reloptions to be set on the partitioned table.  The thing is, we don't
really know which way we're going to want to go in the future.  Maybe
we'll come up with a system where you can set options on the
partitioned table, and those options will cascade to the children.  Or
maybe we'll come up with a system where partitioned tables have a
completely different set of options to control behaviors specific to
partitioned tables.  If we do the latter, then we don't want to also
have to support useless heap reloptions for forward compatibility, nor
do we want to break backward-compatibility to remove support.  If we
do the former, then it's better if we allow it in the same release
where it starts working.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump emits ALTER TABLE ONLY partitioned_table

2017-03-27 Thread Robert Haas
On Fri, Feb 17, 2017 at 3:23 AM, Amit Langote
 wrote:
> In certain cases, pg_dump's dumpTableSchema() emits a separate ALTER TABLE
> command for those schema elements of a table that could not be included
> directly in the CREATE TABLE command for the table.
>
> For example:
>
> create table p (a int, b int) partition by range (a);
> create table p1 partition of p for values from (1) to (10) partition by
> range (b);
> create table p11 partition of p1 for values from (1) to (10);
>
> pg_dump -s gives:
>
> CREATE TABLE p (
> a integer NOT NULL,
> b integer
> )
> PARTITION BY RANGE (a);
>
> CREATE TABLE p1 PARTITION OF p
> FOR VALUES FROM (1) TO (10)
> PARTITION BY RANGE (b);
> ALTER TABLE ONLY p1 ALTER COLUMN a SET NOT NULL;
> ALTER TABLE ONLY p1 ALTER COLUMN b SET NOT NULL;
>
> 
>
> Note the ONLY in the above emitted command.  Now if I run the above
> commands in another database, the following error occurs:
>
> ERROR:  constraint must be added to child tables too
>
> That's because specifying ONLY for the AT commands on partitioned tables
> that must recurse causes an error.
>
> Attached patch fixes that - it prevents emitting ONLY for those ALTER
> TABLE commands, which if run, would cause an error like the one above.

Isn't it bogus that this is generating ALTER TABLE .. SET NOT NULL
columns at all?  You didn't say anything like that when setting up the
database, so why should it be there when dumping?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] \if, \elseif, \else, \endif (was Re: PSQL commands: \quit_if, \quit_unless)

2017-03-27 Thread Fabien COELHO


Hello,


>> I think that you could use another pattern where you init the
>> PQExpBufferData structure instead of create it, so that only the string is
>> malloced.

> In v26, I have the functions return PQExpBuffer. The two calling functions
> then free it, which should solve any leak.

Yep, it works as well.

> Here's an addendum that does that. I can combine them in v27, but figured
> this was quicker.


It works.

However having just one full patch with a number would help so that I can 
say "ready to committer" or not on something.


--
Fabien.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] rename pg_log directory?

2017-03-27 Thread Peter Eisentraut
On 3/6/17 18:01, Andreas Karlsson wrote:
> On 02/27/2017 03:05 PM, Peter Eisentraut wrote:
>> How about changing the default for log_directory from 'pg_log' to, say,
>> 'log'?
> 
> I have attached a patch which does this. I do not care much about which 
> name is picked as long as we get rid off the "pg_" prefix.

Committed.

While digging around a bit, I found in release-old.sgml that before
PostgreSQL 7.2, pg_clog was called pg_log.  Go figure.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Partitioning vs ON CONFLICT

2017-03-27 Thread Robert Haas
On Thu, Mar 9, 2017 at 7:20 PM, Amit Langote
 wrote:
> On 2017/03/10 9:10, Amit Langote wrote:
>> On 2017/03/09 23:25, Robert Haas wrote:
>>> On Fri, Feb 17, 2017 at 1:47 AM, Amit Langote wrote:
 I updated the patch.  Now it's reduced to simply removing the check in
 transformInsertStmt() that prevented using *any* ON CONFLICT on
 partitioned tables at all.
>>>
>>> This patch no longer applies.
>>
>> Rebased patch is attached.
>
> Oops, really attached this time,

Committed with a bit of wordsmithing of the documentation.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] rename pg_log directory?

2017-03-27 Thread Andreas Karlsson

On 03/27/2017 04:38 PM, Peter Eisentraut wrote:

Committed.


Thanks!


While digging around a bit, I found in release-old.sgml that before
PostgreSQL 7.2, pg_clog was called pg_log.  Go figure.


Yeah, I noticed that too when writing the patch. :)

Andreas



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] partitioned tables and contrib/sepgsql

2017-03-27 Thread Robert Haas
On Thu, Mar 9, 2017 at 9:47 AM, Stephen Frost  wrote:
> While going over the contrib modules, I noticed that sepgsql was not
> updated for partitioned tables.  What that appears to mean is that it's
> not possible to define labels on partitioned tables.

It works for me:

rhaas=# load 'dummy_seclabel';
LOAD
rhaas=# create table foo (a int, b text) partition by range (a);
CREATE TABLE
rhaas=# security label on table foo is 'classified';
SECURITY LABEL

What exactly is the problem you're seeing?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bug in get_partition_for_tuple

2017-03-27 Thread Robert Haas
On Wed, Mar 22, 2017 at 4:00 AM, Amit Langote
 wrote:
>> You're right, fixed that in the attached updated patch.
>
> Added this to the partitioning open items list, to avoid being forgotten
> about.
>
> https://wiki.postgresql.org/wiki/PostgreSQL_10_Open_Items#Partitioning

Thanks for adding it there.  I'm making a sweep through that list now,
or at least the parts of it that obvious pertain to my commits.  This
patch looks good, so committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Crash on promotion when recovery.conf is renamed

2017-03-27 Thread Alexander Korotkov
On Mon, Mar 27, 2017 at 11:26 AM, Tsunakawa, Takayuki <
tsunakawa.ta...@jp.fujitsu.com> wrote:

> From: pgsql-hackers-ow...@postgresql.org
> > [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Michael Paquier
> > Moved to CF 2017-03. Both patches still apply.
>
> Sorry to be late for reviewing this, but done now.  The patch applied,
> make check passed, and the code looks almost good.  I could successfully
> perform a simple archive recovery.  Finally, I broke the 2pc state file
> while the server is down, and could confirm that the server failed to start
> as expected, emitting a FATAL message.  Worked nicely.
>

I've tested the case of moving recovery.conf away, which was originally
reported by Magnus.

When I try to promote the standby while recovery.conf has been moved away,
I get a FATAL error.

waiting for server to promote
2017-03-27 17:12:38.225 MSK [30307] LOG:  received promote request
2017-03-27 17:12:38.225 MSK [30311] FATAL:  terminating walreceiver process
due to administrator command
2017-03-27 17:12:38.225 MSK [30307] LOG:  redo done at 0/328
2017-03-27 17:12:38.226 MSK [30307] LOG:  selected new timeline ID: 2
2017-03-27 17:12:38.253 MSK [30307] FATAL:  could not open file
"recovery.conf": No such file or directory
2017-03-27 17:12:38.253 MSK [30306] LOG:  startup process (PID 30307)
exited with exit code 1
2017-03-27 17:12:38.253 MSK [30306] LOG:  terminating any other active
server processes
2017-03-27 17:12:38.256 MSK [30306] LOG:  database system is shut down
... stopped waiting
server is still promoting

If I try to start it after crash, it becomes a master on timeline 1.  Just
like in case we deleted recovery.conf while server was shut down.

waiting for server to start
2017-03-27 17:16:54.186 MSK [30413] LOG:  listening on IPv6 address "::1", port 5430
2017-03-27 17:16:54.186 MSK [30413] LOG:  listening on IPv6 address
"fe80::1%lo0", port 5430
2017-03-27 17:16:54.186 MSK [30413] LOG:  listening on IPv4 address
"127.0.0.1", port 5430
2017-03-27 17:16:54.187 MSK [30413] LOG:  listening on Unix socket
"/tmp/.s.PGSQL.5430"
2017-03-27 17:16:54.195 MSK [30414] LOG:  database system was interrupted
while in recovery at log time 2017-03-27 17:10:23 MSK
2017-03-27 17:16:54.195 MSK [30414] HINT:  If this has occurred more than
once some data might be corrupted and you might need to choose an earlier
recovery target.
2017-03-27 17:16:54.218 MSK [30414] LOG:  database system was not properly
shut down; automatic recovery in progress
2017-03-27 17:16:54.219 MSK [30414] LOG:  redo starts at 0/260
2017-03-27 17:16:54.219 MSK [30414] LOG:  consistent recovery state reached
at 0/360
2017-03-27 17:16:54.219 MSK [30414] LOG:  invalid record length at
0/360: wanted 24, got 0
2017-03-27 17:16:54.219 MSK [30414] LOG:  redo done at 0/328
2017-03-27 17:16:54.224 MSK [30413] LOG:  database system is ready to
accept connections
 done
server started

# select pg_is_in_recovery();
 pg_is_in_recovery
───
 f
(1 row)

# select pg_walfile_name(pg_current_wal_location());
 pg_walfile_name
──
 00010003
(1 row)

If instead I put recovery back and start server, then streaming replication
continues normally.

waiting for server to start
2017-03-27 17:32:16.887 MSK [30783] LOG:  listening on IPv6 address "::1", port 5430
2017-03-27 17:32:16.887 MSK [30783] LOG:  listening on IPv6 address
"fe80::1%lo0", port 5430
2017-03-27 17:32:16.887 MSK [30783] LOG:  listening on IPv4 address
"127.0.0.1", port 5430
2017-03-27 17:32:16.888 MSK [30783] LOG:  listening on Unix socket
"/tmp/.s.PGSQL.5430"
2017-03-27 17:32:16.897 MSK [30784] LOG:  database system was interrupted
while in recovery at log time 2017-03-27 17:28:05 MSK
2017-03-27 17:32:16.897 MSK [30784] HINT:  If this has occurred more than
once some data might be corrupted and you might need to choose an earlier
recovery target.
2017-03-27 17:32:16.914 MSK [30784] LOG:  entering standby mode
2017-03-27 17:32:16.916 MSK [30784] LOG:  redo starts at 0/828
2017-03-27 17:32:16.916 MSK [30784] LOG:  consistent recovery state reached
at 0/960
2017-03-27 17:32:16.916 MSK [30784] LOG:  invalid record length at
0/960: wanted 24, got 0
2017-03-27 17:32:16.916 MSK [30783] LOG:  database system is ready to
accept read only connections
2017-03-27 17:32:16.920 MSK [30788] LOG:  started streaming WAL from
primary at 0/900 on timeline 1
 done
server started

# select pg_is_in_recovery();
 pg_is_in_recovery
───
 t
(1 row)

Thus, I think patch is working as expected in this case.

Also, I'd like to notice that it passes check-world without warnings on my
laptop.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Partitioned tables and relfilenode

2017-03-27 Thread Maksim Milyutin

On 24.03.2017 03:54, Amit Langote wrote:


> And here is the updated patch.



Perhaps you forgot my fix in the updated patch

diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index 3999e6e..36917c8 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -1823,7 +1823,8 @@ heap_drop_with_catalog(Oid relid)
 */
if (rel->rd_rel->relkind != RELKIND_VIEW &&
rel->rd_rel->relkind != RELKIND_COMPOSITE_TYPE &&
-   rel->rd_rel->relkind != RELKIND_FOREIGN_TABLE)
+   rel->rd_rel->relkind != RELKIND_FOREIGN_TABLE &&
+   rel->rd_rel->relkind != RELKIND_PARTITIONED_TABLE)
{
RelationDropStorage(rel);
}

--
Maksim Milyutin
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] logical replication worker and statistics

2017-03-27 Thread Fujii Masao
Hi,

Logical replication worker should call pgstat_report_stat()?
Currently it doesn't seem to do that and no statistics about
table accesses by logical replication workers are collected.
For example, this can prevent autovacuum from working on
those tables properly.
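
For illustration only, a minimal sketch of the idea (hypothetical placement;
the real apply loop is structured differently):

    /*
     * After the apply worker finishes a remote transaction, flush the
     * table-access counters it accumulated so that autovacuum and the
     * statistics views see the activity.
     */
    CommitTransactionCommand();
    pgstat_report_stat(false);      /* send pending stats, don't force */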

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-27 Thread Andres Freund
On 2017-03-27 09:33:43 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > Checking for this isn't entirely pretty - see my attached attempt at
> > doing so.  I considered hiding
> > __attribute__((optimize("no-crossjumping"))) in execInterpExpr.c behind
> > a macro (like PG_DISABLE_CROSSJUMPING), but I don't really think that
> > makes things better.
> 
> I think it would, primarily because if we find out that some other compiler
> spells this differently, we could handle it totally within configure.

I'm unconvinced that we could sensibly map different compiler's options
on the same option name - I'd be surprised if they would have a similar
enough effect.


> Isn't our practice to put __attribute__ at the end of a function
> declaration or definition, not in the middle someplace?

We could move it to the declaration - which doesn't seem like an
improvement here, given that it's about optimization not API changes -
but in definitions you can't move it after the function name:
static Datum
ExecInterpExpr(ExprState *state, ExprContext *econtext, bool *isnull)
#ifdef HAVE_FUNCATTR_NO_CROSSJUMPING
__attribute__((optimize("no-crossjumping")))
#endif
:
/home/andres/src/postgresql/src/backend/executor/execExprInterp.c:279:1: error: 
attributes should be specified before the declarator in a function definition
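
For what it's worth, GCC does accept the attribute when it is placed before
the declarator, so a sketch of the macro idea mentioned upthread could look
like this (illustrative only, reusing the names already floated in this
thread):

    #ifdef HAVE_FUNCATTR_NO_CROSSJUMPING
    #define PG_DISABLE_CROSSJUMPING __attribute__((optimize("no-crossjumping")))
    #else
    #define PG_DISABLE_CROSSJUMPING
    #endif

    static Datum PG_DISABLE_CROSSJUMPING
    ExecInterpExpr(ExprState *state, ExprContext *econtext, bool *isnull)
    {
        /* ... */
    }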


Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-27 Thread Tom Lane
Andres Freund  writes:
> On 2017-03-27 09:33:43 -0400, Tom Lane wrote:
>> I think it would, primarily because if we find out that some other compiler
>> spells this differently, we could handle it totally within configure.

> I'm unconvinced that we could sensibly map different compiler's options
> on the same option name - I'd be surprised if they would have a similar
> enough effect.

[ shrug... ]  OK, next question is whether pgindent treats this
layout sanely.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] partitioned tables and contrib/sepgsql

2017-03-27 Thread Mike Palmiotto
On Mon, Mar 27, 2017 at 10:47 AM, Robert Haas  wrote:
> On Thu, Mar 9, 2017 at 9:47 AM, Stephen Frost  wrote:
>> While going over the contrib modules, I noticed that sepgsql was not
>> updated for partitioned tables.  What that appears to mean is that it's
>> not possible to define labels on partitioned tables.
>
> It works for me:
>
> rhaas=# load 'dummy_seclabel';
> LOAD
> rhaas=# create table foo (a int, b text) partition by range (a);
> CREATE TABLE
> rhaas=# security label on table foo is 'classified';
> SECURITY LABEL
>
> What exactly is the problem you're seeing?

IIRC the initial concern was that contrib/sepgsql was not updated for
RELKIND_PARTITIONED_TABLE, so the post-create hook would not work
properly for partitioned tables.

I've looked into it a bit and saw a post-alter hook in
StoreCatalogInheritance1. It seems like it may just be an issue of
adding the RELKIND_PARTITIONED_TABLE to sepgsql_relation_post_create.
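
To illustrate, the kind of change in question would presumably be along
these lines (a rough sketch only; the exact switch structure in
sepgsql_relation_post_create() and the object-class names should be checked
against the current source):

    switch (relForm->relkind)
    {
        case RELKIND_RELATION:
        case RELKIND_PARTITIONED_TABLE:     /* treat like an ordinary table */
            tclass = SEPG_CLASS_DB_TABLE;
            break;
        case RELKIND_SEQUENCE:
            tclass = SEPG_CLASS_DB_SEQUENCE;
            break;
        case RELKIND_VIEW:
            tclass = SEPG_CLASS_DB_VIEW;
            break;
        default:
            /* other relkinds do not get per-relation labels here */
            return;
    }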

-- 
Mike Palmiotto
Software Engineer
Crunchy Data Solutions
https://crunchydata.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Documentation improvements for partitioning

2017-03-27 Thread Robert Haas
On Thu, Mar 9, 2017 at 8:23 PM, Amit Langote
 wrote:
> On 2017/03/10 3:26, Robert Haas wrote:
>> I think you might have the titles for 0002 and 0003 backwards.
>
> Oops, you're right.
>
>> On Fri, Mar 3, 2017 at 2:51 AM, Amit Langote wrote:
>>> 0002: some cosmetic fixes to create_table.sgml
>>
>> I think this sentence may be unclear to some readers:
>>
>> + One might however want to set it for only some partitions,
>> +  which is possible by doing SET NOT NULL on 
>> individual
>> +  partitions.
>>
>> I think you could replace this with something like: Even if there is
>> no NOT NULL constraint on the parent, such a constraint
>> can still be added to individual partitions, if desired; that is, the
>> children can disallow nulls even if the parent allows them, but not
>> the other way around.
>
> Reads much better, done that way.  Thanks.
>
>>> 0003: add clarification about NOT NULL constraint on partition columns in
>>>   alter_table.sgml
>>
>> This is about list-ifying a note, but I think we should try to
>> de-note-ify it.  It's a giant block of text that is not obviously more
>> noteworthy than the surrounding text; I think  should be saved
>> for things that particularly need to be called out.
>
> OK.  The patch is now just about de-note-ifying the block of text.  Since
> I don't see any other lists in the Parameters portion of the page, I also
> take back my list-ifying proposal.
>
> Attached updated patches.

Committed 0002, 0003.

I think the section on BRIN in 0001 is just silly.  BRIN is a very
useful index type, possibly more useful than anything except btree,
but I think documenting it as an alternative method of partitioning is
over the top.

+ The following forms of partitioning can be implemented in
+ PostgreSQL:

Any form of partitioning can be implemented, at least to some degree,
using inheritance or UNION ALL views.  I think what this should say is
that PostgreSQL has native support for list and range partitioning,
and then it can go on to say that if this built-in support is not
suitable for a particular use case (either because you need some other
partitioning scheme or due to some other limitation), inheritance or
UNION ALL views can be used instead, adding flexibility but losing
some of the performance benefits of built-in declarative partitioning.


 Partitions may have their own indexes, constraints and default values,
-distinct from other partitions. Partitions do not inherit indexes from
-the partitioned table.
+distinct from other partitions. Partitions do not currently inherit
+indexes from the partitioned table.
+   
+
+   
+See  for more details creating partitioned
+tables and partitions.


I don't think we should add "currently"; that amounts to speculation
about what will happen in future versions.  Also, I favor collapsing
these into one paragraph.  A single-sentence paragraph tends to look
OK when you're reading the SGML directly, but it looks funny in the
rendered version.

+sub-partitioning.  It is not currently possible to
+alter a regular table into a partitioned table or vice versa.  However,
+it is possible to add a regular or partitioned table containing data into
+a partition of a partitioned table, or remove a partition; see

I think we should say "as a partition" rather than "into a partition",
assuming you're talking about ATTACH PARTITION here.

-   partitioned tables.  For example, specifying ONLY
-   when querying data from a partitioned table would not make much sense,
-   because all the data is contained in partitions, so this raises an
-   error.  Specifying ONLY when modifying schema is
-   not desirable in certain cases with partitioned tables where it may be
-   fine for regular inheritance parents (for example, dropping a column
-   from only the parent); an error will be thrown in that case.
+   partitioned tables.  Specifying ONLY when modifying
+   schema is not desirable in certain cases with partitioned tables
+   whereas it may be fine for regular inheritance parents (for example,
+   dropping a column from only the parent); an error will be thrown in
+   that case.

I don't see why this is an improvement.

-data inserted into the partitioned table cannot be routed to foreign table
-partitions.
+data inserted into the partitioned table is currently not routed to foreign
+table partitions.

Again, let's not speculate about the future.

+   Note that it is currently not supported to propagate index definition
+   from the master partitioned table to its partitions; in fact, it is
+   not possible to define indexes on partitioned tables in the first
+   place.  This might change in future releases.

Same here.

+There are currently the following limitations of using partitioned tables:

And here.  Better to write "The following limitations apply to
partitioned tables:"

+   It is currentl

Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2017-03-27 Thread Daniel Verite
Vaishnavi Prabakaran wrote:

> Am going to include the test which you shared in the test patch. Please let
> me know if you want to cover anymore specific cases to gain confidence.

One bit of functionality that does not work in batch mode and is left
as is by the patch is the PQfn() fast path interface and the large object
functions that use it.

Currently, calling lo_* functions inside a batch will fail with a message
that depends on whether the internal lo_initialize() has been successfully
called before.

If it hasn't, PQerrorMessage() will be:
   "Synchronous command execution functions are not allowed in batch mode"
which is fine, but it comes by happenstance from lo_initialize()
calling PQexec() to fetch the function OIDs from pg_catalog.pg_proc.

OTOH, if lo_initialize() has succeeded before, a call to a large
object function will fail with a different message:
  "connection in wrong state"
which is emitted by PQfn() based on conn->asyncStatus != PGASYNC_IDLE

Having an unified error message would be more user friendly.

Concerning the doc, when saying in 33.5.2:
 "In batch mode only asynchronous operations are permitted"
the server-side functions are indeed ruled out, since PQfn() is
synchronous, but maybe we should be a bit more explicit
about that?

Chapter 34.3 (Large Objects / Client Interfaces) could also
mention the incompatibility with batch mode.


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-27 Thread Andres Freund
On 2017-03-27 11:22:40 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2017-03-27 09:33:43 -0400, Tom Lane wrote:
> >> I think it would, primarily because if we find out that some other compiler
> >> spells this differently, we could handle it totally within configure.
> 
> > I'm unconvinced that we could sensibly map different compiler's options
> > on the same option name - I'd be surprised if they would have a similar
> > enough effect.
> 
> [ shrug... ]  OK, next question is whether pgindent treats this
> layout sanely.

The patch was run through pgindent, i.e. it seems to be ok with it.

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] \if, \elseif, \else, \endif (was Re: PSQL commands: \quit_if, \quit_unless)

2017-03-27 Thread Corey Huinker
On Mon, Mar 27, 2017 at 10:34 AM, Fabien COELHO  wrote:

>
> Hello,
>
> I think that you could use another pattern where you init the
>>> PQExpBufferData structure instead of create it, so that only the string
>>> is
>>> malloced.
>>>
>>
>> In v26, I have the functions return PQExpBuffer. The two calling functions
>> then free it, which should solve any leak.
>>
>
> Yep, it works as well.
>
> Here's an addendum that does that. I can combine them in v27, but figured
>> this was quicker.
>>
>
> It works.
>
> However having just one full patch with a number would help so that I can
> say "ready to committer" or not on something.
>
> --
> Fabien.
>

And here you go (sorry for the delay, had errands to run).


0001-psql-if-v27.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PL/Python: Add cursor and execute methods to plan object

2017-03-27 Thread Peter Eisentraut
On 3/22/17 11:46, Andrew Dunstan wrote:
> This is a very simple patch that does what it advertises. It applies
> cleanly and provides tests for both the new methods (plan.cursor and
> plan.execute).
> 
> Marking Ready For Committer.

committed

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] partitioned tables and contrib/sepgsql

2017-03-27 Thread Robert Haas
On Mon, Mar 27, 2017 at 11:22 AM, Mike Palmiotto
 wrote:
> On Mon, Mar 27, 2017 at 10:47 AM, Robert Haas  wrote:
>> On Thu, Mar 9, 2017 at 9:47 AM, Stephen Frost  wrote:
>>> While going over the contrib modules, I noticed that sepgsql was not
>>> updated for partitioned tables.  What that appears to mean is that it's
>>> not possible to define labels on partitioned tables.
>>
>> It works for me:
>>
>> rhaas=# load 'dummy_seclabel';
>> LOAD
>> rhaas=# create table foo (a int, b text) partition by range (a);
>> CREATE TABLE
>> rhaas=# security label on table foo is 'classified';
>> SECURITY LABEL
>>
>> What exactly is the problem you're seeing?
>
> IIRC the initial concern was that contrib/sepgsql was not updated for
> RELKIND_PARTITIONED_TABLE, so the post-create hook would not work
> properly for partitioned tables.
>
> I've looked into it a bit and saw a post-alter hook in
> StoreCatalogInheritance1. It seems like it may just be an issue of
> adding the RELKIND_PARTITIONED_TABLE to sepgsql_relation_post_create.

I thought that kind of thing might be the issue, but it didn't seem to
match Stephen's description.  Adding RELKIND_PARTITIONED_TABLE to that
function seems like it would probably be sufficient to make that hook
work, although that would need to be tested, but there are numerous
other references to RELKIND_RELATION in contrib/sepgsql, some of which
probably also need to be updated to consider
RELKIND_PARTITIONED_TABLE.

In my view, this is really an enhancement to sepgsql to handle a new
PostgreSQL feature rather than a defect in the partitioning commit, so
I don't think it falls on Amit Langote (as the author) or me (as the
committer) to fix.  There might similarly be updates to sepgsql to do
something with permissions on logical replication's new publication
and subscription objects, but we should similarly regard those as
possible new features, not things that Petr or Peter need to work on.
 Note that sepgsql hasn't been updated to work with RLS yet, either,
but we didn't regard that as an open item for RLS, or if we did the
resolution was just to document it.  I am not opposed to giving a
little more time to get this straightened out, but if a patch doesn't
show up fairly soon then I think we should just document that sepgsql
doesn't support partitioned tables in v10.  sepgsql has a fairly
lengthy list of implementation restrictions already, so one more is
not going to kill anybody -- or if it will then that person should
produce a patch soon.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] crashes due to setting max_parallel_workers=0

2017-03-27 Thread Robert Haas
On Mon, Mar 27, 2017 at 9:54 AM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Mon, Mar 27, 2017 at 1:29 AM, Rushabh Lathia
>>  wrote:
>>> But it seems a bit futile to produce the parallel plan in the first place,
>>> because with max_parallel_workers=0 we can't possibly get any parallel
>>> workers ever. I wonder why compute_parallel_worker() only looks at
>>> max_parallel_workers_per_gather, i.e. why shouldn't it do:
>>> parallel_workers = Min(parallel_workers, max_parallel_workers);
>>> Perhaps this was discussed and is actually intentional, though.
>
>> It was intentional.  See the last paragraph of
>> https://www.postgresql.org/message-id/ca%2btgmoamsn6a1780vutfsarcu0lcr%3dco2yi4vluo-jqbn4y...@mail.gmail.com
>
> Since this has now come up twice, I suggest adding a comment there
> that explains why we're intentionally ignoring max_parallel_workers.

Hey, imagine if the comments explained the logic behind the code!

Good idea.  How about the attached?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


add-worker-comment.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-27 Thread Tom Lane
As to the point of whether it actually helps or not ...

on gcc 6.3.1 (Fedora 25), yes it does.  Seems to be one "jmp *something"
per EEO_NEXT or EEO_JUMP.

on gcc 4.4.7 (RHEL 6), it makes things *WORSE*.  We go from about half of
the dispatches getting routed through a common location, to *all* of them
(except one; for some odd reason the first EEO_NEXT in EEOP_NULLIF
survives as a separate jump).  This seems like a bug, but there it is.
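
For context, EEO_NEXT/EEO_JUMP are the dispatch macros that, in the
computed-goto build, expand to GCC's "labels as values" extension.  A
stripped-down, self-contained illustration of the technique (a toy, not the
actual executor code) looks like this:

/* compile with gcc; uses the labels-as-values extension */
#include <stdio.h>

enum { OP_INC, OP_DBL, OP_DONE };

static int
run(const int *op, int v)
{
    static void *dispatch[] = { &&do_inc, &&do_dbl, &&do_done };

#define NEXT() goto *dispatch[*op++]
    NEXT();
do_inc:
    v += 1;
    NEXT();     /* ideally each NEXT() stays a separate indirect jump */
do_dbl:
    v *= 2;
    NEXT();
do_done:
    return v;
#undef NEXT
}

int
main(void)
{
    const int prog[] = { OP_INC, OP_DBL, OP_DONE };

    printf("%d\n", run(prog, 3));   /* prints 8 */
    return 0;
}

The question in the measurements above is whether the compiler keeps each of
those NEXT() jumps separate (better for branch prediction) or funnels them
all through one common dispatch point.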

So this means we'd need some serious research to decide whether to apply
it.  And I'm suspecting we'd end up with a compiler version test.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] segfault in hot standby for hash indexes

2017-03-27 Thread Jeff Janes
On Fri, Mar 24, 2017 at 12:49 AM, Amit Kapila 
wrote:

> On Fri, Mar 24, 2017 at 12:25 PM, Ashutosh Sharma 
> wrote:
> >>
> >> I think it would have been probably okay to use *int* for ntuples as
> >> that matches with what you are actually assigning in the function.
> >
> > okay, corrected it. Attached is newer version of patch.
> >
>
> Thanks, this version looks good to me.
>

It solves the problem for me.  I'd like to test that I get the right answer
on the standby, not just the absence of a crash, but I don't know how to do
that effectively.  Has anyone used the new wal replay block consistency
tool on hash indexes since this microvacuum code was committed?

Jeff


Re: [HACKERS] crashes due to setting max_parallel_workers=0

2017-03-27 Thread Robert Haas
On Sat, Mar 25, 2017 at 12:18 PM, Rushabh Lathia
 wrote:
> About the original issue reported by Tomas, I did more debugging and
> found that the problem was that gather_merge_clear_slots() was not returning
> the cleared slot when nreaders is zero (meaning nworkers_launched = 0).
> Because of that, the scan continued even after all the tuples were exhausted,
> and we ended up with a server crash in gather_merge_getnext(). In the patch
> I also added an Assert in gather_merge_getnext() that the index
> should be less than nreaders + 1 (the leader).

Well, you and David Rowley seem to disagree on what the fix is here.
His patches posted upthread do A, and yours do B, and from a quick
look those things are not just different ways of spelling the same
underlying fix, but actually directly conflicting ideas about what the
fix should be.  Any chance you can review his patches, and maybe he
can review yours, and we could try to agree on a consensus position?
:-)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical replication worker and statistics

2017-03-27 Thread Robert Haas
On Mon, Mar 27, 2017 at 11:14 AM, Fujii Masao  wrote:
> Logical replication worker should call pgstat_report_stat()?
> Currently it doesn't seem to do that and no statistics about
> table accesses by logical replication workers are collected.
> For example, this can prevent autovacuum from working on
> those tables properly.

Yeah, that doesn't sound good.
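
For what it's worth, the fix would presumably amount to having the apply
worker flush its table access counters at suitable points, e.g. after
finishing a transaction or when it goes idle.  A hypothetical sketch (not a
tested patch, and the exact call site in the apply loop is an assumption):

    /* let the stats collector (and hence autovacuum) see our activity */
    pgstat_report_stat(false);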

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Allow vacuums to report oldestxmin

2017-03-27 Thread Masahiko Sawada
On Sun, Mar 26, 2017 at 2:26 AM, Masahiko Sawada  wrote:
> On Sun, Mar 26, 2017 at 1:37 AM, Fujii Masao  wrote:
>> On Mon, Mar 6, 2017 at 9:37 PM, Masahiko Sawada  
>> wrote:
>>> On Fri, Mar 3, 2017 at 10:50 PM, Simon Riggs  wrote:
 Allow vacuums to report oldestxmin

 Allow VACUUM and Autovacuum to report the oldestxmin value they
 used while cleaning tables, helping to make better sense out of
 the other statistics we report in various cases.

 Branch
 --
 master

 Details
 ---
 http://git.postgresql.org/pg/commitdiff/9eb344faf54a849898d9be012ddfa8204cfeb57c

 Modified Files
 --
 src/backend/commands/vacuumlazy.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)


>>>
>>> Should we change the example in vacuum.sgml file as well? Attached patch.
>>
>> "tuples" in the above should be "row versions"?
>> We should review not only this line but also all the lines in the example
>> of VERBOSE output, I think.
>
> Right. These verbose log messages are out of date. I ran
> VACUUM(VERBOSE, ANALYZE) with the same scenario as the current example, as
> far as possible. The attached patch updates the verbose log messages.
>
>

Surprisingly, the change "tuples" -> "row versions" in vacuumlazy.c was
introduced by commit feb4f44d296b88b7f0723f4a4f3945a371276e0b in 2003.
If this patch is applied, it should be back-patched to all supported
branches.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] partitioned tables and contrib/sepgsql

2017-03-27 Thread Mike Palmiotto
On Mon, Mar 27, 2017 at 11:46 AM, Robert Haas  wrote:
> On Mon, Mar 27, 2017 at 11:22 AM, Mike Palmiotto
>  wrote:
>> On Mon, Mar 27, 2017 at 10:47 AM, Robert Haas  wrote:
>>> On Thu, Mar 9, 2017 at 9:47 AM, Stephen Frost  wrote:
 While going over the contrib modules, I noticed that sepgsql was not
 updated for partitioned tables.  What that appears to mean is that it's
 not possible to define labels on partitioned tables.
>>>
>>> It works for me:
>>>
>>> rhaas=# load 'dummy_seclabel';
>>> LOAD
>>> rhaas=# create table foo (a int, b text) partition by range (a);
>>> CREATE TABLE
>>> rhaas=# security label on table foo is 'classified';
>>> SECURITY LABEL
>>>
>>> What exactly is the problem you're seeing?
>>
>> IIRC the initial concern was that contrib/sepgsql was not updated for
>> RELKIND_PARTITIONED_TABLE, so the post-create hook would not work
>> properly for partitioned tables.
>>
>> I've looked into it a bit and saw a post-alter hook in
>> StoreCatalogInheritance1. It seems like it may just be an issue of
>> adding the RELKIND_PARTITIONED_TABLE to sepgsql_relation_post_create.
>
> I thought that kind of thing might be the issue, but it didn't seem to
> match Stephen's description.  Adding RELKIND_PARTITIONED_TABLE to that
> function seems like it would probably be sufficient to make that hook
> work, although that would need to be tested, but there are numerous
> other references to RELKIND_RELATION in contrib/sepgsql, some of which
> probably also need to be updated to consider
> RELKIND_PARTITIONED_TABLE.

Agreed.

>
> In my view, this is really an enhancement to sepgsql to handle a new
> PostgreSQL feature rather than a defect in the partitioning commit, so
> I don't think it falls on Amit Langote (as the author) or me (as the
> committer) to fix.  There might similarly be updates to sepgsql to do
> something with permissions on logical replication's new publication
> and subscription objects, but we should similarly regard those as
> possible new features, not things that Petr or Peter need to work on.

I'll keep these items in mind for future reference. Thanks.

>  Note that sepgsql hasn't been updated to work with RLS yet, either,
> but we didn't regard that as an open item for RLS, or if we did the
> resolution was just to document it.  I am not opposed to giving a
> little more time to get this straightened out, but if a patch doesn't
> show up fairly soon then I think we should just document that sepgsql
> doesn't support partitioned tables in v10.  sepgsql has a fairly
> lengthy list of implementation restrictions already, so one more is
> not going to kill anybody -- or if it will then that person should
> produce a patch soon.

Okay, I'll make sure I get something fleshed out today or tomorrow.

-- 
Mike Palmiotto
Software Engineer
Crunchy Data Solutions
https://crunchydata.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] crashes due to setting max_parallel_workers=0

2017-03-27 Thread David Rowley
On 28 March 2017 at 04:57, Robert Haas  wrote:
> On Sat, Mar 25, 2017 at 12:18 PM, Rushabh Lathia
>  wrote:
>> About the original issue reported by Tomas, I did more debugging and
>> found that the problem was that gather_merge_clear_slots() was not returning
>> the cleared slot when nreaders is zero (meaning nworkers_launched = 0).
>> Because of that, the scan continued even after all the tuples were exhausted,
>> and we ended up with a server crash in gather_merge_getnext(). In the patch
>> I also added an Assert in gather_merge_getnext() that the index
>> should be less than nreaders + 1 (the leader).
>
> Well, you and David Rowley seem to disagree on what the fix is here.
> His patches posted upthread do A, and yours do B, and from a quick
> look those things are not just different ways of spelling the same
> underlying fix, but actually directly conflicting ideas about what the
> fix should be.  Any chance you can review his patches, and maybe he
> can review yours, and we could try to agree on a consensus position?
> :-)

When comparing Rushabh's to my minimal patch, both change
gather_merge_clear_slots() to clear the leader's slot. My fix
mistakenly changes things so it does ExecInitExtraTupleSlot() on the
leader's slot, but seems that's not required since
gather_merge_readnext() sets the leader's slot to the output of
ExecProcNode(outerPlan). I'd ignore my minimal fix because of that
mistake. Rushabh's patch sidesteps this by adding a conditional
pfree() to not free a slot that we didn't allocate in the first place.

I do think the code could be improved a bit. I don't like the way
GatherMergeState's nreaders and nworkers_launched are always the same.
I think this all threw me off a bit and may have been the reason for
the bug in the first place.


-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Guidelines for GSoC student proposals / Eliminate O(N^2) scaling from rw-conflict tracking in serializable transactions

2017-03-27 Thread Kevin Grittner
On Sat, Mar 25, 2017 at 9:24 PM, Mengxing Liu
 wrote:

> I've updated the proposal according to your comments.
> But I am still wondering: could you review it again as a double-check
> to make sure I've made everything clear?

Additional comments added.

I'm afraid a few new issues came to mind reading it again.  (Nothing
serious; just some points that could benefit from a little
elaboration.)

--
Kevin Grittner


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-27 Thread Tom Lane
I wrote:
> As to the point of whether it actually helps or not ...
> on gcc 4.4.7 (RHEL 6), it makes things *WORSE*.  We go from about half of
> the dispatches getting routed through a common location, to *all* of them
> (except one; for some odd reason the first EEO_NEXT in EEOP_NULLIF
> survives as a separate jump).  This seems like a bug, but there it is.

So after a bit of googling, this is a very longstanding complaint:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=39284
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71785

(hm, think I know the second submitter)

I'm not sure we should be relying on a gcc bug fix that's barely four
months old.  Even assuming it fixes the issue without new regressions,
most people are not going to have it anytime soon.

My feeling at this point is that we might be better off disabling
the computed-goto case by default.  At the very least, we're going
to need a version check that restricts it to latest gcc.
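
Concretely, that would mean guarding the computed-goto path with something
like the following (a sketch only; EEO_USE_COMPUTED_GOTO is the macro name
assumed from the patch, the cutoff version is illustrative, and a real test
would also have to cope with clang defining __GNUC__):

#if defined(__GNUC__) && __GNUC__ >= 7
#define EEO_USE_COMPUTED_GOTO
#endif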

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-27 Thread Andres Freund
On 2017-03-27 11:52:05 -0400, Tom Lane wrote:
> As to the point of whether it actually helps or not ...
> 
> on gcc 6.3.1 (Fedora 25), yes it does.  Seems to be one "jmp *something"
> per EEO_NEXT or EEO_JUMP.
> 
> on gcc 4.4.7 (RHEL 6), it makes things *WORSE*.  We go from about half of
> the dispatches getting routed through a common location, to *all* of them
> (except one; for some odd reason the first EEO_NEXT in EEOP_NULLIF
> survives as a separate jump).  This seems like a bug, but there it is.

Gah :(.  I have gcc 5, 6, and 7 here, but nothing earlier.  I'm now inclined
to go the version-check route, for the individual versions we can (and
want to) confirm this on...  I'm not too concerned about not doing so on
gcc 4.4 or older...


> So this means we'd need some serious research to decide whether to apply
> it.  And I'm suspecting we'd end up with a compiler version test.

Indeed.

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Improve performance of find_all_inheritors()

2017-03-27 Thread Robert Haas
On Mon, Mar 27, 2017 at 12:08 PM, Teodor Sigaev  wrote:
> Improve performance of find_all_inheritors()
>
> Previous coding uses three nested loops which obviously were a pain for
> large number of table's children. Patch replaces inner loop with
> a hashmap.
>
> Author: Aleksander Alekseev
> Reviewed-by: me
>
> https://commitfest.postgresql.org/13/1058/

I'm a little worried that this will be noticeably slower when the
number of partitions is small, like 4 or 8 or 16.  In other places,
we've found it necessary to support both a list-based strategy and a
hash-based strategy to avoid regressing small cases (e.g.
join_rel_list + join_rel_hash in PlannerInfo, ResourceArray in
resowner.c, etc).
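
As a toy illustration of that hybrid pattern (not PostgreSQL code; bounds
checks omitted, and nonzero keys assumed since 0 marks an empty slot): keep a
plain array and scan it linearly while the set is small, and only build a
hash table once membership grows past a threshold, so small cases pay nothing
for the hashing machinery.

#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define HYBRID_THRESHOLD 32
#define HASH_SIZE 1024          /* power of two, larger than any expected set */

typedef struct HybridSet
{
    unsigned    items[HASH_SIZE];   /* insertion order, always maintained */
    int         nitems;
    bool        use_hash;
    unsigned    hash[HASH_SIZE];    /* open addressing, 0 means empty */
} HybridSet;

static void
hash_insert(HybridSet *s, unsigned oid)
{
    unsigned    h = oid & (HASH_SIZE - 1);

    while (s->hash[h] != 0)
        h = (h + 1) & (HASH_SIZE - 1);
    s->hash[h] = oid;
}

static void
set_add(HybridSet *s, unsigned oid)
{
    s->items[s->nitems++] = oid;
    if (s->use_hash)
        hash_insert(s, oid);
    else if (s->nitems > HYBRID_THRESHOLD)
    {
        /* crossover: build the hash from the existing array once */
        memset(s->hash, 0, sizeof(s->hash));
        for (int i = 0; i < s->nitems; i++)
            hash_insert(s, s->items[i]);
        s->use_hash = true;
    }
}

static bool
set_member(const HybridSet *s, unsigned oid)
{
    if (!s->use_hash)
    {
        for (int i = 0; i < s->nitems; i++)     /* cheap when nitems is small */
            if (s->items[i] == oid)
                return true;
        return false;
    }
    for (unsigned h = oid & (HASH_SIZE - 1); s->hash[h] != 0;
         h = (h + 1) & (HASH_SIZE - 1))
        if (s->hash[h] == oid)
            return true;
    return false;
}

int
main(void)
{
    HybridSet   s = {0};

    for (unsigned oid = 1000; oid < 1100; oid++)
        set_add(&s, oid);
    printf("%d %d\n", set_member(&s, 1042), set_member(&s, 4242));  /* 1 0 */
    return 0;
}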

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] crashes due to setting max_parallel_workers=0

2017-03-27 Thread Robert Haas
On Mon, Mar 27, 2017 at 12:11 PM, David Rowley
 wrote:
> On 28 March 2017 at 04:57, Robert Haas  wrote:
>> On Sat, Mar 25, 2017 at 12:18 PM, Rushabh Lathia
>>  wrote:
>>> About the original issue reported by Tomas, I did more debugging and
>>> found that the problem was that gather_merge_clear_slots() was not returning
>>> the cleared slot when nreaders is zero (meaning nworkers_launched = 0).
>>> Because of that, the scan continued even after all the tuples were exhausted,
>>> and we ended up with a server crash in gather_merge_getnext(). In the patch
>>> I also added an Assert in gather_merge_getnext() that the index
>>> should be less than nreaders + 1 (the leader).
>>
>> Well, you and David Rowley seem to disagree on what the fix is here.
>> His patches posted upthread do A, and yours do B, and from a quick
>> look those things are not just different ways of spelling the same
>> underlying fix, but actually directly conflicting ideas about what the
>> fix should be.  Any chance you can review his patches, and maybe he
>> can review yours, and we could try to agree on a consensus position?
>> :-)
>
> When comparing Rushabh's to my minimal patch, both change
> gather_merge_clear_slots() to clear the leader's slot. My fix
> mistakenly changes things so it does ExecInitExtraTupleSlot() on the
> leader's slot, but seems that's not required since
> gather_merge_readnext() sets the leader's slot to the output of
> ExecProcNode(outerPlan). I'd ignore my minimal fix because of that
> mistake. Rushabh's patch sidesteps this by adding a conditional
> pfree() to not free a slot that we didn't allocate in the first place.
>
> I do think the code could be improved a bit. I don't like the way
> GatherMergeState's nreaders and nworkers_launched are always the same.
> I think this all threw me off a bit and may have been the reason for
> the bug in the first place.

Yeah, if those fields are always the same, then I think that they
should be merged.  That seems undeniably a good idea.  Possibly we
should fix the crash bug first, though, and then do that afterwards.
What bugs me a little about Rushabh's fix is that it looks like magic.
You have to know that we're looping over two things and freeing them
up, but there's one more of one thing than the other thing.  I think
that at least needs some comments or something.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Potential data loss of 2PC files

2017-03-27 Thread Teodor Sigaev

Thank you. One more question: what about symlinks? If a DBA moves, for
example, pg_xact to another disk and leaves the symlink in the data directory,
I suppose fsync on the symlink will actually do nothing.


I did not think of this case, but is that really common? There is even
no option to configure that at command level.

I saw a lot of such cases.


If something should be done in this area, that would be surely in
fsync_fname directly to centralize all the calls, and I would think of
that as a separate patch, and a separate discussion.

Agree

--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] crashes due to setting max_parallel_workers=0

2017-03-27 Thread Tom Lane
Robert Haas  writes:
> On Mon, Mar 27, 2017 at 9:54 AM, Tom Lane  wrote:
>> Since this has now come up twice, I suggest adding a comment there
>> that explains why we're intentionally ignoring max_parallel_workers.

> Good idea.  How about the attached?

WFM ... but seems like there should be some flavor of this statement
in the user-facing docs too (ie, "max_parallel_workers_per_gather >
max_parallel_workers is a bad idea unless you're trying to test what
happens when a plan can't get all the workers it planned for").  The
existing text makes some vague allusions suggesting that the two
GUCs might be interrelated, but I think it could be improved.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Potential data loss of 2PC files

2017-03-27 Thread Teodor Sigaev

Thank you, pushed

Michael Paquier wrote:

On Fri, Mar 24, 2017 at 11:36 PM, Teodor Sigaev  wrote:

And the renaming of pg_clog to pg_xact is also my fault. Attached is
an updated patch.



Thank you. One more question: what about symlinks? If a DBA moves, for
example, pg_xact to another disk and leaves the symlink in the data directory,
I suppose fsync on the symlink will actually do nothing.


I did not think of this case, but is that really common? There is even
no option to configure that at command level. And surely we cannot
support any fancy scenario that people figure out using PGDATA.
Existing callers of fsync_fname don't bother about this case as well
by the way, take the calls related to pg_logical and pg_repslot.

If something should be done in this area, that would be surely in
fsync_fname directly to centralize all the calls, and I would think of
that as a separate patch, and a separate discussion.



--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] segfault in hot standby for hash indexes

2017-03-27 Thread Ashutosh Sharma
>> >>
>> >> I think it would have been probably okay to use *int* for ntuples as
>> >> that matches with what you are actually assigning in the function.
>> >
>> > okay, corrected it. Attached is newer version of patch.
>> >
>>
>> Thanks, this version looks good to me.
>
>
> It solves the problem for me.

Great..Thanks for confirming.

> I'd like to test that I get the right answer
> on the standby, not just the absence of a crash, but I don't know how to do
> that effectively.  Has anyone used the new wal replay block consistency tool
> on hash indexes since this microvacuum code was committed?

Yes, I have used it for hash indexes, but only after the commit below:

commit 9abbf4727de746222ad8fc15b17348065389ae43
Author: Robert Haas 
Date:   Mon Mar 20 15:55:27 2017 -0400

Another fix for single-page hash index vacuum.

The WAL consistency checking code needed to be updated for the new
page status bit, but that didn't get done previously.

All I did was set wal_consistency_checking = 'hash' in
postgresql.conf and run the test cases on the primary server. If there is any
inconsistent block on the standby, the tool would probably terminate the
recovery process and you would see the following message in the server
logfile.

"inconsistent page found, rel %u/%u/%u, forknum %u, blkno %u"

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] crashes due to setting max_parallel_workers=0

2017-03-27 Thread Robert Haas
On Mon, Mar 27, 2017 at 12:26 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Mon, Mar 27, 2017 at 9:54 AM, Tom Lane  wrote:
>>> Since this has now come up twice, I suggest adding a comment there
>>> that explains why we're intentionally ignoring max_parallel_workers.
>
>> Good idea.  How about the attached?
>
> WFM ... but seems like there should be some flavor of this statement
> in the user-facing docs too (ie, "max_parallel_workers_per_gather >
> max_parallel_workers is a bad idea unless you're trying to test what
> happens when a plan can't get all the workers it planned for").  The
> existing text makes some vague allusions suggesting that the two
> GUCs might be interrelated, but I think it could be improved.

Do you have a more specific idea?  I mean, this seems like a
degenerate case of what the documentation for
max_parallel_workers_per_gather says already. Even if
max_parallel_workers_per_gather <= Min(max_worker_processes,
max_parallel_workers), it's quite possible that you'll regularly be
generating plans that can't obtain the budgeted number of workers.
The only thing that is really special about the case where
max_parallel_workers_per_gather > Min(max_worker_processes,
max_parallel_workers) is that this can happen even on an
otherwise-idle system.  I'm not quite sure how to emphasize that
without seeming to state the obvious.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] crashes due to setting max_parallel_workers=0

2017-03-27 Thread Rushabh Lathia
On Mon, Mar 27, 2017 at 9:52 PM, Robert Haas  wrote:

> On Mon, Mar 27, 2017 at 12:11 PM, David Rowley
>  wrote:
> > On 28 March 2017 at 04:57, Robert Haas  wrote:
> >> On Sat, Mar 25, 2017 at 12:18 PM, Rushabh Lathia
> >>  wrote:
> >>> About the original issue reported by Tomas, I did more debugging and
> >>> found that the problem was that gather_merge_clear_slots() was not returning
> >>> the cleared slot when nreaders is zero (meaning nworkers_launched = 0).
> >>> Because of that, the scan continued even after all the tuples were exhausted,
> >>> and we ended up with a server crash in gather_merge_getnext(). In the patch
> >>> I also added an Assert in gather_merge_getnext() that the index
> >>> should be less than nreaders + 1 (the leader).
> >>
> >> Well, you and David Rowley seem to disagree on what the fix is here.
> >> His patches posted upthread do A, and yours do B, and from a quick
> >> look those things are not just different ways of spelling the same
> >> underlying fix, but actually directly conflicting ideas about what the
> >> fix should be.  Any chance you can review his patches, and maybe he
> >> can review yours, and we could try to agree on a consensus position?
> >> :-)
> >
> > When comparing Rushabh's to my minimal patch, both change
> > gather_merge_clear_slots() to clear the leader's slot. My fix
> > mistakenly changes things so it does ExecInitExtraTupleSlot() on the
> > leader's slot, but seems that's not required since
> > gather_merge_readnext() sets the leader's slot to the output of
> > ExecProcNode(outerPlan). I'd ignore my minimal fix because of that
> > mistake. Rushabh's patch sidesteps this by adding a conditional
> > pfree() to not free a slot that we didn't allocate in the first place.
> >
> > I do think the code could be improved a bit. I don't like the way
> > GatherMergeState's nreaders and nworkers_launched are always the same.
> > I think this all threw me off a bit and may have been the reason for
> > the bug in the first place.
>
> Yeah, if those fields are always the same, then I think that they
> should be merged.  That seems undeniably a good idea.


Hmm, I agree that it's a good idea, and I will work on that as a separate patch.


> Possibly we
> should fix the crash bug first, though, and then do that afterwards.
> What bugs me a little about Rushabh's fix is that it looks like magic.
> You have to know that we're looping over two things and freeing them
> up, but there's one more of one thing than the other thing.  I think
> that at least needs some comments or something.
>
>
So in my second version of the patch I changed gather_merge_clear_slots() to
just clear the slots for the workers, plus some other cleanup. I also made
gather_merge_getnext() return NULL when all the queues and the heap are
exhausted, whereas earlier gather_merge_clear_slots() returned a cleared
slot for that purpose. This way we make sure that we don't end up freeing
the leader's slot, and gather_merge_getnext() doesn't need to depend on
that cleared slot.


Thanks,
Rushabh Lathia
www.EnterpriseDB.com


Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-27 Thread Andres Freund
On 2017-03-27 12:18:37 -0400, Tom Lane wrote:
> I wrote:
> > As to the point of whether it actually helps or not ...
> > on gcc 4.4.7 (RHEL 6), it makes things *WORSE*.  We go from about half of
> > the dispatches getting routed through a common location, to *all* of them
> > (except one; for some odd reason the first EEO_NEXT in EEOP_NULLIF
> > survives as a separate jump).  This seems like a bug, but there it is.
> 
> So after a bit of googling, this is a very longstanding complaint:
> 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=39284
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71785
> 
> (hm, think I know the second submitter)

I don't think that's precisely the same issue - as long as some of the
goto branches survive, we're not hitting the full brunt of the compgoto
thing.  I think we're essentially avoiding some of that because we're
"precomputing" the dispatch_table lookup.

To count the number of jumps I used:
gdb -batch -ex 'disassemble/s ExecInterpExpr' execExprInterp.o | \
    grep jmpq | grep -v ExecInterpExpr | wc -l
which'll only include indirect jumps (since otherwise jumps will look
like "jmpq   0x35f2 ")

                    standard flags    -fno-crossjumping
gcc-5 (5.4.1-8):          34                  82
gcc-6 (6.3.0-8):          34                  82
gcc-7 (7.0.1):            71                 108
gcc-snapshot:             72                 108

So that doesn't look too bad.


> My feeling at this point is that we might be better off disabling
> the computed-goto case by default.  At the very least, we're going
> to need a version check that restricts it to latest gcc.

In my measurements it's still faster in at least gcc-5/6, even without
the option (largely because it avoids array bounds checks on the jump
table built for the switch).

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

