date:20170518

[HACKERS] Typo in json.c

2017-05-18 Thread Daniel Gustafsson

Spotted while reading code, patch attached.

cheers ./daniel



typo-json.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Typo in json.c

2017-05-18 Thread Heikki Linnakangas


On 05/18/2017 10:17 AM, Daniel Gustafsson wrote:

Spotted while reading code, patch attached.


Applied, thanks.

- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] 10beta1/m68k: static assertion failed: "MAXALIGN too small to fit int32"

2017-05-18 Thread Heikki Linnakangas


On 05/17/2017 10:39 PM, Christoph Berg wrote:

Not sure if a lot of people still care about m68k, but it's still one
of the unofficial Debian ports (it used to be the first non-x86 port
done decades ago):

gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels 
-Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -g -g -O2 
-fdebug-prefix-map=/<>=. -fstack-protector-strong -Wformat -Werror=format-security 
-I/usr/include/mit-krb5 -no-pie -I../../../../src/include -I/<>/build/../src/include 
-Wdate-time -D_FORTIFY_SOURCE=2 -D_GNU_SOURCE -I/usr/include/libxml2  -I/usr/include/tcl8.6   -c -o slab.o 
/<>/build/../src/backend/utils/mmgr/slab.c
In file included from /<>/build/../src/include/postgres.h:47:0,
 from 
/<>/build/../src/backend/utils/mmgr/slab.c:53:
/<>/build/../src/backend/utils/mmgr/slab.c: In function 
'SlabContextCreate':
/<>/build/../src/include/c.h:753:7: error: static assertion failed: 
"MAXALIGN too small to fit int32"
  do { _Static_assert(condition, errmessage); } while(0)
   ^
/<>/build/../src/backend/utils/mmgr/slab.c:198:2: note: in 
expansion of macro 'StaticAssertStmt'
  StaticAssertStmt(MAXIMUM_ALIGNOF >= sizeof(int),
  ^~~~
: recipe for target 'slab.o' failed
make[5]: *** [slab.o] Error 1


If that's all that prevents it from working, by all means let's fix it. 
I think this should do it, although I don't have a system to test it on:


diff --git a/src/backend/utils/mmgr/slab.c b/src/backend/utils/mmgr/slab.c
index 0fcfcb4c78..e59154ddda 100644
--- a/src/backend/utils/mmgr/slab.c
+++ b/src/backend/utils/mmgr/slab.c
@@ -194,9 +194,9 @@ SlabContextCreate(MemoryContext parent,
 MAXALIGN(sizeof(SlabChunk)),
 "padding calculation in SlabChunk is 
wrong");

-   /* otherwise the linked list inside freed chunk isn't guaranteed to fit 
*/
-   StaticAssertStmt(MAXIMUM_ALIGNOF >= sizeof(int),
-"MAXALIGN too small to fit int32");
+   /* Make sure the linked list node fits inside a freed chunk */
+   if (chunkSize < sizeof(int))
+   chunkSize = sizeof(int);

/* chunk, including SLAB header (both addresses nicely aligned) */
fullChunkSize = MAXALIGN(sizeof(SlabChunk) + MAXALIGN(chunkSize));

It adds a few instructions to check that on all platforms, unless the 
compiler can optimize that away, but this is not performance critical.


I'll commit that, barring objections. If you can verify that it fixes 
the problem before that, that'd be great, otherwise I guess we'll find 
out some time after the commit.


- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] zajimavy clanek - par vychytavek v pg

2017-05-18 Thread Pavel Stehule

Ahoj

viz http://blog.cleverelephant.ca/2017/05/great-postgresql.html

Pavel

Re: [HACKERS] [bug fix] PG10: libpq doesn't connect to alternative hosts when some errors occur

2017-05-18 Thread Tsunakawa, Takayuki

From: pgsql-hackers-ow...@postgresql.org
> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Tsunakawa,
> Takayuki
> Done.  I'll examine whether we can use SQLSTATE.

I tried conceivable errors during connection.  Those SQLSTATEs are as follows:

[transient error (after which you may want to try next host)]
53300 FATAL:  too many connections for role "tuna"
57P03 FATAL:  the database system is starting up

[configuration error (after which you may give up connection without other 
hosts. Really?)]
55000 FATAL:  database "template0" is not currently accepting connections
3D000 FATAL:  database "aaa" does not exist
28000 FATAL:  no pg_hba.conf entry for host "::1", user "tunakawa", database 
"postgres", SSL off
28000 FATAL:  role "nouser" does not exist
28P01 FATAL:  password authentication failed for user "tuna"
28P01 DETAIL:  Password does not match for user "tuna".


I looked through the SQLSTATEs, and thought the below ones could possibly be 
returned during connection:

https://www.postgresql.org/docs/devel/static/errcodes-appendix.html

[transient error]
Class 08 - Connection Exception
Class 40 - Transaction Rollback 
Class 53 - Insufficient Resources 
Class 54 - Program Limit Exceeded 
Class 55 - Object Not In Prerequisite State 
Class 57 - Operator Intervention 
Class 58 - System Error (errors external to PostgreSQL itself) 
Class XX - Internal Error 

[configuration error]
Class 28 - Invalid Authorization Specification 
Class 3D - Invalid Catalog Name 
Class 42 - Syntax Error or Access Rule Violation 

So, how about trying connection to the next host when the class code is neither 
28, 3D, nor 42?

Honestly, I'm not happy with this approach, for a maintenance reason that 
others are worried about.  Besides, when the connection target is not postgres 
and returns invalid data, no SQLSTATE is available.  I'm sorry to repeat 
myself, but I believe PgJDBC's approach is practically good.  If you think the 
SQLSTATE is the only way to go, I will put up with it.  It would be 
disappointing if nothing is done.

Regards
Takayuki Tsunakawa



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Regression in join selectivity estimations when using foreign keys

2017-05-18 Thread David Rowley

I've been analyzing a reported regression case between a 9.5 plan and
a 9.6 plan. I tracked this down to the foreign key join selectivity
code, specifically the use_smallest_selectivity code which is applied
to outer joins where the referenced table is on the outer side of the
join.

A vastly simplified example case is:

create table fkest (a int, b int, c int unique, primary key(a,b));
create table fkest1 (a int, b int, primary key(a,b));

insert into fkest select x/10,x%10, x from generate_Series(1,400) x;
insert into fkest1 select x/10,x%10 from generate_Series(1,400) x;

alter table fkest1 add constraint fkest1_a_b_fkey foreign key (a,b)
references fkest;

analyze fkest;
analyze fkest1;

explain (costs on) select * from fkest f
left join fkest1 f1 on f.a = f1.a and f.b = f1.b
left join fkest1 f2 on f.a = f2.a and f.b = f2.b
left join fkest1 f3 on f.a = f3.a and f.b = f3.b
where f.c = 1;
 QUERY PLAN

 Hash Left Join  (cost=24.15..41.89 rows=996 width=36)
   Hash Cond: ((f.a = f3.a) AND (f.b = f3.b))
   ->  Hash Left Join  (cost=12.15..28.36 rows=100 width=28)
 Hash Cond: ((f.a = f2.a) AND (f.b = f2.b))
 ->  Nested Loop Left Join  (cost=0.15..16.21 rows=10 width=20)
   ->  Seq Scan on fkest f  (cost=0.00..8.00 rows=1 width=12)
 Filter: (c = 1)
   ->  Index Only Scan using fkest1_pkey on fkest1 f1
(cost=0.15..8.17 rows=1 width=8)
 Index Cond: ((a = f.a) AND (b = f.b))
 ->  Hash  (cost=6.00..6.00 rows=400 width=8)
   ->  Seq Scan on fkest1 f2  (cost=0.00..6.00 rows=400 width=8)
   ->  Hash  (cost=6.00..6.00 rows=400 width=8)
 ->  Seq Scan on fkest1 f3  (cost=0.00..6.00 rows=400 width=8)
(13 rows)

-- now drop the foreign key to simulate the behaviour pre-9.6

alter table fkest1 drop constraint fkest1_a_b_fkey;

explain (costs on) select * from fkest f
left join fkest1 f1 on f.a = f1.a and f.b = f1.b
left join fkest1 f2 on f.a = f2.a and f.b = f2.b
left join fkest1 f3 on f.a = f3.a and f.b = f3.b
where f.c = 1;
 QUERY PLAN

 Nested Loop Left Join  (cost=0.44..32.62 rows=1 width=36)
   ->  Nested Loop Left Join  (cost=0.29..24.41 rows=1 width=28)
 ->  Nested Loop Left Join  (cost=0.15..16.21 rows=1 width=20)
   ->  Seq Scan on fkest f  (cost=0.00..8.00 rows=1 width=12)
 Filter: (c = 1)
   ->  Index Only Scan using fkest1_pkey on fkest1 f1
(cost=0.15..8.17 rows=1 width=8)
 Index Cond: ((a = f.a) AND (b = f.b))
 ->  Index Only Scan using fkest1_pkey on fkest1 f2
(cost=0.15..8.17 rows=1 width=8)
   Index Cond: ((a = f.a) AND (b = f.b))
   ->  Index Only Scan using fkest1_pkey on fkest1 f3
(cost=0.15..8.17 rows=1 width=8)
 Index Cond: ((a = f.a) AND (b = f.b))
(11 rows)

The problem is that the use_smallest_selectivity in
get_foreign_key_join_selectivity() here is calculating 1/10 instead of
1/100 for each join level. Really it should be multiplying the
selectivities instead of taking the minimum selectivity, but even that
does not seem correct.

That bad estimate propagates up the join tree until some pretty insane
estimates come back.

The following is the offending code:

foreach(cell, removedlist)
{
RestrictInfo *rinfo = (RestrictInfo *) lfirst(cell);
Selectivity csel;

csel = clause_selectivity(root, (Node *) rinfo,
 0, jointype, sjinfo);
thisfksel = Min(thisfksel, csel);
}

This code is only applied to outer joins. I've studied this a bit and
generally, I'm unsure why we're treating outer joins in a special way
at all. Some comments indicate that this is in regards to NULLs, but
I'm not sure why we've code special code for outer joins and none for
INNER joins.

It's quite easy to show how this regresses in general in regards to
NULLs on INNER joins with:

create table part (partid int primary key);
create table sale (saleid int primary key, partid int references part);
insert into part select generate_series(1,1000);
insert into sale select x,NULL from generate_series(1,1000) x;
analyze part;
analyze sale;
explain analyze select * from part p inner join sale s on p.partid =
s.partid; -- using foreign key join estimations
QUERY PLAN
---
 Hash Join  (cost=27.50..55.12 rows=1000 width=12) (actual
time=0.733..0.733 rows=0 loops=1)
   Hash Cond: (s.partid = p.partid)
   ->  Seq Scan on sale s  (cost=0.00..15.00 rows=1000 width=8)
(actual time=0.018..0.332 rows=1000 loops=1)
   ->  Hash  (cost=15.00..15.00 rows=1000 width=4) (actual
time=0.336..0.336 rows=1000 loop

Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables

2017-05-18 Thread Ashutosh Bapat

On Thu, Apr 6, 2017 at 6:37 AM, Robert Haas  wrote:
>
>> There's a relevant comment in 0006, build_joinrel_partition_info()
>> (probably that name needs to change, but I will do that once we have
>> settled on design)
>> +   /*
>> +* Construct partition keys for the join.
>> +*
>> +* An INNER join between two partitioned relations is partition by key
>> +* expressions from both the relations. For tables A and B
>> partitioned by a and b
>> +* respectively, (A INNER JOIN B ON A.a = B.b) is partitioned by both A.a
>> +* and B.b.
>> +*
>> +* An OUTER join like (A LEFT JOIN B ON A.a = B.b) may produce rows with
>> +* B.b NULL. These rows may not fit the partitioning conditions imposed 
>> on
>> +* B.b. Hence, strictly speaking, the join is not partitioned by B.b.
>> +* Strictly speaking, partition keys of an OUTER join should include
>> +* partition key expressions from the OUTER side only. Consider a join 
>> like
>> +* (A LEFT JOIN B on (A.a = B.b) LEFT JOIN C ON B.b = C.c. If we do not
>> +* include B.b as partition key expression for (AB), it prohibits us from
>> +* using partition-wise join when joining (AB) with C as there is no
>> +* equi-join between partition keys of joining relations. But two NULL
>> +* values are never equal and no two rows from mis-matching partitions 
>> can
>> +* join. Hence it's safe to include B.b as partition key expression for
>> +* (AB), even though rows in (AB) are not strictly partitioned by B.b.
>> +*/
>>
>> I think that also needs to be reviewed carefully.
>
> The following passage from src/backend/optimizer/README seems highly relevant:
>
> ===
> The planner's treatment of outer join reordering is based on the following
> identities:
>
> 1.  (A leftjoin B on (Pab)) innerjoin C on (Pac)
> = (A innerjoin C on (Pac)) leftjoin B on (Pab)
>
> where Pac is a predicate referencing A and C, etc (in this case, clearly
> Pac cannot reference B, or the transformation is nonsensical).
>
> 2.  (A leftjoin B on (Pab)) leftjoin C on (Pac)
> = (A leftjoin C on (Pac)) leftjoin B on (Pab)
>
> 3.  (A leftjoin B on (Pab)) leftjoin C on (Pbc)
> = A leftjoin (B leftjoin C on (Pbc)) on (Pab)
>
> Identity 3 only holds if predicate Pbc must fail for all-null B rows
> (that is, Pbc is strict for at least one column of B).  If Pbc is not
> strict, the first form might produce some rows with nonnull C columns
> where the second form would make those entries null.
> ===
>
> In other words, I think your statement that null is never equal to
> null is a bit imprecise.  Somebody could certainly create an operator
> that is named "=" which returns true in that case, and then they could
> say, hey, two nulls are equal (when you use that operator).  The
> argument needs to be made in terms of the formal properties of the
> operator.

[.. some portion clipped .. ]

> The relevant logic is in have_partkey_equi_join:
>
> +   /* Skip clauses which are not equality conditions. */
> +   if (rinfo->hashjoinoperator == InvalidOid &&
> !rinfo->mergeopfamilies)
> +   continue;
>
> Actually, I think the hashjoinoperator test is formally and
> practically unnecessary here; lower down there is a test to see
> whether the partitioning scheme's operator family is a member of
> rinfo->mergeopfamilies, which will certainly fail if we got through
> this test with rinfo->mergeopfamilies == NIL just on the strength of
> rinfo->hashjoinoperator != InvalidOid.  So you can just bail out if
> rinfo->mergeopfamilies == NIL.  But the underlying point here is that
> the only thing you really know about the function is that it's got to
> be a strategy-3 operator in some btree opclass; if that guarantees
> strictness, then so be it -- but I wasn't able to find anything in the
> code or documentation off-hand that supports that contention, so we
> might need to think a bit more about why (or if) this is guaranteed to
> be true.
>
>> Partition-wise joins
>> may be happy including partition keys from all sides, but
>> partition-wise aggregates may not be, esp. when pushing complete
>> aggregation down to partitions. In that case, rows with NULL partition
>> key, which falls on nullable side of join, will be spread across
>> multiple partitions. Proabably, we should separate nullable and
>> non-nullable partition key expressions.
>
> I don't think I understand quite what you're getting at here.  Can you
> spell this out in more detail?  To push an aggregate down to
> partitions, you need the grouping key to match the applicable
> partition key, and the partition key shouldn't allow nulls in more
> than one place.  Now I think your point may be that outer join
> semantics could let them creep in there, e.g. SELECT b.x, sum(a.y)
> FROM a LEFT JOIN b ON a.x = b.x GROUP BY 1 -- which would indeed be a
> good test case for partitionwise aggregate.  I'd be inclined to think
> that we s

Re: [HACKERS] Cached plans and statement generalization

2017-05-18 Thread Konstantin Knizhnik




On 15.05.2017 18:31, Robert Haas wrote:

On Wed, May 10, 2017 at 12:11 PM, Konstantin Knizhnik
 wrote:

Robert, can you please explain why using TRY/CATCH is not safe here:

This is definitely not a safe way of using TRY/CATCH.

This has been discussed many, many times on this mailing list before,
and I don't really want to go over it again here.  We really need a
README or some documentation about this so that we don't have to keep
answering this same question again and again.

First of all I want to notice that new version of my patch is not using 
PG_TRY/PG_CATCH.

But I still want to clarify for myself whats wrong with this constructions.
I searched both hackers mailing list archive and world-wide using google 
but failed to find any references except of

sharing non-volatilie variables between try and catch blocks.
Can you please point me at the thread where this problem was discussed 
or just explain in few words the source of the problem?


From my own experience I found out that PG_TRY/PG_CATCH mechanism is 
not providing proper cleanup (unlike C++ exceptions).
If there are opened relations, catalog cache entries,... then throwing 
error will not release them.
It will cause no problems if error is handled in PostgresMain which 
aborts current transaction and releases all resources in any case.
But if I want to ignore this error and continue query execution, then 
warnings about resources leaks can be reported.

Is it want you mean by unsafety of PG_TRY/PG_CATCH constructions?

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] 10beta1/m68k: static assertion failed: "MAXALIGN too small to fit int32"

2017-05-18 Thread Christoph Berg

Re: Heikki Linnakangas 2017-05-18 
> I'll commit that, barring objections. If you can verify that it fixes the
> problem before that, that'd be great, otherwise I guess we'll find out some
> time after the commit.

Please go ahead, I don't think I have online access to a m68k machine.
(It got demoted to an unofficial port some time ago and the old Debian
porter machines got taken down).

Thanks,
Christoph


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] transition table behavior with inheritance appears broken (was: Declarative partitioning - another take)

2017-05-18 Thread Amit Langote

On 2017/05/18 7:13, Thomas Munro wrote:
> On Wed, May 17, 2017 at 7:42 PM, Thomas Munro
>  wrote:
>> On Wed, May 17, 2017 at 6:04 PM, Amit Langote
>>  wrote:
>>> targetRelInfo should instead be set to mtstate->rootResultRelInfo that was
>>> set in ExecInitModifyTable() as described above, IOW, as follows:
>>>
>>> /* Partitioned table. */
>>> if (mtstate->rootResultRelInfo != NULL)
>>> targetRelInfo = mtstate->rootResultRelInfo;
>>
>> Ah, I see.  Thank you.  Fixed in the attached.
> 
> Here's a post-pgindent rebase.

I read through the latest patch.  Some comments:

Do we need to update documentation?  Perhaps, some clarification on the
inheritance/partitioning behavior somewhere.

+typedef struct TriggerTransitionState
+{
...
+boolttf_delete_old_table;

Just curious: why ttf_?  TriggerTransition field?

-Assert((enrmd->reliddesc == InvalidOid) != (enrmd->tupdesc == NULL));
+Assert((enrmd->reliddesc == InvalidOid) !=
+   (enrmd->tupdesc == NULL));

Perhaps, unintentional change?

+original_tuple = tuple;
 map = mtstate->mt_partition_tupconv_maps[leaf_part_index];
 if (map)
 {
@@ -570,8 +572,17 @@ ExecInsert(ModifyTableState *mtstate,
 setLastTid(&(tuple->t_self));
 }

+/*
+ * If we inserted into a partitioned table, then insert routing logic may
+ * have converted the tuple to a partition's format.  Make the original
+ * unconverted tuple available for transition tables.
+ */
+if (mtstate->mt_transition_state != NULL)
+mtstate->mt_transition_state->original_insert_tuple = original_tuple;

I'm not sure if it's significant for transition tables, but what if a
partition's BR trigger modified the tuple?  Would we want to include the
modified version of the tuple in the transition table or the original as
the patch does?  Same for the code in CopyFrom().

  * 'tup_conv_maps' receives an array of TupleConversionMap objects with one
  *  entry for every leaf partition (required to convert input tuple based
  *  on the root table's rowtype to a leaf partition's rowtype after tuple
- *  routing is done
+ *  routing is done)

Oh, thanks! :)

Other than the above minor nitpicks, patch looks good to me.

Thanks,
Amit



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [bug fix] PG10: libpq doesn't connect to alternative hosts when some errors occur

2017-05-18 Thread Michael Paquier

On Thu, May 18, 2017 at 5:05 PM, Tsunakawa, Takayuki
 wrote:
> So, how about trying connection to the next host when the class code is 
> neither 28, 3D, nor 42?
>
> Honestly, I'm not happy with this approach, for a maintenance reason that 
> others are worried about.  Besides, when the connection target is not 
> postgres and returns invalid data, no SQLSTATE is available.  I'm sorry to 
> repeat myself, but I believe PgJDBC's approach is practically good.  If you 
> think the SQLSTATE is the only way to go, I will put up with it.  It would be 
> disappointing if nothing is done.

FWIW, I am of the opinion to not have an implementation based on any
SQLSTATE codes, as well as not doing something similar to JDBC.
Keeping things simple is one reason, a second is that the approach
taken by libpq is correct at its root.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] UPDATE of partition key

2017-05-18 Thread Amit Kapila

On Wed, May 17, 2017 at 4:05 PM, Dilip Kumar  wrote:
> On Wed, May 17, 2017 at 3:15 PM, Amit Kapila  wrote:
>>> Earlier I thought that option1 is better but later I think that this
>>> can complicate the situation as we are firing first BR update then BR
>>> delete and can change the row multiple time and defining such
>>> behaviour can be complicated.
>>>
>>
>> If we have to go by this theory, then the option you have preferred
>> will still execute BR triggers for both delete and insert, so input
>> row can still be changed twice.
>
> Yeah, right as per my theory above Option3 have the same problem.
>
> But after putting some more thought I realised that only for "Before
> Update" or the "Before Insert" trigger row can be changed.
>

Before Row Delete triggers can suppress the delete operation itself
which is kind of unintended in this case.  I think without the user
being aware it doesn't seem advisable to execute multiple BR triggers.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Re: [BUGS] BUG #14657: Server process segmentation fault in v10, May 10th dev snapshot

2017-05-18 Thread Sveinn Sveinsson

The patch fixed the problem, thanks a lot.
Regards,
Sveinn.

On fim 18.maí 2017 01:53, Amit Langote wrote:
> On 2017/05/18 10:49, Amit Langote wrote:
>> On 2017/05/18 2:14, Dilip Kumar wrote:
>>> On Wed, May 17, 2017 at 7:41 PM,   wrote:
 (gdb) bt
 #0  0x0061ab1b in list_nth ()
 #1  0x005e4081 in ExecLockNonLeafAppendTables ()
 #2  0x005f4d52 in ExecInitMergeAppend ()
 #3  0x005e0365 in ExecInitNode ()
 #4  0x005f35a7 in ExecInitLimit ()
 #5  0x005e00f3 in ExecInitNode ()
 #6  0x005dd207 in standard_ExecutorStart ()
 #7  0x006f96d2 in PortalStart ()
 #8  0x006f5c7f in exec_simple_query ()
 #9  0x006f6fac in PostgresMain ()
 #10 0x00475cdc in ServerLoop ()
 #11 0x00692ffa in PostmasterMain ()
 #12 0x00476600 in main ()
>> Thanks for the test case Sveinn and thanks Dilip for analyzing.
>>
>>> Seems like the issue is that the plans under multiple subroots are
>>> pointing to the same partitioned_rels.
>> That's correct.
>>
>>> If I am not getting it wrong "set_plan_refs(PlannerInfo *root, Plan
>>> *plan, int rtoffset)" the rtoffset is specific to the subroot. Now,
>>> problem is that set_plan_refs called for different subroot is updating
>>> the same partition_rel info and make this value completely wrong which
>>> will ultimately make ExecLockNonLeafAppendTables to access the out of
>>> bound "rte" index.
>> Yes.
>>
>>> set_plan_refs
>>> {
>>> [clipped]
>>> case T_MergeAppend:
>>> {
>>> [clipped]
>>>
>>> foreach(l, splan->partitioned_rels)
>>> {
>>>  lfirst_int(l) += rtoffset;
>>>
>>>
>>> I think the solution should be that create_merge_append_path make the
>>> copy of partitioned_rels list?
>> Yes, partitioned_rels should be copied.
>>
>>> Attached patch fixes the problem but I am not completely sure about the fix.
>> Thanks for creating the patch, although I think a better fix would be to
>> make get_partitioned_child_rels() do the list_copy.  That way, any other
>> users of partitioned_rels will not suffer the same issue.  Attached patch
>> implements that, along with a regression test.
>>
>> Added to the open items.
> Oops, forgot to cc -hackers.  Patch attached again.
>
> Thanks,
> Amit



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] transition table behavior with inheritance appears broken (was: Declarative partitioning - another take)

2017-05-18 Thread Kevin Grittner

On Thu, May 18, 2017 at 5:16 AM, Amit Langote
 wrote:

> Do we need to update documentation?  Perhaps, some clarification on the
> inheritance/partitioning behavior somewhere.

Yeah, I think so.

> -Assert((enrmd->reliddesc == InvalidOid) != (enrmd->tupdesc == NULL));
> +Assert((enrmd->reliddesc == InvalidOid) !=
> +   (enrmd->tupdesc == NULL));
>
> Perhaps, unintentional change?

Agreed; line is not long enough to need to wrap.

> I'm not sure if it's significant for transition tables, but what if a
> partition's BR trigger modified the tuple?  Would we want to include the
> modified version of the tuple in the transition table or the original as
> the patch does?  Same for the code in CopyFrom().

Good spot!  If the BR trigger on the child table modifies or
suppresses the action, I strongly feel that must be reflected in the
transition table.  This needs to be fixed.

--
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Disallowing multiple queries per PQexec()

2017-05-18 Thread Surafel Temesgen

hey Vaishnavi

>
> I think GUC's name can be something like "multiple_query_execution" and
> setting it ON/OFF will be better. I think others will also come up with
> some suggestions here as the current name doesn't go well with other
> existing GUCs.
>
Thank you very much for the suggestion multiple_query_execution is better
and can be set using ON/OFF or true/false as documented in
https://www.postgresql.org/docs/9.5/static/config-setting.html

Regards
Surafel

Re: [HACKERS] Hash Functions

2017-05-18 Thread Robert Haas

On Thu, May 18, 2017 at 1:53 AM, Jeff Davis  wrote:
> For instance, it makes little sense to have individual check
> constraints, indexes, permissions, etc. on a hash-partitioned table.
> It doesn't mean that we should necessarily forbid them, but it should
> make us question whether combining range and hash partitions is really
> the right design.

I think that it definitely makes sense to have individual indexes on a
hash-partitioned table.  If you didn't, then as things stand today,
you'd have no indexes at all, which can't be good.  In the future, we
might have some system where an index created on the parent cascades
down to all of the children, but even then, you might want to REINDEX
just one of those child indexes, or better yet, create a replacement
index concurrently and then drop the old one concurrently.  You might
also want to add the same sort of new index to every partition, but
not in a single operation - for reasons of load, length of maintenance
window, time for which a snapshot is held open, etc.

I agree that separate constraints and permissions on hash partitions
don't make much sense.  To a lesser extent, that's true of other kinds
of partitioning as well.  I mean, there is probably some use case for
setting separate permissions on a range-partitioned table, but it's a
pretty thin use case.  It certainly seems possible that many users
would prefer a rule that enforces uniform permissions across the
entire partitioning hierarchy.  This is one of the key things that had
to be decided in regard to the partitioning implementation we now
have: for which things should we enforce uniformity, and for which
things should we allow diversity?  I advocated for enforcing
uniformity only in areas where we could see a clear advantage to it,
which led to the fairly minimal approach of enforcing only that we had
no multiple inheritance and no extra columns in the children, but
that's certainly an arguable position.  Other people argued for more
restrictions, I believe out of a desire to create more administrative
simplicity, but there is a risk of cutting yourself off from useful
configurations there, and it seems very difficult to me to draw a hard
line between what is useful and what is useless.

For example, consider a hash-partitioned table.  Could it make sense
to have some but not all partitions be unlogged?  I think it could.
Suppose you have a cluster of machines each of which has a replica of
the same hash-partitioned table.  Each server uses logged tables for
the partitions for which it is the authoritative source of
information, and unlogged tables for the others.  In the event of
crash, the data for any tables that are lost are replicated from the
master for that machine.  I can think of some disadvantages of that
design, but I can think of some advantages, too, and I think it's
pretty hard to say that nobody should ever want to do it.  And if it's
legitimate to want to do that, then what if I want to use
trigger-based replication rather than logical replication?  Then I
might need triggers on some partitions but not all, or maybe different
triggers on different partitions.

Even for a permissions grant, suppose my production system is having
some problem that can't be replicated on the test data set.  Is it
reasonable to want to give a trusted developer access to a slice, but
not all of, my production data?  I could allow them access to just one
partition.  Maybe not a common desire, but is that enough reason to
ban it?  I'd say it's arguable.  I don't think that there are bright
lines around any of this stuff.  My experience with this area has led
me to give up on the idea of complete uniformity as impractical, and
instead look at it from the perspective of "what do we absolutely have
to ban in order for this to be sane?".

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] 10beta1 sequence regression failure on sparc64

2017-05-18 Thread Tom Lane

Christoph Berg  writes:
> *** 34,44 
> --- 34,47 
>   CREATE SEQUENCE sequence_test7 AS bigint;
>   CREATE SEQUENCE sequence_test8 AS integer MAXVALUE 10;
>   CREATE SEQUENCE sequence_test9 AS integer INCREMENT BY -1;
> + ERROR:  MINVALUE (-9223372036854775808) is out of range for sequence data 
> type integer
>   CREATE SEQUENCE sequence_test10 AS integer MINVALUE -10 START 1;
>   CREATE SEQUENCE sequence_test11 AS smallint;
>   CREATE SEQUENCE sequence_test12 AS smallint INCREMENT -1;
> + ERROR:  MINVALUE (-9223372036854775808) is out of range for sequence data 
> type smallint
>   CREATE SEQUENCE sequence_test13 AS smallint MINVALUE -32768;
>   CREATE SEQUENCE sequence_test14 AS smallint MAXVALUE 32767 INCREMENT -1;
> + ERROR:  MINVALUE (-9223372036854775808) is out of range for sequence data 
> type smallint
>   CREATE SEQUENCE sequence_testx AS text;
>   ERROR:  sequence type must be smallint, integer, or bigint
>   CREATE SEQUENCE sequence_testx AS nosuchtype;

Well, that's just wacko.  Somehow sequence.c's init_params() must
be falling down on the job in selecting the right seqform->seqmin,
but I'm darned if I see anything that's either wrong or potentially
machine-dependent in that code.  It almost looks like it must be
a compiler bug, though I hesitate to jump to that conclusion so
quickly.  Peter, any ideas?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Get stuck when dropping a subscription during synchronizing table

2017-05-18 Thread Robert Haas

On Wed, May 17, 2017 at 6:58 AM, Masahiko Sawada  wrote:
> I think the above changes can solve this issue but It seems to me that
> holding AccessExclusiveLock on pg_subscription by DROP SUBSCRIPTION
> until commit could lead another deadlock problem in the future. So I'd
> to contrive ways to reduce lock level somehow if possible. For
> example, if we change the apply launcher so that it gets the
> subscription list only when pg_subscription gets invalid, apply
> launcher cannot try to launch the apply worker being stopped. We
> invalidate pg_subscription at commit of DROP SUBSCRIPTION and the
> apply launcher can get new subscription list which doesn't include the
> entry we removed. That way we can reduce lock level to
> ShareUpdateExclusiveLock and solve this issue.
> Also in your patch, we need to change DROP SUBSCRIPTION as well to
> resolve another case I encountered, where DROP SUBSCRIPTION waits for
> apply worker while holding a tuple lock on pg_subscription_rel and the
> apply worker waits for same tuple on pg_subscription_rel in
> SetSubscriptionRelState().

I don't really understand the issue being discussed here in any
detail, but as a general point I'd say that it might be more
productive to make the locks finer-grained rather than struggling to
reduce the lock level.  For example, instead of locking all of
pg_subscription, use LockSharedObject() to lock the individual
subscription, still with AccessExclusiveLock.  That means that other
accesses to that subscription also need to take a lock so that you
actually get a conflict when there should be one, but that should be
doable.  I expect that trying to manage locking conflicts using only
catalog-wide locks is a doomed strategy.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] statement_timeout is not working as expected with postgres_fdw

2017-05-18 Thread Robert Haas

On Wed, May 17, 2017 at 6:57 AM, Amit Kapila  wrote:
> +1.  Why not similar behavior for any other statements executed in
> this module by do_sql_command?

The other cases are not quite the same situation.  It would be good to
accept interrupts in all cases, but there's no problem with a session
continuing to be used after a failure in configure_remote_session()
because the connection hasn't been entered in the hash table at that
point yet, and the TRY/CATCH block in connect_pg_server() ensures that
the connection also gets closed.  So we don't need to worry about
those statements leaving behind messed-up sessions; they won't; only
the transaction control commands have that part of the problem.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [bug fix] PG10: libpq doesn't connect to alternative hosts when some errors occur

2017-05-18 Thread Robert Haas

On Thu, May 18, 2017 at 7:06 AM, Michael Paquier
 wrote:
> FWIW, I am of the opinion to not have an implementation based on any
> SQLSTATE codes, as well as not doing something similar to JDBC.
> Keeping things simple is one reason, a second is that the approach
> taken by libpq is correct at its root.

Because why?

I was initially on the same side as you and Tom, but now I'm really
wavering.  What good is a feature that's supposed to find you a usable
connection if it sometimes decides not to find one?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Proposal: Improve bitmap costing for lossy pages

2017-05-18 Thread Robert Haas

On Thu, May 18, 2017 at 2:52 AM, Dilip Kumar  wrote:
> Most of the queries show decent improvement, however, Q14 shows
> regression at work_mem = 4MB. On analysing this case, I found that
> number of pages_fetched calculated by "Mackert and Lohman formula" is
> very high (1112817) compared to the actual unique heap pages fetched
> (293314). Therefore, while costing bitmap scan using 1112817 pages and
> 4MB of work_mem, we predicted that even after we lossify all the pages
> it can not fit into work_mem, hence bitmap scan was not selected.

You might need to adjust effective_cache_size.  The Mackert and Lohman
formula isn't exactly counting unique pages fetched.  It will count
the same page twice if it thinks the page will be evicted from the
cache after the first fetch and before the second one.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [Proposal] Allow users to specify multiple tables in VACUUM commands

2017-05-18 Thread Robert Haas

On Thu, May 18, 2017 at 2:45 AM, Masahiko Sawada  wrote:
> On Thu, May 18, 2017 at 3:19 PM, Michael Paquier
>  wrote:
>> On Thu, May 18, 2017 at 2:59 PM, Masahiko Sawada  
>> wrote:
>>> It seems to me that it's not good idea to forcibly set ANALYZE in
>>> spite of  ANALYZE option is not specified. One reason is that it would
>>> make us difficult to grep it from such as server log. I think It's
>>> better to use the same vacuum option to the all listed relations.
>>
>> Even now, if you use VACUUM without listing ANALYZE directly, with
>> relation listing a set of columns, then ANALYZE is implied.
>
> Oh.. I'd missed that behavior. Thanks!
>
>>  I agree
>> with your point that the same options should be used for all the
>> relations, and it seems to me that if at least one relation listed has
>> a column list, then ANALYZE should be implied for all relations.
>
> +1

Ugh, really?  Are we sure that the current behavior is anything other
than a bug?  The idea that VACUUM foo (a) implies ANALYZE doesn't
really sit very well with me in the first place.  I'd be more inclined
to reject that with an ERROR complaining that the column list can't be
specified except for ANALYZE.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [Proposal] Allow users to specify multiple tables in VACUUM commands

2017-05-18 Thread Tom Lane

Robert Haas  writes:
> Ugh, really?  Are we sure that the current behavior is anything other
> than a bug?  The idea that VACUUM foo (a) implies ANALYZE doesn't
> really sit very well with me in the first place.  I'd be more inclined
> to reject that with an ERROR complaining that the column list can't be
> specified except for ANALYZE.

Yeah, that's probably more sensible.  I think the rationale was "if you
specify columns you must want the ANALYZE option, so why make you type
that in explicitly?".   But I can see the argument that it's likely to
confuse users who might have a weaker grasp of the semantics.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Get stuck when dropping a subscription during synchronizing table

2017-05-18 Thread Noah Misch

On Mon, May 15, 2017 at 03:28:14AM +, Noah Misch wrote:
> On Mon, May 08, 2017 at 06:27:30PM +0900, Masahiko Sawada wrote:
> > I encountered a situation where DROP SUBSCRIPTION got stuck when
> > initial table sync is in progress. In my environment, I created
> > several tables with some data on publisher. I created subscription on
> > subscriber and drop subscription immediately after that. It doesn't
> > always happen but I often encountered it on my environment.
> > 
> > ps -x command shows the following.
> > 
> >  96796 ?Ss 0:00 postgres: masahiko postgres [local] DROP
> > SUBSCRIPTION
> >  96801 ?Ts 0:00 postgres: bgworker: logical replication
> > worker for subscription 40993waiting
> >  96805 ?Ss 0:07 postgres: bgworker: logical replication
> > worker for subscription 40993 sync 16418
> >  96806 ?Ss 0:01 postgres: wal sender process masahiko [local] 
> > idle
> >  96807 ?Ss 0:00 postgres: bgworker: logical replication
> > worker for subscription 40993 sync 16421
> >  96808 ?Ss 0:00 postgres: wal sender process masahiko [local] 
> > idle
> > 
> > The DROP SUBSCRIPTION process (pid 96796) is waiting for the apply
> > worker process (pid 96801) to stop while holding a lock on
> > pg_subscription_rel. On the other hand the apply worker is waiting for
> > acquiring a tuple lock on pg_subscription_rel needed for heap_update.
> > Also table sync workers (pid 96805 and 96807) are waiting for the
> > apply worker process to change their status.
> > 
> > Also, even when DROP SUBSCRIPTION is done successfully, the table sync
> > worker can be orphaned because I guess that the apply worker can exit
> > before change status of table sync worker.
> > 
> > I'm using 1f30295.
> 
> [Action required within three days.  This is a generic notification.]
> 
> The above-described topic is currently a PostgreSQL 10 open item.  Peter,
> since you committed the patch believed to have created it, you own this open
> item.  If some other commit is more relevant or if this does not belong as a
> v10 open item, please let us know.  Otherwise, please observe the policy on
> open item ownership[1] and send a status update within three calendar days of
> this message.  Include a date for your subsequent status update.  Testers may
> discover new open items at any time, and I want to plan to get them all fixed
> well in advance of shipping v10.  Consequently, I will appreciate your efforts
> toward speedy resolution.  Thanks.
> 
> [1] 
> https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com

IMMEDIATE ATTENTION REQUIRED.  This PostgreSQL 10 open item is past due for
your status update.  Please reacquaint yourself with the policy on open item
ownership[1] and then reply immediately.  If I do not hear from you by
2017-05-19 16:00 UTC, I will transfer this item to release management team
ownership without further notice.

[1] 
https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [Proposal] Allow users to specify multiple tables in VACUUM commands

2017-05-18 Thread Masahiko Sawada

On Fri, May 19, 2017 at 12:03 AM, Tom Lane  wrote:
> Robert Haas  writes:
>> Ugh, really?  Are we sure that the current behavior is anything other
>> than a bug?  The idea that VACUUM foo (a) implies ANALYZE doesn't
>> really sit very well with me in the first place.  I'd be more inclined
>> to reject that with an ERROR complaining that the column list can't be
>> specified except for ANALYZE.
>
> Yeah, that's probably more sensible.  I think the rationale was "if you
> specify columns you must want the ANALYZE option, so why make you type
> that in explicitly?".   But I can see the argument that it's likely to
> confuse users who might have a weaker grasp of the semantics.
>

I'd not known such VACUUM behavior so I was a bit surprised but
considering consistency with current behavior I thought that is not
bad idea. But complaining with error seems more sensible.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] 10beta1/m68k: static assertion failed: "MAXALIGN too small to fit int32"

2017-05-18 Thread Andres Freund

On 2017-05-18 10:48:48 +0300, Heikki Linnakangas wrote:
> If that's all that prevents it from working, by all means let's fix it. I
> think this should do it, although I don't have a system to test it on:

Yes, that's what I thought about doing too.


> It adds a few instructions to check that on all platforms, unless the
> compiler can optimize that away, but this is not performance critical.

Yea, that seems fairly harmless. Context creation is much more
heavyweight than those 2-3 instructions.


> I'll commit that, barring objections. If you can verify that it fixes the
> problem before that, that'd be great, otherwise I guess we'll find out some
> time after the commit.

lgtm.


Thanks!

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [POC] hash partitioning

2017-05-18 Thread Dilip Kumar

On Wed, May 17, 2017 at 2:07 PM, amul sul  wrote:
>> I would suggest "non-zero positive", since that's what we are using in
>> the documentation.
>>
>
> Understood, Fixed in the attached version.

Why non-zero positive?  We do support zero for the remainder right?

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP Patch: Precalculate stable functions, infrastructure v1

2017-05-18 Thread Marina Polyakova


Here's v2 of the patches. Changes from v1:


And here there's v3 of planning and execution: common executor steps for 
all types of cached expression.


--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres CompanyFrom 5e89221251670526eb2b5750868ac73eee48f10b Mon Sep 17 00:00:00 2001
From: Marina Polyakova 
Date: Mon, 15 May 2017 15:31:21 +0300
Subject: [PATCH 2/3] Precalculate stable functions, planning and execution v3

Now in Postgresql only immutable functions are precalculated; stable functions
are calculated for every row so in fact they don't differ from volatile
functions.

This patch includes:
- replacement nonvolatile functions and operators by appropriate cached
expressions
- planning and execution cached expressions
- regression tests
---
 src/backend/executor/execExpr.c|   37 +
 src/backend/executor/execExprInterp.c  |   37 +
 src/backend/optimizer/path/allpaths.c  |9 +-
 src/backend/optimizer/path/clausesel.c |   13 +
 src/backend/optimizer/plan/planagg.c   |1 +
 src/backend/optimizer/plan/planner.c   |   28 +
 src/backend/optimizer/util/clauses.c   |   55 +
 src/backend/utils/adt/ruleutils.c  |5 +
 src/include/executor/execExpr.h|   19 +
 src/include/optimizer/planner.h|3 +
 src/include/optimizer/tlist.h  |8 +-
 src/pl/plpgsql/src/pl_exec.c   |   10 +
 .../expected/precalculate_stable_functions.out | 2625 
 src/test/regress/serial_schedule   |1 +
 .../regress/sql/precalculate_stable_functions.sql  |  949 +++
 15 files changed, 3797 insertions(+), 3 deletions(-)
 create mode 100644 src/test/regress/expected/precalculate_stable_functions.out
 create mode 100644 src/test/regress/sql/precalculate_stable_functions.sql

diff --git a/src/backend/executor/execExpr.c b/src/backend/executor/execExpr.c
index 5a34a46..3c2068d 100644
--- a/src/backend/executor/execExpr.c
+++ b/src/backend/executor/execExpr.c
@@ -865,6 +865,43 @@ ExecInitExprRec(Expr *node, PlanState *parent, ExprState *state,
 break;
 			}
 
+		case T_CachedExpr:
+			{
+int 		adjust_jump;
+
+/*
+ * Allocate and fill scratch memory used by all steps of
+ * CachedExpr evaluation.
+ */
+scratch.d.cachedexpr.isExecuted = (bool *) palloc(sizeof(bool));
+scratch.d.cachedexpr.resnull = (bool *) palloc(sizeof(bool));
+scratch.d.cachedexpr.resvalue = (Datum *) palloc(sizeof(Datum));
+
+*scratch.d.cachedexpr.isExecuted = false;
+*scratch.d.cachedexpr.resnull = false;
+*scratch.d.cachedexpr.resvalue = (Datum) 0;
+scratch.d.cachedexpr.jumpdone = -1;
+
+/* add EEOP_CACHEDEXPR_IF_CACHED step */
+scratch.opcode = EEOP_CACHEDEXPR_IF_CACHED;
+ExprEvalPushStep(state, &scratch);
+adjust_jump = state->steps_len - 1;
+
+/* add subexpression steps */
+ExecInitExprRec((Expr *) get_subexpr((CachedExpr *) node),
+parent, state, resv, resnull);
+
+/* add EEOP_CACHEDEXPR_SUBEXPR_END step */
+scratch.opcode = EEOP_CACHEDEXPR_SUBEXPR_END;
+ExprEvalPushStep(state, &scratch);
+
+/* adjust jump target */
+state->steps[adjust_jump].d.cachedexpr.jumpdone =
+	state->steps_len;
+
+break;
+			}
+
 		case T_ScalarArrayOpExpr:
 			{
 ScalarArrayOpExpr *opexpr = (ScalarArrayOpExpr *) node;
diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c
index fed0052..ac7b7f8 100644
--- a/src/backend/executor/execExprInterp.c
+++ b/src/backend/executor/execExprInterp.c
@@ -279,6 +279,7 @@ ExecInterpExpr(ExprState *state, ExprContext *econtext, bool *isnull)
 	TupleTableSlot *innerslot;
 	TupleTableSlot *outerslot;
 	TupleTableSlot *scanslot;
+	MemoryContext oldContext;	/* for EEOP_CACHEDEXPR_* */
 
 	/*
 	 * This array has to be in the same order as enum ExprEvalOp.
@@ -309,6 +310,8 @@ ExecInterpExpr(ExprState *state, ExprContext *econtext, bool *isnull)
 		&&CASE_EEOP_FUNCEXPR_STRICT,
 		&&CASE_EEOP_FUNCEXPR_FUSAGE,
 		&&CASE_EEOP_FUNCEXPR_STRICT_FUSAGE,
+		&&CASE_EEOP_CACHEDEXPR_IF_CACHED,
+		&&CASE_EEOP_CACHEDEXPR_SUBEXPR_END,
 		&&CASE_EEOP_BOOL_AND_STEP_FIRST,
 		&&CASE_EEOP_BOOL_AND_STEP,
 		&&CASE_EEOP_BOOL_AND_STEP_LAST,
@@ -721,6 +724,40 @@ ExecInterpExpr(ExprState *state, ExprContext *econtext, bool *isnull)
 			EEO_NEXT();
 		}
 
+		EEO_CASE(EEOP_CACHEDEXPR_IF_CACHED)
+		{
+			if (*op->d.cachedexpr.isExecuted)
+			{
+/* use saved result and skip subexpression evaluation */
+*op->resnull = *op->d.cachedexpr.resnull;
+if (!(*op->resnull))
+	*op->resvalue = *op->d.cachedexpr.resvalue;
+
+EEO_JUMP(op->d.cachedexpr.jumpdone);
+			}
+
+			/*
+			 * Switch per-query memory context for subexpression evaluation.
+			 * It is necessary to save result between all tuples.
+			 */
+			oldContext = MemoryContextSwitchTo(econtext->ec

Re: [HACKERS] [COMMITTERS] pgsql: Preventive maintenance in advance of pgindent run.

2017-05-18 Thread Tom Lane

Piotr Stefaniak  writes:
> That, I assume, would be me. Coincidentally, I'm about to push my fixes
> upstream (FreeBSD). Before that happens, my changes can be obtained from
> https://github.com/pstef/freebsd_indent and tested, if anyone wishes.

I spent a little bit of time on portability testing, because we are
certainly going to insist that this tool be portable to more than
just FreeBSD.  Things are not very good as it stands:

* Makefile is 100% BSD-specific.  Possibly we could just agree to
disagree on that point, and include a PG-style makefile that is not
like upstream's.  I attach the one I used for test purposes.

* __FBSDID() macros fail to compile anywhere else than FreeBSD.
Couldn't you hide those inside #if 0, as you're already doing for
the ancient sccsid strings?

* Please put the copyright[] string in indent.c inside #if 0 too,
as that draws unreferenced-variable warnings on some compilers.

* There's one use of bcopy(), please replace with memmove().

* References to  and  are problematic, as both
are BSD-isms not found in POSIX.  They seem to be available anyway
on Linux, but I bet not on Windows or SysV-tradition boxes.
I removed the  includes and didn't see any problems,
but removing  exposes calls to err() and errx(), which
we'd have to find replacements for.  Maybe just change them to
standard-conforming printf() + exit()?

regards, tom lane

#-
#
# Makefile for src/tools/freebsd_indent
#
# Copyright (c) 2003-2017, PostgreSQL Global Development Group
#
# src/tools/freebsd_indent/Makefile
#
#-

subdir = src/tools/freebsd_indent
top_builddir = ../../..
include $(top_builddir)/src/Makefile.global

override CPPFLAGS := -I. -I$(srcdir) $(CPPFLAGS)

OBJS= args.o indent.o io.o lexi.o parse.o pr_comment.o

all: freebsd_indent

freebsd_indent: $(OBJS)
	$(CC) $(CFLAGS) $(OBJS) $(LDFLAGS) $(LDFLAGS_EX) -o $@$(X)

install: all installdirs
	$(INSTALL_PROGRAM) freebsd_indent$(X) '$(DESTDIR)$(bindir)/freebsd_indent$(X)'

installdirs:
	$(MKDIR_P) '$(DESTDIR)$(bindir)'

uninstall:
	rm -f '$(DESTDIR)$(bindir)/freebsd_indent$(X)'

clean distclean maintainer-clean:
	rm -f freebsd_indent$(X) $(OBJS)

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP Patch: Precalculate stable functions, infrastructure v1

2017-05-18 Thread Andres Freund

Hi,

On 2017-05-18 19:00:09 +0300, Marina Polyakova wrote:
> > Here's v2 of the patches. Changes from v1:
> 
> And here there's v3 of planning and execution: common executor steps for all
> types of cached expression.

I've not followed this thread, but just scanned this quickly because it
affects execExpr* stuff.

> + case T_CachedExpr:
> + {
> + int adjust_jump;
> +
> + /*
> +  * Allocate and fill scratch memory used by all 
> steps of
> +  * CachedExpr evaluation.
> +  */
> + scratch.d.cachedexpr.isExecuted = (bool *) 
> palloc(sizeof(bool));
> + scratch.d.cachedexpr.resnull = (bool *) 
> palloc(sizeof(bool));
> + scratch.d.cachedexpr.resvalue = (Datum *) 
> palloc(sizeof(Datum));
> +
> + *scratch.d.cachedexpr.isExecuted = false;
> + *scratch.d.cachedexpr.resnull = false;
> + *scratch.d.cachedexpr.resvalue = (Datum) 0;

Looks like having something like struct CachedExprState would be better,
than these separate allocations?  That also allows to aleviate some size
concerns when adding new fields (see below)

> @@ -279,6 +279,7 @@ ExecInterpExpr(ExprState *state, ExprContext *econtext, 
> bool *isnull)
>   TupleTableSlot *innerslot;
>   TupleTableSlot *outerslot;
>   TupleTableSlot *scanslot;
> + MemoryContext oldContext;   /* for EEOP_CACHEDEXPR_* */

I'd rather not have this on function scope - a) the stack pressure in
ExecInterpExpr is quite noticeable in profiles already b) this is going
to trigger warnings because of unused vars, because the compiler doesn't
understand that EEOP_CACHEDEXPR_IF_CACHED always follows
EEOP_CACHEDEXPR_SUBEXPR_END.

How about instead storing oldcontext in the expression itself?

I'm also not sure how acceptable it is to just assume it's ok to leave
stuff in per_query_memory, in some cases that could prove to be
problematic.

> + case T_CachedExpr:
> + {
> + CachedExpr *cachedexpr = (CachedExpr *) node;
> + Node   *new_subexpr = 
> eval_const_expressions_mutator(
> + get_subexpr(cachedexpr), context);
> + CachedExpr *new_cachedexpr;
> +
> + /*
> +  * If unsafe transformations are used cached 
> expression should
> +  * be always simplified.
> +  */
> + if (context->estimate)
> + Assert(IsA(new_subexpr, Const));
> +
> + if (IsA(new_subexpr, Const))
> + {
> + /* successfully simplified it */
> + return new_subexpr; 
> + }
> + else
> + {
> + /*
> +  * The expression cannot be simplified 
> any further, so build
> +  * and return a replacement CachedExpr 
> node using the
> +  * possibly-simplified arguments of 
> subexpression.
> +  */

Is this actually a meaningful path?  Shouldn't always have done const
evaluation before adding CachedExpr's?

Greetings,

Andres Freund

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Cached plans and statement generalization

2017-05-18 Thread Andres Freund

On 2017-05-18 11:57:57 +0300, Konstantin Knizhnik wrote:
> From my own experience I found out that PG_TRY/PG_CATCH mechanism is not
> providing proper cleanup (unlike C++ exceptions).

Right, simply because there's no portable way to transparently do so.
Would be possible on elf glibc platforms, but ...

> If there are opened relations, catalog cache entries,... then throwing error
> will not release them.
> It will cause no problems if error is handled in PostgresMain which aborts
> current transaction and releases all resources in any case.
> But if I want to ignore this error and continue query execution, then
> warnings about resources leaks can be reported.
> Is it want you mean by unsafety of PG_TRY/PG_CATCH constructions?

There's worse than just leaking resources.  Everything touching the
database might cause persistent corruption if you don't roll back.
Consider an insert that failed with a foreign key exception, done from
some function.  If you ignore that error, the row will still be visible,
but the foreign key will be violated.   If you want to continue after a
PG_CATCH you have to use a subtransaction/savepoint for the PG_TRY
contents, like several PLs do.

- Andres

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] NOT NULL constraints on range partition key columns

2017-05-18 Thread Robert Haas

On Tue, May 16, 2017 at 8:19 PM, Amit Langote
 wrote:
> Thanks for the review.  I updated the comments.

I found several more places that also needed to be updated using 'git
grep'.  Committed with various additions.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] 10beta1 sequence regression failure on sparc64

2017-05-18 Thread Andres Freund

On 2017-05-18 10:11:48 -0400, Tom Lane wrote:
> Christoph Berg  writes:
> > *** 34,44 
> > --- 34,47 
> >   CREATE SEQUENCE sequence_test7 AS bigint;
> >   CREATE SEQUENCE sequence_test8 AS integer MAXVALUE 10;
> >   CREATE SEQUENCE sequence_test9 AS integer INCREMENT BY -1;
> > + ERROR:  MINVALUE (-9223372036854775808) is out of range for sequence data 
> > type integer
> >   CREATE SEQUENCE sequence_test10 AS integer MINVALUE -10 START 1;
> >   CREATE SEQUENCE sequence_test11 AS smallint;
> >   CREATE SEQUENCE sequence_test12 AS smallint INCREMENT -1;
> > + ERROR:  MINVALUE (-9223372036854775808) is out of range for sequence data 
> > type smallint
> >   CREATE SEQUENCE sequence_test13 AS smallint MINVALUE -32768;
> >   CREATE SEQUENCE sequence_test14 AS smallint MAXVALUE 32767 INCREMENT -1;
> > + ERROR:  MINVALUE (-9223372036854775808) is out of range for sequence data 
> > type smallint
> >   CREATE SEQUENCE sequence_testx AS text;
> >   ERROR:  sequence type must be smallint, integer, or bigint
> >   CREATE SEQUENCE sequence_testx AS nosuchtype;
> 
> Well, that's just wacko.  Somehow sequence.c's init_params() must
> be falling down on the job in selecting the right seqform->seqmin,
> but I'm darned if I see anything that's either wrong or potentially
> machine-dependent in that code.  It almost looks like it must be
> a compiler bug, though I hesitate to jump to that conclusion so
> quickly.  Peter, any ideas?

Weird.  Christoph, IIRC you can help gaining access to a porter machine
to reproduce this?

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] 10beta1 sequence regression failure on sparc64

2017-05-18 Thread Peter Eisentraut

On 5/18/17 10:11, Tom Lane wrote:
> Well, that's just wacko.  Somehow sequence.c's init_params() must
> be falling down on the job in selecting the right seqform->seqmin,
> but I'm darned if I see anything that's either wrong or potentially
> machine-dependent in that code.  It almost looks like it must be
> a compiler bug, though I hesitate to jump to that conclusion so
> quickly.  Peter, any ideas?

If we had a typo or something in that code, the build farm should have
caught it by now.

I would try compiling with lower -O and see what happens.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] different column orders in regression test database

2017-05-18 Thread Peter Eisentraut

When you dump out the regression test database and load it back in, a
few tables end up with different column orders:

Original:

 Table "public.f_star"
 Column | Type | Collation | Nullable | Default
+--+---+--+-
 class  | character(1) |   |  |
 aa | integer  |   |  |
 cc | name |   |  |
 ee | smallint |   |  |
 ff | polygon  |   |  |
 f  | integer  |   |  |
 e  | integer  |   |  |
 a  | text |   |  |

Reloaded:

 Table "public.f_star"
 Column | Type | Collation | Nullable | Default
+--+---+--+-
 class  | character(1) |   |  |
 aa | integer  |   |  |
 a  | text |   |  |
 cc | name |   |  |
 ee | smallint |   |  |
 e  | integer  |   |  |
 ff | polygon  |   |  |
 f  | integer  |   |  |

This table is part of a lengthy inheritance chain, so this might be
intentional or too hard to fix.  This behavior goes back to 9.2 and
possibly further.

But this is a bit more suspicious:

Original:

 Table "public.mlparted11"
 Column |  Type   | Collation | Nullable | Default
+-+---+--+-
 b  | integer |   | not null |
 a  | integer |   | not null |
Partition of: mlparted1 FOR VALUES FROM (2) TO (5)

Reloaded:

 Table "public.mlparted11"
 Column |  Type   | Collation | Nullable | Default
+-+---+--+-
 a  | integer |   | not null |
 b  | integer |   | not null |
Partition of: mlparted1 FOR VALUES FROM (2) TO (5)

The same applies for other tables in this partitioning group:
public.mlparted12, public.mlparted2, public.mlparted4

But the root table public.mlparted matches on both sides.

While you can create all kinds of dubious messes with general
inheritance, this should probably not be allowed to happen in the
restricted setting of partitioning.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] 10beta1/m68k: static assertion failed: "MAXALIGN too small to fit int32"

2017-05-18 Thread Heikki Linnakangas


On 05/18/2017 12:31 PM, Christoph Berg wrote:

Re: Heikki Linnakangas 2017-05-18 

I'll commit that, barring objections. If you can verify that it fixes the
problem before that, that'd be great, otherwise I guess we'll find out some
time after the commit.


Please go ahead, I don't think I have online access to a m68k machine.
(It got demoted to an unofficial port some time ago and the old Debian
porter machines got taken down).


Ok, pushed, let's see if the port machine likes it.

- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] 10beta1 sequence regression failure on sparc64

2017-05-18 Thread Christoph Berg

Re: Peter Eisentraut 2017-05-18 
<7a4d3b0f-78da-2a5b-7f3b-8b3509c1e...@2ndquadrant.com>
> If we had a typo or something in that code, the build farm should have
> caught it by now.
> 
> I would try compiling with lower -O and see what happens.

Trying -O0 now.

Re: Andres Freund 2017-05-18 <20170518184811.7c44jcvwjvnzc...@alap3.anarazel.de>
> Weird.  Christoph, IIRC you can help gaining access to a porter machine
> to reproduce this?

I'll let the build run over night (the machine has 32 cores but is
dead slow) and see what I can arrange tomorrow. (Peter will already
have access, it's notker.debian.net.)

Christoph


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] different column orders in regression test database

2017-05-18 Thread Tom Lane

Peter Eisentraut  writes:
> When you dump out the regression test database and load it back in, a
> few tables end up with different column orders:
> ...
> This table is part of a lengthy inheritance chain, so this might be
> intentional or too hard to fix.  This behavior goes back to 9.2 and
> possibly further.

Yeah, the variation in f_star and friends is intentional (and very
very old).

Can't say about the partition tests though.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] different column orders in regression test database

2017-05-18 Thread Thomas Munro

On Fri, May 19, 2017 at 7:21 AM, Peter Eisentraut
 wrote:
> But this is a bit more suspicious:
>
> Original:
>
>  Table "public.mlparted11"
>  Column |  Type   | Collation | Nullable | Default
> +-+---+--+-
>  b  | integer |   | not null |
>  a  | integer |   | not null |
> Partition of: mlparted1 FOR VALUES FROM (2) TO (5)
>
> Reloaded:
>
>  Table "public.mlparted11"
>  Column |  Type   | Collation | Nullable | Default
> +-+---+--+-
>  a  | integer |   | not null |
>  b  | integer |   | not null |
> Partition of: mlparted1 FOR VALUES FROM (2) TO (5)
>
> The same applies for other tables in this partitioning group:
> public.mlparted12, public.mlparted2, public.mlparted4
>
> But the root table public.mlparted matches on both sides.
>
> While you can create all kinds of dubious messes with general
> inheritance, this should probably not be allowed to happen in the
> restricted setting of partitioning.

That's because if you attach a partition with a different column
ordering, pg_dump dumps it with a normal CREATE TABLE ... PARTITION OF
... command, so the ordering it lost.

Example:

create table p (a int, b int) partition by list (a);
create table c (b int, a int);
alter table p attach partition c for values in (42);

Then "c" is dumped as:

CREATE TABLE c PARTITION OF p
FOR VALUES IN (42);

If you wanted to preserve column orders for partitions I guess you'd
have to teach to to detect the difference (ignoring dropped columns?)
and generate the two step create-and-attach commands.

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: pgindent (was Re: [HACKERS] [COMMITTERS] pgsql: Preventive maintenance in advance of pgindent run.)

2017-05-18 Thread Piotr Stefaniak

On 2017-05-17 23:46, Tom Lane wrote:
> I hacked around that by putting back a detab/entab step at the end
> using the existing subroutines in pgindent.  That about halved the
> size of the diff, but it's still too big to post.  Much of what
> I'm seeing with this version is randomly different decisions about
> how far to indent comments

pgindent doesn't set the -c indent option ("The column  in which comments
on code start."), so indent uses the default value of 33 (meaning column
32). If the code pushes the comment further than column 32, indent only
places a single tab between the two just to separate them.

This, given 4-column tabs, should result in placing the comment on
bitSize[INDEX_MAX_KEYS]; from your example onto column 44 - which the
newer version of indent does (if you tell it -ts4), unlike the older
one. I think that this is an improvement.

> It does seem to be handling formatting around sizeof() calls a lot better
> than the old code, as well as function pointer typedefs.  So those are
> huge wins.  But can we avoid the changes mentioned above?  I'd like the
> new version to only differ in ways that are clear improvements.

I don't know how to avoid the improvement. Try removing -ts4 as well as
putting back detab+entab.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [COMMITTERS] pgsql: Preventive maintenance in advance of pgindent run.

2017-05-18 Thread Piotr Stefaniak

On 2017-05-18 18:13, Tom Lane wrote:
> Piotr Stefaniak  writes:
>> That, I assume, would be me. Coincidentally, I'm about to push my fixes
>> upstream (FreeBSD). Before that happens, my changes can be obtained from
>> https://github.com/pstef/freebsd_indent and tested, if anyone wishes.
> 
> I spent a little bit of time on portability testing, because we are
> certainly going to insist that this tool be portable to more than
> just FreeBSD.  Things are not very good as it stands:
> 
> * Makefile is 100% BSD-specific.  Possibly we could just agree to
> disagree on that point, and include a PG-style makefile that is not
> like upstream's.  I attach the one I used for test purposes.

This would have to live outside of FreeBSD for obvious (or not) reasons.
Most likely as a part of pg_bsd_indent.  I use the original ("BSD")
Makefile on Linux, feeding it to bmake(1). But I don't expect bmake to
become a requirement for pg_bsd_indent.

> * __FBSDID() macros fail to compile anywhere else than FreeBSD.
> Couldn't you hide those inside #if 0, as you're already doing for
> the ancient sccsid strings?

The use of __FBSDID macro won't be going anywhere from FreeBSD indent,
I'm afraid. But I think it could be if-def'd out under systems other
than FreeBSD.

> * Please put the copyright[] string in indent.c inside #if 0 too,
> as that draws unreferenced-variable warnings on some compilers.
> 
> * There's one use of bcopy(), please replace with memmove().

These could probably be done upstream. I'd like to convert the strings
to comments.

> * References to  and  are problematic, as both
> are BSD-isms not found in POSIX.  They seem to be available anyway
> on Linux, but I bet not on Windows or SysV-tradition boxes.
> I removed the  includes and didn't see any problems,
> but removing  exposes calls to err() and errx(), which
> we'd have to find replacements for.  Maybe just change them to
> standard-conforming printf() + exit()?

I'll look into why indent includes sys/cdefs.h.  I don't expect to be
allowed to replace err() and errx() with equivalent solutions based on
ISO C and POSIX. I fear the idea of detecting err*() support prior to
compilation... Probably a much simpler solution would be to replace
err*() with equivalent solutions in the PG's fork of indent.  printf()
and exit() would probably be insufficient, you'd also need strerror(), I
think.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [BUGS] Concurrent ALTER SEQUENCE RESTART Regression

2017-05-18 Thread Andres Freund

On 2017-05-15 10:34:02 -0400, Peter Eisentraut wrote:
> On 5/10/17 09:12, Michael Paquier wrote:
> > Looking at 0001 and 0002... So you are correctly blocking nextval()
> > when ALTER SEQUENCE holds a lock on the sequence object. And
> > concurrent calls of nextval() don't conflict. As far as I can see this
> > matches the implementation of 3.
> > 
> > Here are some minor comments.
> 
> Committed after working in your comments.  Thanks!

There's still weird behaviour, unfortunately.  If you do an ALTER
SEQUENCE changing minval/maxval w/ restart in a transaction, and abort,
you'll a) quite possibly not be able to use the sequence anymore,
because it may of bounds b) DDL still isn't transactional.

At the very least that'd need to be documented.

- Andres

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: pgindent (was Re: [HACKERS] [COMMITTERS] pgsql: Preventive maintenance in advance of pgindent run.)

2017-05-18 Thread Tom Lane

Piotr Stefaniak  writes:
> On 2017-05-17 23:46, Tom Lane wrote:
>> ... Much of what
>> I'm seeing with this version is randomly different decisions about
>> how far to indent comments

> pgindent doesn't set the -c indent option ("The column in which comments
> on code start."), so indent uses the default value of 33 (meaning column
> 32). If the code pushes the comment further than column 32, indent only
> places a single tab between the two just to separate them.

Well, actually what it does is to push the comment to what it thinks is
the next tab stop.  So the core issue here is that the comments get
formatted differently for -ts4 than -ts8.  I think that's arguably a bug;
the width of tabs should only affect how whitespace is stored, not
formatting decisions.  Don't suppose you'd like to introduce a separate
parameter that defines the extra-indentation step for comments?

> This, given 4-column tabs, should result in placing the comment on
> bitSize[INDEX_MAX_KEYS]; from your example onto column 44 - which the
> newer version of indent does (if you tell it -ts4), unlike the older
> one. I think that this is an improvement.

It may or may not be an improvement, but right now what I want is to see
what this version of indent does differently, with as few extraneous
changes as possible.  We can argue later about whether we're willing to
accept gratuitous comment reformatting, but when one can't even find the
positive changes in amongst the noise, the chances of getting this accepted
are not good.

> I don't know how to avoid the improvement. Try removing -ts4 as well as
> putting back detab+entab.

I tried that but it did not produce as good a match to the old results as
what I'd previously arrived at by trial and error, which was to hack
pr_comment() like this:

@@ -148,7 +151,9 @@ pr_comment(void)
else
ps.com_col = ps.decl_on_line || ps.ind_level == 0
? ps.decl_com_ind : ps.com_ind;
-   if (ps.com_col <= target_col)
+   if (ps.com_col < target_col)
+   ps.com_col = 8 * (1 + (target_col - 1) / 8) + 1;
+   else if (ps.com_col == target_col)
ps.com_col = tabsize * (1 + (target_col - 1) / tabsize) + 1;
if (ps.com_col + 24 > adj_max_col)
adj_max_col = ps.com_col + 24;

I'm not really sure why the old behavior seems to be to move only 4 spaces
when right at the boundary, but there you have it.

I also found that there was extra spacing getting inserted for some cases
like

case afairlylonglabel:  /* comment */

which I eventually tracked down to the fact that this bit:

/*
 * turn everything so far into a label
 */
{
int len = e_code - s_code;

CHECK_SIZE_LAB(len + 3);
memcpy(e_lab, s_code, len);
e_lab += len;
*e_lab++ = ':';
*e_lab++ = ' ';
*e_lab = '\0';
e_code = s_code;
}

is inserting an extra space into the "lab" string, causing pr_comment()
to think that the label extends one character to the right of where
it really does, so that it moves the comment over when it need not.
I am not sure why it's like that, but compensating for it in pr_comment()
like this improved matters:

@@ -137,8 +136,12 @@ pr_comment(void)
target_col = count_spaces(compute_code_target(), s_code);
else {
target_col = 1;
-   if (s_lab != e_lab)
+   if (s_lab != e_lab) {
target_col = count_spaces(compute_label_target(), s_lab);
+   /* ignore any trailing space in lab for this purpose */
+   if (e_lab[-1] == ' ')
+   target_col--;
+   }
}
if (s_lab != e_lab && s_lab[1] == 'e' &&
(strncmp(s_lab, "#endif", 6) == 0 ||

(I see that the extra space after colon is inserted by the old version
of indent too, which makes it even less clear why the boundary-case
behavior is like this.  I have a feeling that this is hacking things
at the wrong place.)

That got me to a point where there was little enough noise that I could
start to see the real changes, and I soon noticed that there was a fair
amount of apparently buggy behavior, like this change:

diff --git a/contrib/pg_prewarm/pg_prewarm.c b/contrib/pg_prewarm/pg_prewarm.c
index 78d71ab..630bacc 100644
--- a/contrib/pg_prewarm/pg_prewarm.c
+++ b/contrib/pg_prewarm/pg_prewarm.c
@@ -33,8 +33,8 @@ PG_FUNCTION_INFO_V1(pg_prewarm);
 typedef enum
 {
PREWARM_PREFETCH,
-   PREWARM_READ,
-   PREWARM_BUFFER
+   PREWARM_READ,
+   PREWARM_BUFFER
 } PrewarmType;

 static char blockbuffer[BLCKSZ];

Curiously, there are other enum declarations that don't get the phony
extra indentation.  I traced through it a bit and eventually found that
the difference between

Re: [HACKERS] different column orders in regression test database

2017-05-18 Thread Peter Eisentraut

On 5/18/17 16:21, Thomas Munro wrote:
> That's because if you attach a partition with a different column
> ordering,

Is it intentional and sensible to allow that in the first place?  Or was
it just inherited from inheritance?

> pg_dump dumps it with a normal CREATE TABLE ... PARTITION OF
> ... command, so the ordering it lost.

So it appears that either the above should be prohibited or pg_dump
should be fixed.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] different column orders in regression test database

2017-05-18 Thread Thomas Munro

On Fri, May 19, 2017 at 10:53 AM, Peter Eisentraut
 wrote:
> On 5/18/17 16:21, Thomas Munro wrote:
>> That's because if you attach a partition with a different column
>> ordering,
>
> Is it intentional and sensible to allow that in the first place?  Or was
> it just inherited from inheritance?

Can't speak for the authors but I'm sure it's intentional.  Making an
existing table fit into a partitioning hierarchy is a useful thing to
be able to do, and you can't reorder columns.

>> pg_dump dumps it with a normal CREATE TABLE ... PARTITION OF
>> ... command, so the ordering it lost.
>
> So it appears that either the above should be prohibited or pg_dump
> should be fixed.

pg_dump already knows how to do create-then-attach for binary
upgrades, for a less debatable reason: tuple format must be preserved.
To make normal dump/restore preserve the order, we could either make
it *always* write create-then-attach, or do it only if required.  I'd
vote for doing it only if required because of different column order,
because I don't want to see 1,000 partitions dumped in "long format"
when the short and sweet CREATE... PARTITION OF ... syntax could
usually be used.

-- 
Thomas Munro
http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] different column orders in regression test database

2017-05-18 Thread Alvaro Herrera

Peter Eisentraut wrote:
> On 5/18/17 16:21, Thomas Munro wrote:
> > That's because if you attach a partition with a different column
> > ordering,
> 
> Is it intentional and sensible to allow that in the first place?  Or was
> it just inherited from inheritance?

I think it was deliberately allowed.  Note that if you have a table with
dropped columns which you want to make a partition of another table
without them, there will need to be some physical transformation of
the tuples anyway in order for reading to work; we certainly don't want
to reject such cases.

> > pg_dump dumps it with a normal CREATE TABLE ... PARTITION OF
> > ... command, so the ordering it lost.
> 
> So it appears that either the above should be prohibited or pg_dump
> should be fixed.

Are you proposing that if the ordering of the columns of a partition is
not identical to that of its parent table, the partition should be
dumped as a regular CREATE TABLE followed by ALTER TABLE .. ATTACH PARTITION,
instead of a single CREATE TABLE .. PARTITION OF command?

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [Proposal] Allow users to specify multiple tables in VACUUM commands

2017-05-18 Thread Michael Paquier

On Fri, May 19, 2017 at 12:03 AM, Tom Lane  wrote:
> Robert Haas  writes:
>> Ugh, really?  Are we sure that the current behavior is anything other
>> than a bug?

Point was raised already upthread by me ince, which is what lead me to
the reasoning of my last argument:
https://www.postgresql.org/message-id/31695.1494471...@sss.pgh.pa.us
And, like you, I saw that as an oversight.

> The idea that VACUUM foo (a) implies ANALYZE doesn't
>> really sit very well with me in the first place.  I'd be more inclined
>> to reject that with an ERROR complaining that the column list can't be
>> specified except for ANALYZE.
>
> Yeah, that's probably more sensible.  I think the rationale was "if you
> specify columns you must want the ANALYZE option, so why make you type
> that in explicitly?".   But I can see the argument that it's likely to
> confuse users who might have a weaker grasp of the semantics.

I am fine with an ERROR if a column list is specified without ANALYZE
listed in the options. But that should happen as well for the case
where only one relation is listed.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [Proposal] Allow users to specify multiple tables in VACUUM commands

2017-05-18 Thread Michael Paquier

On Fri, May 19, 2017 at 9:06 AM, Michael Paquier
 wrote:
> I am fine with an ERROR if a column list is specified without ANALYZE
> listed in the options. But that should happen as well for the case
> where only one relation is listed.

Perhaps this could be changed for 10? Changing the behavior in
back-branches looks sensitive to me.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [bug fix] PG10: libpq doesn't connect to alternative hosts when some errors occur

2017-05-18 Thread Michael Paquier

On Thu, May 18, 2017 at 11:30 PM, Robert Haas  wrote:
> On Thu, May 18, 2017 at 7:06 AM, Michael Paquier
>  wrote:
>> FWIW, I am of the opinion to not have an implementation based on any
>> SQLSTATE codes, as well as not doing something similar to JDBC.
>> Keeping things simple is one reason, a second is that the approach
>> taken by libpq is correct at its root.
>
> Because why?

Because it is critical to let the user know as well *why* an error
happened. Imagine that this feature is used with multiple nodes, all
primaries. If a DB admin busted the credentials in one of them then
all the load would be redirected on the other nodes, without knowing
what is actually causing the error. Then the node where the
credentials have been changed would just run idle, and the application
would be unaware of that.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] fix for table syncing with different column order

2017-05-18 Thread Peter Eisentraut

The issues with the different column orders in the regression test
database also revealed that logical replication table syncing was broken
for that case.  Here is a fix and a test.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


0001-Fix-table-syncing-with-different-column-order.patch
Description: invalid/octet-stream

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Documentation about pg_stat_bgwriter

2017-05-18 Thread Peter Eisentraut

On 5/10/17 04:38, Kyotaro HORIGUCHI wrote:
> Hi. While I read the documentation I found the following
> description about some columns in pg_stat_bgwriter.
> 
> https://www.postgresql.org/docs/devel/static/monitoring-stats.html#pg-stat-bgwriter-view
> 
> This table shows cluster-global values, not per-backend values.
> 
>> maxwritten_clean:
>>   Number of times the background writer stopped a cleaning scan
>>   because it had written too many buffers
>> buffers_backend:
>>   Number of buffers written directly by a backend
>> buffers_backend_fsync:
>>   Number of times a backend had to execute its own fsync call
>>   (normally the background writer handles those even when the
>>   backend does its own write)
> Since the values are the summary in a cluster, the 'a backend's
> in the last two seems wrong *to me*. I suppose the 'a backend'
> should be just 'backends' or 'backends other than the background
> writer' (This seems a bit verbose.).

The text looks correct to me as it currently stands.  Your alternative
phrasings are also correct.  But there is no need to change this, I think.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] NOT NULL constraints on range partition key columns

2017-05-18 Thread Amit Langote

On 2017/05/19 3:06, Robert Haas wrote:
> On Tue, May 16, 2017 at 8:19 PM, Amit Langote
>  wrote:
>> Thanks for the review.  I updated the comments.
> 
> I found several more places that also needed to be updated using 'git
> grep'.  Committed with various additions.

Thanks a lot.

Regards,
Amit



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Getting error at the time of dropping subscription and few more issues

2017-05-18 Thread Peter Eisentraut

On 5/12/17 13:25, Masahiko Sawada wrote:
>> postgres=#  alter subscription sub set publication pub refresh;
>> NOTICE:  removed subscription for table public.t1
>> NOTICE:  removed subscription for table public.t2
>> ALTER SUBSCRIPTION
>>
>> I think  - in publication too ,we should provide NOTICE messages.
>>
> I guess one of the reason why we emit such a NOTICE message is that
> subscriber cannot control which table the upstream server replicate.
> So if a table got disassociated on the publisher the subscriber should
> report that to user. On the other hand, since the publication can
> control it and the changes are obvious, I'm not sure we really need to
> do that.
> 
> BTW I think it's better for the above NOTICE message to have subscription 
> name.

Why?  These come directly has a result of the ALTER SUBSCRIPTION
command, so you see what they refer to.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [Proposal] Allow users to specify multiple tables in VACUUM commands

2017-05-18 Thread Tom Lane

Michael Paquier  writes:
> On Fri, May 19, 2017 at 9:06 AM, Michael Paquier
>  wrote:
>> I am fine with an ERROR if a column list is specified without ANALYZE
>> listed in the options. But that should happen as well for the case
>> where only one relation is listed.

> Perhaps this could be changed for 10? Changing the behavior in
> back-branches looks sensitive to me.

It would make more sense to me to change it as part of the feature
addition, when/if this patch gets committed.  Otherwise, we just break
code that works today and we can't point to any solid benefit.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [bug fix] PG10: libpq doesn't connect to alternative hosts when some errors occur

2017-05-18 Thread Peter Eisentraut

On 5/17/17 13:19, Tom Lane wrote:
> I agree with Robert's point that major redesign of the feature on the
> basis of one complaint isn't necessarily the way to go.  Since the
> existing behavior is already out in beta1, let's wait and see if anyone
> else complains.  We don't need to fix it Right This Instant.
> 
> Maybe add this to the list of open issues to reconsider mid-beta?

The problem is that if we decide to change the behavior mid-beta, then
we'll only have the rest of beta to find out whether people will like
the other behavior.

I would aim for the behavior that is most suitable for refinement in the
future.  The current behavior seems to match that.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [Proposal] Allow users to specify multiple tables in VACUUM commands

2017-05-18 Thread Michael Paquier

On Fri, May 19, 2017 at 10:00 AM, Tom Lane  wrote:
> Michael Paquier  writes:
>> On Fri, May 19, 2017 at 9:06 AM, Michael Paquier
>>  wrote:
>>> I am fine with an ERROR if a column list is specified without ANALYZE
>>> listed in the options. But that should happen as well for the case
>>> where only one relation is listed.
>
>> Perhaps this could be changed for 10? Changing the behavior in
>> back-branches looks sensitive to me.
>
> It would make more sense to me to change it as part of the feature
> addition, when/if this patch gets committed.  Otherwise, we just break
> code that works today and we can't point to any solid benefit.

Fine for me as well. I would suggest to split the patch into two parts
to ease review then:
- Rework this error handling for one relation.
- The main patch.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: proposal - using names as primary names of plpgsql function parameters instead $ based names

2017-05-18 Thread Peter Eisentraut

On 5/15/17 14:34, Pavel Stehule wrote:
> Now, I when I working on plpgsql_check, I have to check function
> parameters. I can use fn_vargargnos and out_param_varno for list of
> arguments and related varno(s). when I detect some issue, I am using
> refname. It is not too nice now, because these refnames are $ based.
> Long names are alias only. There are not a possibility to find
> related alias.
> 
> So, my proposal. Now, we can use names as refname of parameter
> variable. $ based name can be used as alias. From user perspective
> there are not any change. 
> 
> Comments, notes?
> 
> here is a patch

I don't understand what this is changing.  There are not documentation
or test changes.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Getting error at the time of dropping subscription and few more issues

2017-05-18 Thread Masahiko Sawada

On Fri, May 19, 2017 at 9:54 AM, Peter Eisentraut
 wrote:
> On 5/12/17 13:25, Masahiko Sawada wrote:
>>> postgres=#  alter subscription sub set publication pub refresh;
>>> NOTICE:  removed subscription for table public.t1
>>> NOTICE:  removed subscription for table public.t2
>>> ALTER SUBSCRIPTION
>>>
>>> I think  - in publication too ,we should provide NOTICE messages.
>>>
>> I guess one of the reason why we emit such a NOTICE message is that
>> subscriber cannot control which table the upstream server replicate.
>> So if a table got disassociated on the publisher the subscriber should
>> report that to user. On the other hand, since the publication can
>> control it and the changes are obvious, I'm not sure we really need to
>> do that.
>>
>> BTW I think it's better for the above NOTICE message to have subscription 
>> name.
>
> Why?  These come directly has a result of the ALTER SUBSCRIPTION
> command, so you see what they refer to.
>

Because I thought it would be helpful for DBA when the log is appeared
in server log. It doesn't appear by default though.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Fix refresh_option syntax of ALTER SUBSCRIPTION in document

2017-05-18 Thread Peter Eisentraut

On 5/17/17 01:26, Masahiko Sawada wrote:
> While reading documentation I found refresh_option syntax of ALTER
> SUBSCRIPTION in documentation is not correct.
> 
> ALTER SUBSCRIPTION ... REFRESH PUBLICATION WITH (refresh_option value [, ...] 
> )
> should be changed to
> ALTER SUBSCRIPTION ... REFRESH PUBLICATION WITH (refresh_option [=
> value] [, ...])
> 
> Attached patch fixes this.

committed

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [bug fix] PG10: libpq doesn't connect to alternative hosts when some errors occur

2017-05-18 Thread Tsunakawa, Takayuki

From: pgsql-hackers-ow...@postgresql.org
> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Peter Eisentraut
> The problem is that if we decide to change the behavior mid-beta, then we'll
> only have the rest of beta to find out whether people will like the other
> behavior.
> 
> I would aim for the behavior that is most suitable for refinement in the
> future.  The current behavior seems to match that.

I think the pre-final release period is the very timing for refinement, in the 
perspective of users and PG developers as users.  One thing I'm worried is that 
people here might become more conservative against change once the final 
version is released.

Regards
Takayuki Tsunakawa




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] different column orders in regression test database

2017-05-18 Thread Peter Eisentraut

On 5/18/17 19:07, Thomas Munro wrote:
> To make normal dump/restore preserve the order, we could either make
> it *always* write create-then-attach, or do it only if required.  I'd
> vote for doing it only if required because of different column order,
> because I don't want to see 1,000 partitions dumped in "long format"
> when the short and sweet CREATE... PARTITION OF ... syntax could
> usually be used.

Doing it the long way only when necessary makes sense.  Maybe never
doing it the long way also makes sense, as long as we're clear that
that's what we want.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Race conditions with WAL sender PID lookups

2017-05-18 Thread Michael Paquier

On Thu, May 18, 2017 at 1:43 PM, Thomas Munro
 wrote:
> On Thu, May 11, 2017 at 1:48 PM, Michael Paquier
>  wrote:
>> I had my eyes on the WAL sender code this morning, and I have noticed
>> that walsender.c is not completely consistent with the PID lookups it
>> does in walsender.c. In two code paths, the PID value is checked
>> without holding the WAL sender spin lock (WalSndRqstFileReload and
>> pg_stat_get_wal_senders), which looks like a very bad idea contrary to
>> what the new WalSndWaitStopping() does and what InitWalSenderSlot() is
>> doing for ages.
>
> There is also code that accesses shared walsender state without
> spinlocks over in syncrep.c.  I think that file could use a few words
> of explanation for why it's OK to access pid, state and flush without
> synchronisation.

Yes, that is read during the quorum and priority sync evaluation.
Except sync_standby_priority, all the other variables should be
protected using the spin lock of the WAL sender. walsender_private.h
is clear regarding that. So the current coding is inconsistent even
there. Attached is an updated patch.
-- 
Michael


walsnd-pid-races-v2.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [bug fix] PG10: libpq doesn't connect to alternative hosts when some errors occur

2017-05-18 Thread Michael Paquier

On Fri, May 19, 2017 at 11:01 AM, Tsunakawa, Takayuki
 wrote:
> From: pgsql-hackers-ow...@postgresql.org
>> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Peter Eisentraut
>> The problem is that if we decide to change the behavior mid-beta, then we'll
>> only have the rest of beta to find out whether people will like the other
>> behavior.
>>
>> I would aim for the behavior that is most suitable for refinement in the
>> future.  The current behavior seems to match that.
>
> I think the pre-final release period is the very timing for refinement, in 
> the perspective of users and PG developers as users.

Sure that is the correct period to argue.

>  One thing I'm worried is that people here might become more conservative 
> against change once the final version is released.

Any redesign after release would finish by being a new feature, which
would be in this case a new connection parameter or an extra option
that works with the current parameter, say something to allow soft or
hard failures when multiple hosts are defined.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager

2017-05-18 Thread Masahiko Sawada

On Wed, May 17, 2017 at 1:30 AM, Robert Haas  wrote:
> On Sat, May 13, 2017 at 7:27 AM, Amit Kapila  wrote:
>> On Fri, May 12, 2017 at 9:14 AM, Tom Lane  wrote:
>>> Robert Haas  writes:
 On Wed, May 10, 2017 at 8:39 PM, Masahiko Sawada  
 wrote:
> ... I'd like to propose to change relation
> extension lock management so that it works using LWLock instead.
>>>
 That's not a good idea because it'll make the code that executes while
 holding that lock noninterruptible.
>>>
>>> Is that really a problem?  We typically only hold it over one kernel call,
>>> which ought to be noninterruptible anyway.
>>
>> During parallel bulk load operations, I think we hold it over multiple
>> kernel calls.
>
> We do.  Also, RelationGetNumberOfBlocks() is not necessarily only one
> kernel call, no?  Nor is vm_extend.

Yeah, these functions could call more than one kernel calls while
holding extension lock.

> Also, it's not just the backend doing the filesystem operation that's
> non-interruptible, but also any waiters, right?
>
> Maybe this isn't a big problem, but it does seem to be that it would
> be better to avoid it if we can.
>

I agree to change it to be interruptible for more safety.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [bug fix] PG10: libpq doesn't connect to alternative hosts when some errors occur

2017-05-18 Thread Tsunakawa, Takayuki

From: pgsql-hackers-ow...@postgresql.org
> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Michael Paquier
> On Thu, May 18, 2017 at 11:30 PM, Robert Haas  wrote:
> > Because why?
> 
> Because it is critical to let the user know as well *why* an error happened.
> Imagine that this feature is used with multiple nodes, all primaries. If
> a DB admin busted the credentials in one of them then all the load would
> be redirected on the other nodes, without knowing what is actually causing
> the error. Then the node where the credentials have been changed would just
> run idle, and the application would be unaware of that.

In that case, the DBA can know the authentication errors in the server log of 
the idle instance.

I'm sorry to repeat myself, but libpq connection failover is the feature for 
HA.  So I believe what to prioritize is HA.

And the documentation looks somewhat misleading.  I get the impression that 
libpq tries hosts until success regardless of failure reason: it aims for 
successful connection, not giving up early.  Who can read this as "libpq gives 
up connection under some circumstances?"


https://www.postgresql.org/docs/devel/static/libpq-connect.html
--
It is possible to specify multiple host components, each with an optional port 
component, in a single URI. A URI of the form 
postgresql://host1:port1,host2:port2,host3:port3/ is equivalent to a connection 
string of the form host=host1,host2,host3 port=port1,port2,port3. Each host 
will be tried in turn until a connection is successfully established.

...
If multiple host names are specified, each will be tried in turn in the order 
given.
--

Regards
Takayuki Tsunakawa


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Get stuck when dropping a subscription during synchronizing table

2017-05-18 Thread Peter Eisentraut

On 5/18/17 11:11, Noah Misch wrote:
> IMMEDIATE ATTENTION REQUIRED.  This PostgreSQL 10 open item is past due for
> your status update.  Please reacquaint yourself with the policy on open item
> ownership[1] and then reply immediately.  If I do not hear from you by
> 2017-05-19 16:00 UTC, I will transfer this item to release management team
> ownership without further notice.

There is no progress on this issue at the moment.  I will report again
next Wednesday.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: pgindent (was Re: [HACKERS] [COMMITTERS] pgsql: Preventive maintenance in advance of pgindent run.)

2017-05-18 Thread Tom Lane

I wrote:
> Curiously, there are other enum declarations that don't get the phony
> extra indentation.  I traced through it a bit and eventually found that
> the difference between OK and not OK is that the declarations that don't
> get messed up look like "typedef enum enumlabel ...", ie the problem
> with this one is there's no extra identifier after "enum".  The proximate
> cause of that is this line in indent.c:

>   ps.in_decl = ps.decl_on_line = ps.last_token != type_def;

I found that things seem to work better with this change:

@@ -939,7 +938,7 @@ check_type:
}
ps.in_or_st = true; /* this might be a structure or initialization
 * declaration */
-   ps.in_decl = ps.decl_on_line = ps.last_token != type_def;
+   ps.in_decl = ps.decl_on_line = (ps.last_token != type_def || type_code 
== structure);
if ( /* !ps.in_or_st && */ ps.dec_nest <= 0)
ps.just_saw_decl = 2;
prefix_blankline_requested = 0;

I have no idea if that's a correct or sufficient fix, but it makes the
oddnesses around unnamed enum and struct declarations in the PG sources
go away.

There remain a small number of cases that look like bugs.  One is the
case I showed before:

@@ -531,7 +531,7 @@ static bool
 ean2string(ean13 ean, bool errorOK, char *result, bool shortType)
 {
const char *(*TABLE)[2];
-   const unsigned (*TABLE_index)[2];
+   const unsigned(*TABLE_index)[2];
enum isn_type type = INVALID;
 
char   *aux;

I have an impression that this might be related to the ps.in_decl business,
but the patch shown above did not change it.  Possibly related is this:

diff --git a/src/include/access/gin_private.h b/src/include/access/gin_private.h
index 986fe6e..a2d11e5 100644
--- a/src/include/access/gin_private.h
+++ b/src/include/access/gin_private.h
@@ -149,8 +149,8 @@ typedef struct GinBtreeData
bool(*findItem) (GinBtree, GinBtreeStack *);
 
/* insert methods */
-   OffsetNumber (*findChildPtr) (GinBtree, Page, BlockNumber, OffsetNumber);
-   GinPlaceToPageRC (*beginPlaceToPage) (GinBtree, Buffer, GinBtreeStack *, 
void *, BlockNumber, void **, Page *, Page *);
+   OffsetNumber(*findChildPtr) (GinBtree, Page, BlockNumber, OffsetNumber);
+   GinPlaceToPageRC(*beginPlaceToPage) (GinBtree, Buffer, GinBtreeStack *, 
void *, BlockNumber, void **, Page *, Page *);
void(*execPlaceToPage) (GinBtree, Buffer, GinBtreeStack *, void *, 
BlockNumber, void *);
void   *(*prepareDownlink) (GinBtree, Buffer);
void(*fillRoot) (GinBtree, Page, BlockNumber, Page, BlockNumber, 
Page);

Whatever's going on there, it seems to affect only a very small number
of places.  No idea why.

Also, multiple comments on the same line are now run together, which
is surely not better:

@@ -2144,11 +2144,11 @@ AdvanceXLInsertBuffer(XLogRecPtr upto, bool 
opportunistic)
 */
NewPage->xlp_magic = XLOG_PAGE_MAGIC;
 
-   /* NewPage->xlp_info = 0; *//* done by memset */
+   /* NewPage->xlp_info = 0; *//* done by memset */
NewPage->xlp_tli = ThisTimeLineID;
NewPage->xlp_pageaddr = NewPageBeginPtr;
 
-   /* NewPage->xlp_rem_len = 0; */ /* done by memset */
+   /* NewPage->xlp_rem_len = 0; *//* done by memset */
 
/*
 * If online backup is not in progress, mark the header to indicate

Also, I found two places where an overlength comment line is simply busted
altogether --- notice that a character is missing at the split point:

@@ -257,7 +257,8 @@ get_pgpid(bool is_status_request)
/*
 * The Linux Standard Base Core Specification 3.1 says this should
 * return '4, program or service status is unknown'
-* https://refspecs.linuxbase.org/LSB_3.1.0/LSB-Core-generic/LSB-Core-g
+* https://refspecs.linu
+* base.org/LSB_3.1.0/LSB-Core-generic/LSB-Core-g
 * eneric/iniscrptact.html
 */
exit(is_status_request ? 4 : 1);

@@ -629,7 +629,8 @@ handleCopyIn(PGconn *conn, FILE *copystream, bool isbinary, 
PGresult **res)
/*
 * This code erroneously assumes '\.' on a line alone
 * inside a quoted CSV string terminates the \copy.
-* http://www.postgresql.org/message-id/E1TdNVQ-0001ju-GO@w
+* http://w
+* w.postgresql.org/message-id/E1TdNVQ-0001ju-GO@w
 * rigleys.postgresql.org
 */
if (strcmp(buf, "\\.\n") == 0 ||

Another problem is that the handling of unrecognized typedef names as
field types in a struct has gotten significantly worse:

diff --git a/src/include/lib/simplehash.h b/src/include/lib/simplehash.h
index a35addf..919d262 100644
--- a/src/include/lib/simplehash.h
+++ b/src/include/lib/simplehash.h
@@ -114,14 +114,14 @@ typedef struct SH_TYPE
uint32  grow_threshold;
 
/* hash buckets */
-   SH_ELEMENT_TYPE *data;
+   SH_ELEMENT

Re: [HACKERS] Replication status in logical replication

2017-05-18 Thread Masahiko Sawada

On Wed, Apr 12, 2017 at 5:31 AM, Simon Riggs  wrote:
> On 22 March 2017 at 02:50, Masahiko Sawada  wrote:
>
>> When using logical replication, I ran into a situation where the
>> pg_stat_replication.state is not updated until any wal record is sent
>> after started up. For example, I set up logical replication with 2
>> subscriber and restart the publisher server, but I see the following
>> status for a while (maybe until autovacuum run).
> ...
>
>> Attached patch fixes this behavior by updating WalSndCaughtUp before
>> trying to read next WAL if already caught up.
>
> Looks like a bug that we should fix in PG10, with backpatch to 9.4 (or
> as far as it goes).
>
> Objections to commit?
>

Seems we still have this issue. Any update or comment on this? Barring
any objections, I'll add this to the open item so it doesn't get
missed.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [bug fix] PG10: libpq doesn't connect to alternative hosts when some errors occur

2017-05-18 Thread Tsunakawa, Takayuki

From: pgsql-hackers-ow...@postgresql.org
> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Michael Paquier
> >  One thing I'm worried is that people here might become more conservative
> against change once the final version is released.
> 
> Any redesign after release would finish by being a new feature, which would
> be in this case a new connection parameter or an extra option that works
> with the current parameter, say something to allow soft or hard failures
> when multiple hosts are defined.

Hmm... but I can't imagine the parameter would be very meaningful for users.

Regards
Takayuki Tsunakawa


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [Proposal] Allow users to specify multiple tables in VACUUM commands

2017-05-18 Thread Bossart, Nathan

On 5/18/17, 6:12 PM, "Michael Paquier"  wrote:
> Fine for me as well. I would suggest to split the patch into two parts
> to ease review then:
> - Rework this error handling for one relation.
> - The main patch.

I’d be happy to do so, but I think part one would be pretty small, and almost 
all of the same code needs to be changed in the main patch anyway.  I do not 
foresee a huge impact on review-ability either way.  If others disagree, I can 
split it up.

Nathan

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [Proposal] Allow users to specify multiple tables in VACUUM commands

2017-05-18 Thread Tom Lane

"Bossart, Nathan"  writes:
> On 5/18/17, 6:12 PM, "Michael Paquier"  wrote:
>> Fine for me as well. I would suggest to split the patch into two parts
>> to ease review then:
>> - Rework this error handling for one relation.
>> - The main patch.

> I’d be happy to do so, but I think part one would be pretty small, and almost 
> all of the same code needs to be changed in the main patch anyway.  I do not 
> foresee a huge impact on review-ability either way.  If others disagree, I 
> can split it up.

Yeah, I'm dubious that that's really necessary.  If the change proves
bigger than you're anticipating, maybe it's worth a two-step approach,
but I share your feeling that it probably isn't.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [Proposal] Allow users to specify multiple tables in VACUUM commands

2017-05-18 Thread Bossart, Nathan

On 5/18/17, 8:03 PM, "Tom Lane"  wrote:
>”Bossart, Nathan"  writes:
>> On 5/18/17, 6:12 PM, "Michael Paquier"  wrote:
>>> Fine for me as well. I would suggest to split the patch into two parts
>>> to ease review then:
>>> - Rework this error handling for one relation.
>>> - The main patch.
>>
>> I’d be happy to do so, but I think part one would be pretty small, and 
>> almost all of the same code needs to be changed in the main patch anyway.  I 
>> do not foresee a huge impact on review-ability either way.  If others 
>> disagree, I can split it up.
>
>Yeah, I'm dubious that that's really necessary.  If the change proves
>bigger than you're anticipating, maybe it's worth a two-step approach,
>but I share your feeling that it probably isn’t.

Just in case it was missed among the discussion, I’d like to point out that v5 
of the patch includes the “ERROR if ANALYZE not specified” change.

Nathan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Documentation about pg_stat_bgwriter

2017-05-18 Thread Kyotaro HORIGUCHI

Hello, thank you for the reply.

At Thu, 18 May 2017 20:48:44 -0400, Peter Eisentraut 
 wrote in 
<343d4cdb-e25d-867d-2830-6502eca4d...@2ndquadrant.com>
> On 5/10/17 04:38, Kyotaro HORIGUCHI wrote:
> > Hi. While I read the documentation I found the following
> > description about some columns in pg_stat_bgwriter.
> > 
> > https://www.postgresql.org/docs/devel/static/monitoring-stats.html#pg-stat-bgwriter-view
> > 
> > This table shows cluster-global values, not per-backend values.
> > 
> >> maxwritten_clean:
> >>   Number of times the background writer stopped a cleaning scan
> >>   because it had written too many buffers
> >> buffers_backend:
> >>   Number of buffers written directly by a backend
> >> buffers_backend_fsync:
> >>   Number of times a backend had to execute its own fsync call
> >>   (normally the background writer handles those even when the
> >>   backend does its own write)
> > Since the values are the summary in a cluster, the 'a backend's
> > in the last two seems wrong *to me*. I suppose the 'a backend'
> > should be just 'backends' or 'backends other than the background
> > writer' (This seems a bit verbose.).
> 
> The text looks correct to me as it currently stands.  Your alternative
> phrasings are also correct.  But there is no need to change this, I think.

It's enough for me to know that it's not a kind of typo. Thanks.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [Proposal] Allow users to specify multiple tables in VACUUM commands

2017-05-18 Thread Michael Paquier

On Fri, May 19, 2017 at 12:17 PM, Bossart, Nathan  wrote:
> Just in case it was missed among the discussion, I’d like to point out that 
> v5 of the patch includes the “ERROR if ANALYZE not specified” change.

As long as I don't forget...

+VACUUM vactst (i);
Looking at the tests of v5, I think that you should as well add a test
that lists multiple relations with one or more relations listing a
column list for a VACUUM query, without ANALYZE specified in the
options as the parsings of VacuumStmt and AnalyzeStmt are two
different code paths, giving something like that:
VACUUM (FREEZE) rel1, rel2(col1,col2); --error
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: proposal - using names as primary names of plpgsql function parameters instead $ based names

2017-05-18 Thread Pavel Stehule

2017-05-19 3:14 GMT+02:00 Peter Eisentraut :

> On 5/15/17 14:34, Pavel Stehule wrote:
> > Now, I when I working on plpgsql_check, I have to check function
> > parameters. I can use fn_vargargnos and out_param_varno for list of
> > arguments and related varno(s). when I detect some issue, I am using
> > refname. It is not too nice now, because these refnames are $ based.
> > Long names are alias only. There are not a possibility to find
> > related alias.
> >
> > So, my proposal. Now, we can use names as refname of parameter
> > variable. $ based name can be used as alias. From user perspective
> > there are not any change.
> >
> > Comments, notes?
> >
> > here is a patch
>
> I don't understand what this is changing.  There are not documentation
> or test changes.
>

This change is visible only for tools like plpgsql_check probably and
similar tools. Now, this info is not available from user space (maybe only
from some error message, I have to recheck it)

What is changed.

PLpgSQL variables has field refname - it can be used if you iterate over
variables or it is used for some error messages. When some variables is
searching, then namespace aliases are used. Now for any function argument
is created variable with refname "$x" and namespace aliases "$x" and "name"
if name exists. There are not any way, how to get a aliases related to
variable. When I raise a warning in plpgsql about function arguments I have
to print $x based messages, what is not too readable if function has lot of
parameters.

The proposal is the change of refname "$x" to "name" for all variables
created for function arguments.

There are another possibilities - maintain list of all aliases for
variables or dynamically search all related aliases in namespace tree. Both
little bit more code.

Regards

Pavel

>
> --
> Peter Eisentraut  http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>

Re: [HACKERS] Get stuck when dropping a subscription during synchronizing table

2017-05-18 Thread Kyotaro HORIGUCHI

Hello,

At Thu, 18 May 2017 10:16:35 -0400, Robert Haas  wrote 
in 
> On Wed, May 17, 2017 at 6:58 AM, Masahiko Sawada  
> wrote:
> > I think the above changes can solve this issue but It seems to me that
> > holding AccessExclusiveLock on pg_subscription by DROP SUBSCRIPTION
> > until commit could lead another deadlock problem in the future. So I'd
> > to contrive ways to reduce lock level somehow if possible. For
> > example, if we change the apply launcher so that it gets the
> > subscription list only when pg_subscription gets invalid, apply
> > launcher cannot try to launch the apply worker being stopped. We
> > invalidate pg_subscription at commit of DROP SUBSCRIPTION and the
> > apply launcher can get new subscription list which doesn't include the
> > entry we removed. That way we can reduce lock level to
> > ShareUpdateExclusiveLock and solve this issue.
> > Also in your patch, we need to change DROP SUBSCRIPTION as well to
> > resolve another case I encountered, where DROP SUBSCRIPTION waits for
> > apply worker while holding a tuple lock on pg_subscription_rel and the
> > apply worker waits for same tuple on pg_subscription_rel in
> > SetSubscriptionRelState().

Sorry, I don't have enough time to consider this
profoundly. Perhaps will return later.

> I don't really understand the issue being discussed here in any
> detail, but as a general point I'd say that it might be more
> productive to make the locks finer-grained rather than struggling to
> reduce the lock level.  For example, instead of locking all of
> pg_subscription, use LockSharedObject() to lock the individual
> subscription, still with AccessExclusiveLock.  That means that other
> accesses to that subscription also need to take a lock so that you
> actually get a conflict when there should be one, but that should be
> doable.  I expect that trying to manage locking conflicts using only
> catalog-wide locks is a doomed strategy.

Thank you for the suggestion. I think it is a bit differnt from
that. The problem here is that a replication worker may try
reading exactly the tuple for the subscription being deleted just
before responding to a received termination request. So the
finer-graind lock doesn't help.

The focus of resolving this is preventing blocking of workers
caused by DROP SUBSCRIPTION. So Sadasan's patch immediately
released the lock on pg_subscrption and uses another lock for
exclusion. My patch just give up to read the catalog when not
available.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [ANNOUNCE] PostgreSQL 10 Beta 1 Released!

2017-05-18 Thread Huong Dangminh

Hi,

> * 10 Beta Release Notes:
>   https://www.postgresql.org/docs/devel/static/release-10.html

Just a minute thing, but changing of hot_standby default value is not fully 
noted in release-10.sgml.
Please find the attached patch.


---
Thanks and best regards,
Dang Minh Huong
NEC Solution Innovators, Ltd.
http://www.nec-solutioninnovators.co.jp/en/


hot_standby_change_release_note_10.patch
Description: hot_standby_change_release_note_10.patch

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] [Bug fix]If recovery.conf has target_session_attrs=read-write, the standby fails to start.

2017-05-18 Thread Higuchi, Daisuke

Hello, 

I found a problem with libpq connection failover. 
If primary_conninfo in recovery.conf has 'target_session_attrs=read-write', the 
standby fails to start. 

How to reproduce the bug: 
1. Prepare two standby (standby_1, standby_2) for one master.
   On standby_1, primary_conninfo in recovery.conf specifies master. 
   On standby_2, primary_conninfo in recovery.conf specifies master and 
standby_2 and target_session_attrs=read-write.
   For example, the standby_2's recovery.conf setting is as follows.
   ---
   standby_mode = 'on'
   primary_conninfo = 'host=localhost,localhost port=5432,5433 
target_session_attrs=read-write'
   recovery_target_timeline = 'latest'
   ---
2. Starts master, standby_1 and standby_2. When try to start standby_2, the 
following error is output. 
   ---
  Test "show transaction_read_only" failed on "localhost: 5432"
  ERROR: not connected to database
  Test "show transaction_read_only" failed on "localhost: 5433"
  ERROR: not connected to database
   ---
I wanted to test if standby_2 re-connects to the new master when master is 
stopped and standby_1 is promoted, but I failed at the start-up stage.

The reason of this problem is that sending 'show transaction_read_only' is 
failed. 
'show' must be in uppercase letters because the streaming replication protocol 
only accepts capital letters. 
In psql and libpq applications that do not use the streaming replication 
protocol, this error does not occur.

The attached patch fixes this. This patch changes 'show transaction_read_only' 
to 'SHOW transaction_read_only'.

Regards, 
Daisuke, Higuchi



target_session_attrs_in_recovery_conf_v1.patch
Description: target_session_attrs_in_recovery_conf_v1.patch

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] different column orders in regression test database

2017-05-18 Thread Amit Langote

On 2017/05/19 11:02, Peter Eisentraut wrote:
> On 5/18/17 19:07, Thomas Munro wrote:
>> To make normal dump/restore preserve the order, we could either make
>> it *always* write create-then-attach, or do it only if required.  I'd
>> vote for doing it only if required because of different column order,
>> because I don't want to see 1,000 partitions dumped in "long format"
>> when the short and sweet CREATE... PARTITION OF ... syntax could
>> usually be used.
> 
> Doing it the long way only when necessary makes sense.  Maybe never
> doing it the long way also makes sense, as long as we're clear that
> that's what we want.

I tend to prefer the latter - never doing it the long way (which is what
happens today [1]).  It's always better for all the partitions to have the
same tuple descriptor as the parent in the target database, which is what
the short CREATE TABLE .. PARTITION OF syntax would result in.  The long
format is unavoidable in the case of --binary-upgrade dump mode for
obvious reasons.

Thanks,
Amit

[1]

create table p (a int, b char) partition by list (a);
create table p1 (c int, b char, a int);
alter table p1 drop c;
alter table p attach partition p1 for values in (1);
insert into p values (1, 'a');
select tableoid::regclass, * from p;
 tableoid | a | b
--+---+---
 p1   | 1 | a
(1 row)

$ pg_dump

CREATE TABLE p (
a integer,
b character(1)
)
PARTITION BY LIST (a);

CREATE TABLE p1 PARTITION OF p
FOR VALUES IN (1);

COPY p1 (b, a) FROM stdin;
a   1
\.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [Bug fix]If recovery.conf has target_session_attrs=read-write, the standby fails to start.

2017-05-18 Thread Michael Paquier

On Fri, May 19, 2017 at 1:16 PM, Higuchi, Daisuke
 wrote:
> The reason of this problem is that sending 'show transaction_read_only' is 
> failed.
> 'show' must be in uppercase letters because the streaming replication 
> protocol only accepts capital letters.
> In psql and libpq applications that do not use the streaming replication 
> protocol, this error does not occur.
>
> The attached patch fixes this. This patch changes 'show 
> transaction_read_only' to 'SHOW transaction_read_only'.

 if (!PQsendQuery(conn,
- "show transaction_read_only"))
+ "SHOW transaction_read_only"))
Or perhaps the replication command parser could be made more flexible
with lower-case characters for replication commands?
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [POC] hash partitioning

2017-05-18 Thread Amit Langote

On 2017/05/19 1:09, Dilip Kumar wrote:
> On Wed, May 17, 2017 at 2:07 PM, amul sul  wrote:
>>> I would suggest "non-zero positive", since that's what we are using in
>>> the documentation.
>>>
>>
>> Understood, Fixed in the attached version.
> 
> Why non-zero positive?  We do support zero for the remainder right?

Using "non-negative integers" (for remainders) was suggested upthread.

Thanks,
Amit



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] transition table behavior with inheritance appears broken (was: Declarative partitioning - another take)

2017-05-18 Thread Thomas Munro

On Fri, May 19, 2017 at 1:38 AM, Kevin Grittner  wrote:
> On Thu, May 18, 2017 at 5:16 AM, Amit Langote
>  wrote:
>
>> Do we need to update documentation?  Perhaps, some clarification on the
>> inheritance/partitioning behavior somewhere.
>
> Yeah, I think so.

Here is an attempt at documenting the situation in the CREATE TRIGGER
notes section.

>> -Assert((enrmd->reliddesc == InvalidOid) != (enrmd->tupdesc == NULL));
>> +Assert((enrmd->reliddesc == InvalidOid) !=
>> +   (enrmd->tupdesc == NULL));
>>
>> Perhaps, unintentional change?
>
> Agreed; line is not long enough to need to wrap.

Fixed.

>> I'm not sure if it's significant for transition tables, but what if a
>> partition's BR trigger modified the tuple?  Would we want to include the
>> modified version of the tuple in the transition table or the original as
>> the patch does?  Same for the code in CopyFrom().
>
> Good spot!  If the BR trigger on the child table modifies or
> suppresses the action, I strongly feel that must be reflected in the
> transition table.  This needs to be fixed.

Gah.  Right.  In the attached version, there is a still an 'original
tuple' optimisation for insertions (avoiding parent -> child -> parent
conversion), but it's disabled if there are any BEFORE INSERT or
INSTEAD OF INSERT row-level triggers.

That's demonstrated by this part of the regression test, which
modifies the value inserted into the 'CCC' partition (and similar case
for COPY):

insert into parent values ('AAA', 42), ('BBB', 42), ('CCC', 66);
NOTICE:  trigger = parent_stmt_trig, old table = , new table =
(AAA,42), (BBB,42), (CCC,1066)

On Thu, May 18, 2017 at 10:16 PM, Amit Langote
 wrote:
> +typedef struct TriggerTransitionState
> +{
> ...
> +boolttf_delete_old_table;
>
> Just curious: why ttf_?  TriggerTransition field?

Oops.  Changed to "tts_".  I had renamed this struct but not the members.

-- 
Thomas Munro
http://www.enterprisedb.com


transition-tuples-from-child-tables-v7.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Multiple table synchronizations are processed serially

2017-05-18 Thread Masahiko Sawada

Hi,

While testing table synchronization in logical replication, I found
that multiple table synchronizations for a subscription are processed
serially due to lock wait.

I setup pgbench tables on publisher (scalefactor = 1000) and on
subscriber, and truncated these tables on subscriber, and then created
publication for all pgbench tables. When I created a subscription I
got the following log messages.

2017-05-19 13:12:56.311 JST [54970] LOG:  logical replication apply
for subscription hoge_sub started
2017-05-19 13:12:56.314 JST [54970] LOG:  starting logical replication
worker for subscription "hoge_sub"
2017-05-19 13:12:56.317 JST [54972] LOG:  logical replication sync for
subscription hoge_sub, table pgbench_accounts started
2017-05-19 13:12:57.315 JST [54970] LOG:  starting logical replication
worker for subscription "hoge_sub"
2017-05-19 13:12:57.318 JST [54974] LOG:  logical replication sync for
subscription hoge_sub, table pgbench_branches started
2017-05-19 13:12:58.317 JST [54970] LOG:  starting logical replication
worker for subscription "hoge_sub"
2017-05-19 13:12:58.320 JST [54976] LOG:  logical replication sync for
subscription hoge_sub, table pgbench_history started
2017-05-19 13:12:59.319 JST [54970] LOG:  starting logical replication
worker for subscription "hoge_sub"
2017-05-19 13:12:59.322 JST [54978] LOG:  logical replication sync for
subscription hoge_sub, table pgbench_tellers started

Seems all four table sync workers are launched at the almost same
time, but three workers of them get stuck in idle transaction state
when creating replication slot. That is these waiting workers cannot
proceed its work until first connected table sync worker finishes. ps
command shows the followings.

54898 ?Ss 0:00 postgres: bgworker: logical replication launcher
54970 ?Ss 0:00 postgres: bgworker: logical replication
worker for subscription 16432
54971 ?Ss 0:00 postgres: wal sender process masahiko [local] idle
54972 ?Rs10:38 postgres: bgworker: logical replication
worker for subscription 16432 sync 16395
54973 ?Ss 1:52 postgres: wal sender process masahiko [local] COPY
54974 ?Ss 0:00 postgres: bgworker: logical replication
worker for subscription 16432 sync 16398
54975 ?Ss 0:00 postgres: wal sender process masahiko
[local] idle in transaction waiting
54976 ?Ss 0:00 postgres: bgworker: logical replication
worker for subscription 16432 sync 16389
54977 ?Ss 0:00 postgres: wal sender process masahiko
[local] idle in transaction waiting
54978 ?Ss 0:00 postgres: bgworker: logical replication
worker for subscription 16432 sync 16392
54979 ?Ss 0:00 postgres: wal sender process masahiko
[local] idle in transaction waiting

Also, here is backtrace of the waiting wal sender process.

#0  0x7f24b4030783 in __epoll_wait_nocancel () from /lib64/libc.so.6
#1  0x0084c58d in WaitEventSetWaitBlock (set=0x25ed8e0,
cur_timeout=-1, occurred_events=0x7fff1ea322f0, nevents=1) at
latch.c:1045
#2  0x0084c454 in WaitEventSetWait (set=0x25ed8e0, timeout=-1,
occurred_events=0x7fff1ea322f0, nevents=1, wait_event_info=50331652)
at latch. c:997
#3  0x0084bbe5 in WaitLatchOrSocket (latch=0x7f24ad385024,
wakeEvents=1, sock=-1, timeout=-1, wait_event_info=50331652) at
latch.c:385
#4  0x0084bac6 in WaitLatch (latch=0x7f24ad385024,
wakeEvents=1, timeout=0, wait_event_info=50331652) at latch.c:339
#5  0x00863632 in ProcSleep (locallock=0x25f2fb0,
lockMethodTable=0xbdb020) at proc.c:1255
#6  0x0085c83c in WaitOnLock (locallock=0x25f2fb0,
owner=0x2604dd8) at lock.c:1702
#7  0x0085b541 in LockAcquireExtended (locktag=0x7fff1ea32800,
lockmode=5, sessionLock=0 '\000', dontWait=0 '\000',
reportMemoryError=1 '\001\ ') at lock.c:998
#8  0x0085ab46 in LockAcquire (locktag=0x7fff1ea32800,
lockmode=5, sessionLock=0 '\000', dontWait=0 '\000') at lock.c:688
#9  0x00859a16 in XactLockTableWait (xid=598, rel=0x0,
ctid=0x0, oper=XLTW_None) at lmgr.c:587
#10 0x0080d87a in SnapBuildWaitSnapshot (running=0x25ed8a8,
cutoff=599) at snapbuild.c:1400
#11 0x0080d64c in SnapBuildFindSnapshot (builder=0x26ab530,
lsn=14202458264, running=0x25ed8a8) at snapbuild.c:1311
#12 0x0080d1bb in SnapBuildProcessRunningXacts
(builder=0x26ab530, lsn=14202458264, running=0x25ed8a8) at
snapbuild.c:1106
#13 0x007fb779 in DecodeStandbyOp (ctx=0x25f7090,
buf=0x7fff1ea32930) at decode.c:314
#14 0x007fb28f in LogicalDecodingProcessRecord (ctx=0x25f7090,
record=0x25f7350) at decode.c:117
#15 0x007ff0d5 in DecodingContextFindStartpoint
(ctx=0x25f7090) at logical.c:458
#16 0x008141c8 in CreateReplicationSlot (cmd=0x25ecba8) at
walsender.c:941
#17 0x00814ec5 in exec_replication_command
(cmd_string=0x26511e0 "CREATE_REPLICATION_SLOT
\"hoge_sub_16432_sync_16398\" TEMPORARY LOGICAL pgoutput
USE_SNAPSHOT") at walsender.c:

Re: [HACKERS] transition table behavior with inheritance appears broken (was: Declarative partitioning - another take)

2017-05-18 Thread Amit Langote

On 2017/05/19 14:01, Thomas Munro wrote:
> On Fri, May 19, 2017 at 1:38 AM, Kevin Grittner  wrote:
>> On Thu, May 18, 2017 at 5:16 AM, Amit Langote
>>  wrote:
>>
>>> Do we need to update documentation?  Perhaps, some clarification on the
>>> inheritance/partitioning behavior somewhere.
>>
>> Yeah, I think so.
> 
> Here is an attempt at documenting the situation in the CREATE TRIGGER
> notes section.

Looks good, thanks.

>>> I'm not sure if it's significant for transition tables, but what if a
>>> partition's BR trigger modified the tuple?  Would we want to include the
>>> modified version of the tuple in the transition table or the original as
>>> the patch does?  Same for the code in CopyFrom().
>>
>> Good spot!  If the BR trigger on the child table modifies or
>> suppresses the action, I strongly feel that must be reflected in the
>> transition table.  This needs to be fixed.
> 
> Gah.  Right.  In the attached version, there is a still an 'original
> tuple' optimisation for insertions (avoiding parent -> child -> parent
> conversion), but it's disabled if there are any BEFORE INSERT or
> INSTEAD OF INSERT row-level triggers.
> 
> That's demonstrated by this part of the regression test, which
> modifies the value inserted into the 'CCC' partition (and similar case
> for COPY):
> 
> insert into parent values ('AAA', 42), ('BBB', 42), ('CCC', 66);
> NOTICE:  trigger = parent_stmt_trig, old table = , new table =
> (AAA,42), (BBB,42), (CCC,1066)

Seems to work correctly.

I saw in the latest patch that now ExecSetupTriggerTransitionState() looks
at mtstate->mt_partition_dispatch_info when setting up the transition
conversion map.  In the case where it's non-NULL, you may have realized
that mt_transition_tupconv_map will be an exact copy of
mt_partition_tupconv_maps that's already built.  Would it perhaps be a
good idea to either share the same or make a copy using memcpy() instead
of doing the convert_tuples_by_name() calls again?

> On Thu, May 18, 2017 at 10:16 PM, Amit Langote
>  wrote:
>> +typedef struct TriggerTransitionState
>> +{
>> ...
>> +boolttf_delete_old_table;
>>
>> Just curious: why ttf_?  TriggerTransition field?
> 
> Oops.  Changed to "tts_".  I had renamed this struct but not the members.

Ah.  BTW, maybe it's not a problem, but the existing TupleTableSlot's
member names are prefixed with tts_, too. :)

Thanks,
Amit



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Race conditions with WAL sender PID lookups

2017-05-18 Thread Masahiko Sawada

On Fri, May 19, 2017 at 11:05 AM, Michael Paquier
 wrote:
> On Thu, May 18, 2017 at 1:43 PM, Thomas Munro
>  wrote:
>> On Thu, May 11, 2017 at 1:48 PM, Michael Paquier
>>  wrote:
>>> I had my eyes on the WAL sender code this morning, and I have noticed
>>> that walsender.c is not completely consistent with the PID lookups it
>>> does in walsender.c. In two code paths, the PID value is checked
>>> without holding the WAL sender spin lock (WalSndRqstFileReload and
>>> pg_stat_get_wal_senders), which looks like a very bad idea contrary to
>>> what the new WalSndWaitStopping() does and what InitWalSenderSlot() is
>>> doing for ages.
>>
>> There is also code that accesses shared walsender state without
>> spinlocks over in syncrep.c.  I think that file could use a few words
>> of explanation for why it's OK to access pid, state and flush without
>> synchronisation.
>
> Yes, that is read during the quorum and priority sync evaluation.
> Except sync_standby_priority, all the other variables should be
> protected using the spin lock of the WAL sender. walsender_private.h
> is clear regarding that. So the current coding is inconsistent even
> there. Attached is an updated patch.

Also, as Horiguchi-san pointed out earlier, walreceiver seems need the
similar fix.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] transition table behavior with inheritance appears broken (was: Declarative partitioning - another take)

2017-05-18 Thread Thomas Munro

On Fri, May 19, 2017 at 5:51 PM, Amit Langote
 wrote:
> I saw in the latest patch that now ExecSetupTriggerTransitionState() looks
> at mtstate->mt_partition_dispatch_info when setting up the transition
> conversion map.  In the case where it's non-NULL, you may have realized
> that mt_transition_tupconv_map will be an exact copy of
> mt_partition_tupconv_maps that's already built.  Would it perhaps be a
> good idea to either share the same or make a copy using memcpy() instead
> of doing the convert_tuples_by_name() calls again?

Isn't it the opposite?  mt_partition_tupconv_maps holds maps that
convert the parent format to the partition format.
mt_transition_tupconv_maps holds maps that convert the partition
format to the parent format (= transition tuplestore format).

>> On Thu, May 18, 2017 at 10:16 PM, Amit Langote
>>  wrote:
>>> +typedef struct TriggerTransitionState
>>> +{
>>> ...
>>> +boolttf_delete_old_table;
>>>
>>> Just curious: why ttf_?  TriggerTransition field?
>>
>> Oops.  Changed to "tts_".  I had renamed this struct but not the members.
>
> Ah.  BTW, maybe it's not a problem, but the existing TupleTableSlot's
> member names are prefixed with tts_, too. :)

Would TransitionCaptureState be a better name for this struct?

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] transition table behavior with inheritance appears broken (was: Declarative partitioning - another take)

2017-05-18 Thread Amit Langote

On 2017/05/19 15:16, Thomas Munro wrote:
> On Fri, May 19, 2017 at 5:51 PM, Amit Langote
>  wrote:
>> I saw in the latest patch that now ExecSetupTriggerTransitionState() looks
>> at mtstate->mt_partition_dispatch_info when setting up the transition
>> conversion map.  In the case where it's non-NULL, you may have realized
>> that mt_transition_tupconv_map will be an exact copy of
>> mt_partition_tupconv_maps that's already built.  Would it perhaps be a
>> good idea to either share the same or make a copy using memcpy() instead
>> of doing the convert_tuples_by_name() calls again?
> 
> Isn't it the opposite?  mt_partition_tupconv_maps holds maps that
> convert the parent format to the partition format.
> mt_transition_tupconv_maps holds maps that convert the partition
> format to the parent format (= transition tuplestore format).

You're right, never mind.

>>> On Thu, May 18, 2017 at 10:16 PM, Amit Langote
>>>  wrote:
 +typedef struct TriggerTransitionState
 +{
 ...
 +boolttf_delete_old_table;

 Just curious: why ttf_?  TriggerTransition field?
>>>
>>> Oops.  Changed to "tts_".  I had renamed this struct but not the members.
>>
>> Ah.  BTW, maybe it's not a problem, but the existing TupleTableSlot's
>> member names are prefixed with tts_, too. :)
> 
> Would TransitionCaptureState be a better name for this struct?

Yes.  Although, losing the Trigger prefix might make it sound a bit
ambiguous though.  Right above its definition, we have TriggerData.  So,
maybe TriggerTransitionCaptureState or TriggerTransitionCaptureData or
TriggerTransitionData may be worth considering.

Thanks,
Amit



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Hash Functions

2017-05-18 Thread Jeff Davis

On Thursday, May 18, 2017, Robert Haas  wrote:
> My experience with this area has led
> me to give up on the idea of complete uniformity as impractical, and
> instead look at it from the perspective of "what do we absolutely have
> to ban in order for this to be sane?".

I could agree to something like that. Let's explore some of the challenges
there and potential solutions:

1. Dump/reload of hash partitioned data.

Falling back to restore-through-the-root seems like a reasonable answer
here. Moving to a different encoding is not an edge case, but it's not
common either, so a performance penalty seems acceptable. I'm not
immediately sure how we'd implement this in pg_dump/restore, so I'd feel a
little more comfortable if I saw a sketch.

2. Having a lot of hash partitions would be cumbersome

The user would need to create and manage each partition, and try to do
global operations in a sane way. The normal case would probably involve
scripts to do things like add an index to all partitions, or a column. Many
partitions would also just pollute the namespace unless you remember to put
them in a separate schema (yes, it's easy, but most people will still
forget). Some syntax sugar would go a long way here.

3. The user would need to specify details they really don't care about for
each partition.

Things like "modulus 16, remainder 0", "modulus 16, remainder 1" are
tedious boilerplate. And if the user makes a mistake, then 1/16 of inserts
start failing. Probably would be caught during testing, but not exactly a
good user experience. I'm not thrilled about this, considering that all the
user really wants is 16 partitions, but it's not the end of the world.

4. Detach is a foot-gun

If you detach a partition, random inserts will start failing. Not thrilled
about this, but a hapless user would accept most of the blame if they
stumble over it. Another way of saying this is with hash partitioning you
really need the whole set for the table to be online at all. But we can't
really enforce that, because it would limit some of the flexibility that
you have in mind.

Stepping back, your approach might be closer to the general postgres
philosophy of allowing the user to assemble from spare parts first, then a
few releases later we offer some pre-built subassemblies, and a few
releases later we make the typical cases work out of the box. I'm fine with
it as long as we don't paint ourselves into a corner.

Of course we still have work to do on the hash functions. We should solve
at least the most glaring portability problems, and try to harmonize the
hash opfamilies. If you agree, I can put together a patch or two.

Regards,
Jeff Davis

88 matches

Mail list logo