[HACKERS] Burst in WAL size when UUID is used as PK while full_page_writes are enabled

2017-10-26 Thread sanyam jain
Hi,

I was reading the blog 
https://blog.2ndquadrant.com/on-the-impact-of-full-page-writes .

My queries:

How is the randomness of UUID likely to create new leaf pages in the btree index?
In my understanding, since a UUID is 128 bits, i.e. twice the size of BIGSERIAL, 
more pages are required to store the same number of rows, and hence the WAL 
size can increase due to FPW.
When I compared index sizes in a local setup, the UUID index was ~2x larger.
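
For reference, here is a minimal sketch of that comparison (table names are 
hypothetical; gen_random_uuid() comes from the pgcrypto extension):

CREATE EXTENSION IF NOT EXISTS pgcrypto;

CREATE TABLE t_serial (id bigserial PRIMARY KEY, filler int);
CREATE TABLE t_uuid   (id uuid PRIMARY KEY, filler int);

INSERT INTO t_serial (filler)
  SELECT g FROM generate_series(1, 1000000) g;
INSERT INTO t_uuid (id, filler)
  SELECT gen_random_uuid(), g FROM generate_series(1, 1000000) g;

-- Compare the sizes of the two primary key indexes.
SELECT relname, pg_size_pretty(pg_relation_size(oid))
  FROM pg_class
 WHERE relname IN ('t_serial_pkey', 't_uuid_pkey');

Note that the blog attributes the WAL burst mainly to the randomness rather 
than to the key size: random UUID inserts dirty many distinct leaf pages, 
each of which needs a full-page image after a checkpoint, whereas sequential 
keys keep touching the same rightmost leaf page.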


Thanks,

Sanyam Jain



Re: [HACKERS] proposal: schema variables

2017-10-26 Thread Tsunakawa, Takayuki
From: pgsql-hackers-ow...@postgresql.org
> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Pavel Stehule
> I propose a new database object - a variable. A variable is a persistent
> object that holds an unshared, session-scoped, non-transactional, in-memory
> value of any type, like variables in other languages. The persistence is
> required to make static checks possible, but it can be limited to the
> session - the variables can be temporary.
> 
> 
> My proposal is related to session variables from Sybase, MSSQL or MySQL
> (based on the @ or @@ prefix usage), package variables from Oracle (access
> is controlled by scope), and schema variables from DB2. Each design comes
> from different sources and traditions and has its advantages and
> disadvantages. The basis of my proposal is using schema variables as
> session variables for stored procedures. It should help people who try to
> port complex projects to PostgreSQL from other databases.

Very interesting.  I hope I can join the review and testing.

How do you think this would contribute to easing the porting of Oracle PL/SQL 
procedures?  Would the combination of orafce and this feature promote 
auto-translation of PL/SQL procedures?  I'm curious what the major roadblocks 
will be after the schema variable is added.
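
As a purely hypothetical sketch of the kind of port being asked about, an 
Oracle package variable such as billing.g_batch_id might map onto the 
proposed syntax like this (none of this is committed, and the details may 
change):

CREATE VARIABLE billing.g_batch_id AS integer;
GRANT READ, WRITE ON VARIABLE billing.g_batch_id TO batch_role;

SET billing.g_batch_id = 42;
SELECT billing.g_batch_id;  -- usable wherever a query parameter is allowed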

Regards
Takayuki Tsunakawa





Re: [HACKERS] Implementing pg_receivewal --no-sync

2017-10-26 Thread Robert Haas
On Wed, Oct 25, 2017 at 10:03 PM, Michael Paquier
 wrote:
> This sentence is actually wrong, a feedback message is never sent with
> the feedback message.

Eh?

I think this looks basically fine, though I'd omit the short option
for it.  There are only so many letters in the alphabet, so let's not
use them up for developer-convenience options.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] proposal: schema variables

2017-10-26 Thread Tatsuo Ishii
> Hi,
> 
> I propose a new database object - a variable. A variable is a persistent
> object that holds an unshared, session-scoped, non-transactional, in-memory
> value of any type, like variables in other languages. The persistence is
> required to make static checks possible, but it can be limited to the
> session - the variables can be temporary.
> 
> My proposal is related to session variables from Sybase, MSSQL or MySQL
> (based on the @ or @@ prefix usage), package variables from Oracle (access
> is controlled by scope), and schema variables from DB2. Each design comes
> from different sources and traditions and has its advantages and
> disadvantages. The basis of my proposal is using schema variables as
> session variables for stored procedures. It should help people who try to
> port complex projects to PostgreSQL from other databases.
> 
> The Sybase (T-SQL) design is good for interactive work, but it is weak for
> use in stored procedures - static checking is not possible, and it is not
> possible to set access rights on variables.
> 
> The Ada design (used in Oracle), based on scope, is great, but our
> environment is not nested. And we should support PLs other than PL/pgSQL
> more strongly.
> 
> There are not many other possibilities - a variable that should be
> accessible from different PLs and different procedures (over time) has to
> live somewhere above the PL level, and only the schema qualifies.
> 
> The variable can be created by CREATE statement:
> 
> CREATE VARIABLE public.myvar AS integer;
> CREATE VARIABLE myschema.myvar AS mytype;
> 
> CREATE [TEMP] VARIABLE [IF NOT EXISTS] name AS type
>   [ DEFAULT expression ] [[NOT] NULL]
>   [ ON TRANSACTION END { RESET | DROP } ]
>   [ { VOLATILE | STABLE } ];
> 
> It is dropped by the command DROP VARIABLE [ IF EXISTS ] varname.
> 
> Access rights are controlled by the usual mechanism - the GRANT/REVOKE
> commands. The possible rights are: READ, WRITE.
> 
> The variables can be modified by the SQL command SET (this is taken from
> the standard, and it is natural):
> 
> SET varname = expression;
> 
> Unfortunately we use the SET command for a different purpose. But I think
> we can solve that with a few tricks. The first is moving our GUCs to the
> pg_catalog schema. We can control the strictness of the SET command: in one
> variant we can detect a custom GUC and allow it, in another we can disallow
> custom GUCs and allow only schema variables. A new command LET could be an
> alternative.
> 
> The variables can be used in queries implicitly (without a JOIN):
> 
> SELECT varname;
> 
> The search_path is used when varname is looked up. The variables can be
> used everywhere query parameters are allowed.
> 
> I hope this proposal is good enough and simple.
> 
> Comments, notes?
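
To make the intended flow concrete, a session using the proposed syntax might 
look like this (a sketch of the proposal only; the orders table is invented 
for illustration):

CREATE VARIABLE public.myvar AS integer DEFAULT 0;

SET myvar = 10 + 32;   -- resolved via search_path
SELECT myvar;          -- returns 42
SELECT * FROM orders WHERE customer_id = myvar;  -- used like a parameter

DROP VARIABLE public.myvar;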

Just a quick follow-up. Looks like a great feature!

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp




Re: [HACKERS] WIP: BRIN bloom indexes

2017-10-26 Thread Robert Haas
On Thu, Oct 19, 2017 at 10:15 PM, Tomas Vondra
 wrote:
> Let's see a query like this:
>
> select * from bloom_test
>  where id = '8db1d4a6-31a6-e9a2-4e2c-0e842e1f1772';
>
> The minmax index produces this plan
>
>Heap Blocks: lossy=2061856
>  Execution time: 22707.891 ms
>
> Now, the bloom index:
>
>Heap Blocks: lossy=25984
>  Execution time: 338.946 ms

It's neat to see BRIN being extended like this.  Possibly we could
consider making it a contrib module rather than including it in core,
although I don't have strong feelings about it.
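
For readers following along, the two indexes being compared would be created 
roughly like this (a sketch only: id is assumed to be of type uuid, and the 
bloom opclass name is modeled on the WIP patch's naming, so it may change):

CREATE INDEX bloom_test_minmax_idx ON bloom_test
  USING brin (id);                  -- default minmax opclass
CREATE INDEX bloom_test_bloom_idx ON bloom_test
  USING brin (id uuid_bloom_ops);   -- bloom opclass from the WIP patch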

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] proposal: schema variables

2017-10-26 Thread Pavel Stehule
Hi

2017-10-27 0:07 GMT+02:00 Nico Williams :

> On Thu, Oct 26, 2017 at 09:21:24AM +0200, Pavel Stehule wrote:
> > Comments, notes?
>
> I like it.
>
> I would further like to move all of postgresql.conf into the database,
> as much as possible, as well as pg_ident.conf and pg_hba.conf.
>
> Variables like current_user have a sort of nesting context
> functionality: calling a SECURITY DEFINER function "pushes" a new value
> onto current_user, then when the function returns the new value of
> current_user is "popped" and the previous value restored.
>

My proposal doesn't expect nesting, because there is only one scope
- schema / session - but I don't think that is necessary.

current_user is a function - it is based on parser magic in Postgres. This
originates from Oracle, which uses a feature of the Ada language: when a
function has no parameters, the parentheses are optional. So current_user
and current_time are really the functions current_user() and current_time().


> It might be nice to be able to generalize this.
>
> Questions that then arise:
>
>  - can one see up the stack?
>  - are there permissions issues with seeing up the stack?
>

These variables are pinned to a schema, so there is no relation to any
stack. They are like global variables.

Theoretically we could introduce "functional" variables, where the value is
based on immediate evaluation of an expression. That would be very similar
to how current_user works now.

>
>
> I recently posted proposing a feature such that SECURITY DEFINER
> functions could observe the _caller_'s current_user.
>

Your use case is a good example - this proposed feature doesn't depend on
the call stack; it depends on the security context (the security-context
stack), which is a superset of the call stack.

Regards

Pavel



> Nico
> --
>


Re: [HACKERS] [PATCH] Lockable views

2017-10-26 Thread Robert Haas
On Wed, Oct 11, 2017 at 11:36 AM, Yugo Nagata  wrote:
> In the attached patch, only automatically-updatable views that do not have
> INSTEAD OF rules or INSTEAD OF triggers are lockable. It is assumed that
> such a view's definition has only one base relation. When an auto-updatable
> view is locked, its base relation is also locked. If the base relation is a
> view again, its base relations are processed recursively. To lock a view,
> the view owner has to have the privilege to lock the base relation.

Why is this the right behavior?

I would have expected LOCK TABLE v to lock the view and nothing else.

See 
http://postgr.es/m/AANLkTi=kupesjhrdevgfbt30au_iyro6zwk+fwwy_...@mail.gmail.com
for previous discussion of this topic.
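
A minimal scenario illustrating the disputed behavior (a sketch with invented 
names; the comments describe the patch's proposed semantics):

CREATE TABLE t (a int);
CREATE VIEW v AS SELECT a FROM t;  -- auto-updatable, single base relation

BEGIN;
LOCK TABLE v IN ACCESS EXCLUSIVE MODE;
-- Under the patch, the base relation t is locked as well; the position
-- above is that only v itself should be locked.
COMMIT;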

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Pluggable storage

2017-10-26 Thread Robert Haas
On Wed, Oct 25, 2017 at 1:59 PM, Amit Kapila  wrote:
>> Another thing to consider is that, if we could replace satisfiesfunc,
>> it would probably break some existing code.  There are multiple places
>> in the code that compare snapshot->satisfies to
>> HeapTupleSatisfiesHistoricMVCC and HeapTupleSatisfiesMVCC.
>>
>> I think the storage API should just leave snapshots alone.  If a
>> storage engine wants to call HeapTupleSatisfiesVisibility() with that
>> snapshot, it can do so.  Otherwise it can switch on
>> snapshot->satisfies and handle each case however it likes.
>>
>
> How will it switch satisfies at runtime?  There are places where we
> might know which visibility function (*MVCC , *Dirty, etc) needs to be
> called, but I think there are other places (like heap_fetch) where it
> is not clear and we decide based on what is stored in
> snapshot->satisfies.

An alternative storage engine needs to provide its own implementation
of heap_fetch, and that replacement implementation can implement MVCC
and other snapshot behavior in any way it likes.

My point here is that I think it's better if the table access method
stuff doesn't end up modifying snapshots.  I think it's fine for a
table access method to get passed a standard snapshot.  Some code may
be needed to cater to the access method's specific needs, but that
code can live inside the table access method, without contaminating
the snapshot stuff.  We have to try to draw some boundary around table
access methods -- we don't want to end up teaching everything in the
system about them.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] path toward faster partition pruning

2017-10-26 Thread Robert Haas
On Fri, Oct 27, 2017 at 3:17 AM, Amit Langote
 wrote:
>> I don't think we really want to get into theorem-proving here, because
>> it's slow.
>
> Just to be clear, I'm saying we could use theorem-proving (if at all) just
> for the default partition.

I don't really see why it should be needed there either.  We've got
all the bounds in order, so we should know where there are any gaps
that are covered by the default partition in the range we care about.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




[HACKERS] Re: [bug fix] postgres.exe crashes with access violation on Windows while starting up

2017-10-26 Thread Tsunakawa, Takayuki
From: pgsql-hackers-ow...@postgresql.org
> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Tsunakawa,
> Takayuki
> The reason for not outputting the crash dump is a) the crash occurred
> before installing the Windows exception handler
> (the pgwin32_install_crashdump_handler() call) and b) the effect of the
> following call in postmaster is inherited by the child process.
> 
>   /* In case of general protection fault, don't show GUI popup
> box */
>   SetErrorMode(SEM_FAILCRITICALERRORS |
> SEM_NOGPFAULTERRORBOX);
> 
> But I'm not sure in what order we should do
> pgwin32_install_crashdump_handler(), startup_hacks() and steps therein,
> MemoryContextInit().  I think that's another patch.

Just installing the handler at the beginning of main() seems fine.  Patch 
attached.

Regards
Takayuki Tsunakawa
 


crash_dump_before_installing_handler.patch
Description: crash_dump_before_installing_handler.patch



[HACKERS] [bug fix] postgres.exe crashes with access violation on Windows while starting up

2017-10-26 Thread Tsunakawa, Takayuki
Hello,

We encountered a rare and hard-to-investigate problem on Windows, which one of 
our customers reported.  Please find the attached patch to fix that.  I'll add 
this to the next CF.


PROBLEM
==

PostgreSQL sometimes crashes with the following messages.  This is infrequent 
(but frequent for the customer); it occurred about 10 times in the past 5 
months.

LOG:  server process (PID 2712) was terminated by exception 0xC0000005
HINT:  See C include file "ntstatus.h" for a description of the hexadecimal 
value.
LOG:  terminating any other active server processes
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the 
current transaction and exit, because another server process exited abnormally 
and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and repeat 
your command.
LOG:  all server processes terminated; reinitializing

"server process" shows that an client backend crashed.  The above messages 
indicate that the process was not running an SQL command.

PostgreSQL runs as a Windows service.

No crash dump was produced anywhere, despite the facts:
- /crashdumps folder exists and is writable by the PostgreSQL user 
account (which is the user postgres.exe runs as)
- The Windows registry configuration allows dumping the crash dump


CAUSE
==

We believe WSAStartup() in main.c failed.  The only conceivable error is:

WSAEPROCLIM
10067
Too many processes.
A Windows Sockets implementation may have a limit on the number of applications 
that can use it simultaneously. WSAStartup may fail with this error if the 
limit has been reached.

But I couldn't find what the limit is and whether we can tune it.  We couldn't 
reproduce the problem.

When I pretended that WSAStartup() failed while a client backend was starting 
up, I could reproduce the same phenomenon the customer saw.  This problem only 
occurs when PostgreSQL runs as a Windows service.

The bug is in write_eventlog().  It calls pgwin32_message_to_utf16() which in 
turn calls palloc(), which requires the memory management system to be set up 
(CurrentMemoryContext != NULL).


FIX
==

Add the check "CurrentMemoryContext != NULL" in write_eventlog() as in 
write_console().


NOTE
==

The reason for not outputting the crash dump is a) the crash occurred before 
installing the Windows exception handler (the 
pgwin32_install_crashdump_handler() call) and b) the effect of the following 
call in postmaster is inherited by the child process.

/* In case of general protection fault, don't show GUI popup 
box */
SetErrorMode(SEM_FAILCRITICALERRORS | SEM_NOGPFAULTERRORBOX);

But I'm not sure in what order we should do 
pgwin32_install_crashdump_handler(), startup_hacks() and steps therein, 
MemoryContextInit().  I think that's another patch.

Regards
Takayuki Tsunakawa




write_eventlog_crash.patch
Description: write_eventlog_crash.patch



Re: [HACKERS] path toward faster partition pruning

2017-10-26 Thread Amit Langote
On 2017/10/26 20:34, Robert Haas wrote:
> On Thu, Oct 26, 2017 at 1:17 PM, Amit Langote
>  wrote:
>> It can perhaps be taught not to make that conclusion by taking into account
>> the default partition's partition constraint, which includes the constraint
>> inherited from the parent, viz. 1 <= col1 < 50001.  To do that, it might
>> be possible to summon up predtest.c's powers to conclude from the default
>> partition's partition constraint that it cannot contain any keys < 1, but
>> then we'll have to frame up a clause expression describing the latter.
>> Generating such a clause expression can be a bit daunting for a
>> multi-column key.  So, I haven't yet tried really hard to implement this.
>> Any thoughts on that?
> 
> I don't think we really want to get into theorem-proving here, because
> it's slow.

Just to be clear, I'm saying we could use theorem-proving (if at all) just
for the default partition.

> Whatever we're going to do we should be able to do without
> that - keeping it in the form of btree-strategy + value.  It doesn't
> seem that hard.  Suppose we're asked to select partitions from tprt
> subject to (<, 1).  Well, we determine that some of the tprt_1
> partitions may be relevant, so we tell tprt_1 to select partitions
> subject to (>=, 1, <, 1).  We know to do that because we know that
> 1 < 5 and we know to include >= 1 because we haven't got any
> lower bound currently at all.  What's the problem?

Hmm, that's interesting.  With the approach that the patch currently
takes, (>= 1) wouldn't be passed down when selecting the partitions of
tprt_1.  The source of values (+ btree strategy) to use to select
partitions is the same original set of clauses for all partitioned tables
in the tree, as the patch currently implements it.  Nothing would get
added to that set (>= 1, as in this example), nor subtracted (such as
clauses that are trivially true).

I will think about this approach in general and to solve this problem in
particular.

> In some sense it's tempting to say that this case just doesn't matter
> very much; after all, subpartitioning on the same column used to
> partition at the top level is arguably lame.  But if we can get it
> right in a relatively straightforward manner then let's do it.

Yeah, I tend to agree.
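
For reference, the shape of the case under discussion is a table 
subpartitioned on the same key column as its parent, something like this 
(hypothetical bounds; the real tprt example upthread differs in its details):

CREATE TABLE tprt (col1 int) PARTITION BY RANGE (col1);
CREATE TABLE tprt_1 PARTITION OF tprt
  FOR VALUES FROM (1) TO (50001) PARTITION BY RANGE (col1);
CREATE TABLE tprt_1_1 PARTITION OF tprt_1 FOR VALUES FROM (1) TO (25001);
CREATE TABLE tprt_1_2 PARTITION OF tprt_1 FOR VALUES FROM (25001) TO (50001);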

Thanks for the input.

Regards,
Amit





Re: [HACKERS] taking stdbool.h into use

2017-10-26 Thread Tom Lane
Alvaro Herrera  writes:
> Michael Paquier wrote:
>> It seems to me that this proves the point of the proposed patch. You
>> had better use a zero-equality comparison for such bitwise operation,
>> and so you ought to do that:
>> bool isprimary = (flags & INDEX_CREATE_IS_PRIMARY) != 0;

> Right, exactly.  But my point is that with the whole patch series
> applied I didn't get any warnings.

While warnings for this would be lovely, I don't see how we can expect to
get any.  This is perfectly correct C code no matter whether isprimary
is C99 bool or is typedef'd to char ... you just end up with different
values of isprimary, should the RHS produce something other than 1/0.
The compiler has no way to know that assigning, say, 4 in the char
variable case is not quite your intent.  Maybe you could hope for a
warning if the bit value were far enough left to actually not fit into
"char", but otherwise there's nothing wrong.

regards, tom lane




[HACKERS] FieldSelect/FieldStore dependencies, take 2

2017-10-26 Thread Tom Lane
I had a nagging feeling that commit f3ea3e3e8 was not quite covering
all the bases with respect to what dependencies to record for
FieldSelect/FieldStore nodes: it looked at the result type, but what
about the input type?  Just now, while fooling around with domains
over composite, I stumbled across a case that shows what's missing:

regression=# create type complex as (r float8, i float8);
CREATE TYPE
regression=# create domain dcomplex as complex check((value).r > (value).i);
CREATE DOMAIN
regression=# select pg_get_constraintdef((select max(oid) from pg_constraint));
      pg_get_constraintdef       
---------------------------------
 CHECK (((VALUE).r > (VALUE).i))
(1 row)

regression=# alter type complex drop attribute r;
ALTER TYPE
regression=# select pg_get_constraintdef((select max(oid) from pg_constraint));
             pg_get_constraintdef             
----------------------------------------------
 CHECK (((VALUE)."pg.dropped.1" > (VALUE).i))
(1 row)

Nothing seems to crash at this point, probably because we insert
nulls into dropped columns, so the CHECK just sees a NULL value
for "r" whenever it runs.  But obviously, the next dump/reload
or pg_upgrade is not going to end well.

So what this shows is that we need a dependency on the particular
column named by the FieldSelect or FieldStore.  Under normal
circumstances, that obviates the need for a dependency on the
FieldSelect's result type, which would match the column type.
I think concretely what we need is the attached.

(BTW, the getBaseType() is only necessary in HEAD, since before
domains-over-composites the argument of a FieldSelect couldn't
be a domain type.)
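
With the column dependency recorded, the expected outcome (a sketch of the 
intended behavior, not output captured from a patched build) is that the 
DROP ATTRIBUTE above would be rejected:

ALTER TYPE complex DROP ATTRIBUTE r;
-- ERROR:  cannot drop column r of composite type complex because other
-- objects depend on it
-- DETAIL:  constraint dcomplex_check on domain dcomplex depends on
-- column r of composite type complex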

regards, tom lane

diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c
index 3b214e5..033c435 100644
*** a/src/backend/catalog/dependency.c
--- b/src/backend/catalog/dependency.c
*** find_expr_references_walker(Node *node,
*** 1716,1725 ****
  	else if (IsA(node, FieldSelect))
  	{
  		FieldSelect *fselect = (FieldSelect *) node;
  
! 		/* result type might not appear anywhere else in expression */
! 		add_object_address(OCLASS_TYPE, fselect->resulttype, 0,
! 		   context->addrs);
  		/* the collation might not be referenced anywhere else, either */
  		if (OidIsValid(fselect->resultcollid) &&
  			fselect->resultcollid != DEFAULT_COLLATION_OID)
--- 1716,1739 ----
  	else if (IsA(node, FieldSelect))
  	{
  		FieldSelect *fselect = (FieldSelect *) node;
+ 		Oid			argtype = getBaseType(exprType((Node *) fselect->arg));
+ 		Oid			reltype = get_typ_typrelid(argtype);
  
! 		/*
! 		 * We need a dependency on the specific column named in FieldSelect,
! 		 * assuming we can identify the pg_class OID for it.  (Probably we
! 		 * always can at the moment, but in future it might be possible for
! 		 * argtype to be RECORDOID.)  If we can make a column dependency then
! 		 * we shouldn't need a dependency on the column's type; but if we
! 		 * can't, make a dependency on the type, as it might not appear
! 		 * anywhere else in the expression.
! 		 */
! 		if (OidIsValid(reltype))
! 			add_object_address(OCLASS_CLASS, reltype, fselect->fieldnum,
! 			   context->addrs);
! 		else
! 			add_object_address(OCLASS_TYPE, fselect->resulttype, 0,
! 			   context->addrs);
  		/* the collation might not be referenced anywhere else, either */
  		if (OidIsValid(fselect->resultcollid) &&
  			fselect->resultcollid != DEFAULT_COLLATION_OID)
*** find_expr_references_walker(Node *node,
*** 1729,1738 ****
  	else if (IsA(node, FieldStore))
  	{
  		FieldStore *fstore = (FieldStore *) node;
  
! 		/* result type might not appear anywhere else in expression */
! 		add_object_address(OCLASS_TYPE, fstore->resulttype, 0,
! 		   context->addrs);
  	}
  	else if (IsA(node, RelabelType))
  	{
--- 1743,1762 ----
  	else if (IsA(node, FieldStore))
  	{
  		FieldStore *fstore = (FieldStore *) node;
+ 		Oid			reltype = get_typ_typrelid(fstore->resulttype);
  
! 		/* similar considerations to FieldSelect, but multiple column(s) */
! 		if (OidIsValid(reltype))
! 		{
! 			ListCell   *l;
! 
! 			foreach(l, fstore->fieldnums)
! add_object_address(OCLASS_CLASS, reltype, lfirst_int(l),
!    context->addrs);
! 		}
! 		else
! 			add_object_address(OCLASS_TYPE, fstore->resulttype, 0,
! 			   context->addrs);
  	}
  	else if (IsA(node, RelabelType))
  	{



Re: [HACKERS] Remove secondary checkpoint

2017-10-26 Thread Tsunakawa, Takayuki
From: Alvaro Herrera [mailto:alvhe...@alvh.no-ip.org]
> Tsunakawa, Takayuki wrote:
> 
> > (Although unrelated to this, I've also been wondering why PostgreSQL
> > flushes WAL to disk when writing a page in the shared buffer, because
> > PostgreSQL doesn't use WAL for undo.)
> 
> The reason is that if the system crashes after writing the data page to
> disk, but before writing the WAL, the data page would be inconsistent with
> data in pages that weren't flushed, since there is no WAL to update those
> other pages.  Also, if the system crashes after partially writing the page
> (say it writes the first 4kB) then the page is downright corrupted with
> no way to fix it.
> 
> So there has to be a barrier that ensures that the WAL is flushed up to
> the last position that modified a page (i.e. that page's LSN) before actually
> writing that page to disk.  And this is why we can't use mmap() for shared
> buffers -- there is no mechanism to force the WAL down if the operating
> system has the liberty to flush pages whenever it likes.

I see.  The latter is the torn page problem, which is solved by a full page 
image WAL record.  I understand that an example of the former problem is the 
inconsistency between a table page and an index page -- if an index page is 
flushed to disk without flushing the WAL and the corresponding table page, an 
index entry would point to a wrong table record after recovery.

Thanks, my long-standing question has been solved.


Regards
Takayuki Tsunakawa




Re: [HACKERS] [bug fix] ECPG: fails to recognize embedded parameters

2017-10-26 Thread Tsunakawa, Takayuki
From: pgsql-hackers-ow...@postgresql.org
> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Michael Meskes
> Thanks for spotting and fixing. I just committed your patch to master and
> backported it to 9.4, 9.5, 9.6 and 10. It doesn't apply cleanly to 9.3. But
> then it might not be important enough to investigate and backport to this
> old a version.

Thanks.  I'm OK with 9.3.

Regards
Takayuki Tsunakawa




Re: [HACKERS] taking stdbool.h into use

2017-10-26 Thread Michael Paquier
On Thu, Oct 26, 2017 at 3:48 PM, Alvaro Herrera  wrote:
> Right, exactly.  But my point is that with the whole patch series
> applied I didn't get any warnings.

Sorry, I misread your message. You use Linux I suppose, what's your compiler?
-- 
Michael




Re: [HACKERS] taking stdbool.h into use

2017-10-26 Thread Alvaro Herrera
Michael Paquier wrote:
> On Thu, Oct 26, 2017 at 10:51 AM, Alvaro Herrera
>  wrote:
> > I gave this a quick run, to see if my compiler would complain for things
> > like this:
> >
> >    bool isprimary = flags & INDEX_CREATE_IS_PRIMARY;
> >
> > (taken from the first patch at
> > https://postgr.es/m/20171023161503.ohkybquxrlech7d7@alvherre.pgsql )
> >
> > which is assigning a value other than 1/0 to a bool variable without an
> > explicit cast.  I thought it would provoke a warning, but it does not.
> > Is that expected?  Is my compiler too old/new?
> 
> It seems to me that this proves the point of the proposed patch. You
> had better use a zero-equality comparison for such bitwise operation,
> and so you ought to do that:
> bool isprimary = (flags & INDEX_CREATE_IS_PRIMARY) != 0;

Right, exactly.  But my point is that with the whole patch series
applied I didn't get any warnings.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] proposal: schema variables

2017-10-26 Thread Nico Williams
On Thu, Oct 26, 2017 at 09:21:24AM +0200, Pavel Stehule wrote:
> Comments, notes?

I like it.

I would further like to move all of postgresql.conf into the database,
as much as possible, as well as pg_ident.conf and pg_hba.conf.

Variables like current_user have a sort of nesting context
functionality: calling a SECURITY DEFINER function "pushes" a new value
onto current_user, then when the function returns the new value of
current_user is "popped" and the previous value restored.

It might be nice to be able to generalize this.

Questions that then arise:

 - can one see up the stack?
 - are there permissions issues with seeing up the stack?

I recently posted proposing a feature such that SECURITY DEFINER
functions could observe the _caller_'s current_user.

Nico
-- 




Re: [HACKERS] proposal: custom autovacuum entries

2017-10-26 Thread Michael Paquier
On Thu, Oct 26, 2017 at 1:47 PM, Tom Lane  wrote:
> Jordan Deitch  writes:
>> I would like to remove records from a time series table older than a
>> certain age. My understanding of current standard practice is to configure
>> an external scheduler like cron to perform the deletion.
>> My proposal is to allow clients to register functions on the tail end of
>> the autovacuum sequence.
>
> There's already a mechanism for writing custom background workers.
> I doubt that adding random user-defined stuff to what autovacuum has
> to do is a better idea; that will mostly make autovac scheduling even
> harder than it is now.

I agree with Tom here; there is no point in adding more facilities to
autovacuum so that it does extra work at its tail. There may not even be
any need to develop your own background worker. Have you looked at
pg_cron [1]?

[1]: https://github.com/citusdata/pg_cron
-- 
Michael




Re: [HACKERS] taking stdbool.h into use

2017-10-26 Thread Michael Paquier
On Thu, Oct 26, 2017 at 10:51 AM, Alvaro Herrera
 wrote:
> I gave this a quick run, to see if my compiler would complain for things
> like this:
>
>    bool isprimary = flags & INDEX_CREATE_IS_PRIMARY;
>
> (taken from the first patch at
> https://postgr.es/m/20171023161503.ohkybquxrlech7d7@alvherre.pgsql )
>
> which is assigning a value other than 1/0 to a bool variable without an
> explicit cast.  I thought it would provoke a warning, but it does not.
> Is that expected?  Is my compiler too old/new?

It seems to me that this proves the point of the proposed patch. You
had better use a zero-equality comparison for such bitwise operation,
and so you ought to do that:
> bool isprimary = (flags & INDEX_CREATE_IS_PRIMARY) != 0;
-- 
Michael




Re: [HACKERS] proposal: custom autovacuum entries

2017-10-26 Thread Tom Lane
Jordan Deitch  writes:
> I would like to remove records from a time series table older than a
> certain age. My understanding of current standard practice is to configure
> an external scheduler like cron to perform the deletion.
> My proposal is to allow clients to register functions on the tail end of
> the autovacuum sequence.

There's already a mechanism for writing custom background workers.
I doubt that adding random user-defined stuff to what autovacuum has
to do is a better idea; that will mostly make autovac scheduling even
harder than it is now.

regards, tom lane




[HACKERS] proposal: custom autovacuum entries

2017-10-26 Thread Jordan Deitch
Hi hackers,

I would like to remove records from a time series table older than a
certain age. My understanding of current standard practice is to configure
an external scheduler like cron to perform the deletion.
My proposal is to allow clients to register functions on the tail end of
the autovacuum sequence.
Alternatively, I would suggest some mechanism for scheduling within
PostgreSQL.
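
For context, the kind of retention job this would automate looks something 
like the following (hypothetical table and column names), run today from an 
external scheduler:

DELETE FROM metrics
 WHERE recorded_at < now() - interval '90 days';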


Thanks,
Jordan Deitch


Re: [HACKERS] [GENERAL] Postgres 10 manual breaks links with anchors

2017-10-26 Thread Tom Lane
Peter Eisentraut  writes:
> On 10/16/17 03:19, Thomas Kellerer wrote:
>> I don't know if this is intentional, but the Postgres 10 manual started to 
>> use lowercase IDs as anchors in the manual.

> Here is a patch that can be applied to PG 10 to put the upper case
> anchors back.
> The question perhaps is whether we want to maintain this patch
> indefinitely, or whether a clean break is better.

In view of commit 1ff01b390, aren't we more or less locked into
lower-case anchors going forward?  I'm not sure I see the point
of changing v10 back to the old way if v11 will be incompatible
anyhow.

regards, tom lane




Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath

2017-10-26 Thread Tom Lane
David Rowley  writes:
> On 27 October 2017 at 01:44, Ashutosh Bapat
>  wrote:
>> I think Antonin has a point. The join processing may deem some base
>> relations dummy (See populate_joinrel_with_paths()). So an Append
>> relation which had multiple child alive at the end of
>> set_append_rel_size() might ultimately have only one child after
>> partition-wise join has worked on it.

TBH, that means partition-wise join planning is broken and needs to be
redesigned.  It's impossible that we're going to make sane planning
choices if the sizes of relations change after we've already done some
planning work.

regards, tom lane




Re: [HACKERS] taking stdbool.h into use

2017-10-26 Thread Alvaro Herrera
Peter Eisentraut wrote:
> Here is an updated patch set.  This is just a rebase of the previous
> set, no substantial changes.  Based on the discussion so far, I'm
> proposing that 0001 through 0007 could be ready to commit after review,
> whereas the remaining two need more work at some later time.

I gave this a quick run, to see if my compiler would complain for things
like this:

    bool isprimary = flags & INDEX_CREATE_IS_PRIMARY;

(taken from the first patch at
https://postgr.es/m/20171023161503.ohkybquxrlech7d7@alvherre.pgsql )

which is assigning a value other than 1/0 to a bool variable without an
explicit cast.  I thought it would provoke a warning, but it does not.
Is that expected?  Is my compiler too old/new?

config.log says
gcc version 6.3.0 20170516 (Debian 6.3.0-18) 

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] [GENERAL] Postgres 10 manual breaks links with anchors

2017-10-26 Thread Peter Eisentraut
On 10/16/17 03:19, Thomas Kellerer wrote:
> I don't know if this is intentional, but the Postgres 10 manual started to 
> use lowercase IDs as anchors in the manual.
> 
> So, if I have e.g.: the following URL open in my browser:
> 
>
> https://www.postgresql.org/docs/current/static/sql-createindex.html#sql-createindex-concurrently
> 
> I cannot simply switch to an older version by replacing "current" with e.g. 
> "9.5" because in the 9.5 manual the anchor was all uppercase, and the URL 
> would need to be: 
> 
>
> https://www.postgresql.org/docs/9.5/static/sql-createindex.html#SQL-CREATEINDEX-CONCURRENTLY
> 
> Is this intentional? 
> 
> This also makes "cleaning" up links in e.g. StackOverflow that point to 
> outdated versions of the manual a bit more cumbersome. 

Here is a patch that can be applied to PG 10 to put the upper case
anchors back.

The question perhaps is whether we want to maintain this patch
indefinitely, or whether a clean break is better.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From 9ae0f3c465228a28c2fe50eb2a038e87ccc15b2d Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Thu, 26 Oct 2017 15:19:56 -0400
Subject: [PATCH] doc: Convert ids to upper case at build time

This makes the produced HTML anchors upper case, making it backward
compatible with the previous (9.6) build system.
---
 doc/src/sgml/stylesheet-html-common.xsl | 25 +
 1 file changed, 25 insertions(+)

diff --git a/doc/src/sgml/stylesheet-html-common.xsl 
b/doc/src/sgml/stylesheet-html-common.xsl
index 72fac1e806..9d0d10f776 100644
--- a/doc/src/sgml/stylesheet-html-common.xsl
+++ b/doc/src/sgml/stylesheet-html-common.xsl
@@ -263,4 +263,29 @@
 [The added XSLT templates are not reproduced here; the list archiver
  stripped their markup beyond recovery. Per the commit message above, they
  post-process the generated ids so that the emitted HTML anchors are
  upper case.]
-- 
2.14.3




Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath

2017-10-26 Thread David Rowley
On 27 October 2017 at 01:44, Ashutosh Bapat
 wrote:
> I think Antonin has a point. The join processing may deem some base
> relations dummy (See populate_joinrel_with_paths()). So an Append
> relation which had multiple child alive at the end of
> set_append_rel_size() might ultimately have only one child after
> partition-wise join has worked on it.

hmm, I see. Partition-wise join is able to eliminate partitions
on the other side of the join that can't be matched:

# set enable_partition_wise_join=on;
SET
# explain select count(*) from ab ab1 inner join ab ab2 on ab1.a=ab2.a
where ab1.a between 1 and 2 and ab2.a between 1 and 10001;
                        QUERY PLAN                        
----------------------------------------------------------
 Aggregate  (cost=0.00..0.01 rows=1 width=8)
   ->  Result  (cost=0.00..0.00 rows=0 width=0)
 One-Time Filter: false
(3 rows)

# \d+ ab
Table "public.ab"
 Column |  Type   | Collation | Nullable | Default | Storage | Stats
target | Description
+-+---+--+-+-+--+-
 a  | integer |   |  | | plain   |  |
 b  | integer |   |  | | plain   |  |
Partition key: RANGE (a)
Partitions: ab_p1 FOR VALUES FROM (1) TO (1),
ab_p2 FOR VALUES FROM (1) TO (2)



-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] logical decoding of two-phase transactions

2017-10-26 Thread Sokolov Yura

On 2017-09-27 14:46, Stas Kelvich wrote:
>> On 7 Sep 2017, at 18:58, Nikhil Sontakke  wrote:
>>
>> Hi,
>>
>> FYI all, wanted to mention that I am working on an updated version of
>> the latest patch that I plan to submit to a later CF.
>
> Cool!
>
> So what kind of architecture do you have in mind? The same way as it
> was implemented before?
> As far as I remember there were two main issues:
>
> * Decoding of aborted prepared transactions.
>
> If such a transaction modified the catalog then we can't read reliable
> info with our historic snapshot: since clog already has the aborted bit
> for our tx, it will break the visibility logic. There are some ways to
> deal with that - by doing the catalog seq scan two times and counting
> the number of tuples (details upthread), or by hijacking clog values in
> the historic visibility function. But ISTM it is better not to solve
> this issue at all =) In most cases the intended usage of decoding a 2PC
> transaction is to do some form of distributed commit, so naturally
> decoding will happen only with in-progress transactions, and
> commit/abort will happen only after the transaction is decoded, sent,
> and a response is received. So we can just have an atomic flag that
> prevents commit/abort of a tx currently being decoded. And we can
> filter interesting prepared transactions based on GID, to prevent
> holding this lock for ordinary 2PC.
>
> * Possible deadlocks that Andres was talking about.
>
> I spent some time trying to find one, but didn't find any. If locking
> pg_class in a prepared tx is the only example, then (imho) it is better
> to just forbid preparing such transactions. Otherwise, if some
> realistic examples that can actually block decoding exist, then we
> probably need to reconsider the way a tx is decoded. Anyway this part
> probably needs Andres' blessing.


Just rebased the logical_twophase_v6 patch onto master.

Fixed small issues:
- XactLogAbortRecord wrote DBINFO twice, but it was decoded in
  ParseAbortRecord only once, so the second DBINFO was parsed as ORIGIN.
  Fixed by removing the second write of DBINFO.
- SnapBuildPrepareTxnFinish tried to remove the xid from `running` instead
  of `committed`, and it removed only the xid, without its subxids.
- test_decoding skipped returning "COMMIT PREPARED" and "ABORT PREPARED".

The big issue was with decoding two-phase transactions that include DDL:
- prepared.out was misleading. We could not reproduce decoding the body of
  "test_prepared#3" with logical_twophase_v6.diff. It was skipped if
  `pg_logical_slot_get_changes` was called without
  `twophase-decode-with-catalog-changes` set, and only "COMMIT PREPARED
  test_prepared#3" was decoded.

The reason is that "PREPARE TRANSACTION" is passed to `pg_filter_prepare`
twice:
- first on the "PREPARE" itself,
- second on the "COMMIT PREPARED".
In v6, `pg_filter_prepare` without `with-catalog-changes` answered "true"
the first time (ie the transaction should not be decoded), and the second
time (when the transaction became committed) it answered "false" (ie it
should be decoded). But the second time, in DecodePrepare,
`ctx->snapshot_builder->start_decoding_at` is already in the future
compared to `buf->origptr` (because it is at the "COMMIT PREPARED" lsn),
so DecodePrepare just called ReorderBufferForget.
If `pg_filter_prepare` is called with `with-catalog-changes`, then it
returns "false" both times; thus DecodePrepare decodes the transaction the
first time and calls `ReorderBufferForget` the second time.

I didn't find a way to fix this gracefully. I just changed
`pg_filter_prepare` to return the same answer both times: "false" if called
with `with-catalog-changes` (ie DecodePrepare needs to be called), and
"true" otherwise. With this change, a catalog-changing two-phase
transaction is decoded as a simple one-phase transaction if
`pg_logical_slot_get_changes` is called without `with-catalog-changes`.

--
With regards,
Sokolov Yura
Postgres Professional: https://postgrespro.ru
The Russian Postgres Company

logical_twophase_v7.diff.gz
Description: GNU Zip compressed data



Re: [HACKERS] [BUGS] Bug in Physical Replication Slots (at least 9.5)?

2017-10-26 Thread Michael Paquier
On Thu, Oct 26, 2017 at 3:05 AM, Kyotaro HORIGUCHI
 wrote:
> The largest obstacle to doing that is that the walreceiver is utterly
> unconcerned with record internals. In other words, it doesn't
> know what a record is. Teaching it that introduces much complexity,
> and the complexity slows down the walreceiver.
>
> In addition to that, this "problem" also occurs on replication
> without a slot. The latest patch also helps that case.

That's why replication slots were introduced to begin with. The
WAL receiver gives no guarantee that a segment will be retained or not
based on the beginning of a record. It's sad that the WAL receiver
does not properly track the restart LSN and instead just uses the flush
LSN. I am beginning to think that a new message type used to report
the restart LSN when a replication slot is in use would just be a
better and more stable solution. I haven't looked at the
implementation details yet though.
-- 
Michael




Re: [HACKERS] Timeline ID in backup_label file

2017-10-26 Thread Michael Paquier
On Thu, Oct 26, 2017 at 5:50 AM, David Steele  wrote:
> On 10/25/17 10:10 PM, Craig Ringer wrote:
>> On 26 October 2017 at 02:50, Michael Paquier  
>> wrote:
>>>
>>> I would find it interesting to add at the bottom of the backup_label file
>>> a new field called "START TIMELINE: %d" to put this information in a
>>> more understandable shape. Any opinions?
>>
>> Strong "yes" from me.
>
> +1

Thanks for the feedback. Attached is a patch to do so; I have also
added a STOP TIMELINE field in the backup history file. Note that
START TIMELINE automatically gets into the backup history file.
Added a CF entry as well.
-- 
Michael


backup_label_tli.patch
Description: Binary data



Re: [HACKERS] Flexible configuration for full-text search

2017-10-26 Thread Emre Hasegeli
> The patch introduces a way to configure FTS based on a CASE/WHEN/THEN/ELSE
> construction.

Interesting feature.  I needed this flexibility before when I was
implementing text search for a Turkish private listing application.
Aleksandr and Arthur were kind enough to discuss it with me off-list
today.

> 1) Multilingual search. Can be used for FTS on a set of documents in
> different languages (example for German and English languages).
>
> ALTER TEXT SEARCH CONFIGURATION multi
>   ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
> word, hword, hword_part WITH CASE
> WHEN english_hunspell AND german_hunspell THEN
>   english_hunspell UNION german_hunspell
> WHEN english_hunspell THEN english_hunspell
> WHEN german_hunspell THEN german_hunspell
> ELSE german_stem UNION english_stem
>   END;

I understand the need to support branching, but this syntax is overly
complicated.  I don't think there is any need to support different sets
of dictionaries for the condition and the action.  Something like this
might work better:

ALTER TEXT SEARCH CONFIGURATION multi
ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
  word, hword, hword_part WITH
CASE english_hunspell UNION german_hunspell
WHEN MATCH THEN KEEP
ELSE german_stem UNION english_stem
END;

To put it formally:

ALTER TEXT SEARCH CONFIGURATION name
ADD MAPPING FOR token_type [, ... ] WITH config

where config is one of:

dictionary_name
config { UNION | INTERSECT | EXCEPT } config
CASE config WHEN [ NO ] MATCH THEN [ KEEP ELSE ] config END

> 2) Combination of exact search with morphological one. This patch not
> fully solve the problem but it is a step toward solution. Currently, we
> should split exact and morphological search in query manually and use
> separate index for each part. With new way to configure FTS we can use
> following configuration:
>
> ALTER TEXT SEARCH CONFIGURATION exact_and_morph
>   ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
>   word, hword, hword_part WITH CASE
> WHEN english_hunspell THEN english_hunspell UNION simple
> ELSE english_stem UNION simple
>   END

This could be:

CASE english_hunspell
THEN KEEP
ELSE english_stem
END
UNION
simple

> 3) Using different dictionaries for recognizing and output generation.
> As I mentioned before, in new syntax condition and command are separate
> and we can use it for some more complex text processing. Here an
> example for processing only nouns:
>
> ALTER TEXT SEARCH CONFIGURATION nouns_only
>   ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
> word, hword, hword_part WITH CASE
>   WHEN english_noun THEN english_hunspell
> END

This would also still work with the simpler syntax because
"english_noun", still being a dictionary, would pass the tokens to the
next one.

> 4) Special stopword processing allows us to discard stopwords even if
> the main dictionary doesn't support such feature (in example pl_ispell
> dictionary keeps stopwords in text):
>
> ALTER TEXT SEARCH CONFIGURATION pl_without_stops
>   ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
> word, hword, hword_part WITH CASE
> WHEN simple_pl IS NOT STOPWORD THEN pl_ispell
>   END

Instead of supporting the old way of putting stopwords on dictionaries, we
can make them dictionaries in their own right.  This would then become
something like:

CASE polish_stopword
WHEN NO MATCH THEN polish_ispell
END




[HACKERS] Re: [COMMITTERS] pgsql: Process variadic arguments consistently in json functions

2017-10-26 Thread Michael Paquier
On Thu, Oct 26, 2017 at 5:18 AM, Andrew Dunstan  wrote:
> Argh! I see that in your v6 patch and I thought I'd caught all of it but
> apparently not for master and REL_10. I wonder how that happened?

I am fine to take the blame. Likely an M-c pushed in emacs..
-- 
Michael




Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath

2017-10-26 Thread Tom Lane
David Rowley  writes:
> It seems like a good idea for the planner not to generate Append or
> MergeAppend paths when the node contains just a single subpath.
> ...
> Perhaps there is also a "Method 3" which I've not thought about which
> would be much cleaner.

Have you considered turning AppendPath with a single child into more
of a special case?  I think this is morally equivalent to your ProxyPath
proposal, but it would take less new boilerplate code to support.
If you taught create_append_path to inherit the child's pathkeys
when there's only one child, and taught createplan.c to skip making the
useless Append node, I think you might be nearly done.

This might seem a bit ugly, but considering that zero-child AppendPaths
are already a special case (and a much bigger one), I don't have much
of a problem with decreeing that one-child is also a special case.

As an example of why this might be better than a separate ProxyPath
node type, consider this call in add_paths_to_append_rel:

/*
 * If we found unparameterized paths for all children, build an unordered,
 * unparameterized Append path for the rel.  (Note: this is correct even
 * if we have zero or one live subpath due to constraint exclusion.)
 */
if (subpaths_valid)
add_path(rel, (Path *) create_append_path(rel, subpaths, NULL, 0,
  partitioned_rels));

That comment could stand a bit of adjustment with this approach, but
the code itself requires no extra work.

Now, you would have to do something about this in create_append_plan:

/*
 * XXX ideally, if there's just one child, we'd not bother to generate an
 * Append node but just return the single child.  At the moment this does
 * not work because the varno of the child scan plan won't match the
 * parent-rel Vars it'll be asked to emit.
 */

but that's going to come out in the wash someplace, no matter what you do.
Leaving it for the last minute is likely to reduce the number of places
you have to teach about this.

regards, tom lane




Re: [HACKERS] make async slave to wait for lsn to be replayed

2017-10-26 Thread Robert Haas
On Thu, Oct 26, 2017 at 4:29 PM, Ants Aasma  wrote:
> If the UI
> were something like "WAITVISIBILITY token", then we can later change
> the token to something other than LSN.

That assumes, probably optimistically, that nobody will develop a
dependency on it being, precisely, an LSN.
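
For illustration, the UI under discussion would look something like this (a 
sketch only: the command name comes from the suggestion above, the token 
value is invented, and no such syntax exists today):

WAITVISIBILITY '0/3000060';
SELECT count(*) FROM orders;  -- sees all effects up to the given token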

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager

2017-10-26 Thread Robert Haas
On Thu, Oct 26, 2017 at 12:36 PM, Masahiko Sawada  wrote:
> Since the previous patch conflicts with current HEAD, I attached the
> updated patch for next CF.

I think we should back up here and ask ourselves a couple of questions:

1. What are we trying to accomplish here?

2. Is this the best way to accomplish it?

To the first question, the problem as I understand it is as follows:
Heavyweight locks don't conflict between members of a parallel group.
However, this is wrong for LOCKTAG_RELATION_EXTENSION, LOCKTAG_PAGE,
LOCKTAG_TUPLE, and LOCKTAG_SPECULATIVE_TOKEN.  Currently, those cases
don't arise, because parallel operations are strictly read-only
(except for inserts by the leader into a just-created table, when only
one member of the group can be taking the lock anyway).  However, once
we allow writes, they become possible, so some solution is needed.

To the second question, there are a couple of ways we could fix this.
First, we could continue to allow these locks to be taken in the
heavyweight lock manager, but make them conflict even between members
of the same lock group.  This is, however, complicated.  A significant
problem (or so I think) is that the deadlock detector logic, which is
already quite hard to test, will become even more complicated, since
wait edges between members of a lock group need to exist at some times
and not other times.  Moreover, to the best of my knowledge, the
increased complexity would have no benefit, because it doesn't look to
me like we ever take any other heavyweight lock while holding one of
these four kinds of locks.  Therefore, no deadlock can occur: if we're
waiting for one of these locks, the process that holds it is not
waiting for any other heavyweight lock.  This gives rise to a second
idea: move these locks out of the heavyweight lock manager and handle
them with separate code that does not have deadlock detection and
doesn't need as many lock modes.  I think that idea is basically
sound, although it's possibly not the only sound idea.

However, that makes me wonder whether we shouldn't be a bit more
aggressive with this patch: why JUST relation extension locks?  Why
not all four types of locks listed above?  Actually, tuple locks are a
bit sticky, because they have four lock modes.  The other three kinds
are very similar -- all you can do is "take it" (implicitly, in
exclusive mode), "try to take it" (again, implicitly, in exclusive
mode), or "wait for it to be released" (i.e. share lock and then
release).  Another idea is to try to handle those three types and
leave the tuple locking problem for another day.

I suggest that a good thing to do more or less immediately, regardless
of when this patch ends up being ready, would be to add an
assertion that LockAcquire() is never called while holding a lock of
one of these types.  If that assertion ever fails, then the whole
theory that these lock types don't need deadlock detection is wrong,
and we'd like to find out about that sooner rather than later.

On the details of the patch, it appears that RelExtLockAcquire()
executes the wait-for-lock code with the partition lock held, and then
continues to hold the partition lock for the entire time that the
relation extension lock is held.  That not only makes all code that
runs while holding the lock non-interruptible but makes a lot of the
rest of this code pointless.  How is any of this atomics code going to
be reached by more than one process at the same time if the entire
bucket is exclusive-locked?  I would guess that the concurrency is not
very good here for the same reason.  Of course, just releasing the
bucket lock wouldn't be right either, because then ext_lock might go
away while we've got a pointer to it, which wouldn't be good.  I think
you could make this work if each lock had both a locker count and a
pin count, and the object can only be removed when the pin_count is 0.
So the lock algorithm would look like this:

- Acquire the partition LWLock.
- Find the item of interest, creating it if necessary.  If out of
memory for more elements, sweep through the table and reclaim
0-pin-count entries, then retry.
- Increment the pin count.
- Attempt to acquire the lock atomically; if we succeed, release the
partition lock and return.
- If this was a conditional-acquire, then decrement the pin count,
release the partition lock and return.
- Release the partition lock.
- Sleep on the condition variable until we manage to atomically
acquire the lock.

The unlock algorithm would just decrement the pin count and, if the
resulting value is non-zero, broadcast on the condition variable.
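
For illustration, here is a rough, untested C sketch of that scheme; the
struct layout and the helper names (RelExtPartitionLock,
RelExtHashFindOrCreate) are invented for the example, not taken from the
actual patch:

typedef struct RelExtLockEnt
{
	RelFileNode	key;			/* hash key: which relation */
	pg_atomic_uint32 pin_count;	/* holders + waiters; 0 => reclaimable */
	pg_atomic_uint32 locked;	/* 0 = free, 1 = held exclusively */
	ConditionVariable cv;		/* broadcast on each release */
} RelExtLockEnt;

static bool
RelExtLockAcquire(RelFileNode rnode, bool conditional)
{
	LWLock	   *partlock = RelExtPartitionLock(rnode);	/* assumed helper */
	RelExtLockEnt *ent;
	uint32		expected = 0;

	LWLockAcquire(partlock, LW_EXCLUSIVE);
	/*
	 * Assumed helper: finds or creates the entry, sweeping out
	 * zero-pin-count entries first if the table is full.
	 */
	ent = RelExtHashFindOrCreate(rnode);
	pg_atomic_fetch_add_u32(&ent->pin_count, 1);

	if (pg_atomic_compare_exchange_u32(&ent->locked, &expected, 1))
	{
		LWLockRelease(partlock);
		return true;			/* acquired immediately */
	}
	if (conditional)
	{
		pg_atomic_fetch_sub_u32(&ent->pin_count, 1);
		LWLockRelease(partlock);
		return false;
	}
	LWLockRelease(partlock);	/* our pin keeps ent from going away */

	ConditionVariablePrepareToSleep(&ent->cv);
	for (;;)
	{
		expected = 0;
		if (pg_atomic_compare_exchange_u32(&ent->locked, &expected, 1))
			break;
		ConditionVariableSleep(&ent->cv, PG_WAIT_EXTENSION);
	}
	ConditionVariableCancelSleep();
	return true;
}

static void
RelExtLockRelease(RelExtLockEnt *ent)
{
	pg_atomic_write_u32(&ent->locked, 0);
	/* unpin; wake waiters only if somebody else still holds a pin */
	if (pg_atomic_fetch_sub_u32(&ent->pin_count, 1) > 1)
		ConditionVariableBroadcast(&ent->cv);
}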

Although I think this will work, I'm not sure this is actually a great
algorithm.  Every lock acquisition has to take and release the
partition lock, use at least two more atomic ops (to take the pin and
the lock), and search a hash table.  I don't think that's going to be
staggeringly fast.  Maybe it's OK.  It's not that much worse, possibly
not any worse, than 

Re: [HACKERS] How to determine that a TransactionId is really aborted?

2017-10-26 Thread Alvaro Herrera
Eric Ridge wrote:

> > Again, you'll probably need to put this low level requirement into
> > context if you want sound advice from this list.
> 
> I'm just thinking out loud here, but the context is likely something
> along the lines of externally storing all transaction ids, and
> periodically asking Postgres if they're
> known-to-be-aborted-by-all-transactions -- one at a time.

I think if you just check the global xmin, then you're going to maintain
a very long list of transactions "potentially running" whenever there
are long-lived transactions (pg_dump, for example).  You could try to
add a TransactionIdIsInProgress() somewhere and discard the xact
downright (by DidCommit and DidAbort) if it returns false.  A single
long-running transaction can keep the global xmin down by hundreds of
millions of Xids.
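
A rough sketch of that check (mark_committed/mark_aborted stand in for
whatever bookkeeping the extension does; they are not Postgres functions):

	if (!TransactionIdIsInProgress(xid))
	{
		if (TransactionIdDidCommit(xid))
			mark_committed(xid);
		else
			mark_aborted(xid);	/* neither running nor committed */
	}
	/* else: keep xid on the "potentially running" list for a later pass */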

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Remove secondary checkpoint

2017-10-26 Thread Alvaro Herrera
Tsunakawa, Takayuki wrote:

> (Although unrelated to this, I've also been wondering why PostgreSQL
> flushes WAL to disk when writing a page in the shared buffer, because
> PostgreSQL doesn't use WAL for undo.)

The reason is that if the system crashes after writing the data page to
disk, but before writing the WAL, the data page would be inconsistent
with data in pages that weren't flushed, since there is no WAL to update
those other pages.  Also, if the system crashes after partially writing
the page (say it writes the first 4kB) then the page is downright
corrupted with no way to fix it.

So there has to be a barrier that ensures that the WAL is flushed up to
the last position that modified a page (i.e. that page's LSN) before
actually writing that page to disk.  And this is why we can't use mmap()
for shared buffers -- there is no mechanism to force the WAL down if the
operating system has the liberty to flush pages whenever it likes.
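
In code terms the barrier is tiny -- a simplified sketch of what the
buffer manager's FlushBuffer() effectively does (checksums and error
handling omitted):

	static void
	flush_one_page(SMgrRelation reln, ForkNumber forknum,
				   BlockNumber blocknum, Page page)
	{
		/* WAL-before-data: make WAL durable up to this page's LSN ... */
		XLogFlush(PageGetLSN(page));
		/* ... and only then hand the page itself to the kernel */
		smgrwrite(reln, forknum, blocknum, (char *) page, false);
	}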

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] CurTransactionContext freed before transaction COMMIT ???

2017-10-26 Thread Amit Kapila
On Wed, Oct 25, 2017 at 8:04 PM, Gaddam Sai Ram
 wrote:
> Thanks for the response,
>
> Can you check if CurTransactionContext is valid at that point?
>
>
> Any Postgres function to check if CurTransactionContext is valid or not?
>
> To see, if this problem is related to CurTransactionContext, can you try to
> populate the list in TopMemoryContext and see if that works.
>
>
> Did you mean TopTransactionContext?
>

No, I mean what I have written.  I suspect in your case
TopTransactionContext will be the same as CurTransactionContext because
you don't have any subtransactions.

> As of now, we don't free our dlist. We solely depend on Postgres to free our
> dlist while it frees the TopTransactionContext. But if we do allocate in
> TopMemoryContext, we need to take care of freeing our allocations.
>

Can't we do it temporarily to test?  I am not suggesting making this
a permanent change, rather it's a way to see the reason for the problem.
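
Something like the following diagnostic change would do (my_dlist and
MyNode are placeholders for whatever structures your extension uses):

	/* temporarily allocate the node in TopMemoryContext for testing */
	MemoryContext oldcxt = MemoryContextSwitchTo(TopMemoryContext);
	MyNode	   *node = (MyNode *) palloc0(sizeof(MyNode));

	dlist_push_tail(&my_dlist, &node->list_node);
	MemoryContextSwitchTo(oldcxt);

	/*
	 * Unlike transaction contexts, TopMemoryContext is never reset, so
	 * these entries must be pfree'd explicitly at end of transaction.
	 */

If the crash disappears with that change, the problem is the lifetime of
the CurTransactionContext allocations.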

> And one more issue is, we found this bug once in all the testing we did. So
> trying to replay this bug seems very difficult.
>

Oh, then it is tricky.  I think there is a good chance that this is
an issue in your application, where you probably haven't used memory
contexts as required, so you need to figure out a way to
reproduce this issue; otherwise, it might be difficult to track down
the actual cause.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] make async slave to wait for lsn to be replayed

2017-10-26 Thread Ants Aasma
On Mon, Oct 23, 2017 at 12:29 PM, Ivan Kartyshov
 wrote:
> Ants Aasma писал 2017-09-26 13:00:
>>
>> Exposing this interface as WAITLSN will encode that visibility order
>> matches LSN order. This removes any chance of fixing for example
>> visibility order of async/vs/sync transactions. Just renaming it so
>> the token is an opaque commit visibility token that just happens to be
>> a LSN would still allow for progress in transaction management. For
>> example, making PostgreSQL distributed will likely want timestamp
>> and/or vector clock based visibility rules.
>
>
> I'm sorry I did not understand exactly what you meant.
> Please explain this in more detail.

Currently transactions on the master become visible when xid is
removed from proc array. This depends on the order of acquiring
ProcArrayLock, which can happen in a different order from inserting
the commit record to wal. Whereas on the standby the transactions will
become visible in the same order that commit records appear in wal.
The difference can be quite large when transactions are using
different values for synchronous_commit. Fixing this is not easy, but
we may want to do it someday. IIRC CSN threads contained more
discussion on this topic. If we do fix it, it seems likely that what
we need to wait on is not LSN, but some other kind of value. If the UI
were something like "WAITVISIBILITY token", then we can later change
the token to something other than LSN.

Regards,
Ants Aasma




Re: [HACKERS] Transactions involving multiple postgres foreign servers

2017-10-26 Thread Masahiko Sawada
On Thu, Oct 26, 2017 at 2:36 PM, Ashutosh Bapat
 wrote:
> On Wed, Oct 25, 2017 at 3:15 AM, Masahiko Sawada  
> wrote:
>>
>> Foreign Transaction Resolver
>> ==
>> I introduced a new background worker called "foreign transaction
>> resolver" which is responsible for resolving the transaction prepared
>> on foreign servers. The foreign transaction resolver process is
>> launched by backend processes when commit/rollback transaction. And it
>> periodically resolves the queued transactions on a database as long as
>> the queue is not empty. If the queue has been empty for the certain
>> time specified by foreign_transaction_resolver_time GUC parameter, it
>> exits. It means that the backend doesn't launch a new resolver process
>> if the resolver process is already working. In this case, the backend
>> process just adds the entry to the queue on shared memory and wake it
>> up. The maximum number of resolver processes we can launch is controlled
>> by max_foreign_transaction_resolvers. So we recommend setting a larger
>> max_foreign_transaction_resolvers value than the number of databases.
>> The resolver process also tries to resolve dangling transaction as
>> well in a cycle.
>>
>> Processing Sequence
>> =
>> I've changed the processing sequence of resolving foreign transaction
>> so that the second phase of two-phase commit protocol (COMMIT/ROLLBACK
>> prepared) is executed by a resolver process, not by backend process.
>> The basic processing sequence is following;
>>
>> * Backend process
>> 1. In pre-commit phase, the backend process saves fdwxact entries, and
>> then prepares transaction on all foreign servers that can execute
>> two-phase commit protocol.
>> 2. Local commit.
>> 3. Enqueue itself to the shmem queue and change its status to WAITING
>> 4. launch or wakeup a resolver process and wait
>>
>> * Resolver process
>> 1. Dequeue the waiting process from shmem qeue
>> 2. Collect the fdwxact entries that are associated with the waiting 
>> process.
>> 3. Resolve foreign transactoins
>> 4. Release the waiting process
>
> Why do we want the backend to linger behind, once it has added its
> foreign transaction entries in the shared memory and informed resolver
> about it? The foreign connections may take their own time and even
> after that there is no guarantee that the foreign transactions will be
> resolved in case the foreign server is not available. So, why to make
> the backend wait?

Because I don't want to break the current user semantics. That is,
it's currently guaranteed that subsequent reads can see the
committed result of previous writes even if the previous transactions
were distributed transactions. And it's ensured on the writer side. If we
can make the reader side ensure it, the backend process doesn't need to
wait for the resolver process.

The waiting backend process is released by the resolver process after
it has tried to resolve the foreign transactions. Even if the
resolver process fails to connect to the foreign server or to
resolve a foreign transaction, the backend process will be released; in
that case the foreign transactions are left behind as dangling
transactions, which are processed later. Also, if the resolver process
takes a long time to resolve foreign transactions for whatever reason,
the user can cancel the wait with Ctrl-C at any time.

>>
>> 5. Wake up and restart
>>
>> This is still under the design phase and I'm sure that there is room
>> for improvement and consider more sensitive behaviour but I'd like to
>> share the current status of the patch. The patch includes regression
>> tests but not includes fully documentation.
>
> Any background worker, backend should be child of the postmaster, so
> we should not let a backend start a resolver process. It should be the
> job of the postmaster.
>

Of course I won't. I used the phrase "the backend process launches
the resolver process" to make the explanation easier. Sorry for confusing you.
The backend process calls the RegisterDynamicBackgroundWorker() function
to register a resolver process, so it is actually launched by the postmaster.
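
For reference, a minimal sketch of such a registration (the entry-point
name FdwXactResolverMain is my placeholder, not necessarily what the
patch uses):

	BackgroundWorker worker;
	BackgroundWorkerHandle *handle;

	memset(&worker, 0, sizeof(worker));
	worker.bgw_flags = BGWORKER_SHMEM_ACCESS |
		BGWORKER_BACKEND_DATABASE_CONNECTION;
	worker.bgw_start_time = BgWorkerStart_RecoveryFinished;
	worker.bgw_restart_time = BGW_NEVER_RESTART;
	snprintf(worker.bgw_name, BGW_MAXLEN, "foreign transaction resolver");
	snprintf(worker.bgw_library_name, BGW_MAXLEN, "postgres");
	snprintf(worker.bgw_function_name, BGW_MAXLEN, "FdwXactResolverMain");
	worker.bgw_notify_pid = MyProcPid;

	/* the postmaster, not this backend, actually starts the worker */
	if (!RegisterDynamicBackgroundWorker(&worker, &handle))
		ereport(WARNING,
				(errmsg("out of background worker slots")));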

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




[HACKERS] Adding table_constraint description in ALTER TABLE synopsis

2017-10-26 Thread Lætitia Avrot
Hi,

In the documentation, I found that table_constraint is used in the ALTER
TABLE synopsis but that the definition of table_constraint is missing, so I
submitted bug #14873.

I found the table_constraint definition in the CREATE TABLE synopsis and I
just copied/pasted it into the ALTER TABLE synopsis.

The patch should apply to master. I built and tested it successfully on my
computer.

There shouldn't be any platform-specific content.

You will find enclosed my patch. I tried my best to follow instructions on
how to submit a patch.

Regards,

Lætitia
diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml
index 234ccb7..41acda0 100644
*** a/doc/src/sgml/ref/alter_table.sgml
--- b/doc/src/sgml/ref/alter_table.sgml
***
*** 85,90  ALTER TABLE [ IF EXISTS ] name
--- 85,101 
  OWNER TO { new_owner | CURRENT_USER | SESSION_USER }
  REPLICA IDENTITY { DEFAULT | USING INDEX index_name | FULL | NOTHING }
  
+ and table_constraint is:
+ 
+ [ CONSTRAINT constraint_name ]
+ { CHECK ( expression ) [ NO INHERIT ] |
+   UNIQUE ( column_name [, ... ] ) index_parameters |
+   PRIMARY KEY ( column_name [, ... ] ) index_parameters |
+   EXCLUDE [ USING index_method ] ( exclude_element WITH operator [, ... ] ) index_parameters [ WHERE ( predicate ) ] |
+   FOREIGN KEY ( column_name [, ... ] ) REFERENCES reftable [ ( refcolumn [, ... ] ) ]
+ [ MATCH FULL | MATCH PARTIAL | MATCH SIMPLE ] [ ON DELETE action ] [ ON UPDATE action ] }
+ [ DEFERRABLE | NOT DEFERRABLE ] [ INITIALLY DEFERRED | INITIALLY IMMEDIATE ]
+ 
  and table_constraint_using_index is:
  
  [ CONSTRAINT constraint_name ]



Re: [HACKERS] PATCH: pgbench - option to build using ppoll() for larger connection counts

2017-10-26 Thread Rady, Doug
Without this patch, one is limited to (FD_SETSIZE - 10) connections.
Example of something that fails without this patch but works with the patch:

Without the patch:

$ pgbench -j 3000 -c 1500
invalid number of clients: "1500"


With the patch:

$ pgbench -j 3000 -c 1500
starting vacuum...end.
transaction type: 
scaling factor: 2000
query mode: simple
number of clients: 1500
number of threads: 1500
number of transactions per client: 10
number of transactions actually processed: 15000/15000
latency average = 631.730 ms
tps = 2374.430587 (including connections establishing)
tps = 4206.524986 (excluding connections establishing)
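
For anyone wondering where the limit comes from: select() cannot represent
a socket whose descriptor is >= FD_SETSIZE, while poll()/ppoll() take a
caller-sized array (ppoll() is Linux-specific, declared in <poll.h> under
_GNU_SOURCE). A simplified illustration of the ppoll()-style wait, not the
patch's exact code:

	struct pollfd *fds = pg_malloc(sizeof(struct pollfd) * nclients);
	int			nsocks = 0;
	int			nready;

	for (int i = 0; i < nclients; i++)
	{
		int			sock = PQsocket(state[i].con);	/* client state array */

		if (sock < 0)
			continue;			/* connection not in use */
		fds[nsocks].fd = sock;
		fds[nsocks].events = POLLIN;
		fds[nsocks++].revents = 0;
	}

	/* timeout may be NULL to block indefinitely; last arg is a sigmask */
	nready = ppoll(fds, nsocks, timeout, NULL);
	if (nready < 0 && errno != EINTR)
		fprintf(stderr, "ppoll() failed: %s\n", strerror(errno));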


--
doug
 

On 10/26/17, 04:46, "Robert Haas"  wrote:

On Mon, Sep 25, 2017 at 8:01 PM, Rady, Doug  wrote:
> This patch enables building pgbench to use ppoll() instead of select()
>
> to allow for more than (FD_SETSIZE - 10) connections.  As implemented,
>
> when using ppoll(), the only connection limitation is system resources.

So what's an example of something that fails without this patch but
works with the patch?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company





Re: [HACKERS] Timeline ID in backup_label file

2017-10-26 Thread David Steele
On 10/25/17 10:10 PM, Craig Ringer wrote:
> On 26 October 2017 at 02:50, Michael Paquier  
> wrote:
>>
>> I would find interesting to add at the bottom of the backup_label file
>> a new field called "START TIMELINE: %d" to put this information in a
>> more understandable shape. Any opinions?
> 
> Strong "yes" from me.

+1

-- 
-David
da...@pgmasters.net




Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath

2017-10-26 Thread Ashutosh Bapat
On Thu, Oct 26, 2017 at 5:07 PM, David Rowley
 wrote:
> On 26 October 2017 at 23:42, Antonin Houska  wrote:
>> David Rowley  wrote:
>>
>>> Method 1:
>>>
>>> In set_append_rel_size() detect when just a single subpath would be
>>> added to the Append path.
>>
>> I spent some time reviewing the partition-wise join patch during the last CF
>> and have an impression that set_append_rel_size() is called too early to find
>> out how many child paths will eventually exist if Append represents a join of
>> a partitioned table. I think the partition matching takes place at later 
>> stage
>> and that determines the actual number of partitions, and therefore the number
>> of Append subpaths.
>
> I understand that we must wait until set_append_rel_size() is being
> called on the RELOPT_BASEREL since partitions may themselves be
> partitioned tables and we'll only know what's left after we've
> recursively checked all partitions of the baserel.
>
> I've not studied the partition-wise join code yet, but if we've
> eliminated all but one partition in set_append_rel_size() then I
> imagine we'd just need to ensure partition-wise join is not attempted
> since we'd be joining directly to the only partition anyway.
>

I think Antonin has a point. The join processing may deem some base
relations dummy (See populate_joinrel_with_paths()). So an Append
relation which had multiple child alive at the end of
set_append_rel_size() might ultimately have only one child after
partition-wise join has worked on it.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company




Re: [HACKERS] Transactions involving multiple postgres foreign servers

2017-10-26 Thread Ashutosh Bapat
On Wed, Oct 25, 2017 at 3:15 AM, Masahiko Sawada  wrote:
>
> Foreign Transaction Resolver
> ==
> I introduced a new background worker called "foreign transaction
> resolver" which is responsible for resolving the transaction prepared
> on foreign servers. The foreign transaction resolver process is
> launched by backend processes when commit/rollback transaction. And it
> periodically resolves the queued transactions on a database as long as
> the queue is not empty. If the queue has been empty for the certain
> time specified by foreign_transaction_resolver_time GUC parameter, it
> exits. It means that the backend doesn't launch a new resolver process
> if the resolver process is already working. In this case, the backend
> process just adds the entry to the queue on shared memory and wake it
> up. The maximum number of resolver processes we can launch is controlled
> by max_foreign_transaction_resolvers. So we recommend setting a larger
> max_foreign_transaction_resolvers value than the number of databases.
> The resolver process also tries to resolve dangling transaction as
> well in a cycle.
>
> Processing Sequence
> =
> I've changed the processing sequence of resolving foreign transaction
> so that the second phase of two-phase commit protocol (COMMIT/ROLLBACK
> prepared) is executed by a resolver process, not by backend process.
> The basic processing sequence is following;
>
> * Backend process
> 1. In pre-commit phase, the backend process saves fdwxact entries, and
> then prepares transaction on all foreign servers that can execute
> two-phase commit protocol.
> 2. Local commit.
> 3. Enqueue itself to the shmem queue and change its status to WAITING
> 4. launch or wakeup a resolver process and wait
>
> * Resolver process
> 1. Dequeue the waiting process from shmem qeue
> 2. Collect the fdwxact entries that are associated with the waiting 
> process.
> 3. Resolve foreign transactoins
> 4. Release the waiting process

Why do we want the backend to linger behind, once it has added its
foreign transaction entries in the shared memory and informed resolver
about it? The foreign connections may take their own time and even
after that there is no guarantee that the foreign transactions will be
resolved in case the foreign server is not available. So, why to make
the backend wait?

>
> 5. Wake up and restart
>
> This is still under the design phase and I'm sure that there is room
> for improvement and consider more sensitive behaviour but I'd like to
> share the current status of the patch. The patch includes regression
> tests but not includes fully documentation.

Any background worker, backend should be child of the postmaster, so
we should not let a backend start a resolver process. It should be the
job of the postmaster.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company




[HACKERS] Re: [COMMITTERS] pgsql: Process variadic arguments consistently in json functions

2017-10-26 Thread Andrew Dunstan


On 10/26/2017 12:12 AM, Michael Paquier wrote:
> On Wed, Oct 25, 2017 at 5:24 AM, Andrew Dunstan  wrote:
>> Process variadic arguments consistently in json functions
>>
>> json_build_object and json_build_array and the jsonb equivalents did not
>> correctly process explicit VARIADIC arguments. They are modified to use
>> the new extract_variadic_args() utility function which abstracts away
>> the details of the call method.
>>
>> Michael Paquier, reviewed by Tom Lane and Dmitry Dolgov.
>>
>> Backpatch to 9.5 for the jsonb fixes and 9.4 for the json fixes, as
>> that's where they originated.
> - * Copyright (c) 2014-2017, PostgreSQL Global Development Group
> + * COPYRIGHT (c) 2014-2017, PostgreSQL Global Development Group
> Andrew, I have just noticed that this noise diff has crept in. You may
> want to fix that.


Argh! I see that in your v6 patch and I thought I'd caught all of it but
apparently not for master and REL_10. I wonder how that happened?

Will fix.

cheers

andrew






Re: [HACKERS] PATCH: pgbench - option to build using ppoll() for larger connection counts

2017-10-26 Thread Robert Haas
On Mon, Sep 25, 2017 at 8:01 PM, Rady, Doug  wrote:
> This patch enables building pgbench to use ppoll() instead of select()
>
> to allow for more than (FD_SETSIZE - 10) connections.  As implemented,
>
> when using ppoll(), the only connection limitation is system resources.

So what's an example of something that fails without this patch but
works with the patch?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Transform for pl/perl

2017-10-26 Thread Robert Haas
On Tue, Oct 24, 2017 at 1:01 PM, anthony  wrote:
> Hello.
> Please, check out jsonb transform
> (https://www.postgresql.org/docs/9.5/static/sql-createtransform.html)
> for pl/perl language I've implemented.

Neat.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] make async slave to wait for lsn to be replayed

2017-10-26 Thread Robert Haas
On Tue, Sep 26, 2017 at 12:00 PM, Ants Aasma  wrote:
> Exposing this interface as WAITLSN will encode that visibility order
> matches LSN order.

That would be a bad thing to encode because it doesn't.

Well... actually on the standby it does, and that's the only thing
that matters in this case I guess.  But I agree with you that's it's
not a wonderful thing to bake into the UI, because we might want to
change it some day.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath

2017-10-26 Thread David Rowley
On 26 October 2017 at 23:42, Antonin Houska  wrote:
> David Rowley  wrote:
>
>> Method 1:
>>
>> In set_append_rel_size() detect when just a single subpath would be
>> added to the Append path.
>
> I spent some time reviewing the partition-wise join patch during the last CF
> and have an impression that set_append_rel_size() is called too early to find
> out how many child paths will eventually exist if Append represents a join of
> a partitioned table. I think the partition matching takes place at later stage
> and that determines the actual number of partitions, and therefore the number
> of Append subpaths.

I understand that we must wait until set_append_rel_size() is being
called on the RELOPT_BASEREL since partitions may themselves be
partitioned tables and we'll only know what's left after we've
recursively checked all partitions of the baserel.

I've not studied the partition-wise join code yet, but if we've
eliminated all but one partition in set_append_rel_size() then I
imagine we'd just need to ensure partition-wise join is not attempted
since we'd be joining directly to the only partition anyway.


-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] legitimacy of using PG_TRY , PG_CATCH , PG_END_TRY in C function

2017-10-26 Thread Robert Haas
On Mon, Oct 23, 2017 at 10:27 AM, Andres Freund  wrote:
> It'd probably be a good idea to expand on this in the sgml docs. This
> has confused quite anumber of people...

That's a good idea.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] path toward faster partition pruning

2017-10-26 Thread Robert Haas
On Thu, Oct 26, 2017 at 1:17 PM, Amit Langote
 wrote:
> It can perhaps be taught not to make that conclusion by taking into account
> the default partition's partition constraint, which includes constraint
> inherited from the parent, viz. 1 <= col1 < 50001.  To do that, it might
> be possible to summon up predtest.c's powers to conclude from the default
> partition's partition constraint that it cannot contain any keys < 1, but
> then we'll have to frame up a clause expression describing the latter.
> Generating such a clause expression can be a bit daunting for a
> multi-column key.   So, I haven't yet tried really hard to implement this.
>  Any thoughts on that?

I don't think we really want to get into theorem-proving here, because
it's slow.  Whatever we're going to do we should be able to do without
that - keeping it in the form of btree-strategy + value.  It doesn't
seem that hard.  Suppose we're asked to select partitions from tprt
subject to (<, 1).  Well, we determine that some of the tprt_1
partitions may be relevant, so we tell tprt_1 to select partitions
subject to (>=, 1, <, 1).  We know to do that because we know that
1 < 5 and we know to include >= 1 because we haven't got any
lower bound currently at all.  What's the problem?

In some sense it's tempting to say that this case just doesn't matter
very much; after all, subpartitioning on the same column used to
partition at the top level is arguably lame.  But if we can get it
right in a relatively straightforward manner then let's do it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Parallel Hash take II

2017-10-26 Thread Rushabh Lathia
While rebasing the parallel-B-tree-index-build patch on top of the v22 patch
set, I found a couple of cosmetic issues:

1) BufFileSetEstimate is removed but it's still in buffile.h

+extern size_t BufFileSetEstimate(int stripes);


2) BufFileSetCreate is renamed to BufFileSetInit, but the old name is still
used in this comment:

* Attach to a set of named BufFiles that was created with BufFileSetCreate.

Thanks,

On Wed, Oct 25, 2017 at 11:33 AM, Thomas Munro <
thomas.mu...@enterprisedb.com> wrote:

> On Tue, Oct 24, 2017 at 10:10 PM, Thomas Munro
>  wrote:
> > Here is an updated patch set that does that ^.
>
> It's a bit hard to understand what's going on with the v21 patch set I
> posted yesterday because EXPLAIN ANALYZE doesn't tell you anything
> interesting.  Also, if you apply the multiplex_gather patch[1] I
> posted recently and set multiplex_gather to off then it doesn't tell
> you anything at all, because the leader has no hash table (I suppose
> that could happen with unpatched master given sufficiently bad
> timing).  Here's a new version with an extra patch that adds some
> basic information about load balancing to EXPLAIN ANALYZE, inspired by
> what commit bf11e7ee did for Sort.
>
> Example output:
>
> enable_parallel_hash = on, multiplex_gather = on:
>
>  ->  Parallel Hash (actual rows=100 loops=3)
>Buckets: 131072  Batches: 16
>Leader:Shared Memory Usage: 3552kB  Hashed: 396120  Batches
> Probed: 7
>Worker 0:  Shared Memory Usage: 3552kB  Hashed: 276640  Batches
> Probed: 6
>Worker 1:  Shared Memory Usage: 3552kB  Hashed: 327240  Batches
> Probed: 6
>->  Parallel Seq Scan on simple s (actual rows=33 loops=3)
>
>  ->  Parallel Hash (actual rows=1000 loops=8)
>Buckets: 131072  Batches: 256
>Leader:Shared Memory Usage: 2688kB  Hashed: 1347720
> Batches Probed: 36
>Worker 0:  Shared Memory Usage: 2688kB  Hashed: 1131360
> Batches Probed: 33
>Worker 1:  Shared Memory Usage: 2688kB  Hashed: 1123560
> Batches Probed: 38
>Worker 2:  Shared Memory Usage: 2688kB  Hashed: 1231920
> Batches Probed: 38
>Worker 3:  Shared Memory Usage: 2688kB  Hashed: 1272720
> Batches Probed: 34
>Worker 4:  Shared Memory Usage: 2688kB  Hashed: 1234800
> Batches Probed: 33
>Worker 5:  Shared Memory Usage: 2688kB  Hashed: 1294680
> Batches Probed: 37
>Worker 6:  Shared Memory Usage: 2688kB  Hashed: 1363240
> Batches Probed: 35
>->  Parallel Seq Scan on big s2 (actual rows=125 loops=8)
>
> enable_parallel_hash = on, multiplex_gather = off (ie no leader
> participation):
>
>  ->  Parallel Hash (actual rows=100 loops=2)
>Buckets: 131072  Batches: 16
>Worker 0:  Shared Memory Usage: 3520kB  Hashed: 475920  Batches
> Probed: 9
>Worker 1:  Shared Memory Usage: 3520kB  Hashed: 524080  Batches
> Probed: 8
>->  Parallel Seq Scan on simple s (actual rows=50 loops=2)
>
> enable_parallel_hash = off, multiplex_gather = on:
>
>  ->  Hash (actual rows=100 loops=3)
>Buckets: 131072  Batches: 16
>Leader:Memory Usage: 3227kB
>Worker 0:  Memory Usage: 3227kB
>Worker 1:  Memory Usage: 3227kB
>->  Seq Scan on simple s (actual rows=100 loops=3)
>
> enable_parallel_hash = off, multiplex_gather = off:
>
>  ->  Hash (actual rows=100 loops=2)
>Buckets: 131072  Batches: 16
>Worker 0:  Memory Usage: 3227kB
>Worker 1:  Memory Usage: 3227kB
>->  Seq Scan on simple s (actual rows=100 loops=2)
>
> parallelism disabled (traditional single-line output, unchanged):
>
>  ->  Hash (actual rows=100 loops=1)
>Buckets: 131072  Batches: 16  Memory Usage: 3227kB
>->  Seq Scan on simple s (actual rows=100 loops=1)
>
> (It actually says "Tuples Hashed", not "Hashed" but I edited the above
> to fit on a standard punchcard.)  Thoughts?
>
> [1] https://www.postgresql.org/message-id/CAEepm%3D2U%2B%
> 2BLp3bNTv2Bv_kkr5NE2pOyHhxU%3DG0YTa4ZhSYhHiw%40mail.gmail.com
>
> --
> Thomas Munro
> http://www.enterprisedb.com
>



-- 
Rushabh Lathia


Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath

2017-10-26 Thread David Rowley
On 26 October 2017 at 23:30, Robert Haas  wrote:
> On Wed, Oct 25, 2017 at 11:59 PM, David Rowley
>  wrote:
>> As of today, because we include this needless [Merge]Append node, we
>> cannot parallelise scans below the Append.
>
> Without disputing your general notion that singe-child Append or
> MergeAppend nodes are a pointless waste, how does a such a needless
> node prevent parallelizing scans beneath it?

Hmm, I think I was wrong about that. I had been looking over the
regression test diffs after having made tenk1 a partitioned table with
a single partition containing all the rows, but it looks like I read
the diff backwards. The planner actually parallelized the Append
version, not the non-Append version as I had previously thought.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] How to determine that a TransactionId is really aborted?

2017-10-26 Thread Robert Haas
On Sun, Oct 22, 2017 at 9:23 PM, Eric Ridge  wrote:
> When sitting inside an extension, and given an arbitrary TransactionId, how 
> can you determine that it aborted/crashed *and* that no other active 
> transaction thinks it is still running?
>
> I've tried to answer this question myself (against the 9.3 sources), and it 
> seems like it's just:
>
> {
>TransactionId oldestXmin = GetOldestXmin (false, false);
>TransactionId xid = 42;
>
>if (TransactionIdPrecedes(xid, oldestXmin) &&
>   !TransactionIdDidCommit(xid) &&
>   !TransactionIdIsInProgress(xid)) /* not even sure this is necessary? */
>{
>   /* xid is aborted/crashed and no active transaction cares */
>}
> }
>
> Can anyone confirm or deny that this is correct?  I feel like it is correct, 
> but I'm no expert.

I think TransactionIdPrecedes(xid, oldestXmin) &&
!TransactionIdDidCommit(xid) is sufficient.  If the transaction ID
precedes oldestXmin, then it's either committed or aborted; no other
state is possible.  If it didn't commit, it aborted, and it is right
to test that using !TransactionIdDidCommit(xid) since commits are
always recorded but aborts are only usually recorded.
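
So in its minimal form the check is just:

	/* below oldestXmin, every xid is either committed or aborted */
	if (TransactionIdPrecedes(xid, oldestXmin) &&
		!TransactionIdDidCommit(xid))
	{
		/* xid aborted (or crashed), and no active transaction can care */
	}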

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Refactor handling of database attributes between pg_dump and pg_dumpall

2017-10-26 Thread Robert Haas
On Mon, Oct 23, 2017 at 7:36 AM, Haribabu Kommi
 wrote:
> Apologies for not providing much details.
>
> pg_dumpall is used to produce the following statements for database,
>
> "Create database" (other than default database) or
> "Alter database set tablespace" for default database (if required)
>
> ACL queries related to database
> Alter database config
> Alter database role config
>
> whereas, pg_dump used to produce only "create database statement".

How about adding a new flag --set-db-properties that doesn't produce
CREATE DATABASE but does dump the other stuff?  -C would dump both
CREATE DATABASE *and* the other stuff.  Then you could dump built-in
databases with --set-db-properties and others with -C.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] show "aggressive" or not in autovacuum logs

2017-10-26 Thread Robert Haas
On Thu, Oct 26, 2017 at 10:18 AM, Kyotaro HORIGUCHI
 wrote:
> Thank you. I forgot that point. Changed them so that the messages
> are detected as msgids.

Committed, changing "aggressive" to "aggressively" in one place for
correct English.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath

2017-10-26 Thread Antonin Houska
David Rowley  wrote:

> Method 1:
> 
> In set_append_rel_size() detect when just a single subpath would be
> added to the Append path.

I spent some time reviewing the partition-wise join patch during the last CF
and have an impression that set_append_rel_size() is called too early to find
out how many child paths will eventually exist if Append represents a join of
a partitioned table. I think the partition matching takes place at later stage
and that determines the actual number of partitions, and therefore the number
of Append subpaths.

-- 
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de, http://www.cybertec.at




Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager

2017-10-26 Thread Masahiko Sawada
On Fri, Sep 8, 2017 at 4:32 AM, Masahiko Sawada  wrote:
> On Fri, Sep 8, 2017 at 7:24 AM, Thomas Munro
>  wrote:
>> On Wed, Aug 16, 2017 at 2:13 PM, Masahiko Sawada  
>> wrote:
>>> The previous patch conflicts with current HEAD, I rebased the patch to
>>> current HEAD.
>>
>> Hi Masahiko-san,
>>
>> FYI this doesn't build anymore.  I think it's just because the wait
>> event enumerators were re-alphabetised in pgstat.h:
>>
>> ../../../../src/include/pgstat.h:820:2: error: redeclaration of
>> enumerator ‘WAIT_EVENT_LOGICAL_SYNC_DATA’
>>   WAIT_EVENT_LOGICAL_SYNC_DATA,
>>   ^
>> ../../../../src/include/pgstat.h:806:2: note: previous definition of
>> ‘WAIT_EVENT_LOGICAL_SYNC_DATA’ was here
>>   WAIT_EVENT_LOGICAL_SYNC_DATA,
>>   ^
>> ../../../../src/include/pgstat.h:821:2: error: redeclaration of
>> enumerator ‘WAIT_EVENT_LOGICAL_SYNC_STATE_CHANGE’
>>   WAIT_EVENT_LOGICAL_SYNC_STATE_CHANGE,
>>   ^
>> ../../../../src/include/pgstat.h:807:2: note: previous definition of
>> ‘WAIT_EVENT_LOGICAL_SYNC_STATE_CHANGE’ was here
>>   WAIT_EVENT_LOGICAL_SYNC_STATE_CHANGE,
>>   ^
>>
>
> Thank you for the information! Attached rebased patch.
>

Since the previous patch conflicts with current HEAD, I attached the
updated patch for next CF.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
diff --git a/src/backend/access/brin/brin_pageops.c b/src/backend/access/brin/brin_pageops.c
index 80f803e..b928c1a 100644
--- a/src/backend/access/brin/brin_pageops.c
+++ b/src/backend/access/brin/brin_pageops.c
@@ -609,8 +609,8 @@ brin_page_cleanup(Relation idxrel, Buffer buf)
 	 */
 	if (PageIsNew(page))
 	{
-		LockRelationForExtension(idxrel, ShareLock);
-		UnlockRelationForExtension(idxrel, ShareLock);
+		LockRelationForExtension(idxrel, RELEXT_SHARED);
+		UnlockRelationForExtension(idxrel, RELEXT_SHARED);
 
 		LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
 		if (PageIsNew(page))
@@ -702,7 +702,7 @@ brin_getinsertbuffer(Relation irel, Buffer oldbuf, Size itemsz,
 			 */
 			if (!RELATION_IS_LOCAL(irel))
 			{
-LockRelationForExtension(irel, ExclusiveLock);
+LockRelationForExtension(irel, RELEXT_EXCLUSIVE);
 extensionLockHeld = true;
 			}
 			buf = ReadBuffer(irel, P_NEW);
@@ -754,7 +754,7 @@ brin_getinsertbuffer(Relation irel, Buffer oldbuf, Size itemsz,
 }
 
 if (extensionLockHeld)
-	UnlockRelationForExtension(irel, ExclusiveLock);
+	UnlockRelationForExtension(irel, RELEXT_EXCLUSIVE);
 
 ReleaseBuffer(buf);
 return InvalidBuffer;
@@ -764,7 +764,7 @@ brin_getinsertbuffer(Relation irel, Buffer oldbuf, Size itemsz,
 		LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
 
 		if (extensionLockHeld)
-			UnlockRelationForExtension(irel, ExclusiveLock);
+			UnlockRelationForExtension(irel, RELEXT_EXCLUSIVE);
 
 		page = BufferGetPage(buf);
 
diff --git a/src/backend/access/brin/brin_revmap.c b/src/backend/access/brin/brin_revmap.c
index 22f2076..4c15b45 100644
--- a/src/backend/access/brin/brin_revmap.c
+++ b/src/backend/access/brin/brin_revmap.c
@@ -570,7 +570,7 @@ revmap_physical_extend(BrinRevmap *revmap)
 	else
 	{
 		if (needLock)
-			LockRelationForExtension(irel, ExclusiveLock);
+			LockRelationForExtension(irel, RELEXT_EXCLUSIVE);
 
 		buf = ReadBuffer(irel, P_NEW);
 		if (BufferGetBlockNumber(buf) != mapBlk)
@@ -582,7 +582,7 @@ revmap_physical_extend(BrinRevmap *revmap)
 			 * page from under whoever is using it.
 			 */
 			if (needLock)
-UnlockRelationForExtension(irel, ExclusiveLock);
+UnlockRelationForExtension(irel, RELEXT_EXCLUSIVE);
 			LockBuffer(revmap->rm_metaBuf, BUFFER_LOCK_UNLOCK);
 			ReleaseBuffer(buf);
 			return;
@@ -591,7 +591,7 @@ revmap_physical_extend(BrinRevmap *revmap)
 		page = BufferGetPage(buf);
 
 		if (needLock)
-			UnlockRelationForExtension(irel, ExclusiveLock);
+			UnlockRelationForExtension(irel, RELEXT_EXCLUSIVE);
 	}
 
 	/* Check that it's a regular block (or an empty page) */
diff --git a/src/backend/access/gin/ginutil.c b/src/backend/access/gin/ginutil.c
index 136ea27..1690d21 100644
--- a/src/backend/access/gin/ginutil.c
+++ b/src/backend/access/gin/ginutil.c
@@ -325,13 +325,13 @@ GinNewBuffer(Relation index)
 	/* Must extend the file */
 	needLock = !RELATION_IS_LOCAL(index);
 	if (needLock)
-		LockRelationForExtension(index, ExclusiveLock);
+		LockRelationForExtension(index, RELEXT_EXCLUSIVE);
 
 	buffer = ReadBuffer(index, P_NEW);
 	LockBuffer(buffer, GIN_EXCLUSIVE);
 
 	if (needLock)
-		UnlockRelationForExtension(index, ExclusiveLock);
+		UnlockRelationForExtension(index, RELEXT_EXCLUSIVE);
 
 	return buffer;
 }
diff --git a/src/backend/access/gin/ginvacuum.c b/src/backend/access/gin/ginvacuum.c
index 31425e9..e9f84bc 100644
--- a/src/backend/access/gin/ginvacuum.c
+++ b/src/backend/access/gin/ginvacuum.c
@@ -716,10 +716,10 @@ ginvacuumcleanup(IndexVacuumInfo *info, IndexBulkDeleteResult *stats)
 	needLock = !RELATION_IS_LOCAL(index);
 
 

Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath

2017-10-26 Thread Robert Haas
On Wed, Oct 25, 2017 at 11:59 PM, David Rowley
 wrote:
> As of today, because we include this needless [Merge]Append node, we
> cannot parallelise scans below the Append.

Without disputing your general notion that singe-child Append or
MergeAppend nodes are a pointless waste, how does a such a needless
node prevent parallelizing scans beneath it?

> Invent a ProxyPath concept that allows us to add Paths belonging to
> one relation to another relation. The ProxyPaths can have
> translation_vars to translate targetlists to reference the correct
> Vars. These ProxyPaths could exist up as far as createplan, where we
> could perform the translation and build the ProxyPath's subnode
> instead.

This idea sounds like it might have some legs, although I'm not sure
"proxy" is really the right terminology.

Like both you and Ashutosh, I suspect that there might be some other
applications for this.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




[HACKERS] Index only scan for cube and seg

2017-10-26 Thread Andrey Borodin
Hi hackers!

Here are patches enabling Index Only Scan for cube and seg extensions.

These patches follow this discussion [0].

For cube there is a new default opclass. We cannot drop the old opclass,
because it could TOAST some cube values on rare occasions. Index Only Scan is
enabled only for newly created indexes. Btw, I can add a fetch function to the
old opclass so that IOS would be enabled there as well.
For seg, the compress and decompress functions are dropped from the opclass
and the extension. Index Only Scan is enabled.

There are two more functions which can be deleted:
ghstore_decompress
g_intbig_decompress
But deleting them will not have any noticeable consequences.


[0] 
https://www.postgresql.org/message-id/flat/CAJEAwVELVx9gYscpE%3DBe6iJxvdW5unZ_LkcAaVNSeOwvdwtD%3DA%40mail.gmail.com#CAJEAwVELVx9gYscpE=Be6iJxvdW5unZ_LkcAaVNSeOwvdwtD=a...@mail.gmail.com



 

0001-Create-cube-opclass-without-toasting.patch
Description: Binary data


0001-Enable-Index-Only-Scan-in-seg.patch
Description: Binary data



Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath

2017-10-26 Thread Ashutosh Bapat
On Thu, Oct 26, 2017 at 3:29 AM, David Rowley
 wrote:
> However, method 2 appears to also require some Var
> translation in Path targetlists which contain this Proxy path, either
> that or some global Var replacement would need to be done during
> setrefs.
>

For inheritance and partitioning, we translate parent expressions to
child expressions by applying Var translations. The translated
expressions differ only in the Var nodes (at least for partitioning
this is true, and I believe it to be true for inheritance). I have been
toying with an idea of having some kind of a (polymorphic?) Var node
which can represent parent and children. That will greatly reduce the
need to translate the expression trees and will also help in this
implementation. I am wondering, if ProxyPath could be helpful in
avoiding the cost of reparameterize_path_by_child() introduced for
partition-wise join.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company




Re: [HACKERS] [BUGS] Bug in Physical Replication Slots (at least 9.5)?

2017-10-26 Thread Kyotaro HORIGUCHI
Hello. Thank you for looking at this.

At Mon, 16 Oct 2017 17:58:03 +0900, Michael Paquier  
wrote in 
> On Thu, Sep 7, 2017 at 12:33 PM, Kyotaro HORIGUCHI
>  wrote:
> > At Wed, 6 Sep 2017 12:23:53 -0700, Andres Freund  wrote 
> > in <20170906192353.ufp2dq7wm5fd6...@alap3.anarazel.de>
> >> I'm not following. All we need to use is the beginning of the relevant
> >> records, that's easy enough to keep track of. We don't need to read the
> >> WAL or anything.
> >
> > The beginning is already tracked and nothing more to do.
> 
> I have finally allocated time to look at your newly-proposed patch,
> sorry for the time it took. Patch 0002 forgot to include sys/stat.h to
> allow the debugging tool to compile :)
> 
> > The first *problem* was WaitForWALToBecomeAvailable requests the
> > beginning of a record, which is not on the page the function has
> > been told to fetch. Still tliRecPtr is required to determine the
> > TLI to request, it should request RecPtr to be streamed.
> 
> [...]
> 
> > The rest to do is let XLogPageRead retry other sources
> > immediately. To do this I made ValidXLogPageHeader@xlogreader.c
> > public (and renamed to XLogReaderValidatePageHeader).
> >
> > The patch attached fixes the problem and passes recovery
> > tests. However, the test for this problem is not added. It needs
> > to go to the last page in a segment then put a record continues
> > to the next segment, then kill the standby after receiving the
> > previous segment but before receiving the whole record.
> 
> +typedef struct XLogPageHeaderData *XLogPageHeader;
> [...]
> +/* Validate a page */
> +extern bool XLogReaderValidatePageHeader(XLogReaderState *state,
> +   XLogRecPtr recptr, XLogPageHeader 
> hdr);
> Instead of exposing XLogPageHeaderData, I would recommend just using
> char* and remove this declaration. The comment on top of
> XLogReaderValidatePageHeader needs to make clear what caller should
> provide.

Seems reasonable. Added several lines in the comment for the
function.

> +if (!XLogReaderValidatePageHeader(xlogreader, targetPagePtr,
> +  (XLogPageHeader) readBuf))
> +goto next_record_is_invalid;
> +
> [...]
> -ptr = tliRecPtr;
> +ptr = RecPtr;
>  tli = tliOfPointInHistory(tliRecPtr, 
> expectedTLEs);
> 
>  if (curFileTLI > 0 && tli < curFileTLI)
> The core of the patch is here (the patch has no comment so it is hard
> to understand what's the point of what is being done), and if I

Hmm, sorry. Added a brief comment there.

> understand that correctly, you allow the receiver to fetch the
> portions of a record spawned across multiple segments from different
> sources, and I am not sure that we'd want to break that promise.

We are allowing consecutive records at a segment boundary to come from
different sources within the same series of xlog records. A
continuation record never spans two TLIs, but I might be
missing something here. (I found that an error message shows an
incorrect record pointer. The error message still seems
useful.)

> Shouldn't we instead have the receiver side track the beginning of the
> record and send that position for the physical slot's restart_lsn?

The largest obstacle to doing that is that walreceiver is not
concerned with record internals at all. In other words, it doesn't
know what a record is. Teaching it that would introduce much complexity,
and the complexity would slow down the walreceiver.

In addition to that, this "problem" also occurs on replication
without a slot. The latest patch helps that case as well.

> This way the receiver would retain WAL segments from the real
> beginning of a record. restart_lsn for replication slots is set when
> processing the standby message in ProcessStandbyReplyMessage() using
> now the flush LSN, so a more correct value should be provided using
> that. Andres, what's your take on the matter?


regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From b23c1d69ad86fcbb992cb21c604f587d82441001 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Thu, 7 Sep 2017 12:14:55 +0900
Subject: [PATCH 1/2] Allow switch WAL source midst of record.

The current recovery machinery assumes the whole of a record is
available from a single source. This prevents a standby from restarting
under a certain condition. This patch allows switching the source while
reading a series of continuation records.
---
 src/backend/access/transam/xlog.c   | 14 --
 src/backend/access/transam/xlogreader.c | 28 ++--
 src/include/access/xlogreader.h |  4 
 3 files changed, 34 insertions(+), 12 deletions(-)

diff --git a/src/backend/access/transam/xlog.c 

[HACKERS] Subscriber resets additional columns to NULL on UPDATE

2017-10-26 Thread Petr Jelinek
Hi,

I found a bug in logical replication where extra (nullable) columns on the
subscriber will be reset to NULL when an update comes from the provider.

The issue is apparently that we /points finger at himself/ forgot to
check specifically for columns that are not part of the attribute map in
slot_modify_cstrings(), so the extra columns will fall through to the
else block, which sets the value to NULL.

Attached patch fixes it and adds couple of tests for this scenario.

This is rather serious issue so it would be good if we could get it
fixed in 10.1.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services
From c641eb6170f3dec26cf52264e1f931393aee434e Mon Sep 17 00:00:00 2001
From: Petr Jelinek 
Date: Thu, 26 Oct 2017 11:11:42 +0200
Subject: [PATCH] Don't reset additional columns on subscriber to NULL after
 UPDATE

When a publisher table has fewer columns than the subscriber's, an update
of a row on the publisher should result in an update of only the common
columns. The previous coding mistakenly reset the values of additional
columns on the subscriber to NULL because it failed to skip updates of
columns not found in the attribute map.
---
 src/backend/replication/logical/worker.c   |  7 ++-
 src/test/subscription/t/008_diff_schema.pl | 77 ++
 2 files changed, 82 insertions(+), 2 deletions(-)
 create mode 100644 src/test/subscription/t/008_diff_schema.pl

diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index bc6d8246a7..0e68670767 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -391,10 +391,13 @@ slot_modify_cstrings(TupleTableSlot *slot, LogicalRepRelMapEntry *rel,
 		Form_pg_attribute att = TupleDescAttr(slot->tts_tupleDescriptor, i);
 		int			remoteattnum = rel->attrmap[i];
 
-		if (remoteattnum >= 0 && !replaces[remoteattnum])
+		if (remoteattnum < 0)
 			continue;
 
-		if (remoteattnum >= 0 && values[remoteattnum] != NULL)
+		if (!replaces[remoteattnum])
+			continue;
+
+		if (values[remoteattnum] != NULL)
 		{
 			Oid			typinput;
 			Oid			typioparam;
diff --git a/src/test/subscription/t/008_diff_schema.pl b/src/test/subscription/t/008_diff_schema.pl
new file mode 100644
index 00..a6b4597403
--- /dev/null
+++ b/src/test/subscription/t/008_diff_schema.pl
@@ -0,0 +1,77 @@
+# Basic logical replication test
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 3;
+
+# Initialize publisher node
+my $node_publisher = get_new_node('publisher');
+$node_publisher->init(allows_streaming => 'logical');
+$node_publisher->start;
+
+# Create subscriber node
+my $node_subscriber = get_new_node('subscriber');
+$node_subscriber->init(allows_streaming => 'logical');
+$node_subscriber->start;
+
+# Create some preexisting content on publisher
+$node_publisher->safe_psql('postgres',
+	"CREATE TABLE test_tab (a int primary key, b varchar)");
+$node_publisher->safe_psql('postgres',
+	"INSERT INTO test_tab VALUES (1, 'foo'), (2, 'bar')");
+
+# Setup structure on subscriber
+$node_subscriber->safe_psql('postgres', "CREATE TABLE test_tab (a int primary key, b text, c timestamptz default now(), d bigint default 999)");
+
+# Setup logical replication
+my $publisher_connstr = $node_publisher->connstr . ' dbname=postgres';
+$node_publisher->safe_psql('postgres', "CREATE PUBLICATION tap_pub FOR TABLE test_tab");
+
+my $appname = 'tap_sub';
+$node_subscriber->safe_psql('postgres',
+"CREATE SUBSCRIPTION tap_sub CONNECTION '$publisher_connstr application_name=$appname' PUBLICATION tap_pub"
+);
+
+# Wait for subscriber to finish initialization
+my $caughtup_query =
+"SELECT pg_current_wal_lsn() <= replay_lsn FROM pg_stat_replication WHERE application_name = '$appname';";
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+  or die "Timed out while waiting for subscriber to catch up";
+
+# Also wait for initial table sync to finish
+my $synced_query =
+"SELECT count(1) = 0 FROM pg_subscription_rel WHERE srsubstate NOT IN ('r', 's');";
+$node_subscriber->poll_query_until('postgres', $synced_query)
+  or die "Timed out while waiting for subscriber to synchronize data";
+
+my $result =
+  $node_subscriber->safe_psql('postgres', "SELECT count(*), count(c), count(d = 999) FROM test_tab");
+is($result, qq(2|2|2), 'check initial data was copied to subscriber');
+
+# update the tuples on publisher and check the additional columns on
+# subscriber didn't change
+$node_publisher->safe_psql('postgres', "UPDATE test_tab SET b = md5(b)");
+
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+  or die "Timed out while waiting for subscriber to catch up";
+
+$result =
+  $node_subscriber->safe_psql('postgres', "SELECT count(*), count(c), count(d = 999) FROM test_tab");
+is($result, qq(2|2|2), 'check extra columns contain local defaults');
+
+# change the local values of extra on subscriber, update 

Re: [HACKERS] Re: Is anything preventing us from allowing write to foreign tables from standby?

2017-10-26 Thread Ashutosh Bapat
On Wed, Oct 18, 2017 at 5:44 AM, Craig Ringer  wrote:
> On 18 October 2017 at 02:01, Alexander Korotkov
>  wrote:
>> On Wed, Sep 6, 2017 at 4:42 PM, Alexander Korotkov
>>  wrote:
>>>
>>> We're currently blocking writing queries on standby if even they are
>>> modifying contents of foreign tables.  But do we have serious reasons for
>>> that?
>>> Keeping in the mind FDW-sharding, making FDW-tables writable from standby
>>> would be good to prevent single-master bottleneck.
>>> I wrote simple patch enabling writing to foreign tables from standbys.  It
>>> works from the first glance for me.
>>
>>
>> No interest yet, but no objections too :-)
>> I'm going to add this to next commitfest.
>
> Superficially at least, it sounds like a good idea.
>
> We should only need a virtual xid when we're working with foreign
> tables since we don't do any local heap changes.
>
> How's it work with savepoints?
>

In a nearby thread, we are discussing atomic commit of transactions
involving foreign transactions. To maintain consistency and atomicity
of transactions writing to foreign servers, we will need to create
local transactions. Will that be possible on a standby? Obviously, we
could add a restriction that neither consistency nor atomic commit is
guaranteed when foreign servers are written from a standby. I am not
sure how easy it would be to impose such a restriction, or whether
such a restriction would be practical.


-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company




Re: [HACKERS] [bug fix] ECPG: fails to recognize embedded parameters

2017-10-26 Thread Michael Meskes
> The cause is that next_insert() in ecpglib unconditionally skips the
> next character after the backslash.
> 
> Could you review and commit the attached patch?  I also attached the
> test program for convenience.

Thanks for spotting and fixing. I just committed your patch to master
and backported it to 9.4, 9.5, 9.6 and 10. It doesn't apply cleanly to
9.3, but the bug might not be important enough to investigate and
backport to that old a version.

Michael
-- 
Michael Meskes
Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
Meskes at (Debian|Postgresql) dot Org
Jabber: michael at xmpp dot meskes dot org
VfL Borussia! Força Barça! SF 49ers! Use Debian GNU/Linux, PostgreSQL




Re: [HACKERS] show "aggressive" or not in autovacuum logs

2017-10-26 Thread Kyotaro HORIGUCHI
Thank you for the comment.

(Thank you, Sawada-san, for reviewing, too.)

At Thu, 19 Oct 2017 13:03:38 +0200, Alvaro Herrera  
wrote in <20171019110338.awwzc3y674co7wof@alvherre.pgsql>
> Kyotaro HORIGUCHI wrote:
> 
> > How about the following?
> > 
> > "automatic [agressive ]vacuum of table \"%s..."
> > "[aggressive ]vacuuming \"%s..."
> 
> That form of log message seems acceptable to me (first one is missing a 'g').
> 
> In any case, please do not construct the sentence with %s expanding the
> word, because that is going to cause a problem for translations.  Use an
> 'if' test with two full sentences instead.  Yes, as Robert said, it's
> going to add more strings, but that's okay.

Thank you. I forgot that point. Changed them so that the messages
are detected as msgids.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
From 7252bfc0fafcf9d4d38067913325cf82c88d1e1e Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Mon, 28 Aug 2017 13:12:25 +0900
Subject: [PATCH] Show "aggressive" or not in vacuum messages

VACUUM VERBOSE and autovacuum emit log messages containing "%u skipped
frozen", but we cannot tell whether the vacuum was a non-aggressive
(non-freezing) vacuum or an aggressive (freezing) vacuum that simply
found no tuples to freeze. This patch adds an indication of aggressive
(auto)vacuum to the log messages and to the VACUUM VERBOSE output.
---
 src/backend/commands/vacuumlazy.c | 21 ++++++++++++++++-----
 1 file changed, 16 insertions(+), 5 deletions(-)

diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index 30b1c08..1080fcf 100644
--- a/src/backend/commands/vacuumlazy.c
+++ b/src/backend/commands/vacuumlazy.c
@@ -355,6 +355,7 @@ lazy_vacuum_rel(Relation onerel, int options, VacuumParams *params,
 									   params->log_min_duration))
 		{
 			StringInfoData buf;
+			char	   *msgfmt;
 
 			TimestampDifference(starttime, endtime, &secs, &usecs);
 
@@ -373,7 +374,11 @@ lazy_vacuum_rel(Relation onerel, int options, VacuumParams *params,
 			 * emitting individual parts of the message when not applicable.
 			 */
 			initStringInfo(&buf);
-			appendStringInfo(&buf, _("automatic vacuum of table \"%s.%s.%s\": index scans: %d\n"),
+			if (aggressive)
+				msgfmt = _("automatic aggressive vacuum of table \"%s.%s.%s\": index scans: %d\n");
+			else
+				msgfmt = _("automatic vacuum of table \"%s.%s.%s\": index scans: %d\n");
+			appendStringInfo(&buf, msgfmt,
 							 get_database_name(MyDatabaseId),
 							 get_namespace_name(RelationGetNamespace(onerel)),
 							 RelationGetRelationName(onerel),
@@ -486,10 +491,16 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 	pg_rusage_init(&ru0);
 
 	relname = RelationGetRelationName(onerel);
-	ereport(elevel,
-			(errmsg("vacuuming \"%s.%s\"",
-					get_namespace_name(RelationGetNamespace(onerel)),
-					relname)));
+	if (aggressive)
+		ereport(elevel,
+				(errmsg("aggressive vacuuming \"%s.%s\"",
+						get_namespace_name(RelationGetNamespace(onerel)),
+						relname)));
+	else
+		ereport(elevel,
+				(errmsg("vacuuming \"%s.%s\"",
+						get_namespace_name(RelationGetNamespace(onerel)),
+						relname)));
 
 	empty_pages = vacuumed_pages = 0;
 	num_tuples = tups_vacuumed = nkeep = nunused = 0;
-- 
2.9.2
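
For illustration, after applying the patch the first VERBOSE line
distinguishes the two modes. A sample psql session (a sketch only: the
table name is made up, output is abbreviated, and an aggressive vacuum
is forced here with FREEZE; the INFO text follows the patch above):

postgres=# VACUUM (VERBOSE) vactbl;
INFO:  vacuuming "public.vactbl"
...
postgres=# VACUUM (FREEZE, VERBOSE) vactbl;
INFO:  aggressive vacuuming "public.vactbl"
...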




[HACKERS] Try to fix endless loop in ecpg with informix mode

2017-10-26 Thread 高增琦
Hi,

I tried some tests with ecpg's informix mode.
When trying to store float data into an integer var, I got an endless loop.

The reason is:
In informix mode, ecpg can accept the string form of a float number
when processing a query result.
While checking the string form of the float number, ecpg apparently
forgets to skip the characters after '.'.
The outer loop then never stops, because it expects to reach '\0'.

The first patch reproduces the problem in ecpg's regression tests.
The second patch tries to fix it in a simple way.

-- 
GaoZengqi
pgf...@gmail.com
zengqi...@gmail.com


0001-Edit-ecpg-regress-test-to-trigger-endless-loop.patch
Description: Binary data


0002-Fix-endless-loop-in-ecpg-with-informix-mode.patch
Description: Binary data



Re: [HACKERS] Add support for tuple routing to foreign partitions

2017-10-26 Thread Etsuro Fujita

Hi Maksim,

On 2017/10/02 21:37, Maksim Milyutin wrote:

On 11.09.2017 16:01, Etsuro Fujita wrote:
* Query planning: for each foreign partition, the patch creates a copy 
of the original Query/Plan with that foreign partition as the target 
and invokes PlanForeignModify with those copies, to allow the FDW to 
do query planning for the remote INSERT with the existing API.  To 
build such Queries in a way similar to inheritance_planner, I modified 
transformInsertStmt so that the inh flag for the partitioned table's 
RTE is set to true, which allows (1) expand_inherited_rtentry to build 
an RTE and AppendRelInfo for each partition in the partitioned table 
and (2) make_modifytable to build such Queries using 
adjust_appendrel_attrs and those AppendRelInfos.


* explain.c: I modified show_modifytable_info so that we can show 
remote queries for foreign partitions in EXPLAIN for INSERT into a 
partitioned table the same way as for inherited UPDATE/DELETE cases. 
Here is an example:


postgres=# explain verbose insert into pt values (1), (2);
                             QUERY PLAN
--------------------------------------------------------------------
 Insert on public.pt  (cost=0.00..0.03 rows=2 width=4)
   Foreign Insert on public.fp1
     Remote SQL: INSERT INTO public.t1(a) VALUES ($1)
   Foreign Insert on public.fp2
     Remote SQL: INSERT INTO public.t2(a) VALUES ($1)
   ->  Values Scan on "*VALUES*"  (cost=0.00..0.03 rows=2 width=4)
         Output: "*VALUES*".column1
(7 rows)
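
(For readers who want to reproduce a plan like the above: a setup along
the following lines should work once the patch is applied. This is a
sketch with assumed names - the "loopback" server and the stand-in
remote tables t1/t2 are placeholders, not taken from the patch's
regression tests.)

CREATE EXTENSION postgres_fdw;
CREATE SERVER loopback FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (dbname 'postgres');
CREATE USER MAPPING FOR CURRENT_USER SERVER loopback;

CREATE TABLE t1 (a int);  -- stand-ins for the remote tables
CREATE TABLE t2 (a int);

CREATE TABLE pt (a int) PARTITION BY LIST (a);
CREATE FOREIGN TABLE fp1 PARTITION OF pt FOR VALUES IN (1)
    SERVER loopback OPTIONS (table_name 't1');
CREATE FOREIGN TABLE fp2 PARTITION OF pt FOR VALUES IN (2)
    SERVER loopback OPTIONS (table_name 't2');

EXPLAIN VERBOSE INSERT INTO pt VALUES (1), (2);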

I think I should add more explanation about the patch, but I don't 
have time today, so I'll write additional explanation in the next 
email. Sorry about that.


Could you update your patch? It doesn't apply to the current state of 
master; namely, there is a conflict in src/backend/executor/execMain.c.


Attached is an updated version.

* As mentioned in "Query planning", the patch builds an RTE for each 
partition so that the FDW can reference that RTE in, e.g., 
PlanForeignModify.  set_plan_references also uses such RTEs to record 
plan dependencies so that plancache.c can invalidate cached plans.  See 
the example added to the postgres_fdw regression tests.


* As mentioned in "explain.c", the EXPLAIN shows all partitions beneath 
the ModifyTable node.  One merit of that is we can show remote queries 
for foreign partitions in the output as shown above.  Another one I can 
think of is when reporting execution stats for triggers.  Here is an 
example for that:


postgres=# explain analyze insert into list_parted values (1, 'hi there'), (2, 'hi there');
                                                  QUERY PLAN
---------------------------------------------------------------------------------------------------------------
 Insert on list_parted  (cost=0.00..0.03 rows=2 width=36) (actual time=0.375..0.375 rows=0 loops=1)
   Insert on part1
   Insert on part2
   ->  Values Scan on "*VALUES*"  (cost=0.00..0.03 rows=2 width=36) (actual time=0.004..0.007 rows=2 loops=1)
 Planning time: 0.089 ms
 Trigger part1brtrig on part1: time=0.059 calls=1
 Trigger part2brtrig on part2: time=0.021 calls=1
 Execution time: 0.422 ms
(8 rows)

This would allow the user to understand easily that "part1" and "part2" 
in the trigger lines are the partitions of list_parted.  So, I think 
it's useful to show partition info in the ModifyTable node even in the 
case where a partitioned table only contains plain tables.


* The patch modifies make_modifytable and ExecInitModifyTable so that 
the former allows the FDW to construct private plan data for each 
foreign partition, accumulating it all into a list, and the latter 
performs BeginForeignModify for each partition using the plan data 
stored in the list passed from make_modifytable.  Other changes I made 
to the executor are: (1) currently, we set ri_RangeTableIndex of each 
partition's ResultRelInfo to the RT index of the root partitioned 
table, but the proposed EXPLAIN requires that the partition's 
ri_RangeTableIndex be set to the RT index of that partition's own RTE 
in order to show the partition info in the output, so I made that 
change; ExecPartitionCheck, ExecConstraints, and ExecWithCheckOptions 
are adjusted accordingly.  (2) I added a new member to ResultRelInfo 
(i.e., ri_PartitionIsValid) and modified CheckValidResultRel so that it 
checks a given partition without throwing an error and saves the result 
in that flag, so that ExecInsert can use the flag to determine whether 
a partition chosen by ExecFindPartition is valid for tuple routing, as 
proposed before.


* copy.c: I still don't think it's a good idea to implement COPY 
tuple-routing for foreign partitions using PlanForeignModify.  (I plan 
to propose new FDW APIs for bulkload as discussed before, to implement 
this feature.)  So, I kept that as-is.  Two things I changed there are: 
(1) currently, ExecSetupPartitionTupleRouting verifies partitions using 
CheckValidResultRel, but I don't think we need the 

[HACKERS] proposal: schema variables

2017-10-26 Thread Pavel Stehule
Hi,

I propose a  new database object - a variable. The variable is persistent
object, that holds unshared session based not transactional in memory value
of any type. Like variables in any other languages. The persistence is
required for possibility to do static checks, but can be limited to session
- the variables can be temporal.

My proposal is related to session variables from Sybase, MSSQL or MySQL
(based on prefix usage @ or @@), or package variables from Oracle (access
is controlled by scope), or schema variables from DB2. Any design is coming
from different sources, traditions and has some advantages or
disadvantages. The base of my proposal is usage schema variables as session
variables for stored procedures. It should to help to people who try to
port complex projects to PostgreSQL from other databases.

The Sybase (T-SQL) design is good for interactive work, but it is weak
for use in stored procedures - static checks are not possible, and it
is not possible to set access rights on variables.

The Ada design (used by Oracle), based on scopes, is great, but our
environment is not nested, and we should support PLs other than
PL/pgSQL just as strongly.

There are not many other possibilities - a variable that should be
accessible from different PLs and different procedures (over time) has
to live somewhere above the PL level, and the only such place is the
schema.

A variable can be created by a CREATE statement:

CREATE VARIABLE public.myvar AS integer;
CREATE VARIABLE myschema.myvar AS mytype;

CREATE [TEMP] VARIABLE [IF NOT EXISTS] name AS type
  [ DEFAULT expression ] [[NOT] NULL]
  [ ON TRANSACTION END { RESET | DROP } ]
  [ { VOLATILE | STABLE } ];

It is dropped by the command DROP VARIABLE [ IF EXISTS ] varname.

Access to variables is controlled by the usual access rights, via the
GRANT/REVOKE commands. The possible rights are READ and WRITE.

Variables can be modified by the SQL command SET (this is taken from
the standard, and it is natural):

SET varname = expression;

Unfortunately we already use the SET command for a different purpose.
But I think we can solve that with a few tricks. The first is moving
our GUCs to the pg_catalog schema. We can then control the strictness
of the SET command: in one variant we can detect a custom GUC and allow
it, in another we can disallow custom GUCs and allow only schema
variables. A new command LET could be an alternative.

Variables are referenced in queries implicitly (without a JOIN):

SELECT varname;

The search_path is used when varname is looked up. Variables can be
used everywhere query parameters are allowed.
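
To put the pieces together, a session could look like this (a sketch of
the proposal only - nothing below exists in PostgreSQL today, and the
GRANT spelling in particular is my guess based on the READ and WRITE
rights named above):

CREATE VARIABLE public.current_app_user AS integer;
GRANT READ ON VARIABLE public.current_app_user TO webapp_role;

SET public.current_app_user = 42;

SELECT current_app_user;  -- resolved via search_path
SELECT * FROM orders WHERE owner_id = current_app_user;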

I hope this proposal is simple and good enough.

Comments, notes?

regards

Pavel


Re: [HACKERS] More stats about skipped vacuums

2017-10-26 Thread Kyotaro HORIGUCHI
Mmm. I've failed to create a brand-new thread..

Thank you for the comment.

At Fri, 20 Oct 2017 19:15:16 +0900, Masahiko Sawada  
wrote in