Re: [HACKERS] [PATCH] Reload SSL certificates on SIGHUP

2016-11-23 Thread Michael Paquier
On Sat, Nov 12, 2016 at 3:42 AM, Andreas Karlsson  wrote:
> On 11/11/2016 07:40 PM, Andreas Karlsson wrote:
>> Here is a new version of the patch with the only differences;
>>
>> 1) The SSL tests have been changed to use reload rather than restart

Did you check if the tests pass? I am getting a couple of failures
like this one:
psql: server certificate for "common-name.pg-ssltest.test" does not
match host name "127.0.0.1"
not ok 11 - sslrootcert=ssl/root+server_ca.crt sslmode=verify-full
Attached are the logs of the run I did; the same behavior shows up on
macOS and Linux. The shape of the tests looks correct to me after
review. Still, seeing failing tests with sslmode=verify-full is a
problem that needs to be addressed. This may point to incorrect CA
load handling, though I could not spot a problem when going through
the code.

>> 2) Rebased on master
>
> And here with a fix to a comment.

config.sgml needs an update, as it still mentions that the SSL parameters
require a restart when updated.

I have done a couple of tests on Linux, switching ssl between on and
off and testing connection attempts with different sslmode settings.
Things work as I would expect, which is nice:
- switching to off with sslmode=require triggers an error:
psql: server does not support SSL, but SSL was required
- switching to on with sslmode=require connects with SSL.
- switching to off with sslmode=prefer connects without SSL.
- switching to on with sslmode=prefer connects with SSL.
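
For anyone wanting to repeat this kind of check by hand, here is a minimal
sketch, assuming the patch is applied (so that ssl is reloadable) and a
superuser session:

-- flip SSL off and reload the configuration without a restart
ALTER SYSTEM SET ssl = off;
SELECT pg_reload_conf();
-- then, from a *new* connection opened with sslmode=prefer,
-- check whether the session actually uses SSL
SELECT ssl, version, cipher FROM pg_stat_ssl WHERE pid = pg_backend_pid();

With sslmode=require the new connection would instead fail outright while ssl
is off, matching the error quoted above.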

I have also done a couple of tests on Windows, where switching ssl
between on and off works properly for each new connection. There is no
surprise here, and that's as documented in the patch.
-- 
Michael


regress_log_001_ssltests
Description: Binary data


001_ssltests_master.log
Description: Binary data



Re: [HACKERS] Push down more full joins in postgres_fdw

2016-11-23 Thread Ashutosh Bapat
>
>> Sorry. I think the current version is better than the previous one. The
>> term "subselect alias" is confusing in the previous version. In the
>> current version, "Get the relation and column alias for a given Var
>> node," we need to add the word "identifiers", as in "Get the relation and
>> column identifiers for a given Var node".
>
>
> OK, but one thing I'm concerned about is that the term "relation alias" seems a
> bit confusing, because we already use that term for the alias of the form
> "rN".  To avoid that, how about saying "table alias", not "relation alias"?
> (in which case, the comment would be something like "Get the table and
> column identifiers for a given Var node".)

"table" would be misleading, since the subquery can represent a join and the
corresponding alias would then represent that join. "Relation" is the better term.


>
>
>> If we deparse from and search into different objects, that's not very
>> maintainable code. Changes to any one of them require changes to the
>> other, thus creating avenues for bugs.  Even if slightly expensive, we
>> should search into and deparse from the same targetlist.
>
>
> I don't think so; in the current version, we essentially deparse from and
> search into the same object, ie, foreignrel->reltarget->exprs, since the
> tlist created by build_tlist_to_deparse is built from the
> foreignrel->reltarget->exprs.  Also, since it is just created for deparsing
> the relation as a subquery in deparseRangeTblRef and isn't stored into
> fpinfo or anywhere else, we would have no concern about creating such
> avenues.  IMO adding comments would be enough for this.

build_tlist_to_deparse() calls pull_var_clause() before creating the
tlist, whereas the code that searches does not do that. Code-wise,
those are not the same thing.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company




Re: [HACKERS] Push down more full joins in postgres_fdw

2016-11-23 Thread Etsuro Fujita

On 2016/11/23 0:30, Ashutosh Bapat wrote:

You wrote:

The comment below says "get the aliases", but what the function really
returns
is the identifiers used for creating aliases. Please correct the comment.
+/*
+ * Get the relation and column alias for a given Var node, which belongs
to
+ * input foreignrel. They are returned in *tabno and *colno respectively.
+ */


I wrote:

Actually, this was rewritten into the above by *you*.  The original comment
I added was:

+ /*
+  * Get info about the subselect alias to given expression.
+  *
+  * The subselect table and column numbers are returned to *tabno and *colno,
+  * respectively.
+  */

I'd like to change the comment into something like the original one.



Sorry. I think the current version is better than the previous one. The
term "subselect alias" is confusing in the previous version. In the
current version, "Get the relation and column alias for a given Var
node," we need to add the word "identifiers", as in "Get the relation and
column identifiers for a given Var node".


OK, but one thing I'm concerned about is that the term "relation alias" seems
a bit confusing, because we already use that term for the alias of the
form "rN".  To avoid that, how about saying "table alias", not "relation
alias"?  (in which case, the comment would be something like "Get the
table and column identifiers for a given Var node".)



We discussed that we have to deparse and search from the same targetlist,
and that the targetlist should be saved in fpinfo the first time it gets
created.  But the patch seems to be searching in foreignrel->reltarget->exprs
and deparsing from the tlist returned by add_to_flat_tlist(tlist,
foreignrel->reltarget->exprs).
+    foreach(lc, foreignrel->reltarget->exprs)
+    {
+        if (equal(lfirst(lc), (Node *) node))
+        {
+            *colno = i;
+            return;
+        }
+        i++;
+    }
I guess the reason why you are doing it this way is that the SELECT clause
for the outermost query gets deparsed before the FROM clause. For the latter
we call deparseRangeTblRef(), which builds the tlist. So, while deparsing the
SELECT clause, we do not have a tlist to build from.



That's right.



In that case, I guess, we have to build the tlist in
get_subselect_alias_id() if it's not available and stick it in fpinfo.
Subsequent calls to get_subselect_alias_id() should find the tlist there.
Then in deparseRangeTblRef() assert that there's a tlist in fpinfo and use
it to build the SELECT clause of the subquery. That way, we don't build the
tlist unless it's needed and also use the same tlist for all searches.
Please use tlist_member() to search into the tlist.



I don't think that's a good idea, because that would make the code
unnecessarily complicated and inefficient.  I think the direct search into
the foreignrel->reltarget->exprs shown above would be better, because it is
simpler and more efficient than what you proposed.  Note that since (1) the
foreignrel->reltarget->exprs doesn't contain any PHVs and (2) the
local_conds is empty, the tlist created for the subquery by
build_tlist_to_deparse is guaranteed to have a one-to-one correspondence with
the foreignrel->reltarget->exprs, so the direct search works well.



If we deparse from and search into different objects, that's not very
maintainable code. Changes to any one of them require changes to the
other, thus creating avenues for bugs.  Even if slightly expensive, we
should search into and deparse from the same targetlist.


I don't think so; in the current version, we essentially deparse from
and search into the same object, ie, foreignrel->reltarget->exprs, since
the tlist created by build_tlist_to_deparse is built from the
foreignrel->reltarget->exprs.  Also, since it is just created for
deparsing the relation as a subquery in deparseRangeTblRef and isn't
stored into fpinfo or anywhere else, we would have no concern about
creating such avenues.  IMO adding comments would be enough for
this.  Anyway, I think this is an optional issue, so I'd like to leave
this to the committer's judgment.



I think I
have explained this before.


My apologies for having misunderstood your words.

Best regards,
Etsuro Fujita






Re: [HACKERS] UNDO and in-place update

2016-11-23 Thread Peter Geoghegan
On Wed, Nov 23, 2016 at 11:32 PM, Tsunakawa, Takayuki
 wrote:
> IMHO, overall, there should be pros and cons of the current approach and the
> new UNDO one (like Oracle?), depending on the workload.  Under an update-heavy
> workload, the UNDO method may be better.  OTOH, under a mostly-INSERT
> workload (like data warehouse?), the current method will be better because it
> writes no log for UNDO.

I believe that you are correct about that.


-- 
Peter Geoghegan




Re: [HACKERS] UNDO and in-place update

2016-11-23 Thread Tsunakawa, Takayuki
IMHO, overall, there should be pros and cons of the current approach and the
new UNDO one (like Oracle?), depending on the workload.  Under an update-heavy
workload, the UNDO method may be better.  OTOH, under a mostly-INSERT
workload (like data warehouse?), the current method will be better because it
writes no log for UNDO.

Furthermore, it may be best to be able to switch the method for each table
and/or tablespace.  For example, in pgbench, the history table would use the
current method and the other tables would use the UNDO method.  Is it time to
introduce a pluggable storage system?
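
Purely to illustrate what such per-table switching might look like, here is a
hypothetical sketch -- "storage_method" is a made-up reloption, not part of
any existing patch or proposal:

-- hypothetical syntax only: choose the storage method per table
CREATE TABLE pgbench_history  (tid int, bid int, aid int, delta int, mtime timestamp)
    WITH (storage_method = 'heap');   -- mostly-INSERT table keeps the current method
CREATE TABLE pgbench_accounts (aid int PRIMARY KEY, bid int, abalance int)
    WITH (storage_method = 'undo');   -- update-heavy table uses the UNDO method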

Because PostgreSQL would be a follower in the UNDO approach, I think it will be
better to study other DBMSs well (Oracle and MySQL?).  That includes not only
their manuals, but also whitepapers and books.  In particular, I expect good books
to give deep knowledge of performance tuning and troubleshooting, from which we
will be able to learn the cons that Oracle's own materials don't state.

Regards
Takayuki Tsunakawa








Re: [HACKERS] Re: [bug fix] Cascading standby cannot catch up and get stuck emitting the same message repeatedly

2016-11-23 Thread Amit Kapila
On Thu, Nov 24, 2016 at 10:29 AM, Tsunakawa, Takayuki
 wrote:
> From: pgsql-hackers-ow...@postgresql.org
>> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Amit Kapila
>> Thanks for the clarification, I could reproduce the issue and confirm that
>> the patch has fixed it.  Find logs of the cascading standby on PG9.2 Head and
>> Patch attached (I have truncated a few lines at the end of the server log
>> generated on Head as those were repetitive).  I think the way you have directly
>> explained the bug steps in code comments is not right (think of how the code
>> will look if we start writing bug steps for each bug fix).  So I have modified
>> the comment to explain the situation and the reason for the check; see if you
>> find that okay?
>
> Thank you, I'm happy with your comment.
>

Okay, I have marked the patch as 'Ready for Committer'.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Declarative partitioning - another take

2016-11-23 Thread Ashutosh Bapat
On Thu, Nov 24, 2016 at 12:02 PM, Amit Langote
 wrote:
> On 2016/11/24 15:10, Ashutosh Bapat wrote:
>> On Thu, Nov 24, 2016 at 11:34 AM, Amit Langote wrote:
>>> On 2016/11/24 14:35, Ashutosh Bapat wrote:
 IIUC, it should allow "create table t1_p1 partition of t1 (a primary
 key) ...", (a primary key) is nothing but "column_name
 column_constraint", but here's what happens
 create table t1_p1 partition of t1 (a primary key) for values from (0) to 
 (100);
 ERROR:  syntax error at or near "primary"
 LINE 1: create table t1_p1 partition of t1 (a primary key) for value...
>>>
>>> You have to specify column constraints using the keywords WITH OPTIONS,
>>> like below:
>>>
>>> create table p1 partition of p (
>>> a with options primary key
>>> ) for values in (1);
>>
>> Oh, sorry for not noticing it. You are right. Why do we need "with
>> options" there? Shouldn't the user be able to specify just "a primary key"?
>> It's not really an "option", it's a constraint.
>
> I just adopted the existing syntax for specifying column/table constraints
> of a table created with CREATE TABLE OF type_name.

Hmm, I don't find it quite intuitive. But others might find it so.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company




Re: [HACKERS] Declarative partitioning - another take

2016-11-23 Thread Amit Langote
On 2016/11/24 15:10, Ashutosh Bapat wrote:
> On Thu, Nov 24, 2016 at 11:34 AM, Amit Langote wrote:
>> On 2016/11/24 14:35, Ashutosh Bapat wrote:
>>> IIUC, it should allow "create table t1_p1 partition of t1 (a primary
>>> key) ...", (a primary key) is nothing but "column_name
>>> column_constraint", but here's what happens
>>> create table t1_p1 partition of t1 (a primary key) for values from (0) to 
>>> (100);
>>> ERROR:  syntax error at or near "primary"
>>> LINE 1: create table t1_p1 partition of t1 (a primary key) for value...
>>
>> You have to specify column constraints using the keywords WITH OPTIONS,
>> like below:
>>
>> create table p1 partition of p (
>> a with options primary key
>> ) for values in (1);
> 
> Oh, sorry for not noticing it. You are right. Why do we need "with
> options" there? Shouldn't the user be able to specify just "a primary key"?
> It's not really an "option", it's a constraint.

I just adopted the existing syntax for specifying column/table constraints
of a table created with CREATE TABLE OF type_name.
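
For comparison, the pre-existing typed-table form that this borrows from (plain
CREATE TABLE ... OF, nothing specific to the partitioning patch; the names below
are just illustrative) looks like:

-- a typed table also attaches column constraints via WITH OPTIONS
CREATE TYPE emp_type AS (empno int, ename text);
CREATE TABLE emp OF emp_type (empno WITH OPTIONS PRIMARY KEY);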

Thanks,
Amit






Re: [HACKERS] Declarative partitioning - another take

2016-11-23 Thread Ashutosh Bapat
On Thu, Nov 24, 2016 at 11:34 AM, Amit Langote
 wrote:
>
> Hi Ashutosh,
>
> On 2016/11/24 14:35, Ashutosh Bapat wrote:
>> I am trying to create a partitioned table with primary keys on the
>> partitions. Here's the corresponding syntax as per documentation
>> CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [
>> IF NOT EXISTS ] table_name
>> PARTITION OF parent_table [ (
>>   { column_name WITH OPTIONS [ column_constraint [ ... ] ]
>> | table_constraint }
>> [, ... ]
>> ) ] partition_bound_spec
>>
>> IIUC, it should allow "create table t1_p1 partition of t1 (a primary
>> key) ...", (a primary key) is nothing but "column_name
>> column_constraint", but here's what happens
>> create table t1_p1 partition of t1 (a primary key) for values from (0) to 
>> (100);
>> ERROR:  syntax error at or near "primary"
>> LINE 1: create table t1_p1 partition of t1 (a primary key) for value...
>
> You have to specify column constraints using the keywords WITH OPTIONS,
> like below:
>
> create table p1 partition of p (
> a with options primary key
> ) for values in (1);

Oh, sorry for not noticing it. You are right. Why do we need "with
options" there? Shouldn't the user be able to specify just "a primary key"?
It's not really an "option", it's a constraint.

>
>> The same syntax also suggests using table_constraints but that too doesn't 
>> work
>>  create table t1_p1 partition of t1 (primary key (a) )  for values
>> from (0) to (100);
>> ERROR:  inherited relation "t1" is not a table or foreign table
>>
> of course t1 is a table, so why does it say it isn't?
>
> It's a bug.  I forgot to add RELKIND_PARTITIONED_TABLE to an if block
> in the code that checks the inheritance parent relation's relkind when
> creating an index constraint (primary key) on a child table.  Will fix,
> thanks for catching it.
>
> Thanks,
> Amit
>
>



-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company




Re: [HACKERS] Declarative partitioning - another take

2016-11-23 Thread Amit Langote

Hi Ashutosh,

On 2016/11/24 14:35, Ashutosh Bapat wrote:
> I am trying to create a partitioned table with primary keys on the
> partitions. Here's the corresponding syntax as per documentation
> CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [
> IF NOT EXISTS ] table_name
> PARTITION OF parent_table [ (
>   { column_name WITH OPTIONS [ column_constraint [ ... ] ]
> | table_constraint }
> [, ... ]
> ) ] partition_bound_spec
> 
> IIUC, it should allow "create table t1_p1 partition of t1 (a primary
> key) ...", (a primary key) is nothing but "column_name
> column_constraint", but here's what happens
> create table t1_p1 partition of t1 (a primary key) for values from (0) to 
> (100);
> ERROR:  syntax error at or near "primary"
> LINE 1: create table t1_p1 partition of t1 (a primary key) for value...

You have to specify column constraints using the keywords WITH OPTIONS,
like below:

create table p1 partition of p (
a with options primary key
) for values in (1);

> The same syntax also suggests using table_constraints but that too doesn't 
> work
>  create table t1_p1 partition of t1 (primary key (a) )  for values
> from (0) to (100);
> ERROR:  inherited relation "t1" is not a table or foreign table
> 
> of course t1 is a table, so why does it say it isn't?

It's a bug.  I forgot to add RELKIND_PARTITIONED_TABLE to an if block
in the code that checks the inheritance parent relation's relkind when
creating an index constraint (primary key) on a child table.  Will fix,
thanks for catching it.

Thanks,
Amit






Re: [HACKERS] [RFC] Should we fix postmaster to avoid slow shutdown?

2016-11-23 Thread Tsunakawa, Takayuki
From: pgsql-hackers-ow...@postgresql.org
> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Tom Lane
> Maybe a workable compromise would be to leave the file present, and have
> the stats collector re-write it every (say) five minutes.  Then I'd be okay
> with having an immediate shutdown skip writing the file; you'd be losing
> up to five minutes' worth of activity, but not going completely nuts.  So
> the stats collector's normal activities would include writing the temp file
> on-demand and the permanent file on a timed cycle.
> 
> The other components of the fix (deleting on PITR rewind or stats collector
> crash) would remain the same.

The manual says:

"Also, the collector itself emits a new report at most once per 
PGSTAT_STAT_INTERVAL milliseconds (500 ms unless altered while building the 
server). So the displayed information lags behind actual activity."

Doesn't this mean that the stats collector writes files in pg_stat_tmp/ every 
500ms?  If true, how about just moving those files into appropriate locations 
during recovery, instead of removing the files?

I also find others' ideas worth considering -- WAL-logging the stats files,
type-specific stats files, etc. -- but I'm afraid those ideas would only be
employed in a new major release, not in already-released versions.  I'm asking
for a remedy for users (and potential users) on older releases.  And I don't
yet understand why the patch would make the situation worse for existing users,
or why stopping the writing of the files during an immediate/abnormal shutdown
requires other efforts.

Regards
Takayuki Tsunakawa








Re: [HACKERS] [RFC] Should we fix postmaster to avoid slow shutdown?

2016-11-23 Thread Tsunakawa, Takayuki
From: pgsql-hackers-ow...@postgresql.org
> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Tom Lane
> Robert Haas  writes:
> > I agree.  However, in many cases, the major cost of a fast shutdown is
> > getting the dirty data already in the operating system buffers down to
> > disk, not in writing out shared_buffers itself.  The latter is
> > probably a single-digit number of gigabytes, or maybe double-digit.
> > The former might be a lot more, and the write of the pgstat file may
> > back up behind it.  I've seen cases where an 8kB buffered write from
> > Postgres takes tens of seconds to complete because the OS buffer cache
> > is already saturated with dirty data, and the stats files could easily
> > be a lot more than that.
> 
> I think this is mostly FUD, because we don't fsync the stats files.  Maybe
> we should, but we don't today.  So even if we have managed to get the system
> into a state where physical writes are heavily backlogged, that's not a
> reason to assume that the stats collector will be unable to do its thing
> promptly.  All it has to do is push a relatively small amount of data into
> kernel buffers.

I'm sorry for my late reply, yesterday was a national holiday in Japan.

It's not FUD.  I understand you hit the slow stats file write problem during
some regression test.  You said it took 57 seconds to write the stats file
during the postmaster shutdown.  That caused pg_ctl stop to fail due to its
60-second timeout.  Even the regression test environment suffered from the problem.

Regards
Takayuki Tsunakawa








Re: [HACKERS] Declarative partitioning - another take

2016-11-23 Thread Ashutosh Bapat
Unsurprisingly ALTER TABLE worked
alter table t1_p1 add primary key(a);
ALTER TABLE
postgres=# \d+ t1_p1
                                   Table "public.t1_p1"
 Column |  Type   | Collation | Nullable | Default | Storage | Stats target | Description
--------+---------+-----------+----------+---------+---------+--------------+-------------
 a      | integer |           | not null |         | plain   |              |
 b      | integer |           |          |         | plain   |              |
Indexes:
    "t1_p1_pkey" PRIMARY KEY, btree (a)
Inherits: t1

On Thu, Nov 24, 2016 at 11:05 AM, Ashutosh Bapat
 wrote:
> I am trying to create a partitioned table with primary keys on the
> partitions. Here's the corresponding syntax as per documentation
> CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [
> IF NOT EXISTS ] table_name
> PARTITION OF parent_table [ (
>   { column_name WITH OPTIONS [ column_constraint [ ... ] ]
> | table_constraint }
> [, ... ]
> ) ] partition_bound_spec
>
> IIUC, it should allow "create table t1_p1 partition of t1 (a primary
> key) ...", (a primary key) is nothing but "column_name
> column_constraint", but here's what happens
> create table t1_p1 partition of t1 (a primary key) for values from (0) to 
> (100);
> ERROR:  syntax error at or near "primary"
> LINE 1: create table t1_p1 partition of t1 (a primary key) for value...
>
> The same syntax also suggests using table_constraints but that too doesn't 
> work
>  create table t1_p1 partition of t1 (primary key (a) )  for values
> from (0) to (100);
> ERROR:  inherited relation "t1" is not a table or foreign table
>
> of course t1 is a table, so why does it say it isn't?
>
> Am I missing something? How do I define constraints on the partitions?
>
>
> On Tue, Nov 22, 2016 at 2:45 PM, Amit Langote
>  wrote:
>>
>> Updated patches attached.  I merged what used to be 0006 and 0007 into one.
>>
>> On 2016/11/19 2:23, Robert Haas wrote:
>>> On Fri, Nov 18, 2016 at 5:59 AM, Amit Langote
>>>  wrote:
 Oh but wait, that means I can insert rows with NULLs in the range
 partition key if I choose to insert it directly into the partition,
 whereas I have been thinking all this while that there could never be
 NULLs in the partition key of a range partition.  What's more,
 get_qual_for_partbound() (patch 0003) emits a IS NOT NULL constraint for
 every partition key column in case of a range partition.  Is that
 wrongheaded altogether?  (also see my reply to your earlier message about
 NULLs in the range partition key)
>>>
>>> The easiest thing to do might be to just enforce that all of the
>>> partition key columns have to be not-null when the range-partitioned
>>> table is defined, and reject any attempt to DROP NOT NULL on them
>>> later.  That's probably better than shoehorning it into the table
>>> constraint.
>>
>> Agreed that forcing range partitioning columns to be NOT NULL during table
>> creation would be a better approach.  But then we would have to reject
>> using expressions in the range partition key, right?
>>
 Thanks for the idea below!

> 1. Forget the idea of a tree.  Instead, let the total number of tables
> in the partitioning hierarchy be N and let the number of those that
>>
>> [ ... ]
>>
>
> No recursion, minimal pointer chasing, no linked lists.  The whole
> thing is basically trivial aside from the cost of
> list/range_partition_for_tuple itself; optimizing that is a different
> project.  I might have some details slightly off here, but hopefully
> you can see what I'm going for: you want to keep the computation that
> happens in get_partition_for_tuple() to an absolute minimum, and
> instead set things up in advance so that getting the partition for a
> tuple is FAST.  And you want the data structures that you are using in
> that process to be very compact, hence arrays instead of linked lists.

 This sounds *much* better.  Here is a quick attempt at coding the design
 you have outlined above in the attached latest set of patches.
>>>
>>> That shrank both 0006 and 0007 substantially, and it should be faster,
>>> too.   I bet you can shrink them further:
>>
>> Some changes described below have reduced the size to a certain degree.
>>
>>>
>>> - Why is PartitionKeyExecInfo a separate structure and why does it
>>> have a NodeTag?  I bet you can dump the node tag, merge it into
>>> PartitionDispatch, and save some more code and some more
>>> pointer-chasing.
>>
>> OK, I merged the fields of what used to be PartitionKeyExecInfo into
>> PartitionDispatchData as the latter's new fields key and keystate.
>>
>>> - I still think it's a seriously bad idea for list partitioning and
>>> range partitioning to need different code-paths all over the place
>>> here. List partitions support nulls but not multi-column partitioning
>>> 

Re: [HACKERS] Declarative partitioning - another take

2016-11-23 Thread Ashutosh Bapat
I am trying to create a partitioned table with primary keys on the
partitions. Here's the corresponding syntax as per documentation
CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [
IF NOT EXISTS ] table_name
PARTITION OF parent_table [ (
  { column_name WITH OPTIONS [ column_constraint [ ... ] ]
| table_constraint }
[, ... ]
) ] partition_bound_spec

IIUC, it should allow "create table t1_p1 partition of t1 (a primary
key) ...", (a primary key) is nothing but "column_name
column_constraint", but here's what happens
create table t1_p1 partition of t1 (a primary key) for values from (0) to (100);
ERROR:  syntax error at or near "primary"
LINE 1: create table t1_p1 partition of t1 (a primary key) for value...

The same syntax also suggests using table_constraints but that too doesn't work
 create table t1_p1 partition of t1 (primary key (a) )  for values
from (0) to (100);
ERROR:  inherited relation "t1" is not a table or foreign table

of course t1 is a table, so why does it say it isn't?

Am I missing something? How do I define constraints on the partitions?


On Tue, Nov 22, 2016 at 2:45 PM, Amit Langote
 wrote:
>
> Updated patches attached.  I merged what used to be 0006 and 0007 into one.
>
> On 2016/11/19 2:23, Robert Haas wrote:
>> On Fri, Nov 18, 2016 at 5:59 AM, Amit Langote
>>  wrote:
>>> Oh but wait, that means I can insert rows with NULLs in the range
>>> partition key if I choose to insert it directly into the partition,
>>> whereas I have been thinking all this while that there could never be
>>> NULLs in the partition key of a range partition.  What's more,
>>> get_qual_for_partbound() (patch 0003) emits a IS NOT NULL constraint for
>>> every partition key column in case of a range partition.  Is that
>>> wrongheaded altogether?  (also see my reply to your earlier message about
>>> NULLs in the range partition key)
>>
>> The easiest thing to do might be to just enforce that all of the
>> partition key columns have to be not-null when the range-partitioned
>> table is defined, and reject any attempt to DROP NOT NULL on them
>> later.  That's probably better than shoehorning it into the table
>> constraint.
>
> Agreed that forcing range partitioning columns to be NOT NULL during table
> creation would be a better approach.  But then we would have to reject
> using expressions in the range partition key, right?
>
>>> Thanks for the idea below!
>>>
 1. Forget the idea of a tree.  Instead, let the total number of tables
 in the partitioning hierarchy be N and let the number of those that
>
> [ ... ]
>

 No recursion, minimal pointer chasing, no linked lists.  The whole
 thing is basically trivial aside from the cost of
 list/range_partition_for_tuple itself; optimizing that is a different
 project.  I might have some details slightly off here, but hopefully
 you can see what I'm going for: you want to keep the computation that
 happens in get_partition_for_tuple() to an absolute minimum, and
 instead set things up in advance so that getting the partition for a
 tuple is FAST.  And you want the data structures that you are using in
 that process to be very compact, hence arrays instead of linked lists.
>>>
>>> This sounds *much* better.  Here is a quick attempt at coding the design
>>> you have outlined above in the attached latest set of patches.
>>
>> That shrank both 0006 and 0007 substantially, and it should be faster,
>> too.   I bet you can shrink them further:
>
> Some changes described below have reduced the size to a certain degree.
>
>>
>> - Why is PartitionKeyExecInfo a separate structure and why does it
>> have a NodeTag?  I bet you can dump the node tag, merge it into
>> PartitionDispatch, and save some more code and some more
>> pointer-chasing.
>
> OK, I merged the fields of what used to be PartitionKeyExecInfo into
> PartitionDispatchData as the latter's new fields key and keystate.
>
>> - I still think it's a seriously bad idea for list partitioning and
>> range partitioning to need different code-paths all over the place
>> here. List partitions support nulls but not multi-column partitioning
>> keys and range partitions support multi-column partitioning keys but
>> not nulls, but you could use an internal structure that supports both.
>> Then you wouldn't need partition_list_values_bsearch and also
>> partition_rbound_bsearch; you could have one kind of bound structure
>> that can be bsearch'd for either list or range.  You might even be
>> able to unify list_partition_for_tuple and range_partition_for_tuple
>> although that looks a little harder.  In either case, you bsearch for
>> the greatest value <= the value you have.  The only difference is that
>> for list partitioning, you have to enforce at the end that it is an
>> equal value, whereas for range partitioning less-than-or-equal-to is
>> enough.  But you should still be able to arrange for more 

Re: [HACKERS] patch: function xmltable

2016-11-23 Thread Pavel Stehule
2016-11-24 5:52 GMT+01:00 Pavel Stehule :

> Hi
>
> 2016-11-24 0:13 GMT+01:00 Alvaro Herrera :
>
>> Oh my, I just noticed we have a new xpath preprocessor in this patch
>> too.  Where did this code come from -- did you write it all from
>> scratch?
>>
>
> I wrote it from scratch - libxml2 does not have any API for iterating over an
> XPath expression (as opposed to iterating over an XPath expression result), and
> from what I know, there will not be any new API in libxml2.
>
> There are two purposes:
>
> Safe manipulation of XPath expression prefixes - the ANSI SQL design
> implicitly expects some prefix, but one can also be written manually. The prefix
> should not be applied twice, which in some situations can break the
> expression.
>

The implicit prefix for column PATH expressions is "./". A user can also write it
explicitly.

In my initial patches, the manipulation of the XPath expression was more
complex - now it can be reduced - but then we would lose default namespace
support, which is a nice feature supported by other providers.




>
> The second goal is to support default namespaces - once we needed a parser for
> the first task, enhancing it for this task did not take many more lines.
>
> This parser can be used to enhance the current XPath function - default
> namespaces are pretty nice when you have to use namespaces.
>
>
>>
>> --
>> Álvaro Herrerahttps://www.2ndQuadrant.com/
>> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>>
>
>


Re: [HACKERS] Patch to implement pg_current_logfile() function

2016-11-23 Thread Karl O. Pinc
On Wed, 23 Nov 2016 03:21:05 -0600
"Karl O. Pinc"  wrote:

> On Sat, 19 Nov 2016 12:58:47 +0100
> Gilles Darold  wrote:
> 
> > ... attached v14 of the patch.   

> patch_pg_current_logfile-v14.diff.retry_current_logfiles-part3

> Re-try the write of current_logfiles should it fail because the
> system is too busy.

Attached are 2 more doc patches.

---

patch_pg_current_logfile-v14.diff.pg_ctl_and_more_docs

Applies on top of
patch_pg_current_logfile-v14.diff.retry_current_logfiles-part3

Mentions pg_ctl and how using it might affect whether logfile
paths are captured in the current_logfiles file.  I think
the default RedHat packaging captures logs this way.  The
Debian default is to ship with logging_collector=off
and to rely on pg_ctl via pg_ctlcluster to manage
log writing.  Both RH and Debian then pass stderr to systemd
and that does log management.  I don't
recall other distros' practice.  If the default distro
packaging means that current_logfiles/pg_current_logfile()
"don't work", this could be an issue to address in the
documentation.  Certainly the systemd reliance on stderr
capture and its subsuming of logging functionality could
also be an issue.

Cross reference the current_logfiles file in the pg_current_logfile()
docs.

Marks up stderr appropriately.

Adds index entries.
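
For context, a minimal sketch of how the function and file being documented
here are used, assuming the v14 patch is applied and that current_logfiles
lives directly under the data directory as in the patch (pg_read_file() is
existing core functionality, shown only as one way to peek at the file):

-- the function added by the patch
SELECT pg_current_logfile();
-- the underlying file can also be read directly by a superuser
SELECT pg_read_file('current_logfiles');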

---

patch_pg_current_logfile-v14.diff.doc_linux_default

Applies on top of
patch_pg_current_logfile-v14.diff.pg_ctl_and_more_docs

Mentions, in the body of the docs, the defaults and their impact on
current_logfiles and pg_current_logfile().  It seems appropriate
to mention this in the main documentation and in the overall context
of logging.

Adds index entries.

Regards,

Karl 
Free Software:  "You don't pay back, you pay forward."
 -- Robert A. Heinlein


patch_pg_current_logfile-v14.diff.pg_ctl_and_more_docs
Description: Binary data


patch_pg_current_logfile-v14.diff.doc_linux_default
Description: Binary data



Re: [HACKERS] patch: function xmltable

2016-11-23 Thread Pavel Stehule
2016-11-24 0:29 GMT+01:00 Alvaro Herrera :

> Here's another version.  Not there yet: need to move back the function
> to create the tupdesc, as discussed.  Not clear what's the best place,
> however.  I modified the grammar a bit (added the missing comma, removed
> PATH as an unreserved keyword and just used IDENT, removed the "Opt"
> version for column options), and reworked the comments in the transform
> phase (I tweaked the code here and there mostly to move things to nicer
> places, but it's pretty much the same code).
>
> In the new xpath_parser.c file I think we should tidy things up a bit.
> First, it needs more commentary on what the entry function actually
> does, in detail.  Also, IMO that function should be at the top of the
> file, not at the bottom, followed by all its helpers.  I would like some
> more clarity on the provenance of all this code, just to assess the
> probability of bugs; mostly as it's completely undocumented.
>
> I don't like the docs either.  I think we should have a complete
> reference to the syntax, followed by examples, rather than letting the
> examples drive the whole thing.  I fixed the synopsis so that it's not
> one very long line.
>
> If you use "PATH '/'" for a column, you get the text for all the entries
> in the whole XML, rather than the text for the particular row being
> processed.  Isn't that rather weird, or to put it differently, completely
> wrong?  I didn't find a way to obtain the whole XML row when you have
> the COLUMNS option (which is what I was hoping for with the "PATH '/'").
>

This is libxml2's behavior.

Postprocessing only checks the result and tries to coerce it to the expected
types.

Regards

Pavel


>
> --
> Álvaro Herrerahttps://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


Re: [HACKERS] Re: [bug fix] Cascading standby cannot catch up and get stuck emitting the same message repeatedly

2016-11-23 Thread Tsunakawa, Takayuki
From: pgsql-hackers-ow...@postgresql.org
> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Amit Kapila
> Thanks for the clarification, I could reproduce the issue and confirm that
> the patch has fixed it.  Find logs of the cascading standby on PG9.2 Head and
> Patch attached (I have truncated a few lines at the end of the server log
> generated on Head as those were repetitive).  I think the way you have directly
> explained the bug steps in code comments is not right (think of how the code
> will look if we start writing bug steps for each bug fix).  So I have modified
> the comment to explain the situation and the reason for the check; see if you
> find that okay?

Thank you, I'm happy with your comment.

Regards
Takayuki Tsunakawa





Re: [HACKERS] patch: function xmltable

2016-11-23 Thread Pavel Stehule
Hi

2016-11-24 0:13 GMT+01:00 Alvaro Herrera :

> Oh my, I just noticed we have a new xpath preprocessor in this patch
> too.  Where did this code come from -- did you write it all from
> scratch?
>

I wrote it from scratch - libxml2 does not have any API for iterating over an
XPath expression (as opposed to iterating over an XPath expression result), and
from what I know, there will not be any new API in libxml2.

There are two purposes:

Safe manipulation of XPath expression prefixes - the ANSI SQL design
implicitly expects some prefix, but one can also be written manually. The prefix
should not be applied twice, which in some situations can break the
expression.

The second goal is to support default namespaces - once we needed a parser for the
first task, enhancing it for this task did not take many more lines.

This parser can be used to enhance the current XPath function - default
namespaces are pretty nice when you have to use namespaces.
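
As a rough sketch of what the default-namespace support aims to allow (syntax
as proposed in this patch, so it may still change; the table xmldata and its
xml column doc are assumed purely for illustration):

SELECT x.*
  FROM xmldata,
       XMLTABLE(XMLNAMESPACES(DEFAULT 'http://example.com/rows'),
                '/rows/row' PASSING doc
                COLUMNS id   int  PATH '@id',
                        name text PATH 'name') AS x;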


>
> --
> Álvaro Herrerahttps://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


Re: [HACKERS] Calculation of param_source_rels in add_paths_to_joinrel

2016-11-23 Thread Ashutosh Bapat
Thanks Tom.

On Thu, Nov 24, 2016 at 2:58 AM, Tom Lane  wrote:
> Ashutosh Bapat  writes:
>> On Sat, Nov 5, 2016 at 2:16 AM, Tom Lane  wrote:
>>> Ashutosh Bapat  writes:
 Also, the way this code has been written, the declaration of variable
 sjinfo masks the earlier declaration with the same name.
>
>>> Hmm, yeah, that's probably not terribly good coding practice.
>
>> Attached a patch to fix this.
>
> Pushed, sorry about the delay.
>
> regards, tom lane



-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company




Re: [HACKERS] Re: [bug fix] Cascading standby cannot catch up and get stuck emitting the same message repeatedly

2016-11-23 Thread Amit Kapila
On Tue, Nov 22, 2016 at 8:48 AM, Tsunakawa, Takayuki
 wrote:
> From: pgsql-hackers-ow...@postgresql.org
>
> If the problem occurs, the following pair of lines appear in the server log 
> of the cascading standby.  Could you check it?
>
> LOG:  restored log file "00020003" from archive
> LOG:  out-of-sequence timeline ID 1 (after 2) in log file 0, segment 3, 
> offset 0
>

Thanks for the clarification, I could reproduce the issue and confirm
that the patch has fixed it.  Find logs of the cascading standby on PG9.2
Head and Patch attached (I have truncated a few lines at the end of the
server log generated on Head as those were repetitive).  I think the way you
have directly explained the bug steps in code comments is not right
(think of how the code will look if we start writing bug steps for each
bug fix).  So I have modified the comment to explain the
situation and the reason for the check; see if you find that okay?


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


postgresql_Thu_Patch.log
Description: Binary data


postgresql_Thu_Head.log
Description: Binary data


cascading_standby_stuck_v3.patch
Description: Binary data



[HACKERS] pg_background contrib module proposal

2016-11-23 Thread amul sul
Hi All,

I would like to take over the pg_background patch and repost it for
discussion and review.

Robert Haas initially shared this as a demonstration of parallelism
infrastructure[1] and later abandoned it, leaving a summary of the open
issues[2] in the pg_background patch that need to be fixed.  Most of them
seem to have been addressed in core, except the handling of types without
binary send/recv functions, and documentation.  I have added handling for
types that don't have binary send/recv functions in the attached patch and
will work on the documentation at the end.

One concern with this patch is code duplication with exec_simple_query();
we could consider Jim Nasby’s patch[3] to overcome this, but we would
certainly end up complicating exec_simple_query() to make pg_background
happy.

As discussed previously[1], pg_background is a contrib module that lets
you launch arbitrary commands in a background worker.

• VACUUM in the background
• An autonomous transaction implementation better than the dblink way (i.e.
no separate authentication required).
• Allows performing tasks like CREATE INDEX CONCURRENTLY from a
procedural language.

This module comes with the following SQL APIs:

• pg_background_launch : takes the SQL command that the user wants to
execute and the size of the queue buffer; returns the process id of the
background worker.
• pg_background_result : takes the process id as an input parameter and
returns the result of the command executed through the background worker.
• pg_background_detach : takes the process id and detaches the background
process, which otherwise waits for the user to read its results.


Here's an example of running vacuum and then fetching the results.
Notice that the
notices from the original session are propagated to our session; if an
error had occurred,
it would be re-thrown locally when we try to read the results.

postgres=# create table foo (a int);
CREATE TABLE
postgres=# insert into foo values(generate_series(1,5));
INSERT 0 5

postgres=# select pg_background_launch('vacuum verbose foo');
pg_background_launch
--
  65427
(1 row)

postgres=# select * from pg_background_result(65427) as (x text);
INFO:  vacuuming "public.foo"
INFO:  "foo": found 0 removable, 5 nonremovable row versions in 1 out of 1 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
Skipped 0 pages due to buffer pins.
0 pages are entirely empty.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
 x

VACUUM
(1 row)
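
And a minimal sketch of the detach API described above, for a worker whose
results we do not intend to read (the process id below is illustrative, not
taken from a real run):

postgres=# select pg_background_launch('vacuum foo');  -- suppose this returns 65428
postgres=# select pg_background_detach(65428);
-- the worker is no longer kept waiting for us to read its results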


Thanks to Vibhor Kumar, Rushabh Lathia and Robert Haas for feedback.

Please let me know your thoughts, and thanks for reading.

[1]. 
https://www.postgresql.org/message-id/CA%2BTgmoam66dTzCP8N2cRcS6S6dBMFX%2BJMba%2BmDf68H%3DKAkNjPQ%40mail.gmail.com
[2]. 
https://www.postgresql.org/message-id/CA%2BTgmobPiT_3Qgjeh3_v%2B8Cq2nMczkPyAYernF_7_W9a-6T1PA%40mail.gmail.com
[3]. https://www.postgresql.org/message-id/54541779.1010906%40BlueTreble.com

Regards,
Amul


0001-pg_background_v7.patch
Description: Binary data



Re: [HACKERS] patch: function xmltable

2016-11-23 Thread Alvaro Herrera
Alvaro Herrera wrote:

> If you use "PATH '/'" for a column, you get the text for all the entries
> in the whole XML, rather than the text for the particular row being
> processed.  Isn't that rather weird, or to put it differently, completely
> wrong?  I didn't find a way to obtain the whole XML row when you have
> the COLUMNS option (which is what I was hoping for with the "PATH '/'").

Ah, apparently you need to use type XML for that column in order for
this to happen.  Example:

insert into emp values ($$

 
 
 
 John
 Doew
   
 344   

 55000
 

 
 
 Peter
 Panw
 
 216
 905-416-5004
 
 

 
 
 
 Mary
 Jonesw
 
 415
 905-403-6112
 647-504-4546
 64000
 
 

$$);

Note the weird salary_amount value here:

SELECT x.*
FROM emp,
XMLTABLE ('//depts/dept/employee' passing doc
  COLUMNS
    i for ordinality,
    empID     int         PATH '@id',
    firstname varchar(25) PATH 'name/first' default 'FOOBAR',
    lastname  VARCHAR(25) PATH 'name/last',
    salary    xml path 'concat(salary/text(), salary/@currency)' default 'DONT KNOW',
    salary_amount xml path '/' )
   WITH ORDINALITY
   AS X (i, a, b, c)  limit 1;
 i │  a  │  b   │  c   │  salary  │ salary_amount │ ordinality 
───┼─┼──┼──┼──┼───┼
 1 │ 905 │ John │ Doew │ 55000USD │  ↵│  1
   │ │  │  │  │  ↵│ 
   │ │  │  │  │  ↵│ 
   │ │  │  │  │  John↵│ 
   │ │  │  │  │  Doew↵│ 
   │ │  │  │  │  ↵│ 
   │ │  │  │  │  344 ↵│ 
   │ │  │  │  │  55000   ↵│ 
   │ │  │  │  │  ↵│ 
   │ │  │  │  │  ↵│ 
   │ │  │  │  │  ↵│ 
   │ │  │  │  │  ↵│ 
   │ │  │  │  │  Peter   ↵│ 
   │ │  │  │  │  Panw↵│ 
   │ │  │  │  │  ↵│ 
   │ │  │  │  │  216 ↵│ 
   │ │  │  │  │  905-416-5004↵│ 
   │ │  │  │  │  ↵│ 
   │ │  │  │  │  ↵│ 
   │ │  │  │  │  ↵│ 
   │ │  │  │  │  ↵│ 
   │ │  │  │  │  ↵│ 
   │ │  │  │  │  ↵│ 
   │ │  │  │  │  Mary↵│ 
   │ │  │  │  │  Jonesw  ↵│ 
   │ │  │  │  │  ↵│ 
   │ │  │  │  │  415 ↵│ 
   │ │  │  │  │  905-403-6112↵│ 
   │ │  │  │  │  647-504-4546↵│ 
   │ │  │  │  │  64000   ↵│ 
   │ │  │  │  │  ↵│ 
   │ │  │  │  │  ↵│ 
   │ │  │  │  │   │ 
(1 fila)


If you declare salary_amount to be text instead, it doesn't happen anymore.
Apparently if you put it in a namespace, it doesn't happen either.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Physical append-only tables

2016-11-23 Thread Bruce Momjian
On Mon, Nov 14, 2016 at 08:43:12PM +, Greg Stark wrote:
> That said, I don't think the "maintain clustering a bit better using
> BRIN" is a bad idea. It's just the bit about turning a table
> append-only to deal with update-once data that I think is overreach.

What if we used BRIN to find heap pages where the new row falls within the
page's BRIN min/max range and the heap page has free space?  Only if that
fails do we put it somewhere else in the heap.
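
For reference, the BRIN summaries being talked about come from an ordinary
BRIN index; a generic setup (not tied to any patch discussed here) looks like:

CREATE TABLE measurements (ts timestamptz, value numeric);
CREATE INDEX measurements_ts_brin ON measurements USING brin (ts);
-- each BRIN entry summarizes a range of heap pages by the min/max ts seen there;
-- the idea above is to consult those ranges when choosing an insertion target page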

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: [HACKERS] [bugfix] commit timestamps ERROR on lookup of FrozenTransactionId

2016-11-23 Thread Craig Ringer
On 24 November 2016 at 02:32, Andres Freund  wrote:
> Hi,
>
> On 2016-11-23 20:58:22 +0800, Craig Ringer wrote:
>> Today I ran into an issue where commit timestamp lookups were failing with
>>
>> ERROR: cannot retrieve commit timestamp for transaction 2
>>
>> which is of course FrozenTransactionId.
>>
>> TransactionIdGetCommitTsData(...) ERRORs on !TransactionIdIsNormal(),
>> which I think is wrong. Attached is a patch to make it return 0 for
>> FrozenTransactionId and BootstrapTransactionId, like it does for xids
>> that are too old.
>
> Why? It seems quite correct to not allow lookups for special case
> values, as it seems sensible to give them special treatmeant at the call
> site?

It's surprising behaviour that doesn't make sense. Look at it this way:

- We do some work, generating rows that have commit timestamps
- TransactionIdGetCommitTsData() on those rows returns their cts fine
- The commit timestamp data ages out
- TransactionIdGetCommitTsData() returns 0 on these rows
- vacuum comes along and freezes the rows, even though nothing's changed
- TransactionIdGetCommitTsData() suddenly ERRORs

Nothing has meaningfully changed on these rows. They have gone from
"old, committed, past the commit timestamp threshold" to "old,
committed, past the commit timestamp threshold, frozen".

It makes no sense to ERROR when vacuum gets around to freezing the
tuples, when we don't also ERROR when we pass the cts threshold.

ERRORing on BootstrapTransactionId is slightly more reasonable since
those rows can never have had a cts in the first place, but it's also
unnecessary since they're effectively "oldest always-committed xids".

Making it ERROR on FrozenTransactionId was a mistake and should be corrected.

>> IMO this should be back-patched to 9.6 and, without the TAP test part,
>> to 9.5.
>
> Why would we want to backpatch a behaviour change, where arguments for
> the current and proposed behaviour exists?

I don't think it's crucial since callers can just work around it, but
IMO the current behaviour is a design oversight that should be
corrected and can be safely and sensibly corrected. Nobody's going to
rely on FrozenTransactionId ERRORing.

I don't think a backpatch is crucial though; as you note, C-level
callers can work around the problem pretty simply, and that's just
what I've done in pglogical for existing versions. I just think it's
ugly, should be fixed, and is safe to fix.

It's slightly harder for SQL-level callers to work around since they
must hardcode a CASE that tests for xmin = XID '1' OR xmin = XID '2',
and it's much less reasonable to expect SQL level callers to deal with
this sort of mess with low level state.
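
A sketch of that SQL-level workaround, assuming track_commit_timestamp = on
and some table t whose rows may carry frozen or bootstrap xmin values (only
the matching CASE branch is evaluated, so the ERROR described above is avoided):

SELECT CASE
         WHEN xmin = '1'::xid OR xmin = '2'::xid THEN NULL  -- bootstrap/frozen: no timestamp
         ELSE pg_xact_commit_timestamp(xmin)
       END AS committed_at
  FROM t;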

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] patch: function xmltable

2016-11-23 Thread Andrew Dunstan



On 11/23/2016 06:31 PM, Alvaro Herrera wrote:

Sorry, here's the patch.  Power loss distracted me here.

By the way, the pgindent you did is slightly confused because you failed
to add the new struct types you define to typedefs.list.



Tips on how to use pgindent for developers:



cheers

andrew





Re: [HACKERS] patch: function xmltable

2016-11-23 Thread Alvaro Herrera
Here's another version.  Not there yet: need to move back the function
to create the tupdesc, as discussed.  Not clear what's the best place,
however.  I modified the grammar a bit (added the missing comma, removed
PATH as an unreserved keyword and just used IDENT, removed the "Opt"
version for column options), and reworked the comments in the transform
phase (I tweaked the code here and there mostly to move things to nicer
places, but it's pretty much the same code).

In the new xpath_parser.c file I think we should tidy things up a bit.
First, it needs more commentary on what the entry function actually
does, in detail.  Also, IMO that function should be at the top of the
file, not at the bottom, followed by all its helpers.  I would like some
more clarity on the provenance of all this code, just to assess the
probability of bugs; mostly as it's completely undocumented.

I don't like the docs either.  I think we should have a complete
reference to the syntax, followed by examples, rather than letting the
examples drive the whole thing.  I fixed the synopsis so that it's not
one very long line.

If you use "PATH '/'" for a column, you get the text for all the entries
in the whole XML, rather than the text for the particular row being
processed.  Isn't that rather weird, or to put it differently, completely
wrong?  I didn't find a way to obtain the whole XML row when you have
the COLUMNS option (which is what I was hoping for with the "PATH '/'").

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] patch: function xmltable

2016-11-23 Thread Alvaro Herrera
Oh my, I just noticed we have a new xpath preprocessor in this patch
too.  Where did this code come from -- did you write it all from
scratch?

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] amcheck (B-Tree integrity checking tool)

2016-11-23 Thread Thomas Munro
On Fri, Nov 18, 2016 at 12:17 PM, Peter Geoghegan  wrote:
> So, the Btree you show is supposed to be good? You've left it up to me
> to draw T2, the corrupt version? If so, here is a drawing of T2 for
> the sake of clarity:
>
>   Btree   =[a|b]
>/   \
>   / \
>  /   \
>[a]---[b]
> | |
> | |
>[c]---[d]

Actually I meant that at T2 the btree is exactly the same as it was at T1,
but the order of the values has changed according to the comparator
(for example, because a collation definition changed when you updated
your operating system), so that the btree is now corrupted.  Based on
Goetz Graefe's paper I was wondering if amcheck would be unable to
detect this due to the cousin problem.

>> Now c and b have been switched around.  Won't amcheck still pass since
>> we only verify immediate downlinks and siblings?  Yet searches for b
>> will take a wrong turn at the root.  Do I have this right?  I wonder
>> if there is a way to verify that each page's high key < the 'next' key
>> for all ancestors.
>
> I don't think you have it right, because your argument is predicated
> on classic L&Y, not the Postgres variant. The high key is literally
> guaranteed to be *equal* to the initial first value of the right half
> of a split page post-split (at least initially), which is where the
> new downlink for the new right half comes from in turn, so it will be
> exactly equal as well. There is a comment which says all this within
> _bt_insert_parent(), actually.
>
> In general, I think it's pretty bad that we're so loose about
> representing the key space in downlinks compared to classic L&Y,
> because it probably makes index bloat a lot worse in internal pages,
> messing up the key space long term. As long as we're doing that,
> though, we clearly cannot usefully "verify that each page's high key <
> the 'next' key for all ancestors", because we deviate from L in a
> way that makes that check always fail. It's okay that it always fails
> for us (it doesn't actually indicate corruption), because of our
> duplicate-aware index scan logic (change to invariant).
>
> In summary, I'm pretty sure that our "Ki <= v <= Ki+1" thing makes the
> cousin problem a non-issue, because we don't follow downlinks that are
> equal to our scankey -- we go to their immediate left and start
> *there*, in order to support duplicates in leaf pages. Index scans for
> 'b' work for both T1 and T2 in Postgres, because we refuse to follow
> the 'b' downlinks when doing an index scan for 'b' (relevant to T2). I
> would actually like to change things to make the invariant the classic
> L "Ki < v <= Ki+1", to avoid bloat in the internal pages and to make
> suffix truncation in internal pages work.
>
> So, we don't have the cousin problem, but since I wish for us to adopt
> the stricter classic L&Y invariant at some point, you could say that I
> also wished that we had the cousin problem.

Thank you for your patient explanation!  Please see my humble attempt
to test this scenario and learn something about btrees in the process,
attached.  amcheck does detect the violation of the high key
invariant.
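
For reference, the check that catches it is a single call on top of the
script below (assuming the function names from the amcheck patch;
bt_index_parent_check() is the variant that also verifies downlinks
against child pages):

  select bt_index_parent_check('my_index'::regclass);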

> Confused yet? :-)

Yes.  But it's getting better slowly :-)

-- 
Thomas Munro
http://www.enterprisedb.com
create extension if not exists amcheck;
create extension if not exists pageinspect;

drop table if exists my_table;
drop type if exists my_type;

-- define an enum ordered a, b, c, d, but do it in a funky way that results in
-- odd OIDs that we can later mess around with
create type my_type as enum ('d');
alter type my_type add value 'c' before 'd';
alter type my_type add value 'b' before 'c';
alter type my_type add value 'a' before 'b';

create table my_table (my_column my_type, counter serial);
create index my_index on my_table(my_column);

create or replace function pretty_print_btree_recurse(pageno int,
  height int,
  level int)
returns void language plpgsql as
$$
declare
  v_spaces text := rpad('', (height - level) * 2, ' ');
  v_ctid text;
  v_subpageno int;
  v_key_hex text;
  v_key_oid oid;
  v_key_label text;
  v_page_description text;
  v_item int;
begin
  if height = level then
v_page_description := 'root';
  elsif level = 0 then
v_page_description := 'leaf';
  else
v_page_description := 'intl';
  end if;
  for v_ctid, v_key_hex, v_item in
  select ctid::text, data, itemoffset
  from bt_page_items('my_index', pageno)
  loop
if v_ctid like '(4294967295,%' then
  continue;
end if;
v_subpageno := (v_ctid::point)[0];
v_key_oid := ('x' || substring(v_key_hex, 1, 2))::bit(8)::int +
 (('x' || substring(v_key_hex, 4, 2))::bit(8)::int << 8);
v_key_label := coalesce((select enumlabel from pg_enum where oid = 

Re: [HACKERS] UNDO and in-place update

2016-11-23 Thread Thomas Munro
On Wed, Nov 23, 2016 at 6:01 PM, Peter Geoghegan  wrote:
> * Our behavior with many duplicates in secondary indexes is pretty bad
> in general, I suspect.

From the pie-in-the-sky department:  I believe there are
snapshot-based systems that don't ever have more than one entry for a
logical row in a btree.  Instead, they do UNDO-log based snapshot
reconstruction for btrees too, generalising what is being discussed
for heap tables in this thread.  That means that whenever you descend
a btree, at each page you have to be prepared to go and look in the
UNDO log if necessary on a page-by-page basis.  Alternatively, some
systems use a shadow table rather than an UNDO log for the old
entries, but still privilege queries that need the latest version by
keeping only those in the btree.  I don't know the details but suspect
that both of those approaches might require CSN (= LSN?) based
snapshots, so that you can make page-level visibility determinations
while traversing the 'current' btree based on a page header.  That
would reduce both kinds of btree bloat: the primary kind of being
entries for dead tuples that still exist in the heap, and the
secondary kind being the resultant permanent btree fragmentation that
remains even after vacuum removes them.  Presumably once you have all
that, you can allow index-only-scans without a visibility map.  Also,
I suspect that such a "3D time travelling" btree would be a
requirement for index ordered tables in a snapshot-based RDBMS.

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] DROP FUNCTION of multiple functions

2016-11-23 Thread Tom Lane
Alvaro Herrera  writes:
> Peter Eisentraut wrote:
>> Here is a patch series that implements several changes in the internal
>> grammar and node representation of function signatures.  They are not
>> necessarily meant to be applied separately, but they explain the
>> progression of the changes nicely, so I left them like that for review.

> I think patches 1-4 and 6 are pretty clearly straight cleanup and it
> seems fine to me to push them ahead of the rest.  I didn't review super
> carefully but it looks good on a quick glance.  I'd just advise to pay
> special attention to the objectaddress.sql test and the UI involved
> there (which you appear to have done already).

I looked at this briefly.  I agree that 0001-0003 are simple cleanup of
the grammar and could be pushed without further ado.  However, starting
with 0004 I begin to get queasy.  The plan seems to be that instead of
"objname" always being a List of strings, for functions (and then
aggregates and operators) it will be a one-element List of some new struct
that has then got a name list inside it.  This seems seriously bug-prone.
It leads to code with random changes of data representation at seemingly
random places, like this bit from 0005:
 
+   if (stmt->removeType == OBJECT_AGGREGATE || stmt->removeType == 
OBJECT_FUNCTION)
+   objname = list_make1(objname);

I've got no confidence that you've found all the places that will be
broken by that, nor that it won't introduce new bugs later.

Also, it seems like the end goal ought to involve getting rid of the
separate objarg/objargs fields in places like RenameStmt, as well
as the separate objargs argument to get_object_address and friends,
but the patch mostly doesn't do that --- 0006 does change
CreateOpClassItem, and 0007 changes AlterOperatorStmt, but that seems
like just the tip of the iceberg.

I wonder whether it would be useful to invent an ObjectName node type

typedef struct ObjectName
{
NodeTag type;
List *names;   /* list of String, representing 
possibly-qualified name */
List *args;/* list of TypeName, if needed for object type */
} ObjectName;

and replace all the places that currently have separate objname/objargs
fields or arguments with a uniform "ObjectName *" field/argument.  (This
node type could replace FuncWithArgs, obviously.)  Although that would
be more invasive, you could be certain that you'd looked at every place
that's affected by the change of representation.
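
For illustration, constructing such a node for, say, DROP FUNCTION
foo(integer, text) might look roughly like this (hypothetical, since
ObjectName does not exist yet; makeString(), makeTypeName(), and the
list_make* macros are existing helpers):

ObjectName *obj = makeNode(ObjectName);

obj->names = list_make1(makeString("foo"));
obj->args = list_make2(makeTypeName("integer"), makeTypeName("text"));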

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Calculation of param_source_rels in add_paths_to_joinrel

2016-11-23 Thread Tom Lane
Ashutosh Bapat  writes:
> On Sat, Nov 5, 2016 at 2:16 AM, Tom Lane  wrote:
>> Ashutosh Bapat  writes:
>>> Also, the way this code has been written, the declaration of variable
>>> sjinfo masks the earlier declaration with the same name.

>> Hmm, yeah, that's probably not terribly good coding practice.

> Attached a patch to fix this.

Pushed, sorry about the delay.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] DISTINCT with btree skip scan

2016-11-23 Thread Thomas Munro
On Wed, Oct 12, 2016 at 4:19 PM, Thomas Munro
 wrote:
> Here it is, now split into two parts: one patch
> to add an amskip operation, and another to consider using it to
> implement DISTINCT when it was already has an index only scan path on
> an index that supports skipping.

Those patches add an amskip operation and then teach the planner and
executor how to do this, given a table foo (a, b) with an index on (a)
or (a, b):

  SELECT DISTINCT a FROM foo

  Index Only Scan for distinct values of (a) using foo_pkey on foo

In just the right circumstances that is vastly faster than scanning
every tuple and uniquifying.  I was thinking that the next step in
this experiment might be to introduce a kind of plan that can handle
queries where the index's leading column is not in the WHERE clause,
building on the above plan, and behaving conceptually a little bit
like a Nested Loop, like so:

  SELECT * FROM foo WHERE b = 42

  Index Skip Scan
->  Index Only Scan for distinct values of (a) using foo_pkey on foo
->  Index Only Scan using foo_pkey on foo
  Index Cond: ((a = outer.a) AND (b = 42))

I suppose you could instead have a single node that knows how to
perform the whole scan itself so that you don't finish up searching
for each distinct value in both the inner and outer subplan, but it
occurred to me that building the plan out of Lego bricks like this
might generalise better to more complicated queries.  I suppose it
might also parallelise nicely with a Parallel Index Only Scan as the
outer plan.

Maybe something similar is possible for handling "GROUP BY a".

Worth pursuing?  Does amskip suck?  Does anyone have better ideas,
either for how to do the low level skip or the higher level Index Skip
Scan, or perhaps a completely different way of looking at this?

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Google Cloud Compute + FreeBSD + PostgreSQL = timecounter issue

2016-11-23 Thread Maeldron T.

On 23.11.16 20:43, Maeldron T. wrote:



pg_test_timing doesn’t show the problem, or I read the output wrong.


Or it does. I checked another output than the one I attached at the end.




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Google Cloud Compute + FreeBSD + PostgreSQL = timecounter issue

2016-11-23 Thread Maeldron T.

In short:

The available timecounters on Google Compute Instances seem to be random.

The timecounter set in the official FreeBSD image is wrong (it is not 
available on any of my test instances), so FreeBSD will pick a 
timecounter at random.


When either the TSC or the TSC-low counter is used, explain analyze 
behaves normally. The system clock will be wrong with a few seconds in 
each minute. ntpd won’t (and shouldn’t) fix that. Daemons panic. Time 
travel gets real.


When ACPI-fast is used, the system clock stays normal. However, an 
"explain analyze select count(1) from table" will run for 3ms 
instead of 300ms.


pg_test_timing doesn’t show the problem, or I read the output wrong.


In long:

https://forums.freebsd.org/threads/58666/


Notes:

$ pg_test_timing
Testing timing overhead for 3 seconds.
Per loop time including overhead: 6346.80 nsec
Histogram of timing durations:
< usec   % of total  count
 1  0.0  0
 2  0.0  0
 4  0.0  0
 8 96.37600 42
16  2.26939  10727
32  0.62727   2965
64  0.08801416
   128  0.56634   2677
   256  0.04824228
   512  0.01523 72
  1024  0.00508 24
  2048  0.00275 13
  4096  0.00085  4
  8192  0.00042  2
 16384  0.0  0
 32768  0.00021  1
 65536  0.0  0
131072  0.00021  1



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [bugfix] commit timestamps ERROR on lookup of FrozenTransactionId

2016-11-23 Thread Andres Freund
Hi,

On 2016-11-23 20:58:22 +0800, Craig Ringer wrote:
> Today I ran into an issue where commit timestamp lookups were failing with
>
> ERROR: cannot retrieve commit timestamp for transaction 2
>
> which is of course FrozenTransactionId.
>
> TransactionIdGetCommitTsData(...) ERRORs on !TransactionIdIsNormal(),
> which I think is wrong. Attached is a patch to make it return 0 for
> FrozenTransactionId and BootstrapTransactionId, like it does for xids
> that are too old.

Why? It seems quite correct to not allow lookups for special case
values, as it seems sensible to give them special treatment at the call
site?


> IMO this should be back-patched to 9.6 and, without the TAP test part,
> to 9.5.

Why would we want to backpatch a behaviour change, where arguments for
the current and proposed behaviour exist?

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres 9.3 postgres_fdw ::LOG: could not receive data from client: Connection reset by peer

2016-11-23 Thread Jeff Janes
On Mon, Nov 21, 2016 at 6:32 AM, Vladimir Svedov  wrote:

> Hi,
> I have this question. Looked for a help on http://dba.stackexchange.com/
> No success.
> Maybe you can answer?..
> Thank you in advance
>
>
> "FOREIGN_TABLE" created with postgres_fdw. LOCAL_TABLE is just a local
> table...
>
> Symptoms:
>
>1. I run in psql query SELECT * from FOREIGN_TABLE. No log generated
>2. I run in bash psql -c "SELECT * from LOCAL_TABLE". No log generated
>3. I run in bash psql -c "SELECT * from FOREIGN_TABLE". ::LOG: could
>not receive data from client: Connection reset by peer generated in
>postgres log
>
>
Which server log file is this generated in, the local or the foreign?
Whichever it is, is there an entry in the logfile for the other server
which seems to match up to this one?  That may have more useful details.

Cheers,

Jeff


Re: [HACKERS] Patch to implement pg_current_logfile() function

2016-11-23 Thread Karl O. Pinc
On Wed, 23 Nov 2016 10:39:28 -0500
Tom Lane  wrote:

> Robert Haas  writes:
> > On Wed, Nov 23, 2016 at 4:21 AM, Karl O. Pinc 
> > wrote:  
> >> The bool types on the stack in logfile_rotate() are
> >> my work.  Bools on the stack don't make sense as far
> >> as hardware goes, so the compiler's optimizer should change
> >> them to int.  This patch changes the bools to ints
> >> should that be to someone's taste.  
> 
> > That does not seem like a good idea from here.  Even if it buys some
> > microscopic speedup, in a place like this it won't matter.  It's
> > more important that the code be clear, and using an int where you
> > really intend a bool isn't an improvement as far as clarity goes.  
> 
> Not to mention that the "bools on the stack don't make sense" premise
> is bogus anyway.

Thanks.  I don't think I've paid attention since 8088 days, just always
used bools.  We'll drop that patch then.

But this reminds me.  I should have used a bool return value.

Attached is a version 2 of
patch_pg_current_logfile-v14.diff.deletion-v2


Karl 
Free Software:  "You don't pay back, you pay forward."
 -- Robert A. Heinlein


patch_pg_current_logfile-v14.diff.deletion-v2
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch: function xmltable

2016-11-23 Thread Alvaro Herrera
Alvaro Herrera wrote:

> I'm still wondering how this works.  I'll continue to explore the patch
> in order to figure out.

Ah, so it's already parsed as a "range function".  That part looks good
to me.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres 9.3 postgres_fdw ::LOG: could not receive data from client: Connection reset by peer

2016-11-23 Thread Robert Haas
On Wed, Nov 23, 2016 at 3:08 AM, Vladimir Svedov  wrote:
> No, I select from it OK.
> The bug(?) is that when I do it in oppened psql session it produces no log,
> and when I run same select as psql -c "SELECT..." it gives the above

OK, that's pretty weird.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch: function xmltable

2016-11-23 Thread Alvaro Herrera
I tried to see if a following RTE was able to "see" the entries created by
XMLTABLE, and sure it can:

SELECT X.*, generate_series 
FROM emp, 
XMLTABLE ('//depts/dept/employee' passing doc 
   COLUMNS 
   empIDINTEGER PATH '@id',
   firstnamevarchar(25) PATH 'name/first',
   lastname VARCHAR(25) PATH 'name/last') AS X,
generate_series(900, empid);

 empid │ firstname │ lastname │ generate_series 
───────┼───────────┼──────────┼─────────────────
   901 │ John      │ Doe      │             900
   901 │ John      │ Doe      │             901
   902 │ Peter     │ Pan      │             900
   902 │ Peter     │ Pan      │             901
   902 │ Peter     │ Pan      │             902
   903 │ Mary      │ Jones    │             900
   903 │ Mary      │ Jones    │             901
   903 │ Mary      │ Jones    │             902
   903 │ Mary      │ Jones    │             903
(9 filas)

Cool.

I'm still wondering how this works.  I'll continue to explore the patch
in order to figure out.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Implement failover on libpq connect level.

2016-11-23 Thread Catalin Iacob
On Tue, Nov 22, 2016 at 8:38 AM, Tsunakawa, Takayuki
 wrote:
> transaction_read_only=on does not mean the standby.  As the manual article on 
> hot standby says, they are different.

> I'm afraid that if the current patch is committed, you will lose interest in 
> the ideal solution.

I agree with Robert that lacking the ideal solution shouldn't block a
good enough solution today. But I find the current patch as it stands
too confusing to pass the "good enough" bar.

> Then if the current patch is out as v10, there would be a concern about 
> incompatibility when we pursue the ideal solution in a later release.  That 
> is, "should we continue to report > that this server is standby even if it's 
> actually a primary with transaction_read_only is on, to maintain 
> compatibility with the older release."

Agreed. I am worried that this will end up being a wart:
target_server_type=primary doesn't really mean primary; it means that,
by default, the session is not read-only, which happens to be the usual
case for primaries but can be changed.

> If you want to connect to a server where the transaction is read-only, then 
> shouldn't the connection parameter be something like 
> "target_session_attrs=readonly"?  That represents exactly what the code does.

FWIW I find this to be a reasonable compromise. To keep the analogy
with the current patch it would be more something like
"target_session_attrs=read_write|any" and the docs could point out "by
the way, if you didn't tweak default_transaction_read_only read_write
will only pick a primary so this can be used for failover".
The functionality is the same but changing the name removes the
ambiguity in the semantics and leaves the door open for a future real
primary/replica functionality which includes logical replication with
whatever semantics are deemed appropriate then.
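
For example, with that spelling a failover connection attempt could look
like this (hypothetical sketch: the multi-host syntax and the
target_session_attrs parameter are only proposals at this point, not
released libpq features):

#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
    /* try host1 first, then host2, and only accept a session that is
     * not read-only -- i.e. normally the primary */
    PGconn *conn = PQconnectdb("host=host1,host2 dbname=postgres "
                               "target_session_attrs=read_write");

    if (PQstatus(conn) != CONNECTION_OK)
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
    PQfinish(conn);
    return 0;
}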

We do lose pgjdbc compatibility but the current patch doesn't have
that 100% anyway: in pgjdbc it's called targetServerType and it
accepts master instead of primary and it also accepts slave and
preferSlave.
I notice at 
https://github.com/pgjdbc/www/blob/master/documentation/head/connect.md
that pgjdbc says "The master/slave distinction is currently done by
observing if the server allows writes" which, due to the "currently"
part seems to acknowledge this is not ideal and leaves the door open
for changes in semantics but I don't think changing semantics is going
to fly for Postgres itself.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Typo in comment in file analyze.c [master branch]

2016-11-23 Thread Tom Lane
Kostiantyn Nemchenko  writes:
> I've found a typo in a comment in function do_analyze_rel() in
> file analyze.c [line 348, master branch].

No, "iff" is intentional there.  It means "if and only if".

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch to implement pg_current_logfile() function

2016-11-23 Thread Gilles Darold
On 23/11/2016 at 16:26, Tom Lane wrote:
> "Karl O. Pinc"  writes:
>> Maybe on the 2nd call to strtok() you could pass ""
>> as the 2nd argument?  That'd be a little wonky but
>> the man page does not say you can't have an empty
>> set of delimiters.  On the other hand strtok() is
>> not a perfect choice, you don't want to "collapse"
>> adjacent delimiters in the parsed string or ignore
>> leading spaces.  I'd prefer a strstr() solution.
> I'd stay away from strtok() no matter what.  The process-wide static
> state it implies is dangerous: if you use it, you're betting that
> you aren't interrupting some caller's use, nor will any callee decide
> to use it.  In a system as large as Postgres, that's a bad bet, or
> would be if we didn't discourage use of strtok() pretty hard.
>
> As far as I can find, there are exactly two users of strtok() in
> the backend, and they're already playing with fire because one
> calls the other (look in utils/misc/tzparser.c).  I don't want the
> hazard to get any larger.
>
>   regards, tom lane

Understood, I will remove and replace this call from the patch and more
generally from my programming.

-- 
Gilles Darold
Consultant PostgreSQL
http://dalibo.com - http://dalibo.org



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Typo in comment in file analyze.c [master branch]

2016-11-23 Thread Kostiantyn Nemchenko
I've found a typo in a comment in function do_analyze_rel() in
file analyze.c [line 348, master branch].

Attached a patch.


minor_typo_analyze_c.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [RFC] Should we fix postmaster to avoid slow shutdown?

2016-11-23 Thread Tom Lane
Tomas Vondra  writes:
> This would also reduce the amount of data that we need to write to WAL, 
> although I'm not sure the amount is actually a problem. I've seen 
> instances with ~500MB stat files, but those were instances with hundreds 
> of databases, each with many thousands of objects.

Hm, was this before we split up the stats file per-database?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch to implement pg_current_logfile() function

2016-11-23 Thread Tom Lane
Robert Haas  writes:
> On Wed, Nov 23, 2016 at 4:21 AM, Karl O. Pinc  wrote:
>> The bool types on the stack in logfile_rotate() are
>> my work.  Bools on the stack don't make sense as far
>> as hardware goes, so the compiler's optimizer should change
>> them to int.  This patch changes the bools to ints
>> should that be to someone's taste.

> That does not seem like a good idea from here.  Even if it buys some
> microscopic speedup, in a place like this it won't matter.  It's more
> important that the code be clear, and using an int where you really
> intend a bool isn't an improvement as far as clarity goes.

Not to mention that the "bools on the stack don't make sense" premise
is bogus anyway.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch to implement pg_current_logfile() function

2016-11-23 Thread Robert Haas
On Wed, Nov 23, 2016 at 4:21 AM, Karl O. Pinc  wrote:
> patch_pg_current_logfile-v14.diff.bool_to_int
>
> Do not include in "main" patch, submit to maintainers separately.
>
> Applies on top of patch_pg_current_logfile-v14.diff
>
> The bool types on the stack in logfile_rotate() are
> my work.  Bools on the stack don't make sense as far
> as hardware goes, so the compiler's optimizer should change
> them to int.  This patch changes the bools to ints
> should that be to someone's taste.

That does not seem like a good idea from here.  Even if it buys some
microscopic speedup, in a place like this it won't matter.  It's more
important that the code be clear, and using an int where you really
intend a bool isn't an improvement as far as clarity goes.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch to implement pg_current_logfile() function

2016-11-23 Thread Tom Lane
"Karl O. Pinc"  writes:
> Maybe on the 2nd call to strtok() you could pass ""
> as the 2nd argument?  That'd be a little wonky but
> the man page does not say you can't have an empty
> set of delimiters.  On the other hand strtok() is
> not a perfect choice, you don't want to "collapse"
> adjacent delimiters in the parsed string or ignore
> leading spaces.  I'd prefer a strstr() solution.

I'd stay away from strtok() no matter what.  The process-wide static
state it implies is dangerous: if you use it, you're betting that
you aren't interrupting some caller's use, nor will any callee decide
to use it.  In a system as large as Postgres, that's a bad bet, or
would be if we didn't discourage use of strtok() pretty hard.

As far as I can find, there are exactly two users of strtok() in
the backend, and they're already playing with fire because one
calls the other (look in utils/misc/tzparser.c).  I don't want the
hazard to get any larger.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [RFC] Should we fix postmaster to avoid slow shutdown?

2016-11-23 Thread Tomas Vondra

On 11/23/2016 12:24 PM, Michael Paquier wrote:

On Wed, Nov 23, 2016 at 6:15 PM, Magnus Hagander  wrote:

There's also the consideration of what to do with stats *on the standby*. If
we WAL log the stats file, then when it replays on the standby, the stats
there will be overwritten. And stats like number of index vs seq scans on
the standby are still interesting and would be lost.


Perhaps it would make sense to separate the stat files by type then?
The action taken for each file depends on its type.



That seems reasonable. There are two types of stats - usage statistics 
(number of index scans etc.), which we probably don't need on standby, 
and statistics that we use to drive autovacuum.


This would also reduce the amount of data that we need to write to WAL, 
although I'm not sure the amount is actually a problem. I've seen 
instances with ~500MB stat files, but those were instances with hundreds 
of databases, each with many thousands of objects.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UNDO and in-place update

2016-11-23 Thread Robert Haas
On Tue, Nov 22, 2016 at 11:18 PM, Tom Lane  wrote:
> Peter Geoghegan  writes:
>> On Tue, Nov 22, 2016 at 7:31 PM, Tom Lane  wrote:
>>> Oracle spends a lot of time on this, and it's really cache-inefficient
>>> because the data is spread all over.  This was what this guy felt in
>>> circa 2001; I'd have to think that the cache unfriendliness problem is
>>> much worse for modern hardware.
>
>> I imagine that temporal locality helps a lot. Most snapshots will be
>> interested in the same row version.
>
> Yeah, but the problem is that all those transactions have to scavenge
> through a big chunk of UNDO log to find out what they need to know.

In my design, there's one UNDO log per transaction, so if a page has
been recently modified only by a single transaction whose effects are
visible to your snapshot, you can very quickly decide that you don't
need to look at UNDO at all.  If it's been recently modified by
multiple transactions all of which are visible to your snapshot, you
can consult the TPD and then decide that none of those transactions
are interesting and therefore their UNDO doesn't need to be examined.
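
As a toy model of that fast path (every name here is invented for
illustration; this is just the shape of the decision, not actual code):

#include <stdbool.h>
#include <stdio.h>

typedef unsigned int TransactionId;

typedef struct PageOpaque
{
    TransactionId recent_modifier;  /* single recent modifier, if any */
    bool          has_tpd;          /* several recent modifiers -> TPD */
} PageOpaque;

/* stand-in for a real snapshot visibility test */
static bool
xid_visible(TransactionId xid, TransactionId snapshot_xmin)
{
    return xid < snapshot_xmin;
}

/* decide whether a reader must follow the page's UNDO at all */
static bool
need_undo_lookup(const PageOpaque *page, TransactionId snapshot_xmin)
{
    if (!page->has_tpd && xid_visible(page->recent_modifier, snapshot_xmin))
        return false;       /* everything on the page is visible as-is */
    return true;            /* must look at the TPD and/or UNDO records */
}

int
main(void)
{
    PageOpaque page = { 90, false };

    printf("need UNDO? %s\n",
           need_undo_lookup(&page, 100) ? "yes" : "no");
    return 0;
}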

When there are transactions whose effects are not visible to your
snapshot, the cost is proportional to the number of times those
transactions have modified the page.  Therefore, the "bad" case for
this design is when the same page gets updated a large number of times
by one or more concurrent transactions.  If the transaction you can't
see has made 437 separate modifications to the page, you're going to
need to look at all 437 UNDO records to figure out what you can see.
That is likely to suck.

I think that one of the crucial questions for this design is how often
that will happen.  For people with mostly static data, it should be
rare.  In something like a pgbench workload, it boils down to how
often multiple transactions hit the same page before the previous
transactions that hit that page become all-visible.  Of course, if you
have too many backends relative to the scale factor, pgbench is going
to become lock-bound anyway and it probably won't matter much, but
it's about two orders of magnitude easier to hit the same page than to
hit the same tuple, so it could be that the point where you start
incurring significant costs for visibility checking and page
reconstruction happens well before the tuple lock contention becomes
noticeable.  It might be worth doing some mathematical modeling to try
to estimate how serious that effect is likely to be.

There are also some things that you can do to mitigate this, although
they add even more complexity.  For example:

- If the same transaction touches the same page repeatedly, you could
make a rule that every 5th or every 10th or every 20th modification to
the same page within the same transaction emits a larger summary
record into the UNDO that is a consolidated record of all changes made
to the page by that transaction thus far.  That penalizes writers, but
helps readers; when a reader reaches a summary record, it need look no
further.

- Peter alluded to the possibility - or so I thought - of caching
reconstructed page versions.  That would make a lot of sense for
sequential scans or cases where the same page is being accessed
repeatedly.

- If you're doing an index scan and you only care about one tuple, the
page could contain one bit per tuple which is set if the tuple has
been modified by any transaction the page views as recent.  If the bit
is clear for the one tuple you care about, you can ignore the UNDO
even if is for a transaction not visible to your snapshot, because you
know it didn't touch that tuple.

- If you create a TPD entry for the page because there are multiple
recent transactions that have modified it, you can put additional
information into the TPD entry to speed up access.  For example, you
could store an array indexed by itemid which indicates which of the
recent transactions have touched any given itemid.  This is mostly
useful for index scans, which can then ignore all the transactions
that don't pertain to the one TID they care about.

Which of those ideas are best?  Are they enough, even if we do them
all?  Will there be other down sides?  I don't know.  But I think it's
clear that bloat is an ongoing and serious concern for our users (cf.
snapshot too old), and that vacuuming is too (cf. freeze map) and I
believe we are reaching a point where we can't make much further
improvement in those areas without some kind of major architectural
change.  WARM might qualify; this idea certainly does.  I also believe
that we NEED to improve further.  At least for EnterpriseDB, the
problems I'm trying to address through this design are big problems.
If this design isn't good, let's come up with something else.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)

Re: [HACKERS] Patch to implement pg_current_logfile() function

2016-11-23 Thread Karl O. Pinc
On Wed, 23 Nov 2016 03:21:05 -0600
"Karl O. Pinc"  wrote:

> But also, you can't use strtok() to parse lbuffer because
> the path you're returning can contain a space. 

Maybe on the 2nd call to strtok() you could pass ""
as the 2nd argument?  That'd be a little wonky but
the man page does not say you can't have an empty
set of delimiters.  On the other hand strtok() is
not a perfect choice, you don't want to "collapse"
adjacent delimiters in the parsed string or ignore
leading spaces.  I'd prefer a strstr() solution.

Or strchr() even better.
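
For instance, a minimal strchr()-based sketch (the example input is made
up, not taken from the patch):

#include <stdio.h>
#include <string.h>

/* split a line of the form "<destination> <path>" at the first space,
 * keeping any further spaces as part of the path */
int
main(void)
{
    char  lbuffer[] = "csvlog /var/log/pg/log file with spaces.csv";
    char *space = strchr(lbuffer, ' ');

    if (space != NULL)
    {
        *space = '\0';      /* lbuffer now holds just the destination */
        printf("dest = %s\npath = %s\n", lbuffer, space + 1);
    }
    return 0;
}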

Regards.

Karl 
Free Software:  "You don't pay back, you pay forward."
 -- Robert A. Heinlein


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logical decoding on standby

2016-11-23 Thread Craig Ringer
On 23 November 2016 at 03:55, Robert Haas  wrote:
> On Tue, Nov 22, 2016 at 1:49 AM, Craig Ringer  wrote:
>> On 22 November 2016 at 10:20, Craig Ringer  wrote:
>>> I'm currently looking at making detection of replay conflict with a
>>> slot work by separating the current catalog_xmin into two effective
>>> parts - the catalog_xmin currently needed by any known slots
>>> (ProcArray->replication_slot_catalog_xmin, as now), and the oldest
>>> actually valid catalog_xmin where we know we haven't removed anything
>>> yet.
>>
>> OK, more detailed plan.
>>
>> The last checkpoint's oldestXid, and ShmemVariableCache's oldestXid,
>> are already held down by ProcArray's catalog_xmin. But that doesn't
>> mean we haven't removed newer tuples from specific relations and
>> logged that in xl_heap_clean, etc, including catalogs or user
>> catalogs, it only means the clog still exists for those XIDs.
>
> Really?

(Note the double negative above).

Yes, necessarily so. You can't look up xids older than the clog
truncation threshold at oldestXid, per our discussion on txid_status()
and traceable commit. But the tuples from that xact aren't guaranteed
to exist in any given relation; vacuum uses vacuum_set_xid_limits(...)
which calls GetOldestXmin(...); that in turn scans ProcArray to find
the oldest xid any running xact cares about. It might bump it down
further if there's a replication slot requirement or based on
vacuum_defer_cleanup_age, but it doesn't care in the slightest about
oldestXid.

oldestXid cannot advance until vacuum has removed all tuples for that
xid and advanced the database's datfrozenxid. But a given oldestXid
says nothing about which tuples, catalog or otherwise, still exist and
are accessible.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [bugfix] commit timestamps ERROR on lookup of FrozenTransactionId

2016-11-23 Thread Craig Ringer
On 23 November 2016 at 20:58, Craig Ringer  wrote:
> Hi all
>
> Today I ran into an issue where commit timestamp lookups were failing with
>
> ERROR: cannot retrieve commit timestamp for transaction 2
>
> which is of course FrozenTransactionId.
>
> TransactionIdGetCommitTsData(...) ERRORs on !TransactionIdIsNormal(),
> which I think is wrong. Attached is a patch to make it return 0 for
> FrozenTransactionId and BootstrapTransactionId, like it does for xids
> that are too old.
>
> Note that the prior behaviour was as designed and has tests to enforce
> it. I just think it's wrong, and it's also not documented.
>
> IMO this should be back-patched to 9.6 and, without the TAP test part, to 9.5.

Updated to correct the other expected file, since there's an alternate.


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From 52081e9aa91102022d6e853d4e2217308bb230a9 Mon Sep 17 00:00:00 2001
From: Craig Ringer 
Date: Wed, 23 Nov 2016 19:50:50 +0800
Subject: [PATCH] Treat frozen and bootstrap xids as old, not invalid, for
 committs

TransactionIdGetCommitTsData() ERROR'd on FrozenTransactionId and
BootstrapTransactionId, which is a surprising outcome for callers
given that it usually returns 0 for xids too old for their commit
timestamp to still be known.

Callers hitting this problem would get an error like:

ERROR: cannot retrieve commit timestamp for transaction 2

Fix it so the frozen and bootstrap XIDs return 0 like other too-old XIDs.
---
 src/backend/access/transam/commit_ts.c   | 11 +--
 src/test/modules/commit_ts/expected/commit_timestamp.out | 12 ++--
 src/test/modules/commit_ts/t/004_restart.pl  | 14 --
 3 files changed, 23 insertions(+), 14 deletions(-)

diff --git a/src/backend/access/transam/commit_ts.c b/src/backend/access/transam/commit_ts.c
index 7746578..a5b270c 100644
--- a/src/backend/access/transam/commit_ts.c
+++ b/src/backend/access/transam/commit_ts.c
@@ -289,11 +289,18 @@ TransactionIdGetCommitTsData(TransactionId xid, TimestampTz *ts,
 	TransactionId oldestCommitTsXid;
 	TransactionId newestCommitTsXid;
 
-	/* error if the given Xid doesn't normally commit */
-	if (!TransactionIdIsNormal(xid))
+	if (!TransactionIdIsValid(xid))
 		ereport(ERROR,
 (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 		errmsg("cannot retrieve commit timestamp for transaction %u", xid)));
+	else if (!TransactionIdIsNormal(xid))
+	{
+		/* frozen and bootstrap xids are always committed far in the past */
+		*ts = 0;
+		if (nodeid)
+			*nodeid = 0;
+		return false;
+	}
 
 	LWLockAcquire(CommitTsLock, LW_SHARED);
 
diff --git a/src/test/modules/commit_ts/expected/commit_timestamp.out b/src/test/modules/commit_ts/expected/commit_timestamp.out
index 99f3322..5b7783b 100644
--- a/src/test/modules/commit_ts/expected/commit_timestamp.out
+++ b/src/test/modules/commit_ts/expected/commit_timestamp.out
@@ -28,9 +28,17 @@ DROP TABLE committs_test;
 SELECT pg_xact_commit_timestamp('0'::xid);
 ERROR:  cannot retrieve commit timestamp for transaction 0
 SELECT pg_xact_commit_timestamp('1'::xid);
-ERROR:  cannot retrieve commit timestamp for transaction 1
+ pg_xact_commit_timestamp 
+--
+ 
+(1 row)
+
 SELECT pg_xact_commit_timestamp('2'::xid);
-ERROR:  cannot retrieve commit timestamp for transaction 2
+ pg_xact_commit_timestamp 
+--
+ 
+(1 row)
+
 SELECT x.xid::text::bigint > 0, x.timestamp > '-infinity'::timestamptz, x.timestamp <= now() FROM pg_last_committed_xact() x;
  ?column? | ?column? | ?column? 
 --+--+--
diff --git a/src/test/modules/commit_ts/t/004_restart.pl b/src/test/modules/commit_ts/t/004_restart.pl
index 900e9b7..32be07c 100644
--- a/src/test/modules/commit_ts/t/004_restart.pl
+++ b/src/test/modules/commit_ts/t/004_restart.pl
@@ -25,19 +25,13 @@ like(
 
 ($ret, $stdout, $stderr) =
   $node_master->psql('postgres', qq[SELECT pg_xact_commit_timestamp('1');]);
-is($ret, 3, 'getting ts of BootstrapTransactionId reports error');
-like(
-	$stderr,
-	qr/cannot retrieve commit timestamp for transaction/,
-	'expected error from BootstrapTransactionId');
+is($ret, 0, 'getting ts of BootstrapTransactionId succeeds');
+is($stdout, '', 'timestamp of BootstrapTransactionId is null');
 
 ($ret, $stdout, $stderr) =
   $node_master->psql('postgres', qq[SELECT pg_xact_commit_timestamp('2');]);
-is($ret, 3, 'getting ts of FrozenTransactionId reports error');
-like(
-	$stderr,
-	qr/cannot retrieve commit timestamp for transaction/,
-	'expected error from FrozenTransactionId');
+is($ret, 0, 'getting ts of FrozenTransactionId succeeds');
+is($stdout, '', 'timestamp of FrozenTransactionId is null');
 
 # Since FirstNormalTransactionId will've occurred during initdb, long before we
 # enabled commit timestamps, it'll be null since we have no cts data for it but
-- 
2.5.5


-- 
Sent 

[HACKERS] [bugfix] commit timestamps ERROR on lookup of FrozenTransactionId

2016-11-23 Thread Craig Ringer
Hi all

Today I ran into an issue where commit timestamp lookups were failing with

ERROR: cannot retrieve commit timestamp for transaction 2

which is of course FrozenTransactionId.

TransactionIdGetCommitTsData(...) ERRORs on !TransactionIdIsNormal(),
which I think is wrong. Attached is a patch to make it return 0 for
FrozenTransactionId and BootstrapTransactionId, like it does for xids
that are too old.

Note that the prior behaviour was as designed and has tests to enforce
it. I just think it's wrong, and it's also not documented.

IMO this should be back-patched to 9.6 and, without the TAP test part, to 9.5.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From 52081e9aa91102022d6e853d4e2217308bb230a9 Mon Sep 17 00:00:00 2001
From: Craig Ringer 
Date: Wed, 23 Nov 2016 19:50:50 +0800
Subject: [PATCH] Treat frozen and bootstrap xids as old, not invalid, for
 committs

TransactionIdGetCommitTsData() ERROR'd on FrozenTransactionId and
BootstrapTransactionId, which is a surprising outcome for callers
given that it usually returns 0 for xids too old for their commit
timestamp to still be known.

Callers hitting this problem would get an error like:

ERROR: cannot retrieve commit timestamp for transaction 2

Fix it so the frozen and bootstrap XIDs return 0 like other too-old XIDs.
---
 src/backend/access/transam/commit_ts.c   | 11 +--
 src/test/modules/commit_ts/expected/commit_timestamp.out | 12 ++--
 src/test/modules/commit_ts/t/004_restart.pl  | 14 --
 3 files changed, 23 insertions(+), 14 deletions(-)

diff --git a/src/backend/access/transam/commit_ts.c b/src/backend/access/transam/commit_ts.c
index 7746578..a5b270c 100644
--- a/src/backend/access/transam/commit_ts.c
+++ b/src/backend/access/transam/commit_ts.c
@@ -289,11 +289,18 @@ TransactionIdGetCommitTsData(TransactionId xid, TimestampTz *ts,
 	TransactionId oldestCommitTsXid;
 	TransactionId newestCommitTsXid;
 
-	/* error if the given Xid doesn't normally commit */
-	if (!TransactionIdIsNormal(xid))
+	if (!TransactionIdIsValid(xid))
 		ereport(ERROR,
 (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 		errmsg("cannot retrieve commit timestamp for transaction %u", xid)));
+	else if (!TransactionIdIsNormal(xid))
+	{
+		/* frozen and bootstrap xids are always committed far in the past */
+		*ts = 0;
+		if (nodeid)
+			*nodeid = 0;
+		return false;
+	}
 
 	LWLockAcquire(CommitTsLock, LW_SHARED);
 
diff --git a/src/test/modules/commit_ts/expected/commit_timestamp.out b/src/test/modules/commit_ts/expected/commit_timestamp.out
index 99f3322..5b7783b 100644
--- a/src/test/modules/commit_ts/expected/commit_timestamp.out
+++ b/src/test/modules/commit_ts/expected/commit_timestamp.out
@@ -28,9 +28,17 @@ DROP TABLE committs_test;
 SELECT pg_xact_commit_timestamp('0'::xid);
 ERROR:  cannot retrieve commit timestamp for transaction 0
 SELECT pg_xact_commit_timestamp('1'::xid);
-ERROR:  cannot retrieve commit timestamp for transaction 1
+ pg_xact_commit_timestamp 
+--
+ 
+(1 row)
+
 SELECT pg_xact_commit_timestamp('2'::xid);
-ERROR:  cannot retrieve commit timestamp for transaction 2
+ pg_xact_commit_timestamp 
+--
+ 
+(1 row)
+
 SELECT x.xid::text::bigint > 0, x.timestamp > '-infinity'::timestamptz, x.timestamp <= now() FROM pg_last_committed_xact() x;
  ?column? | ?column? | ?column? 
 --+--+--
diff --git a/src/test/modules/commit_ts/t/004_restart.pl b/src/test/modules/commit_ts/t/004_restart.pl
index 900e9b7..32be07c 100644
--- a/src/test/modules/commit_ts/t/004_restart.pl
+++ b/src/test/modules/commit_ts/t/004_restart.pl
@@ -25,19 +25,13 @@ like(
 
 ($ret, $stdout, $stderr) =
   $node_master->psql('postgres', qq[SELECT pg_xact_commit_timestamp('1');]);
-is($ret, 3, 'getting ts of BootstrapTransactionId reports error');
-like(
-	$stderr,
-	qr/cannot retrieve commit timestamp for transaction/,
-	'expected error from BootstrapTransactionId');
+is($ret, 0, 'getting ts of BootstrapTransactionId succeeds');
+is($stdout, '', 'timestamp of BootstrapTransactionId is null');
 
 ($ret, $stdout, $stderr) =
   $node_master->psql('postgres', qq[SELECT pg_xact_commit_timestamp('2');]);
-is($ret, 3, 'getting ts of FrozenTransactionId reports error');
-like(
-	$stderr,
-	qr/cannot retrieve commit timestamp for transaction/,
-	'expected error from FrozenTransactionId');
+is($ret, 0, 'getting ts of FrozenTransactionId succeeds');
+is($stdout, '', 'timestamp of FrozenTransactionId is null');
 
 # Since FirstNormalTransactionId will've occurred during initdb, long before we
 # enabled commit timestamps, it'll be null since we have no cts data for it but
-- 
2.5.5


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Improvements in psql hooks for variables

2016-11-23 Thread Daniel Verite
   I wrote:

> So I went through the psql commands which don't fail on parse errors
> for booleans
> [...]

Here's a v5 patch implementing the suggestions mentioned upthread:
all meta-commands calling ParseVariableBool() now fail
when the boolean argument can't be parsed successfully.

Also includes a minor change to SetVariableAssignHook() that now
returns the result of the hook it calls after installing it.
It doesn't make any difference in psql behavior since callers
of SetVariableAssignHook() ignore its return value, but it's
more consistent with SetVariable().

Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite
diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index a9a2fdb..356467b 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -254,25 +254,30 @@ exec_command(const char *cmd,
if (opt1 != NULL && strncmp(opt1, prefix, sizeof(prefix) - 1) 
== 0)
{
reuse_previous =
-   ParseVariableBool(opt1 + sizeof(prefix) - 1, 
prefix) ?
+   ParseVariableBool(opt1 + sizeof(prefix) - 1, 
prefix, ) ?
TRI_YES : TRI_NO;
-
-   free(opt1);
-   opt1 = read_connect_arg(scan_state);
+   if (success)
+   {
+   free(opt1);
+   opt1 = read_connect_arg(scan_state);
+   }
}
else
reuse_previous = TRI_DEFAULT;
 
-   opt2 = read_connect_arg(scan_state);
-   opt3 = read_connect_arg(scan_state);
-   opt4 = read_connect_arg(scan_state);
+   if (success)/* do not attempt to connect if 
reuse_previous argument was invalid */
+   {
+   opt2 = read_connect_arg(scan_state);
+   opt3 = read_connect_arg(scan_state);
+   opt4 = read_connect_arg(scan_state);
 
-   success = do_connect(reuse_previous, opt1, opt2, opt3, opt4);
+   success = do_connect(reuse_previous, opt1, opt2, opt3, 
opt4);
 
+   free(opt2);
+   free(opt3);
+   free(opt4);
+   }
free(opt1);
-   free(opt2);
-   free(opt3);
-   free(opt4);
}
 
/* \cd */
@@ -1548,7 +1553,11 @@ exec_command(const char *cmd,

 OT_NORMAL, NULL, false);
 
if (opt)
-   pset.timing = ParseVariableBool(opt, "\\timing");
+   {
+   boolnewval = ParseVariableBool(opt, "\\timing", 
);
+   if (success)
+   pset.timing = newval;
+   }
else
pset.timing = !pset.timing;
if (!pset.quiet)
@@ -2660,7 +2669,14 @@ do_pset(const char *param, const char *value, 
printQueryOpt *popt, bool quiet)
if (value && pg_strcasecmp(value, "auto") == 0)
popt->topt.expanded = 2;
else if (value)
-   popt->topt.expanded = ParseVariableBool(value, param);
+   {
+   boolvalid;
+   boolnewval = ParseVariableBool(value, param, 
);
+   if (valid)
+   popt->topt.expanded = newval;
+   else
+   return false;
+   }
else
popt->topt.expanded = !popt->topt.expanded;
}
@@ -2668,8 +2684,16 @@ do_pset(const char *param, const char *value, 
printQueryOpt *popt, bool quiet)
/* locale-aware numeric output */
else if (strcmp(param, "numericlocale") == 0)
{
+
if (value)
-   popt->topt.numericLocale = ParseVariableBool(value, 
param);
+   {
+   boolvalid;
+   boolnewval = ParseVariableBool(value, param, 
);
+   if (valid)
+   popt->topt.numericLocale = newval;
+   else
+   return false;
+   }
else
popt->topt.numericLocale = !popt->topt.numericLocale;
}
@@ -2724,7 +2748,14 @@ do_pset(const char *param, const char *value, 
printQueryOpt *popt, bool quiet)
else if (strcmp(param, "t") == 0 || strcmp(param, "tuples_only") == 0)
{
if (value)
-   popt->topt.tuples_only = ParseVariableBool(value, 
param);
+   {
+

Re: [HACKERS] Creating a DSA area to provide work space for parallel execution

2016-11-23 Thread Thomas Munro
On Wed, Oct 5, 2016 at 10:32 AM, Thomas Munro
 wrote:
> One obvious problem is that this patch results in at least *two* DSM
> segments being created for every parallel query execution: the main
> segment used for parallel execution, and then the initial segment
> managed by the DSA area.  One thought is that DSA areas are the more
> general mechanism, so perhaps we should figure out how to store
> contents of the existing segment in it.  The TOC interface would need
> a few tweaks to be able to live in memory allocated with dsa_allocate,
> and then we'd need to share that address with other backends so that
> they could find it (cf the current approach of finding the TOC at the
> start of the segment).  I haven't prototyped that yet.  That'd involve
> changing the wording "InitializeDSM" that appears in various places
> including the FDW API, which has been putting me off...

... or we could allow DSA areas to be constructed inside existing
shmem, as in the attached patch which requires dsa_create_in_place,
from the patch at
https://www.postgresql.org/message-id/CAEepm%3D0-pbokaQdCXhtYn%3Dw64MmdJz4hYW7qcSU235ar276x7w%40mail.gmail.com
.

-- 
Thomas Munro
http://www.enterprisedb.com


dsa-area-for-executor-v2.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Push down more UPDATEs/DELETEs in postgres_fdw

2016-11-23 Thread Rushabh Lathia
On Tue, Nov 22, 2016 at 6:24 PM, Etsuro Fujita 
wrote:

> Hi Rushabh,
>
> On 2016/11/22 19:05, Rushabh Lathia wrote:
>
>> I started reviewing the patch and here are few initial review points and
>> questions for you.
>>
>
> Thanks for the review!
>
> 1)
>> -static void deparseExplicitTargetList(List *tlist, List
>> **retrieved_attrs,
>> +static void deparseExplicitTargetList(bool is_returning,
>> +  List *tlist,
>> +  List **retrieved_attrs,
>>deparse_expr_cxt *context);
>>
>> Any particular reason of inserting new argument as 1st argunment? In
>> general
>> we add new flags at the end or before the out param for the existing
>> function.
>>
>
> No, I don't.  I think the *context should be the last argument as in other
> functions in deparse.c.  How about inserting that before the out param
> **retrieved_attrs, like this?
>
> static void
> deparseExplicitTargetList(List *tlist,
>   bool is_returning,
>   List **retrieved_attrs,
>   deparse_expr_cxt *context);
>
>
Yes, adding it before retrieved_attrs would be good.


> 2)
>> -static void deparseFromExprForRel(StringInfo buf, PlannerInfo *root,
>> -RelOptInfo *joinrel, bool use_alias, List
>> **params_list);
>> +static void deparseFromExprForRel(StringInfo buf, PlannerInfo *root,
>> RelOptInfo *foreignrel,
>> +  bool use_alias, Index target_rel, List
>> **params_list);
>>
>
> Going beyond 80 character length
>>
>
> Will fix.
>
> 3)
>>  static void
>> -deparseExplicitTargetList(List *tlist, List **retrieved_attrs,
>> +deparseExplicitTargetList(bool is_returning,
>> +  List *tlist,
>> +  List **retrieved_attrs,
>>deparse_expr_cxt *context)
>>
>> This looks a bit weird to me - the comment for the
>> deparseExplicitTargetList()
>> function says something different from what it is trying to do when the
>> is_returning flag is passed. Either the function comment needs to change, or
>> maybe a separate function to deparse the returning list for a join relation?
>>
>> Is it possible to teach deparseReturningList() to deparse the returning
>> list for the joinrel?
>>
>
> I don't think it's possible to do that because deparseReturningList uses
> deparseTargetList internally, which only allows us to emit a target list
> for a baserel.  For example, deparseReturningList can't handle a returning
> list that contains join outputs like this:
>
> postgres=# explain verbose delete from ft1 using ft2 where ft1.a = ft2.a
> returning ft1.*, ft2.*;
>QUERY PLAN
> 
> 
>  Delete on public.ft1  (cost=100.00..102.06 rows=1 width=46)
>Output: ft1.a, ft1.b, ft2.a, ft2.b
>->  Foreign Delete  (cost=100.00..102.06 rows=1 width=46)
>  Remote SQL: DELETE FROM public.t1 r1 USING public.t2 r2 WHERE
> ((r1.a = r2.a)) RETURNING r1.a, r1.b, r2.a, r2.b
> (4 rows)
>
> I'd like to change the comment for deparseExplicitTargetList.
>
> 4)
>>
>> +if (foreignrel->reloptkind == RELOPT_JOINREL)
>> +{
>> +List   *rlist = NIL;
>> +
>> +/* Create a RETURNING list */
>> +rlist = make_explicit_returning_list(rtindex, rel,
>> + returningList,
>> + fdw_scan_tlist);
>> +
>> +deparseExplicitReturningList(rlist, retrieved_attrs, );
>> +}
>> +else
>> +deparseReturningList(buf, root, rtindex, rel, false,
>> + returningList, retrieved_attrs);
>>
>> The code will be more centralized if we add the logic of extracting
>> returninglist
>> for JOINREL inside deparseReturningList() only, isn't it?
>>
>
> You are right.  Will do.
>
> 5) make_explicit_returning_list() pulls the var list from the
>> returningList and
>> builds the target entries for the returning list, and at the end rewrites the
>> fdw_scan_tlist.
>>
>> AFAIK, in the case of DML that is going to be pushed down to the remote server,
>> ideally fdw_scan_tlist should be the same as the returning list, as the final
>> output of the query will be the RETURNING list only. Isn't that true?
>>
>
> That would be true.  But the fdw_scan_tlist passed from the core would
> contain junk columns inserted by the rewriter and planner work, such as
> CTID for the target table and whole-row Vars for other tables specified in
> the FROM clause of an UPDATE or the USING clause of a DELETE.  So, I
> created the patch so that the pushed-down UPDATE/DELETE retrieves only the
> data needed for the RETURNING calculation from the remote server, not the
> whole fdw_scan_tlist data.


Yes, I noticed that fdw_scan_tlist has CTID for the target table and
whole-row 

Re: [HACKERS] [RFC] Should we fix postmaster to avoid slow shutdown?

2016-11-23 Thread Michael Paquier
On Wed, Nov 23, 2016 at 6:15 PM, Magnus Hagander  wrote:
> There's also the consideration of what to do with stats *on the standby*. If
> we WAL log the stats file, then when it replays on the standby, the stats
> there will be overwritten. And stats like number of index vs seq scans on
> the standby are still interesting and would be lost.

Perhaps it would make sense to separate the stat files by type then?
The action taken for each file depends on its type.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UNDO and in-place update

2016-11-23 Thread Albe Laurenz
Robert Haas wrote:
> To implement this in PostgreSQL, I believe we would need support for
> UNDO.

> - Reading a page that has been recently modified gets significantly
> more expensive; it is necessary to read the associated UNDO entries
> and do a bunch of calculation that is significantly more complex than
> what is required today.

As others have said, that is the main drawback.

Also, and related to this, ROLLBACK would become an expensive operation.

Yours,
Laurenz Albe

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch to implement pg_current_logfile() function

2016-11-23 Thread Karl O. Pinc
Hi Gilles,

On Sat, 19 Nov 2016 12:58:47 +0100
Gilles Darold  wrote:

> ... attached v14 of the patch. 

Attached are patches for your consideration and review.
(Including your latest v14 patch for completeness.)

Some of the attached patches (like the GUC symbol
patch you've seen before) are marked to be submitted
as separate patches to the maintainers when we send
them code for review.  These need looking over by
somebody, I hope you, before they get sent on, so
please comment on them if you're not going to look
at them or if you see a problem with them.  (Or if
you like them.  :)  Thanks.

I also have comments at the bottom regards problems
I see but haven't patched.

---

patch_pg_current_logfile-v14.diff

Applies on top of master.

The current patch.

---

patch_pg_current_logfile-v14.diff.startup_docs

For consideration of inclusion in "main" patch.

Applies on top of patch_pg_current_logfile-v14.diff

A documentation fix.

(Part of) my previous doc patch was wrong; current_logfiles is not
unconditionally written on startup.

---

patch_pg_current_logfile-v14.diff.bool_to_int

Do not include in "main" patch, submit to maintainers separately.

Applies on top of patch_pg_current_logfile-v14.diff

The bool types on the stack in logfile_rotate() are
my work.  Bools on the stack don't make sense as far
as hardware goes, so the compiler's optimizer should change
them to int.  This patch changes the bools to ints,
should that be to someone's taste.

---

patch_pg_current_logfile-v14.diff.logdest_change

For consideration of inclusion in "main" patch.

Applies on top of patch_pg_current_logfile-v14.diff.

Fixes a bug where, when log_destination changes and the config
file is reloaded, a no-longer-used logfile path may be written
to the current_logfiles file.  The chain of events would be
as follows:

1) PG configured with csvlog in log_destination. Logs are written.

This makes last_csv_file_name non-NULL.


2) PG config changed to remove csvlog from log_destination
and SIGHUP sent.

This removes the csvlog path from current_logfiles but does not
make last_csv_file_name NULL.  last_csv_file_name has the old
path in it.


3) PG configured to add csvlog back to log_destination and
SIGHUP sent.

When csvlogging is turned back on, the stale csv path is written
to current_logfiles.  This is overwritten as soon as some csv logs
are written because the new csv logfile must be opened, but this is
still a problem.  Even if it happens to be impossible at the moment
to get from step 2 to step 3 without having some logs written, it seems
a lot more robust to manually "expire" the last*_file_name variable
content when log_destination changes.


So what the patch does is "scrub" the "last_*file_name" variables
when the config file changes.
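
Roughly, the scrub amounts to something like the sketch below, run while
absorbing the new log_destination value; last_csv_file_name is the variable
introduced by the v14 patch, but the helper name and exact placement here are
only illustrative.

/*
 * Illustrative sketch, not the patch itself: once csvlog is dropped from
 * log_destination, forget the remembered CSV path so a later re-enable
 * cannot publish a stale name in current_logfiles.
 */
static void
forget_stale_csv_name(void)
{
	if (!(Log_destination & LOG_DESTINATION_CSVLOG) &&
		last_csv_file_name != NULL)
	{
		pfree(last_csv_file_name);
		last_csv_file_name = NULL;	/* nothing stale left to re-publish */
	}
}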

FWIW, I moved the logfile_writename() call toward the top of the
if block just to keep all the code which sorta-kinda involves
the old_log_destination variable together.

---

patch_pg_current_logfile-v14.diff.logdest_change-part2

Do not include in "main" patch, submit to maintainers separately.

Applies on top of patch_pg_current_logfile-v14.diff.logdest_change

Adds a PGLogDestination typedef.  A separate patch since this may be
overkill and more about coding style than anything else.

---

And now, a series of patches to fix the problem where, at least
potentially, logfile_writename() gets an ENFILE or EMFILE and
therefore a log file path does not ever get written to the
current_logfiles file.  The point of this patch series is to retry until
the content of current_logfiles is correct.

All of these are for consideration of inclusion in "main" patch.


patch_pg_current_logfile-v14.diff.retry_current_logfiles-part1

Applies on top of patch_pg_current_logfile-v14.diff.logdest_change

A documentation patch.  Notes that (even with retrying) the
current_logfiles content might not be right.


patch_pg_current_logfile-v14.diff.retry_current_logfiles-part2

Applies on top of
patch_pg_current_logfile-v14.diff.retry_current_logfiles-part1

Remove arguments from logfile_writename().  Use static storage
instead.  We always update last_file_name and last_csv_file_name
whenever logfile_writename() is called so there's not much of
a change here.



patch_pg_current_logfile-v14.diff.retry_current_logfiles-part3

Applies on top of
patch_pg_current_logfile-v14.diff.retry_current_logfiles-part2

Re-try the write of current_logfiles should it fail because the
system is too busy.

---

patch_pg_current_logfile-v14.diff.backoff

Do not include in "main" patch, submit to maintainers separately.

Applies on top of
patch_pg_current_logfile-v14.diff.retry_current_logfiles-part3

Introduces a backoff when retrying to write the current_logfiles file,
because retrying when failure is due to a busy system only makes the
system more busy.  This may well be over-engineering but I thought
I'd present the code and let more experienced people decide.

I have yet to really test this patch.
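
For what it's worth, the retry-plus-backoff shape I have in mind looks
roughly like the sketch below.  It assumes a logfile_writename() variant that
returns false and leaves errno set on failure, which is not the signature
used in the attached patches, so treat all of the names here as illustrative.

/*
 * Illustrative sketch only: retry the current_logfiles write, backing off
 * while the system is short on file descriptors, and give up on any other
 * error (or after a bounded number of attempts).
 */
static void
write_current_logfiles_with_retry(void)
{
	long		naptime_ms = 100;	/* doubled after each busy-system failure */
	int			attempt;

	for (attempt = 0; attempt < 8; attempt++)
	{
		if (logfile_writename())
			return;				/* current_logfiles is now up to date */

		if (errno != ENFILE && errno != EMFILE)
			return;				/* not a "system too busy" failure */

		pg_usleep(naptime_ms * 1000L);	/* back off before trying again */
		naptime_ms *= 2;
	}
}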


Re: [HACKERS] [RFC] Should we fix postmaster to avoid slow shutdown?

2016-11-23 Thread Magnus Hagander
On Tue, Nov 22, 2016 at 10:26 PM, Alvaro Herrera 
wrote:

> Andres Freund wrote:
> > On 2016-11-22 16:15:58 -0500, Tom Lane wrote:
>
> > > Maybe a workable compromise would be to leave the file present, and
> have
> > > the stats collector re-write it every (say) five minutes.  Then I'd be
> > > okay with having an immediate shutdown skip writing the file; you'd be
> > > losing up to five minutes' worth of activity, but not going completely
> > > nuts.  So the stats collector's normal activities would include writing
> > > the temp file on-demand and the permanent file on a timed cycle.
> > >
> > > The other components of the fix (deleting on PITR rewind or stats
> > > collector crash) would remain the same.
>
> +1
>
> > It could even make sense to WAL log the contents of the stats file at
> > checkpoint (or similar) time. Then it'd be sane after crash recovery,
> > including starting from an old base backup.  Losing the information
> > about what to vacuum / analyze after a failover is quite problematic...
>
> An idea worth considering.  This would also mean the file is valid in a
> standby -- the lack of the file is currently a problem if you promote
> the standby, as I recall.  But the file is perhaps too large to write to
> WAL on every checkpoint.
>

There's also the consideration of what to do with stats *on the standby*.
If we WAL log the stats file, then when it replays on the standby, the stats
there will be overwritten. And stats like number of index vs seq scans on
the standby are still interesting and would be lost.

That's not to say that I don't agree with the issues of failover vs
autovacuum, but let's try to find a way to get there that doesn't hurt other
things. That could just be part of how we deal with the replay of it.
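
As a point of reference for the timed-rewrite idea quoted above, the
collector's main loop could do something along the lines of the sketch below;
PGSTAT_PERMANENT_INTERVAL and last_permanent_write are made-up names, and the
call to pgstat_write_statsfiles() assumes its permanent/allDbs arguments keep
their current meaning.

/*
 * Illustrative sketch only: keep answering on-demand requests from the temp
 * stats file, but also rewrite the permanent file on a fixed cycle so an
 * immediate shutdown loses at most that window of activity.
 */
#define PGSTAT_PERMANENT_INTERVAL	(5 * 60)	/* seconds */

static pg_time_t last_permanent_write = 0;

static void
maybe_write_permanent_statsfile(pg_time_t now)
{
	if (now - last_permanent_write >= PGSTAT_PERMANENT_INTERVAL)
	{
		pgstat_write_statsfiles(true, true);	/* permanent copy, all DBs */
		last_permanent_write = now;
	}
}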

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] Performance degradation in Bitmapscan (commit 75ae538bc3168bf44475240d4e0487ee2f3bb376)

2016-11-23 Thread Andres Freund
On 2016-11-18 08:00:40 -0500, Robert Haas wrote:
> On Tue, Nov 15, 2016 at 2:28 PM, Andres Freund  wrote:
> > I've a working fix for this, and for a similar issue Robert found. I'm
> > still playing around with it, but basically the fix is to make the
> > growth policy a bit more adaptive.
> 
> Any chance you can post a patch soon?

Here's my WIP series addressing this and related problems. With this
we're again noticeably faster than the dynahash implementation, in both
the case here, and the query you brought up over IM.

This definitely needs some more TLC, but the general approach seems
good. I particularly like that it apparently allows us to increase the
default fillfactor without much downside according to my measurements.

Greetings,

Andres Freund
>From efc603dd67a1c95f2d24844b86436ed6b7a28c80 Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Wed, 23 Nov 2016 00:23:42 -0800
Subject: [PATCH 1/3] Resize simplehash.h table in case of long runs.

This also allows us to increase the default hashtable fillfactor.
---
 src/include/lib/simplehash.h | 36 +++++++++++++++++++++++++++++++++---
 1 file changed, 33 insertions(+), 3 deletions(-)

diff --git a/src/include/lib/simplehash.h b/src/include/lib/simplehash.h
index 12aedbc..9522fa5 100644
--- a/src/include/lib/simplehash.h
+++ b/src/include/lib/simplehash.h
@@ -152,10 +152,12 @@ SH_SCOPE void SH_STAT(SH_TYPE *tb);
 
 #include "utils/memutils.h"
 
-/* conservative fillfactor for a robin hood table, might want to adjust */
-#define SH_FILLFACTOR (0.8)
+/* normal fillfactor, unless already close to maximum */
+#define SH_FILLFACTOR (0.9)
 /* increase fillfactor if we otherwise would error out */
-#define SH_MAX_FILLFACTOR (0.95)
+#define SH_MAX_FILLFACTOR (0.98)
+/* collision chain length at which we resize */
+#define SH_MAX_FILLFACTOR (0.98)
 /* max data array size,we allow up to PG_UINT32_MAX buckets, including 0 */
 #define SH_MAX_SIZE (((uint64) PG_UINT32_MAX) + 1)
 
@@ -550,6 +552,34 @@ SH_INSERT(SH_TYPE *tb, SH_KEY_TYPE key, bool *found)
 #endif
 			entry->status = SH_STATUS_IN_USE;
 			*found = false;
+
+			/*
+			 * To avoid negative consequences from overly imbalanced
+			 * hashtables, grow the hashtable if collisions lead to large
+			 * runs. The most likely cause of such imbalance is filling a
+			 * (currently) small table, from a currently big one, in
+			 * hash-table order.
+			 *
+			 * FIXME: compute boundaries in a more principled manner.
+			 */
+			if (unlikely(insertdist > 20 ||
+		 SH_DISTANCE_FROM_OPTIMAL(tb, curoptimal, emptyelem) > 1000))
+			{
+/* don't grow if the table would grow overly much */
+if (tb->members / ((double) tb->size) > 0.1)
+{
+	/*
+	 * elog(WARNING, "clumping b, growing this thing");
+	 * SH_STAT(tb);
+	 */
+	/*
+	 * Trigger a growth cycle next round, easier than resizing
+	 * now.
+	 */
+	tb->grow_threshold = 0;
+}
+			}
+
 			return entry;
 		}
 
-- 
2.10.2

>From 150bc7fb1e9eacaee5d46852fa464e83e7930aca Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Tue, 22 Nov 2016 20:15:37 -0800
Subject: [PATCH 2/3] Signal when a master backend is doing parallel work.

---
 src/backend/access/transam/parallel.c | 14 --
 src/backend/executor/nodeGather.c |  6 ++
 src/include/access/parallel.h | 20 +++-
 3 files changed, 33 insertions(+), 7 deletions(-)

diff --git a/src/backend/access/transam/parallel.c b/src/backend/access/transam/parallel.c
index 59dc394..c038ea1 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -88,12 +88,14 @@ typedef struct FixedParallelState
 } FixedParallelState;
 
 /*
- * Our parallel worker number.  We initialize this to -1, meaning that we are
- * not a parallel worker.  In parallel workers, it will be set to a value >= 0
- * and < the number of workers before any user code is invoked; each parallel
- * worker will get a different parallel worker number.
+ * Our parallel worker number.  We initialize this to PARALLEL_WORKER_MASTER,
+ * meaning that we are not a parallel worker.  In parallel workers, it will be
+ * set to a value >= 0 and < the number of workers before any user code is
+ * invoked; each parallel worker will get a different parallel worker number.
+ * If the master backend performs parallel work, this will temporarily be set
+ * to PARALLEL_WORKER_MASTER_PARALLEL while doing so.
  */
-int			ParallelWorkerNumber = -1;
+int			ParallelWorkerNumber = PARALLEL_WORKER_MASTER;
 
 /* Is there a parallel message pending which we need to receive? */
 volatile bool ParallelMessagePending = false;
@@ -955,7 +957,7 @@ ParallelWorkerMain(Datum main_arg)
 	BackgroundWorkerUnblockSignals();
 
 	/* Determine and set our parallel worker number. */
-	Assert(ParallelWorkerNumber == -1);
+	Assert(ParallelWorkerNumber == PARALLEL_WORKER_MASTER);
	memcpy(&ParallelWorkerNumber, MyBgworkerEntry->bgw_extra, sizeof(int));
 
 	/* 
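
The diff is truncated by the archive at this point.  For orientation only,
the two sentinels it refers to would presumably be defined along the lines
below in access/parallel.h; the actual values in the posted patch are not
visible above, so these are a guess, while the IsParallelWorker() test is the
existing macro.

/* Guessed definitions, for illustration only. */
#define PARALLEL_WORKER_MASTER			(-1)	/* leader, not doing parallel work */
#define PARALLEL_WORKER_MASTER_PARALLEL	(-2)	/* leader running the parallel plan itself */

/* Workers still get numbers >= 0, so this existing test keeps working. */
#define IsParallelWorker()	(ParallelWorkerNumber >= 0)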

Re: [HACKERS] postgres 9.3 postgres_fdw ::LOG: could not receive data from client: Connection reset by peer

2016-11-23 Thread Vladimir Svedov
No, I select from it OK.
The bug(?) is that when I do it in an opened psql session it produces no log,
and when I run the same select as psql -c "SELECT..." it gives the above.

2016-11-22 20:18 GMT+00:00 Robert Haas :

> On Tue, Nov 22, 2016 at 5:05 AM, Vladimir Svedov 
> wrote:
> > Hi,
> > Sorry - tried to reproduce on another machine and gather all statements.
> > And failed.
> > Installed 9.3 (which has those symptoms) and still can't reproduce.
> > Must be platform specific, not version specific.
>
> Probably the foreign server isn't configured properly, and points to a
> host/port that resets the connection when you attempt to connect to
> it.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>