Re: [HACKERS] [COMMITTERS] pgsql: Remove extra comma at end of enum list

2016-10-24 Thread Michael Paquier
On Tue, Oct 25, 2016 at 1:17 PM, Tom Lane  wrote:
> Michael Paquier  writes:
>> Well... Coming back to the subject, are there any recommendations from
>> committers? -std=c89 in CFLAGS does not seem to help much to detect
>> extra commas in enums, even if this has been added in C99.
>
> My opinion is don't worry about it.  The buildfarm will find such problems
> just fine, and there's no functional reason to stress about finding
> extra commas before that.

Thanks for the feedback. Alea jacta est.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] macaddr 64 bit (EUI-64) datatype support

2016-10-24 Thread Haribabu Kommi
On Tue, Oct 18, 2016 at 1:51 AM, Shay Rojansky  wrote:

> The current macaddr datatype needs to be kept for some time by renaming
>> it without changing OID and use the newer one for further usage.
>>
>
> From the point of view of a driver maintainer... Npgsql looks up data
> types by their name - upon first connection to a database it queries
> pg_type and maps its internal data type handlers based on name. This allows
> it to avoid hardcoding data type OIDs in source code, and easily handle
> user-defined data types as well (enums, composites...). So a sudden rename
> of a datatype would definitely cause a problem. Of course it's possible to
> first check the server version and act accordingly but it seems to
> complicate things needlessly.
>

Yes, that's correct. Changing the existing datatype name is a pain, but to
enhance its use to cover the new hardware addresses, I feel this is required.

Here I attach the first version of a patch that supports both EUI-48 and
EUI-64 MAC addresses with a single datatype called macaddr. This is a
variable-length datatype, similar to inet. It can store both 6- and 8-byte
addresses. A variable-length type is used so that if MAC addresses get
enhanced again in the future, this type can still support them without
breaking on-disk compatibility.
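
[Editorial note: a minimal standalone C sketch of the idea described above,
a single value type that can carry either a 6-byte EUI-48 or an 8-byte
EUI-64 address with the stored length recorded alongside the bytes.  The
names are invented for illustration and are not taken from the attached
patch, which would use PostgreSQL's varlena machinery for the on-disk form.]

#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef struct
{
    uint8_t     nbytes;         /* 6 (EUI-48) or 8 (EUI-64) */
    uint8_t     addr[8];        /* only the first nbytes are meaningful */
} mac_value;

static mac_value
mac_from_bytes(const uint8_t *bytes, uint8_t nbytes)
{
    mac_value   m;

    m.nbytes = nbytes;          /* caller passes 6 or 8 */
    memset(m.addr, 0, sizeof(m.addr));
    memcpy(m.addr, bytes, nbytes);
    return m;
}

int
main(void)
{
    const uint8_t eui48[6] = {0x08, 0x00, 0x2b, 0x01, 0x02, 0x03};
    const uint8_t eui64[8] = {0x08, 0x00, 0x2b, 0xff, 0xfe, 0x01, 0x02, 0x03};
    mac_value   a = mac_from_bytes(eui48, 6);
    mac_value   b = mac_from_bytes(eui64, 8);

    printf("a holds %d bytes, b holds %d bytes\n", a.nbytes, b.nbytes);
    return 0;
}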

Currently the patch lacks documentation. Comments?

Regards,
Hari Babu
Fujitsu Australia


mac_eui64_support_1.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Remove extra comma at end of enum list

2016-10-24 Thread Tom Lane
Michael Paquier  writes:
> Well... Coming back to the subject, are there any recommendations from
> committers? -std=c89 in CFLAGS does not seem to help much to detect
> extra commas in enums, even if this has been added in C99.

My opinion is don't worry about it.  The buildfarm will find such problems
just fine, and there's no functional reason to stress about finding
extra commas before that.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-10-24 Thread Amit Kapila
On Mon, Oct 24, 2016 at 2:48 PM, Dilip Kumar  wrote:
> On Fri, Oct 21, 2016 at 7:57 AM, Dilip Kumar  wrote:
>> On Thu, Oct 20, 2016 at 9:03 PM, Tomas Vondra
>>  wrote:
>>
>>> In the results you've posted on 10/12, you've mentioned a regression with 32
>>> clients, where you got 52k tps on master but only 48k tps with the patch (so
>>> ~10% difference). I have no idea what scale was used for those tests,
>>
>> That test was with scale factor 300 on a 4-socket POWER machine. I think
>> I need to repeat this test with multiple readings to confirm whether it was
>> a regression or run-to-run variation. I will do that soon and post the
>> results.
>
> As promised, I have rerun my test (3 times), and I did not see any regression.
>

Thanks Tomas and Dilip for doing detailed performance tests for this
patch.  I would like to summarise the performance testing results.

1. With an update-intensive workload, we are seeing gains of 23% to 192%
at client counts >=64 with the group_update patch [1].
2. With the tpc-b pgbench workload (at 1000 scale factor), we are seeing
gains of 12% to ~70% at client counts >=64 [2].  Tests are done on an
8-socket Intel m/c.
3. With pgbench workloads (both simple-update and tpc-b at 300 scale
factor), we are seeing gains of 10% to >50% at client counts >=64 [3].
Tests are done on an 8-socket Intel m/c.
4. To see why the patch only helps at higher client counts, we have
done wait event testing for various workloads [4], [5], and the results
indicate that at lower client counts the waits are mostly due to
transactionid or clientread.  At client counts where contention on
CLOGControlLock is significant, this patch helps a lot to reduce that
contention.  These tests are done on an 8-socket Intel m/c and a
4-socket POWER m/c.
5. With a pgbench workload (unlogged tables), we are seeing gains of
15% to >300% at client counts >=72 [6].

Many more tests have been done for the proposed patches, with gains
either on similar lines as above or neutral.  However, we do see
regressions in some cases:

1. When the data doesn't fit in shared buffers, there is a regression at
some client counts [7], but on analysis it has been found that it is
mainly due to the shift in contention from CLOGControlLock to
WALWriteLock and/or other locks.
2. We do see in some cases that the granular_locking and no_content_lock
patches have shown a significant increase in contention on
CLOGControlLock.  I have already shared my analysis of the same upthread
[8].

Attached is the latest group update clog patch.

In the last commitfest, the patch was returned with feedback to evaluate
the cases where it can show a win, and I think the above results indicate
that the patch has a significant benefit on various workloads.  What I
think is pending at this stage is that either one of the committers or
the reviewers of this patch needs to provide feedback on my analysis
[8] for the cases where the patches are not showing a win.

Thoughts?

[1] - 
https://www.postgresql.org/message-id/CAFiTN-u-XEzhd%3DhNGW586fmQwdTy6Qy6_SXe09tNB%3DgBcVzZ_A%40mail.gmail.com
[2] - 
https://www.postgresql.org/message-id/CAFiTN-tr_%3D25EQUFezKNRk%3D4N-V%2BD6WMxo7HWs9BMaNx7S3y6w%40mail.gmail.com
[3] - 
https://www.postgresql.org/message-id/CAFiTN-v5hm1EO4cLXYmpppYdNQk%2Bn4N-O1m%2B%2B3U9f0Ga1gBzRQ%40mail.gmail.com
[4] - 
https://www.postgresql.org/message-id/CAFiTN-taV4iVkPHrxg%3DYCicKjBS6%3DQZm_cM4hbS_2q2ryLhUUw%40mail.gmail.com
[5] - 
https://www.postgresql.org/message-id/CAFiTN-uQ%2BJbd31cXvRbj48Ba6TqDUDpLKSPnsUCCYRju0Y0U8Q%40mail.gmail.com
[6] - http://tvondra.bitbucket.org/#pgbench-300-unlogged-sync-skip
[7] - http://tvondra.bitbucket.org/#pgbench-3000-unlogged-sync-skip
[8] - 
https://www.postgresql.org/message-id/CAA4eK1J9VxJUnpOiQDf0O%3DZ87QUMbw%3DuGcQr4EaGbHSCibx9yA%40mail.gmail.com

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


group_update_clog_v9.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] emergency outage requiring database restart

2016-10-24 Thread Alvaro Herrera
Merlin Moncure wrote:
> On Mon, Oct 24, 2016 at 6:01 PM, Merlin Moncure  wrote:

> > Corruption struck again.
> > This time got another case of view busted -- attempting to create
> > gives missing 'type' error.
> 
> Call it a hunch -- I think the problem is in pl/sh.

I've heard that before.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Logical decoding timeline following take II

2016-10-24 Thread Craig Ringer
On 24 October 2016 at 23:49, Petr Jelinek  wrote:
> Hi Craig,
>
> On 01/09/16 06:08, Craig Ringer wrote:
>> Hi all
>>
>> Attached is a rebased and updated logical decoding timeline following
>> patch for 10.0.
>>
>> This is a pre-requisite for the pending work on logical decoding on
>> standby servers and simplified failover of logical decoding.
>>
>
> I went over this and it looks fine to me. I only rebased the patch on
> top of the current version (we renamed pg_xlog, which broke the tests) and
> split the test harness into a separate patch. Otherwise I would consider
> this to be ready for committer.
>
> I think this should go in early so that there is enough time in the
> cycle to uncover potential issues if there are any, even though it looks
> all correct to me.

Thanks for the review and update.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUG] pg_basebackup from disconnected standby fails

2016-10-24 Thread Kyotaro HORIGUCHI
Hi,

At Mon, 24 Oct 2016 15:55:58 +0900, Michael Paquier  
wrote in 
> On Mon, Oct 24, 2016 at 1:39 PM, Amit Kapila  wrote:
> > On Mon, Oct 24, 2016 at 9:18 AM, Kyotaro HORIGUCHI
> >  wrote:
> >> Thank you for looking and retelling this.
> 
> +1.
> 
> >> At Fri, 21 Oct 2016 13:02:21 -0400, Robert Haas  
> >> wrote in 
> >> 
> >>> I can think of two solutions that would be "tighter":
> >>>
> >>> 1. When performing a restartpoint, update the minimum recovery point
> >>> to just beyond the checkpoint record.  I think this can't hurt anyone
> >>> who is actually restarting recovery on the same machine, because
> >>> that's exactly the point where they are going to start recovery.  A
> >>> minimum recovery point which precedes the point at which they will
> >>> start recovery is no better than one which is equal to the point at
> >>> which they will start recovery.
> >>
> >> This is a candidate that I thought of. But I avoided to change
> >> the behavior of minRecoveryPoint that (seems to me that) it
> >> describes only buffer state. So checkpoint with no update doesn't
> >> advances minRecoveryPoint as my understanding.
> >
> > I think what you are saying is not completely right, because we do
> > update minRecoveryPoint when we don't perform a new restart point.
> > When we perform restart point, then it assumes that flushing the
> > buffers will take care of updating minRecoveryPoint.

Thank you for pointing that out; I had forgotten it. But looking there
again, I found that I had long misread the code as updating
minRecoveryPoint, when it actually invalidates it.  If the
"update" were inside the shutdown-checkpoint block, it would
look consistent, but it is before that block, so any skipped
non-shutdown checkpoint record invalidates minRecoveryPoint while
leaving the state as DB_IN_ARCHIVE_RECOVERY. Aren't these two in
contradiction? (Though I don't see anything bad actually happening
because of it.)

> Yep, minRecoveryPoint still gets updated when the last checkpoint
> record is the last restart point to avoid a hot standby to allow
> read-only connections at a LSN-point earlier than the last shutdown.

The described behavior seems correct, but the code seems to be
doing something somewhat different.

> Anyway, we can clearly reject 1. in the light of
> https://www.postgresql.org/message-id/caa4ek1kmjtsxqf0cav7cs4d4vwv2h_pc8d8q1bucqdzaf+7...@mail.gmail.com
> when playing with different stop locations at recovery.

| * If the last checkpoint record we've replayed is already our last
| * restartpoint, we can't perform a new restart point. We still update
| * minRecoveryPoint in that case, so that if this is a shutdown restart
| * point, we won't start up earlier than before. 
...
| * We don't explicitly advance minRecoveryPoint when we do create a
| * restartpoint. It's assumed that flushing the buffers will do that as a
| * side-effect.

The second sentence reads to me as "we *expect* minRecoveryPoint
to be updated anyway even if we don't do that here", though it is
a bit different in reality..

skipped checkpoints - advance minRecoveryPoint to the checkpoint

I'm failing to build a consistent mental model of the code around
here..

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Remove extra comma at end of enum list

2016-10-24 Thread Thomas Munro
On Tue, Oct 25, 2016 at 2:34 PM, Peter Eisentraut
 wrote:
> On 10/24/16 8:37 PM, Michael Paquier wrote:
>> Well... Coming back to the subject, are there any recommendations from
>> committers? -std=c89 in CFLAGS does not seem to help much to detect
>> extra commas in enums, even if this has been added in C99.
>
> The only option that gives you a warning for this is -pedantic, and
> that's not going to work because it disabled a bunch of other stuff.

Considering your work to make PostgreSQL a valid C++ program, I just
wanted to note that C++03 doesn't like trailing commas in enums (since
it incorporates the earlier C standard).  That means that the baseline
for C++ would need to be at least C++11 for that to compile.  There
are also C99 features that are not in any C++ standard including
variable length arrays (which the C++ people consider to be insane
AFAIK) and restrict.
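
[Editorial illustration: a small standalone C99 program using the two
features mentioned above that standard C++ lacks -- a variable length
array and the restrict qualifier.  It compiles as C99 but not as
standard C++.]

#include <stdio.h>

/* restrict promises the compiler that a and b do not alias. */
static double
dot(int n, const double *restrict a, const double *restrict b)
{
    double      sum = 0.0;

    for (int i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}

int
main(void)
{
    int         n = 4;
    double      a[n];           /* VLAs: length known only at run time */
    double      b[n];

    for (int i = 0; i < n; i++)
    {
        a[i] = i + 1;
        b[i] = 2.0;
    }
    printf("%.1f\n", dot(n, a, b));
    return 0;
}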

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Remove extra comma at end of enum list

2016-10-24 Thread Peter Eisentraut
On 10/24/16 8:37 PM, Michael Paquier wrote:
> Well... Coming back to the subject, are there any recommendations from
> committers? -std=c89 in CFLAGS does not seem to help much to detect
> extra commas in enums, even if this has been added in C99.

The only option that gives you a warning for this is -pedantic, and
that's not going to work because it disabled a bunch of other stuff.
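
[Editorial illustration of the construct under discussion: the trailing
comma after the last enumerator is valid C99 but a constraint violation
in C89, and gcc appears to report it only when -pedantic is combined
with -std=c89.]

enum DemoColor
{
    DEMO_RED,
    DEMO_GREEN,
    DEMO_BLUE,                  /* <- the extra trailing comma */
};

int
main(void)
{
    return (int) DEMO_BLUE;
}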

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2016-10-24 Thread Peter Geoghegan
On Fri, Oct 7, 2016 at 5:47 PM, Peter Geoghegan  wrote:
> Work is still needed on:
>
> * Cost model. Should probably attempt to guess final index size, and
> derive calculation of number of workers from that. Also, I'm concerned
> that I haven't given enough thought to the low end, where with default
> settings most CREATE INDEX statements will use at least one parallel
> worker.
>
> * The whole way that I teach nbtsort.c to disallow catalog tables for
> parallel CREATE INDEX due to concerns about parallel safety is in need
> of expert review, preferably from Robert. It's complicated in a way
> that relies on things happening or not happening from a distance.
>
> * Heikki seems to want to change more about logtape.c, and its use of
> indirection blocks. That may evolve, but for now I can only target the
> master branch.
>
> * More extensive performance testing. I think that this V3 is probably
> the fastest version yet, what with Heikki's improvements, but I
> haven't really verified that.

While I haven't made progress on any of these open items, I should
still get a version out that applies cleanly on top of git tip --
commit b75f467b6eec0678452fd8d7f8d306e6df3a1076 caused the patch to
bitrot. I attach V4, which is a fairly mechanical rebase of V3, with
no notable behavioral changes or bug fixes.

-- 
Peter Geoghegan


0003-Add-force_btree_randomaccess-GUC-for-testing.patch.gz
Description: GNU Zip compressed data


0002-Add-parallel-B-tree-index-build-sorting.patch.gz
Description: GNU Zip compressed data


0001-Cap-the-number-of-tapes-used-by-external-sorts.patch.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Remove extra comma at end of enum list

2016-10-24 Thread Michael Paquier
On Tue, Oct 25, 2016 at 6:59 AM, Tom Lane  wrote:
> Fabien COELHO  writes:
>>> An alternative that would be worth considering is to adopt a uniform
>>> rule of // for line-ending comments and /* for all other uses.
>
>> Why not. As far as comments are concerned, editors usually highlight them
>> in some color, and my eyes get used to the comment color, so the simpler &
>> shorter the better, really.
>
>>> We'd have to teach pgindent about that, and I dunno how hard that is.
>
>> Maybe it is enough just to turn "/* no-nl */" to "// no-nl" in
>> pgindent's post_indent, but probably I'm too naïve:-)
>
> Well, IMO the point of making that change would be to buy an additional
> three characters of space for the comment before it wraps.  So I'd suspect
> that post-processing is too late.  But I've not looked into pgindent to
> see where the decisions are made exactly.

Well... Coming back to the subject, are there any recommendations from
committers? -std=c89 in CFLAGS does not seem to help much to detect
extra commas in enums, even if this has been added in C99.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] emergency outage requiring database restart

2016-10-24 Thread Merlin Moncure
On Mon, Oct 24, 2016 at 6:01 PM, Merlin Moncure  wrote:
> On Thu, Oct 20, 2016 at 1:52 PM, Merlin Moncure  wrote:
>> On Wed, Oct 19, 2016 at 2:39 PM, Merlin Moncure  wrote:
>>> On Wed, Oct 19, 2016 at 9:56 AM, Bruce Momjian  wrote:
 On Wed, Oct 19, 2016 at 08:54:48AM -0500, Merlin Moncure wrote:
> > Yeah.  Believe me -- I know the drill.  Most or all the damage seemed
> > to be to the system catalogs with at least two critical tables dropped
> > or inaccessible in some fashion.  A lot of the OIDs seemed to be
> > pointing at the wrong thing.  Couple more datapoints here.
> >
> > *) This database is OLTP, doing ~ 20 tps avg (but very bursty)
> > *) Another database on the same cluster was not impacted.  However
> > it's more olap style and may not have been written to during the
> > outage
> >
> > Now, this infrastructure running this system is running maybe 100ish
> > postgres clusters and maybe 1000ish sql server instances with
> > approximately zero unexplained data corruption issues in the 5 years
> > I've been here.  Having said that, this definitely smells and feels
> > like something on the infrastructure side.  I'll follow up if I have
> > any useful info.
>
> After a thorough investigation I now have credible evidence the source
> of the damage did not originate from the database itself.
> Specifically, this database is mounted on the same volume as the
> operating system (I know, I know) and something non database driven
> sucked up disk space very rapidly and exhausted the volume -- fast
> enough that sar didn't pick it up.  Oh well :-) -- thanks for the help

 However, disk space exhaustion should not lead to corruption unless the
 underlying layers lied in some way.
>>>
>>> I agree -- however I'm sufficiently separated from the things doing
>>> the things that I can't verify that in any real way.   In the meantime
>>> I'm going to take standard precautions (enable checksums/dedicated
>>> volume/replication).  Low disk space also does not explain the bizarre
>>> outage I had last friday.
>>
>> ok, data corruption struck again.  This time disk space is ruled out,
>> and access to the database is completely denied:
>> postgres=# \c castaging
>> WARNING:  leaking still-referenced relcache entry for
>> "pg_index_indexrelid_index"
>
> Corruption struck again.
> This time got another case of view busted -- attempting to create
> gives missing 'type' error.

Call it a hunch -- I think the problem is in pl/sh.

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] emergency outage requiring database restart

2016-10-24 Thread Merlin Moncure
On Thu, Oct 20, 2016 at 1:52 PM, Merlin Moncure  wrote:
> On Wed, Oct 19, 2016 at 2:39 PM, Merlin Moncure  wrote:
>> On Wed, Oct 19, 2016 at 9:56 AM, Bruce Momjian  wrote:
>>> On Wed, Oct 19, 2016 at 08:54:48AM -0500, Merlin Moncure wrote:
 > Yeah.  Believe me -- I know the drill.  Most or all the damage seemed
 > to be to the system catalogs with at least two critical tables dropped
 > or inaccessible in some fashion.  A lot of the OIDs seemed to be
 > pointing at the wrong thing.  Couple more datapoints here.
 >
 > *) This database is OLTP, doing ~ 20 tps avg (but very bursty)
 > *) Another database on the same cluster was not impacted.  However
 > it's more olap style and may not have been written to during the
 > outage
 >
 > Now, this infrastructure running this system is running maybe 100ish
 > postgres clusters and maybe 1000ish sql server instances with
 > approximately zero unexplained data corruption issues in the 5 years
 > I've been here.  Having said that, this definitely smells and feels
 > like something on the infrastructure side.  I'll follow up if I have
 > any useful info.

 After a thorough investigation I now have credible evidence the source
 of the damage did not originate from the database itself.
 Specifically, this database is mounted on the same volume as the
 operating system (I know, I know) and something non database driven
 sucked up disk space very rapidly and exhausted the volume -- fast
 enough that sar didn't pick it up.  Oh well :-) -- thanks for the help
>>>
>>> However, disk space exhaustion should not lead to corruption unless the
>>> underlying layers lied in some way.
>>
>> I agree -- however I'm sufficiently separated from the things doing
>> the things that I can't verify that in any real way.   In the meantime
>> I'm going to take standard precautions (enable checksums/dedicated
>> volume/replication).  Low disk space also does not explain the bizarre
>> outage I had last friday.
>
> ok, data corruption struck again.  This time disk space is ruled out,
> and access to the database is completely denied:
> postgres=# \c castaging
> WARNING:  leaking still-referenced relcache entry for
> "pg_index_indexrelid_index"

Corruption struck again.
This time got another case of view busted -- attempting to create
gives missing 'type' error.

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Remove extra comma at end of enum list

2016-10-24 Thread Tom Lane
Fabien COELHO  writes:
>> An alternative that would be worth considering is to adopt a uniform
>> rule of // for line-ending comments and /* for all other uses.

> Why not. As far as comments are concerned, editors usually highlight them 
> in some color, and my eyes get used to the comment color, so the simpler & 
> shorter the better, really.

>> We'd have to teach pgindent about that, and I dunno how hard that is.

> Maybe it is enough just to turn "/* no-nl */" to "// no-nl" in
> pgindent's post_indent, but probably I'm too naïve:-)

Well, IMO the point of making that change would be to buy an additional
three characters of space for the comment before it wraps.  So I'd suspect
that post-processing is too late.  But I've not looked into pgindent to
see where the decisions are made exactly.
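
[Editorial illustration of the alternative rule being discussed: /* ... */
stays for block comments, while // is used only for comments that trail
code on the same line, which is where the extra three characters of width
before wrapping would be gained.]

#include <stdio.h>

/*
 * Block comments would keep the traditional form under this rule;
 * only comments sharing a line with code would switch to //.
 */
static int
clamp_to_byte(int v)
{
    if (v < 0)
        return 0;               // clamp negative input
    if (v > 255)
        return 255;             // clamp overflow
    return v;
}

int
main(void)
{
    printf("%d\n", clamp_to_byte(300));
    return 0;
}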

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Remove extra comma at end of enum list

2016-10-24 Thread Fabien COELHO



I find it annoying that "//" comments are not supported, or having to
declare variables at the beginning of a block instead of when first used...


I think some c99 features would be nice (variadic macros for one), but
those particular two get a big "meh" from me.


This is probably a matter of taste.

The question really is when some C99 features (inline, mixing code &
declarations, flexible array members, slash-slash comments, compound
literals, variadic macros, the restrict qualifier, ...) will be considered
okay for Pg sources, or whether the sources should be kept C89-compliant
for the next 27 years.


Well, it seems that Microsoft took its time to implement C99 features, which
would mean that a recent "Visual C++" would be required to compile pg on
Windows if pg were to use C99 features.



.oO( I wonder if it'd be possible to make ereport() an inline function ... )


Probably not: ISTM that ereport relies on the fact that it is a macro to
avoid evaluating some arguments until necessary, but the compiler would
not do that for a function, whether inlined or not, because it would change
the semantics if the evaluation were to fail.
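
[Editorial illustration of the point above, using an invented REPORT macro
rather than the real ereport definition: with a macro the expensive
argument expression is substituted textually and evaluated only when the
level check passes, whereas a function -- inline or not -- must evaluate
its arguments before its body runs.]

#include <stdio.h>

#define MIN_LEVEL 10

/* Invented stand-in for ereport: msg is evaluated only inside the if. */
#define REPORT(level, msg) \
    do { \
        if ((level) >= MIN_LEVEL) \
            puts(msg); \
    } while (0)

static const char *
expensive_message(void)
{
    puts("building expensive message...");
    return "something went wrong";
}

/* A function, inlined or not, must evaluate its arguments up front. */
static inline void
report_fn(int level, const char *msg)
{
    if (level >= MIN_LEVEL)
        puts(msg);
}

int
main(void)
{
    REPORT(1, expensive_message());     /* expensive_message() is never called */
    report_fn(1, expensive_message());  /* expensive_message() is always called */
    return 0;
}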


--
Fabien.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Remove extra comma at end of enum list

2016-10-24 Thread Petr Jelinek


On 24/10/16 21:11, Stephen Frost wrote:
> * Tom Lane (t...@sss.pgh.pa.us) wrote:
>> Alvaro Herrera  writes:
>>> Fabien COELHO wrote:
 I find it annoying that "//" comments are not supported, or having to
 declare variables at the beginning of a block instead of when first used...
>>
>>> I think some c99 features would be nice (variadic macros for one), but
>>> those particular two get a big "meh" from me.
>>
>> Yeah.  Personally, I'd want to continue the rule against // comments just
>> as a matter of maintaining stylistic consistency.  Code that's got a
>> random mishmash of // and /* comments looks messy, if you ask me.
> 
> +1
> 

+1

>> An alternative that would be worth considering is to adopt a uniform
>> rule of // for line-ending comments and /* for all other uses.  We'd
>> have to teach pgindent about that, and I dunno how hard that is.
> 
> I'm not sure it's worth the resulting mess.
> 

I think the current commenting style is stylistically nicer.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Press Release Draft - 2016-10-27 Cumulative Update

2016-10-24 Thread Jonathan Katz

> On Oct 24, 2016, at 2:59 PM, Tom Lane  wrote:
> 
> Jonathan Katz  writes:
>> Below is the draft of the press release for the update this Thursday:
> 
>> https://git.postgresql.org/gitweb/?p=press.git;a=blob;f=update_releases/current/update_201610.md;h=aac4d0b36f3f3017d319ac617eff901efe0c10c0;hb=880dc99766ee0e608e95d9c0f36ce3cde59f470f
>>  
>> 
> 
> Couple thoughts:
> 
> * I don't believe that the FSM truncation issue is specific to being "in
> recovery".  The corruption itself would happen during crash/restart on
> a master, or while a standby is following a WAL stream, but the effects
> would be seen later.  Suggest just taking out "in recovery" from the
> description.
> 
> * Please clarify that the pg_upgrade VM issue is only in 9.6.0.
> 
> * The recipe for recovery from the VM issue is oversimplified.
> Rather than trying to explain how to fix it here, I'd suggest
> pointing to the wiki page we've created about that,
> https://wiki.postgresql.org/wiki/Visibility_Map_Problems
> 
> Looks good to me otherwise.

Incorporated feedback: 
https://git.postgresql.org/gitweb/?p=press.git;a=blob;f=update_releases/current/update_201610.md;h=11c9fb9504add6b5c060763722aba1cd2eb4afd4;hb=387f3e441666ce4d2a1e5f4e52247ce3a83e733d
 


Thanks!

Jonathan



Re: [HACKERS] [COMMITTERS] pgsql: Remove extra comma at end of enum list

2016-10-24 Thread Fabien COELHO



I find it annoying that "//" comments are not supported, or having to
declare variables at the beginning of a block instead of when first used...



I think some c99 features would be nice (variadic macros for one), but
those particular two get a big "meh" from me.


Yeah.  Personally, I'd want to continue the rule against // comments just
as a matter of maintaining stylistic consistency.  Code that's got a
random mishmash of // and /* comments looks messy, if you ask me.


Coding styles are by definition a subjective area...  Homogeneity in a
code base, *whatever the precise but reasonable rules*, is a good
thing.



An alternative that would be worth considering is to adopt a uniform
rule of // for line-ending comments and /* for all other uses.


Why not. As far as comments are concerned, editors usually highlight them 
in some color, and my eyes get used to the comment color, so the simpler & 
shorter the better, really.



We'd have to teach pgindent about that, and I dunno how hard that is.


Maybe it is enough just to turn "/* no-nl */" to "// no-nl" in
pgindent's post_indent, but probably I'm too naïve:-)


--
Fabien.
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Implement failover on libpq connect level.

2016-10-24 Thread Alvaro Herrera
Robert Haas wrote:

> While I was experimenting with this today, I discovered a problem of
> interpretation related to IPv6 addresses.  Internally, a postgresql://
> URL and a connection string are converted into the same format, so
> postgresql://a,b/ means just the same thing as host=a,b.  I thought
> that could similarly decide that postgresql://a:123,b:456/ is going to
> get translated to host=a:123,b:456 and then there can be further code
> to parse that into a list of host-and-port pairs.  However, if you do
> that, then something like host=1:2:3::4:5:6 is fundamentally
> ambiguous.  That :6 is equally valid either as part of the IP address
> or as a trailing port number specification, and there's no a priori
> way to know which it is.  Today, since the host part can't include a
> port specifier, it's regarded as part of the IP address, and I think
> it would probably be a bad idea to change that, as I believe Victor's
> patch would.  He seems to have it in mind that we could allow things
> like host=[1:2:3::4:5:6] or host=[1:2:3::4:5]:6, which would might be
> helpful for the future but doesn't avoid changing the meaning of
> connection strings that work today.

Umm, my recollection regarding IPv6 parsing in the URI syntax is that
those must appear inside square brackets -- it's not valid to have the
IPv6 address outside brackets, and the port number is necessarily
outside the brackets.  So there's no ambiguity.  If the current patch is
allowing IPv6 address to appear outside of brackets, that seems like a
bug to me.

The string conninfo spec should not accept port numbers in the "host"
part.  (Does it?)

> So now I think that to make this work correctly, we're going to need
> to change both the URL parser and also add parsing for the host and
> port.  Let's say the user says this:
> 
> postgresql://[1::2]:3,[4::5],[6::7]::8/

(There's a double colon before 8, I suppose that's just a typo.)

> What I think we need to do is translate that into this:
> 
> host=1::2,4::5,6::7 port=3,,8
> 
> Note the double-comma, indicating a blank port number for the second
> URL component.

Sounds reasonable.

> And then they might write one where the host and
> port parts don't have the same number of components, like this:
> 
> host=a,b,c port=3,4
> or
> host=a,b port=3,4,5
> 
> It is obvious what is meant if multiple hosts are given but only a
> single port number is specified; it is also obvious what is meant if
> the number of ports is equal to the number of hosts.

Agreed -- seems like we can accept both those cases.

> It is not obvious what it means if there are multiple ports but the
> number doesn't equal the number of hosts.

I think we should reject, as an error, the case where the number of
elements differs and neither host nor port is a singleton.  The suggestion
to ignore some parts seems too error-prone.
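
[Editorial sketch, not libpq code: the rule just agreed on, expressed as a
small standalone check -- a single port applies to every host, equal-length
lists pair up element-wise, and any other combination is rejected.  An
empty list item stands for the default port, as in the port=3,,8 example
above.]

#include <stdio.h>

static int
count_items(const char *list)
{
    int         n = 1;

    for (; *list != '\0'; list++)
        if (*list == ',')
            n++;
    return n;
}

static int
check_host_port_lists(const char *hosts, const char *ports)
{
    int         nhosts = count_items(hosts);
    int         nports = count_items(ports);

    if (nports == 1 || nports == nhosts)
        return 0;
    fprintf(stderr, "cannot match %d port numbers to %d hosts\n",
            nports, nhosts);
    return -1;
}

int
main(void)
{
    check_host_port_lists("a,b,c", "5432");          /* ok: one port for all hosts */
    check_host_port_lists("1::2,4::5,6::7", "3,,8"); /* ok: same count, blank = default */
    check_host_port_lists("a,b", "3,4,5");           /* rejected */
    return 0;
}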

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Press Release Draft - 2016-10-27 Cumulative Update

2016-10-24 Thread Jonathan Katz

> On Oct 24, 2016, at 2:49 PM, Robert Haas  wrote:
> 
> On Mon, Oct 24, 2016 at 1:14 PM, Jonathan Katz
>  wrote:
>> Hello,
>> 
>> Below is the draft of the press release for the update this Thursday:
>> 
>> https://git.postgresql.org/gitweb/?p=press.git;a=blob;f=update_releases/current/update_201610.md;h=aac4d0b36f3f3017d319ac617eff901efe0c10c0;hb=880dc99766ee0e608e95d9c0f36ce3cde59f470f
> 
> The discussion of truncating the visibility map is repeated twice,
> once at the top under "pg_upgrade issues on big-endian machines" and
> again at the bottom under "Updating".  We should only have it there
> once.  Also, the SQL we're proposing should actually be tested by
> someone before posting it - pg_truncate_visibility_map() takes a
> mandatory argument, so calling it with no arguments will not work.

I thought it might be good to repeat those instructions so people are clear
that there is an actionable step post-upgrade.  I’m happy to leave it in just
the update section, but to make a reference saying “Please see the “Update”
section for post-install steps.”

Jonathan



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Implement failover on libpq connect level.

2016-10-24 Thread Peter Eisentraut
On 10/24/16 11:57 AM, Robert Haas wrote:
> Today, since the host part can't include a
> port specifier, it's regarded as part of the IP address, and I think
> it would probably be a bad idea to change that, as I believe Victor's
> patch would.  He seems to have it in mind that we could allow things
> like host=[1:2:3::4:5:6] or host=[1:2:3::4:5]:6, which would might be
> helpful for the future but doesn't avoid changing the meaning of
> connection strings that work today.

Let's keep in mind here that the decision to allow database names to
contain a connection parameter substructure has caused some security
issues.  Let's not create more levels of ambiguity and the need to pass
around override flags.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Implement failover on libpq connect level.

2016-10-24 Thread Peter Eisentraut
On 10/24/16 6:36 AM, Thom Brown wrote:
> So should it be the case that it disallows UNIX socket addresses
> entirely?  I can't imagine a list of UNIX socket addresses being that
> useful.

But maybe a mixed list of Unix-domain sockets and TCP connections?

The nice thing is that it is currently transparent and you don't have to
wonder.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rename max_parallel_degree?

2016-10-24 Thread Peter Eisentraut
On 10/12/16 7:58 PM, Robert Haas wrote:
> I don't think it's wrong that the handling is done there, though.  The
> process that is registering the background worker is well-placed to
> check whether there are already too many, and if it does not then the
> slot is consumed at least temporarily even if it busts the cap.  On
> the flip side, the postmaster is the only process that is well-placed
> to know when a background worker terminates.  The worker process
> itself can't be made responsible for it, as you suggest below, because
> it may never even start up in the first place (e.g. fork() returns
> EAGAIN).  And the registering process can't be made responsible,
> because it might die before the worker.

Those are valid technical points.  I have not worked out any alternatives.

I'm concerned that all this makes background workers managed by
extensions second-class citizens.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Remove extra comma at end of enum list

2016-10-24 Thread Bruce Momjian
On Mon, Oct 24, 2016 at 03:03:19PM -0400, Tom Lane wrote:
> Alvaro Herrera  writes:
> > Fabien COELHO wrote:
> >> I find it annoying that "//" comments are not supported, or having to
> >> declare variables at the beginning of a block instead of when first used...
> 
> > I think some c99 features would be nice (variadic macros for one), but
> > those particular two get a big "meh" from me.
> 
> Yeah.  Personally, I'd want to continue the rule against // comments just
> as a matter of maintaining stylistic consistency.  Code that's got a
> random mishmash of // and /* comments looks messy, if you ask me.
> 
> An alternative that would be worth considering is to adopt a uniform
> rule of // for line-ending comments and /* for all other uses.  We'd
> have to teach pgindent about that, and I dunno how hard that is.

I think it would be easy to teach pgindent.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PL/Python adding support for multi-dimensional arrays

2016-10-24 Thread Pavel Stehule
Hi

2016-10-14 10:53 GMT+02:00 Heikki Linnakangas :

> On 10/11/2016 08:56 AM, Pavel Stehule wrote:
>
>> 2016-10-11 7:49 GMT+02:00 Heikki Linnakangas :
>>
>> Unfortunately there are cases that are fundamentally ambiguous.
>>>
>>> create type comptype as (intarray int[]);
>>> create function array_return() returns comptype[] as $$
>>>   return 1;
>>> $$ language plpython;
>>>
>>> What does the function return? It could be two-dimension array of
>>> comptype, with a single-dimension intarray, or a single-dimension
>>> comptype,
>>> with a two-dimension intarray.
>>>
>>> We could resolve it for simpler cases, but not the general case. The
>>> simple cases would probably cover most things people do in practice. But
>>> if
>>> the distinction between a tuple and a list feels natural to Python
>>> programmers, I think it would be more clear in the long run to have
>>> people
>>> adjust their applications.
>>>
>>
>> I agree. The distinction is natural - and it is our issue, so we don't
>> distinguish strongly.
>>
>
> Ok, let's do that then. Here is a patch set that does that. The first is
> the main patch. The second patch adds some code to give a hint, if you do
> that thing whose behavior changed. That code isn't very pretty, but I
> think a good error message is absolutely required, if we are to make this
> change. Does anyone have better suggestions on how to catch the common
> cases of that?
>
> Please review. Are the docs and the error messages now clear enough on
> this? We'll need a mention in the release notes too, when it's time for
> that.
>

The error message is clear.

I tested the patches - and the regression test is broken (it has not been updated)

+ -- Starting with PostgreSQL 10, a composite type in an array cannot be represented as
+ -- a Python list, because it's ambiguous with multi-dimensional arrays. So this
+ -- throws an error now. The error should contain a useful hint on the issue.
+ CREATE FUNCTION composite_type_as_list()  RETURNS type_record[] AS $$
+   return [['first', 1]];
+ $$ LANGUAGE plpythonu;
+ SELECT * FROM composite_type_as_list();
+ ERROR:  malformed record literal: "first"
+ DETAIL:  Missing left parenthesis.
+ HINT:  To return a composite type in an array, return the composite type as a Python tuple, e.g. "[('foo')]"
+ CONTEXT:  while creating return value
+ PL/Python function "composite_type_as_list"

I tested Python 3.5 and 2.7 and there are no other issues

There are no new tests for multidimensional arrays of composites - there is
only the new negative test.

Regards

Pavel


>
> - Heikki
>
>


Re: [HACKERS] Patch: Implement failover on libpq connect level.

2016-10-24 Thread Victor Wagner
On Wed, 19 Oct 2016 12:04:27 -0400
Robert Haas  wrote:

> On Thu, Oct 13, 2016 at 5:53 AM, Victor Wagner 
> wrote:
> > On Thu, 13 Oct 2016 12:30:59 +0530
> > Mithun Cy  wrote:  
> >> On Fri, Sep 30, 2016 at 2:14 PM, Victor Wagner 
> >> wrote:
> >> Okay but for me consistency is also important. Since we agree to
> >> disagree on some of the comments and others have not expressed any
> >> problem I am moving it to committer.  
> >
> > Thank you for your efforts improving my patch  
> 
> I haven't deeply reviewed this patch, but on a quick read-through it
> certainly seems to need a lot of cleanup work.
> 

I've done some cleanup, i.e. running pgindent, spell-checking comments
and indenting the SGML, and I'm attaching the cleaned-up version here.

I've also re-added the test file t/001-multihost.pl, which got lost from
the previous post.

Thanks for your suggestions.
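
[Editorial usage sketch: how a client might use the syntax documented by
the attached patch.  The multi-host URI and the hostorder and
target_server_type parameters are the patch's proposal, not part of
released libpq; the surrounding calls are standard libpq API.]

#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
    const char *conninfo =
        "postgresql://node1,node2:5433,node3/mydb"
        "?hostorder=sequential&target_server_type=master";
    PGconn     *conn = PQconnectdb(conninfo);

    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        PQfinish(conn);
        return 1;
    }
    printf("connected to %s:%s\n", PQhost(conn), PQport(conn));
    PQfinish(conn);
    return 0;
}

(Built with the usual libpq flags, e.g. cc conn.c -I$(pg_config --includedir)
-L$(pg_config --libdir) -lpq.)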



-- 
   Victor Wagner 
diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index 4e34f00..b7c62d2 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -792,7 +792,7 @@ host=localhost port=5432 dbname=mydb connect_timeout=10

The general form for a connection URI is:
 
-postgresql://[user[:password]@][netloc][:port][/dbname][?param1=value1...]
+postgresql://[user[:password]@][netloc][:port][,netloc[:port]...][/dbname][?param1=value1...]
 

 
@@ -809,6 +809,7 @@ postgresql://localhost/mydb
 postgresql://user@localhost
 postgresql://user:secret@localhost
postgresql://other@localhost/otherdb?connect_timeout=10&amp;application_name=myapp
+postgresql://node1,node2:5433,node3:4432,node4/mydb?hostorder=random&amp;target_server_type=any
 
 Components of the hierarchical part of the URI can also
 be given as parameters.  For example:
@@ -831,7 +832,9 @@ postgresql:///mydb?host=localhost&amp;port=5433

 For improved compatibility with JDBC connection URIs,
 instances of parameter ssl=true are translated into
-sslmode=require.
+sslmode=require and
+loadBalanceHosts=true  into
+hostorder=random.

 

@@ -841,6 +844,10 @@ postgresql:///mydb?host=localhost&amp;port=5433
 postgresql://[2001:db8::1234]/database
 

+   
+   There can be several host specifications, optionally accompanied
+   with port, separated by comma.
+   
 

 The host component is interpreted as described for the parameter PostgreSQL was built). On machines without
 Unix-domain sockets, the default is to connect to localhost.

+   
+There can be more than one host parameter in
+the connection string. In this case these hosts would be considered
+alternate entries into same database and if connection to first one
+fails, the second would be attempted, and so on. This can be used
+for high availability clusters or for load balancing. See the
+ parameter. This feature
+works for TCP/IP host names only.
+   
+   
+The network host name can be accompanied by a port number, separated by
+colon. This port number is used only when connected to
+this host. If there is no port number, the port specified in the
+ parameter would be used instead.
+   
   
  
 
@@ -933,7 +955,6 @@ postgresql://%2Fvar%2Flib%2Fpostgresql/dbname
 is used to identify the connection in ~/.pgpass (see
 ).

-

 Without either a host name or host address,
 libpq will connect using a
@@ -943,6 +964,43 @@ postgresql://%2Fvar%2Flib%2Fpostgresql/dbname

   
 
+  
+   hostorder
+   
+   
+Specifies how to choose the host from the list of alternate hosts,
+specified in the  parameter.
+   
+   
+If the value of this argument is sequential (the
+default) connections to the hosts will be attempted in the order in
+which they are specified.
+   
+   
+If the value is random, the host to connect to
+will be randomly picked from the list. It allows load balancing between
+several cluster nodes. However, PostgreSQL doesn't currently support
+multimaster clusters. So, without the use of third-party products,
+only read-only connections can take advantage from
+load-balancing. See 
+   
+   
+  
+
+  
+   target_server_type
+
+
+ If this parameter is master, upon
+ successful connection the host is checked to determine whether
+ it is in a recovery state.  If it is, it then tries next host
+ in the connection string. If this parameter is
+ any, connection to standby nodes are
+ considered successful. 
+
+
+  
+
   
port

@@ -985,7 +1043,6 @@ postgresql://%2Fvar%2Flib%2Fpostgresql/dbname
   
   
  
-
  
   

Re: [HACKERS] [COMMITTERS] pgsql: Remove extra comma at end of enum list

2016-10-24 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Alvaro Herrera  writes:
> > Fabien COELHO wrote:
> >> I find it annoying that "//" comments are not supported, or having to
> >> declare variables at the beginning of a block instead of when first used...
> 
> > I think some c99 features would be nice (variadic macros for one), but
> > those particular two get a big "meh" from me.
> 
> Yeah.  Personally, I'd want to continue the rule against // comments just
> as a matter of maintaining stylistic consistency.  Code that's got a
> random mishmash of // and /* comments looks messy, if you ask me.

+1

> An alternative that would be worth considering is to adopt a uniform
> rule of // for line-ending comments and /* for all other uses.  We'd
> have to teach pgindent about that, and I dunno how hard that is.

I'm not sure it's worth the resulting mess.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Press Release Draft - 2016-10-27 Cumulative Update

2016-10-24 Thread Tom Lane
Jonathan Katz  writes:
> Below is the draft of the press release for the update this Thursday:

> https://git.postgresql.org/gitweb/?p=press.git;a=blob;f=update_releases/current/update_201610.md;h=aac4d0b36f3f3017d319ac617eff901efe0c10c0;hb=880dc99766ee0e608e95d9c0f36ce3cde59f470f
>  
> 

Couple thoughts:

* I don't believe that the FSM truncation issue is specific to being "in
recovery".  The corruption itself would happen during crash/restart on
a master, or while a standby is following a WAL stream, but the effects
would be seen later.  Suggest just taking out "in recovery" from the
description.

* Please clarify that the pg_upgrade VM issue is only in 9.6.0.

* The recipe for recovery from the VM issue is oversimplified.
Rather than trying to explain how to fix it here, I'd suggest
pointing to the wiki page we've created about that,
https://wiki.postgresql.org/wiki/Visibility_Map_Problems

Looks good to me otherwise.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Remove extra comma at end of enum list

2016-10-24 Thread Tom Lane
Alvaro Herrera  writes:
> Fabien COELHO wrote:
>> I find it annoying that "//" comments are not supported, or having to
>> declare variables at the beginning of a block instead of when first used...

> I think some c99 features would be nice (variadic macros for one), but
> those particular two get a big "meh" from me.

Yeah.  Personally, I'd want to continue the rule against // comments just
as a matter of maintaining stylistic consistency.  Code that's got a
random mishmash of // and /* comments looks messy, if you ask me.

An alternative that would be worth considering is to adopt a uniform
rule of // for line-ending comments and /* for all other uses.  We'd
have to teach pgindent about that, and I dunno how hard that is.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Press Release Draft - 2016-10-27 Cumulative Update

2016-10-24 Thread Robert Haas
On Mon, Oct 24, 2016 at 1:14 PM, Jonathan Katz
 wrote:
> Hello,
>
> Below is the draft of the press release for the update this Thursday:
>
> https://git.postgresql.org/gitweb/?p=press.git;a=blob;f=update_releases/current/update_201610.md;h=aac4d0b36f3f3017d319ac617eff901efe0c10c0;hb=880dc99766ee0e608e95d9c0f36ce3cde59f470f

The discussion of truncating the visibility map is repeated twice,
once at the top under "pg_upgrade issues on big-endian machines" and
again at the bottom under "Updating".  We should only have it there
once.  Also, the SQL we're proposing should actually be tested by
someone before posting it - pg_truncate_visibility_map() takes a
mandatory argument, so calling it with no arguments will not work.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] FSM corruption leading to errors

2016-10-24 Thread Tom Lane
Alvaro Herrera  writes:
> Tom Lane wrote:
>> Also, we could at least discount the FSM root page and first intermediate
>> page, no?  That is, the upper limit could be
>> 
>> pg_relation_size(oid::regclass, 'fsm') / 2 - 
>> 2*current_setting('block_size')::BIGINT
>> 
>> I think this is a worthwhile improvement because it reduces the time spent
>> on small relations.  For me, the query as given takes 9 seconds to examine
>> the regression database, which seems like a lot.  Discounting two pages
>> reduces that to 20 ms.

> Hah, good one.  We spent some time thinking about subtracting some value
> to make the value more accurate but it didn't occur to me to just use
> constant two.

I got the arithmetic wrong in the above, it should be like

(pg_relation_size(oid::regclass, 'fsm') - 
2*current_setting('block_size')::BIGINT) / 2

With that, the runtime on HEAD's regression DB is about 700 ms, which is
still a nice win over 9000 ms.

I've put up draft wiki pages about these two problems at

https://wiki.postgresql.org/wiki/Free_Space_Map_Problems
https://wiki.postgresql.org/wiki/Visibility_Map_Problems

(thanks to Michael Paquier for initial work on the first one).
They're meant to be reasonably generic about FSM/VM problems
rather than only being about our current bugs.  Please review.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Remove extra comma at end of enum list

2016-10-24 Thread Alvaro Herrera
Fabien COELHO wrote:

> I find it annoying that "//" comments are not supported, or having to
> declare variables at the beginning of a block instead of when first used...

I think some c99 features would be nice (variadic macros for one), but
those particular two get a big "meh" from me.

.oO( I wonder if it'd be possible to make ereport() an inline function ... )

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] FSM corruption leading to errors

2016-10-24 Thread Alvaro Herrera
Tom Lane wrote:

> Ah, scratch that, after rereading the FSM README I see it's correct,
> because there's a binary tree within each page; I'd only remembered
> that there was a search tree of pages.
> 
> Also, we could at least discount the FSM root page and first intermediate
> page, no?  That is, the upper limit could be
> 
>   pg_relation_size(oid::regclass, 'fsm') / 2 - 
> 2*current_setting('block_size')::BIGINT
> 
> I think this is a worthwhile improvement because it reduces the time spent
> on small relations.  For me, the query as given takes 9 seconds to examine
> the regression database, which seems like a lot.  Discounting two pages
> reduces that to 20 ms.

Hah, good one.  We spent some time thinking about subtracting some value
to make the value more accurate but it didn't occur to me to just use
constant two.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Renaming of pg_xlog and pg_clog

2016-10-24 Thread Alvaro Herrera
Bruce Momjian wrote:
> On Mon, Oct 24, 2016 at 11:59:42AM -0300, Alvaro Herrera wrote:

> > There is one difference though, which is that the really destructive
> > use of pg_resetxlog is the one that removes pg_xlog files.  The other
> > uses that simply set flags in the control file are not as bad -- you can
> > simply go back to what the original value was.  I think we would only
> > need the obnoxious string requirement in the most dangerous mode.

> OK, but we are going to need to document that this special flag is only
> needed for certain option uses.

Perhaps, but not in too much detail; just that "users need to follow
on-screen indications" or some such would be enough.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Press Release Draft - 2016-10-27 Cumulative Update

2016-10-24 Thread Jonathan Katz
Hello,

Below is the draft of the press release for the update this Thursday:

https://git.postgresql.org/gitweb/?p=press.git;a=blob;f=update_releases/current/update_201610.md;h=aac4d0b36f3f3017d319ac617eff901efe0c10c0;hb=880dc99766ee0e608e95d9c0f36ce3cde59f470f
 


Per the draft of the release notes, I chose to expand on the data corruption 
issues after the release summary.  I did include the DISTINCT aggregate crashes 
at the top of the list of fixes but do you think those should be highlighted 
too?

If you have feedback, please leave it within the next 24 hours (following the
timeline here:
https://wiki.postgresql.org/wiki/UpdateReleaseDrafting#Tuesday_Night_or_Wednesday_Morning_.28US_Time.29)
so I can incorporate it prior to release.

Thanks!

Jonathan

Re: [HACKERS] FSM corruption leading to errors

2016-10-24 Thread Pavan Deolasee
On Mon, Oct 24, 2016 at 9:47 PM, Tom Lane  wrote:

>
>
> Also, we could at least discount the FSM root page and first intermediate
> page, no?  That is, the upper limit could be
>
> pg_relation_size(oid::regclass, 'fsm') / 2 -
> 2*current_setting('block_size')::BIGINT
>
>
+1 for doing that.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Add PGDLLEXPORT to PG_FUNCTION_INFO_V1

2016-10-24 Thread Albe Laurenz
Tom Lane wrote:
> I poked at this a little bit.  AFAICT, the only actual cross-file
> references are in contrib/ltree/, which has quite a few.  Maybe we
> could hold our noses and attach PGDLLEXPORT to the declarations in
> ltree.h.
> 
> hstore's HSTORE_POLLUTE macro would also need PGDLLEXPORT-ification,
> but that's just within the macro so it wouldn't be too ugly.
> 
> The other problem is xml2's xml_is_well_formed(), which duplicates
> the name of a core backend function.  If we PGDLLEXPORT-ify that,
> we get conflicts against the core's declaration in utils/xml.h.
> I do not see any nice way to dodge that.  Maybe we could decide that
> six releases' worth of backwards compatibility is enough and
> just drop it from xml2 --- AFAICS, that would only break applications
> that had never updated to the extension-ified version of xml2 at all.

I guess it would also be possible, but undesirable, to decorate the
declaration in utils/xml.h.

Anyway, I have prepared a patch along the lines you suggest.

Yours,
Laurenz Albe


0001-Add-PGDLLEXPORT-to-PG_FUNCTION_INFO_V1-function-decl.patch
Description: 0001-Add-PGDLLEXPORT-to-PG_FUNCTION_INFO_V1-function-decl.patch

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Logical decoding timeline following take II

2016-10-24 Thread Petr Jelinek
Hi Craig,

On 01/09/16 06:08, Craig Ringer wrote:
> Hi all
> 
> Attached is a rebased and updated logical decoding timeline following
> patch for 10.0.
> 
> This is a pre-requisite for the pending work on logical decoding on
> standby servers and simplified failover of logical decoding.
> 

I went over this and it looks fine to me. I only rebased the patch on
top of the current version (we renamed pg_xlog, which broke the tests) and
split the test harness into a separate patch. Otherwise I would consider
this ready for committer.

I think this should go in early so that there is enough time in the
cycle to uncover potential issues if there are any, even though it looks
all correct to me.

> 
> The test harness code will become unnecessary when proper support for
> logical failover or logical decoding on standby is added, so I'm not
> really sure it should be committed.

Yeah, as I said above, I split out the test harness for this reason. The
good thing is that when the follow-up patches get in, the test harness
should be easy to remove, as the changes are very localized.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services
From 3c775ee99820ea3e915e1a859fa399651e170ffc Mon Sep 17 00:00:00 2001
From: Petr Jelinek 
Date: Mon, 24 Oct 2016 17:40:40 +0200
Subject: [PATCH 1/2] Follow timeline switches in logical decoding

When decoding from a logical slot, it's necessary for xlog reading to
be able to read xlog from historical (i.e. not current) timelines.
Otherwise decoding fails after failover to a physical replica because
the oldest still-needed archives are in the historical timeline.

Supporting logical decoding timeline following is a pre-requisite for
logical decoding on physical standby servers. It also makes it
possible to promote a replica with logical slots to a master and
replay from those slots, allowing logical decoding applications to
follow physical failover.

Logical slots cannot actually be created on a replica without use of
the low-level C slot management APIs so this is mostly foundation work
for subsequent changes to enable logical decoding on standbys.

This commit includes a module in src/test/modules with functions to
manipulate the slots (which is not otherwise possible in SQL code) in
order to enable testing, and a new test in src/test/recovery to ensure
that the behavior is as expected.

Note that an earlier version of logical decoding timeline following
was committed to 9.5 as 24c5f1a103ce, 3a3b309041b0, 82c83b337202, and
f07d18b6e94d. It was then reverted by c1543a81a7a8 just after 9.5
feature freeze when issues were discovered too late to safely fix them
in the 9.5 release cycle.

The prior approach failed to consider that a record could be split
across pages that are on different segments, where the new segment
contains the start of a new timeline. In that case the old segment
might be missing or renamed with a .partial suffix.

This patch reworks the logic to be page-based and in the process
simplifies how the last timeline for a segment is looked up.

Slot timeline following only works in a backend. Frontend support can
be added separately, where it could be useful for pg_xlogdump etc. once
support for timeline.c, List, etc. is added for frontend code.
---
 src/backend/access/transam/xlogutils.c | 207 +++--
 src/backend/replication/logical/logicalfuncs.c |  12 +-
 src/include/access/xlogreader.h|  11 ++
 3 files changed, 209 insertions(+), 21 deletions(-)

diff --git a/src/backend/access/transam/xlogutils.c b/src/backend/access/transam/xlogutils.c
index 51a8e8d..014978f 100644
--- a/src/backend/access/transam/xlogutils.c
+++ b/src/backend/access/transam/xlogutils.c
@@ -19,6 +19,7 @@
 
 #include <unistd.h>
 
+#include "access/timeline.h"
 #include "access/xlog.h"
 #include "access/xlog_internal.h"
 #include "access/xlogutils.h"
@@ -660,6 +661,7 @@ XLogRead(char *buf, TimeLineID tli, XLogRecPtr startptr, Size count)
 	/* state maintained across calls */
 	static int	sendFile = -1;
 	static XLogSegNo sendSegNo = 0;
+	static TimeLineID sendTLI = 0;
 	static uint32 sendOff = 0;
 
 	p = buf;
@@ -675,7 +677,8 @@ XLogRead(char *buf, TimeLineID tli, XLogRecPtr startptr, Size count)
 		startoff = recptr % XLogSegSize;
 
 		/* Do we need to switch to a different xlog segment? */
-		if (sendFile < 0 || !XLByteInSeg(recptr, sendSegNo))
+		if (sendFile < 0 || !XLByteInSeg(recptr, sendSegNo) ||
+			sendTLI != tli)
 		{
 			char		path[MAXPGPATH];
 
@@ -702,6 +705,7 @@ XLogRead(char *buf, TimeLineID tli, XLogRecPtr startptr, Size count)
 					 path)));
 		}
 		sendOff = 0;
+		sendTLI = tli;
 

Re: [HACKERS] Patch: Implement failover on libpq connect level.

2016-10-24 Thread Robert Haas
On Wed, Oct 19, 2016 at 7:26 PM, Peter van Hardenberg  wrote:
> Supporting different ports on different servers would be a much appreciated
> feature (I can't remember if it was Kafka or Cassandra that didn't do this
> and it was very annoying.)
>
> Remember, as the connection string gets more complicated, psql supports the
> Postgres URL format as a single command-line argument and we may want to
> begin encouraging people to use that syntax instead.

While I was experimenting with this today, I discovered a problem of
interpretation related to IPv6 addresses.  Internally, a postgresql://
URL and a connection string are converted into the same format, so
postgresql://a,b/ means just the same thing as host=a,b.  I thought
that we could similarly decide that postgresql://a:123,b:456/ is going to
get translated to host=a:123,b:456 and then there can be further code
to parse that into a list of host-and-port pairs.  However, if you do
that, then something like host=1:2:3::4:5:6 is fundamentally
ambiguous.  That :6 is equally valid either as part of the IP address
or as a trailing port number specification, and there's no a priori
way to know which it is.  Today, since the host part can't include a
port specifier, it's regarded as part of the IP address, and I think
it would probably be a bad idea to change that, as I believe Victor's
patch would.  He seems to have it in mind that we could allow things
like host=[1:2:3::4:5:6] or host=[1:2:3::4:5]:6, which might be
helpful for the future but doesn't avoid changing the meaning of
connection strings that work today.

So now I think that to make this work correctly, we're going to need
to change both the URL parser and also add parsing for the host and
port.  Let's say the user says this:

postgresql://[1::2]:3,[4::5],[6::7]:8/

What I think we need to do is translate that into this:

host=1::2,4::5,6::7 port=3,,8

Note the double-comma, indicating a blank port number for the second
URL component.  When we parse those host and port strings, we can
match up each host with the corresponding port number again.  Of
course, the user could also skip the URL format and directly specify a
connection string.  And then they might write one where the host and
port parts don't have the same number of components, like this:

host=a,b,c port=3,4
or
host=a,b port=3,4,5

It is obvious what is meant if multiple hosts are given but only a
single port number is specified; it is also obvious what is meant if
the number of ports is equal to the number of hosts.  It is not
obvious what it means if there are multiple ports but the number
doesn't equal the number of hosts.  I think we can either treat that
case as an error or else do the following: if there are extra port
specifiers, ignore them; if there are extra host specifiers, use the
last port in the list for all of the remaining hosts.
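
To make that pairing rule concrete, here is a minimal sketch in C (the
function name and structure are hypothetical, not taken from the patch).
It assumes the host and port strings have already been split on ',' and
simply picks the port string for the i-th host; an empty string would
mean "use the default port", as in the ",," case above.

static const char *
port_for_host(char **ports, int nports, int nhosts, int i)
{
	if (nports == 1)
		return ports[0];		/* a single port applies to every host */
	if (i < nports)
		return ports[i];		/* pair up positionally; extra ports are simply ignored */
	return ports[nports - 1];	/* extra hosts reuse the last port in the list */
}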

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] FSM corruption leading to errors

2016-10-24 Thread Tom Lane
Michael Paquier  writes:
> Release notes refer to this page for methods to fix corrupted instances:
> https://wiki.postgresql.org/wiki/Free_Space_Map_Problems
> I just typed something based on Pavan's upper method, feel free to
> jump in and improve it as necessary.

Thanks for drafting that, but isn't the query wrong?
Specifically the bit about

SELECT blkno, pg_freespace(oid::regclass, blkno)
FROM generate_series(pg_relation_size(oid::regclass) / 
current_setting('block_size')::BIGINT,
 pg_relation_size(oid::regclass, 'fsm') / 2) AS blkno

It looks to me like this is approximating the highest block number that
could possibly have an FSM entry as size of the FSM fork (in bytes)
divided by 2.  But the FSM stores one byte per block.  There is overhead
for the FSM search tree, but in a large relation it's not going to be as
much as a factor of 2.  So I think that to be conservative we need to
drop the "/ 2".  Am I missing something?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] FSM corruption leading to errors

2016-10-24 Thread Tom Lane
I wrote:
> It looks to me like this is approximating the highest block number that
> could possibly have an FSM entry as size of the FSM fork (in bytes)
> divided by 2.  But the FSM stores one byte per block.  There is overhead
> for the FSM search tree, but in a large relation it's not going to be as
> much as a factor of 2.  So I think that to be conservative we need to
> drop the "/ 2".  Am I missing something?

Ah, scratch that, after rereading the FSM README I see it's correct,
because there's a binary tree within each page; I'd only remembered
that there was a search tree of pages.

Also, we could at least discount the FSM root page and first intermediate
page, no?  That is, the upper limit could be

pg_relation_size(oid::regclass, 'fsm') / 2 - 
2*current_setting('block_size')::BIGINT

I think this is a worthwhile improvement because it reduces the time spent
on small relations.  For me, the query as given takes 9 seconds to examine
the regression database, which seems like a lot.  Discounting two pages
reduces that to 20 ms.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] FSM corruption leading to errors

2016-10-24 Thread Pavan Deolasee
On Mon, Oct 24, 2016 at 9:34 PM, Tom Lane  wrote:

>
> SELECT blkno, pg_freespace(oid::regclass, blkno)
> FROM generate_series(pg_relation_size(oid::regclass) /
> current_setting('block_size')::BIGINT,
>  pg_relation_size(oid::regclass, 'fsm') / 2) AS
> blkno
>
> It looks to me like this is approximating the highest block number that
> could possibly have an FSM entry as size of the FSM fork (in bytes)
> divided by 2.  But the FSM stores one byte per block.  There is overhead
> for the FSM search tree, but in a large relation it's not going to be as
> much as a factor of 2.  So I think that to be conservative we need to
> drop the "/ 2".  Am I missing something?
>
>
I went by these comments in fsm_internals.h, which suggest that
SlotsPerFSMPage is limited to somewhat less than BLCKSZ divided by 2.

/*
 * Number of non-leaf and leaf nodes, and nodes in total, on an FSM page.
 * These definitions are internal to fsmpage.c.
 */
#define NodesPerPage (BLCKSZ - MAXALIGN(SizeOfPageHeaderData) - \
  offsetof(FSMPageData, fp_nodes))

#define NonLeafNodesPerPage (BLCKSZ / 2 - 1)
#define LeafNodesPerPage (NodesPerPage - NonLeafNodesPerPage)

/*
 * Number of FSM "slots" on a FSM page. This is what should be used
 * outside fsmpage.c.
 */
#define SlotsPerFSMPage LeafNodesPerPage


Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] [PATCH] Send catalog_xmin separately in hot standby feedback

2016-10-24 Thread Petr Jelinek
Hi Craig,

On 05/09/16 11:28, Craig Ringer wrote:
> On 5 September 2016 at 14:44, Craig Ringer  wrote:
>> On 5 September 2016 at 12:40, Craig Ringer  wrote:
>>> Hi all
>>>
>>> Currently hot standby feedback sends GetOldestXmin()'s result to the
>>> upstream as the required xmin. GetOldestXmin() returns a slot's
>>> catalog_xmin if that's the lowest xmin on the system.
>>
>>
>> Note that this patch changes the API to GetOldestXmin(), adding a new
>> boolean to allow it to disregard the catalog_xmin of slots.
>>
>> Per Simon's feedback I'm going to split that out into a separate
>> patch, so will post a follow-up split one soon as the series.
> 

Here is my review of them.

> Now formatted a series:
> 
> 1.  Send catalog_xmin in hot standby feedback protocol

> + xmin_epoch = nextEpoch;
>   if (nextXid < xmin)
> - nextEpoch--;
> + xmin_epoch --;
> + catalog_xmin_epoch = nextEpoch;
> + if (nextXid < catalog_xmin)
> + catalog_xmin_epoch --;

I don't understand why you keep nextEpoch here; it's not used anywhere
that I can see. You could just as well use xmin_epoch directly, if
that's how you want to name it.

> + /*
> +  * A 10.0+ standby's walsender passes the lowest catalog xmin of any
> +  * replication slot up to the master.
> +  */
> + feedbackCatalogXmin = pq_getmsgint(&reply_message, 4);
> + feedbackCatalogEpoch = pq_getmsgint(&reply_message, 4);
>  

In this comment I'd be more interested to know why this is sent, rather
than merely that it's sent since version 10+.

> 2.  Make walsender respect catalog_xmin in hot standby feedback messages

> + if (TransactionIdIsNormal(feedbackCatalogXmin)
> + && TransactionIdPrecedes(feedbackCatalogXmin, 
> feedbackXmin))
> + {
> + MyPgXact->xmin = feedbackCatalogXmin;
> + }
> + else
> + {
> + MyPgXact->xmin = feedbackXmin;
> + }

This is not how we usually use the {} brackets (there are some more
instances of using them for one line of code in this particular commit).

> 3.  Allow GetOldestXmin(...) to optionally disregard the catalog_xmin
> By ignoring slot catalog_xmin in the GetOldestXmin() call then
> separately calling ProcArrayGetReplicationSlotXmin() to get the
> catalog_xmin to  we might produce a catalog xmin slightly later than
> what was in the procarray at the time we previously called
> GetOldestXmin() to examine backend/session state. ProcArrayLock is
> released so it can be advanced in-between the calls. That's harmless -
> it isn't necessary for the reported catalog_xmin to be exactly
> consistent with backend state. If it advances it's safe to report the
> new position since we know the confirmed positions are on-disk
> locally.
> 
> The alternative here is to extend GetOldestXmin() to take an out-param
> to report the slot catalog xmin. That really just duplicates  the
> functionality of ProcArrayGetReplicationSlotXmin but means we can grab
> it within a single ProcArray lock. Variants of GetOldestXmin and
> ProcArrayGetReplicationSlotXmin that take an already-locked flag would
> work too, but would hold the lock across parts of GetOldestXmin() that
> currently don't retain it. I could also convert the current boolean
> param ignoreVacuum into a flags argument instead of adding another
> boolean. No real preference from me.
> 

I would honestly prefer changing GetOldestXmin() to return the
catalog_xmin. It seems both cleaner and does less locking.
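
To make that concrete, here is a rough sketch of the out-parameter
variant discussed above (hypothetical name and placeholder body, not the
actual patch); the point is simply that the backend xmin computation and
the slot catalog_xmin are read under a single acquisition of
ProcArrayLock:

TransactionId
GetOldestXminExtended(Relation rel, bool ignoreVacuum,
					  TransactionId *slot_catalog_xmin)
{
	TransactionId result;

	LWLockAcquire(ProcArrayLock, LW_SHARED);

	/* ... the existing GetOldestXmin() scan of the proc array ... */
	result = InvalidTransactionId;		/* placeholder for that computation */

	if (slot_catalog_xmin != NULL)
		*slot_catalog_xmin = procArray->replication_slot_catalog_xmin;

	LWLockRelease(ProcArrayLock);

	return result;
}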

> 4.  Send catalog_xmin separately in hot_standby_feedback messages
> 

This looks okay (provided the change above).

In general it's a simpler patch than I expected, which is good. But it
would be good to have some tests.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Renaming of pg_xlog and pg_clog

2016-10-24 Thread Bruce Momjian
On Mon, Oct 24, 2016 at 11:59:42AM -0300, Alvaro Herrera wrote:
> Bruce Momjian wrote:
> > On Sat, Oct 22, 2016 at 11:18:28PM +0300, Greg Stark wrote:
> 
> > > I think the apt-get behaviour was specifically designed to ensure it
> > > couldn't easily be put into a script which I would have said was
> > > desirable -- except I suspect there are situations where Postgres
> > > database scripts need to do a resetxlog. I'm not sure I can think of
> > > any examples offhand but I wouldn't be too surprised.
> > 
> > Yes, pg_upgrade has eight calls to pg_resetxlog to set various values.
> 
> There is one difference though, which is that the really destructive
> use of pg_resetxlog is the one that removes pg_xlog files.  The other
> uses that simply set flags in the control file are not as bad -- you can
> simply go back to what the original value was.  I think we would only
> need the obnoxious string requirement in the most dangerous mode.
> 
> Alternatively, we could separate the tool into two different executables.

OK, but we are going to need to document that this special flag is only
needed for certain option uses.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Renaming of pg_xlog and pg_clog

2016-10-24 Thread Alvaro Herrera
Bruce Momjian wrote:
> On Sat, Oct 22, 2016 at 11:18:28PM +0300, Greg Stark wrote:

> > I think the apt-get behaviour was specifically designed to ensure it
> > couldn't easily be put into a script which I would have said was
> > desirable -- except I suspect there are situations where Postgres
> > database scripts need to do a resetxlog. I'm not sure I can think of
> > any examples offhand but I wouldn't be too surprised.
> 
> Yes, pg_upgrade has eight calls to pg_resetxlog to set various values.

There is one difference though, which is that the really destructive
use of pg_resetxlog is the one that removes pg_xlog files.  The other
uses that simply set flags in the control file are not as bad -- you can
simply go back to what the original value was.  I think we would only
need the obnoxious string requirement in the most dangerous mode.

Alternatively, we could separate the tool into two different executables.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hash Indexes

2016-10-24 Thread Amit Kapila
On Mon, Oct 24, 2016 at 8:00 PM, Amit Kapila  wrote:
> On Thu, Sep 29, 2016 at 6:04 AM, Robert Haas  wrote:
>> On Wed, Sep 28, 2016 at 3:04 PM, Robert Haas  wrote:
>
>
> Thanks for the valuable feedback.
>

Forgot to mention that, in addition to fixing the review comments, I
made an additional change to skip dead tuples while copying tuples
from the old bucket to the new bucket during a split.  This was
previously not possible because split and scan were blocking
operations (split used to take an Exclusive lock on the bucket and scan
used to hold a Share lock on the bucket until the operation ended), but
now it is possible, and during a scan some of the tuples can be marked
as dead.  Similarly, during the squeeze operation, dead tuples are
skipped while moving tuples across buckets.
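
In essence, the copy loop during a bucket split can now do something like
the following (a sketch of the idea only, not the exact hunk from the
patch):

	/* While copying tuples from the old bucket page to the new bucket... */
	if (ItemIdIsDead(itemid))
		continue;			/* skip tuples a scan has already marked dead */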

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] repmgr: this node should be a standby

2016-10-24 Thread rugging24
Hello,
I am using repmgr 3.2 (PostgreSQL 9.6.0), postgresql-9.5.
I have a master, a slave and a third server for my setup. The third server
is used to house the repmgr database.
My config is as below:

=========== MASTER ===========
cluster=test
node=1
node_name=node1
conninfo='host=192.168.2.33 user=repuser password=repuser
dbname=replication_manager connect_timeout=10'
loglevel=INFO
logfacility=STDERR
logfile='/var/log/postgresql/repmgr.log'
pg_bindir=/usr/lib/postgresql/9.5/bin/

#service control commands

service_start_command = service postgresql start
service_stop_command = service postgresql stop
service_restart_command = service postgresql restart
service_reload_command = service postgresql reload
service_promote_command = pg_ctlcluster 9.5 main promote

pg_basebackup_options='--xlog-method=s'


=========== Slave ===========
cluster=test
node=2
node_name=node2
conninfo='host=192.168.2.33 user=repuser password=repuser
dbname=replication_manager connect_timeout=10'
loglevel=INFO
logfacility=STDERR
logfile='/var/log/postgresql/repmgr.log'
pg_bindir=/usr/lib/postgresql/9.5/bin/

#service control commands

service_start_command = service postgresql start
service_stop_command = service postgresql stop
service_restart_command = service postgresql restart
service_reload_command = service postgresql reload
service_promote_command = pg_ctlcluster 9.5 main promote

pg_basebackup_options='--xlog-method=s'


=========== The third server ===========
This holds the repmgr database.


- I was able to register the master to the cluster 
master node correctly registered for cluster test with id 1 (conninfo: host=
192.168.2.31 user=repuser password=repuser dbname=repmgr connect_timeout=10)


But when I tried registering the slave node, I got the following error:

this node should be a standby (host= 192.168.2.33 user=repuser
password=repuser dbname=replication_manager connect_timeout=10)


Is there a way to run repmgr with the repmgr table on a separate server?


regards

rugging24



--
View this message in context: 
http://postgresql.nabble.com/repmgr-this-node-should-be-a-standby-tp5927408.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] On conflict update & hint bits

2016-10-24 Thread Kevin Grittner
On Sun, Oct 23, 2016 at 3:42 PM, Tom Lane  wrote:

> The business about not throwing a serialization failure due to actions
> of our own xact does seem worth fixing now, but if I understand correctly,
> we can deal with that by adding a test for xmin-is-our-own-xact into
> the existing code structure.  I propose doing that much (and adding
> Munro's regression test case) and calling it good for today.

Thanks.  This is the only part of it that I consider an actual
*bug* (since you can retry the serialization failure forever and
never move on because there is no other transaction involved which
can finish to clear the problem) as opposed to an opportunity to
optimize (by reducing false positive serialization failures
actually involving other transactions).  I never saw anything in
the literature addressing the intersection of UPSERT and SSI, and I
think we do need to think about and discuss this a bit more before
we can be sure of the best fix.  This is probably not thread on
which to have that discussion.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Renaming of pg_xlog and pg_clog

2016-10-24 Thread Bruce Momjian
On Sat, Oct 22, 2016 at 11:18:28PM +0300, Greg Stark wrote:
> On Fri, Oct 21, 2016 at 9:03 PM, Stephen Frost  wrote:
> > WARNING: The following essential packages will be removed.
> > This should NOT be done unless you know exactly what you are doing!
> >   login
> > 0 upgraded, 0 newly installed, 1 to remove and 71 not upgraded.
> > After this operation, 1,212 kB disk space will be freed.
> > You are about to do something potentially harmful.
> > To continue type in the phrase 'Yes, do as I say!'
> 
> Another case was:
>glxgears -iacknowledgethatthistoolisnotabenchmark
> 
> Which was required to get it to print the FPS for a while. The problem
> -- and also the advantage -- of this is that it's scriptable. That
> means people can still put it in recipe books and scripts and others
> can copy it without being aware what it does or even that they're
> doing it.
> 
> I think the apt-get behaviour was specifically designed to ensure it
> couldn't easily be put into a script which I would have said was
> desirable -- except I suspect there are situations where Postgres
> database scripts need to do a resetxlog. I'm not sure I can think of
> any examples offhand but I wouldn't be too surprised.

Yes, pg_upgrade has eight calls to pg_resetxlog to set various value.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] issue with track_commit_timestamp and server restart

2016-10-24 Thread Tom Lane
Alvaro Herrera  writes:
> Now let's discuss a release note entry for 9.5 and 9.6, shall we?

The text in your commit message seems sufficient from here.
On it now (although if somebody doesn't fix anon git PDQ, we're
not releasing today anyway).

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] issue with track_commit_timestamp and server restart

2016-10-24 Thread Alvaro Herrera
Now let's discuss a release note entry for 9.5 and 9.6, shall we?

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] issue with track_commit_timestamp and server restart

2016-10-24 Thread Alvaro Herrera
Craig Ringer wrote:

> I initially couldn't see this when tested on 8f1fb7d with a
> src/test/recovery/t test script. But it turns out it's OK on immediate
> shutdown and crash recovery, but not on clean shutdown and restart.

Oh, oops.

> The attached patch adds a TAP test to src/test/recovery to show this.

I moved it to src/test/modules/commit_ts and pushed.  src/test/recovery
is not currently run by buildfarm, except hamster, unless the bf client
has been patched since I last looked.  But that's not the reason I moved
it; rather, the other location seemed better.  I also fixed
the wrongly pasted comment at the top of the file while at it.

If this disrupts the buildfarm in any way I will just revert it
immediately.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] On conflict update & hint bits

2016-10-24 Thread Tom Lane
Konstantin Knizhnik  writes:
> Just for information: I know that you are working on this issue, but since
> we need to proceed further with our testing of multimaster,
> I have made the following obvious changes and they fix the problem (at
> least this assertion failure does not happen any more):

This seems kind of the hard way --- why didn't you put the buffer lock
calls into ExecCheckHeapTupleVisible?

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=8f1fb7d621b0e6bd2eb0ba2ac9634c5b5a03564b
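
For reference, the shape of that suggestion is roughly the following
(a sketch only, with the error path elided; details may differ from the
committed fix): take the share lock on the buffer just around the
visibility test inside ExecCheckHeapTupleVisible(), so callers don't
have to manage the lock themselves.

static void
ExecCheckHeapTupleVisible(EState *estate, HeapTuple tuple, Buffer buffer)
{
	if (!IsolationUsesXactSnapshot())
		return;

	/*
	 * Hold a share lock on the buffer content while examining tuple
	 * visibility; a pin alone is not enough for
	 * HeapTupleSatisfiesVisibility().
	 */
	LockBuffer(buffer, BUFFER_LOCK_SHARE);
	if (!HeapTupleSatisfiesVisibility(tuple, estate->es_snapshot, buffer))
	{
		/* ... report the serialization failure, as before ... */
	}
	LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
}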

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Implement failover on libpq connect level.

2016-10-24 Thread Victor Wagner
On Wed, 19 Oct 2016 20:15:38 -0400
Robert Haas  wrote:

> 
> Some more comments:
> 
> - I am pretty doubtful that the changes to connectOptions2() work as
> intended.  I think that's only going to be called before actualhost
> and actualport could possibly get set.  I don't think we keep
> reinvoking this function for every new host we try.

I will investigate this. Probably test cases for the .pgpass file should
be added.

 
> - It's pretty clear that this isn't going to work if the host list
> includes a mix of hostnames and UNIX socket addresses.  The code that
> handles the UNIX socket address case is totally separate from the code
> that handles the multiple-hostname case.

The multihost feature was never intended to work with Unix-domain sockets.
Maybe this should be clarified in the docs, or an improved error
message emitted when a Unix socket occurs in the multiple-host list.
 
> - This creates a sort of definitional problem for functions like
> PQhost().  The documentation says of this and other related functions:
> "These values are fixed for the life of the PGconn object."  But with
> this patch, that would no longer be true.  So at least the
> documentation needs an update.  Actually, though, I think the problem

I've missed this phrase; the documentation should be updated.

> is more fundamental than that.  If the user asks for PQhost(conn), do
> they want the list of all hosts, or the one to which we're currently
> connected?  It might be either, depending on what they intend to do
> with the information.  If they mean to construct a new connection
> string, they might well want them all; but something like psql's
> \conninfo only shows the current one.  There's also the question of
> what "the current one" means if we're not connected to any server at
> all.  I'm not sure exactly how to solve these problems, but they need
> some thought.

I thought that no one would need to reconstruct a connection string from
the conninfo object.  Maybe I've missed some use cases.

The name PQhost suggests that it will return just one host, so I decided
to interpret it as the 'currently chosen host'.

If we really need a function which lists all the potentially connected
hosts, it should be a new function.

I hope this would introduce no backward incompatibility, because no
existing setup uses multihost connect strings.

Thanks for your critique.

Victor.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Implement failover on libpq connect level.

2016-10-24 Thread Victor Wagner
On Mon, 24 Oct 2016 11:36:31 +0100
Thom Brown  wrote:


> > - It's pretty clear that this isn't going to work if the host list
> > includes a mix of hostnames and UNIX socket addresses.  The code
> > that handles the UNIX socket address case is totally separate from
> > the code that handles the multiple-hostname case.  
> 
> So should it be the case that it disallows UNIX socket addresses
> entirely?  I can't imagine a list of UNIX socket addresses being that
> useful.

Really, the patch was implemented as TCP-only from the beginning.

I know only one case where a list of UNIX sockets is useful: the test suite.
I had to include modifications to PostgresNode.pm in the patch to
allow testing with TCP sockets.

Also, I can imagine that people would want to include one Unix socket in a
list of TCP ones, i.e. one of the replicas running on the same host as the
application server. But the patch doesn't support it. Maybe it should be
clarified in the documentation that this is not supported.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Aggregate Push Down - Performing aggregation on foreign server

2016-10-24 Thread Jeevan Chalke
On Sat, Oct 22, 2016 at 9:09 PM, Tom Lane  wrote:

> brolga is still not terribly happy with this patch: it's choosing not to
> push down the aggregates in one of the queries.  While I failed to
> duplicate that result locally, investigation suggests that brolga's result
> is perfectly sane; in fact it's not very clear why we're not getting that
> from multiple critters, because the plan brolga is choosing is not
> inferior to the expected one.
>
> The core of the problem is this subquery:
>
> contrib_regression=# explain verbose select min(13), avg(ft1.c1),
> sum(ft2.c1) from ft1 right join ft2 on (ft1.c1 = ft2.c1) where ft1.c1 = 12;
>
>  QUERY PLAN
> 
> 
> -
>  Foreign Scan  (cost=108.61..108.64 rows=1 width=44)
>Output: (min(13)), (avg(ft1.c1)), (sum(ft2.c1))
>Relations: Aggregate on ((public.ft1) INNER JOIN (public.ft2))
>Remote SQL: SELECT min(13), avg(r1."C 1"), sum(r2."C 1") FROM ("S 1"."T
> 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r2."C 1" = 12)) AND ((r1."C 1" =
> 12
> (4 rows)
>
> If you look at the estimate to just fetch the data, it's:
>
> contrib_regression=# explain verbose select ft1.c1, ft2.c1 from ft1 right
> join ft2 on (ft1.c1 = ft2.c1) where ft1.c1 = 12;
>   QUERY PLAN
> 
> --
>  Foreign Scan  (cost=100.55..108.62 rows=1 width=8)
>Output: ft1.c1, ft2.c1
>Relations: (public.ft1) INNER JOIN (public.ft2)
>Remote SQL: SELECT r1."C 1", r2."C 1" FROM ("S 1"."T 1" r1 INNER JOIN
> "S 1"."T 1" r2 ON (((r2."C 1" = 12)) AND ((r1."C 1" = 12
> (4 rows)
>
> Note we're expecting only one row out of the join.  Now the cost of doing
> three aggregates on a single row of input is not a lot.  Comparing these
> local queries:
>
> regression=# explain select min(13),avg(q1),sum(q2) from int8_tbl where
> q2=456;
>   QUERY PLAN
> ---
>  Aggregate  (cost=1.07..1.08 rows=1 width=68)
>->  Seq Scan on int8_tbl  (cost=0.00..1.06 rows=1 width=16)
>  Filter: (q2 = 456)
> (3 rows)
>
> regression=# explain select (q1),(q2) from int8_tbl where q2=456;
>QUERY PLAN
> -
>  Seq Scan on int8_tbl  (cost=0.00..1.06 rows=1 width=16)
>Filter: (q2 = 456)
> (2 rows)
>
> we seem to have startup = input cost + .01 and then another .01
> for total.  So the estimate to do the above remote scan and then
> aggregate locally should have been 108.63 startup and 108.64 total,
> give or take.  The estimate for aggregating remotely is a hair better,
> but it's not nearly better enough to guarantee that the planner won't
> see it as fuzzily the same cost.
>
> In short: the problem with this test case is that it's considering
> aggregation over only a single row, which is a situation in which
> pushing down the aggregate actually doesn't save us anything, because
> we're retrieving one row from the remote either way.  So it's not at all
> surprising that we don't get a stable plan choice.  The test query needs
> to be adjusted so that the aggregation is done over multiple rows,
> allowing fdw_tuple_cost to kick in and provide some daylight between
> the cost estimates.
>

Attached is a patch which performs aggregation over 1000 rows, as suggested by Tom.

I believe it will have a stable output/plan now.

Thanks


>
> regards, tom lane
>



-- 
Jeevan B Chalke
Principal Software Engineer, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company


agg_push_down_fix_testcase.patch
Description: binary/octet-stream

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Implement failover on libpq connect level.

2016-10-24 Thread Thom Brown
On 20 October 2016 at 01:15, Robert Haas  wrote:
>
> On Wed, Oct 19, 2016 at 12:04 PM, Robert Haas  wrote:
> > On Thu, Oct 13, 2016 at 5:53 AM, Victor Wagner  wrote:
> >> On Thu, 13 Oct 2016 12:30:59 +0530
> >> Mithun Cy  wrote:
> >>> On Fri, Sep 30, 2016 at 2:14 PM, Victor Wagner 
> >>> wrote:
> >>> Okay but for me consistency is also important. Since we agree to
> >>> disagree on some of the comments and others have not expressed any
> >>> problem I am moving it to committer.
> >>
> >> Thank you for your efforts improving my patch
> >
> > I haven't deeply reviewed this patch, but on a quick read-through it
> > certainly seems to need a lot of cleanup work.
>
> Some more comments:
>
> - I am pretty doubtful that the changes to connectOptions2() work as
> intended.  I think that's only going to be called before actualhost
> and actualport could possibly get set.  I don't think we keep
> reinvoking this function for every new host we try.
>
> - It's pretty clear that this isn't going to work if the host list
> includes a mix of hostnames and UNIX socket addresses.  The code that
> handles the UNIX socket address case is totally separate from the code
> that handles the multiple-hostname case.

So should it be the case that it disallows UNIX socket addresses
entirely?  I can't imagine a list of UNIX socket addresses being that
useful.

Thom


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-10-24 Thread Dilip Kumar
On Fri, Oct 21, 2016 at 7:57 AM, Dilip Kumar  wrote:
> On Thu, Oct 20, 2016 at 9:03 PM, Tomas Vondra
>  wrote:
>
>> In the results you've posted on 10/12, you've mentioned a regression with 32
>> clients, where you got 52k tps on master but only 48k tps with the patch (so
>> ~10% difference). I have no idea what scale was used for those tests,
>
> That test was with scale factor 300 on POWER 4 socket machine. I think
> I need to repeat this test with multiple reading to confirm it was
> regression or run to run variation. I will do that soon and post the
> results.

As promised, I have rerun my test (3 times), and I did not see any regression.
The median of the 3 runs is the same on both head and with the group lock patch.
However, I am posting the results of all three runs.

I think in my earlier reading we saw TPS ~48K with the patch, but over
multiple runs we get this reading with both head as well as with the
patch.

Head:

run1:

transaction type: 
scaling factor: 300
query mode: prepared
number of clients: 32
number of threads: 32
duration: 1800 s
number of transactions actually processed: 87784836
latency average = 0.656 ms
tps = 48769.327513 (including connections establishing)
tps = 48769.543276 (excluding connections establishing)

run2:
transaction type: 
scaling factor: 300
query mode: prepared
number of clients: 32
number of threads: 32
duration: 1800 s
number of transactions actually processed: 91240374
latency average = 0.631 ms
tps = 50689.069717 (including connections establishing)
tps = 50689.263505 (excluding connections establishing)

run3:
transaction type: 
scaling factor: 300
query mode: prepared
number of clients: 32
number of threads: 32
duration: 1800 s
number of transactions actually processed: 90966003
latency average = 0.633 ms
tps = 50536.639303 (including connections establishing)
tps = 50536.836924 (excluding connections establishing)

With group lock patch:
--
run1:
transaction type: 
scaling factor: 300
query mode: prepared
number of clients: 32
number of threads: 32
duration: 1800 s
number of transactions actually processed: 87316264
latency average = 0.660 ms
tps = 48509.008040 (including connections establishing)
tps = 48509.194978 (excluding connections establishing)

run2:
transaction type: 
scaling factor: 300
query mode: prepared
number of clients: 32
number of threads: 32
duration: 1800 s
number of transactions actually processed: 91950412
latency average = 0.626 ms
tps = 51083.507790 (including connections establishing)
tps = 51083.704489 (excluding connections establishing)

run3:
transaction type: 
scaling factor: 300
query mode: prepared
number of clients: 32
number of threads: 32
duration: 1800 s
number of transactions actually processed: 90378462
latency average = 0.637 ms
tps = 50210.225983 (including connections establishing)
tps = 50210.405401 (excluding connections establishing)

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Push down more full joins in postgres_fdw

2016-10-24 Thread Etsuro Fujita

On 2016/10/22 17:19, Ashutosh Bapat wrote:

Review for postgres-fdw-more-full-join-pushdown-v6 patch.

The patch applies cleanly and regression is clean (make check in
regress directory and that in postgres_fdw).


Thanks for the review!


Here are some comments.
1. Because of the following code change, for a joinrel we might end up using
attrs_used, which will be garbage for a join rel. The assumption that tlist cannot
be NIL for a join relation is wrong. Even for a join relation, tlist can be
NIL.  Consider the query 'explain verbose select count(*) from ft1, ft2'. Since
count(*) doesn't need any columns, the tlist is NIL. With the patch the server
crashes for this query.
-if (foreignrel->reloptkind == RELOPT_JOINREL)
+/*
+ * Note: tlist for a base relation might be non-NIL.  For example, if the
+ * base relation is an operand of a foreign join performing a full outer
+ * join and has non-NIL remote_conds, the base relation will be deparsed
+ * as a subquery, so the tlist for the base relation could be non-NIL.
+ */
+if (tlist != NIL)
 {
-/* For a join relation use the input tlist */

We cannot decide whether to use deparseExplicitTargetList or to
deparse from attrs_used based on the tlist. When deparsing a JOIN query,
the SELECT lists, whether of the topmost SELECT or of a subquery (even one
on a base relation), need to use deparseExplicitTargetList.


Will fix.


2. The code in deparseRangeTblRef() dealing with subqueries is very similar to
deparseSelectStmtForRel(). The only thing deparseRangeTblRef() does not require
is appending the ORDER BY clause, which is determined by the existence of the
pathkeys argument. Probably we should reuse deparseSelectStmtForRel() instead of
duplicating the code. This will also make the current changes to
deparseSelectSql unnecessary.


Will do.


3. Why do we need following change? The elog() just few lines above says that
we expect only Var nodes. Why then we use deparseExpr() instead of
deparseVar()?
 if (i > 0)
 appendStringInfoString(buf, ", ");
-deparseVar(var, context);
+deparseExpr((Expr *) var, context);

 *retrieved_attrs = lappend_int(*retrieved_attrs, i + 1);

And I get the answer for that question, somewhere down in the patch
+	/*
+	 * If the given expression is an output column of a subquery-in-FROM,
+	 * deparse the alias to the expression instead.
+	 */
+	if (IsA(node, Var))
+	{
+		int			tabno;
+		int			colno;
+
+		if (isSubqueryExpr(node, context->foreignrel, &tabno, &colno))
+		{
+			appendStringInfo(context->buf, "%s%d.%s%d",
+							 SS_TAB_ALIAS_PREFIX, tabno,
+							 SS_COL_ALIAS_PREFIX, colno);
+			return;
+		}
+	}
+

Functionally, this code belongs to deparseVar() and not deparseExpr(). Like all
other functions which handle Expr nodes, deparseExpr() is expected to be a
dispatcher to the node specific function.


OK, will change that way.


4. This comment is good to have there, but is unrelated to this patch. Maybe a
separate patch?
+/* Don't generate bad syntax if no columns */

5. The changed comment doesn't say anything different from the original comment. It
may have been good to have it this way to start with, but changing it now
doesn't bring anything new. You seem to have just merged the prologue of the
function with a comment in that function. I think this change is unnecessary.
- * The function constructs ... JOIN ... ON ... for join relation. For a base
- * relation it just returns schema-qualified tablename, with the appropriate
- * alias if so requested.
+ * For a join relation the clause of the following form is appended to buf:
+ * ((outer relation)  (inner relation) ON (joinclauses))
+ * For a base relation the function just adds the schema-qualified tablename,
+ * with the appropriate alias if so requested.
+/* Don't generate bad syntax if no columns */

6. The change may be good but is unrelated to this patch. Maybe material for a
separate patch. There are a few such changes in this function. While these
changes may be good by themselves, they distract the reviewer from the goal of the
patch.
-appendStringInfo(buf, "(");
+appendStringInfoChar(buf, '(');


OK on #4, #5, and #6; will remove all the changes.  I will leave those
for separate patches.



7. I don't understand why you need to change this function so much. Take for
example the following change.
-RelOptInfo *rel_o = fpinfo->outerrel;
-RelOptInfo *rel_i = fpinfo->innerrel;
-StringInfoData join_sql_o;
-StringInfoData join_sql_i;
+/* Begin the FROM clause entry */
+appendStringInfoChar(buf, '(');

 /* Deparse outer relation */
-initStringInfo(_sql_o);
-deparseFromExprForRel(_sql_o, root, rel_o, true, params_list);
+deparseRangeTblRef(buf, root,
+   fpinfo->outerrel,
+   

[HACKERS] Microvacuum support for Hash Index

2016-10-24 Thread Ashutosh Sharma
Hi All,

I have added microvacuum support for the hash index access method, and
attached is the v1 patch for the same. The patch basically takes care
of the following things:

1. Firstly, it changes the marking of dead tuples from
'tuple-at-a-time' to 'page-at-a-time' during a hash index scan. For this
we accumulate the heap TIDs and offsets of all the hash index tuples
flagged by kill_prior_tuple during the scan, and then mark all
accumulated TIDs as LP_DEAD either while stepping from one page to
another (for scans in both the forward and backward direction), at the
end of the hash index scan, or during a rescan.

2. Secondly, when inserting a tuple into the hash index, if not enough
space is found on the current page, it ensures that we first clean
any dead tuples found in the current hash index page before moving
to the next page in the bucket chain or going for a bucket split. This
basically increases page reusability and reduces the number of
page splits, thereby reducing the overall size of the hash index.

I have compared the hash index size with and without my patch
(microvacuum_hash_index_v1.patch, attached with this mail) on a high-end
machine at various scale factors, and the results are shown below.
For this test, I have created a hash index (pgbench_accounts_aid) on the
aid column of the 'pgbench_accounts' table instead of the primary key, and
the results shown in the table below are for that setup. The patch
(pgbench.patch) containing these changes is also attached with this mail.
Moreover, I am using my own script file (file_hash_kill_prior_tuple)
for updating the index column with the pgbench read-write command. The
script file 'file_hash_kill_prior_tuple' is also attached with this
mail.

Here are some initial test results showing the benefit of this patch:

postgresql.conf and pgbench settings:
autovacuum=off
client counts = 64
run time duration = 15 mins

./pgbench -c $threads -j $threads -T 900 postgres -f
~/file_hash_kill_prior_tuple

Scale Factor    Hash index size @ start    HEAD       HEAD + Patch
10              32 MB                      579 MB     158 MB
50              128 MB                     630 MB     350 MB
100             256 MB                     1255 MB    635 MB
300             1024 MB                    2233 MB    1093 MB


As shown in the above results, at scale factor 10 the hash index size is
reduced by almost 4 times, whereas at scale factors 50 and 300 it is
reduced by about half with my patch. This basically shows that we can
reduce the hash index size to a good extent with this patch.

System specifications:
-
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                128
On-line CPU(s) list:   0-127
Thread(s) per core:    2
Core(s) per socket:    8
Socket(s):             8
NUMA node(s):          8
Vendor ID:             GenuineIntel


Note: The patch (microvacuum_hash_index_v1.patch) is prepared on top
of concurrent_hash_index_v8.patch-[1] and wal_hash_index_v5.1.patch[2]
for hash index.


[1] - 
https://www.postgresql.org/message-id/CAA4eK1%2BX%3D8sUd1UCZDZnE3D9CGi9kw%2Bkjxp2Tnw7SX5w8pLBNw%40mail.gmail.com
[2] - 
https://www.postgresql.org/message-id/CAA4eK1KE%3D%2BkkowyYD0vmch%3Dph4ND3H1tViAB%2B0cWTHqjZDDfqg%40mail.gmail.com
diff --git a/src/backend/access/hash/hash.c b/src/backend/access/hash/hash.c
index db73f05..a0720ef 100644
--- a/src/backend/access/hash/hash.c
+++ b/src/backend/access/hash/hash.c
@@ -325,14 +325,21 @@ hashgettuple(IndexScanDesc scan, ScanDirection dir)
 		if (scan->kill_prior_tuple)
 		{
 			/*
-			 * Yes, so mark it by setting the LP_DEAD state in the item flags.
+			 * Yes, so remember it for later. (We'll deal with all such
+			 * tuples at once right after leaving the index page or at
+			 * end of scan.)
 			 */
-			ItemIdMarkDead(PageGetItemId(page, offnum));
+			if (so->killedItems == NULL)
+so->killedItems = palloc(MaxIndexTuplesPerPage *
+		 sizeof(HashScanPosItem));
 
-			/*
-			 * Since this can be redone later if needed, mark as a hint.
-			 */
-			MarkBufferDirtyHint(buf, true);
+			if (so->numKilled < MaxIndexTuplesPerPage)
+			{
+so->killedItems[so->numKilled].heapTid = so->hashso_heappos;
+so->killedItems[so->numKilled].indexOffset =
+			ItemPointerGetOffsetNumber(&(so->hashso_curpos));
+so->numKilled++;
+			}
 		}
 
 		/*
@@ -439,6 +446,9 @@ hashbeginscan(Relation rel, int nkeys, int norderbys)
 
 	so->hashso_skip_moved_tuples = false;
 
+	so->killedItems = NULL;
+	so->numKilled = 0;
+
 	scan->opaque = so;
 
 	return scan;
@@ -454,6 +464,10 @@ hashrescan(IndexScanDesc scan, ScanKey scankey, int nscankeys,
 	HashScanOpaque so = (HashScanOpaque) scan->opaque;
 	Relation	rel = scan->indexRelation;
 
+	/* Before leaving current page, deal with any killed items */
+	if (so->numKilled > 0)
+		hashkillitems(scan);
+
 	_hash_dropscanbuf(rel, so);
 
 	/* set 

Re: [HACKERS] issue with track_commit_timestamp and server restart

2016-10-24 Thread Julien Rouhaud
On 24/10/2016 06:58, Craig Ringer wrote:
> On 22 October 2016 at 19:51, Julien Rouhaud  wrote:
>> I just noticed that if track_commit_timestamp is enabled, the
>> oldestCommitTsXid and newestCommitTsXid don't persist after a server
>> restart, so you can't ask for the commit ts of a transaction that
>> committed before the last server start, although the information is
>> still available (unless of course if a freeze occured).  AFAICT it
>> always behave this way.
> 
> I initially couldn't see this when tested on 8f1fb7d with a
> src/test/recovery/t test script. But it turns out it's OK on immediate
> shutdown and crash recovery, but not on clean shutdown and restart.
> 
> The attached patch adds a TAP test to src/test/recovery to show this.
> If you run the TAP test before recompiling with the fix it'll fail.
> "make" to apply the fix, then re-run and it'll succeed. Or just
> temporarily roll back the fix with:
> 
> git checkout HEAD^1 -- src/backend/access/transam/commit_ts.c
> git reset src/backend/access/transam/commit_ts.c
> 
> and rebuild to see it fail.
> 
> To save time running the recovery suite, just
> 
>rm src/test/recovery/00[0-8]*.pl
> 
> (It'd be nice to have a prove_check target to run just one test file).
> 
> This would explain a few issues I've seen reported with BDR from the
> community which I've so far been unable to reproduce, so thanks very
> much for the report.
> 
> Álvaro, would you mind checking this and committing to HEAD and 9.6?
> The commits.c change only should also be committed to 9.5, but not the
> TAP test.
> 

Thanks a lot for the review, and adding the tests!


-- 
Julien Rouhaud
http://dalibo.com - http://dalibo.org


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rename max_parallel_degree?

2016-10-24 Thread Amit Kapila
On Thu, Oct 13, 2016 at 5:28 AM, Robert Haas  wrote:
> On Wed, Oct 12, 2016 at 8:13 AM, Peter Eisentraut
>  wrote:
>> On 10/4/16 10:16 AM, Julien Rouhaud wrote:
>>> Please find attached v9 of the patch, adding the parallel worker class
>>> and changing max_worker_processes default to 16 and max_parallel_workers
>>> to 8.  I also added Amit's explanation for the need of a write barrier
>>> in ForgetBackgroundWorker().
>>
>> This approach totally messes up the decoupling between the background
>> worker facilities and consumers of those facilities.  Having dozens of
>> lines of code in bgworker.c that does the accounting and resource
>> checking on behalf of parallel.c looks very suspicious.  Once we add
>> logical replication workers or whatever, we'll be tempted to add even
>> more stuff in there and it will become a mess.
>
> I attach a new version of the patch that I've been hacking on in my
> "spare time", which reduces the footprint in bgworker.c quite a bit.
>

Couple of comments -

@@ -370,6 +388,9 @@ ForgetBackgroundWorker(slist_mutable_iter *cur)

 	Assert(rw->rw_shmem_slot < max_worker_processes);
 	slot = &BackgroundWorkerData->slot[rw->rw_shmem_slot];
+	if ((rw->rw_worker.bgw_flags & BGWORKER_CLASS_PARALLEL) != 0)
+		BackgroundWorkerData->parallel_terminate_count++;
+
 	slot->in_use = false;

It seems that upthread [1] we agreed to have a write barrier before
setting slot->in_use, but I don't see that in the patch.
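
For illustration, the ordering agreed on upthread would look roughly like
this (a sketch only, not the committed code):

	if ((rw->rw_worker.bgw_flags & BGWORKER_CLASS_PARALLEL) != 0)
		BackgroundWorkerData->parallel_terminate_count++;

	/* Make the counter update visible before the slot is marked free. */
	pg_write_barrier();

	slot->in_use = false;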


Isn't it better to keep planner related checks from Julien's patch,
especially below one:

--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -249,6 +249,7 @@ standard_planner(Query *parse, int cursorOptions,
ParamListInfo boundParams)
  parse->utilityStmt == NULL &&
  !parse->hasModifyingCTE &&
  max_parallel_workers_per_gather > 0 &&
+ max_parallel_workers > 0 &&
  !IsParallelWorker() &&
  !IsolationIsSerializable())

> I don't think it's wrong that the handling is done there, though.  The
> process that is registering the background worker is well-placed to
> check whether there are already too many, and if it does not then the
> slot is consumed at least temporarily even if it busts the cap.  On
> the flip side, the postmaster is the only process that is well-placed
> to know when a background worker terminates.  The worker process
> itself can't be made responsible for it, as you suggest below, because
> it may never even start up in the first place (e.g. fork() returns
> EAGAIN).  And the registering process can't be made responsible,
> because it might die before the worker.
>
>> I think we should think of a way where we can pass the required
>> information during background worker setup, like a pointer to the
>> resource-limiting GUC variable.
>
> I don't think this can work, per the above.
>

I see the value in Peter's point, but could not think of a way to
implement the same.


[1] - 
https://www.postgresql.org/message-id/CA%2BTgmoZS7DFFGz7kKWLoTAsNnXRKUiQFDt5yHgf4L73z52dgAQ%40mail.gmail.com

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUG] pg_basebackup from disconnected standby fails

2016-10-24 Thread Michael Paquier
On Mon, Oct 24, 2016 at 1:26 PM, Amit Kapila  wrote:
> On Fri, Oct 21, 2016 at 10:32 PM, Robert Haas  wrote:
>> 2. In perform_base_backup(), if the endptr returned by
>> do_pg_stop_backup() precedes the end of the checkpoint record returned
>> by do_pg_start_backup(), use the latter value instead.  Unfortunately,
>> that's not so easy: we can't just say if (endptr < starptr) startptr =
>> endptr; because startptr is the *start* of the checkpoint record, not
>> the end.  I suspect somebody could figure out a solution to this
>> problem, though.
>>
>
> With this approach, don't we need something similar for API's
> pg_stop_backup()/pg_stop_backup_v2()?

Yes, I think so. That would sort of map with the idea I mentioned
upthread to have pg_stop_backup() return the contents of the control
file and have the caller write it to the backup by itself.

>> If we decide we don't want to aim for one of these tighter solutions
>> and just adopt the previously-discussed patch, then at the very least
>> it needs better comments.
>
> +1.

Yeah, here is an attempt at doing that:
-* We return the current minimum recovery point as the backup end
+* We return the current last replayed point as the backup end
 * location. Note that it can be greater than the exact backup end
-* location if the minimum recovery point is updated after the backup of
+* location if the last replayed point is updated after the backup of
 * pg_control. This is harmless for current uses.
 *
+* Using the last replayed point as the backup end location ensures that
+* the end location will never be older than the start position, something
+* that could happen if for example minRecoveryPoint is used as backup
+* end location when it never gets updated because no buffer flushes
+* occurred. By using the last replay location, note that the backup may
+* include more WAL segments than necessary. If the additional WAL
+* replayed since minRecoveryPoint does not include a checkpoint, there
+* is actually no need for it. Even if it does include a checkpoint,
+* only the portion up to the checkpoint itself is necessary and not
+* the WAL generated beyond that. Still, in the case of a backup taken
+* from a standby, with its master disconnected, this ensures that the
+* backup is valid.
+*

Thoughts welcome.
-- 
Michael


backup-standby-v3.patch
Description: invalid/octet-stream



Re: [HACKERS] On conflict update & hint bits

2016-10-24 Thread Konstantin Knizhnik



On 24.10.2016 00:49, Peter Geoghegan wrote:

On Sun, Oct 23, 2016 at 2:46 PM, Tom Lane  wrote:

What's bothering me is (a) it's less than 24 hours to release wrap and
(b) this patch changes SSI-relevant behavior and hasn't been approved
by Kevin.  I'm not familiar enough with that logic to commit a change
in it on my own authority, especially with so little time for problems
to be uncovered.

I should point out that I knew that the next set of point releases had
been moved forward much later than you did. I had to work on this fix
during the week, which was pretty far from ideal for me for my own
reasons.

Just for information: I know that you are working on this issue, but as
we need to proceed further with our testing of multimaster, I have made the
following obvious changes, and they fix the problem (at least this assertion
failure no longer happens):


src/backend/executor/nodeModifyTable.c

@@ -1087,6 +1087,13 @@ ExecOnConflictUpdate(ModifyTableState *mtstate,
 test = heap_lock_tuple(relation, &tuple, estate->es_output_cid,
                        lockmode, LockWaitBlock, false, &buffer,
                        &hufd);
+/*
+ * We must hold share lock on the buffer content while examining tuple
+ * visibility.  Afterwards, however, the tuples we have found to be
+ * visible are guaranteed good as long as we hold the buffer pin.
+ */
+LockBuffer(buffer, BUFFER_LOCK_SHARE);
+
 switch (test)
 {
 case HeapTupleMayBeUpdated:
@@ -1142,6 +1149,7 @@ ExecOnConflictUpdate(ModifyTableState *mtstate,
  * loop here, as the new version of the row might not conflict
  * anymore, or the conflicting tuple has actually been deleted.
  */
+LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
 ReleaseBuffer(buffer);
 return false;

@@ -1175,6 +1183,8 @@ ExecOnConflictUpdate(ModifyTableState *mtstate,
 /* Store target's existing tuple in the state's dedicated slot */
 ExecStoreTuple(&tuple, mtstate->mt_existing, buffer, false);

+LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
+
 /*
  * Make tuple and any needed join variables available to ExecQual and
  * ExecProject.  The EXCLUDED tuple is installed in ecxt_innertuple, while


--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: [HACKERS] [BUG] pg_basebackup from disconnected standby fails

2016-10-24 Thread Michael Paquier
On Mon, Oct 24, 2016 at 1:39 PM, Amit Kapila  wrote:
> On Mon, Oct 24, 2016 at 9:18 AM, Kyotaro HORIGUCHI
>  wrote:
>> Thank you for looking and retelling this.

+1.

>> At Fri, 21 Oct 2016 13:02:21 -0400, Robert Haas  
>> wrote in 
>>> I can think of two solutions that would be "tighter":
>>>
>>> 1. When performing a restartpoint, update the minimum recovery point
>>> to just beyond the checkpoint record.  I think this can't hurt anyone
>>> who is actually restarting recovery on the same machine, because
>>> that's exactly the point where they are going to start recovery.  A
>>> minimum recovery point which precedes the point at which they will
>>> start recovery is no better than one which is equal to the point at
>>> which they will start recovery.
>>
>> This is a candidate that I thought of. But I avoided to change
>> the behavior of minRecoveryPoint that (seems to me that) it
>> describes only buffer state. So checkpoint with no update doesn't
>> advances minRecoveryPoint as my understanding.
>
> I think what you are saying is not completely right, because we do
> update minRecoveryPoint when we don't perform a new restart point.
> When we perform restart point, then it assumes that flushing the
> buffers will take care of updating minRecoveryPoint.

Yep, minRecoveryPoint still gets updated when the last checkpoint
record is the last restart point, to prevent a hot standby from allowing
read-only connections at an LSN earlier than the last shutdown.
Anyway, we can clearly reject 1. in light of
https://www.postgresql.org/message-id/caa4ek1kmjtsxqf0cav7cs4d4vwv2h_pc8d8q1bucqdzaf+7...@mail.gmail.com
when playing with different stop locations during recovery.
-- 
Michael




Re: [HACKERS] Minor code improvement to postgresGetForeignJoinPaths

2016-10-24 Thread Ashutosh Bapat
On Mon, Oct 24, 2016 at 7:56 AM, Tatsuro Yamada
 wrote:
> Hi,
>
> The last argument of create_foreignscan_path called by 
> postgresGetForeignJoinPaths is
> set to NULL, but it would be suitable to set it to NIL because the argument 
> type is List.
>
> Please find attached a patch.
>
You are right. Every call except that one uses NIL, so better to
fix it. The pattern was repeated in the recent aggregate pushdown
support. Here's a patch that fixes the create_foreignscan_path() call in
add_foreign_grouping_paths() as well.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 906d6e6..2cfb82b 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -4424,21 +4424,21 @@ postgresGetForeignJoinPaths(PlannerInfo *root,
 	 */
 	joinpath = create_foreignscan_path(root,
 	   joinrel,
 	   NULL,	/* default pathtarget */
 	   rows,
 	   startup_cost,
 	   total_cost,
 	   NIL,		/* no pathkeys */
 	   NULL,	/* no required_outer */
 	   epq_path,
-	   NULL);	/* no fdw_private */
+	   NIL);	/* no fdw_private */
 
 	/* Add generated path into joinrel by add_path(). */
 	add_path(joinrel, (Path *) joinpath);
 
 	/* Consider pathkeys for the join relation */
 	add_paths_with_pathkeys_for_rel(root, joinrel, epq_path);
 
 	/* XXX Consider parameterized paths for the join relation */
 }
 
@@ -4741,21 +4741,21 @@ add_foreign_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
 	/* Create and add foreign path to the grouping relation. */
 	grouppath = create_foreignscan_path(root,
 		grouped_rel,
 		grouping_target,
 		rows,
 		startup_cost,
 		total_cost,
 		NIL,	/* no pathkeys */
 		NULL,	/* no required_outer */
 		NULL,
-		NULL);	/* no fdw_private */
+		NIL);	/* no fdw_private */
 
 	/* Add generated path into grouped_rel by add_path(). */
 	add_path(grouped_rel, (Path *) grouppath);
 }
 
 /*
  * Create a tuple from the specified row of the PGresult.
  *
  * rel is the local representation of the foreign table, attinmeta is
  * conversion data for the rel's tupdesc, and retrieved_attrs is an



Re: [HACKERS] [COMMITTERS] pgsql: Remove extra comma at end of enum list

2016-10-24 Thread Fabien COELHO



C99-specific feature, and wasn't intentional in the first place.

Per buildfarm member mylodon


This is the second time I see that in the last couple of weeks. Would
it be better to use some custom CFLAGS to detect that earlier in the
review process? I have never much used gcc's std or clang extensions
in this area. Has somebody any recommendations?


My 0.02€: I've used the following for years without problems, but it would
need testing with several versions of gcc & clang before committing:


  sh> gcc -std=c89/... ...
  sh> clang -std=c89/... ...
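
For example, one way to try this (a sketch, assuming the usual configure
workflow; pick whichever standard you want to test against):

  sh> ./configure CC=gcc CFLAGS="-O2 -std=c89"
  sh> make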

And some free advice, hoping not to start a flame war:

Even as conservative and old-fashioned as I am, I was 18 in 1989 and had a
driving license. In 2017 C99 will also turn 18 and might be considered old
and stable enough for pg :-)


I find it annoying that "//" comments are not supported, and that variables
have to be declared at the beginning of a block instead of where they are
first used...


--
Fabien.


Re: [HACKERS] Gather Merge

2016-10-24 Thread Rushabh Lathia
On Thu, Oct 20, 2016 at 1:12 PM, Amit Kapila 
wrote:

> On Tue, Oct 18, 2016 at 5:29 PM, Rushabh Lathia
>  wrote:
> > On Mon, Oct 17, 2016 at 2:26 PM, Amit Kapila 
> > wrote:
> >>
> >> There is lot of common code between ExecGatherMerge and ExecGather.
> >> Do you think it makes sense to have a common function to avoid the
> >> duplicity?
> >>
> >> I see there are small discrepancies in both the codes like I don't see
> >> the use of single_copy flag, as it is present in gather node.
> >>
> >
> > Yes, even I thought to centrilize some things of ExecGather and
> > ExecGatherMerge,
> > but its really not something that is fixed. And I thought it might change
> > particularly
> > for the Gather Merge. And as explained by Robert single_copy doesn't make
> > sense
> > for the Gather Merge. I will still look into this to see if something
> can be
> > make
> > centralize.
> >
>
> Okay, I haven't thought about it, but do let me know if you couldn't
> find any way to merge the code.
>
>
Sure, I will look into this.


> >>
> >> 3.
> >> +gather_merge_readnext(GatherMergeState * gm_state, int reader, bool
> >> force)
> >> {
> >> ..
> >> + tup = gm_readnext_tuple(gm_state, reader, force, NULL);
> >> +
> >> + /*
> >> +
> >>  * try to read more tuple into nowait mode and store it into the tuple
> >> + * array.
> >> +
> >>  */
> >> + if (HeapTupleIsValid(tup))
> >> + fill_tuple_array(gm_state, reader);
> >>
> >> How the above read tuple is stored in array?  In anycase the above
> >> interface seems slightly awkward to me. Basically, I think what you
> >> are trying to do here is after reading first tuple in waitmode, you
> >> are trying to fill the array by reading more tuples.  So, can't we
> >> push reading of this fist tuple into that function and name it as
> >> form_tuple_array().
> >>
> >
> > Yes, you are right.
> >
>
> You have not answered my first question.  I will try to ask again, how
> the tuple read by below code is stored in the array:
>
>
The tuple is stored directly into the related TupleTableSlot rather than
the array: at the end of gather_merge_readnext() the TupleTableSlot is built
for the given tuple, so the tuple read by the above code goes straight into
that slot.
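
Roughly, the end of gather_merge_readnext() does something like the
following (a simplified sketch of the patch's logic, not the exact code):

    /* Build this reader's slot directly from the tuple just read. */
    ExecStoreTuple(tup,                         /* tuple read in wait mode */
                   gm_state->gm_slots[reader],  /* per-reader slot */
                   InvalidBuffer,               /* not a buffer tuple */
                   true);                       /* free the tuple with the slot */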

>> + tup = gm_readnext_tuple(gm_state, reader, force, NULL);
>
> > First its trying to read tuple into wait-mode, and once
> > it
> > find tuple then its try to fill the tuple array (which basically try to
> read
> > tuple
> > into nowait-mode). Reason I keep it separate is because in case of
> > initializing
> > the gather merge, if we unable to read tuple from all the worker - while
> > trying
> > re-read, we again try to fill the tuple array for the worker who already
> > produced
> > atleast a single tuple (see gather_merge_init() for more details).
> >
>
> Whenever any worker produced one tuple, you already try to fill the
> array in gather_merge_readnext(), so does the above code help much?
>
> > Also I
> > thought
> > filling tuple array (which basically read tuple into nowait mode) and
> > reading tuple
> > (into wait-mode) are two separate task - and if its into separate
> function
> > that code
> > look more clear.
>
> To me that looks slightly confusing.
>
> > If you have any suggestion for the function name
> > (fill_tuple_array)
> > then I am open to change that.
> >
>
> form_tuple_array (form_tuple is used at many places in code, so it
> should look okay).


OK, I will rename it in the next patch.


> I think you might want to consider forming array
> even for leader, although it might not be as beneficial as for
> workers, OTOH, the code will get simplified if we do that way.
>

Yes, I did that earlier, and as you guessed it is not really beneficial,
so to avoid the extra memory allocation for the tuple array I am not
forming the array for the leader.

Today, I observed another issue in code:
>
> +gather_merge_init(GatherMergeState *gm_state)
> {
> ..
> +reread:
> + for (i = 0; i < nreaders + 1; i++)
> + {
> + if (TupIsNull(gm_state->gm_slots[i]) ||
> + gm_state->gm_slots[i]->tts_isempty)
> + {
> + if (gather_merge_readnext(gm_state, i, initialize ? false : true))
> + {
> + binaryheap_add_unordered(gm_state->gm_heap,
> + Int32GetDatum(i));
> + }
> + }
> + else
> + fill_tuple_array(gm_state, i);
> + }
> + initialize = false;
> +
> + for (i = 0; i < nreaders; i++)
> + if (TupIsNull(gm_state->gm_slots[i]) || gm_state->gm_slots[i]->tts_isempty)
> + goto reread;
> ..
> }
>
> This code can cause infinite loop.  Consider a case where one of the
> worker doesn't get any tuple because by the time it starts all the
> tuples are consumed by all other workers. The above code will keep on
> looping to fetch the tuple from that worker whereas that worker can't
> return any tuple.  I think you can fix it by checking if the
> particular queue has been exhausted.
>
>
Oh yes. I will work on the fix and submit the next set of patches soon.
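
Something along these lines, perhaps (a rough sketch only; the per-reader
"done" flag is hypothetical and not part of the current patch):

    /* Re-poll only readers that are neither filled nor known to be exhausted. */
    for (i = 0; i < nreaders; i++)
    {
        if (gm_state->gm_tuple_queue_done[i])   /* hypothetical flag */
            continue;
        if (TupIsNull(gm_state->gm_slots[i]) || gm_state->gm_slots[i]->tts_isempty)
            goto reread;
    }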


> >> >
> >> > Open Issue:
> >> >
> >> > - Commit