Re: [HACKERS] Documentation fixes for pg_visibility

2016-06-22 Thread Michael Paquier
On Thu, Jun 23, 2016 at 1:46 PM, Michael Paquier  wrote:
> On Thu, Jun 23, 2016 at 1:42 PM, Michael Paquier  wrote:
>> While looking at the module I found two mistakes in the docs: the
>> variants of pg_visibility_map and pg_visibility that do *not* take a
>> block number as input are SRFs and return a set of records. The
>> documentation just lists them with "returns record". A patch is attached.
>
> And that: s/PD_ALL_VISIBILE/PD_ALL_VISIBLE.

And would it actually make sense to have pg_check_frozen(IN regclass,
IN blkno) target only a certain page? The same goes for pg_check_visible.
It would take a long time to run those functions on large tables, as they
scan all the pages of a relation at once.
-- 
Michael




Re: [HACKERS] Documentation fixes for pg_visibility

2016-06-22 Thread Michael Paquier
On Thu, Jun 23, 2016 at 1:42 PM, Michael Paquier  wrote:
> While looking at the module I found two mistakes in the docs: the
> variants of pg_visibility_map and pg_visibility that do *not* take a
> block number as input are SRFs and return a set of records. The
> documentation just lists them with "returns record". A patch is attached.

And that: s/PD_ALL_VISIBILE/PD_ALL_VISIBLE.
-- 
Michael


docs-visibility.patch
Description: invalid/octet-stream



[HACKERS] Documentation fixes for pg_visibility

2016-06-22 Thread Michael Paquier
Hi,

While looking at the module I found two mistakes in the docs: the
variants of pg_visibility_map and pg_visibility that do *not* take a
block number as input are SRFs and return a set of records. The
documentation just lists them with "returns record". A patch is attached.
Thanks,
-- 
Michael


docs-visibility.patch
Description: invalid/octet-stream



Re: [HACKERS] Parallelized polymorphic aggs, and aggtype vs aggoutputtype

2016-06-22 Thread David Rowley
On 23 June 2016 at 11:22, Tom Lane  wrote:
> While working on that, I noticed what seems to me to be a minor bug.
> The behavior that I'd expect (and that I documented) for a deserialization
> function is that it just allocates its result in the current, short-lived
> memory context, since it will be the combine function's responsibility to
> merge that into the long-lived transition state.  But it looks to me like
> the deserialization functions in numeric.c are allocating their results
> in the aggregate context, which will mean a leak.  (For example,
> numeric_avg_deserialize creates its result using makeNumericAggState
> which forces the result into the agg context.)

Yes, you're right.

In the end I decided to add a makeNumericAggStateCurrentContext()
function which does not perform any memory context switching at all.
It seems like this can be used for the combine functions too, since
they've already switched to the aggregate memory context. This should
save a few cycles during aggregate combine, without expending any extra
ones as some alternatives would, such as adding a flag to
makeNumericAggState().
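
For anyone following along, a rough sketch of the two shapes is below
(simplified; the exact signatures and struct fields in numeric.c may differ
from what is shown here):

/*
 * Existing constructor (simplified sketch): it switches into the
 * aggregate memory context before allocating, which is why the
 * deserialization functions were leaking into that context.
 */
static NumericAggState *
makeNumericAggState(FunctionCallInfo fcinfo, bool calcSumX2)
{
    NumericAggState *state;
    MemoryContext agg_context;
    MemoryContext old_context;

    if (!AggCheckCallContext(fcinfo, &agg_context))
        elog(ERROR, "aggregate function called in non-aggregate context");

    old_context = MemoryContextSwitchTo(agg_context);
    state = (NumericAggState *) palloc0(sizeof(NumericAggState));
    state->calcSumX2 = calcSumX2;
    MemoryContextSwitchTo(old_context);

    return state;
}

/*
 * New variant (sketch): no context switch at all, so a caller such as a
 * deserialization or combine function gets its allocation in whatever
 * memory context is current at the time of the call.
 */
static NumericAggState *
makeNumericAggStateCurrentContext(bool calcSumX2)
{
    NumericAggState *state;

    state = (NumericAggState *) palloc0(sizeof(NumericAggState));
    state->calcSumX2 = calcSumX2;

    return state;
}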


-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


deserialize_memctx_fix.patch
Description: Binary data



Re: [HACKERS] Hash Indexes

2016-06-22 Thread Amit Kapila
On Wed, Jun 22, 2016 at 8:48 PM, Robert Haas  wrote:
> On Wed, Jun 22, 2016 at 5:14 AM, Amit Kapila  wrote:
>> We can do it in the way as you are suggesting, but there is another thing
>> which we need to consider here.  As of now, the patch tries to finish the
>> split if it finds split-in-progress flag in either old or new bucket.  We
>> need to lock both old and new buckets to finish the split, so it is quite
>> possible that two different backends try to lock them in opposite order
>> leading to a deadlock.  I think the correct way to handle is to always try
>> to lock the old bucket first and then new bucket.  To achieve that, if the
>> insertion on new bucket finds that split-in-progress flag is set on a
>> bucket, it needs to release the lock and then acquire the lock first on old
>> bucket, ensure pincount is 1 and then lock new bucket again and ensure that
>> pincount is 1. I have already maintained the order of locks in scan (old
>> bucket first and then new bucket; refer changes in _hash_first()).
>> Alternatively, we can try to  finish the splits only when someone tries to
>> insert in old bucket.
>
> Yes, I think locking buckets in increasing order is a good solution.

Okay.

> I also think it's fine to only try to finish the split when the insert
> targets the old bucket.  Finishing the split enables us to remove
> tuples from the old bucket, which lets us reuse space instead of
> accelerating more.  So there is at least some potential benefit to the
> backend inserting into the old bucket.  On the other hand, a process
> inserting into the new bucket derives no direct benefit from finishing
> the split.
>

Makes sense; I will change it that way and add a comment explaining why
we do this only for the old bucket.
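
To make the lock ordering concrete, here is a rough sketch of the rule
(illustrative only: _hash_finish_split is a hypothetical helper name here,
and the pin-count checks and error handling are omitted):

/*
 * Sketch of the deadlock-avoidance rule discussed above: both buckets are
 * always locked in the same order, old bucket first, so two backends
 * trying to finish the same split cannot lock them in opposite orders
 * and deadlock.
 */
static void
finish_incomplete_split(Relation rel, Buffer old_buf, Buffer new_buf)
{
    /*
     * We arrive here holding an exclusive lock on the new bucket, having
     * seen its split-in-progress flag.  Give that lock up so the locks
     * can be taken in the canonical order.
     */
    LockBuffer(new_buf, BUFFER_LOCK_UNLOCK);

    LockBuffer(old_buf, BUFFER_LOCK_EXCLUSIVE);     /* old bucket first */
    LockBuffer(new_buf, BUFFER_LOCK_EXCLUSIVE);     /* then new bucket */

    /*
     * With both buckets locked (and, per the discussion, each pin count
     * verified to be 1), the incomplete split can now be finished.
     */
    _hash_finish_split(rel, old_buf, new_buf);      /* hypothetical helper */
}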

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Hash Indexes

2016-06-22 Thread Amit Kapila
On Wed, Jun 22, 2016 at 8:44 PM, Robert Haas  wrote:
> On Wed, Jun 22, 2016 at 5:10 AM, Amit Kapila  wrote:
>>>
>>> I think this is basically correct, although I don't find it to be as
>>> clear as I think it could be.  It seems very clear that any operation
>>> which potentially changes the order of tuples in the bucket chain,
>>> such as the squeeze phase as currently implemented, also needs to
>>> exclude all concurrent scans.  However, I think that it's OK for
>>> vacuum to remove tuples from a given page with only an exclusive lock
>>> on that particular page.
>>
>> How can we guarantee that it doesn't remove a tuple that is required by scan
>> which is started after split-in-progress flag is set?
>
> If the tuple is being removed by VACUUM, it is dead.  We can remove
> dead tuples right away, because no MVCC scan will see them.  In fact,
> the only snapshot that will see them is SnapshotAny, and there's no
> problem with removing dead tuples while a SnapshotAny scan is in
> progress.  It's no different than heap_page_prune() removing tuples
> that a SnapshotAny sequential scan was about to see.
>
> If the tuple is being removed because the bucket was split, it's only
> a problem if the scan predates setting the split-in-progress flag.
> But since your design involves out-waiting all scans currently in
> progress before setting that flag, there can't be any scan in progress
> that hasn't seen it.
>

For the above cases, just an exclusive lock will work.

>  A scan that has seen the flag won't look at the
> tuple in any case.
>

Why so?  Assume a scan started on a new bucket on which the
split-in-progress flag was set; it will not look at tuples that are
marked as moved-by-split in this bucket, because it assumes it will find
all such tuples in the old bucket.  Now, if we allow vacuum or someone
else to remove tuples from the old bucket with just an exclusive lock,
it is quite possible that the scan misses a tuple in the old bucket that
vacuum removed.

>>> (Plain text email is preferred to HTML on this mailing list.)
>>>
>>
>> If I turn to Plain Text [1], then the signature of my e-mail also changes to
>> Plain Text, which I don't want.  Is there a way I can retain the signature
>> settings in Rich Text and the mail content as Plain Text?
>
> Nope, but I don't see what you are worried about.  There's no HTML
> content in your signature anyway except for a link, and most
> mail-reading software will turn that into a hyperlink even without the
> HTML.
>

Okay, I didn't know that mail-reading software does that.  Thanks for
pointing that out.

-- 

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Questionable description in datatype.sgml

2016-06-22 Thread Bruce Momjian
On Sat, Jun 18, 2016 at 11:58:58AM -0400, Tom Lane wrote:
> Tatsuo Ishii  writes:
> > In "8.13.2. Encoding Handling"
> >
> > When using binary mode to pass query parameters to the server
> > and query results back to the client, no character set conversion
> > is performed, so the situation is different.  In this case, an
> > encoding declaration in the XML data will be observed, and if it
> > is absent, the data will be assumed to be in UTF-8 (as required by
> > the XML standard; note that PostgreSQL does not support UTF-16).
> > On output, data will have an encoding declaration
> > specifying the client encoding, unless the client encoding is
> > UTF-8, in which case it will be omitted.
> >
> 
> > In the first sentence shouldn't "no character set conversion" be "no
> > encoding conversion"? PostgreSQL is doing client/server encoding
> > conversion, rather than character set conversion.
> 
> I think the text is treating "character set conversion" as meaning
> the same thing as "encoding conversion"; certainly I've never seen
> any place in our docs that draws a distinction between those terms.
> If you think there is a difference, maybe we need to define those
> terms somewhere.

Uh, I think Unicode is a character set, and UTF8 is an encoding.  I
think Tatsuo is right here.
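For instance, 'é' is the single code point U+00E9 in the Unicode character
set, but the bytes on the wire depend on the encoding: C3 A9 in UTF-8
versus 00 E9 in UTF-16.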

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +




Re: [HACKERS] sslmode=require fallback

2016-06-22 Thread Bruce Momjian
On Thu, Jun 16, 2016 at 10:42:56AM +0200, Magnus Hagander wrote:
> However, if this is the expected behavior, the documentation at https://
> www.postgresql.org/docs/current/static/libpq-ssl.html should be updated to
> make this more clear. It should be made clear that the existence of the
> file ~/.postgresql/root.crt changes the behavior of sslmode=require and
> sslmode=prefer.
> 
> 
> 
> Agreed. It's basically backwards compatibility with something that was badly
> documented in the first place :) That's not a particularly strong argument for
> the way it is. Clarifying the documentation would definitely be a good
> improvement.

Does this have to remain backward-compatible forever?

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +




Re: [HACKERS] Parallelized polymorphic aggs, and aggtype vs aggoutputtype

2016-06-22 Thread Tom Lane
David Rowley  writes:
> I've attached my proposed patch for the user defined aggregate docs.

I whacked this around some more and pushed it.

While working on that, I noticed what seems to me to be a minor bug.
The behavior that I'd expect (and that I documented) for a deserialization
function is that it just allocates its result in the current, short-lived
memory context, since it will be the combine function's responsibility to
merge that into the long-lived transition state.  But it looks to me like
the deserialization functions in numeric.c are allocating their results
in the aggregate context, which will mean a leak.  (For example,
numeric_avg_deserialize creates its result using makeNumericAggState
which forces the result into the agg context.)

Now this leak isn't going to be real significant unless we accumulate a
whole lot of partial results in one query, which would be unusual and
maybe even impossible in the current parallel-query environment.  But it
still seems wrong.  Please check, and submit a patch if I'm right about
what's happening.

regards, tom lane




Re: [HACKERS] how is the WAL receiver process stopped and restarted when the network connection is broken and then restored?

2016-06-22 Thread Craig Ringer
On 22 June 2016 at 23:52, Rui Hai Jiang  wrote:

> Hello,
>
> I have one Primary server and one Standby server. They are doing streaming
> replication well.
>
> I did some testing. I broke the network connection between them for a few
> minutes, and then restored the network. I found that both the WAL sender and
> WAL receiver were stopped and then restarted.
>
> I wonder how the WAL receiver process is stopped and restarted. I have checked
> the code hoping to find the answer, but I don't have any clue.
>

If TCP keepalives are enabled, the TCP connection will break when the
keepalives stop arriving.

If wal receiver timeout is enabled, it'll notice that it didn't get any
data from the walsender and assume it went away.

If the OS notices that the socket went away - say, it got a TCP RST from
the remote peer as it shut down cleanly - it'll close the walreceiver
socket and the walreceiver will quit.

Otherwise it won't notice and will wait indefinitely.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Parallelized polymorphic aggs, and aggtype vs aggoutputtype

2016-06-22 Thread David Rowley
On 23 June 2016 at 08:53, Tom Lane  wrote:
> I wrote:
>> David Rowley  writes:
>>> I've gone and implemented the dummy argument approach for
>>> deserialization functions.
>
>> How do you feel about the further idea of locking down the signatures
>> to be exactly "serialize(internal) returns bytea" and "deserialize(bytea,
>> internal) returns internal", and removing pg_aggregate.aggserialtype?
>> I don't see very much value in allowing any other nominal transmission
>> type besides bytea; and the less flexibility in these function signatures,
>> the less chance of confusion/misuse of other internal-related functions.

I don't object to removing it. If we find a use for non-bytea serial
types down the line, then we can add it again.

> Not hearing any objections, pushed that way.

Many thanks for committing those fixes.


-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] MultiXactId error after upgrade to 9.3.4

2016-06-22 Thread Alvaro Herrera
Robert Haas wrote:

> I see the patch, but I don't see much explanation of why the patch is
> correct, which I think is pretty scary in view of the number of
> mistakes we've already made in this area.  The comments just say:
> 
> + * A tuple that has HEAP_XMAX_IS_MULTI and HEAP_XMAX_LOCK_ONLY but neither of
> + * XMAX_EXCL_LOCK and XMAX_KEYSHR_LOCK must come from a tuple that was
> + * share-locked in 9.2 or earlier and then pg_upgrade'd.
> 
> Why must that be true?
> 
> + * We must not try to resolve such multixacts locally, because the result 
> would
> + * be bogus, regardless of where they stand with respect to the current valid
> + * range.
> 
> What about other pre-upgrade mxacts that don't have this exact bit
> pattern?  Why can't we try to resolve them and end up in trouble just
> as easily?

Attached are two patches, one for 9.3 and 9.4 and the other for 9.5 and
master.  Pretty much the same as before, but with answers to the above
concerns.  In particular,

 /*
+ * A tuple that has HEAP_XMAX_IS_MULTI and HEAP_XMAX_LOCK_ONLY but neither of
+ * XMAX_EXCL_LOCK and XMAX_KEYSHR_LOCK must come from a tuple that was
+ * share-locked in 9.2 or earlier and then pg_upgrade'd.
+ *
+ * In 9.2 and prior, HEAP_XMAX_IS_MULTI was only set when there were multiple
+ * FOR SHARE lockers of that tuple.  That set HEAP_XMAX_LOCK_ONLY (with a
+ * different name back then) but neither of HEAP_XMAX_EXCL_LOCK and
+ * HEAP_XMAX_KEYSHR_LOCK.  That combination is no longer possible in 9.3 and
+ * up, so if we see that combination we know for certain that the tuple was
+ * locked in an earlier release; since all such lockers are gone (they cannot
+ * survive through pg_upgrade), such tuples can safely be considered not
+ * locked.
+ *
+ * We must not resolve such multixacts locally, because the result would be
+ * bogus, regardless of where they stand with respect to the current valid
+ * multixact range.
+ */
+#define HEAP_LOCKED_UPGRADED(infomask) \
+( \
+    ((infomask) & HEAP_XMAX_IS_MULTI) && \
+    ((infomask) & HEAP_XMAX_LOCK_ONLY) && \
+    (((infomask) & (HEAP_XMAX_EXCL_LOCK | HEAP_XMAX_KEYSHR_LOCK)) == 0) \
+)


One place that had an XXX comment in the previous patch
(heap_lock_updated_tuple_rec) now contains this:

+   /*
+    * We don't need a test for pg_upgrade'd tuples: this is only
+    * applied to tuples after the first in an update chain.  Said
+    * first tuple in the chain may well be locked-in-9.2-and-
+    * pg_upgraded, but that one was already locked by our caller,
+    * not us; and any subsequent ones cannot be because our
+    * caller must necessarily have obtained a snapshot later than
+    * the pg_upgrade itself.
+    */
+   Assert(!HEAP_LOCKED_UPGRADED(mytup.t_data->t_infomask));
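
For readers following the thread, the way a caller would consult the new
macro is along these lines (an illustrative sketch, not an excerpt from the
patches):

/*
 * Illustrative only: an xmax whose infomask matches the pre-9.3,
 * pg_upgrade'd pattern cannot have any live lockers, so callers can
 * treat it as not locked and skip GetMultiXactIdMembers() entirely.
 */
static bool
xmax_may_have_lockers(HeapTupleHeader htup)
{
    if (HEAP_LOCKED_UPGRADED(htup->t_infomask))
        return false;
    return true;
}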


-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
commit 7235c0b120208d73d53d3929fe8243d5e487dca8
Author: Alvaro Herrera 
AuthorDate: Thu Jun 16 23:33:20 2016 -0400
CommitDate: Wed Jun 22 17:17:15 2016 -0400

Fix handling of multixacts predating pg_upgrade

After pg_upgrade, it is possible that some tuples' Xmax have multixacts
corresponding to the old installation; such multixacts cannot have
running members anymore.  In many code sites we already know not to read
them and clobber them silently, but at least when VACUUM tries to freeze
a multixact or determine if one needs freezing, there's an attempt to
resolve it to its member transactions by calling GetMultiXactIdMembers,
and if the multixact value is "in the future" with regards to the
current valid multixact range, an error like this is raised:
ERROR:  MultiXactId 123 has not been created yet -- apparent wraparound
and vacuuming fails.  Per discussion with Andrew Gierth, it is completely
bogus to try to resolve multixacts coming from before a pg_upgrade,
regardless of where they stand with regards to the current valid
multixact range.

It's possible to get from under this problem by doing SELECT FOR UPDATE
of the problem tuples, but if tables are large, this is slow and
tedious, so a more thorough solution is desirable.

To fix, we realize that multixacts in xmax created in 9.2 and previous
have a specific bit pattern that is never used in 9.3 and later.
Whenever the infomask of the tuple matches that bit pattern, we just
ignore the multixact completely as if Xmax wasn't set; or, in the case
of tuple freezing, we act as if an unwanted value is set and clobber it
without decoding.  This guarantees that no errors will be raised, and
that the values will be progressively removed until all tables are
clean.  Most callers of GetMultiXactIdMembers are patched to recognize
directly that the value is a removable 

Re: [HACKERS] Parallelized polymorphic aggs, and aggtype vs aggoutputtype

2016-06-22 Thread Tom Lane
I wrote:
> David Rowley  writes:
>> I've gone and implemented the dummy argument approach for
>> deserialization functions.

> How do you feel about the further idea of locking down the signatures
> to be exactly "serialize(internal) returns bytea" and "deserialize(bytea,
> internal) returns internal", and removing pg_aggregate.aggserialtype?
> I don't see very much value in allowing any other nominal transmission
> type besides bytea; and the less flexibility in these function signatures,
> the less chance of confusion/misuse of other internal-related functions.

Not hearing any objections, pushed that way.

regards, tom lane




Re: [HACKERS] Reference to UT1

2016-06-22 Thread Bruce Momjian
On Mon, Jun  6, 2016 at 03:53:41PM +1200, Thomas Munro wrote:
> Hi
> 
> The manual[1] says "Technically, PostgreSQL uses UT1 rather than UTC
> because leap seconds are not handled."  I'm certainly no expert on
> this stuff but it seems to me that we are using POSIX time[2] or Unix
> time, not UT1.

Based on this report I have removed mentions of UT1 from our docs in the
attached patch, which I would like to apply to 9.6.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
new file mode 100644
index 2420c94..cf9afe2
*** a/doc/src/sgml/catalogs.sgml
--- b/doc/src/sgml/catalogs.sgml
*** SELECT * FROM pg_locks pl LEFT JOIN pg_p
*** 9810,9817 
 of time zone names that are recognized by SET TIMEZONE,
 along with their associated abbreviations, UTC offsets,
 and daylight-savings status.  (Technically,
!PostgreSQL uses UT1 rather
!than UTC because leap seconds are not handled.)
 Unlike the abbreviations shown in pg_timezone_abbrevs, many of these names imply a set of daylight-savings transition
 date rules.  Therefore, the associated information changes across local DST
--- 9810,9817 
 of time zone names that are recognized by SET TIMEZONE,
 along with their associated abbreviations, UTC offsets,
 and daylight-savings status.  (Technically,
!PostgreSQL does not use UTC because leap
!seconds are not handled.)
 Unlike the abbreviations shown in pg_timezone_abbrevs, many of these names imply a set of daylight-savings transition
 date rules.  Therefore, the associated information changes across local DST
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
new file mode 100644
index 98b3995..052b8f5
*** a/doc/src/sgml/func.sgml
--- b/doc/src/sgml/func.sgml
*** SELECT EXTRACT(SECOND FROM TIME '17:12:2
*** 7578,7584 
  The time zone offset from UTC, measured in seconds.  Positive values
  correspond to time zones east of UTC, negative values to
  zones west of UTC.  (Technically,
! PostgreSQL uses UT1 because
  leap seconds are not handled.)
 

--- 7578,7584 
  The time zone offset from UTC, measured in seconds.  Positive values
  correspond to time zones east of UTC, negative values to
  zones west of UTC.  (Technically,
! PostgreSQL does not use UTC because
  leap seconds are not handled.)
 




Re: [HACKERS] Requesting external_pid_file with postgres -C when not initialized lead to coredump

2016-06-22 Thread Tom Lane
alain radix  writes:
> Another effect of the patch is that it become possible to start a cluster
> with external_pid_file='' in postgresql.conf
> It would have that same behavior as many other parameters.

I'd still be inclined to do that with something along the lines of

-   if (external_pid_file)
+   if (external_pid_file && external_pid_file[0])

rather than changing the default per se.
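
For context, the relevant logic in postmaster.c is roughly of this shape
(a simplified sketch, not the actual source; the real code also sets file
permissions and arranges to remove the file at exit), so the extra test
simply makes an empty setting behave like an unset one:

    /* Write the extra PID file only if the GUC is set to a non-empty value. */
    if (external_pid_file && external_pid_file[0])
    {
        FILE       *fpidfile = fopen(external_pid_file, "w");

        if (fpidfile)
        {
            fprintf(fpidfile, "%d\n", MyProcPid);
            fclose(fpidfile);
        }
        else
            fprintf(stderr, "could not write external PID file \"%s\"\n",
                    external_pid_file);
    }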

regards, tom lane




Re: [HACKERS] Parallelized polymorphic aggs, and aggtype vs aggoutputtype

2016-06-22 Thread Tom Lane
David Rowley  writes:
> I've gone and implemented the dummy argument approach for
> deserialization functions.

How do you feel about the further idea of locking down the signatures
to be exactly "serialize(internal) returns bytea" and "deserialize(bytea,
internal) returns internal", and removing pg_aggregate.aggserialtype?
I don't see very much value in allowing any other nominal transmission
type besides bytea; and the less flexibility in these function signatures,
the less chance of confusion/misuse of other internal-related functions.

regards, tom lane




Re: [HACKERS] Question and suggestion about application binary compatibility policy

2016-06-22 Thread Bruce Momjian
On Mon, May 30, 2016 at 03:04:24AM +, Tsunakawa, Takayuki wrote:
> Hello,
> 
> I'd like to ask you about the policy of application binary compatibility.  
> And have a suggestion as well.
> 
> QUESTION
> ==
> 
> My customer asked me if the following usage is correct.
> 
> - Build an embedded SQL C application with PostgreSQL 9.2.
> - Distribute the app without ecpg nor libpq libraries.  Require users to 
> install PostgreSQL client which includes ecpg and libpq libraries.
> - Use the app with newer PostgreSQL major versions without rebuilding the 
> app.  That is, the users just replaces the PostgreSQL client.
> 
> I expect this is legal, because the so_major versions of ecpg and libpq are 6 
> and 5 respectively for all PostgreSQL 9.x versions so far.  However, I could 
> not find any description of this binary compatibility policy in the manual, 
> so I haven't been able to answer the customer.
> 
> What's the official policy of application binary compatibility?  I found the 
> same discussion in the below thread, but I'm afraid any clear answer wasn't 
> given:
> 
> https://www.postgresql.org/message-id/522f0b6b.1040...@4js.com
> 
> 
> SUGGESTION
> ==
> 
> How about adding an article about application binary compatibility in the 
> following section, as well as chapters for libpq, ECPG, etc?
> 
> 17.6. Upgrading a PostgreSQL Cluster
> https://www.postgresql.org/docs/devel/static/upgrading.html
> 
> There are three kinds of application assets that users are concerned about 
> when upgrading.  Is there anything else to mention?
> 
> * libpq app
> * ECPG app
> * C-language user defined function (and other stuff dependent on it, such as 
> extensions, UDTs, etc.)

I know this thread is old but it bounced around a lot of ideas.  I think
there are some open questions:

*  Will a new application link against an older minor-version libpq?
*  Will an old application link against a newer minor-version libpq?

I think we are all in agreement that a binary cannot run using a libpq
with a different major version number.

We have this text in src/tools/RELEASE_CHANGES:

Major Version
=

The major version number should be updated whenever the source of the
library changes to make it binary incompatible. Such changes include,
but are not limited to:

1. Removing a public function or structure (or typedef, enum, ...)

2. Modifying a public functions arguments.

3. Removing a field from a public structure.

4. Adding a field to a public structure, unless steps have been
previously taken to shield users from such a change, for example by
such structures only ever being allocated/instantiated by a library
function which would give the new field a suitable default value.

Adding a new function should NOT force an increase in the major version
number. (Packagers will see the standard minor number update and install
the new library.)  When the major version is increased all applications
which link to the library MUST be recompiled - this is not desirable.  When
the major version is updated the minor version gets reset.

Minor Version
=

The minor version number should be updated whenever the functionality of
the library has changed, typically a change in source code between releases
would mean an increase in the minor version number so long as it does not
require a major version increase.

Given that we make at least minor changes to our libraries in every major
PostgreSQL version, we always bump all minor library version numbers at the
start of each development cycle as a matter of policy.

This is saying running against a mismatched minor version should be fine
for a binary.

ecpg is a little trickier because it has knowledge of SQL data types and
might not support certain features if the ecpg library isn't _run_
against the same major server version.  My guess is that older ecpg
libraries will run fine against newer servers, but the opposite might
not be true.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +




Re: [HACKERS] Requesting external_pid_file with postgres -C when not initialized lead to coredump

2016-06-22 Thread alain radix
Hi Tom,

The actual fix does the job of preventing the coredump.
Indeed, reporting "" instead of (null) would be more consistent with
the values found in pg_settings.

Another effect of the patch is that it becomes possible to start a cluster
with external_pid_file='' in postgresql.conf.
It would then have the same behavior as many other parameters.

I started a 9.6 beta with the following lines in postgresql.conf, and the
only one that reported a problem was external_pid_file:

external_pid_file = ''  # write an extra PID file
unix_socket_group = ''  # (change requires restart)
bonjour_name = ''   # defaults to the computer name
ssl_ca_file = ''# (change requires restart)
ssl_crl_file = ''   # (change requires restart)
krb_server_keyfile = ''
shared_preload_libraries = ''   # (change requires restart)
synchronous_standby_names = ''  # standby servers that provide sync rep
cluster_name = ''   # added to process titles if nonempty
default_tablespace = '' # a tablespace name, '' uses the default
temp_tablespaces = ''   # a list of tablespace names, '' uses
local_preload_libraries = ''
session_preload_libraries = ''

It seems to me that the general behavior is to consider an empty string as
an unset parameter in postgresql.conf, so why should external_pid_file be an
exception?

In a backport of the fix, I would prefer postgres -C to report an empty
string, to be consistent with pg_settings.

I would also prefer external_pid_file to behave like other parameters in
future versions.

Regards.

Alain


On 22 June 2016 at 17:24, Tom Lane  wrote:

> alain radix  writes:
> > Here is a new patch with a postmaster.c modification so that it checks
> > for an empty string instead of a null pointer.
>
> Why should we adopt this, when there is already a better (more complete)
> fix in git?
>
> I'm not sold on the wisdom of modifying the semantics of
> external_pid_file, which is what this patch effectively does.  I don't
> know if anything outside our core code looks at that variable, but if
> anything does, this would create issues for it.
>
> I'm even less sold on the wisdom of changing every other GUC that
> has a null bootstrap default, which is what we would also have to do
> in order to avoid the identical misbehavior for other -C cases.
>
> > The meaning is no more to avoid a core dump as you've done a change for
> > that but to have the same result with postgres -C as with a request to
> > pg_settings, an empty string when the parameter is not set.
>
> Well, maybe we should make -C print an empty string not "(nil)" for
> such cases.  You're right that we don't make a distinction between
> NULL and "" in other code paths that display a string GUC, so doing
> so here is a bit inconsistent.
>
> regards, tom lane
>



-- 
Alain Radix


[HACKERS] [HITB-Announce] HITB2016AMS Videos & GSEC Singapore Voting

2016-06-22 Thread Hafez Kamal

Videos from the 7th annual HITB Security Conference are being released
this week!

HITBSecConf Youtube channel: http://youtube.com/hitbsecconf

Talks from the #HITB2016AMS CommSec track have already been uploaded and
linked to their individual presentations:

http://conference.hitb.org/hitbsecconf2016ams/commsec-track/

Conference materials:
http://conference.hitb.org/hitbsecconf2016ams/materials/

===

On an unrelated note, voting for HITB GSEC talks in Singapore (August
25th and 26th) closes on the 30th of June! Two audience voted talks have
already been added to the agenda:

iOS 10 Kernel Heap Revisited - Stefan Esser

http://gsec.hitb.org/sg2016/sessions/ios-10-kernel-heap-revisited/

Look Mom! I Don't Use Shellcode: A Browser Exploitation Case Study for
Internet Explorer 11 - Moritz Jodeit

http://gsec.hitb.org/sg2016/sessions/look-mom-i-dont-use-shellcode-a-browser-exploitation-case-study-for-internet-explorer-11/

HITB GSEC Voting: http://gsec.hitb.org/vote/

See you in Singapore!

Regards,
Hafez Kamal
Hack in The Box (M) Sdn. Bhd
36th Floor, Menara Maxis
Kuala Lumpur City Centre
50088 Kuala Lumpur, Malaysia
Tel: +603-26157299
Fax: +603-26150088





Re: [HACKERS] Precedence of new phrase search tsquery operator

2016-06-22 Thread Teodor Sigaev

The attached patch changes the precedence of the operators to |, &, <->, | in
ascending order. BTW, it also simplifies the code around printing and parsing
of tsquery a bit.


|, &, <->, ! of course
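For example, if I read that ordering right, a query such as 'a | b & c <-> !d'
would then bind as 'a | (b & (c <-> (!d)))', with ! binding tightest and |
loosest.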
--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/




[HACKERS] how is the WAL receiver process stopped and restarted when the network connection is broken and then restored?

2016-06-22 Thread Rui Hai Jiang
Hello,

I have one Primary server and one Standby server. They are doing streaming
replication well.

I did some testing. I broke the network connection between them for a few
minutes, and then restored the network. I found that both the WAL sender and
WAL receiver were stopped and then restarted.

I wonder how the WAL receiver process is stopped and restarted. I have checked
the code hoping to find the answer, but I don't have any clue.

Could anyone help?

Thanks,
Rui Hai


Re: [HACKERS] Requesting external_pid_file with postgres -C when not initialized lead to coredump

2016-06-22 Thread Tom Lane
alain radix  writes:
> Here is a new patch with a postmaster.c modification so that it checks for
> an empty string instead of a null pointer.

Why should we adopt this, when there is already a better (more complete)
fix in git?

I'm not sold on the wisdom of modifying the semantics of
external_pid_file, which is what this patch effectively does.  I don't
know if anything outside our core code looks at that variable, but if
anything does, this would create issues for it.

I'm even less sold on the wisdom of changing every other GUC that
has a null bootstrap default, which is what we would also have to do
in order to avoid the identical misbehavior for other -C cases.

> The meaning is no more to avoid a core dump as you've done a change for
> that but to have the same result with postgres -C as with a request to
> pg_settings, an empty string when the parameter is not set.

Well, maybe we should make -C print an empty string not "(nil)" for
such cases.  You're right that we don't make a distinction between
NULL and "" in other code paths that display a string GUC, so doing
so here is a bit inconsistent.

regards, tom lane




Re: [HACKERS] Hash Indexes

2016-06-22 Thread Robert Haas
On Wed, Jun 22, 2016 at 5:14 AM, Amit Kapila  wrote:
> We can do it in the way as you are suggesting, but there is another thing
> which we need to consider here.  As of now, the patch tries to finish the
> split if it finds split-in-progress flag in either old or new bucket.  We
> need to lock both old and new buckets to finish the split, so it is quite
> possible that two different backends try to lock them in opposite order
> leading to a deadlock.  I think the correct way to handle is to always try
> to lock the old bucket first and then new bucket.  To achieve that, if the
> insertion on new bucket finds that split-in-progress flag is set on a
> bucket, it needs to release the lock and then acquire the lock first on old
> bucket, ensure pincount is 1 and then lock new bucket again and ensure that
> pincount is 1. I have already maintained the order of locks in scan (old
> bucket first and then new bucket; refer changes in _hash_first()).
> Alternatively, we can try to  finish the splits only when someone tries to
> insert in old bucket.

Yes, I think locking buckets in increasing order is a good solution.
I also think it's fine to only try to finish the split when the insert
targets the old bucket.  Finishing the split enables us to remove
tuples from the old bucket, which lets us reuse space instead of
accelerating more.  So there is at least some potential benefit to the
backend inserting into the old bucket.  On the other hand, a process
inserting into the new bucket derives no direct benefit from finishing
the split.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Requesting external_pid_file with postgres -C when not initialized lead to coredump

2016-06-22 Thread alain radix
Yes Michael,

You're right, I saw this yesterday but couldn't make a patch.

Here is a new patch with a postmaster.c modification so that it checks for
an empty string instead of a null pointer.

The point is no longer to avoid a core dump, as you've already made a change
for that, but to have the same result with postgres -C as with a query of
pg_settings: an empty string when the parameter is not set.

It seems a lot better to me that different ways to request the same
parameter should return the same answer.

Actually, taking the empty string that pg_settings reports for
external_pid_file and setting it in postgresql.conf would lead to an error
from the postmaster.

The value of a parameter should be the same as reported by pg_settings and
postgres -C, and should be a valid setting in postgresql.conf, so that if
you create a postgresql.conf from pg_settings or postgres -C, it would be a
valid one.

So, the patch makes an empty string valid in all places and shouldn't cause
problems with existing installations.

Regards.


Alain Radix

On 21 June 2016 at 23:45, Michael Paquier  wrote:

> On Wed, Jun 22, 2016 at 2:02 AM, Alvaro Herrera
>  wrote:
> > alain radix wrote:
> >> So, here is my first patch for PostgreSQL.
> >
> > Looking forward to further ones,
>
> A comment on this patch: this is an incorrect approach anyway.
> PostmasterMain relies on this value being NULL to decide if this PID
> file should be written or not, so this patch applied as-is would
> result in a message outputted to stderr if the logic is not changed
> there.
> --
> Michael
>



-- 
Alain Radix


external_pid_file.patch
Description: Binary data



Re: [HACKERS] Hash Indexes

2016-06-22 Thread Robert Haas
On Wed, Jun 22, 2016 at 5:10 AM, Amit Kapila  wrote:
>> > Insertion will happen by scanning the appropriate bucket and needs to
>> > retain pin on primary bucket to ensure that concurrent split doesn't 
>> > happen,
>> > otherwise split might leave this tuple unaccounted.
>>
>> What do you mean by "unaccounted"?
>
> It means that split might leave this tuple in old bucket even if it can be
> moved to new bucket.  Consider a case where insertion has to add a tuple on
> some intermediate overflow bucket in the bucket chain, if we allow split
> when insertion is in progress, split might not move this newly inserted
> tuple.

OK, that's a good point.

>> > Now for deletion of tuples from (N+1/2) bucket, we need to wait for the
>> > completion of any scans that began before we finished populating bucket 
>> > N+1,
>> > because otherwise we might remove tuples that they're still expecting to
>> > find in bucket (N+1)/2. The scan will always maintain a pin on primary
>> > bucket and Vacuum can take a buffer cleanup lock (cleanup lock includes
>> > Exclusive lock on bucket and wait till all the pins on buffer becomes zero)
>> > on primary bucket for the buffer.  I think we can relax the requirement for
>> > vacuum to take cleanup lock (instead take Exclusive Lock on buckets where 
>> > no
>> > split has happened) with the additional flag has_garbage which will be set
>> > on primary bucket, if any tuples have been moved from that bucket, however 
>> > I
>> > think for squeeze phase (in this phase, we try to move the tuples from 
>> > later
>> > overflow pages to earlier overflow pages in the bucket and then if there 
>> > are
>> > any empty overflow pages, then we move them to kind of a free pool) of
>> > vacuum, we need a cleanup lock, otherwise scan results might get effected.
>>
>> affected, not effected.
>>
>> I think this is basically correct, although I don't find it to be as
>> clear as I think it could be.  It seems very clear that any operation
>> which potentially changes the order of tuples in the bucket chain,
>> such as the squeeze phase as currently implemented, also needs to
>> exclude all concurrent scans.  However, I think that it's OK for
>> vacuum to remove tuples from a given page with only an exclusive lock
>> on that particular page.
>
> How can we guarantee that it doesn't remove a tuple that is required by scan
> which is started after split-in-progress flag is set?

If the tuple is being removed by VACUUM, it is dead.  We can remove
dead tuples right away, because no MVCC scan will see them.  In fact,
the only snapshot that will see them is SnapshotAny, and there's no
problem with removing dead tuples while a SnapshotAny scan is in
progress.  It's no different than heap_page_prune() removing tuples
that a SnapshotAny sequential scan was about to see.

If the tuple is being removed because the bucket was split, it's only
a problem if the scan predates setting the split-in-progress flag.
But since your design involves out-waiting all scans currently in
progress before setting that flag, there can't be any scan in progress
that hasn't seen it.  A scan that has seen the flag won't look at the
tuple in any case.

>> (Plain text email is preferred to HTML on this mailing list.)
>>
>
> If I turn to Plain Text [1], then the signature of my e-mail also changes to
> Plain Text, which I don't want.  Is there a way I can retain the signature
> settings in Rich Text and the mail content as Plain Text?

Nope, but I don't see what you are worried about.  There's no HTML
content in your signature anyway except for a link, and most
mail-reading software will turn that into a hyperlink even without the
HTML.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Postgres_fdw join pushdown - wrong results with whole-row reference

2016-06-22 Thread Ashutosh Bapat
On Wed, Jun 22, 2016 at 3:57 PM, Etsuro Fujita  wrote:

> On 2016/06/22 18:16, Ashutosh Bapat wrote:
>
>> On Wed, Jun 22, 2016 at 2:26 PM, Etsuro Fujita
>> > wrote:
>>
>
> I think we could address this in another way once we support
>> deparsing subqueries; rewrite the remote query into something that
>> wouldn't need the CASE WHEN conversion.  For example, we currently
>> have:
>>
>> postgres=# explain verbose select ft2 from ft1 left join ft2 on
>> ft1.a = ft2.a;
>>
>> QUERY PLAN
>>
>> --
>>  Foreign Scan  (cost=100.00..110.04 rows=1 width=32)
>>Output: ft2.*
>>Relations: (public.ft1) LEFT JOIN (public.ft2)
>>Remote SQL: SELECT CASE WHEN r2.* IS NOT NULL THEN ROW(r2.a,
>> r2.b) END FROM (public.t1 r1 LEFT JOIN public.t2 r2 ON (((r1.a =
>> r2.a
>> (4 rows)
>>
>> However, if we support deparsing subqueries, the remote query in the
>> above example could be rewritten into something like this:
>>
>> SELECT ss.c2 FROM t1 LEFT JOIN (SELECT t2.a, ROW(a, b) FROM t2)
>> ss(c1, c2) ON (t1.a = ss.c1);
>>
>> So we would no longer need "CASE WHEN r2.* IS NOT NULL THEN
>> ROW(r2.a, r2.b) END" in the target list in the remote query.
>>
>
> Right. Although, it means that the query processor at the other end has
>> to do extra work for pulling up the subqueries.
>>
>
> Yeah, that's right.  But this approach seems not so ugly...
>
> For the CASE WHEN conversion for a system column other than ctid, we
>> could also address this by replacing the whole-row reference in the
>> IS NOT NULL condition in that conversion with the system column
>> reference.
>>
>
> That would not work again as the system column reference would make
>> sense locally but may not be available at the foreign server e.g.
>> foreign table targeting a view a tableoid is requested.
>>
>
> Maybe I'm confused, but I think that in the system-column case it's the
> user's responsibility to specify system columns for foreign tables in a
> local query only when that makes sense on the remote end, as shown in the
> below counter example:
>
> postgres=# create foreign table fv1 (a int, b int) server myserver options
> (table_name 'v1');
> CREATE FOREIGN TABLE
> postgres=# select ctid, * from fv1;
> ERROR:  column "ctid" does not exist
> CONTEXT:  Remote SQL command: SELECT a, b, ctid FROM public.v1
>

But a ctid, when available, would rightly get nullified when referenced as
a column of the table. What happens with the other system columns is that we
zero them out locally, whether they are available at the foreign server or
not; we never try to check whether they are available at the foreign server.
If we use such a column's name to decide whether to report 0 or NULL, and
that column is not available at the foreign table, we will get an error. I
think we should avoid that. Those columns do not make any sense anyway.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


Re: [HACKERS] Postgres_fdw join pushdown - wrong results with whole-row reference

2016-06-22 Thread Etsuro Fujita

On 2016/06/22 18:16, Ashutosh Bapat wrote:

On Wed, Jun 22, 2016 at 2:26 PM, Etsuro Fujita
> wrote:



I think we could address this in another way once we support
deparsing subqueries; rewrite the remote query into something that
wouldn't need the CASE WHEN conversion.  For example, we currently have:

postgres=# explain verbose select ft2 from ft1 left join ft2 on
ft1.a = ft2.a;

QUERY PLAN

--
 Foreign Scan  (cost=100.00..110.04 rows=1 width=32)
   Output: ft2.*
   Relations: (public.ft1) LEFT JOIN (public.ft2)
   Remote SQL: SELECT CASE WHEN r2.* IS NOT NULL THEN ROW(r2.a,
r2.b) END FROM (public.t1 r1 LEFT JOIN public.t2 r2 ON (((r1.a =
r2.a
(4 rows)

However, if we support deparsing subqueries, the remote query in the
above example could be rewritten into something like this:

SELECT ss.c2 FROM t1 LEFT JOIN (SELECT t2.a, ROW(a, b) FROM t2)
ss(c1, c2) ON (t1.a = ss.c1);

So we would no longer need "CASE WHEN r2.* IS NOT NULL THEN
ROW(r2.a, r2.b) END" in the target list in the remote query.



Right. Although, it means that the query processor at the other end has
to do extra work for pulling up the subqueries.


Yeah, that's right.  But this approach seems not so ugly...


For the CASE WHEN conversion for a system column other than ctid, we
could also address this by replacing the whole-row reference in the
IS NOT NULL condition in that conversion with the system column
reference.



That would not work again as the system column reference would make
sense locally but may not be available at the foreign server e.g.
foreign table targeting a view a tableoid is requested.


Maybe I'm confused, but I think that in the system-column case it's the 
user's responsibility to specify system columns for foreign tables in a 
local query only when that makes sense on the remote end, as shown in 
the below counter example:


postgres=# create foreign table fv1 (a int, b int) server myserver 
options (table_name 'v1');

CREATE FOREIGN TABLE
postgres=# select ctid, * from fv1;
ERROR:  column "ctid" does not exist
CONTEXT:  Remote SQL command: SELECT a, b, ctid FROM public.v1

where v1 is a view created on the remote server "myserver".

Best regards,
Etsuro Fujita






Re: [HACKERS] Postgres_fdw join pushdown - wrong results with whole-row reference

2016-06-22 Thread Amit Langote
On 2016/06/22 18:14, Ashutosh Bapat wrote:
> I wonder whether such a whole-row-var would arise from the nullable side
>> of a join? I guess not.  Not that I'm saying we shouldn't account for that
>> case at all since any and every whole-row-var in the targetlist currently
>> gets that treatment, even those that are known non-nullable. Couldn't we
>> have prevented the latter somehow?  IOW, only generate the CASE WHEN when
>> a Var being deparsed is known nullable as the comment there says:
>>
>> deparse.c:
>>
>> 1639 /*
>> 1640  * In case the whole-row reference is under an outer join then it has
>> 1641  * to go NULL whenver the rest of the row goes NULL. Deparsing a join
>> 1642  * query would always involve multiple relations, thus qualify_col
>> 1643  * would be true.
>> 1644  */
>> 1645 if (qualify_col)
>> 1646 {
>> 1647 appendStringInfoString(buf, "CASE WHEN");
>> 1648 ADD_REL_QUALIFIER(buf, varno);
>> 1649 appendStringInfo(buf, "* IS NOT NULL THEN ");
>> 1650 }
>>
>> But I guess just fixing the expression as your patch does may be just fine.
>>
> I thought about that, but it means that we have compute the nullable relids
> (which isn't a big deal I guess). That's something more than necessary for
> fixing the bug, which is the focus in beta stage right now.

Agreed.

Thanks,
Amit






Re: [HACKERS] Postgres_fdw join pushdown - wrong results with whole-row reference

2016-06-22 Thread Ashutosh Bapat
On Wed, Jun 22, 2016 at 2:26 PM, Etsuro Fujita  wrote:

> On 2016/06/22 17:11, Amit Langote wrote:
>
>> I wonder whether such a whole-row-var would arise from the nullable side
>> of a join? I guess not.  Not that I'm saying we shouldn't account for that
>> case at all since any and every whole-row-var in the targetlist currently
>> gets that treatment, even those that are known non-nullable. Couldn't we
>> have prevented the latter somehow?  IOW, only generate the CASE WHEN when
>> a Var being deparsed is known nullable as the comment there says:
>>
>> deparse.c:
>>
>> 1639 /*
>> 1640  * In case the whole-row reference is under an outer join then it has
>> 1641  * to go NULL whenver the rest of the row goes NULL. Deparsing a join
>> 1642  * query would always involve multiple relations, thus qualify_col
>> 1643  * would be true.
>> 1644  */
>> 1645 if (qualify_col)
>> 1646 {
>> 1647 appendStringInfoString(buf, "CASE WHEN");
>> 1648 ADD_REL_QUALIFIER(buf, varno);
>> 1649 appendStringInfo(buf, "* IS NOT NULL THEN ");
>> 1650 }
>>
>
> I think we could address this in another way once we support deparsing
> subqueries; rewrite the remote query into something that wouldn't need the
> CASE WHEN conversion.  For example, we currently have:
>
> postgres=# explain verbose select ft2 from ft1 left join ft2 on ft1.a =
> ft2.a;
> QUERY PLAN
>
> --
>  Foreign Scan  (cost=100.00..110.04 rows=1 width=32)
>Output: ft2.*
>Relations: (public.ft1) LEFT JOIN (public.ft2)
>Remote SQL: SELECT CASE WHEN r2.* IS NOT NULL THEN ROW(r2.a, r2.b) END
> FROM (public.t1 r1 LEFT JOIN public.t2 r2 ON (((r1.a = r2.a
> (4 rows)
>
> However, if we support deparsing subqueries, the remote query in the above
> example could be rewritten into something like this:
>
> SELECT ss.c2 FROM t1 LEFT JOIN (SELECT t2.a, ROW(a, b) FROM t2) ss(c1, c2)
> ON (t1.a = ss.c1);
>
> So we would no longer need "CASE WHEN r2.* IS NOT NULL THEN ROW(r2.a,
> r2.b) END" in the target list in the remote query.
>

Right. Although, it means that the query processor at the other end has to
do extra work for pulling up the subqueries.


>
> For the CASE WHEN conversion for a system column other than ctid, we could
> also address this by replacing the whole-row reference in the IS NOT NULL
> condition in that conversion with the system column reference.
>
That would not work either, as the system column reference would make sense
locally but may not be available at the foreign server, e.g. for a foreign
table targeting a view when tableoid is requested.


-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


Re: [HACKERS] Postgres_fdw join pushdown - wrong results with whole-row reference

2016-06-22 Thread Ashutosh Bapat
I wonder whether such a whole-row-var would arise from the nullable side
> of a join? I guess not.  Not that I'm saying we shouldn't account for that
> case at all since any and every whole-row-var in the targetlist currently
> gets that treatment, even those that are known non-nullable. Couldn't we
> have prevented the latter somehow?  IOW, only generate the CASE WHEN when
> a Var being deparsed is known nullable as the comment there says:
>
> deparse.c:
>
> 1639 /*
> 1640  * In case the whole-row reference is under an outer join then it has
> 1641  * to go NULL whenver the rest of the row goes NULL. Deparsing a join
> 1642  * query would always involve multiple relations, thus qualify_col
> 1643  * would be true.
> 1644  */
> 1645 if (qualify_col)
> 1646 {
> 1647 appendStringInfoString(buf, "CASE WHEN");
> 1648 ADD_REL_QUALIFIER(buf, varno);
> 1649 appendStringInfo(buf, "* IS NOT NULL THEN ");
> 1650 }
>
> But I guess just fixing the expression as your patch does may be just fine.
>
>
I thought about that, but it means that we have to compute the nullable
relids (which isn't a big deal, I guess). That's more than is necessary for
fixing the bug, which is the focus in the beta stage right now.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


Re: [HACKERS] Hash Indexes

2016-06-22 Thread Amit Kapila
On Tue, Jun 21, 2016 at 9:33 PM, Robert Haas  wrote:
>
> > On Thu, Jun 16, 2016 at 3:28 AM, Amit Kapila  wrote:
> >> Incomplete splits can be completed either by vacuum or insert as both
> >> needs exclusive lock on bucket.  If vacuum finds split-in-progress flag on a
> >> bucket then it will complete the split operation, vacuum won't see this flag
> >> if actually split is in progress on that bucket as vacuum needs cleanup lock
> >> and split retains pin till end of operation.  To make it work for Insert
> >> operation, one simple idea could be that if insert finds split-in-progress
> >> flag, then it releases the current exclusive lock on bucket and tries to
> >> acquire a cleanup lock on bucket, if it gets cleanup lock, then it can
> >> complete the split and then the insertion of tuple, else it will have a
> >> exclusive lock on bucket and just perform the insertion of tuple.  The
> >> disadvantage of trying to complete the split in vacuum is that split might
> >> require new pages and allocating new pages at time of vacuum is not
> >> advisable.  The disadvantage of doing it at time of Insert is that Insert
> >> might skip it even if there is some scan on the bucket is going on as scan
> >> will also retain pin on the bucket, but I think that is not a big deal.  The
> >> actual completion of split can be done in two ways: (a) scan the new bucket
> >> and build a hash table with all of the TIDs you find there.  When copying
> >> tuples from the old bucket, first probe the hash table; if you find a match,
> >> just skip that tuple (idea suggested by Robert Haas offlist) (b) delete all
> >> the tuples that are marked as moved_by_split in the new bucket and perform
> >> the split operation from the beginning using old bucket.
> >
> > I have completed the patch with respect to incomplete splits and delayed
> > cleanup of garbage tuples.  For incomplete splits, I have used the
option
> > (a) as mentioned above.  The incomplete splits are completed if the
> > insertion sees split-in-progress flag in a bucket.
>
> It seems to me that there is a potential performance problem here.  If
> the split is still being performed, every insert will see the
> split-in-progress flag set.  The in-progress split retains only a pin
> on the primary bucket, so other backends could also get an exclusive
> lock, which is all they need for an insert.  It seems that under this
> algorithm they will now take the exclusive lock, release the exclusive
> lock, try to take a cleanup lock, fail, again take the exclusive lock.
> That seems like a lot of extra monkeying around.  Wouldn't it be
> better to take the exclusive lock and then afterwards check if the pin
> count is 1?  If so, even though we only intended to take an exclusive
> lock, it is actually a cleanup lock.  If not, we can simply proceed
> with the insertion.  This way you avoid unlocking and relocking the
> buffer repeatedly.
>

We can do it the way you are suggesting, but there is another thing we need
to consider here.  As of now, the patch tries to finish the split if it finds
the split-in-progress flag in either the old or the new bucket.  We need to
lock both the old and the new bucket to finish the split, so it is quite
possible that two different backends try to lock them in opposite orders,
leading to a deadlock.  I think the correct way to handle this is to always
lock the old bucket first and then the new bucket.  To achieve that, if an
insertion into the new bucket finds that the split-in-progress flag is set,
it needs to release its lock, acquire the lock on the old bucket first and
ensure its pin count is 1, and then lock the new bucket again and ensure that
its pin count is 1.  I have already maintained this lock order for scans (old
bucket first and then new bucket; refer to the changes in _hash_first()).
Alternatively, we could try to finish the splits only when someone tries to
insert into the old bucket.

> > The second major thing
> > this new version of patch has achieved is cleanup of garbage tuples i.e
the
> > tuples that are left in old bucket during split.  Currently (in HEAD),
as
> > part of a split operation, we clean the tuples from old bucket after
moving
> > them to new bucket, as we have heavy-weight locks on both old and new
bucket
> > till the whole split operation.  In the new design, we need to take
cleanup
> > lock on old bucket and exclusive lock on new bucket to perform the split
> > operation and we don't retain those locks till the end (release the
lock as
> > we move on to overflow buckets).  Now to cleanup the tuples we need a
> > cleanup lock on a bucket which we might not have at split-end.  So I
choose
> > to perform the cleanup of garbage tuples during vacuum and when
re-split of
> > the bucket happens as during both the operations, we do hold cleanup
lock.
> > We can extend the cleanup of garbage to other operations as well if
> > required.
>
> I think it's OK for the squeeze phase to be deferred until vacuum or 

Re: [HACKERS] Hash Indexes

2016-06-22 Thread Amit Kapila
On Tue, Jun 21, 2016 at 9:26 PM, Robert Haas  wrote:
>
> On Tue, May 10, 2016 at 8:09 AM, Amit Kapila 
wrote:
>
> > Once the split operation has set the split-in-progress flag, it will
begin scanning bucket (N+1)/2.  Every time it finds a tuple that properly
belongs in bucket N+1, it will insert the tuple into bucket N+1 with the
moved-by-split flag set.  Tuples inserted by anything other than a split
operation will leave this flag clear, and tuples inserted while the split
is in progress will target the same bucket that they would hit if the split
were already complete.  Thus, bucket N+1 will end up with a mix of
moved-by-split tuples, coming from bucket (N+1)/2, and unflagged tuples
coming from parallel insertion activity.  When the scan of bucket (N+1)/2
is complete, we know that bucket N+1 now contains all the tuples that are
supposed to be there, so we clear the split-in-progress flag on both
buckets.  Future scans of both buckets can proceed normally.  Split
operation needs to take a cleanup lock on primary bucket to ensure that it
doesn't start if there is any Insertion happening in the bucket.  It will
leave the lock on primary bucket, but not pin as it proceeds for next
overflow page.  Retaining pin on primary bucket will ensure that vacuum
doesn't start on this bucket till the split is finished.
>
> In the second-to-last sentence, I believe you have reversed the words
> "lock" and "pin".
>

Yes.  What I meant to say is: release the lock, but retain the pin on the
primary bucket till the end of the operation.

> > Insertion will happen by scanning the appropriate bucket and needs to
retain pin on primary bucket to ensure that concurrent split doesn't
happen, otherwise split might leave this tuple unaccounted.
>
> What do you mean by "unaccounted"?
>

It means that the split might leave this tuple in the old bucket even though
it could be moved to the new bucket.  Consider a case where an insertion has
to add a tuple to some intermediate overflow page in the bucket chain; if we
allow the split to run while the insertion is in progress, the split might
not move this newly inserted tuple.

> > Now for deletion of tuples from (N+1/2) bucket, we need to wait for the
completion of any scans that began before we finished populating bucket
N+1, because otherwise we might remove tuples that they're still expecting
to find in bucket (N+1)/2. The scan will always maintain a pin on primary
bucket and Vacuum can take a buffer cleanup lock (cleanup lock includes
Exclusive lock on bucket and wait till all the pins on buffer becomes zero)
on primary bucket for the buffer.  I think we can relax the requirement for
vacuum to take cleanup lock (instead take Exclusive Lock on buckets where
no split has happened) with the additional flag has_garbage which will be
set on primary bucket, if any tuples have been moved from that bucket,
however I think for squeeze phase (in this phase, we try to move the tuples
from later overflow pages to earlier overflow pages in the bucket and then
if there are any empty overflow pages, then we move them to kind of a free
pool) of vacuum, we need a cleanup lock, otherwise scan results might get
effected.
>
> affected, not effected.
>
> I think this is basically correct, although I don't find it to be as
> clear as I think it could be.  It seems very clear that any operation
> which potentially changes the order of tuples in the bucket chain,
> such as the squeeze phase as currently implemented, also needs to
> exclude all concurrent scans.  However, I think that it's OK for
> vacuum to remove tuples from a given page with only an exclusive lock
> on that particular page.
>

How can we guarantee that it doesn't remove a tuple that is required by a
scan that started after the split-in-progress flag was set?

>  Also, I think that when cleaning up after a
> split, an exclusive lock is likewise sufficient to remove tuples from
> a particular page provided that we know that every scan currently in
> progress started after split-in-progress was set.
>

I think this could have a similar issue to the one above, unless we have
something that prevents concurrent scans.

>
> (Plain text email is preferred to HTML on this mailing list.)
>

If I switch to plain text [1], then the signature of my e-mail also changes
to plain text, which I don't want.  Is there a way I can keep the signature
settings in Rich Text and the mail content as Plain Text?


[1] -
http://www.mail-signatures.com/articles/how-to-add-or-change-an-email-signature-in-gmailgoogle-apps/

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Postgres_fdw join pushdown - wrong results with whole-row reference

2016-06-22 Thread Etsuro Fujita

On 2016/06/22 17:11, Amit Langote wrote:

I wonder whether such a whole-row-var would arise from the nullable side
of a join? I guess not.  Not that I'm saying we shouldn't account for that
case at all since any and every whole-row-var in the targetlist currently
gets that treatment, even those that are known non-nullable. Couldn't we
have prevented the latter somehow?  IOW, only generate the CASE WHEN when
a Var being deparsed is known nullable as the comment there says:

deparse.c:

1639 /*
1640  * In case the whole-row reference is under an outer join then it has
1641  * to go NULL whenver the rest of the row goes NULL. Deparsing a join
1642  * query would always involve multiple relations, thus qualify_col
1643  * would be true.
1644  */
1645 if (qualify_col)
1646 {
1647 appendStringInfoString(buf, "CASE WHEN");
1648 ADD_REL_QUALIFIER(buf, varno);
1649 appendStringInfo(buf, "* IS NOT NULL THEN ");
1650 }


I think we could address this in another way once we support deparsing 
subqueries; rewrite the remote query into something that wouldn't need 
the CASE WHEN conversion.  For example, we currently have:


postgres=# explain verbose select ft2 from ft1 left join ft2 on ft1.a = ft2.a;
                                  QUERY PLAN
------------------------------------------------------------------------------
 Foreign Scan  (cost=100.00..110.04 rows=1 width=32)
   Output: ft2.*
   Relations: (public.ft1) LEFT JOIN (public.ft2)
   Remote SQL: SELECT CASE WHEN r2.* IS NOT NULL THEN ROW(r2.a, r2.b) END FROM (public.t1 r1 LEFT JOIN public.t2 r2 ON (((r1.a = r2.a))))
(4 rows)

However, if we support deparsing subqueries, the remote query in the 
above example could be rewritten into something like this:


SELECT ss.c2 FROM t1 LEFT JOIN (SELECT t2.a, ROW(a, b) FROM t2) ss(c1, 
c2) ON (t1.a = ss.c1);


So we would no longer need "CASE WHEN r2.* IS NOT NULL THEN ROW(r2.a, 
r2.b) END" in the target list in the remote query.
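
As a quick sanity check of the semantics with plain local tables (made-up
data, not the regression test case):

create temp table lt1 (a int);
create temp table lt2 (a int, b int);
insert into lt1 values (1), (2);
insert into lt2 values (1, NULL);
select lt1.a, ss.c2, ss.c2 is null as null_extended
from lt1 left join (select lt2.a, row(a, b) from lt2) ss(c1, c2) on (lt1.a = ss.c1);

The composite for a = 1 comes out as "(1,)", i.e. not null even though one of
its columns is NULL, while the row for a = 2, which has no join partner,
comes out as a true NULL.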


For the CASE WHEN conversion for a system column other than ctid, we 
could also address this by replacing the whole-row reference in the IS 
NOT NULL condition in that conversion with the system column reference.


Best regards,
Etsuro Fujita






Re: [HACKERS] Postgres_fdw join pushdown - wrong results with whole-row reference

2016-06-22 Thread Amit Langote
On 2016/06/21 20:42, Ashutosh Bapat wrote:
> On Tue, Jun 21, 2016 at 4:36 PM, Amit Langote wrote:
>> On 2016/06/21 16:27, Rushabh Lathia wrote:
>>>
>>> And as above documentation clearly says that IS NULL and IS NOT NULL do not
>>> always return inverse results for row-valued expressions. So need to change
>>> the deparse logic into postgres_fdw - how ? May be to use IS NULL rather
>>> then IS NOT NULL?
>>>
>>> Input/thought?
>>
>> Perhaps - NOT expr IS NULL?  Like in the attached.
>>
>>
> As the documentation describes row is NULL is going to return true when all
> the columns in a row are NULL, even though row itself is not null. So, with
> your patch a row (null, null, null) is going to be output as a NULL row. Is
> that right?

Right.

> In an outer join we have to differentiate between a row being null (because
> there was no joining row on nullable side) and a non-null row with all
> column values being null. If we cast the whole-row expression to a text
> e.g. r.*::text and test the resultant value for nullness, it gives us what
> we want. A null row casted to text is null and a row with all null values
> casted to text is not null.

You are right.  There may be non-null rows with all columns null, which are
handled wrongly (as Rushabh reports), and the hack I proposed doesn't get
them right either.  Especially when such a whole-row-var comes from the
non-nullable side, as in the reported case, the NULL test on it would produce
the wrong result.  Casting to text, as your patch does, produces the correct
behavior.
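
A quick standalone illustration of the difference, with a made-up table (not
the reported test case):

create temp table lt (a int, b int);
insert into lt values (NULL, NULL);
select lt is null       as row_is_null,    -- t: IS NULL can't tell this row from a missing one
       lt::text is null as text_is_null    -- f: '(,)' is a non-null text value
from lt;

whereas a genuinely null-extended whole-row value still tests NULL after the
cast:

select lt2::text is null as no_match
from lt left join lt as lt2 on false;      -- returns t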

I wonder whether such a whole-row-var would arise from the nullable side
of a join? I guess not.  Not that I'm saying we shouldn't account for that
case at all since any and every whole-row-var in the targetlist currently
gets that treatment, even those that are known non-nullable. Couldn't we
have prevented the latter somehow?  IOW, only generate the CASE WHEN when
a Var being deparsed is known nullable as the comment there says:

deparse.c:

1639 /*
1640  * In case the whole-row reference is under an outer join then it has
1641  * to go NULL whenver the rest of the row goes NULL. Deparsing a join
1642  * query would always involve multiple relations, thus qualify_col
1643  * would be true.
1644  */
1645 if (qualify_col)
1646 {
1647 appendStringInfoString(buf, "CASE WHEN");
1648 ADD_REL_QUALIFIER(buf, varno);
1649 appendStringInfo(buf, "* IS NOT NULL THEN ");
1650 }

But I guess just fixing the expression as your patch does may be just fine.

Thanks,
Amit



