Re: [HACKERS] Re: [COMMITTERS] pgsql: Change the way that LWLocks for extensions are allocated.
Hi, I have started reviewing this patch and would like to share some initial review comments that require the author's attention.

1) I am getting some trailing-whitespace errors when trying to apply this patch using the git apply command. These are the error messages I get:

[edb@localhost postgresql]$ git apply test_pg_stat_statements-v1.patch
test_pg_stat_statements-v1.patch:28: trailing whitespace.
	$(MAKE) -C $(top_builddir)/src/test/regress all
test_pg_stat_statements-v1.patch:41: space before tab in indent.
	$(REGRESSCHECKS)
warning: 2 lines add whitespace errors.

2) The test case you have added is failing because queryid is going to be different every time the test case is executed. I think it would be better to remove the queryid column from "SELECT queryid, query, calls, rows from pg_stat_statements ORDER BY rows". Below is the output I got upon test-case execution:

make regresscheck
== running regression test queries ==
test pg_stat_statements ... FAILED
== shutting down postmaster ==

3) OK. As you mentioned upthread, I agree that we can't add pg_stat_statements tests for installcheck, as this module requires shared_preload_libraries to be set. But if there is an already existing installation with the pg_stat_statements module loaded, we should allow the test to run on that installation if requested explicitly by the user. I think we can add a target like 'installcheck-force' in the Makefile that would let end users run the test on their own installation. Thoughts?

Also, I have changed the status in the commitfest from "Needs review" to "Waiting on Author".

On Mon, Sep 19, 2016 at 6:26 PM, Amit Kapila wrote:
> On Sat, Sep 10, 2016 at 5:02 PM, Amit Kapila wrote:
>> On Tue, Aug 30, 2016 at 8:01 AM, Andres Freund wrote:
>>> On 2016-08-30 07:57:19 +0530, Amit Kapila wrote:
>>>> I will write such a test case either in this week or early next week.
>>>
>>> Great.
>>>
>>
>> Okay, attached patch adds some simple tests for pg_stat_statements.
>> One thing to note is that we can't add pg_stat_statements tests for
>> installcheck as this module requires shared_preload_libraries to be
>> set.  Hope this suffices the need.
>>
>
> Registered this patch for next CF.
>
> --
> With Regards,
> Amit Kapila.
> EnterpriseDB: http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: [COMMITTERS] pgsql: Add make rules to download raw Unicode mapping files
On Wed, Nov 2, 2016 at 12:58 AM, Peter Eisentraut wrote:
> Add make rules to download raw Unicode mapping files
>
> This serves as implicit documentation and is handy if someone wants to
> tweak things.  The rules are not part of a normal build, like this
> entire directory.

If the goal is to avoid having those files in the code tree, wouldn't it also make sense to have a .gitignore listing all those files?
--
Michael
Re: [HACKERS] Declarative partitioning - another take
On 2016/11/02 2:34, Robert Haas wrote:
> On Fri, Oct 28, 2016 at 3:53 AM, Amit Langote wrote:
>> [ new patches ]
>
> Reviewing 0006:

Thanks for the review!

> This patch seems scary.  I sort of assumed from the title -- "Teach a
> few places to use partition check quals." -- that this was an optional
> thing, some kind of optimization from which we could reap further
> advantage once the basic infrastructure was in place.  But it's not
> that at all.  It's absolutely necessary that we do this, or data
> integrity is fundamentally compromised.  How do we know that we've
> found all of the places that need to be taught about these new,
> uncatalogued constraints?

Making this a separate commit from 0003 was essentially to avoid this getting lost among all of its other changes. In fact, it was meant to invite closer scrutiny of whether all the sites in the backend code that are critical for data integrity, in the face of the implicit partition constraints, are being informed about those constraints.

> I'm feeling fairly strongly like you should rewind and make the
> partitioning constraints normal catalogued constraints.  That's got a
> number of advantages, most notably that we can be sure they will be
> properly enforced by the entire system (modulo existing bugs, of
> course).  Also, they'll show up automatically in tools like psql's \d
> output, pgAdmin, and anything else that is accustomed to being able to
> find constraints in the catalog.  We do need to make sure that those
> constraints can't be dropped (or altered?) inappropriately, but that's
> a relatively small problem.  If we stick with the design you've got
> here, every client tool in the world needs to be updated, and I'm not
> seeing nearly enough advantage in this system to justify that kind of
> upheaval.
As for which parts of the system need to know about these implicit partition constraints to *enforce* them for data integrity, we could say that it's really just one site - ExecConstraints() called from ExecInsert()/ExecUpdate().

Admittedly, the current error message style in this patch exposes the implicit-constraint approach to a certain criticism:

"ERROR: new row violates the partition boundary specification of \"%s\""

It would say the following if it were a named constraint:

"ERROR: new row for relation \"%s\" violates check constraint \"%s\""

For the constraint exclusion optimization, we teach get_relation_constraints() to look at these constraints. Although greatly useful, it's not absolutely critical.

Beside the above two cases, there is a bunch of code (relcache, DDL) that looks at regular constraints, but IMHO we need not let any of that code concern itself with the implicit partition constraints. Especially, I wonder why the client tools should want the implicit partitioning constraint to be shown as a CHECK constraint? As the proposed patch 0004 (psql) currently does, isn't it better to instead show the partition bounds as follows?

+CREATE TABLE part_b PARTITION OF parted (
+	b WITH OPTIONS NOT NULL DEFAULT 1 CHECK (b >= 0),
+	CONSTRAINT check_a CHECK (length(a) > 0)
+) FOR VALUES IN ('b');
+\d part_b
+          Table "public.part_b"
+ Column |  Type   |     Modifiers
+--------+---------+--------------------
+ a      | text    |
+ b      | integer | not null default 1
+Partition of: parted FOR VALUES IN ('b')
+Check constraints:
+    "check_a" CHECK (length(a) > 0)
+    "part_b_b_check" CHECK (b >= 0)

Needless to say, that could save a lot of trouble thinking about generating collision-free names for these constraints, their dependency handling, unintended altering of these constraints, pg_dump, etc.
> In fact, as far as I can see, the only advantage of this approach is
> that when the insert arrives through the parent and is routed to the
> child by whatever tuple-routing code we end up with (I guess that's
> what 0008 does), we get to skip checking the constraint, saving CPU
> cycles.  That's probably an important optimization, but I don't think
> that putting the partitioning constraint in the catalog in any way
> rules out the possibility of performing that optimization.  It's just
> that instead of having the partitioning excluded-by-default and then
> sometimes choosing to include it, you'll have it included-by-default
> and then sometimes choose to exclude it.

Hmm, wouldn't we be making *more* modifications to the existing code (backend or otherwise) to teach it to exclude the implicit partition constraints than the other way around? The other way around being to modify the existing code to include the implicit constraints, which is what this patch is about.

Having said all that, I am open to switching to catalogued partition constraints if the arguments I make above in favor of this patch are not all that sound.

Thanks,
Amit
Re: [HACKERS] WAL consistency check facility
On Wed, Nov 2, 2016 at 10:23 AM, Michael Paquier wrote:
> On Tue, Nov 1, 2016 at 10:31 PM, Robert Haas wrote:
>> IMHO, your rewrite of this patch was a bit heavy-handed.
>
> OK... Sorry for that.
>
>> I haven't scrutinized the code here so maybe it was a big improvement,
>> and if so fine, but if not it's better to collaborate with the author
>> than to take over.
>
> While reviewing the code, that has finished by being a large rewrite,
> and that was more understandable than a review looking at all the
> small tweaks and things I have been through while reading it.  I have
> also experimented a couple of ideas with the patch that I added, so at
> the end it proves to be a gain for everybody.  I think that the last
> patch is an improvement; if you want to make your own opinion on the
> matter, looking at the differences between both patches would be the
> most direct way to go.

If my understanding of this feature is correct, the last two patches completely break the fundamental idea of the WAL consistency check feature. I mentioned in my last reply as well that we have to use some flag to indicate whether an image should be restored during replay or not. Otherwise, XLogReadBufferForRedoExtended will always restore the image, skipping the usual redo operation. What's happening now is the following:

1. If wal_consistency is on, include a backup block image with the WAL record.
2. During replay, XLogReadBufferForRedoExtended always restores the backup block image in a local buffer, since XLogRecHasBlockImage is true for each block.
3. In checkConsistency, you compare the local buffer with the backup block image from the WAL record. It'll always be consistent.

This feature aims to validate whether the WAL replay operation is happening correctly or not. To achieve that aim, we should not alter the WAL replay operation itself. Rest of the suggestions are well-taken. I'll update the patch accordingly.
--
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] WAL consistency check facility
On Wed, Nov 2, 2016 at 4:41 PM, Kuntal Ghosh wrote: > On Wed, Nov 2, 2016 at 10:23 AM, Michael Paquier > wrote: >> On Tue, Nov 1, 2016 at 10:31 PM, Robert Haas wrote: >>> IMHO, your rewrite of this patch was a bit heavy-handed. >> >> OK... Sorry for that. >> >>> I haven't >>> scrutinized the code here so maybe it was a big improvement, and if so >>> fine, but if not it's better to collaborate with the author than to >>> take over. >> >> While reviewing the code, that has finished by being a large rewrite, >> and that was more understandable than a review looking at all the >> small tweaks and things I have been through while reading it. I have >> also experimented a couple of ideas with the patch that I added, so at >> the end it proves to be a gain for everybody. I think that the last >> patch is an improvement, if you want to make your own opinion on the >> matter looking at the differences between both patches would be the >> most direct way to go. >> > If my understanding is correct regarding this feature, last two patches > completely break the fundamental idea of wal consistency check feature. > I mentioned this in my last reply as well that we've to use some flag > to indicate > whether an image should be restored during replay or not. Otherwise, > XLogReadBufferForRedoExtended will always restore the image skipping the usual > redo operation. What's happening now is the following: > 1. If wal_consistency is on, include backup block image with the wal record. > 2. During replay, XLogReadBufferForRedoExtended always restores the backup > block > image in local buffer since XLogRecHasBlockImage is true for each block. > 3. In checkConsistency, you compare the local buffer with the backup block > image > from the wal record. It'll always be consistent. > This feature aims to validate whether wal replay operation is > happening correctly or not. > To achieve that aim, we should not alter the wal replay operation itself. Hm... Right. That was broken. 
And actually, while the record-level flag is useful so that you don't need to rely on checking wal_consistency when doing WAL redo, the block-level flag is useful to make a distinction between blocks that have to be replayed and the ones that are used only for consistency, and both types could be mixed in a record. Using it in bimg_info would be fine... Perhaps a better name for the flag would be something like BKPIMAGE_APPLY, to mean that the FPW needs to be applied at redo. Or BKPIMAGE_IGNORE, to bypass it when replaying. IS_REQUIRED_FOR_REDO is quite confusing.
--
Michael
Re: [HACKERS] Declarative partitioning - another take
On 2016/11/02 16:41, Amit Langote wrote:
> Having said all that, I am open to switching to the catalogued partition
> constraints if the arguments I make above in favor of this patch are not
> all that sound.

One problem I don't immediately see a solution for, if we go with catalogued partition constraints, is how we retrieve a relation's partition constraint. There are a couple of cases where that becomes necessary. Consider that we are adding a partition to a partitioned table that is itself a partition. In this case, the new partition's constraint consists of whatever we generate from its FOR VALUES specification *ANDed* with the parent's constraint. We must somehow get hold of the latter. Which of the parent's named check constraints in the pg_constraint catalog is supposed to be the partition constraint? With the uncatalogued partition constraints approach, we simply generate it from the parent's pg_class.relpartbound (or we might get the relcache copy of the same stored in rd_partcheck). Hmm.

Thanks,
Amit
Re: [HACKERS] Patch to implement pg_current_logfile() function
On 02/11/2016 at 03:51, Karl O. Pinc wrote:
> On Mon, 31 Oct 2016 09:26:27 +0100
> Gilles Darold wrote:
>
>> On 30/10/2016 at 08:04, Karl O. Pinc wrote:
>> Attached patch v11 includes your patch.
>>
>>> I have some questions about logfile_writename():
>>>
>>> Why does the logfile_open() call fail silently?
>> logfile_open() "fails silently" in logfile_writename(), like in other
>> parts of syslogger.c where it is called, because it is a function
>> that already reports a message when an error occurs ("could not open
>> log file...").  I think I have already replied to this question.
>
> Please apply the attached patch on top of your v11 patch.
> It simulates a logfile_open() failure.  Upon simulated failure you do
> not get a "current_logfile" file, and neither do you get any
> indication of any problems anywhere in the logs.  It's failing
> silently.

Please have a look at line 1137 on HEAD of syslogger.c; you will see that in case of failure the function logfile_open() reports a FATAL or LOG error with the message:

    errmsg("could not open log file \"%s\": %m", filename);

I don't think adding more error messages after this is of any interest; this is why logfile_writename() just returns without doing anything. Other calls to logfile_open() have the exact same behavior. For a real example, create the temporary file and change its system privileges:

    touch /usr/local/postgresql/data/current_logfiles.tmp
    chmod 400 /usr/local/postgresql/data/current_logfiles.tmp

In this case of failure it prints:

    2016-11-02 09:56:00.791 CET [6321] LOG:  could not open log file "current_logfiles.tmp": Permission denied

I am not sure what more you are expecting. I agree that "open log" can be a little confusing, but this is also a file where we log the current log file name, so I can live with that.
--
Gilles Darold
Consultant PostgreSQL
http://dalibo.com - http://dalibo.org
Re: [HACKERS] WAL consistency check facility
On Wed, Nov 2, 2016 at 1:34 PM, Michael Paquier wrote:
> Hm... Right. That was broken.  And actually, while the record-level
> flag is useful so as you don't need to rely on checking
> wal_consistency when doing WAL redo, the block-level flag is useful to
> make a distinction between blocks that have to be replayed and the
> ones that are used only for consistency, and both types could be mixed
> in a record.  Using it in bimg_info would be fine... Perhaps a better
> name for the flag would be something like BKPIMAGE_APPLY, to mean that
> the FPW needs to be applied at redo.  Or BKPIMAGE_IGNORE, to bypass it
> when replaying it.  IS_REQUIRED_FOR_REDO is quite confusing.

BKPIMAGE_APPLY seems reasonable.
--
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Declarative partitioning - another take
On 2016/11/02 2:44, Robert Haas wrote:
> On Fri, Oct 28, 2016 at 3:53 AM, Amit Langote wrote:
>>> Insisting that you can't drop a child without detaching it first seems
>>> wrong to me.  If I already made this comment and you responded to it,
>>> please point me back to whatever you said.  However, my feeling is
>>> this is flat wrong and absolutely must be changed.
>>
>> I said the following [1]:
>>
>> | Hmm, I don't think I like this.  Why should it be necessary to detach
>> | a partition before dropping it?  That seems like an unnecessary step.
>>
>> I thought we had better lock the parent table when removing one of its
>> partitions and it seemed a bit odd to lock the parent table when dropping
>> a partition using DROP TABLE?  OTOH, with ALTER TABLE parent DETACH
>> PARTITION, the parent table is locked anyway.
>
> That "OTOH" part seems like a pretty relevant point.
>
> Basically, I think people expect to be able to say "DROP THINGTYPE
> thingname" or at most "DROP THINGTYPE thingname CASCADE" and have that
> thing go away.  I'm opposed to anything which requires some other
> series of steps without a very good reason, and possible surprise
> about the precise locks that the command requires isn't a good enough
> reason from my point of view.

OK, I changed things so that DROP TABLE a_partition no longer complains about requiring to detach first. Much like how index_drop() locks the parent table ('parent' in a different sense, of course) and later invalidates its relcache, heap_drop_with_catalog(), if the passed-in relid is a partition, locks the parent table using AccessExclusiveLock, performs its usual business, and finally invalidates the parent's relcache before closing it without relinquishing the lock. Does that sound sane?

One downside is that if the specified command is DROP TABLE parent CASCADE, the above-described invalidation is a waste of cycles, because the parent will be dropped anyway after all the partitions are dropped.
Thanks,
Amit
Re: [HACKERS] Transactions involving multiple postgres foreign servers
On Mon, Oct 31, 2016 at 6:17 AM, Masahiko Sawada wrote:
> On Fri, Oct 28, 2016 at 3:19 AM, Robert Haas wrote:
>> On Wed, Oct 26, 2016 at 2:00 AM, Masahiko Sawada wrote:
>>> I think we can consider the atomic commit and the atomic visibility
>>> separately, and the atomic visibility can build on top of the
>>> atomic commit.
>>
>> It is true that we can do that, but I'm not sure whether it's the best
>> design.
>
> I'm not sure it's the best design either; we need to discuss it more.
> But this is not a feature particular to the sharding solution.  Atomic
> commit using 2PC is useful for other servers that can use 2PC, not only
> postgres_fdw.

I think we need to discuss the big picture, i.e. the architecture of a distributed transaction manager for PostgreSQL: divide it into smaller problems and then solve each of them as a series of commits, possibly producing a useful feature with each commit. I think what Robert is pointing out is that if we spend time solving smaller problems, we might end up with something which cannot be used to solve the bigger problem. Instead, if we define the bigger problem and come up with clear subproblems that, when solved, would solve the bigger problem, we may not end up in such a situation.

There are many distributed transaction models discussed in various papers like [1], [2], [3]. We need to assess which one/s would suit the PostgreSQL FDW infrastructure, and maybe specifically postgres_fdw. There is some discussion at [4]. It lists a few approaches, but I could not find a discussion of the pros and cons of each of them, and a conclusion as to which of the approaches suits PostgreSQL. Maybe we want to start that discussion.

I know that it's hard to come up with a single model that would suit all FDWs or would serve all kinds of applications. We may not be able to support a full distributed transaction manager for every FDW out there. It's possible that, for lack of the big picture, we will not see anything happen in this area for another release.
Given that, and since all of the models in those papers require 2PC as a basic building block, I was of the opinion that we could at least start with a 2PC implementation. But I think the request for the bigger picture is also valid, for the reasons stated above.

> Attached latest 3 patches that incorporated review comments so far.
> But recovery speed improvement that is discussed on another thread is
> not incorporated yet.
> Please give me feedback.

[1] http://link.springer.com/article/10.1007/s00778-014-0359-9
[2] https://domino.mpi-inf.mpg.de/intranet/ag5/ag5publ.nsf/1c0a12a383dd2cd8c125613300585c64/7684dd8109a5b3d5c1256de40051686f/$FILE/tdd99.pdf
[3] http://docs.lib.purdue.edu/cgi/viewcontent.cgi?article=1713&context=cstech
[4] https://wiki.postgresql.org/wiki/DTM

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
Re: [HACKERS] Patch to implement pg_current_logfile() function
On Wed, 2 Nov 2016 10:07:34 +0100
Gilles Darold wrote:

> Please have a look at line 1137 on HEAD of syslogger.c you will see
> that in case of failure function logfile_open() report a FATAL or LOG
> error with message:
>
> errmsg("could not open log file \"%s\": %m", filename);

Ok. Thanks. Sorry for the confusion.

Karl
Free Software: "You don't pay back, you pay forward." -- Robert A. Heinlein
Re: [HACKERS] Proposal: scan key push down to heap [WIP]
On Sat, Oct 29, 2016 at 12:17 PM, Dilip Kumar wrote:
> What about putting a slot reference inside HeapScanDesc?  I know it
> will make the heap layer use an executor structure, but just a thought.
>
> I have quickly hacked it this way, where we use the slot reference in
> HeapScanDesc and directly use slot_getattr inside HeapKeyTest (only if
> we have a valid slot, otherwise use heap_getattr), and measured the
> worst-case performance (what you have mentioned above).
>
> My test: 21-column table with varchar in the beginning + qual on the
> last few columns + varying selectivity.
>
> postgres=# \d test
>        Table "public.test"
>  Column |       Type        | Modifiers
> --------+-------------------+-----------
>  f1     | integer           |
>  f2     | character varying |
>  f3     | integer           |
>  f4     | integer           |
>  f5     | integer           |
>  f6     | integer           |
>  f7     | integer           |
>  f8     | integer           |
>  f9     | integer           |
>  f10    | integer           |
>  f11    | integer           |
>  f12    | integer           |
>  f13    | integer           |
>  f14    | integer           |
>  f15    | integer           |
>  f16    | integer           |
>  f17    | integer           |
>  f18    | integer           |
>  f19    | integer           |
>  f20    | integer           |
>  f21    | integer           |
>
> tuple count: 10000000 (10 million)
> explain analyze select * from test where f21 < $1 and f20 < $1 and
> f19 < $1 and f15 < $1 and f10 < $1;  ($1 varies from 1 million to 10 million)
>
> Target code base:
> 1. Head
> 2. heap_scankey_pushdown_v1
> 3. My hack for keeping the slot reference in HeapScanDesc
>    (v1+use_slot_in_HeapKeyTest)
>
> Result:
>  Selectivity   Head   scan_key_pushdown_v1   v1+use_slot_in_HeapKeyTest
>  0.1           3880   2980                   2747
>  0.2           4041   3187                   2914
>  0.5           5051   4921                   3626
>  0.8           5378   7296                   3879
>  1.0           6161   8525                   4575
>
> The performance graph is attached to the mail.
>
> Observations:
>
> 1. heap_scankey_pushdown_v1 starts degrading at very high selectivity
> (this behaviour is only visible if the table has 20 or more columns; I
> tested with 10 columns and did not see any regression in v1).
>
> 2. (v1+use_slot_in_HeapKeyTest) is always the winner, even at very
> high selectivity.

The patch for this (storing the slot reference in HeapScanDesc) is attached.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

heap_scankey_pushdown_v2.patch
Description: Binary data
Re: [HACKERS] make coverage-html on OS X
On 10/29/16 4:31 PM, Jim Nasby wrote:
> tl;dr: It's critical that you actually do a make install, or at least it
> is if you've set --prefix with configure.  If you don't, then even if you
> do make check you're going to get the *installed* libpq, and not the
> *built* libpq.

I was not able to reproduce this. I suspect that this would happen if you don't disable System Integrity Protection, because if SIP is on, then the environment variable DYLD_LIBRARY_PATH is disabled, and a regression test run would be prone to use a mix of source-tree and install-tree components. Often this is not a problem, but if your source tree is built with different options than the previous install, then it could make a mess. Building with coverage instrumentation is obviously one case where the build options are quite different.

> Also, looks like there's a race between the .info and .c.gcov targets
> with parallel make; I'm wondering if there's a way to fix that by not
> allowing parallelism in each directory...?  (Presumably the issue is the
> rm *.gcov).  It'd be nice to fix this because -j speeds up coverage-html
> a lot, even with just 2 cores.

I was not able to reproduce this. The make rules look reasonable to me for parallelism.
--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Patch to implement pg_current_logfile() function
On Wed, 2 Nov 2016 07:55:45 -0500, "Karl O. Pinc" wrote:
> On Wed, 2 Nov 2016 10:07:34 +0100
> Gilles Darold wrote:
>
>> Please have a look at line 1137 on HEAD of syslogger.c
>
> Ok.  Thanks.  Sorry for the confusion.

And yes, we did talk about this before. I should have remembered.

Karl
Free Software: "You don't pay back, you pay forward." -- Robert A. Heinlein
Re: [HACKERS] WAL consistency check facility
On Wed, Nov 2, 2016 at 1:11 PM, Kuntal Ghosh wrote:
> Rest of the suggestions are well-taken. I'll update the patch accordingly.

I've updated the last submitted patch (v10) with the following changes:
- added a block-level flag BKPIMAGE_APPLY to distinguish backup image blocks which need to be restored during replay.
- at present, hash index operations are not WAL-logged, hence I've removed the consistency check option for hash indexes. It can be added later.

A few comments:
- Michael suggested using an integer variable and bitwise-shift operations to store the RMGR values instead of a boolean array. But the boolean-array implementation looks cleaner to me. For example:

+		if (wal_consistency[rmid])
+			rechdr->xl_info |= XLR_CHECK_CONSISTENCY;

+	include_image = needs_backup || wal_consistency[rmid];

- Another suggestion was to remove wal_consistency from PostgresNode.pm because small buildfarm machines may suffer with it. Although I've no experience in this matter, I would like to be certain that nothing breaks in the recovery tests after such modifications.
--
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com

walconsistency_v11.patch
Description: application/download
Re: [HACKERS] delta relations in AFTER triggers
On Sun, Oct 30, 2016 at 10:35 AM, Kevin Grittner wrote:
> SPI support would also allow us to consider using set logic for
> validating foreign keys, instead of the one-row-at-a-time approach
> currently used.

Just as a proof of concept for this I used the attached test case to create foreign keys using current techniques versus set-oriented queries with the transition-tsr code. These can probably be improved, since this is a "first cut" off the top of my head. The delete of about one million rows from a "parent" table with no matching rows in the "child" table, and no index on the referencing column in the child table, took 24:17.969 using current triggers and 00:03.262 using the set-based triggers. Yes, that reduces the current run time for that case by 99.78%.
--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

ri-set-logic.sql
Description: application/sql
[HACKERS] about missing xml related functionnalities
Hi,

I am sending this email after reading the xml2 module "Deprecation notice" in https://www.postgresql.org/docs/9.6/static/xml2.html. I still use the xml2 module for one thing: xslt_process(). We store some data in xml fields, and we sometimes need to correct / upgrade the format. Being able to apply an XSLT transformation is a very efficient way to achieve that. So, as far as I'm concerned, XSLT transformation appears to be a missing functionality in the newer API.

Best regards,
Jean-Paul
--
Jean-Paul JORDA
Centre de ressources technologiques
Developer team coordinator
EDP Sciences
BP 112
17, avenue du Hoggar
P.A. de Courtaboeuf
F-91944 Les Ulis Cedex A
tel: 33 (0)1 69 18 17 64
Re: [HACKERS] delta relations in AFTER triggers
Attached is a minor fix to go on top of transition-tsr for issues found yesterday in testing.
--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 53bfd4b..2da841e 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -4512,7 +4512,7 @@ set_cte_size_estimates(PlannerInfo *root, RelOptInfo *rel, double cte_rows)
 void
 set_tuplestore_size_estimates(PlannerInfo *root, RelOptInfo *rel)
 {
-	RangeTblEntry *rte;
+	RangeTblEntry *rte PG_USED_FOR_ASSERTS_ONLY;
 
 	/* Should only be applied to base relations that are tuplestore references */
 	Assert(rel->relid > 0);
diff --git a/src/backend/parser/parse_relation.c b/src/backend/parser/parse_relation.c
index 7c943c3..0b45139 100644
--- a/src/backend/parser/parse_relation.c
+++ b/src/backend/parser/parse_relation.c
@@ -1160,12 +1160,17 @@ parserOpenTable(ParseState *pstate, const RangeVar *relation, int lockmode)
 	else
 	{
 		/*
+		 * An unqualified name might be a tuplestore relation name.
+		 */
+		if (get_visible_tuplestore_metadata(pstate->p_tsrcache, relation->relname))
+			rel = NULL;
+
+		/*
 		 * An unqualified name might have been meant as a reference to
 		 * some not-yet-in-scope CTE.  The bare "does not exist" message
 		 * has proven remarkably unhelpful for figuring out such problems,
 		 * so we take pains to offer a specific hint.
 		 */
-		if (isFutureCTE(pstate, relation->relname))
+		else if (isFutureCTE(pstate, relation->relname))
 			ereport(ERROR,
 					(errcode(ERRCODE_UNDEFINED_TABLE),
 					 errmsg("relation \"%s\" does not exist",
diff --git a/src/include/parser/parse_node.h b/src/include/parser/parse_node.h
index a9f5d6e..57906c0 100644
--- a/src/include/parser/parse_node.h
+++ b/src/include/parser/parse_node.h
@@ -158,7 +158,7 @@ struct ParseState
 	bool		p_locked_from_parent;
 	Relation	p_target_relation;
 	RangeTblEntry *p_target_rangetblentry;
-	Tsrcache*p_tsrcache;/* visible named tuplestore relations */
+	Tsrcache   *p_tsrcache;		/* visible named tuplestore relations */
 
 	/*
 	 * Optional hook functions for parser callbacks.  These are null unless
Re: [HACKERS] emergency outage requiring database restart
On 26.10.2016 21:34, Andres Freund wrote:
> Any chance that plsh or the script it executes does anything with the
> file descriptors it inherits? That'd certainly be one way to get into odd
> corruption issues. We really should use O_CLOEXEC for the majority of
> our file handles.

Attached is a patch to always use O_CLOEXEC in BasicOpenFile if we're not using EXEC_BACKEND. It'd be nice to not expose all fds to most pl-languages either, but I guess there's no easy solution to that without forcibly closing all fds whenever any functions are called.

/ Oskari

From 50d7410b895a1aae26c7001f11bd0d71a200ef96 Mon Sep 17 00:00:00 2001
From: Oskari Saarenmaa
Date: Wed, 2 Nov 2016 16:42:36 +0200
Subject: [PATCH] BasicOpenFile: always use O_CLOEXEC if it is available

---
 src/backend/storage/file/fd.c | 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git src/backend/storage/file/fd.c src/backend/storage/file/fd.c
index b7ff5ef..6cbe378 100644
--- src/backend/storage/file/fd.c
+++ src/backend/storage/file/fd.c
@@ -894,7 +894,19 @@ BasicOpenFile(FileName fileName, int fileFlags, int fileMode)
 	int			fd;
 
 tryAgain:
-	fd = open(fileName, fileFlags, fileMode);
+	fd = open(fileName, fileFlags, fileMode
+#if defined(O_CLOEXEC) && !defined(EXEC_BACKEND)
+	/*
+	 * We don't want exec'd processes to inherit our file handles unless
+	 * EXEC_BACKEND is used. We don't expect execve() calls inside the
+	 * postgres code, but extensions and pl-languages may spawn new
+	 * processes that either don't work due to having no usable file
+	 * descriptors or write garbage in the files previously opened by
+	 * us.
+	 */
+			  | O_CLOEXEC
+#endif
+		);
 
 	if (fd >= 0)
 		return fd;				/* success! */
-- 
2.5.5
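For what it's worth, the behaviour the patch relies on is easy to demonstrate outside the server. A minimal sketch (not part of the patch; it uses Python's os module as a stand-in for the C open() call, and assumes a POSIX system with /dev/null):

```python
import fcntl
import os

def open_cloexec(path):
    """Open a file read-only with O_CLOEXEC, so the descriptor is
    closed automatically across execve() in any child process."""
    return os.open(path, os.O_RDONLY | os.O_CLOEXEC)

fd = open_cloexec("/dev/null")
# FD_CLOEXEC is now set on the descriptor, so a process spawned via
# exec (e.g. by a pl-language handler) will not inherit it.
flags = fcntl.fcntl(fd, fcntl.F_GETFD)
print(bool(flags & fcntl.FD_CLOEXEC))  # True
os.close(fd)
```

The same FD_CLOEXEC flag is what the kernel consults at execve() time, which is why the patch needs no extra bookkeeping beyond the open() flag.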
Re: [HACKERS] delta relations in AFTER triggers
2016-11-02 15:57 GMT+01:00 Kevin Grittner:
> On Sun, Oct 30, 2016 at 10:35 AM, Kevin Grittner wrote:
>
>> SPI support would also allow us to consider using set logic for
>> validating foreign keys, instead of the one-row-at-a-time approach
>> currently used.
>
> Just as a proof of concept for this I used the attached test case
> to create foreign keys using current techniques versus set-oriented
> queries with the transition-tsr code. These probably can be
> improved, since this is a "first cut" off the top of my head.
>
> The delete of about one million rows from a "parent" table with no
> matching rows in the "child" table, and no index on the referencing
> column in the child table, took 24:17.969 using current triggers
> and 00:03.262 using the set-based triggers. Yes, that reduces
> current run time for that case by 99.78%.

This is a great number!

Pavel
Re: [HACKERS] Logical Replication WIP
On 10/24/16 9:22 AM, Petr Jelinek wrote:
> I added one more prerequisite patch (the first one) which adds ephemeral
> slots (or rather implements a UI on top of the code that was mostly
> already there). The ephemeral slots are different in that they go away
> either on error or when the session is closed. This means the initial
> data sync does not have to worry about cleaning up the slots after
> itself. I think this will be useful in other places as well (for example
> basebackup). I originally wanted to call them temporary slots in the UI,
> but since the behavior is a bit different from temp tables I decided to
> go with what the underlying code calls them in the UI as well.

I think it makes sense to expose this. Some of the comments need some polishing.

Eventually, we might want to convert the option list in CREATE_REPLICATION_SLOT into a list instead of adding more and more keywords (see also VACUUM), but not necessarily now.

I find the way Acquire and Release are handled now quite confusing. Because releasing an ephemeral slot means deleting it, you have changed most code to never release them until the end of the session. So there is a lot of ugly and confusing code that needs to know this difference. I think we need to use different verbs for different purposes here. Acquire and release should keep their meaning of "I'm using this", and the calls in proc.c and postgres.c should be something like ReplicationSlotCleanup().

-- 
Peter Eisentraut
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] split up psql \d Modifiers column
On Fri, Oct 28, 2016 at 9:00 AM, Peter Eisentraut wrote:
> I propose to change the psql \d output a bit, best shown with an example:
>
> \d persons3
> -      Table "public.persons3"
> - Column |  Type   |    Modifiers
> ---------+---------+------------------
> - id     | integer | not null
> - name   | text    | default ''::text
> +          Table "public.persons3"
> + Column |  Type   | Collation | Nullable | Default
> ---------+---------+-----------+----------+----------
> + id     | integer |           | not null |
> + name   | text    |           |          | ''::text

+1. psql-ref.sgml (line 4085) needs to be modified. Otherwise, the patch looks good to me.

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com
[HACKERS] Substantial bloat in postgres_fdw regression test runtime
In 9.6, "make installcheck" in contrib/postgres_fdw takes a shade under 3 seconds on my machine. In HEAD, it's taking 10 seconds. I am not happy, especially not since there's no parallelization of the contrib regression tests. That's a direct consumption of my time and all other developers' time too.

This evidently came in with commit 7012b132d (Push down aggregates to remote servers), and while that's a laudable feature, I cannot help thinking that it does not deserve this much of an imposition on every "make check" that's ever done for the rest of eternity.

regards, tom lane
Re: [HACKERS] delta relations in AFTER triggers
> > The delete of about one million rows from a "parent" table with no > matching rows in the "child" table, and no index on referencing > column in the child table, took 24:17.969 using current triggers > and 00:03.262 using the set-based triggers. Yes, that reduces > current run time for that case by 99.78% That is really incredible. Gets rid of the need for an index on referencing columns for a ton of use cases.
Re: [HACKERS] pageinspect: Hash index support
On 11/01/2016 03:28 PM, Peter Eisentraut wrote:

Ok, thanks for your feedback. Maybe "Returned with Feedback" is more appropriate, as there is still development left.

> I have applied the documentation change that introduced subsections,
> which seems quite useful independently. I have also committed the tests
> that I proposed and will work through the failures. As no new patch has
> been posted for the 2016-11 CF, I will close the patch entry now. Please
> submit an updated patch when you have the time, keeping an eye on
> ongoing work to update hash indexes.

Agreed.

Best regards,
Jesper
Re: [HACKERS] auto_explain vs. parallel query
On 11/01/2016 08:32 PM, Robert Haas wrote:
> On Tue, Nov 1, 2016 at 10:58 AM, Tomas Vondra wrote:
>> Damn! You're right, of course. Who'd guess I need more coffee this early?
>>
>> Attached is a fix replacing the flag with an array of flags, indexed by
>> ParallelMasterBackendId. Hopefully that makes it work with multiple
>> concurrent parallel queries ... still, I'm not sure this is the right
>> solution.
>
> I feel like it isn't. I feel like this ought to go in the DSM for that
> parallel query, not the main shared memory segment, but I'm not sure how
> to accomplish that offhand. Also, if we do solve it this way, surely we
> don't need the locking. The flag's only set before any workers have
> started and never changes thereafter.

I'm not sure what you mean by "DSM for that parallel query" - I thought the segments are created for Gather nodes, no? Or is there a DSM for the whole query that we could use?

Another thing is that maybe we don't really want to give extensions access to any of those segments - my impression was that those segments are considered internal (is there RequestAddinShmemSpace for them?). And hacking something just for auto_explain seems a bit ugly.

regards

-- 
Tomas Vondra
http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On Wed, Nov 2, 2016 at 9:01 AM, Tomas Vondra wrote:
> The one remaining thing is the strange zig-zag behavior, but that might
> easily be due to scheduling in the kernel, or something else. I don't
> consider it a blocker for any of the patches, though.

The only reason I could think of for that zig-zag behaviour is frequent access to multiple clog pages, which could happen for the reasons below:

a. A transaction and its subtransactions (IIRC, Dilip's case has one main transaction and two subtransactions) can't fit into the same page, in which case the group_update optimization won't apply, and I don't think we can do anything about it.

b. In the same group, multiple clog pages are being accessed. It is not a likely scenario, but it can happen, and we might be able to improve things a bit if it does.

c. Transactions running at the same time try to update different clog pages. I think, as mentioned upthread, we can handle that by using slots and allowing multiple groups to work together instead of a single group.

To check whether there is any impact due to (a) or (b), I have added a few log messages in the code (patch group_update_clog_v9_log). The log message will be either "all xacts are not on same page" or "Group contains different pages". Patch group_update_clog_v9_slots tries to address (c), so if there is any problem due to (c), that patch should improve the situation.

Can you please try to run the test where you saw the zig-zag behaviour with each of the patches separately? If there is anything due to postgres, then you will either see one of the new log messages or performance will improve; OTOH, if we see the same behaviour, then I think we can probably attribute it to scheduler activity and move on. Also, one point to note here is that even when performance is down in that curve, it is equal to or better than HEAD.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachments: group_update_clog_v9_log.patch, group_update_clog_v9_slots.patch
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On 11/02/2016 05:52 PM, Amit Kapila wrote:
> [possible causes (a), (b), (c) for the zig-zag behaviour snipped]
>
> Can you please try to run the test where you saw zig-zag behaviour with
> both the patches separately? I think if there is anything due to
> postgres, then you can see either one of the new log messages or
> performance will be improved, OTOH if we see the same behaviour, then I
> think we can probably assume it is due to scheduler activity and move on.

Will do.

Based on the results with more client counts (increments of 6 clients instead of 36), I think this really looks like something unrelated to any of the patches - kernel, CPU, or something already present in current master. The attached results show that:

(a) master shows the same zig-zag behavior - no idea why this wasn't observed in the previous runs.

(b) group_update actually seems to improve the situation, because performance stays stable up to 72 clients, while on master the fluctuation starts much earlier.

I'll redo the tests with a newer kernel - this was on 3.10.x, which is what Red Hat 7.2 uses; I'll try 4.8.6. Then I'll try with the patches you submitted, if the 4.8.6 kernel does not help.

Overall, I'm convinced this issue is unrelated to the patches.

regards

-- 
Tomas Vondra
http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] pageinspect: Hash index support
On 11/1/16 3:28 PM, Peter Eisentraut wrote:
> I have also committed the tests that I proposed and will work through
> the failures.

So, we're down to crashes in gin_metapage_info() on ia64 and sparc64. My guess is that the raw page data that is passed into the function needs to be 8-byte aligned before being accessed as GinMetaPageData. (Maybe GinPageGetMeta() should do it?)

Does anyone have a box like this handy to experiment with? Of course an actual core dump could tell us more.

-- 
Peter Eisentraut
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] pageinspect: Hash index support
Peter Eisentraut writes:
> So, we're down to crashes in gin_metapage_info() on ia64 and sparc64.
> My guess is that the raw page data that is passed into the function
> needs to be 8-byte aligned before being accessed as GinMetaPageData.

That's what it looks like to me, too. The "bytea" page image is guaranteed to be improperly aligned for 64-bit access, since it will have an int32 length word before the actual page data, breaking the alignment that would exist for a page sitting in a page buffer. This is likely to be a problem for more things than just gin_metapage_info(); sooner or later it could affect just about everything in pageinspect.

> (Maybe GinPageGetMeta() should do it?)

I think the right thing is likely to be to copy the presented bytea into a palloc'd (and therefore properly aligned) buffer. And not just in this one function.

regards, tom lane
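The misalignment described here is easy to demonstrate outside the server. A minimal sketch (ctypes standing in for the C code; the 4-byte offset models the bytea length word, and the alignment of fresh allocations assumes a mainstream malloc, which returns at least 8-byte-aligned memory):

```python
import ctypes

PAGE_SIZE = 8192  # BLCKSZ

def is_aligned(addr, alignment=8):
    """True if addr is suitably aligned for 8-byte (e.g. int64) access."""
    return addr % alignment == 0

# Simulate the bytea datum: a 4-byte length word followed by the page
# image, so the page itself starts at offset 4 from an aligned allocation.
datum = ctypes.create_string_buffer(4 + PAGE_SIZE)
base = ctypes.addressof(datum)
page_start = base + 4

assert is_aligned(base)            # the allocation itself is aligned...
assert not is_aligned(page_start)  # ...but the page inside the bytea isn't

# The proposed fix: copy the page into a fresh (hence aligned) buffer
# before casting it to a struct with 8-byte members.
copy = ctypes.create_string_buffer(PAGE_SIZE)
ctypes.memmove(copy, page_start, PAGE_SIZE)
assert is_aligned(ctypes.addressof(copy))
```

On strict-alignment platforms like ia64 and sparc64, dereferencing an 8-byte field at page_start traps, which matches the observed crashes; on x86 it merely works by accident.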
Re: [HACKERS] Patch: Implement failover on libpq connect level.
On Tue, Nov 1, 2016 at 9:42 PM, Robert Haas wrote:
> Ah, nuts. Thanks, good catch. Should be fixed in the attached version.

I repeated the test on the new patch; it works fine now. I also did some more negative tests, forcibly failing some internal calls. All tests have passed. This patch works as described and looks good to me.

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com
[HACKERS] plan_rows confusion with parallel queries
Hi,

While eye-balling some explain plans for parallel queries, I got a bit confused by the row count estimates. I wonder whether I'm alone.

Consider for example a simple seq scan query, which in a non-parallel explain looks like this:

                          QUERY PLAN
 -----------------------------------------------------------
  Seq Scan on tables t  (cost=0.00..16347.60 rows=317160 width=356) (actual rows=317160 loops=1)
  Planning time: 0.173 ms
  Execution time: 47.707 ms
 (3 rows)

but a parallel plan looks like this:

                          QUERY PLAN
 -----------------------------------------------------------
  Gather  (cost=0.00..14199.10 rows=317160 width=356) (actual rows=317160 loops=1)
    Workers Planned: 3
    Workers Launched: 3
    ->  Parallel Seq Scan on tables t  (cost=... rows=102310 width=356) (actual rows=79290 loops=4)
  Planning time: 0.209 ms
  Execution time: 150.812 ms
 (6 rows)

Now, for actual rows we can simply do 79290 * 4 = 317160, and we get the correct number of rows produced by the plan (i.e. matching the non-parallel query). But for the estimate it doesn't work like that:

    102310 * 4 = 409240

which is ~30% above the actual estimate. It's clear why this is happening - when computing plan_rows, we don't count the leader as a full worker, but use this:

    leader_contribution = 1.0 - (0.3 * path->parallel_workers);

so with 3 workers, the leader is only worth ~0.1 of a worker:

    102310 * 3.1 = 317161

It's fairly easy to spot this here, because the Gather node is right above the Parallel Seq Scan, and the values in the Gather are accurate. But in many plans the Gather will not be immediately above the node (e.g. there may be a parallel aggregate in between).

Of course, the fact that we use the planned number of workers when computing plan_rows but the actual number of workers for the actually produced rows makes this even more confusing.

BTW is it really a good idea to use nloops to track the number of workers executing a given node? How will that work once we get parallel nested loops and index scans?

regards

-- 
Tomas Vondra
http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
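The arithmetic above is easy to check. A small sketch mirroring the quoted leader_contribution formula (the helper name is made up; the real computation lives in the planner's costing code):

```python
def parallel_divisor(parallel_workers):
    """Effective number of processes a parallel path's row count is
    divided by: the workers, plus a leader contribution of
    1.0 - 0.3 per worker (never below zero)."""
    leader_contribution = 1.0 - 0.3 * parallel_workers
    return parallel_workers + max(leader_contribution, 0.0)

total_rows = 317160
workers = 3

# The per-process estimate shown on the Parallel Seq Scan node:
per_process = total_rows / parallel_divisor(workers)
print(round(per_process))      # 102310

# Naively multiplying by the number of executing processes
# (leader + 3 workers) overshoots, as described above:
print(round(per_process) * 4)  # 409240
```

So the displayed estimate is consistent with the total only if you know to divide by 3.1 rather than 4, which is exactly the source of the confusion.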
Re: [HACKERS] plan_rows confusion with parallel queries
Tomas Vondra writes:
> while eye-balling some explain plans for parallel queries, I got a bit
> confused by the row count estimates. I wonder whether I'm alone.

I got confused by that a minute ago, so no, you're not alone. The problem is even worse in join cases. For example:

 Gather  (cost=34332.00..53265.35 rows=1000000 width=8)
   Workers Planned: 2
   ->  Hash Join  (cost=2.00..52255.35 rows=1000000 width=8)
         Hash Cond: ((pp.f1 = cc.f1) AND (pp.f2 = cc.f2))
         ->  Append  (cost=0.00..8614.96 rows=417996 width=8)
               ->  Parallel Seq Scan on pp  (cost=0.00..8591.67 rows=416667 width=8)
               ->  Parallel Seq Scan on pp1  (cost=0.00..23.29 rows=1329 width=8)
         ->  Hash  (cost=14425.00..14425.00 rows=1000000 width=8)
               ->  Seq Scan on cc  (cost=0.00..14425.00 rows=1000000 width=8)

There are actually 1000000 rows in pp, and none in pp1. I'm not bothered particularly by the nonzero estimate for pp1, because I know where that came from, but I'm not very happy that nowhere here does it look like it's estimating a million-plus rows going into the join.

regards, tom lane
Re: [HACKERS] delta relations in AFTER triggers
> 2016-11-02 15:57 GMT+01:00 Kevin Grittner:
>> On Sun, Oct 30, 2016 at 10:35 AM, Kevin Grittner wrote:
>>
>>> SPI support would also allow us to consider using set logic for
>>> validating foreign keys, instead of the one-row-at-a-time approach
>>> currently used.
>>
>> Just as a proof of concept for this I used the attached test case
>> to create foreign keys using current techniques versus set-oriented
>> queries with the transition-tsr code. These probably can be
>> improved, since this is a "first cut" off the top of my head.
>>
>> The delete of about one million rows from a "parent" table with no
>> matching rows in the "child" table, and no index on the referencing
>> column in the child table, took 24:17.969 using current triggers
>> and 00:03.262 using the set-based triggers. Yes, that reduces
>> current run time for that case by 99.78%.

On Wed, Nov 2, 2016 at 11:07 AM, Pavel Stehule wrote:
> this is a great number

On Wed, Nov 2, 2016 at 11:41 AM, Adam Brusselback wrote:
> That is really incredible. Gets rid of the need for an index on
> referencing columns for a ton of use cases.

Keep in mind that this is just a quick proof of concept. Unless all participating transactions are at the serializable isolation level, something would need to be done to handle race conditions, and that might affect performance. I do think that this approach is likely to be better in enough circumstances, even after that is covered, that it will be worth pursuing -- either as an option when declaring a foreign key, or as the only implementation. Until we have a version that covers the race conditions and can benchmark it in a variety of workloads, it is hard to feel sure about the latter. There may be some situations where crawling the indexes a row at a time will perform better than this by enough to want to retain that option.

A big plus of a single set-oriented statement is that it doesn't suck up unlimited RAM -- it will use work_mem to limit each tuplestore and each query node, spilling to disk if needed. The current FK implementation sometimes runs for a very long time and can run people out of memory.

-- 
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] WAL consistency check facility
On Wed, Nov 2, 2016 at 11:30 PM, Kuntal Ghosh wrote:
> On Wed, Nov 2, 2016 at 1:11 PM, Kuntal Ghosh wrote:
>> Rest of the suggestions are well-taken. I'll update the patch accordingly.
>
> I've updated the last submitted patch (v10) with the following changes:
> - added a block-level flag BKPIMAGE_APPLY to distinguish backup image
>   blocks which need to be restored during replay.
> - at present, hash index operations are not WAL-logged. Hence, I've
>   removed the consistency check option for hash indexes. It can be
>   added later.

Both make sense.

> - Another suggestion was to remove wal_consistency from PostgresNode.pm
>   because small buildfarm machines may suffer with it. Although I've no
>   experience in this matter, I would like to be certain that nothing
>   breaks in the recovery tests after these modifications.

An extra idea that I have here would be to extend the TAP tests to accept an environment variable that would be used to specify extra options when starting Postgres instances. Buildfarm machines could use it.

+	/*
+	 * Remember that, if WAL consistency check is enabled for the current rmid,
+	 * we always include backup image with the WAL record. But, during redo we
+	 * restore the backup block only if needs_backup is set.
+	 */
+	if (needs_backup)
+		bimg.bimg_info |= BKPIMAGE_APPLY;
+
+
You should be careful about extra newlines and noise in the code.

-	/* If it's a full-page image, restore it. */
-	if (XLogRecHasBlockImage(record, block_id))
+	/* If full-page image should be restored, do it. */
+	if (XLogRecBlockImageApply(record, block_id))
Hm. It seems to me that this modification is incorrect. If the page does not need to be applied, aka if it needs to be used for consistency checks, what should be done is more something like the following in XLogReadBufferForRedoExtended:

    if (Apply(record, block_id))
        return;
    if (HasImage)
    {
        // do what needs to be done with an image
    }
    else
    {
        // do default things
    }

XLogRecBlockImageApply() should only check for BKPIMAGE_APPLY and not imply HasImage(). This will be more flexible when, for example, using it for assertions. With this patch the comments on top of XLogReadBufferForRedo are wrong: a block is not unconditionally applied.

+#define XLogRecBlockImageApply(decoder, block_id) \
+	(XLogRecHasBlockImage(decoder, block_id) && \
+	 (((decoder)->blocks[block_id].bimg_info & BKPIMAGE_APPLY) > 0))
Maybe != 0? That's the most common practice in the code. It would be more consistent to have a boolean flag to track BKPIMAGE_APPLY in xlogreader.c. Have a look at has_image for example. This will as well reduce dependencies on the header xlog_record.h.

+	/*
+	 * For a speculative tuple, the content of t_ctid is conflicting
+	 * between the backup page and current page. Hence, we set it
+	 * to the current block number and current offset.
+	 */
+	if (HeapTupleHeaderIsSpeculative(page_htup))
+		ItemPointerSet(&page_htup->t_ctid, blkno, off);
In the set of masking functions this is the only portion of the code depending on blkno. But isn't that actually a bug in speculative inserts? Andres (added now in CC), Peter, could you provide some input regarding that? The masking functions should not prevent the detection of future errors, and this code is making me uneasy.

-- 
Michael
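The point about the macro can be sketched in miniature. The names and bit values below are hypothetical, purely for illustration; the contrast is between a check that folds in has-image (the patch's macro) and a bare bit test (the suggestion), which leaves the "APPLY implies an image" invariant free to be asserted separately:

```python
BKPIMAGE_HAS_IMAGE = 0x01  # hypothetical flag bits, not from the patch
BKPIMAGE_APPLY = 0x02

def block_image_apply_strict(bimg_info):
    """Models the patch's macro: APPLY only reports true if an
    image is also present."""
    return bool(bimg_info & BKPIMAGE_HAS_IMAGE) and bool(bimg_info & BKPIMAGE_APPLY)

def block_image_apply_bare(bimg_info):
    """Models the suggestion: test only the APPLY bit."""
    return (bimg_info & BKPIMAGE_APPLY) != 0

# An inconsistent record: APPLY set without an image.
info = BKPIMAGE_APPLY
assert not block_image_apply_strict(info)  # strict check silently hides it
assert block_image_apply_bare(info)        # bare check lets an assertion
                                           # (bare implies has-image) catch it
```

With the bare form, an `Assert(!apply || has_image)` at decode time would flag the inconsistent record instead of it being masked inside the macro.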
Re: [HACKERS] delta relations in AFTER triggers
> There may be some situations where crawling the indexes a row at a
> time will perform better than this by enough to want to retain that
> option.

If an index existed, wouldn't the set-based implementation still be able to use it? Is there anything that would inherently make the set-based check worse than the row-based one?
Re: [HACKERS] plan_rows confusion with parallel queries
On 11/02/2016 09:00 PM, Tom Lane wrote:
> Tomas Vondra writes:
>> while eye-balling some explain plans for parallel queries, I got a bit
>> confused by the row count estimates. I wonder whether I'm alone.
>
> I got confused by that a minute ago, so no, you're not alone. The problem
> is even worse in join cases. [example plan snipped] I'm not very happy
> that nowhere here does it look like it's estimating a million-plus rows
> going into the join.

Yeah. I wonder how tools visualizing explain plans are going to compute the time spent in a given node (i.e. excluding the time spent in child nodes), or the expected cost of that node.

So far we could do something like

    self_time = total_time - child_node_time * nloops

and for example in this plan it's pretty clear we spend ~130ms in the Aggregate:

                          QUERY PLAN
 -----------------------------------------------------------
  Aggregate  (cost=17140.50..17140.51 rows=1 width=8) (actual time=306.675..306.675 rows=1 loops=1)
    ->  Seq Scan on tables  (cost=0.00..16347.60 rows=317160 width=0) (actual time=0.188..170.993 rows=317160 loops=1)
  Planning time: 0.201 ms
  Execution time: 306.860 ms
 (4 rows)

But in parallel plans it can easily happen that

    child_node_time * nloops > total_time

Consider for example this parallel plan:

                          QUERY PLAN
 -----------------------------------------------------------
  Finalize Aggregate  (cost=15455.19..15455.20 rows=1 width=8) (actual time=107.636..107.636 rows=1 loops=1)
    ->  Gather  (cost=15454.87..15455.18 rows=3 width=8) (actual time=107.579..107.629 rows=4 loops=1)
          Workers Planned: 3
          Workers Launched: 3
          ->  Partial Aggregate  (cost=14454.87..14454.88 rows=1 ...) (actual time=103.895..103.895 rows=1 loops=4)
                ->  Parallel Seq Scan on tables  (cost=0.00..14199.10 rows=102310 width=0) (actual time=0.059..59.217 rows=79290 loops=4)
  Planning time: 0.052 ms
  Execution time: 109.250 ms
 (8 rows)

Reading explains for parallel plans will always be complicated, but perhaps overloading nloops like this makes it more complicated than necessary?

regards

-- 
Tomas Vondra
http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
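The breakdown can be sketched numerically, with the figures taken from the two plans quoted above (the helper name is made up; this is just the naive subtraction rule):

```python
def naive_self_time(node_total_ms, child_total_ms, child_loops):
    """Time attributed to a node after subtracting its children,
    using the 'child time times nloops' rule described above."""
    return node_total_ms - child_total_ms * child_loops

# Serial plan: Aggregate at 306.675 ms, Seq Scan child at 170.993 ms,
# loops=1 - the subtraction gives a plausible ~135 ms of self time.
serial = naive_self_time(306.675, 170.993, 1)

# Parallel plan: Finalize Aggregate at 107.636 ms of wall-clock time,
# but the Partial Aggregate reports 103.895 ms with loops=4 (leader +
# 3 workers running concurrently), so the product exceeds the total.
parallel = naive_self_time(107.636, 103.895, 4)
print(serial > 0, parallel < 0)  # True True
```

The subtraction goes negative precisely because the four "loops" overlap in wall-clock time rather than running sequentially, which is the overloading of nloops being questioned.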
[HACKERS] Making table reloading easier
Hi all

A very common operation that users perform is reloading tables: sometimes as part of an ETL process, sometimes as part of a dump and reload, sometimes loading data from external DBs, etc. Right now users have to jump through a bunch of hoops to do this efficiently:

    BEGIN;
    TRUNCATE TABLE my_table;
    SELECT pg_get_indexdef(indexrelid::regclass)
    FROM pg_index
    WHERE indrelid = 'table_name'::regclass;
    -- Drop 'em all
    DROP INDEX ... ;
    COPY my_table FROM 'file';
    -- Re-create indexes
    CREATE INDEX ...;
    COMMIT;

This is pretty clunky. We already have support for disabling indexes, it's just not exposed to the user. So the simplest option would seem to be to expose it with something like:

    ALTER TABLE my_table DISABLE INDEX ALL;

which would take an ACCESS EXCLUSIVE lock, then set:

    indisready = 'f'
    indislive = 'f'
    indisvalid = 'f'

on each index, or on the named index if the user specifies one particular index. After loading the table, a REINDEX on the table would rebuild and re-enable the indexes. That changes the process to:

    BEGIN;
    TRUNCATE TABLE my_table;
    ALTER TABLE my_table DISABLE INDEX ALL;
    COPY ...;
    REINDEX TABLE my_table;
    COMMIT;

It'd be even better to also add a REINDEX flag to COPY, which would disable indexes and re-create them after it finishes. But that could be done separately.

Thoughts? I'm not sure I can tackle this in the current dev cycle, but it looks simple enough that I can't help wondering what obvious thing I'm missing about why it hasn't been done yet.

--
Craig Ringer  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
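For what it's worth, the "clunky" sequence above is mechanical enough that it's often scripted today. A hypothetical Python helper (nothing like this exists in core; all names here are invented for illustration) that strings the steps together, given index definitions collected via pg_get_indexdef(), might look like:

```python
def build_reload_script(table, indexes, data_file):
    """Generate the manual reload sequence described above.

    indexes is a list of (index_name, create_index_sql) pairs, e.g. as
    collected from pg_get_indexdef(). Returns the SQL script as a string.
    """
    stmts = ["BEGIN;", "TRUNCATE TABLE %s;" % table]
    # drop every index before loading, so the COPY doesn't maintain them
    stmts += ["DROP INDEX %s;" % name for name, _ in indexes]
    stmts.append("COPY %s FROM '%s';" % (table, data_file))
    # re-create the indexes from their saved definitions
    stmts += ["%s;" % ddl for _, ddl in indexes]
    stmts.append("COMMIT;")
    return "\n".join(stmts)
```

The proposed DISABLE INDEX ALL / REINDEX pair would collapse all of this bookkeeping into two statements and remove the need to save and replay index DDL at all.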
Re: [HACKERS] about missing xml related functionnalities
On 2 November 2016 at 21:12, Jean-Paul Jorda wrote:
> Hi,
> I send this email after reading the xml2 module « Deprecation notice » in
> https://www.postgresql.org/docs/9.6/static/xml2.html.
>
> I still use the xml2 module for one thing: xslt_process(). We store some
> data in xml fields, and we sometimes need to correct / upgrade the format.
> Being able to apply an XSLT transformation is a very efficient way to
> achieve that.
>
> So, as far as I'm concerned, XSLT transformation appears to be a missing
> functionality in the newer API.

It is, but so far nobody has stepped up to do the work required to implement it in core.

--
Craig Ringer  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] plan_rows confusion with parallel queries
On 11/02/2016 11:56 PM, Tomas Vondra wrote:
> On 11/02/2016 09:00 PM, Tom Lane wrote:
>> Tomas Vondra writes:
>>> while eye-balling some explain plans for parallel queries, I got a bit
>>> confused by the row count estimates. I wonder whether I'm alone.
>>
>> I got confused by that a minute ago, so no you're not alone. The problem
>> is even worse in join cases. For example:
>>
>>  Gather  (cost=34332.00..53265.35 rows=100 width=8)
>>    Workers Planned: 2
>>    ->  Hash Join  (cost=2.00..52255.35 rows=100 width=8)
>>          Hash Cond: ((pp.f1 = cc.f1) AND (pp.f2 = cc.f2))
>>          ->  Append  (cost=0.00..8614.96 rows=417996 width=8)
>>                ->  Parallel Seq Scan on pp  (cost=0.00..8591.67 rows=416667 width=8)
>>                ->  Parallel Seq Scan on pp1  (cost=0.00..23.29 rows=1329 width=8)
>>          ->  Hash  (cost=14425.00..14425.00 rows=100 width=8)
>>                ->  Seq Scan on cc  (cost=0.00..14425.00 rows=100 width=8)
>>
>> There are actually 100 rows in pp, and none in pp1. I'm not bothered
>> particularly by the nonzero estimate for pp1, because I know where that
>> came from, but I'm not very happy that nowhere here does it look like
>> it's estimating a million-plus rows going into the join.

Although - it is estimating 1M rows, but only "per worker" estimates are shown, and because there are 2 workers planned it says 1M/2.4, which is the 416k. I agree it's a bit unclear, but at least it's consistent with how we treat loops (i.e. that the numbers are per loop).
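The 1M/2.4 divisor mentioned here can be sketched as follows. This is an illustrative Python model of the leader-contribution heuristic in the planner's parallel costing (the exact formula is an assumption based on the 9.6-era get_parallel_divisor() behavior, not a quote of the C code):

```python
def parallel_divisor(workers, leader_participates=True):
    """Divisor applied to total row counts to get per-worker estimates."""
    divisor = float(workers)
    if leader_participates:
        # The leader is assumed to spend 30% of its time per worker
        # servicing them; whatever remains, it contributes to the scan.
        leader_contribution = 1.0 - 0.3 * workers
        if leader_contribution > 0:
            divisor += leader_contribution
    return divisor

# 2 planned workers -> divisor 2.4, so 1M total rows shows as ~416667
per_worker_rows = round(1000000 / parallel_divisor(2))
```

With 2 workers the divisor is 2 + (1 - 0.6) = 2.4, which reproduces the rows=416667 shown on the Parallel Seq Scan in the quoted plan.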
But there's more fun with joins - consider for example this simple join:

                            QUERY PLAN
 Gather  (cost=19515.96..43404.82 rows=96957 width=12) (actual time=295.167..746.312 rows=9 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Hash Join  (cost=18515.96..32709.12 rows=96957 width=12) (actual time=249.281..670.309 rows=3 loops=3)
         Hash Cond: (t2.a = t1.a)
         ->  Parallel Seq Scan on t2  (cost=0.00..8591.67 rows=416667 width=8) (actual time=0.100..184.315 rows=33 loops=3)
         ->  Hash  (cost=16925.00..16925.00 rows=96957 width=8) (actual time=246.760..246.760 rows=9 loops=3)
               Buckets: 131072  Batches: 2  Memory Usage: 2976kB
               ->  Seq Scan on t1  (cost=0.00..16925.00 rows=96957 width=8) (actual time=0.065..178.385 rows=9 loops=3)
                     Filter: (b < 10)
                     Rows Removed by Filter: 91
 Planning time: 0.763 ms
 Execution time: 793.653 ms
(13 rows)

Suddenly we don't show per-worker estimates for the hash join - both the Hash Join and the Gather have exactly the same cardinality estimate. Now, let's try forcing Nested Loops and see what happens:

                            QUERY PLAN
 Gather  (cost=1000.42..50559.65 rows=96957 width=12) (actual time=0.610..203.694 rows=9 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Nested Loop  (cost=0.42..39863.95 rows=96957 width=12) (actual time=0.222..182.755 rows=3 loops=3)
         ->  Parallel Seq Scan on t1  (cost=0.00..9633.33 rows=40399 width=8) (actual time=0.030..40.358 rows=3 loops=3)
               Filter: (b < 10)
               Rows Removed by Filter: 30
         ->  Index Scan using t2_a_idx on t2  (cost=0.42..0.74 rows=1 width=8) (actual time=0.002..0.002 rows=1 loops=9)
               Index Cond: (a = t1.a)
 Planning time: 0.732 ms
 Execution time: 250.707 ms
(11 rows)

So, different join method but same result - 2 workers, loops=3.
But let's try with small tables (100k rows instead of 1M rows):

                            QUERY PLAN
 Gather  (cost=0.29..36357.94 rows=100118 width=12) (actual time=13.219..589.723 rows=10 loops=1)
   Workers Planned: 1
   Workers Launched: 1
   Single Copy: true
   ->  Nested Loop  (cost=0.29..36357.94 rows=100118 width=12) (actual time=0.288..442.821 rows=10 loops=1)
         ->  Seq Scan on t1  (cost=0.00..1444.18 rows=100118 width=8) (actual time=0.148..49.308 rows=10 loops=1)
         ->  Index Scan using t2_a_idx on t2  (cost=0.29..0.34 rows=1 width=8) (actual time=0.002..0.002 rows=1 loops=10)
               Index Cond: (a = t1.a)
 Planning time: 0.483 ms
 Execution time: 648.941 ms
(10 rows)

Suddenly, we get nworkers=1 with loops=1 (and not nworkers+1 as before). FWIW I've only seen this with force_parallel_mode=on, and the row counts are correct, so perhaps that's OK. single_copy seems a bit underdocumented, though.

regards
[HACKERS] Row level security implementation in Foreign Table in Postgres
Row level security in Postgres is implemented through policies, and the row security qualifier is attached as a subquery to the main query before query planning. RLS is enabled through an ALTER TABLE statement. My question is: why is this feature not enabled for foreign tables? (ALTER FOREIGN TABLE doesn't have an option for enabling row level security.) Is this not implemented due to some limitation in the current design? From a quick look it seems the security subquery could just as easily be attached to the main query and passed for processing to the foreign database.

Thanks
Sounak
Re: [HACKERS] plan_rows confusion with parallel queries
Tomas Vondra writes:
> On 11/02/2016 11:56 PM, Tomas Vondra wrote:
>> On 11/02/2016 09:00 PM, Tom Lane wrote:
>>> Tomas Vondra writes:
>>>> while eye-balling some explain plans for parallel queries, I got a bit
>>>> confused by the row count estimates. I wonder whether I'm alone.
>>>
>>> I got confused by that a minute ago, so no you're not alone. The problem
>>> is even worse in join cases. For example:
>>>
>>>  Gather  (cost=34332.00..53265.35 rows=100 width=8)
>>>    Workers Planned: 2
>>>    ->  Hash Join  (cost=2.00..52255.35 rows=100 width=8)
>>>          Hash Cond: ((pp.f1 = cc.f1) AND (pp.f2 = cc.f2))
>>>          ->  Append  (cost=0.00..8614.96 rows=417996 width=8)
>>>                ->  Parallel Seq Scan on pp  (cost=0.00..8591.67 rows=416667 width=8)
>>>                ->  Parallel Seq Scan on pp1  (cost=0.00..23.29 rows=1329 width=8)
>>>          ->  Hash  (cost=14425.00..14425.00 rows=100 width=8)
>>>                ->  Seq Scan on cc  (cost=0.00..14425.00 rows=100 width=8)
>>>
>>> There are actually 100 rows in pp, and none in pp1. I'm not bothered
>>> particularly by the nonzero estimate for pp1, because I know where that
>>> came from, but I'm not very happy that nowhere here does it look like
>>> it's estimating a million-plus rows going into the join.

> Although - it is estimating 1M rows, but only "per worker" estimates are
> shown, and because there are 2 workers planned it says 1M/2.4 which is
> the 416k. I agree it's a bit unclear, but at least it's consistent with
> how we treat loops (i.e. that the numbers are per loop).

Well, it's not *that* consistent. If we were estimating all the numbers underneath the Gather as being per-worker numbers, that would make some amount of sense. But neither the other seqscan, nor the hash on it, nor the hashjoin's output count are scaled that way. It's very hard to call the above display anything but flat-out broken.

> But there's more fun with joins - consider for example this simple join:
> ...
> Suddenly we don't show per-worker estimates for the hash join - both the
> Hash Join and the Gather have exactly the same cardinality estimate.

Yeah. That doesn't seem to be quite the same problem as in my example, but it's about as confused. Maybe we need to bite the bullet and add a "number of workers" field to the estimated and actual counts. Not sure how much that helps for the partial-count-for-the-leader issue, though.

regards, tom lane
[HACKERS] who calls InitializeTimeouts() ?
Hi,

It looks like for about 3 years, PL/Java has been calling InitializeTimeouts before calling RegisterTimeout. Looking over the callers of InitializeTimeouts in core, though, it appears that an extension like PL/Java should be able to assume it has already been called, and in fact it would be rude to call it again, as it isn't idempotent and could conceivably clobber somebody else's timeouts. As PL/Java only uses it for a timeout on process exit anyway, perhaps this is a mistake that has just never had much chance to cause a noticeable problem. Am I right that it's a mistake, though?

Thanks,
-Chap
[HACKERS] Re: [BUGS] BUG #14350: VIEW with INSTEAD OF INSERT TRIGGER and COPY. Missing feature or working as designed.
On Tue, Nov 1, 2016 at 8:05 PM, Dilip Kumar wrote: > On Tue, Nov 1, 2016 at 12:40 PM, Haribabu Kommi > wrote: > > COPY command is treated as an UTILITY command, During the query > > processing, the rules are not applied on the COPY command, and in the > > execution of COPY command, it just inserts the data into the heap and > > indexes with direct calls. > > > > I feel supporting the COPY command for the case-2 is little bit complex > > and may not be that much worth. > > I agree it will be complex.. > > > > Attached is the updated patch with doc changes. > > Now since we are adding support for INSTEAD OF TRIGGER in COPY FROM > command, It will be good that we provide a HINT to user if INSTEAD of > trigger does not exist, like we do in case of insert ? > > INSERT case: > postgres=# insert into ttt_v values(7); > 2016-11-01 14:31:39.845 IST [72343] ERROR: cannot insert into view "ttt_v" > 2016-11-01 14:31:39.845 IST [72343] DETAIL: Views that do not select > from a single table or view are not automatically updatable. > 2016-11-01 14:31:39.845 IST [72343] HINT: To enable inserting into > the view, provide an INSTEAD OF INSERT trigger or an unconditional ON > INSERT DO INSTEAD rule. > > COPY case: > postgres=# COPY ttt_v FROM stdin; > 2016-11-01 14:31:52.235 IST [72343] ERROR: cannot copy to view "ttt_v" > 2016-11-01 14:31:52.235 IST [72343] STATEMENT: COPY ttt_v FROM stdin; Thanks for your suggestion. Yes, I agree that adding a hint is good. Updated patch is attached with addition of hint message. 
2016-11-03 14:56:28.685 AEDT [7822] ERROR: cannot copy to view "ttt_v"
2016-11-03 14:56:28.685 AEDT [7822] HINT: To enable copy to view, provide an INSTEAD OF INSERT trigger
2016-11-03 14:56:28.685 AEDT [7822] STATEMENT: COPY ttt_v FROM stdin;

Regards,
Hari Babu
Fujitsu Australia

Attachment: copy_to_view_2.patch
[HACKERS] Re: [BUGS] BUG #14350: VIEW with INSTEAD OF INSERT TRIGGER and COPY. Missing feature or working as designed.
On Thu, Nov 3, 2016 at 9:57 AM, Haribabu Kommi wrote:
> Thanks for your suggestion. Yes, I agree that adding a hint is good.
> Updated patch is attached with addition of hint message.
>
> 2016-11-03 14:56:28.685 AEDT [7822] ERROR: cannot copy to view "ttt_v"
> 2016-11-03 14:56:28.685 AEDT [7822] HINT: To enable copy to view, provide
> an INSTEAD OF INSERT trigger
> 2016-11-03 14:56:28.685 AEDT [7822] STATEMENT: COPY ttt_v FROM stdin;

Okay, the patch in general looks fine to me. One cosmetic comment: IMHO in PG we put the operator at the end of the line, so move '&&' to the end of the previous line.

+    if (cstate->rel->rd_rel->relkind != RELKIND_RELATION
+        && (!cstate->rel->trigdesc ||
+            !cstate->rel->trigdesc->trig_insert_instead_row))

Meanwhile I will test it and give the feedback.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] WAL consistency check facility
On Thu, Nov 3, 2016 at 2:35 AM, Michael Paquier wrote:
>> - Another suggestion was to remove wal_consistency from PostgresNode.pm
>> because small buildfarm machines may suffer on it. Although I've no
>> experience in this matter, I would like to be certain that nothing breaks
>> in recovery tests after some modifications.
>
> An extra idea that I have here would be to extend the TAP tests to
> accept an environment variable that would be used to specify extra
> options when starting Postgres instances. Buildfarm machines could use
> it.

It can be added as a separate feature.

> -    /* If it's a full-page image, restore it. */
> -    if (XLogRecHasBlockImage(record, block_id))
> +    /* If the full-page image should be restored, do it. */
> +    if (XLogRecBlockImageApply(record, block_id))
>
> Hm. It seems to me that this modification is incorrect. If the page
> does not need to be applied, aka if it needs to be used for
> consistency checks, what should be done is more something like the
> following in XLogReadBufferForRedoExtended:
>
>     if (Apply(record, block_id))
>         return;
>     if (HasImage)
>     {
>         // do what needs to be done with an image
>     }
>     else
>     {
>         // do default things
>     }

XLogReadBufferForRedoExtended should return a redo action (block restored, done, block needs redo, or block not found). So, we can't just return from the function if the BKPIMAGE_APPLY flag is not set; it still has to check whether a redo is required or not.

> XLogRecBlockImageApply() should only check for BKPIMAGE_APPLY and not imply
> HasImage(). This will be more flexible when for example using it for
> assertions.

Seems reasonable.

> It would be more consistent to have a boolean flag to treat
> BKPIMAGE_APPLY in xlogreader.c. Have a look at has_image for example.

For the flags in bimg_info, we directly check whether the mask bit is set in bimg_info (e.g. hole, compressed). Besides, we use this flag only in XLogReadBufferForRedoExtended.
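The control-flow point being argued above can be modeled in a few lines of illustrative Python (this is a sketch of the decision logic, not the real C API; the function and parameter names are invented). The key property is that a record whose image is kept only for consistency checking (image present but APPLY flag clear) must still fall through to the normal "does this need redo?" test instead of returning early:

```python
from enum import Enum, auto

class RedoAction(Enum):
    BLK_RESTORED = auto()    # full-page image was restored
    BLK_NEEDS_REDO = auto()  # caller must replay the record
    BLK_DONE = auto()        # page is already up to date
    BLK_NOTFOUND = auto()    # block does not exist

def read_buffer_for_redo(has_image, apply_image, rec_lsn, page_lsn, found=True):
    """Illustrative model of the redo-action decision discussed above."""
    if not found:
        return RedoAction.BLK_NOTFOUND
    if has_image and apply_image:
        # full-page image marked for application: restore it
        return RedoAction.BLK_RESTORED
    # Image absent, or present only for consistency checking: we must
    # still compare LSNs to decide whether the record needs replay.
    if rec_lsn <= page_lsn:
        return RedoAction.BLK_DONE
    return RedoAction.BLK_NEEDS_REDO
```

With this shape, an image carried purely for WAL consistency checks (has_image=True, apply_image=False) still yields BLK_NEEDS_REDO or BLK_DONE as appropriate, which is why an early return on the APPLY flag alone would be wrong.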
--
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com