Re: [HACKERS] dsm_unpin_segment

2016-08-20 Thread Amit Kapila
On Sat, Aug 20, 2016 at 6:13 PM, Robert Haas  wrote:
> On Sat, Aug 20, 2016 at 7:37 AM, Amit Kapila  wrote:
>> 2.
>> + if (dsm_control->item[seg->control_slot].pinned)
>> + elog(ERROR, "cannot pin a segment that is already pinned");
>>
>> Shouldn't this be a user facing error (which means we should use ereport)?
>
> Uh, certainly not.  This can't happen because of SQL the user enters;
> it can only happen because of a C coding error.  elog() is the
> appropriate tool for that case.
>

Okay, your point makes sense to me, but in that case why not use an
Assert here?  I think elog() could be used here as well, but it seems
to me elog() is most appropriate for cases where it is not
straightforward to tell the behaviour of an API or the value of a
variable, like when PageAddItem() is not successful.
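To make the distinction concrete, here is a toy sketch (not actual
PostgreSQL code; the struct and function names are invented) of why a
can't-happen condition is reported with a hard internal error that
survives non-assert builds, rather than with an Assert or a user-facing
ereport:

```c
#include <stdbool.h>
#include <stdio.h>

/* Toy stand-in for the DSM control slot under discussion; "pinned"
 * mirrors dsm_control->item[seg->control_slot].pinned.  All names here
 * are illustrative, not the actual PostgreSQL structures. */
typedef struct
{
    bool pinned;
} ToySlot;

/* Models the elog(ERROR, ...) path: report an internal invariant
 * violation and fail even in a production build, instead of asserting
 * (which vanishes without --enable-cassert) or ereport-ing (which is
 * reserved for user-facing errors). */
static bool
toy_pin_segment(ToySlot *slot)
{
    if (slot->pinned)
    {
        fprintf(stderr, "cannot pin a segment that is already pinned\n");
        return false;           /* models elog(ERROR) unwinding */
    }
    slot->pinned = true;
    return true;
}
```

An Assert would catch the same bug during development, but the elog()
check still fires if the C-level caller misbehaves in production.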

void
 dsm_pin_segment(dsm_segment *seg)

+void
+dsm_unpin_segment(dsm_handle handle)

Another point I want to highlight here is that the pin/unpin APIs take
different types of arguments.  The comment on top of
dsm_unpin_segment() explains the need for using a dsm_handle, which
seems reasonable.  I just pointed it out to see if anybody else has a
different view.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [WIP] [B-Tree] Keep indexes sorted by heap physical location

2016-08-20 Thread Amit Kapila
On Sat, Aug 20, 2016 at 9:58 PM, Claudio Freire  wrote:
> On Sat, Aug 20, 2016 at 4:27 AM, Amit Kapila  wrote:
>>
>> That makes sense, but this means there is a chance that the searches
>> could lead to different buffers in case of uniqueness checks (the
>> search with key-ctid could lead to a different buffer than the search
>> with just key).  I am not clear whether we have a requirement for doing
>> this uniqueness check for the key-ctid search API, because as I understand you
>> want to do it mainly for vacuum and WARM tuples. Vacuum won't add new
>> tuples, so is this required for WARM tuples?
>
> Well, I'm not really sure what exactly would need to be done when doing
> the WARM conditional insert in the case of unique indexes, and that's
> a strong motivation to not add the interface for inserts just yet.
> Vacuum will only need to delete, and in the case of deletes the
> operation would be quite straightforward.
>

Right.  I think if you initially modify the interface only for deletes,
that will reduce the complexity of the patch as well.  We can later
enhance it to handle the WARM tuple case.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] replication slots replicated to standbys?

2016-08-20 Thread Craig Ringer
On 21 Aug 2016 12:36 AM, "Bruce Momjian"  wrote:
>
> On Sat, Aug 20, 2016 at 01:43:42PM +0900, Michael Paquier wrote:
> > On Sat, Aug 20, 2016 at 1:39 PM, Bruce Momjian  wrote:
> > > Someone reported that a replication slot that existed at the time a base
> > > backup was done on the master was copied to the standby.  Because they
> > > didn't realize it, their WAL was not being recycled on the standby.
> > >
> > > Is that possible?  Is it a known behavior?  I don't see it documented.
> >
> > From backup.sgml:
> >
> > It is often a good idea to also omit from the backup the files
> > within the cluster's pg_replslot/ directory, so that
> > replication slots that exist on the master do not become part of the
> > backup.  Otherwise, the subsequent use of the backup to create a standby
> > may result in indefinite retention of WAL files on the standby, and
> > possibly bloat on the master if hot standby feedback is enabled, because
> > the clients that are using those replication slots will still be connecting
> > to and updating the slots on the master, not the standby.  Even if the
> > backup is only intended for use in creating a new master, copying the
> > replication slots isn't expected to be particularly useful, since the
> > contents of those slots will likely be badly out of date by the time
> > the new master comes on line.
> >
> > Note as well that pg_basebackup omits its content and creates an empty
> > directory.
>
> Seems like another good idea to use pg_basebackup rather than manually
> doing base backups;  Magnus has been saying this for a while.

The main time that's an issue is when you're rsync'ing to save bandwidth,
using CoW volume snapshots, etc. pg_basebackup becomes totally impractical
on big systems.

> I suppose there is no way we could remove this error-prone behavior
> because replication slots must survive server restarts.  Is there no way
> to know if we are starting a standby from a fresh base backup vs.
> restarting a standby?  In that case we could clear the replication
> slots.  Are there any other error-prone things copied from the master?

We could remove slots when we enter archive recovery, but I've recently
implemented support for logical decoding from standbys, which needs slots.
Physical slot use on a standby is also handy. We cannot tell whether a slot
was created on the replica or was created on the master and copied in the
base backup, and we don't want to drop slots created on the replica.

I also have use cases for slots being retained in restore from snapshot,
for re-integrating restored nodes into an MM mesh.

I think a recovery.conf option to remove all slots during archive recovery
could be handy. But mostly it comes down to tools not copying them.


Re: [HACKERS] Patch: initdb: "'" for QUOTE_PATH (non-windows)

2016-08-20 Thread Ryan Murphy
>
> I looked into this and soon found that fe_utils/string_utils.o has
> dependencies on libpq that are much wider than just pqexpbuffer :-(.
> It might be a project to think about sorting that out sometime, but it
> looks like it would be an awful lot of work just to avoid having initdb
> depend on libpq.so.  So I now think this objection is impractical.
> I'll just annotate the makefile entry to say that initdb itself doesn't
> use libpq.


Sounds good.


>
> > Another perhaps-only-cosmetic issue is that now initdb prints quotes
> > whether they are needed or not.
>
> I still think this is worth fixing, but it's a simple modification.
> Will take care of it.
>
Great!  Seems like a good solution; I agree it looks better without
quotes if they're not needed.


> This item is listed in the commitfest as a bug fix, but given the lack of
> previous complaints, and the fact that the printed command isn't really
> meant to be blindly copied-and-pasted anyway (you're at least meant to
> replace "logfile" with something), I do not think it needs back-patching.
>
>
Agreed, not a "bug" in that sense.
Thanks Tom.


> regards, tom lane
>


Re: [HACKERS] SP-GiST support for inet datatypes

2016-08-20 Thread Tom Lane
Emre Hasegeli  writes:
> Attached patches add SP-GiST support to the inet datatypes.

I started to look at this patch.  The reported speedup is pretty nice,
but ...

> The operator
> class comes with a small change on the SP-GiST framework to allow fixed
> number of child nodes.

... this part of the patch breaks the existing API for SP-GiST opclasses.
That is a hard sell.  It may only matter for one existing opclass in core,
but unless we have reason to think nobody is using any custom SP-GiST
opclasses, that is not a pleasant thing to do.  How important is it really
for this opclass?  Several of the existing opclasses use fixed numbers of
child nodes, so why does this need something they don't?

regards, tom lane




Re: [HACKERS] distinct estimate of a hard-coded VALUES list

2016-08-20 Thread Tom Lane
Jeff Janes  writes:
> On Thu, Aug 18, 2016 at 2:25 PM, Tom Lane  wrote:
>> It does know it, what it doesn't know is how many duplicates there are.

> Does it know whether the count comes from a parsed query-string list/array,
> rather than being an estimate from something else?  If it came from a join,
> I can see why it would be dangerous to assume they are mostly distinct.
> But if someone throws 6000 things into a query string and only 200 distinct
> values among them, they have no one to blame but themselves when it makes
> bad choices off of that.

I am not exactly sold on this assumption that applications have
de-duplicated the contents of a VALUES or IN list.  They haven't been
asked to do that in the past, so why do you think they are doing it?

>> If we do what I think you're suggesting, which is assume the entries are
>> all distinct, I'm afraid we'll just move the estimation problems somewhere
>> else.

> Any guesses as to where?  (other than the case of someone doing something
> silly with their query strings?)

Well, overestimates are as bad as underestimates --- it might lead us away
from using a nestloop, for example.

regards, tom lane




Re: [HACKERS] distinct estimate of a hard-coded VALUES list

2016-08-20 Thread Jeff Janes
On Thu, Aug 18, 2016 at 2:25 PM, Tom Lane  wrote:

> Jeff Janes  writes:
> > So even though it knows that 6952 values have been shoved in the bottom,
> it
> > thinks only 200 are going to come out of the aggregation.  This seems
> like
> > a really lousy estimate.  In more complex queries than the example one
> > given it leads to poor planning choices.
>
> > Is the size of the input list not available to the planner at the point
> > where it estimates the distinct size of the input list?  I'm assuming
> that
> > if it is available to EXPLAIN than it is available to the planner.  Does
> it
> > know how large the input list is, but just throw up its hands and use 200
> > as the distinct size anyway?
>
> It does know it, what it doesn't know is how many duplicates there are.
>

Does it know whether the count comes from a parsed query-string list/array,
rather than being an estimate from something else?  If it came from a join,
I can see why it would be dangerous to assume they are mostly distinct.
But if someone throws 6000 things into a query string and only 200 distinct
values among them, they have no one to blame but themselves when it makes
bad choices off of that.


Would it work to check vardata->rel->rtekind == RTE_VALUES in
get_variable_numdistinct to detect that?



> If we do what I think you're suggesting, which is assume the entries are
> all distinct, I'm afraid we'll just move the estimation problems somewhere
> else.
>

Any guesses as to where?  (other than the case of someone doing something
silly with their query strings?)

Cheers,

Jeff


Re: [HACKERS] Re: PROPOSAL: make PostgreSQL sanitizers-friendly (and prevent information disclosure)

2016-08-20 Thread Tom Lane
Noah Misch  writes:
> On Fri, Aug 19, 2016 at 07:22:02PM -0400, Tom Lane wrote:
>> So maybe we ought to work towards taking those out?

> Maybe.  It's a policy question boiling down to our willingness to spend cycles
> zeroing memory in order to limit trust in the confidentiality of storage
> backing the data directory.  Think of "INSERT INTO t VALUES(my_encrypt('key',
> 'cleartext'))", subsequent to which bits of the key or cleartext leak to disk
> by way of WAL padding bytes.  A reasonable person might expect that not to
> happen; GNU Privacy Guard and a few like-minded programs prevent it.  I'm on
> the fence regarding whether PostgreSQL should target this level of vigilance.
> An RDBMS is mainly a tool for managing persistent data, and PostgreSQL will
> never be a superior tool for data that _must not_ persist.  Having said that,
> the runtime cost of zeroing memory and the development cost of reviewing the
> patches to do so is fairly low.

[ after thinking some more about this... ]

FWIW, I put pretty much zero credence in the proposition that junk left in
padding bytes in WAL or data files represents a meaningful security issue.
An attacker who has access to those files will probably find much more
that is of interest in the non-pad data.  My only interest here is in
making the code sanitizer-clean, which seems like it is useful for
debugging purposes independently of any security arguments.

So to me, it seems like the core of this complaint boils down to "my
sanitizer doesn't understand the valgrind exclusion patterns that have
been created for Postgres".  We can address that to some extent by trying
to reduce the number of valgrind exclusions we need, but it's unlikely to
be practical to get that to zero, and it's not very clear that adding
runtime cycles is a good tradeoff for it either.  So maybe we need to push
back on the assumption that people should expect their sanitizers to
produce zero warnings without having made some effort to adapt the
valgrind rules.
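For readers unfamiliar with the suppression file Andres mentioned, a
valgrind suppression entry for padding-byte writes looks roughly like
this (the name and frame below are illustrative, not copied from the
actual src/tools/valgrind.supp):

```
{
	padding_bytes_written_to_wal_sketch
	Memcheck:Param
	write(buf)
	...
	fun:XLogWrite
}
```

Other sanitizers would need their own, differently formatted exclusion
lists, which is the adaptation effort being discussed here.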

regards, tom lane




[HACKERS] Re: PROPOSAL: make PostgreSQL sanitizers-friendly (and prevent information disclosure)

2016-08-20 Thread Noah Misch
On Fri, Aug 19, 2016 at 07:22:02PM -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2016-08-19 17:55:25 -0400, Tom Lane wrote:
> >> It'd be useful also to figure out why our existing valgrind testing has
> >> not caught this already.  The example you give looks like it surely
> >> ought to be replicated well enough in the standard regression tests.
> 
> > The valgrind suppression file explicitly hides a bunch of padding
> > issues.
> 
> So maybe we ought to work towards taking those out?

Maybe.  It's a policy question boiling down to our willingness to spend cycles
zeroing memory in order to limit trust in the confidentiality of storage
backing the data directory.  Think of "INSERT INTO t VALUES(my_encrypt('key',
'cleartext'))", subsequent to which bits of the key or cleartext leak to disk
by way of WAL padding bytes.  A reasonable person might expect that not to
happen; GNU Privacy Guard and a few like-minded programs prevent it.  I'm on
the fence regarding whether PostgreSQL should target this level of vigilance.
An RDBMS is mainly a tool for managing persistent data, and PostgreSQL will
never be a superior tool for data that _must not_ persist.  Having said that,
the runtime cost of zeroing memory and the development cost of reviewing the
patches to do so is fairly low.




Re: [HACKERS] Should we cacheline align PGXACT?

2016-08-20 Thread Tom Lane
Robert Haas  writes:
> Wow, nice results.  My intuition on why PGXACT helped in the first place was 
> that it minimized the number of cache lines that had to be touched to take a 
> snapshot. Padding obviously would somewhat increase that again, so I can't 
> quite understand why it seems to be helping... any idea?

That's an interesting point.  I wonder whether this whole thing will be
useless or even counterproductive after (if) Heikki's CSN-snapshot patch
gets in.  I would certainly not mind reverting the PGXACT/PGPROC
separation if it proves no longer helpful after that.

regards, tom lane




Re: [HACKERS] Should we cacheline align PGXACT?

2016-08-20 Thread Robert Haas
On Aug 19, 2016, at 2:12 AM, Alexander Korotkov  
wrote:
> Hackers,
> 
> originally this idea was proposed by Andres Freund while experimenting with 
> lockfree Pin/UnpinBuffer [1].
> The patch is attached as well as results of pgbench -S on 72-cores machine.  
> As before it shows huge benefit in this case.
> For sure, we should validate that it doesn't cause performance regression in 
> other cases.  At least we should test read-write and smaller machines.
> Any other ideas?

Wow, nice results.  My intuition on why PGXACT helped in the first place was 
that it minimized the number of cache lines that had to be touched to take a 
snapshot. Padding obviously would somewhat increase that again, so I can't 
quite understand why it seems to be helping... any idea?
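For context, the padding being benchmarked amounts to something like the
following sketch (the field layout and the 64-byte line size are
assumptions for illustration, not the actual PGXACT definition):

```c
#include <stddef.h>

#define CACHE_LINE_SIZE 64      /* common on x86; an assumption here */

/* Illustrative sketch of cacheline-aligning per-backend entries: pad
 * each (simplified) entry out to a full cache line, so that one backend
 * writing its own entry does not invalidate the line a snapshot-taker
 * is concurrently reading for neighbouring entries (false sharing).
 * The fields are placeholders, not the real PGXACT layout. */
typedef struct
{
    unsigned int  xid;
    unsigned int  xmin;
    unsigned char vacuumFlags;
    /* pad the struct out to exactly one cache line */
    char          pad[CACHE_LINE_SIZE - 2 * sizeof(unsigned int) - 1];
} PaddedXactSketch;
```

The trade-off Robert raises is visible here: padding removes false
sharing between writers, but a snapshot scan over N entries now touches
N cache lines instead of N/4 or so.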

...Robert



Re: [HACKERS] Patch: initdb: "'" for QUOTE_PATH (non-windows)

2016-08-20 Thread Tom Lane
I wrote:
> A bigger issue here is that it seems fundamentally wrong for initdb to be
> including libpq, because it surely is never meant to be communicating
> with a running postmaster.  Not sure what to do about that.  We could
> consider moving pqexpbuffer out of libpq into fe_utils, but I wonder
> whether that would break any third-party code.  We've never advertised
> pqexpbuffer.h as a supported API of libpq, but it's probably handy enough
> that people use it anyway.  I suppose we could duplicate it in fe_utils
> and libpq, though that's a tad ugly.  Thoughts?

I looked into this and soon found that fe_utils/string_utils.o has
dependencies on libpq that are much wider than just pqexpbuffer :-(.
It might be a project to think about sorting that out sometime, but it
looks like it would be an awful lot of work just to avoid having initdb
depend on libpq.so.  So I now think this objection is impractical.
I'll just annotate the makefile entry to say that initdb itself doesn't
use libpq.

> Another perhaps-only-cosmetic issue is that now initdb prints quotes
> whether they are needed or not.  I find this output pretty ugly:
> ...
> That's not really the fault of this patch perhaps.  Maybe we could adjust
> appendShellString so it doesn't add quotes if they are clearly
> unnecessary.

I still think this is worth fixing, but it's a simple modification.
Will take care of it.

This item is listed in the commitfest as a bug fix, but given the lack of
previous complaints, and the fact that the printed command isn't really
meant to be blindly copied-and-pasted anyway (you're at least meant to
replace "logfile" with something), I do not think it needs back-patching.

regards, tom lane




Re: [HACKERS] Making pg_hba.conf case-insensitive

2016-08-20 Thread Bruce Momjian
On Thu, Aug 18, 2016 at 04:49:04PM -0400, Bruce Momjian wrote:
> On Thu, Aug 18, 2016 at 03:01:48PM -0400, Peter Eisentraut wrote:
> > On 8/18/16 1:59 PM, Bruce Momjian wrote:
> > > o  compares words in columns that can only support keywords as
> > > case-insensitive, double-quoted or not
> > > 
> > > o  compares words in columns that can contain user/db names or keywords
> > > as case-sensitive if double-quoted, case-insensitive if not
> > 
> > I can maybe see the case of the second one, but the first one doesn't
> > make sense to me.
> > 
> > We've in the past had discussions like this about whether command line
> > arguments of tools should be case insensitive like SQL, and we had
> > resolved that since the shell is not SQL, it shouldn't work like that.
> > pg_hba.conf is also not SQL, and neither, for that matter, is
> > pg_ident.conf and postgresql.conf.
> 
> OK, I am happy to remove the TODO item and see if we get new complaints.

TODO item removed.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +




Re: [HACKERS] Patch: initdb: "'" for QUOTE_PATH (non-windows)

2016-08-20 Thread Tom Lane
Michael Paquier  writes:
> Regarding your patch, with a bit of clean up it gives the attached.

This fails to build for me, with

gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement 
-Wendif-labels -Wmissing-format-attribute -Wformat-security 
-fno-strict-aliasing -fwrapv -g -O1 initdb.o findtimezone.o localtime.o 
encnames.o  -L../../../src/port -L../../../src/common -Wl,--as-needed 
-Wl,-rpath,'/home/postgres/testversion/lib',--enable-new-dtags 
-L../../../src/fe_utils -lpgfeutils -lpq  -lpgcommon -lpgport -lz -lreadline 
-lrt -lcrypt -ldl -lm  -o initdb
/usr/bin/ld: cannot find -lpq
collect2: ld returned 1 exit status
make: *** [initdb] Error 1

evidently because the link command omits the necessary -L switch, because
you didn't use the approved macro for linking in libpq.  It should be
$(libpq_pgport) instead, I believe.  (Probably the reason it seems to work
for you is you have a version of libpq.so in /usr/lib; but then you are
really doing the link against the wrong version of libpq.)

A bigger issue here is that it seems fundamentally wrong for initdb to be
including libpq, because it surely is never meant to be communicating
with a running postmaster.  Not sure what to do about that.  We could
consider moving pqexpbuffer out of libpq into fe_utils, but I wonder
whether that would break any third-party code.  We've never advertised
pqexpbuffer.h as a supported API of libpq, but it's probably handy enough
that people use it anyway.  I suppose we could duplicate it in fe_utils
and libpq, though that's a tad ugly.  Thoughts?

Another perhaps-only-cosmetic issue is that now initdb prints quotes
whether they are needed or not.  I find this output pretty ugly:

Success. You can now start the database server using:

  'pg_ctl' -D '/home/postgres/testversion/data' -l logfile start

That's not really the fault of this patch perhaps.  Maybe we could adjust
appendShellString so it doesn't add quotes if they are clearly
unnecessary.
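A minimal sketch of such a check, assuming a conservative shell-safe
character set (this is illustrative, not the actual appendShellString()
logic):

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Return true if the string needs single quotes when printed as part
 * of a shell command.  The "safe" set below (alphanumerics plus a few
 * punctuation characters common in paths) is an assumption for this
 * sketch; anything outside it could be interpreted by the shell. */
static bool
needs_shell_quotes(const char *s)
{
    if (*s == '\0')
        return true;            /* empty strings must stay quoted */
    for (; *s != '\0'; s++)
    {
        if (!isalnum((unsigned char) *s) && strchr("/_.-", *s) == NULL)
            return true;
    }
    return false;
}
```

With a check like this, the common case above would print
pg_ctl -D /home/postgres/testversion/data unquoted, while a path
containing spaces or shell metacharacters would still be quoted.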

regards, tom lane




Re: [HACKERS] standalone backend PANICs during recovery

2016-08-20 Thread Tom Lane
I wrote:
> In short, I don't think control should have been here at all.  My proposal
> for a fix is to force EnableHotStandby to remain false in a standalone
> backend.

I tried to reproduce Bernd's problem by starting a standalone backend in
a data directory that was configured as a hot standby slave, and soon
found that there are much bigger issues than this.  The startup sequence
soon tries to wait for WAL to arrive, which in HEAD uses

WaitLatch(&XLogCtl->recoveryWakeupLatch,
  WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
  5000L);

which immediately elog(FATAL)s because a standalone backend has no parent
postmaster and so postmaster_alive_fds[] isn't set.  But if it didn't do
that, it'd wait forever because of course there is no active WAL receiver
process that would ever provide more WAL.

The only way that you'd ever get to a command prompt is if somebody made a
promotion trigger file, which would cause the startup code to promote the
cluster into master status, which does not really seem like something that
would be a good idea in Bernd's proposed use case of "investigating a
problem".

Alternatively, if we were to force standby_mode off in a standalone
backend, it would come to the command prompt right away but again it would
have effectively promoted the cluster to master.  That is certainly not
something we should ever do automatically.

So at this point I'm pretty baffled as to what the actual use-case is
here.  I am tempted to say that a standalone backend should refuse to
start at all if a recovery.conf file is present.  If we do want to
allow the case, we need some careful thought about what it should do.

regards, tom lane




Re: [HACKERS] replication slots replicated to standbys?

2016-08-20 Thread Bruce Momjian
On Sat, Aug 20, 2016 at 01:43:42PM +0900, Michael Paquier wrote:
> On Sat, Aug 20, 2016 at 1:39 PM, Bruce Momjian  wrote:
> > Someone reported that a replication slot that existed at the time a base
> > backup was done on the master was copied to the standby.  Because they
> > didn't realize it, their WAL was not being recycled on the standby.
> >
> > Is that possible?  Is it a known behavior?  I don't see it documented.
> 
> From backup.sgml:
>
> It is often a good idea to also omit from the backup the files
> within the cluster's pg_replslot/ directory, so that
> replication slots that exist on the master do not become part of the
> backup.  Otherwise, the subsequent use of the backup to create a standby
> may result in indefinite retention of WAL files on the standby, and
> possibly bloat on the master if hot standby feedback is enabled, because
> the clients that are using those replication slots will still be 
> connecting
> to and updating the slots on the master, not the standby.  Even if the
> backup is only intended for use in creating a new master, copying the
> replication slots isn't expected to be particularly useful, since the
> contents of those slots will likely be badly out of date by the time
> the new master comes on line.
>
> 
> Note as well that pg_basebackup omits its content and creates an empty
> directory.

Seems like another good idea to use pg_basebackup rather than manually
doing base backups;  Magnus has been saying this for a while.

I suppose there is no way we could remove this error-prone behavior
because replication slots must survive server restarts.  Is there no way
to know if we are starting a standby from a fresh base backup vs.
restarting a standby?  In that case we could clear the replication
slots.  Are there any other error-prone things copied from the master?

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +




Re: [HACKERS] Patch: initdb: "'" for QUOTE_PATH (non-windows)

2016-08-20 Thread Ryan Murphy
On Sat, Aug 20, 2016 at 8:26 AM, Michael Paquier 
wrote:

> On Sat, Aug 20, 2016 at 4:39 AM, Ryan Murphy 
> wrote:
> > Here is another version of my initdb shell quoting patch.  I have removed
> > the unnecessary {} block.  I also ran pgindent on the code prior to
> creating
> > the patch.
>
> Could you please *not* top-post? This breaks the logic of the thread,
> this is the third time that I mention it, and that's not the style of
> this mailing list.
>
>
Sure, sorry about that.  I'm not used to this style of mailing list.  I
had been avoiding top-posting since you first mentioned it, but with the
new patch I didn't know if that was still supposed to go at the bottom.


> Regarding your patch, with a bit of clean up it gives the attached.
> You should declare variables at the beginning of a code block or
> function. One call to appendPQExpBufferStr can as well be avoided, and
> you added too many newlines. I have switched that as ready for
> committer.
> --
> Michael
>

Thanks Michael, I've looked at your changes and everything looks good to me.

I did notice the commit message is gone, though; is that not part of the
patch?  Perhaps the committer prefers to write their own message, and
that's fine too.


Re: [HACKERS] [WIP] [B-Tree] Keep indexes sorted by heap physical location

2016-08-20 Thread Claudio Freire
On Sat, Aug 20, 2016 at 4:27 AM, Amit Kapila  wrote:
>> All uniqueness checks will find the same "nbuf" (read-locked), which
>> is the first one that matches the key.
>>
>> They could have different "buf" buffers (the write-locked) one. But
>> since read locks cannot be held while a write-lock is held, I believe
>> it doesn't matter. All uniqueness checks will try to read-lock nbuf
>> unless the uniqueness check for that insertion is done. When they do,
>> they will block until the other insertion is finished, and when that
>> happens, either the insert happened, or it returned an xid to wait for
>> and retry. If it succeeded, the check attempting the read-lock will
>> see the inserted tuple and return the xid of the earlier transaction
>> and wait on it. If it failed, no insert happened because there's a
>> visible tuple there, and the read-lock will succeed, find the visible
>> tuple, and the insert will fail too.
>>
>> In essence, it is still the case, I believe, that only one insert can
>> succeed at a time, all the others will retry. It only happens that
>> with the patch, the contention will be between a read lock and a write
>> lock, and not two write locks, and there's a little less contention
>> (some more work can be done in parallel).
>>
>>> With this new mechanism, do we have two type of search interfaces
>>> where one would work for keys (as now) and other for key-ctid or it
>>> will be just a single interface which works both ways?  I think either
>>> way there are pros and cons.
>>
>> The patch doesn't add another interface, but the idea is to add it.
>> The implementation will probably fall on a common function that can do
>> both
>>
>
> That makes sense, but this means there is a chance that the searches
> could lead to different buffers in case of uniqueness checks (the
> search with key-ctid could lead to a different buffer than the search
> with just key).  I am not clear whether we have a requirement for doing
> this uniqueness check for the key-ctid search API, because as I understand you
> want to do it mainly for vacuum and WARM tuples. Vacuum won't add new
> tuples, so is this required for WARM tuples?

Well, I'm not really sure what exactly would need to be done when doing
the WARM conditional insert in the case of unique indexes, and that's
a strong motivation to not add the interface for inserts just yet.
Vacuum will only need to delete, and in the case of deletes the
operation would be quite straightforward.




Re: [HACKERS] New SQL counter statistics view (pg_stat_sql)

2016-08-20 Thread Tom Lane
Haribabu Kommi  writes:
> This is a new statistics view that provides the number of SQL
> operations that happened during a particular interval of time.  It is
> useful for finding out the pattern of operations happening in the
> instance during that interval.

> Following are the columns and their details of the pg_stat_sql
> view.

> postgres=# \d pg_stat_sql
>            View "pg_catalog.pg_stat_sql"
>    Column    |           Type           | Modifiers
> -------------+--------------------------+-----------
>  selects     | bigint                   |
>  inserts     | bigint                   |
>  deletes     | bigint                   |
>  updates     | bigint                   |
>  declares    | bigint                   |
>  fetches     | bigint                   |
>  copies      | bigint                   |
>  reindexes   | bigint                   |
>  truncates   | bigint                   |
>  stats_reset | timestamp with time zone |


1. This set of counters seems remarkably random.  Why, for instance,
count reindexes but not original index creations?

2. What will you do with cases such as SELECTs containing modifying CTEs?
Or EXPLAIN ANALYZE?  Does CLUSTER count as a REINDEX?  Does REFRESH
MATERIALIZED VIEW count as a SELECT?  (Or maybe it's an INSERT?)

If you're going to be selective about what you count, you're forever
going to be fielding complaints from users who are unhappy that you
didn't count some statement type they care about, or feel that you
misclassified some complex case.

I'm inclined to suggest you forget this approach and propose a single
counter for "SQL commands executed", which avoids all of the above
definitional problems.  People who need more detail than that are
probably best advised to look to contrib/pg_stat_statements, anyway.

Also, if you do that, there hardly seems a need for a whole new view.
You could just add a column to pg_stat_database.  The incremental
maintenance effort doesn't seem enough to justify its own GUC, either.

regards, tom lane




Re: [HACKERS] LSN as a recovery target

2016-08-20 Thread Michael Paquier
On Sat, Aug 20, 2016 at 10:44 AM, Petr Jelinek  wrote:
> On 20/08/16 02:13, Michael Paquier wrote:
>> On Fri, Aug 19, 2016 at 10:47 PM, Adrien Nayrat
>>  wrote:
>> Using a PG_TRY/CATCH block the way you do to show to user a different
>> error message while the original one is actually correct does not
>> sound like a good idea to me. It complicates the code and the original
>> pattern matches what is already done for timestamps, where in case of
>> error you'd get that:
>> FATAL:  invalid input syntax for type timestamp with time zone: "aa"
>>
>> I think that a better solution would be to make clearer in the docs
>> that pg_lsn is used here. First, in recovery.conf.sample, let's use
>> pg_lsn and not LSN. Then, in the sgml docs, let's add a reference to
>> the page of pg_lsn as well:
>> https://www.postgresql.org/docs/devel/static/datatype-pg-lsn.html
>>
>> Now we have that:
>> +   
>> +This parameter specifies the LSN of the write-ahead log stream up
>> to
>> +which recovery will proceed. The precise stopping point is also
>> +influenced by .
>> +   
>> Perhaps we could just add an additional sentence: "This parameter
>> value is parsed using the system datatype ". Or something
>> like that. Thoughts?
>>
>
> +1

So here is the patch resulting in that. After thinking more about that
I have arrived at the conclusion to not use LSN in recovery.conf, but
"WAL position (LSN)", and also mention in the docs that pg_lsn is used
to parse the parameter value. A couple of log messages are changed
similarly. It definitely makes it more understandable for users to
speak of WAL positions.
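For readers unfamiliar with the pg_lsn text form, a WAL position is written as two hex numbers separated by a slash (the high and low 32 bits of a 64-bit position). A minimal standalone parser — my sketch for illustration, not the server's actual pg_lsn input function — looks like this:

```c
#include <stdint.h>
#include <stdio.h>

/* Parse "XXXXXXXX/XXXXXXXX" into a 64-bit WAL position.
 * Returns 1 on success, 0 on bad syntax (e.g. the "aa" case
 * that triggers the FATAL message quoted in this thread). */
static int
parse_lsn(const char *str, uint64_t *lsn)
{
    unsigned int hi, lo;
    char         trailing;

    /* Exactly two hex fields and nothing after them. */
    if (sscanf(str, "%X/%X%c", &hi, &lo, &trailing) != 2)
        return 0;
    *lsn = ((uint64_t) hi << 32) | lo;
    return 1;
}
```

For example, "16/B374D848" parses to the 64-bit value 0x16B374D848.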

I also noticed that the top block of "Recovery Target Settings" needs
to mention recovery_target_lsn: when name, xid, lsn and time are
combined in the same recovery.conf, only the last value will be used.
-- 
Michael


recovery-target-lsn-3.patch
Description: application/download



Re: [HACKERS] Logical decoding slots can go backwards when used from SQL, docs are wrong

2016-08-20 Thread Petr Jelinek

On 20/08/16 16:01, Craig Ringer wrote:

On 5 June 2016 at 09:54, David G. Johnston  wrote:

On Thursday, March 17, 2016, Craig Ringer  wrote:

The first patch was incorrectly created on top of failover slots
not HEAD. Attached patch applies on HEAD.


Lots of logical decoding work ongoing but this one shows as active
in the September cf and the comments by Craig indicate potential
relevance to 9.6.  Is this still live as-is or has it been subsumed
by other threads?



Yes. Both those patches are still pending and should be considered for
commit in the next CF. (Of course, I think they should just be
committed, but I would, wouldn't I?)




I think the doc one should definitely go in and possibly be back-patched 
all the way to 9.4. I didn't look at the other one.


--
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] CREATE POLICY bug ?

2016-08-20 Thread Dean Rasheed
On 20 August 2016 at 03:15, Andrea Adami  wrote:
> when i run the query: "select * from public.policy_view"
> the output is the same (all rows) for all users
> am I making some mistake, or is this a bug?
>

No, it looks correct to me. When going through a view, the policies
and permission checks that apply are those that would apply to the
view's owner, which in this case is postgres, so no policies are
applied.

Or, quoting from the notes in the CREATE POLICY documentation:

As with normal queries and views, permission checks and policies for
the tables which are referenced by a view will use the view owner's
rights and any policies which apply to the view owner.

Regards,
Dean




Re: [HACKERS] Logical decoding slots can go backwards when used from SQL, docs are wrong

2016-08-20 Thread Craig Ringer
On 5 June 2016 at 09:54, David G. Johnston 
wrote:

> On Thursday, March 17, 2016, Craig Ringer  wrote:
>
>> The first patch was incorrectly created on top of failover slots not
>> HEAD. Attached patch applies on HEAD.
>>
>
> Lots of logical decoding work ongoing but this one shows as active in the
> September cf and the comments by Craig indicate potential relevance to
> 9.6.  Is this still live as-is or has it been subsumed by other threads?
>


Yes. Both those patches are still pending and should be considered for
commit in the next CF. (Of course, I think they should just be committed,
but I would, wouldn't I?)


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] LSN as a recovery target

2016-08-20 Thread Julien Rouhaud
On 20/08/2016 15:41, Michael Paquier wrote:
> On Sat, Aug 20, 2016 at 10:44 AM, Petr Jelinek  wrote:
>> If we want to specifically name the recovery_target_lsn in the message, we
>> could probably do it using context.
> 
> So that would be basically assigning error_context_stack for each item
> parsed for recovery.conf? That seems a bit narrow as usually
> recovery.conf is not that long..
> 

That'd still be an improvement, since the error message doesn't even
mention that the error comes from recovery.conf.  I agree it can't come
from many places, but it may be useful for some people.

-- 
Julien Rouhaud
http://dalibo.com - http://dalibo.org




Re: [HACKERS] [PATCH] bigint txids vs 'xid' type, new txid_recent(bigint) => xid

2016-08-20 Thread Craig Ringer
On 19 August 2016 at 21:10, Peter Eisentraut  wrote:

> On 8/18/16 9:20 PM, Craig Ringer wrote:
> > On 19 August 2016 at 02:35, Jim Nasby  > > wrote:
> > I think we need to either add real types for handling XID/epoch/TXID
> > or finally create uint types. It's *way* too easy to screw things up
> > the way they are today.
> >
> > Hm. Large scope increase there. Especially introducing unsigned types.
> > There's a reason that hasn't been done already - it's not just copying a
> > whole pile of code, it's also defining all the signed/unsigned
> > interactions and conversions carefully.
>
> https://github.com/petere/pguint ;-)
>
> > I'm not against adding a 'bigxid' or 'epoch_xid' or something,
> > internally a uint64. It wouldn't need all the opclasses, casts, function
> > overloads, etc that uint8 would.
>
> That sounds much better.


Yeah, but not something I expect to be able to do in the near future :S .
As it is, this is considerably off what I really need to be working on. And
we'll still want a way to get the short 32-bit xid with detection of epoch
wrap, which is what this patch adds.
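For reference, the epoch-qualified txid convention under discussion can be modeled standalone. This is a simplified sketch of my own (the helper name echoes txid_convert_ifrecent but is not the patch's code, and the real conversion also has to handle xids that legitimately belong to the previous epoch):

```c
#include <stdint.h>

/* The bigint returned by txid_current() is (epoch << 32) | xid.
 * Refuse to hand out a short 32-bit xid when the epoch no longer
 * matches, i.e. when the xid counter has wrapped past it. */
static int
convert_if_recent(uint64_t txid, uint32_t current_epoch, uint32_t *xid)
{
    uint32_t epoch = (uint32_t) (txid >> 32);

    if (epoch != current_epoch)
        return 0;               /* wrapped: conversion would be ambiguous */
    *xid = (uint32_t) txid;     /* low 32 bits are the plain xid */
    return 1;
}
```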

I posted an updated version of this patch on
https://www.postgresql.org/message-id/camsr+yhqiwnei0dactbos40t+v5s_+dst3pyv_8v2wnvh+x...@mail.gmail.com
since it's stacked on top of txid_status(bigint) now. The patch there is
just a trivial rename of this one.

I don't expect to get to adding a 'bigxid' commit and converting
views/functions in this release cycle. Still too much else to do.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] [PATCH] Transaction traceability - txid_status(bigint)

2016-08-20 Thread Craig Ringer
On 20 August 2016 at 21:24, Craig Ringer  wrote:

> Hi all
>
> Following on from
>
> bigint txids vs 'xid' type, new txid_recent(bigint) => xid
>
>
Ahem. Forgot to squash in a fixup commit. Updated patch of
txid_status(bigint) attached.

A related patch follows, adding a new txid_current_ifassigned(bigint)
function as suggested by Jim Nasby. It's usefully connected to
txid_status() and might as well be added at the same time.

Since it builds on the same history I've also attached an updated version
of txid_recent(bigint) now called txid_convert_ifrecent(bigint), per the
above-linked thread.

Finally, and not intended for commit, is a useful test function I wrote to
cause extremely rapid xid wraparound, bundled up into a src/test/regress
test case. txid_incinerate() can jump the server about UINT32/2 xids in ~2
seconds if fsync is off, making it handy for testing.  Posting so others
can use it for their own test needs later and because it's useful for
testing these patches that touch on the xid epoch.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From fc6174eada3e73f86934cd7e4f4b701c70324ec2 Mon Sep 17 00:00:00 2001
From: Craig Ringer 
Date: Fri, 19 Aug 2016 14:44:15 +0800
Subject: [PATCH 1/4] Introduce txid_status(bigint) to get status of an xact

If an application is disconnected while a COMMIT request is in flight,
the backend crashes mid-commit, etc, then an application may not be
sure whether or not a commit completed successfully or was rolled
back. While two-phase commit solves this it does so at a considerable
overhead, so introduce a lighter alternative.

txid_status(bigint) lets an application determine the status of
a commit based on an xid-with-epoch as returned by txid_current()
or similar. Status may be committed, aborted, in-progress (including
prepared xacts) or null if the xact is too old for its commit status
to still be retained because it has passed the wrap-around epoch
boundary.

Applications must call txid_current() in their transactions to make
much use of this since PostgreSQL does not automatically report an xid
to the client when one is assigned.
---
 doc/src/sgml/func.sgml   | 28 +++-
 src/backend/access/transam/clog.c| 23 --
 src/backend/catalog/system_views.sql | 20 +
 src/backend/utils/adt/txid.c | 82 
 src/include/access/clog.h| 23 ++
 src/include/catalog/pg_proc.h|  2 +
 src/test/regress/expected/txid.out   | 50 ++
 src/test/regress/sql/txid.sql| 36 
 8 files changed, 240 insertions(+), 24 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 7830334..ae01ed3 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -16911,6 +16911,10 @@ SELECT collation for ('foo' COLLATE "de_DE");
 txid_visible_in_snapshot

 
+   
+txid_status
+   
+

 The functions shown in 
 provide server transaction information in an exportable form.  The main
@@ -16929,7 +16933,7 @@ SELECT collation for ('foo' COLLATE "de_DE");
   
txid_current()
bigint
-   get current transaction ID, assigning a new one if the current transaction does not have one
+   get current 64-bit transaction ID with epoch, assigning a new one if the current transaction does not have one
   
   
txid_current_snapshot()
@@ -16956,6 +16960,11 @@ SELECT collation for ('foo' COLLATE "de_DE");
boolean
is transaction ID visible in snapshot? (do not use with subtransaction ids)
   
+  
+   txid_status(bigint)
+   txid_status
+   report the status of the given xact - committed, aborted, in-progress, or null if the xid is too old
+  
  
 

@@ -17026,6 +17035,23 @@ SELECT collation for ('foo' COLLATE "de_DE");

 

+txid_status(bigint) reports the commit status of a recent
+transaction. Any recent transaction can be identified as one of
+
+ in-progress
+ committed
+ aborted
+
+Prepared transactions are identified as in-progress.
+The commit status of transactions older than the transaction ID wrap-around
+threshold is no longer known by the system, so txid_status
+returns NULL for such transactions. Applications may use
+txid_status to determine whether a transaction committed
+or aborted when the application and/or database server crashed or lost
+connection while a COMMIT command was in-progress.
+   
+
+   
 The functions shown in 
 provide information about transactions that have been already committed.
 These functions mainly provide information about when the transactions
diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index 2634476..1a6e26d 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -41,29 +4

Re: [HACKERS] LSN as a recovery target

2016-08-20 Thread Michael Paquier
On Sat, Aug 20, 2016 at 10:44 AM, Petr Jelinek  wrote:
> If we want to specifically name the recovery_target_lsn in the message, we
> could probably do it using context.

So that would be basically assigning error_context_stack for each item
parsed for recovery.conf? That seems a bit narrow as usually
recovery.conf is not that long..
-- 
Michael




Re: [HACKERS] Patch: initdb: "'" for QUOTE_PATH (non-windows)

2016-08-20 Thread Michael Paquier
On Sat, Aug 20, 2016 at 4:39 AM, Ryan Murphy  wrote:
> Here is another version of my initdb shell quoting patch.  I have removed
> the unnecessary {} block.  I also ran pgindent on the code prior to creating
> the patch.

Could you please *not* top-post? This breaks the logic of the thread,
this is the third time that I mention it, and that's not the style of
this mailing list.

Regarding your patch, with a bit of clean up it gives the attached.
You should declare variables at the beginning of a code block or
function. One call to appendPQExpBufferStr can as well be avoided, and
you added too many newlines. I have switched that as ready for
committer.
-- 
Michael


initdb-path-v4.patch
Description: application/download



[HACKERS] [PATCH] Transaction traceability - txid_status(bigint)

2016-08-20 Thread Craig Ringer
Hi all

Following on from

bigint txids vs 'xid' type, new txid_recent(bigint) => xid

https://www.postgresql.org/message-id/camsr+yfdzmn_iz7krroe+j0kvlqvfvgvzxbcvxr-mljgtoz...@mail.gmail.com


here's a patch that implements a txid_status(bigint) function to report the
commit status of a transaction.

If an application is disconnected while a COMMIT request is in flight,
the backend crashes mid-commit, etc, then an application may not be
sure whether or not a commit completed successfully or was rolled
back. While two-phase commit solves this it does so at a considerable
overhead, so introduce a lighter alternative.

txid_status(bigint) lets an application determine the status of
a commit based on an xid-with-epoch as returned by txid_current()
or similar. Status may be committed, aborted, in-progress (including
prepared xacts) or null if the xact is too old for its commit status
to still be retained because it has passed the wrap-around epoch
boundary.

Applications must call txid_current() in their transactions to make
much use of this since PostgreSQL does not automatically report an xid
to the client when one is assigned.

A future protocol enhancement to report txid assignment would be very
useful, but quite separate to this.
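As a rough standalone model of the NULL-when-too-old rule described above (names and simplifications are mine, not the patch's — this ignores epochs and treats the retained clog as a flat array):

```c
#include <stdint.h>

/* Returns 'c' (committed), 'a' (aborted), 'p' (in progress),
 * or 0 to model SQL NULL when the xact's status has been
 * truncated away because it predates the oldest retained txid. */
static char
model_txid_status(uint64_t txid, uint64_t oldest_retained,
                  uint64_t next_txid, const char *clog)
{
    if (txid >= next_txid)
        return 'p';                      /* not yet completed */
    if (txid < oldest_retained)
        return 0;                        /* too old: status unknown (NULL) */
    return clog[txid - oldest_retained]; /* 'c' or 'a' */
}
```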


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From a298bb242716fbf1526bf9f949a94740a9d975fc Mon Sep 17 00:00:00 2001
From: Craig Ringer 
Date: Fri, 19 Aug 2016 14:44:15 +0800
Subject: [PATCH 1/5] Introduce txid_status(bigint) to get status of an xact

If an application is disconnected while a COMMIT request is in flight,
the backend crashes mid-commit, etc, then an application may not be
sure whether or not a commit completed successfully or was rolled
back. While two-phase commit solves this it does so at a considerable
overhead, so introduce a lighter alternative.

txid_status(bigint) lets an application determine the status of
a commit based on an xid-with-epoch as returned by txid_current()
or similar. Status may be committed, aborted, in-progress (including
prepared xacts) or null if the xact is too old for its commit status
to still be retained because it has passed the wrap-around epoch
boundary.

Applications must call txid_current() in their transactions to make
much use of this since PostgreSQL does not automatically report an xid
to the client when one is assigned.
---
 doc/src/sgml/func.sgml   | 11 -
 src/backend/access/transam/clog.c| 23 --
 src/backend/catalog/system_views.sql | 20 +
 src/backend/utils/adt/txid.c | 82 
 src/include/access/clog.h| 23 ++
 src/include/catalog/pg_proc.h|  2 +
 src/test/regress/expected/txid.out   | 50 ++
 src/test/regress/sql/txid.sql| 36 
 8 files changed, 223 insertions(+), 24 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 7830334..c2a0aeb 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -16911,6 +16911,10 @@ SELECT collation for ('foo' COLLATE "de_DE");
 txid_visible_in_snapshot

 
+   
+txid_status
+   
+

 The functions shown in 
 provide server transaction information in an exportable form.  The main
@@ -16929,7 +16933,7 @@ SELECT collation for ('foo' COLLATE "de_DE");
   
txid_current()
bigint
-   get current transaction ID, assigning a new one if the current transaction does not have one
+   get current 64-bit transaction ID with epoch, assigning a new one if the current transaction does not have one
   
   
txid_current_snapshot()
@@ -16956,6 +16960,11 @@ SELECT collation for ('foo' COLLATE "de_DE");
boolean
is transaction ID visible in snapshot? (do not use with subtransaction ids)
   
+  
+   txid_status(bigint)
+   txid_status
+   report the status of the given xact - committed, aborted, in-progress, or null if the xid is too old
+  
  
 

diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index 2634476..1a6e26d 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -41,29 +41,6 @@
 #include "miscadmin.h"
 #include "pg_trace.h"
 
-/*
- * Defines for CLOG page sizes.  A page is the same BLCKSZ as is used
- * everywhere else in Postgres.
- *
- * Note: because TransactionIds are 32 bits and wrap around at 0x,
- * CLOG page numbering also wraps around at 0x/CLOG_XACTS_PER_PAGE,
- * and CLOG segment numbering at
- * 0x/CLOG_XACTS_PER_PAGE/SLRU_PAGES_PER_SEGMENT.  We need take no
- * explicit notice of that fact in this module, except when comparing segment
- * and page numbers in TruncateCLOG (see CLOGPagePrecedes).
- */
-
-/* We need two bits per xact, so four xacts fit in a byte */
-#define CLOG_BITS_PER_XACT	2
-#define CLOG_XACTS_PER_BYTE 4
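The defines shown above (which the patch moves into clog.h) pack commit status at two bits per transaction, so four xacts share a byte. As a standalone illustration of the resulting addressing — my sketch, not clog.c itself:

```c
#include <stdint.h>

#define CLOG_BITS_PER_XACT  2
#define CLOG_XACTS_PER_BYTE 4

/* Extract the two status bits for a transaction's offset
 * within a clog page held as a plain byte array. */
static int
clog_xact_status(const uint8_t *page, uint32_t xid_in_page)
{
    uint32_t byteno = xid_in_page / CLOG_XACTS_PER_BYTE;
    int      shift  = (xid_in_page % CLOG_XACTS_PER_BYTE) * CLOG_BITS_PER_XACT;

    return (page[byteno] >> shift) & 0x03;
}
```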

Re: [HACKERS] Logical decoding restart problems

2016-08-20 Thread Craig Ringer
On 20 August 2016 at 14:56, konstantin knizhnik 
wrote:

> Thank you for answers.
>
> No, you don't need to recreate them. Just advance your replication
> identifier downstream and request a replay position in the future. Let the
> existing slot skip over unwanted data and resume where you want to start
> replay.
>
> You can advance the replication origins on the peers as you replay
> forwarded xacts from your master.
>
> Have a look at how the BDR code does this during "catchup mode" replay.
>
> So while your problem discussed below seems concerning, you don't have to
> drop and recreate slots like you are currently doing.
>
>
> The only reason for recreating the slot is that I want to move it to the
> current "horizon" and skip all pending transactions without explicit
> specification of the restart position.
>

Why not just specify the restart position as the upstream server's xlog
insert position?

Anyway, you _should_ specify the restart position. Otherwise, if there's
concurrent write activity, you might have a gap between when you stop
replaying from your forwarding slot on the recovery node and start
replaying from the other nodes.

Again, really, go read the BDR catchup mode code. Really.


> If I do not drop the slot and just restart replication specifying position
> 0/0 (invalid LSN), then replication will be continued from the current slot
> position in WAL, will not it?
>

The "current slot position" isn't in WAL. It's stored in the replication
slot in pg_replslot/ . But yes, if you pass 0/0 it'll use the stored
confirmed_flush_lsn from the replication slot.


> So there is no way to specify something like "start replication from the end
> of WAL", like lseek(0, SEEK_END).
>

Correct, but you can fetch the server's xlog insert position separately and
pass it.

I guess I can see it being a little bit useful to be able to say "start
decoding at the first commit after this command". Send a patch, see if
Andres agrees.

I still think your whole approach is wrong and you need to use replication
origins or similar to co-ordinate a consistent switchover.



> Slot is created by peer node using standard libpq connection with
> database=replication connection string.
>
>
So walsender interface  then.

>
>
>> The problem is that for some reasons consistent point is not so
>> consistent and we get partly decoded transactions.
>> I.e. transaction body consists of two UPDATE but reorder_buffer extracts
>> only the one (last) update and sent this truncated transaction to
>> destination causing consistency violation at replica.  I started
>> investigation of logical decoding code and found several things which I do
>> not understand.
>>
>
> Yeah, that sounds concerning and shouldn't happen.
>
>
> I looked at replication code more precisely and understand that my first
> concerns were wrong.
> Confirming flush position should not prevent replaying transactions with
> smaller LSNs.
>

Strictly, confirming the flush position does not prevent transactions *with
changes* at lower LSNs. It does prevent replay of transactions that
*commit* with lower LSNs.


> But unfortunately the problem is really present. May be it is caused by
> race conditions (although most logical decoder data is local to backend).
> This is why I will try to create reproducing scenario without multimaster.
>


> Yeh, but unfortunately it happens. Need to understand why...
>

Yes. I think we need a simple standalone test case. I've never yet seen a
partially decoded transaction like this.

> It's all already there. See logical decoding's use of xl_running_xacts.
>
> But how is this information persisted?
>

restart_lsn points to a xl_running_xacts record in WAL. Which is of course
persistent. The restart_lsn is persistent in the replication slot, as is
catalog_xmin and confirmed_flush_lsn.


> What will happen if wal_sender is restarted?
>

That's why the restart_lsn exists. Decoding restarts from the restart_lsn
when you START_REPLICATION on the new walsender. It continues without
sending data to the client until it decodes the first commit >
confirmed_flush_lsn or some greater-than-that LSN that you requested by
passing it to the START_REPLICATION command.
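A toy model of that replay rule (my simplification, ignoring snapshot building): decoding re-reads WAL from restart_lsn, but a transaction only reaches the client if its commit LSN is past the start point — the greater of confirmed_flush_lsn and any LSN the client requested.

```c
#include <stdint.h>

/* Decide whether a decoded transaction's commit is sent to the
 * client.  requested_lsn == 0 models passing 0/0 ("use the slot's
 * stored confirmed_flush_lsn"). */
static int
send_commit_to_client(uint64_t commit_lsn, uint64_t confirmed_flush_lsn,
                      uint64_t requested_lsn)
{
    uint64_t start = confirmed_flush_lsn > requested_lsn
                   ? confirmed_flush_lsn : requested_lsn;

    return commit_lsn > start;
}
```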

The snapshot builder is also involved; see snapbuild.c and the comments
there.

I'll wait for a test case or some more detail.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] dsm_unpin_segment

2016-08-20 Thread Robert Haas
On Sat, Aug 20, 2016 at 7:37 AM, Amit Kapila  wrote:
> 2.
> + if (dsm_control->item[seg->control_slot].pinned)
> + elog(ERROR, "cannot pin a segment that is already pinned");
>
> Shouldn't this be a user facing error (which means we should use ereport)?

Uh, certainly not.  This can't happen because of SQL the user enters;
it can only happen because of a C coding error.  elog() is the
appropriate tool for that case.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Improving planner's checks for parallel-unsafety

2016-08-20 Thread Robert Haas
On Thu, Aug 18, 2016 at 5:07 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> I have reviewed this and it looks good to me.  My only comment is that
>> this comment is slightly confusing:
>
>> !  * Returns the first of PROPARALLEL_UNSAFE, PROPARALLEL_RESTRICTED, or
>> !  * PROPARALLEL_SAFE that can be found in the given parsetree.  We use this
>
>> "First" might be read to mean "the first one we happen to run across"
>> rather than "the earliest in list ordering".
>
> Thanks for the review.  I'll reconsider how to phrase that --- have you
> any suggestions?

I think what you committed is fine.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Any need of GRANT/REVOKE CREATE TABLE | POLICY | ETC

2016-08-20 Thread Haribabu Kommi
On Tue, Aug 2, 2016 at 3:25 AM, Robert Haas  wrote:
> On Tue, Jul 19, 2016 at 2:59 AM, Haribabu Kommi
>  wrote:
>> During the discussion of supporting multi tenancy with the help of
>> row level security, because of some problems of executing any
>> policy that was created by an unprivileged user [1].
>>
>> To avoid that problem, If we have some kind of new mechanism to
>> GRANT/REVOKE only CREATE POLICY from all the unprivileged
>> users by providing other CREATE access to them.
>>
>> I just want to know is there any other such requirements that if such
>> option is available, it would be good or not? I don't know whether
>> this option is possible or not? If any such requirements are present
>> other than POLICY, i would like to analyze and propose a patch for
>> the same.
>
> Well, there are lots of things that users can do that might cause
> security exposures for other users.  I think it's fair to say that
> PostgreSQL - like many other systems, really - is going to work best
> when either all of the users trust each other or when they are
> well-isolated from each other (i.e. in separate databases).  If one
> user can get another to execute code, then that second user can usurp
> the privileges of the first user.  CREATE POLICY provides one way of
> accomplishing that, but it's not necessarily qualitatively different
> from any other mechanisms that we've had for a long time: CREATE
> TRIGGER, CREATE FUNCTION, CREATE AGGREGATE, CREATE OPERATOR, and
> CREATE VIEW are all plausible ways of inducing another user to run
> your evil, malicious code.

Thanks for your valuable opinion and details.

> So I don't think that the right solution is to restrict CREATE POLICY
> in particular.  It might be useful to have some kind of general system
> for limiting the ability of certain users to run certain commands, but
> I'm not sure how much demand there really is for such a thing.  I
> think it's already possible using ProcessUtilityHook, if you're
> willing to write a small amount of C code.  Do our users want more?
> Very possibly.  Do I know exactly what they want?  No, I don't.

Here I attached a POC patch based on ProcessUtilityHook to restrict the
CREATE TRIGGER, CREATE FUNCTION, CREATE AGGREGATE,
CREATE OPERATOR, CREATE VIEW and CREATE POLICY commands
from normal users.

If anyone is interested in this approach to restricting some operations
from specific users, this patch can be further enhanced.


Regards,
Hari Babu
Fujitsu Australia


restrict_create_commands_poc.patch
Description: Binary data



[HACKERS] New SQL counter statistics view (pg_stat_sql)

2016-08-20 Thread Haribabu Kommi
This is a new statistics view that provides the number of SQL
operations that have happened in a particular interval of time. This
view is useful for finding out the pattern of operations happening in
the instance during a particular interval.

Following are the columns and their details of the pg_stat_sql view.

postgres=# \d pg_stat_sql
   View "pg_catalog.pg_stat_sql"
   Column|   Type   | Modifiers
-+--+---
 selects | bigint   |
 inserts | bigint   |
 deletes | bigint   |
 updates | bigint   |
 declares| bigint   |
 fetches | bigint   |
 copies  | bigint   |
 reindexes   | bigint   |
 truncates   | bigint   |
 stats_reset | timestamp with time zone |

The SQL counters get updated only for external SQL queries, not for
SQL queries that are generated internally. The counters get updated
in the exec_simple_query and exec_execute_message functions.

Users can reset the SQL counter statistics using an exposed function.
Stats collection can also be turned on/off using a GUC.
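As a rough sketch of what such per-command counters amount to (names are illustrative, not the proposed implementation — the actual patch would hook the executor entry points and feed the stats collector):

```c
#include <string.h>

typedef struct SqlStats
{
    long selects, inserts, updates, deletes, others;
} SqlStats;

/* Bump the counter matching a completed command's tag. */
static void
count_command(SqlStats *s, const char *tag)
{
    if (strcmp(tag, "SELECT") == 0)
        s->selects++;
    else if (strcmp(tag, "INSERT") == 0)
        s->inserts++;
    else if (strcmp(tag, "UPDATE") == 0)
        s->updates++;
    else if (strcmp(tag, "DELETE") == 0)
        s->deletes++;
    else
        s->others++;    /* classifying everything else is the hard part */
}
```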

Any comments/objections in providing a patch for the same?

Regards,
Hari Babu
Fujitsu Australia




Re: [HACKERS] dsm_unpin_segment

2016-08-20 Thread Amit Kapila
On Tue, Aug 9, 2016 at 10:07 AM, Thomas Munro
 wrote:
> On Tue, Aug 9, 2016 at 12:53 PM, Tom Lane  wrote:
>> The larger picture here is that Robert is exhibiting a touching but
>> unfounded faith that extensions using this feature will contain zero bugs.
>> IMO there needs to be some positive defense against mistakes in using the
>> pin/unpin API.  As things stand, multiple pin requests don't have any
>> fatal consequences (especially not on non-Windows), so I have little
>> confidence that it's not happening in the field.  I have even less
>> confidence that there wouldn't be too many unpin requests.
>
> Ok, here is a version that defends against invalid sequences of
> pin/unpin calls.  I had to move dsm_impl_pin_segment into the block
> protected by DynamicSharedMemoryControlLock, so that it could come
> after the already-pinned check, but before updating any state, since
> it makes a Windows syscall that can fail.  That said, I've only tested
> on Unix and will need to ask someone to test on Windows.
>

Few review comments:

1.
+ /* Note that 1 means no references (0 means unused slot). */
+ if (--dsm_control->item[i].refcnt == 1)
+ destroy = true;
+
+ /*
+ * Allow implementation-specific code to run.  We have to do this before
+ * releasing the lock, because impl_private_pm_handle may get modified by
+ * dsm_impl_unpin_segment.
+ */
+ if (control_slot >= 0)
+ dsm_impl_unpin_segment(handle,
+ &dsm_control->item[control_slot].impl_private_pm_handle);

If there is an error in dsm_impl_unpin_segment(), then we don't need
to decrement the reference count.  Isn't it better to do it after
dsm_impl_unpin_segment() is successful?  Similarly, I think pinned
should be set to false after dsm_impl_unpin_segment().

Do you need a check if (control_slot >= 0)?  In the code just above
you have as Assert to ensure that it is >=0.

2.
+ if (dsm_control->item[seg->control_slot].pinned)
+ elog(ERROR, "cannot pin a segment that is already pinned");

Shouldn't this be a user facing error (which means we should use ereport)?
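For illustration, the pin/unpin invariants being discussed can be modeled in miniature — error return codes stand in for elog(ERROR), and this is my sketch of the slot bookkeeping, not the patch's code:

```c
typedef struct Slot
{
    int refcnt;     /* 0 = unused slot, 1 = no references */
    int pinned;
} Slot;

static int
pin_segment(Slot *s)
{
    if (s->pinned)
        return -1;      /* "cannot pin a segment that is already pinned" */
    s->pinned = 1;
    s->refcnt++;        /* the pin itself holds one reference */
    return 0;
}

static int
unpin_segment(Slot *s, int *destroy)
{
    if (!s->pinned)
        return -1;      /* cannot unpin a segment that is not pinned */
    s->pinned = 0;
    *destroy = (--s->refcnt == 1);  /* back to "no references": destroy */
    return 0;
}
```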


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [WIP] [B-Tree] Keep indexes sorted by heap physical location

2016-08-20 Thread Amit Kapila
On Sat, Aug 20, 2016 at 10:23 AM, Claudio Freire  wrote:
> On Sat, Aug 20, 2016 at 1:05 AM, Amit Kapila  wrote:
>> On Thu, Aug 18, 2016 at 8:24 AM, Claudio Freire wrote:
>>>
>>> A couple of points make me uneasy about this patch, yet I can think of
>>> no better alternative, so I seek feedback:
>>>
>>>  - introducing a different format for inner index tuples makes for an
>>> invasive patch and quite difficult-to-reason code (it's easy to forget
>>> whether a page is leaf or inner and that now causes assertion failures
>>> or segfaults)
>>>  - the ascent-descent to check for uniqueness when there are large
>>> dead tuple runs could have locking subtleties that escape me. Perhaps
>>> there's better ways to solve this.
>>
>> I have tried to study this part of your patch and it seems somewhat
>> non-trivial and risky part of proposal.
>>
>> + } else {
>> + /*
>> + * We found the actual first item on another block, so we have to perform
>> + * a two-step search - first half, until our write-locked buffer, then another
>> + * starting from our write-locked buffer.
>> + */
>> + LockBuffer(buf, BUFFER_LOCK_UNLOCK);
>> + LockBuffer(buf, BT_WRITE);
>> +
>> + buf = _bt_moveright(rel, buf, natts, itup_scankey, &(itup->t_tid), false,
>> + true, stack, BT_WRITE, NULL);
>> +
>> + first_offset = _bt_binsrch(rel, buf, natts, itup_scankey, NULL, false, NULL);
>> +
>> + xwait = _bt_check_unique(rel, itup, heapRel, nbuf, buf, first_offset, itup_scankey,
>> + checkUnique, &is_unique, &speculativeToken);
>> +
>> + _bt_relbuf(rel, nbuf);
>> + }
>>
>> The idea for uniqueness check is that we hold the write lock on
>> buffer/page on which we are trying to operate (we do take read locks
>> on the consecutive pages during this check).  Here, in your changed
>> proposal, you have two buffers (one to which the search led you and
>> one buffer previous in the chain) and before checking uniqueness, on
>> one of the buffers you seem to have write lock and on other you have
>> read lock.  This seems to change the way uniqueness check works till
>> now, can you explain how this works (can we safely assume that all
>> searches for uniqueness check will lead to the same buffer first).
>
> All uniqueness checks will find the same "nbuf" (read-locked), which
> is the first one that matches the key.
>
> They could have different "buf" buffers (the write-locked) one. But
> since read locks cannot be held while a write-lock is held, I believe
> it doesn't matter. All uniqueness checks will try to read-lock nbuf
> unless the uniqueness check for that insertion is done. When they do,
> they will block until the other insertion is finished, and when that
> happens, either the insert happened, or it returned an xid to wait for
> and retry. If it succeeded, the check attempting the read-lock will
> see the inserted tuple and return the xid of the earlier transaction
> and wait on it. If it failed, no insert happened because there's a
> visible tuple there, and the read-lock will succeed, find the visible
> tuple, and the insert will fail too.
>
> In essence, it is still the case, I believe, that only one insert can
> succeed at a time, all the others will retry. It only happens that
> with the patch, the contention will be between a read lock and a write
> lock, and not two write locks, and there's a little less contention
> (some more work can be done in parallel).
>
>> With this new mechanism, do we have two type of search interfaces
>> where one would work for keys (as now) and other for key-ctid or it
>> will be just a single interface which works both ways?  I think either
>> way there are pros and cons.
>
> The patch doesn't add another interface, but the idea is to add it.
> The implementation will probably fall on a common function that can do
> both
>

That makes sense, but this means there is a chance that the searches
could lead to different buffers in the case of uniqueness checks (the
search with key-ctid could lead to a different buffer than the search
with just the key).  It is not clear to me whether we need this
uniqueness check for the key-ctid search API, because as I understand
it you want that API mainly for vacuum and WARM tuples.  Vacuum won't
add new tuples, so is the check required for WARM tuples?


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Logical decoding restart problems

2016-08-20 Thread konstantin knizhnik
Thank you for the answers.

> No, you don't need to recreate them. Just advance your replication identifier 
> downstream and request a replay position in the future. Let the existing slot 
> skip over unwanted data and resume where you want to start replay.
> 
> You can advance the replication origins on the peers as you replay forwarded 
> xacts from your master.
> 
> Have a look at how the BDR code does this during "catchup mode" replay.
>  
> So while your problem discussed below seems concerning, you don't have to 
> drop and recreate slots like are currently doing. 

The only reason for recreating the slot is that I want to move it to the current 
"horizon" and skip all pending transactions without explicitly specifying the 
restart position.
If I do not drop the slot and just restart replication specifying position 0/0 
(invalid LSN), then replication will continue from the current slot position in 
the WAL, won't it?
So there is no way to ask for something like "start replication from the end of 
WAL", analogous to lseek(0, SEEK_END).
Right now I am trying to overcome this limitation by explicitly calculating the 
position from which we should continue replication.
But unfortunately the problem with partly decoded transactions persists.
Next week I will try to create an example reproducing the problem without any 
multimaster stuff, just using the standard logical decoding plugin.


> 
> To restart logical decoding we first drop existed slot, then create new one 
> and then start logical replication from the WAL position 0/0 (invalid LSN).
> In this case recovery should be started from the last consistent point.
> 
> How do you create the new slot? SQL interface? walsender interface? Direct C 
> calls?

The slot is created by the peer node using a standard libpq connection with a 
replication=database connection string.

>  
> The problem is that for some reasons consistent point is not so consistent 
> and we get partly decoded transactions.
> I.e. transaction body consists of two UPDATE but reorder_buffer extracts only 
> the one (last) update and sent this truncated transaction to destination 
> causing consistency violation at replica.  I started investigation of logical 
> decoding code and found several things which I do not understand.
> 
> Yeah, that sounds concerning and shouldn't happen.

I looked at the replication code more closely and now understand that my first 
concerns were wrong.
Confirming the flush position should not prevent replaying transactions with 
smaller LSNs.
But unfortunately the problem is really present.  Maybe it is caused by race 
conditions (although most logical decoder data is local to the backend).
This is why I will try to create a reproducing scenario without multimaster.

> 
> Assume that we have transactions T1={start_lsn=100, end_lsn=400} and 
> T2={start_lsn=200, end_lsn=300}.
> Transaction T2 is sent to the replica and replica confirms that flush_lsn=300.
> If now we want to restart logical decoding, we can not start with position 
> less than 300, because CreateDecodingContext doesn't allow it:
> 
> 
> Right. You've already confirmed receipt of T2, so you can't receive it again.
>  
> So it means that we have no chance to restore T1?
> 
> Wrong. You can, because the slot's restart_lsn still be will be some LSN <= 
> 100. The slot keeps track of inprogress transactions (using xl_running_xacts 
> records) and knows it can't discard WAL past lsn 100 because xact T1 is still 
> in-progress, so it must be able to decode from the start of it.
> 
> When you create a decoding context decoding starts at restart_lsn not at 
> confirmed_flush_lsn. confirmed_flush_lsn is the limit at which commits start 
> resulting in decoded data being sent to you. So in your case, T1 commits at 
> lsn=400, which is >300, so you'll receive the whole xact for T1.

Yes, but unfortunately it does happen.  I need to understand why...

>  
> It's all already there. See logical decoding's use of xl_running_xacts.


But how is this information persisted?
What will happen if the walsender is restarted?

> 
> -- 
>  Craig Ringer   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services