[HACKERS] Re: Failure with make check-world for pgtypeslib/dt_test2 with HEAD on OSX
On Mon, Oct 06, 2014 at 08:57:54PM -0400, Tom Lane wrote:
> I eventually realized that the critical difference was you'd added CFLAGS= to the configure call. On this platform that has the net effect of removing -O2 from the compiler flags, and apparently that shifts around the stack layout enough to expose the clobber. The fix is simple enough: ecpg's version of ParseDateTime is failing to check for overrun of the field[] array until *after* it's already clobbered the stack:

Thanks for tracking that down. Oops.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
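The fix pattern Tom describes — checking the field count before the store rather than after — looks roughly like the following sketch. Only MAXDATEFIELDS, the field[] array, and the overrun itself are taken from the message; the surrounding parser code is omitted and the helper is illustrative:

    #define MAXDATEFIELDS 25            /* matches the backend's limit */

    /* Append str as the next date field, refusing before the write so an
     * over-long input can no longer clobber the caller's stack. */
    static int
    add_field(const char **field, int nf, const char *str)
    {
        if (nf >= MAXDATEFIELDS)        /* bounds check BEFORE the store */
            return -1;
        field[nf] = str;
        return nf + 1;
    }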
Re: [HACKERS] pg_receivexlog --status-interval add fsync feedback
On 10/08/2014 07:23 AM, furu...@pm.nttdata.co.jp wrote:
>> What set of options would you pass if you want to use it as a synchronous standby? And if you don't? Could we just have a single --synchronous flag, instead of -F and --reply-fsync?
>
> If you set synchronous_commit to remote_write, the options are different. The set of options in each case is as follows.
>
> Synchronous standby (synchronous_commit=on):
>     --fsync-interval=-1 --reply-fsync --slot=slotname
>
> Synchronous standby (synchronous_commit=remote_write):
>     --fsync-interval=-1 --reply-fsync
>
> Asynchronous:
>     no related options
>
> Well, if the response-time delay (the value of --status-interval) is acceptable, --reply-fsync is unnecessary. Instead of --reply-fsync, using --synchronous (which combines --reply-fsync and --fsync-interval=-1) might be easier to understand. Although, in that case, --fsync-interval would be a fixed value. Isn't that a problem?

I think we should remove --fsync-interval and --reply-fsync, and just have a --synchronous option, which would imply the same behavior you get with --fsync-interval=-1 --reply-fsync.

That leaves the question of whether pg_receivexlog should do fsyncs when it's not acting as a synchronous standby. There isn't any real need to do so. In asynchronous mode, there are no guarantees anyway, and even without an fsync, the OS will eventually flush the data to disk.

But we could do even better than that. It would be best if you didn't need even the --synchronous flag. The server knows whether the client is a synchronous standby or not. It could tell the client.

- Heikki

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
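For concreteness, invocations under the simplification Heikki proposes might look like this (the --synchronous flag was only a proposal at this point; the other options are existing pg_receivexlog flags):

    # acting as a synchronous standby: fsync on every flush, report it at once
    pg_receivexlog -D /archive/wal --slot=rx1 --synchronous

    # asynchronous use: no fsync guarantees; the OS flushes eventually
    pg_receivexlog -D /archive/wal --slot=rx1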
Re: [HACKERS] pg_receivexlog always handles -d option argument as connstr
Amit Kapila amit.kapil...@gmail.com wrote:
> On Tue, Oct 7, 2014 at 8:13 PM, Sawada Masahiko sawada.m...@gmail.com wrote:
>> On Tue, Oct 7, 2014 at 12:58 PM, Amit Kapila amit.kapil...@gmail.com wrote:
>>> On Mon, Oct 6, 2014 at 10:23 PM, Sawada Masahiko sawada.m...@gmail.com wrote:
>>>> Hi all,
>>>>
>>>> pg_receivexlog always handles the argument of the -d option as a connstr-formatted value, so we can doubly specify host name and port number. The other client tools handle the -d option as a connstr value only if the argument contains an = character. pg_basebackup also seems to behave the same as pg_receivexlog.
>>>
>>> psql also treats it in a similar way. The behaviour of psql is as below:
>>>
>>>     psql.exe -d=host=localhost port=5432 dbname=postgres
>>>     psql: invalid connection option
>>>
>>>     psql.exe -d host=localhost port=5432 dbname=postgres
>>>     psql (9.5devel)
>>>     WARNING: Console code page (437) differs from Windows code page (1252)
>>>     8-bit characters might not work correctly. See psql reference page
>>>     Notes for Windows users for details.
>>>     Type help for help.
>>>     postgres=#
>>>
>>>> The document says that pg_receivexlog ignores the database name, and that this option is called for consistency with other client applications. But if we specify a database name the way other client tools allow, e.g. '-d hoge', then we definitely get an error.
>>>
>>> What I understand from the document is that it ignores the database name when given in a connection string.
>>
>> Yep, but we can use the -d option like 'psql -d postgres'. pg_receivexlog and pg_basebackup don't seem to work like that; they always handle the argument as a connstr-formatted value.
>
> psql needs to connect to a particular database for its functioning, whereas pg_basebackup/pg_receivexlog work differently and don't need to connect to a particular database, so I am not sure why you want them to behave similarly for the -d switch. Moreover the same is clarified in the docs, so what's the issue?

Yes, I got your point. I thought it a little confusing that, although the document says the -d option of pg_receivexlog is called for consistency, the behavior of psql (and the other tools) and pg_receivexlog isn't exactly the same. If a user passes the same '-d hoge_db' option to both pg_receivexlog and psql for easier setup, it leads to an error on the pg_receivexlog side. I thought pg_receivexlog should ignore the database name even when the -d option value is not a connstr-formatted value, the same as psql does. But perhaps there is no such case in practice.

Regards,

--
Sawada Masahiko
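The disagreement comes down to whether a tool applies a "looks like a connection string" test before interpreting -d. A sketch of that heuristic as the psql-style tools apply it (the helper name is made up; pg_basebackup and pg_receivexlog simply skip the test and treat the value as a connstr unconditionally):

    #include <stdbool.h>
    #include <string.h>

    /* Treat the -d argument as a connection string only if it contains '='
     * or a URI prefix; otherwise it is a plain database name. */
    static bool
    looks_like_connstring(const char *arg)
    {
        return strchr(arg, '=') != NULL ||
               strncmp(arg, "postgresql://", 13) == 0 ||
               strncmp(arg, "postgres://", 11) == 0;
    }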
Re: [HACKERS] Promise index tuples for UPSERT
On Tue, 2014-10-07 at 13:33 +0100, Simon Riggs wrote:
> Is there a way of detecting that we are updating a unique constraint column and then applying the HW locking only in that case? Or can we only apply locking when we have multiple unique constraints on a table?

What is the use case of doing an UPSERT into a table with multiple unique constraints?

Consider a table "user" with unique columns name and email and a non-unique column age. If it has the data

    Jack | j...@example.com | 33
    Tom  | t...@example.com | 35

and the user does UPSERT VALUES ('Jack', 't...@example.com', 34), the proposed specification will pick a random unique index (either the name or the email index) and do an update of that row. First, this will cause a unique violation; second, doing the UPSERT on a random index seems confusing.

The MySQL documentation says that you should try to avoid using an ON DUPLICATE KEY UPDATE clause on tables with multiple unique indexes[1]. The proposed feature's documentation has the same suggestion[2]. Still, the feature defaults to this behavior. Why is the default something the documentation says you shouldn't do?

Going a bit further, I am wondering what the use case is of doing an UPSERT against multiple unique indexes. If multiple-unique-index UPSERT could be dropped, that might allow for faster or cleaner index locking techniques.

- Anssi

1: http://dev.mysql.com/doc/refman/5.6/en/insert-on-duplicate.html
2: http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/on-conflict-docs/sql-insert.html

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
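Anssi's scenario, spelled out against the syntax proposed in [2] (a sketch — the spelling was still under discussion, and the sample values here are illustrative):

    CREATE TABLE author (name text UNIQUE, email text UNIQUE, age int);
    INSERT INTO author VALUES ('Jack', 'jack@example.com', 33),
                              ('Tom',  'tom@example.com',  35);

    -- Conflicts with Jack on name and with Tom on email: with no index
    -- specified, which row should this update?
    INSERT INTO author VALUES ('Jack', 'tom@example.com', 34)
        ON CONFLICT UPDATE SET age = 34;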
Re: [HACKERS] BUG: *FF WALs under 9.2 (WAS: .ready files appearing on slaves)
On Fri, Sep 19, 2014 at 1:07 AM, Jehan-Guillaume de Rorthais j...@dalibo.com wrote:
> We kept the WAL files and log files for further analysis. How can we help regarding this issue?

Commit c2f79ba has added an assumption that the WAL receiver should always enforce the creation of .done files when WAL files are done being streamed (XLogWalRcvWrite and WalReceiverMain) or archived (KeepFileRestoredFromArchive). Then, using this assumption, 1bd42cd changed RemoveOldXlogFiles a bit, removing a check looking at whether the node is in recovery. Now, based on the information given here, yes it happens that there are still cases where .done file creation is not correctly done, leading to those extra files. Even by looking at the code, I am not directly seeing any code paths where an extra call to XLogArchiveForceDone would be needed on the WAL receiver side but... Something like the patch attached (which is clearly a band-aid) may help though, as it would make files be removed even if they are not marked as .done for a node in recovery. And this is consistent with the pre-1bd42cd behavior. Comments welcome.
--
Michael

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 5a4dbb9..39701a3 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -3771,7 +3771,7 @@ RemoveOldXlogFiles(XLogSegNo segno, XLogRecPtr endptr)
 			strspn(xlde->d_name, "0123456789ABCDEF") == 24 &&
 			strcmp(xlde->d_name + 8, lastoff + 8) <= 0)
 		{
-			if (XLogArchiveCheckDone(xlde->d_name))
+			if (RecoveryInProgress() || XLogArchiveCheckDone(xlde->d_name))
 			{
 				snprintf(path, MAXPGPATH, XLOGDIR "/%s", xlde->d_name);

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Promise index tuples for UPSERT
On Wed, Oct 8, 2014 at 12:41 AM, Anssi Kääriäinen anssi.kaariai...@thl.fi wrote:
> The MySQL documentation says that you should try to avoid using an ON DUPLICATE KEY UPDATE clause on tables with multiple unique indexes[1]. The proposed feature's documentation has the same suggestion[2]. Still, the feature defaults to this behavior. Why is the default something the documentation says you shouldn't do?

MySQL started saying that when they realized it broke their statement-based replication. They have their own weird reasons for disliking it. Now, that's not to minimize the risks.

The reasoning behind making the unique index specification optional is: we cannot easily cover corner cases with another syntax - unique indexes must be named directly to cover every case, and make the user's intent absolutely clear. That's not obviously the case, but consider partial unique indexes, for example. Or consider uniquely constrained columns, with an almost equivalent uniquely constrained expression on those same columns. On the one hand I am not comfortable failing to support those, but on the other hand it could get very messy to do it another way. As we all know, naming a unique index in DML is ugly, and has poor support in ORMs. It seems likely that we're better off making it optional - it hasn't been much of a problem with the existing subxact looping pattern. A lot of the time, there will only be a single unique index anyway, or there will be a trivial serial PK unique index plus the unique index we always want to treat would-be conflicts within as triggering the alternative UPDATE/IGNORE path.

> Going a bit further, I am wondering what is the use case of doing an UPSERT against multiple unique indexes? If multiple unique indexes UPSERT could be dropped that might allow for faster or cleaner index locking techniques.

I see no reason to think that. I don't think it buys us much at all.

--
Peter Geoghegan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Promise index tuples for UPSERT
On 10/08/2014 11:10 AM, Peter Geoghegan wrote:
> The reasoning behind making the unique index specification optional is: we cannot easily cover corner cases with another syntax - unique indexes must be named directly to cover every case, and make the user's intent absolutely clear. That's not obviously the case, but consider partial unique indexes, for example. Or consider uniquely constrained columns, with an almost equivalent uniquely constrained expression on those same columns. On the one hand I am not comfortable failing to support those, but on the other hand it could get very messy to do it another way. As we all know, naming a unique index in DML is ugly, and has poor support in ORMs.

I vehemently object to naming indexes in the UPSERT statement. That mixes logical and physical database design, which is a bad idea. This is not ISAM. Instead of naming the index, you should name the columns, and the system can look up the index or indexes that match those columns.

(Remind me again, where did this need to name an index come from in the first place?)

- Heikki

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
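One way Heikki's column-based spelling could look (hypothetical syntax, resolved to a matching unique index at plan time rather than naming the index itself):

    -- resolvable to the unique index on (name); would fail if no unique
    -- index covers exactly these columns
    INSERT INTO author VALUES ('Jack', 'tom@example.com', 34)
        ON CONFLICT (name) UPDATE SET age = 34;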
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}
On Wed, Oct 8, 2014 at 3:47 AM, Peter Geoghegan p...@heroku.com wrote:
> It seems like what you're talking about here is just changing the spelling of what I already have.

I think there's a subtle difference in expectations too. The current BEFORE INSERT trigger behavior is somewhat defensible with an INSERT-driven syntax (though I don't like it even now [1]). But the MERGE syntax, to me, strongly implies that insertion doesn't begin before determining whether a conflict exists or not.

[1] http://www.postgresql.org/message-id/CABRT9RD6zriK+t6mnqQOqaozZ5z1bUaKh+kNY=o9zqbzfoa...@mail.gmail.com

Regards,
Marti

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Context lenses to set/get values in json values.
Hello.

I am interested in the json type in PostgreSQL. I would like to implement additional operations on the json structure that could extract/insert table-like information from the json tree structure. I have an implementation in JavaScript that shows this type of operation; you can see examples on this page: https://github.com/paweld2/eelnss/wiki

Following the examples on that page, it may be possible to implement a function similar to json_populate_record to extract multiple records from a single json value, for example:

    select * from json_populate_records_with_clen(null::myrowtype_users,
        'app.users.{:uID}.(email,data.name,isActive)',
        '... nested json value ...')

may return

    uID | email              | name          | isActive
    ----+--------------------+---------------+---------
    u1  | ad...@pmsoft.eu    | administrator | true
    u2  | nor...@pmsoft.eu   | user          | true
    u3  | testu...@pmsoft.eu | testUser      | false

Also, assuming that we have a table User as above (uID, email, name, isActive), with context lenses it is very simple to map the table to a json object. I assume that an API similar to table_to_xml/query_to_xml could be provided:

    table_to_json( Person, 'app.users.{:uID}.(email,data.name,isActive)');
    query_to_json( 'select * from Person where ... ', 'app.users.{:uID}.(email, data.name,isActive)');

I don't know the details of integrating functions/operators into sql queries, but because context lenses map between tables and tree objects, it may be possible to use a json column value as a separate table in queries. Assume the table

    create table Person (
        pID integer,
        address json
    );

Then it may be possible to query:

    select * from Person as P
    left join (
        select * from json_populate_records_with_clen(null::addressType,
            'addres.(street.number, street.local,city.code,city.name)',
            P.address)
    ) ...;

A final API for such functions needs to be defined. If such functions would be useful, I can try to prepare an implementation in the Postgres code base.

Regards.
Pawel Cesar Sanjuan Szklarz.
Re: [HACKERS] pg_receivexlog --status-interval add fsync feedback
> On 10/08/2014 07:23 AM, furu...@pm.nttdata.co.jp wrote:
>>> What set of options would you pass if you want to use it as a synchronous standby? And if you don't? Could we just have a single --synchronous flag, instead of -F and --reply-fsync?
>>
>> If you set synchronous_commit to remote_write, the options are different. The set of options in each case is as follows.
>>
>> Synchronous standby (synchronous_commit=on):
>>     --fsync-interval=-1 --reply-fsync --slot=slotname
>>
>> Synchronous standby (synchronous_commit=remote_write):
>>     --fsync-interval=-1 --reply-fsync
>>
>> Asynchronous:
>>     no related options
>>
>> Well, if the response-time delay (the value of --status-interval) is acceptable, --reply-fsync is unnecessary. Instead of --reply-fsync, using --synchronous (which combines --reply-fsync and --fsync-interval=-1) might be easier to understand. Although, in that case, --fsync-interval would be a fixed value. Isn't that a problem?
>
> I think we should remove --fsync-interval and --reply-fsync, and just have a --synchronous option, which would imply the same behavior you get with --fsync-interval=-1 --reply-fsync.
>
> That leaves the question of whether pg_receivexlog should do fsyncs when it's not acting as a synchronous standby. There isn't any real need to do so. In asynchronous mode, there are no guarantees anyway, and even without an fsync, the OS will eventually flush the data to disk.
>
> But we could do even better than that. It would be best if you didn't need even the --synchronous flag. The server knows whether the client is a synchronous standby or not. It could tell the client.

Thanks for the reply.

If we remove --fsync-interval, the response time to the user will not be delayed, although fsync will be executed multiple times in a short period. And there is no way to solve that problem without --fsync-interval; what should we do about it?

In asynchronous mode, I think there is no problem, since the specification is the same as the released version.

> The server knows whether the client is a synchronous standby or not. It could tell the client.

When notifying the client of synchronous/asynchronous mode from the server, do we need to change the format of the message?

Regards,

--
Furuya Osamu

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Promise index tuples for UPSERT
On Wed, Oct 8, 2014 at 1:25 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote:
> Instead of naming the index, you should name the columns, and the system can look up the index or indexes that match those columns.

It's not totally clear that we need *any* WITHIN clause, BTW. I'm not dead set on it. It was something I mainly added at Kevin's request. I do see the risks, though.

> (Remind me again, where did this need to name an index come from in the first place?)

I agree that naming indexes is ugly, and best avoided. It's tricky, though. The first thing you might try is to look up the index during parse analysis and/or planning. That could work in simple cases (which are probably the vast majority), but I'm worried about stuff like before row triggers that change the values being inserted out from under us, in a way that interferes with partial unique indexes. Maybe you could choose the wrong one if you didn't leave it until the last minute. But it's not very convenient to leave it until the last minute.

If you're willing to live with the command conservatively failing when there isn't a clean specification (although not in a way that can make the command fail when the user innocently adds a unique index later), then I think I can do it. Otherwise, it could be surprisingly hard to cover all the cases non-surprisingly. I freely admit that putting a unique index in a DML statement is weird, but it is sort of the most direct way of expressing what we want. Oracle actually has an INSERT-IGNORE-like hint that names a unique index (yes, really - see the UPSERT wiki page). That's really bizarre, but the point is that they may have felt that there was no reasonable alternative.

--
Peter Geoghegan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}
On Wed, Oct 8, 2014 at 1:36 AM, Marti Raudsepp ma...@juffo.org wrote:
> I think there's a subtle difference in expectations too. The current BEFORE INSERT trigger behavior is somewhat defensible with an INSERT-driven syntax (though I don't like it even now [1]).

There is no way around it. We need to fire before row triggers to know what to insert on the one hand, but on the other hand (in general) we have zero ability to nullify the effects (or side-effects) of before triggers, since they may execute arbitrary user-defined code. I think there is a good case to be made for severely restricting what before row triggers can do, but it's too late for that.

> But the MERGE syntax, to me, strongly implies that insertion doesn't begin before determining whether a conflict exists or not.

I think you're right. Another strike against the MERGE syntax, then, since as I said we cannot even know what to check prior to having before row insert triggers fire.

--
Peter Geoghegan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
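An illustration of the problem Peter describes — a BEFORE ROW trigger that rewrites a uniquely indexed value, so the conflict target cannot be known until the trigger has run (schema and names are illustrative):

    CREATE FUNCTION normalize_email() RETURNS trigger AS $$
    BEGIN
        NEW.email := lower(NEW.email);   -- changes the uniquely indexed value
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER author_normalize_email BEFORE INSERT ON author
        FOR EACH ROW EXECUTE PROCEDURE normalize_email();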
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}
On Tue, Oct 7, 2014 at 2:27 PM, Marti Raudsepp ma...@juffo.org wrote:
> but the new approach seems surprising: changes from BEFORE INSERT get persisted in the database, but AFTER INSERT is not fired.

I am sorry, I realize now that I misunderstood the currently proposed trigger behavior; I thought what Simon Riggs wrote here already happens: https://groups.google.com/forum/#!msg/django-developers/hdzkoLYVjBY/bnXyBVqx95EJ

But the point still stands: firing INSERT triggers when the UPDATE path is taken is counterintuitive. If we prevent changes to upsert key columns in BEFORE triggers, then we get the benefits, including more straightforward trigger behavior, and avoid problems with serial columns.

Regards,
Marti

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] BUG: *FF WALs under 9.2 (WAS: .ready files appearing on slaves)
On 10/08/2014 10:44 AM, Michael Paquier wrote:
> On Fri, Sep 19, 2014 at 1:07 AM, Jehan-Guillaume de Rorthais j...@dalibo.com wrote:
>> We kept the WAL files and log files for further analysis. How can we help regarding this issue?
>
> Commit c2f79ba has added an assumption that the WAL receiver should always enforce the creation of .done files when WAL files are done being streamed (XLogWalRcvWrite and WalReceiverMain) or archived (KeepFileRestoredFromArchive). Then, using this assumption, 1bd42cd changed RemoveOldXlogFiles a bit, removing a check looking at whether the node is in recovery. Now, based on the information given here, yes it happens that there are still cases where .done file creation is not correctly done, leading to those extra files. Even by looking at the code, I am not directly seeing any code paths where an extra call to XLogArchiveForceDone would be needed on the WAL receiver side but... Something like the patch attached (which is clearly a band-aid) may help though, as it would make files be removed even if they are not marked as .done for a node in recovery. And this is consistent with the pre-1bd42cd behavior.

There are two mysteries here:

1. Where do the FF files come from?

In 9.2, FF-segments are not supposed to be created, ever. Since this only happens with streaming replication, the FF segments are probably being created by walreceiver. XLogWalRcvWrite is the function that opens the file. I don't see anything obviously wrong there. XLogWalRcvWrite opens the file corresponding to the start position in the message received from the master. There is no check that the start position is valid, though; if the master sends a start position in the FF segment, walreceiver will merrily write it. So the problem could be on the walsender side. However, I don't see anything wrong there either.

I think we should add a check in walreceiver, to throw an error if the master sends an invalid WAL pointer, pointing to an FF segment.

2. Why are the .done files sometimes not being created?

I may have an explanation for that. Walreceiver creates a .done file when it closes an old segment and opens a new one. However, it does this only when it's about to start writing to the new segment, and still has the old segment open. If you stream the FE segment fully, but drop the replication connection at exactly that point, the .done file is not created. That might sound unlikely, but it's actually pretty easy to trigger. Just do "select pg_switch_xlog()" in the master, followed by "pg_ctl stop -m i" and a restart.

The creation of the .done files seems quite unreliable anyway. If only a portion of a segment is streamed, we don't write a .done file for it, so we still have the original problem that we will try to archive the segment after failover, even though the master might already have archived it.

I looked again at the thread where this was discussed: http://www.postgresql.org/message-id/flat/CAHGQGwHVYqbX=a+zo+avfbvhlgoypo9g_qdkbabexgxbvgd...@mail.gmail.com. I believe the idea was that the server that generates a WAL segment is always responsible for archiving it. A standby should never attempt to archive a WAL segment that was restored from the archive, or streamed from the master.

In that thread, it was not discussed what should happen to WAL files that an admin manually copies into pg_xlog of the standby. Should the standby archive them? I don't think so - the admin should copy them manually to the archive too, if he wants them archived. It's a good and simple rule that the server that generates the WAL, archives the WAL.
Instead of creating any .done files during recovery, we could scan pg_xlog at promotion, and create a .done file for every WAL segment that's present at that point. That would be more robust. And then apply your patch, to recycle old segments during archive recovery, ignoring .done files. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
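A sketch of the promotion-time scan Heikki suggests, not an actual patch: the directory helpers and XLogArchiveForceDone are real backend functions, but this function and its call site are assumptions:

    /* At promotion, mark every WAL segment already in pg_xlog as .done so
     * the standby never archives WAL it did not itself generate. */
    static void
    XLogMarkAllSegmentsDoneAtPromotion(void)
    {
        DIR        *xldir = AllocateDir(XLOGDIR);
        struct dirent *xlde;

        while ((xlde = ReadDir(xldir, XLOGDIR)) != NULL)
        {
            /* plain segment file names are exactly 24 hex digits */
            if (strlen(xlde->d_name) == 24 &&
                strspn(xlde->d_name, "0123456789ABCDEF") == 24)
                XLogArchiveForceDone(xlde->d_name);
        }
        FreeDir(xldir);
    }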
Re: [HACKERS] Promise index tuples for UPSERT
On Wed, 2014-10-08 at 02:22 -0700, Peter Geoghegan wrote:
> On Wed, Oct 8, 2014 at 1:25 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote:
>> Instead of naming the index, you should name the columns, and the system can look up the index or indexes that match those columns.
>
> It's not totally clear that we need *any* WITHIN clause, BTW. I'm not dead set on it. It was something I mainly added at Kevin's request. I do see the risks, though.

To be usable in the Django ORM's .save() method there must be an option to use the primary key index, and only the primary key index, for conflict resolution. Assume an author table with id SERIAL PRIMARY KEY, email TEXT UNIQUE, age INTEGER. When saving an object, Django must update the row with the matching primary key value, or otherwise do an insert. Updating a row that matches on the email column isn't allowed. So Django can't use the query:

    INSERT INTO author VALUES (1, 't...@example.com', 35)
        ON CONFLICT UPDATE SET email = conflicting(email), age = conflicting(age);

If the table contains the data (id=2, email='t...@example.com', age=34), the query would update Tom's age to 35 when it should have resulted in a unique constraint violation.

Other ORMs have similar save/persist implementations; that is, objects are persisted on primary key identity alone.

- Anssi

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
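Under a column-qualified variant of the syntax (a hypothetical spelling, in line with the suggestion made elsewhere in this thread), the ORM requirement could be expressed directly:

    -- conflict handling keyed to the primary key alone; the duplicate email
    -- then raises a unique violation instead of updating Tom's row
    INSERT INTO author VALUES (1, 'tom@example.com', 35)
        ON CONFLICT (id) UPDATE SET email = conflicting(email),
                                    age   = conflicting(age);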
Re: [HACKERS] Promise index tuples for UPSERT
On Wed, 2014-10-08 at 01:10 -0700, Peter Geoghegan wrote:
> On Wed, Oct 8, 2014 at 12:41 AM, Anssi Kääriäinen anssi.kaariai...@thl.fi wrote:
>> The MySQL documentation says that you should try to avoid using an ON DUPLICATE KEY UPDATE clause on tables with multiple unique indexes[1]. The proposed feature's documentation has the same suggestion[2]. Still, the feature defaults to this behavior. Why is the default something the documentation says you shouldn't do?
>
> As we all know, naming a unique index in DML is ugly, and has poor support in ORMs. It seems likely that we're better off making it optional - it hasn't been much of a problem with the existing subxact looping pattern.

The subxact approach is a bit different from the proposed UPSERT command. It loops:

    try:
        INSERT INTO author VALUES ('Jack', 't...@example.com', 34)
    except UniqueConstraintViolation:
        UPDATE author SET ... WHERE name = 'Jack'

while the UPSERT command does something like:

    try:
        INSERT INTO author VALUES ('Jack', 't...@example.com', 34)
    except UniqueConstraintViolation:
        UPDATE author SET ...
        WHERE name = 'Jack' OR email = 't...@example.com'
        LIMIT 1

- Anssi

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}
On Wed, Oct 8, 2014 at 12:28 PM, Peter Geoghegan p...@heroku.com wrote:
> On Wed, Oct 8, 2014 at 1:36 AM, Marti Raudsepp ma...@juffo.org wrote:
>> I think there's a subtle difference in expectations too. The current BEFORE INSERT trigger behavior is somewhat defensible with an INSERT-driven syntax (though I don't like it even now [1]).
>
> There is no way around it. We need to fire before row triggers to know what to insert on the one hand, but on the other hand (in general) we have zero ability to nullify the effects (or side-effects) of before triggers, since they may execute arbitrary user-defined code.

With my proposal this problem disappears: if we prevent BEFORE triggers from changing key attributes of NEW in the case of upsert, then we can acquire value locks before firing any triggers (before even constructing the whole tuple), and have a guarantee that the value locks are still valid by the time we proceed with the actual insert/update. Other than changing NEW, the side effects of triggers are not relevant.

Now, there may very well be reasons why this is tricky to implement, but I haven't heard any. Can you see any concrete reasons why this won't work? I can take a shot at implementing this, if you're willing to consider it.

> I think there is a good case to be made for severely restricting what before row triggers can do, but it's too late for that.

There are no users of the new upsert syntax out there yet, so it's not too late to rehash the semantics of that. This in no way affects users of the old INSERT/UPDATE syntax.

Regards,
Marti

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch to support SEMI and ANTI join removal
On Tue, Oct 7, 2014 at 3:46 AM, Robert Haas robertmh...@gmail.com wrote:
> On Mon, Oct 6, 2014 at 5:57 AM, David Rowley dgrowle...@gmail.com wrote:
>> Can anyone shed any light on how I might determine where the scan rel is in the tree? I need to find it so I can check if the RangeTblEntry is marked as skip-able.
>
> I think you're probably going to need logic that knows about particular node types and does the right thing for each one. I think - but maybe I'm misunderstanding - what you're looking for is a function of the form Oid ScansOnePlainTableWithoutQuals(). The algorithm could be something like:
>
>     switch (node type)
>     {
>         case T_SeqScanState:
>             if (no quals)
>                 return the_appropriate_oid;
>             return false;
>         case T_HashJoin:
>             decide whether we can ignore one side of the join
>             fish out the node from the other side of the join
>             (including reaching through the Hash node if necessary)
>             return ScansOnePlainTableWithoutQuals(the node we fished out);
>         ...other specific cases...
>         default:
>             return false;
>     }

Thanks Robert.

Ok, so I've been hacking away at this for a couple of evenings and I think I have a working prototype finally! My earlier thoughts about having to descend down until I find a seqscan were wrong. It looks like I just need to look at the next node down: if it's a seqscan and it's marked as not needed, then we can skip that side of the join; or if the child node is a HashJoinState, then check the skip status of that node, and if both sides are marked as not needed, skip that side of the join.

I've just completed some simple performance tests:

    create table t3 (id int primary key);
    create table t2 (id int primary key, t3_id int not null references t3);
    create table t1 (id int primary key, t2_id int not null references t2);

I've loaded these tables with 4 million rows each. The performance is as follows:

    test=# select count(*) from t1 inner join t2 on t1.t2_id=t2.id inner join t3 on t2.t3_id=t3.id;
      count
    ---------
     4000000
    (1 row)

    Time: 1022.492 ms

    test=# select count(*) from t1;
      count
    ---------
     4000000
    (1 row)

    Time: 693.642 ms

    test=# alter table t2 drop constraint t2_t3_id_fkey;
    ALTER TABLE
    Time: 2.141 ms

    test=# alter table t1 drop constraint t1_t2_id_fkey;
    ALTER TABLE
    Time: 1.678 ms

    test=# select count(*) from t1 inner join t2 on t1.t2_id=t2.id inner join t3 on t2.t3_id=t3.id;
      count
    ---------
     4000000
    (1 row)

    Time: 11682.525 ms

So it seems it's not quite as efficient as join removal at planning time, but still a big win when it's possible to perform the join skipping.

As yet, I've only added support for hash joins, and I've not really looked into detail on what's needed for nested loop joins or merge joins. For hash join I just added some code into the case HJ_BUILD_HASHTABLE: in ExecHashJoin(). The code just checks if either side can be skipped; if so, the node never moves out of the HJ_BUILD_HASHTABLE state, so that each time ExecHashJoin() is called, it just returns the next tuple from the non-skip side of the join until we run out of tuples. Or, if both sides can be skipped, NULL is returned, as any nodes above this shouldn't attempt to scan; perhaps this should just be an Assert(), as I don't think any parent node should ever bother executing this node if it's not required. The fact that I've put this code into the switch under HJ_BUILD_HASHTABLE makes me think it should add close to zero overhead when the join cannot be skipped. A sketch of the skip test appears after this message.

One thing that we've lost with this execution-time join removal method is the ability to still remove joins where the join column is NULLable.
The previous patch added IS NOT NULL to ensure the query was equivalent when a join was removed, of course this meant that any subsequent joins may later not have been removed due to the IS NOT NULL quals existing (which could restrict the rows and remove the ability that the foreign key could guarantee the existence of exactly 1 row matching the join condition). I've not yet come up with a nice way to reimplement these null checks at execution time, so I've thought that perhaps I won't bother, at least not for this patch. I'd just disable join skipping at planning time if any of the join columns can have NULLs. Anyway this is just a progress report, and also to say thanks to Andres, you might just have saved this patch with the execution time checking idea. I'll post a patch soon, hopefully once I have merge and nestloop join types working. Regards David Rowley
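A sketch of the recursive skip test David describes for the hash-join case (the helper and the "not needed" flag are assumptions, not the actual patch):

    /* Can this side of the join be skipped entirely?  True for a seq scan
     * whose rel is marked skippable, or a hash join both of whose inputs
     * are skippable; reach through Hash nodes to their input. */
    static bool
    join_side_skippable(PlanState *node)
    {
        if (IsA(node, SeqScanState))
            return scan_rel_marked_not_needed(node);    /* hypothetical */
        if (IsA(node, HashState))
            return join_side_skippable(outerPlanState(node));
        if (IsA(node, HashJoinState))
            return join_side_skippable(outerPlanState(node)) &&
                   join_side_skippable(innerPlanState(node));
        return false;
    }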
Re: [HACKERS] Patch to support SEMI and ANTI join removal
On 2014-10-09 00:21:44 +1300, David Rowley wrote:
> Ok, so I've been hacking away at this for a couple of evenings and I think I have a working prototype finally!

Cool!

> So it seems it's not quite as efficient as join removal at planning time, but still a big win when it's possible to perform the join skipping.

Have you checked where the overhead is? Is it really just the additional node that the tuples are passed through? Have you measured the overhead of the plan/execution time checks over master?

> One thing that we've lost with this execution-time join removal method is the ability to still remove joins where the join column is NULLable. The previous patch added IS NOT NULL to ensure the query was equivalent when a join was removed; of course this meant that any subsequent joins may later not have been removed due to the IS NOT NULL quals existing (which could restrict the rows and remove the ability that the foreign key could guarantee the existence of exactly 1 row matching the join condition). I've not yet come up with a nice way to reimplement these null checks at execution time, so I've thought that perhaps I won't bother, at least not for this patch. I'd just disable join skipping at planning time if any of the join columns can have NULLs.

Sounds fair enough for the first round.

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_receivexlog --status-interval add fsync feedback
On 10/08/2014 11:47 AM, furu...@pm.nttdata.co.jp wrote:
>> I think we should remove --fsync-interval and --reply-fsync, and just have a --synchronous option, which would imply the same behavior you get with --fsync-interval=-1 --reply-fsync.
>>
>> That leaves the question of whether pg_receivexlog should do fsyncs when it's not acting as a synchronous standby. There isn't any real need to do so. In asynchronous mode, there are no guarantees anyway, and even without an fsync, the OS will eventually flush the data to disk.
>>
>> But we could do even better than that. It would be best if you didn't need even the --synchronous flag. The server knows whether the client is a synchronous standby or not. It could tell the client.
>
> Thanks for the reply.
>
> If we remove --fsync-interval, the response time to the user will not be delayed, although fsync will be executed multiple times in a short period. And there is no way to solve that problem without --fsync-interval; what should we do about it?

I'm sorry, I didn't understand that.

> In asynchronous mode, I think there is no problem, since the specification is the same as the released version.
>
>> The server knows whether the client is a synchronous standby or not. It could tell the client.
>
> When notifying the client of synchronous/asynchronous mode from the server, do we need to change the format of the message?

Yeah. Or rather, add a new message type, to indicate the synchronous/asynchronous status.

- Heikki

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2
On 2014-06-25 19:06:32 +0530, Amit Kapila wrote:
> 2.
> LWLockWakeup()
> {
> ..
> #ifdef LWLOCK_STATS
> 	lwstats->spin_delay_count += SpinLockAcquire(&lock->mutex);
> #else
> 	SpinLockAcquire(&lock->mutex);
> #endif
> ..
> }
>
> Earlier while releasing lock, we don't count it towards LWLock stats spin_delay_count. I think if we see other places in lwlock.c, it only gets counted when we try to acquire it in a loop.

I think the previous situation was clearly suboptimal. I've now modified things so all spinlock acquisitions are counted.

> 3.
> LWLockRelease()
> {
> ..
> 	/* grant permission to run, even if a spurious share lock increases lockcount */
> 	else if (mode == LW_EXCLUSIVE && have_waiters)
> 		check_waiters = true;
> 	/* nobody has this locked anymore, potential exclusive lockers get a chance */
> 	else if (lockcount == 0 && have_waiters)
> 		check_waiters = true;
> ..
> }
>
> It seems comments have been reversed in the above code.

No, they look right. But I've expanded them in the version I'm going to post in a couple minutes.

> 5.
> LWLockWakeup()
> {
> ..
> 	dlist_foreach_modify(iter, (dlist_head *) &wakeup)
> 	{
> 		PGPROC	   *waiter = dlist_container(PGPROC, lwWaitLink, iter.cur);
>
> 		LOG_LWDEBUG("LWLockRelease", l, mode, "release waiter");
> 		dlist_delete(&waiter->lwWaitLink);
> 		pg_write_barrier();
> 		waiter->lwWaiting = false;
> 		PGSemaphoreUnlock(&waiter->sem);
> 	}
> ..
> }
>
> Why can't we decrement nwaiters after waking up? I don't think there is any major problem even if callers do that themselves, but in some rare cases LWLockRelease() might spuriously assume that there are some waiters and try to call LWLockWakeup(). Although this doesn't create any problem, keeping the value sane is good unless there is some problem in doing so.

That won't work because then LWLockWakeup() wouldn't be called when necessary - precisely because nwaiters is 0.

> 6.
> LWLockWakeup()
> {
> ..
> 	dlist_foreach_modify(iter, (dlist_head *) &l->waiters)
> 	{
> ..
> 		if (wokeup_somebody && waiter->lwWaitMode == LW_EXCLUSIVE)
> 			continue;
> ..
> 		if (waiter->lwWaitMode != LW_WAIT_UNTIL_FREE)
> 		{
> ..
> 			wokeup_somebody = true;
> 		}
> ..
> 	}
> ..
> }
>
> a. IIUC the above logic, if the waiter queue is as follows (S - Shared; X - Exclusive):
>
>     S X S S S X S S
>
> it can skip the exclusive waiters and release a shared waiter. If my understanding is right, then I think instead of continue, there should be a *break* in the above logic.

No, it looks correct to me. What happened is that the first S was woken up. So there's no point in waking up an exclusive locker, but further non-exclusive lockers can be woken up.

> b. Consider the below sequence of waiters (S - Shared; X - Exclusive):
>
>     S S X S S
>
> I think as per the un-patched code, it will wake up waiters up till (including) the first Exclusive, but the patch will wake up waiters up till (*excluding*) the first Exclusive.

I don't think the current code does that. And it'd be a pretty pointless behaviour, leading to useless increased contention. The only time it'd make sense for X to be woken up is when it gets run faster than the S processes.

> 7.
> LWLockWakeup()
> {
> ..
> 	dlist_foreach_modify(iter, (dlist_head *) &l->waiters)
> 	{
> ..
> 		dlist_delete(&waiter->lwWaitLink);
> 		dlist_push_tail(&wakeup, &waiter->lwWaitLink);
> ..
> 	}
> ..
> }
>
> Use of dlist has simplified the code, but I think there might be a slight overhead of maintaining the wakeup queue as compared to the un-patched mechanism, especially when there is a long waiter queue.

I don't see that as being relevant. The difference is an instruction or two - in the slow path we'll enter the kernel and sleep. This doesn't matter in comparison. And the code is *so* much more readable.

> 8.
> LWLockConditionalAcquire()
> {
> ..
> 	/*
> 	 * We ran into an exclusive lock and might have blocked another
> 	 * exclusive lock from taking a shot because it took a time to back
> 	 * off. Retry till we are either sure we didn't block somebody (because
> 	 * somebody else certainly has the lock) or till we got it.
> 	 *
> 	 * We cannot rely on the two-step lock-acquisition protocol as in
> 	 * LWLockAcquire because we're not using it.
> 	 */
> 	if (potentially_spurious)
> 	{
> 		SPIN_DELAY();
> 		goto retry;
> 	}
> ..
> }
>
> Due to the above logic, I think it can keep on retrying for a long time before it actually concludes whether it got the lock or not, in case other backends take the exclusive lock after *double_check* and release it before the unconditional increment of the shared lock in LWLockAttemptLock. I understand that it might be difficult to have such a practical scenario, but still there is a theoretical possibility of the same.

I'm not particularly concerned. We could optimize it a bit, but I really don't think it's necessary.

> Is there any advantage of retrying in LWLockConditionalAcquire()?

It's required for correctness. We only retry if we potentially blocked an exclusive acquirer (by spuriously incrementing/decrementing lockcount with 1). We need to be sure to either get the lock (in which case we can wake up the waiter on release), or be sure that we didn't disturb anyone.

> 9.
> LWLockAcquireOrWait()
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2
On 2014-10-08 14:47:44 +0200, Andres Freund wrote:
> On 2014-06-25 19:06:32 +0530, Amit Kapila wrote:
>> 5.
>> LWLockWakeup()
>> {
>> ..
>> 	dlist_foreach_modify(iter, (dlist_head *) &wakeup)
>> 	{
>> 		PGPROC	   *waiter = dlist_container(PGPROC, lwWaitLink, iter.cur);
>>
>> 		LOG_LWDEBUG("LWLockRelease", l, mode, "release waiter");
>> 		dlist_delete(&waiter->lwWaitLink);
>> 		pg_write_barrier();
>> 		waiter->lwWaiting = false;
>> 		PGSemaphoreUnlock(&waiter->sem);
>> 	}
>> ..
>> }
>>
>> Why can't we decrement nwaiters after waking up? I don't think there is any major problem even if callers do that themselves, but in some rare cases LWLockRelease() might spuriously assume that there are some waiters and try to call LWLockWakeup(). Although this doesn't create any problem, keeping the value sane is good unless there is some problem in doing so.
>
> That won't work because then LWLockWakeup() wouldn't be called when necessary - precisely because nwaiters is 0.

Err, this is bogus. Memory fail.

The reason I've done it that way is that it's otherwise much harder to debug issues where there are backends that have been woken up already, but haven't re-run yet. Without this there's simply no evidence of that state. I can't see this being relevant for performance, so I'd rather have it stay that way.

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
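For readers following along, the quoted wakeup loop with the ordering requirement spelled out — a paraphrase of the patch, not the committed code:

    dlist_mutable_iter iter;

    dlist_foreach_modify(iter, &wakeup)
    {
        PGPROC *waiter = dlist_container(PGPROC, lwWaitLink, iter.cur);

        dlist_delete(&waiter->lwWaitLink);

        /* The unlink must be globally visible before lwWaiting is cleared,
         * or the woken backend could requeue itself while still linked. */
        pg_write_barrier();
        waiter->lwWaiting = false;
        PGSemaphoreUnlock(&waiter->sem);
    }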
Re: [HACKERS] BUG: *FF WALs under 9.2 (WAS: .ready files appearing on slaves)
On Wed, Oct 8, 2014 at 6:54 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote:
> 1. Where do the FF files come from?
>
> In 9.2, FF-segments are not supposed to be created, ever. Since this only happens with streaming replication, the FF segments are probably being created by walreceiver. XLogWalRcvWrite is the function that opens the file. I don't see anything obviously wrong there. XLogWalRcvWrite opens the file corresponding to the start position in the message received from the master. There is no check that the start position is valid, though; if the master sends a start position in the FF segment, walreceiver will merrily write it. So the problem could be on the walsender side. However, I don't see anything wrong there either.

Good to hear. By looking at the wal receiver and sender code paths, I found nothing really weird either.

> I think we should add a check in walreceiver, to throw an error if the master sends an invalid WAL pointer, pointing to an FF segment.

Then we're good for a check in ProcessWalSndrMessage for walEnd, I guess. Seems like a straightforward patch.

> 2. Why are the .done files sometimes not being created?
>
> I may have an explanation for that. Walreceiver creates a .done file when it closes an old segment and opens a new one. However, it does this only when it's about to start writing to the new segment, and still has the old segment open. If you stream the FE segment fully, but drop the replication connection at exactly that point, the .done file is not created. That might sound unlikely, but it's actually pretty easy to trigger. Just do "select pg_switch_xlog()" in the master, followed by "pg_ctl stop -m i" and a restart.

That's exactly the test I ran a couple of times to trigger this behavior before sending my previous email, but without success on the standby with master: all the WAL files were marked as .done. Now I have just retried it, with far more tries on REL9_3_STABLE and on HEAD, and I have been able to actually trigger the problem a couple of times. Simply run a long transaction generating a lot of WAL like this:

    =# create table aa as select generate_series(1,10);

And then run this:

    $ psql -c 'select pg_switch_xlog()'; pg_ctl stop -m immediate; pg_ctl start

And with enough luck, .ready files may appear. It may take a dozen tries before seeing at least one. And I noticed that generally multiple .ready files appear at the same time.

> The creation of the .done files seems quite unreliable anyway. If only a portion of a segment is streamed, we don't write a .done file for it, so we still have the original problem that we will try to archive the segment after failover, even though the master might already have archived it.

Yep. Agreed.

> I looked again at the thread where this was discussed: http://www.postgresql.org/message-id/flat/CAHGQGwHVYqbX=a+zo+avfbvhlgoypo9g_qdkbabexgxbvgd...@mail.gmail.com. I believe the idea was that the server that generates a WAL segment is always responsible for archiving it. A standby should never attempt to archive a WAL segment that was restored from the archive, or streamed from the master.
>
> In that thread, it was not discussed what should happen to WAL files that an admin manually copies into pg_xlog of the standby. Should the standby archive them? I don't think so - the admin should copy them manually to the archive too, if he wants them archived. It's a good and simple rule that the server that generates the WAL, archives the WAL.

Question time: why was the check based on the recovery state of the node removed in 1bd42cd?
Just guessing, but did you have in mind that relying on XLogArchiveForceDone and XLogArchiveCheckDone was enough and more robust at this point?

> Instead of creating any .done files during recovery, we could scan pg_xlog at promotion, and create a .done file for every WAL segment that's present at that point. That would be more robust. And then apply your patch, to recycle old segments during archive recovery, ignoring .done files.

The additional process at promotion sounds like a good idea; I'll try to get a patch done tomorrow. This would also result in removing the XLogArchiveForceDone stuff. Either way, now that I have been able to reproduce the problem manually, things can clearly be solved.

Regards,
--
Michael
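The walEnd validity check Michael mentions could look roughly like this in 9.2-era WAL addressing (the placement in ProcessWalSndrMessage and the message wording are assumptions, not a committed patch):

    /* Reject a WAL pointer inside the FF segment, which the 9.2 on-disk
     * layout never uses (XLogSegsPerFile segments per logical file). */
    if (walEnd.xrecoff / XLogSegSize >= XLogSegsPerFile)
        ereport(ERROR,
                (errmsg("invalid WAL pointer %X/%X received from primary",
                        walEnd.xlogid, walEnd.xrecoff)));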
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}
On 8 October 2014 01:47, Peter Geoghegan p...@heroku.com wrote:
> It seems like what you're talking about here is just changing the spelling of what I already have. I think that would be confusing to users when the time comes to actually implement a fully-generalized MERGE, even with the clearly distinct MERGE CONCURRENTLY variant outlined here (which, of course, lacks an outer join, unlike MERGE proper).

I changed my view on this after some more thought. (Hope that helps.)

If we implement MERGE, I can see we may also wish to implement MERGE CONCURRENTLY one day. That would be different from UPSERT. So in the future I think we will need 3 commands:

1. MERGE
2. MERGE CONCURRENTLY
3. UPSERT

So we no longer need to have the command start with the MERGE keyword.

> However, unlike the idea of trying to square the circle of producing a general purpose MERGE command that also supports the UPSERT use-case, my objection to this much more limited proposal is made purely on aesthetic grounds. I think that it is not very user-friendly; I do not think that it's a total disaster, which is what trying to solve both problems at once (MERGE bulk loading and UPSERTing) would result in. So FWIW, if the community is really set on something that includes the keyword MERGE, which is really all you outline here, then I can live with that.

We will one day have MERGE according to the SQL Standard. My opinion is that the syntax for this should be similar to MERGE in the *body* of the command, rather than some completely different syntax, e.g.

    WHEN NOT MATCHED THEN
        INSERT
    WHEN MATCHED THEN
        UPDATE

I'm happy for us to put the syntax to a vote, as long as we bear in mind that we will one day have MERGE as well.

--
Simon Riggs                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
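The body shape Simon is advocating, written out in standard-SQL form (PostgreSQL did not implement MERGE at the time; this only shows the spelling):

    MERGE INTO author a
    USING (VALUES ('Jack', 'tom@example.com', 34)) AS v(name, email, age)
       ON a.name = v.name
    WHEN MATCHED THEN
        UPDATE SET age = v.age
    WHEN NOT MATCHED THEN
        INSERT VALUES (v.name, v.email, v.age);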
Re: [HACKERS] Promise index tuples for UPSERT
On 8 October 2014 00:34, Peter Geoghegan p...@heroku.com wrote:
> INSERTs see #2 win, and by a wider margin than #1 beat #2 with UPDATEs. However, insert.sh is by design very unsympathetic towards #1. It uses a serial primary key, so every INSERT uselessly obtains a HW lock on the same leaf page for the duration of heap insertion. Anyway, the median INSERT TPS number is 17,759.53 for #1, and 20,441.57 TPS for #2. So you're pretty much seeing the full brunt of page heavyweight locking, and it isn't all that bad.

Let's see the results of running a COPY please.

--
Simon Riggs                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.9
Hi,

Attached you can find the next version of my LW_SHARED patchset. Now that atomics are committed, it seems like a good idea to also add their raison d'être.

Since the last public version I have:
* Addressed lots of Amit's comments. Thanks!
* Performed a fair amount of testing.
* Rebased the code. The volatile removal made that not entirely trivial...
* Significantly cleaned up and simplified the code.
* Updated comments and such.
* Fixed a minor bug (unpaired HOLD/RESUME_INTERRUPTS in a corner case).

The feature currently consists of two patches:

1) Convert PGPROC->lwWaitLink into a dlist. The old code was frail and verbose. This also:
   * changes the logic in LWLockRelease() to release all shared lockers when waking up any. This can yield some significant performance improvements - and the fairness isn't really much worse than before, as we always allowed new shared lockers to jump the queue.
   * adds a pg_write_barrier() in the wakeup paths between dequeuing and unsetting ->lwWaiting. That was always required on weakly ordered machines, but f4077cda2 made it more urgent. I can reproduce crashes without it.

2) Implement the wait-free LW_SHARED algorithm.

Personally I'm quite happy with the new state. I think it needs more review, but I personally don't know of anything that needs changing. There are lots of further improvements that could be done, but let's get this in first.

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From 6885a15cc6f2e193ff575a4463d90ad252d74f5e Mon Sep 17 00:00:00 2001
From: Andres Freund and...@anarazel.de
Date: Tue, 7 Oct 2014 15:32:34 +0200
Subject: [PATCH 1/2] Convert the PGPROC->lwWaitLink list into a dlist instead
 of open coding it.

Besides being shorter and much easier to read it:
* changes the logic in LWLockRelease() to release all shared lockers when waking up any. This can yield some significant performance improvements - and the fairness isn't really much worse than before, as we always allowed new shared lockers to jump the queue.
* adds a pg_write_barrier() in the wakeup paths between dequeuing and unsetting ->lwWaiting. That was always required on weakly ordered machines, but f4077cda2 made it more urgent.
Author: Andres Freund
---
 src/backend/access/transam/twophase.c |   1 -
 src/backend/storage/lmgr/lwlock.c     | 151 +-
 src/backend/storage/lmgr/proc.c       |   2 -
 src/include/storage/lwlock.h          |   5 +-
 src/include/storage/proc.h            |   3 +-
 5 files changed, 60 insertions(+), 102 deletions(-)

diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index d5409a6..6401943 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -389,7 +389,6 @@ MarkAsPreparing(TransactionId xid, const char *gid,
 	proc->roleId = owner;
 	proc->lwWaiting = false;
 	proc->lwWaitMode = 0;
-	proc->lwWaitLink = NULL;
 	proc->waitLock = NULL;
 	proc->waitProcLock = NULL;
 	for (i = 0; i < NUM_LOCK_PARTITIONS; i++)
diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index 9fe6855..e6f9158 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -35,6 +35,7 @@
 #include "miscadmin.h"
 #include "pg_trace.h"
 #include "replication/slot.h"
+#include "storage/barrier.h"
 #include "storage/ipc.h"
 #include "storage/predicate.h"
 #include "storage/proc.h"
@@ -115,9 +116,9 @@ inline static void
 PRINT_LWDEBUG(const char *where, const LWLock *lock)
 {
 	if (Trace_lwlocks)
-		elog(LOG, "%s(%s %d): excl %d shared %d head %p rOK %d",
+		elog(LOG, "%s(%s %d): excl %d shared %d rOK %d",
 			 where, T_NAME(lock), T_ID(lock),
-			 (int) lock->exclusive, lock->shared, lock->head,
+			 (int) lock->exclusive, lock->shared,
 			 (int) lock->releaseOK);
 }
@@ -475,8 +476,7 @@ LWLockInitialize(LWLock *lock, int tranche_id)
 	lock->exclusive = 0;
 	lock->shared = 0;
 	lock->tranche = tranche_id;
-	lock->head = NULL;
-	lock->tail = NULL;
+	dlist_init(&lock->waiters);
 }
@@ -615,12 +615,7 @@ LWLockAcquireCommon(LWLock *lock, LWLockMode mode, uint64 *valptr, uint64 val)
 		proc->lwWaiting = true;
 		proc->lwWaitMode = mode;
-		proc->lwWaitLink = NULL;
-		if (lock->head == NULL)
-			lock->head = proc;
-		else
-			lock->tail->lwWaitLink = proc;
-		lock->tail = proc;
+		dlist_push_head(&lock->waiters, &proc->lwWaitLink);

 		/* Can release the mutex now */
 		SpinLockRelease(&lock->mutex);
@@ -836,12 +831,7 @@ LWLockAcquireOrWait(LWLock *lock, LWLockMode mode)
 		proc->lwWaiting = true;
 		proc->lwWaitMode = LW_WAIT_UNTIL_FREE;
-		proc->lwWaitLink = NULL;
-		if (lock->head == NULL)
-			lock->head = proc;
-		else
-			lock->tail->lwWaitLink = proc;
-		lock->tail = proc;
+		dlist_push_head(&lock->waiters, &proc->lwWaitLink);

 		/* Can release the mutex now */
 		SpinLockRelease(&lock->mutex);
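The core of patch 2 is, per the thread's description, roughly the following optimistic protocol (state layout and the flag name are assumptions based on the discussion, not the patch itself):

    uint32 old_state;

    /* optimistically register as a shared locker with a single atomic add */
    old_state = pg_atomic_fetch_add_u32(&lock->state, 1);

    if (old_state & LW_VAL_EXCLUSIVE)   /* assumed exclusive-holder bit */
    {
        /* potentially spurious: undo the increment and fall back to
         * queuing on lock->waiters */
        pg_atomic_fetch_sub_u32(&lock->state, 1);
    }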
Re: PENDING_LIST_CLEANUP_SIZE - maximum size of GIN pending list Re: [HACKERS] HEAD seems to generate larger WAL regarding GIN index
On Wed, Sep 24, 2014 at 11:10 AM, Etsuro Fujita fujita.ets...@lab.ntt.co.jp wrote: (2014/09/13 2:42), Fujii Masao wrote: On Wed, Sep 10, 2014 at 10:37 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Fujii Masao wrote: On Wed, Sep 10, 2014 at 12:15 PM, Etsuro Fujita fujita.ets...@lab.ntt.co.jp wrote: PENDING_LIST_CLEANUP_SIZE and work_mem, for this setting. Wouldn't it be easier to have only one parameter, PENDING_LIST_CLEANUP_SIZE? How about setting PENDING_LIST_CLEANUP_SIZE to work_mem as the default value when running the CREATE INDEX command?

That's an idea. But there might be some users who want to change the cleanup size per session, like they can do by setting work_mem, and your idea would prevent them from doing that... So what about introducing pending_list_cleanup_size also as a GUC? That is, users can set the threshold by using either the reloption or the GUC.

Yes, I think having both a GUC and a reloption makes sense -- the GUC applies to all indexes, and can be tweaked for individual indexes using the reloption.

Agreed.

I'm not sure about the idea of being able to change it per session, though. Do you mean that you would like insert processes to use a very large value so that they can just append new values to the pending list, and have vacuum use a small value so that it cleans up as soon as it runs? Two things: 1. we could have an autovacuum_ reloption which only changes what autovacuum does; 2. we could have autovacuum run index cleanup actions separately from actual vacuuming.

Yes, I was thinking something like that. But if autovacuum can already handle that, that's nice. Anyway, as you pointed out, it's better to have both a GUC and a reloption for the cleanup size of the pending list.

OK, I'd vote for your idea of having both the GUC and the reloption. So, I think the patch needs to be updated. Fujii-san, what plan do you have about the patch?

Please see the attached patch. In this patch, I introduced the GUC parameter, pending_list_cleanup_size. I chose 4MB as the default value of the parameter. But do you have any better idea about that default value?

BTW, I moved the CommitFest entry of this patch to next CF 2014-10.

Regards,

-- Fujii Masao

*** a/doc/src/sgml/config.sgml
--- b/doc/src/sgml/config.sgml
***
*** 5907,5912 **** SET XML OPTION { DOCUMENT | CONTENT };
--- 5907,5933 ----
       </listitem>
      </varlistentry>

+     <varlistentry id="guc-pending-list-cleanup-size" xreflabel="pending_list_cleanup_size">
+      <term><varname>pending_list_cleanup_size</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>pending_list_cleanup_size</> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Sets the maximum size of the GIN pending list which is used
+        when <literal>FASTUPDATE</> is enabled. If the list grows
+        larger than this maximum size, it is cleaned up by moving
+        the entries in it to the main GIN data structure in bulk.
+        The default is four megabytes (<literal>4MB</>). This setting
+        can be overridden for individual GIN indexes by changing
+        storage parameters.
+        See <xref linkend="gin-fast-update"> and <xref linkend="gin-tips">
+        for more information.
+       </para>
+      </listitem>
+     </varlistentry>
+
     </variablelist>
    </sect2>
    <sect2 id="runtime-config-client-format">

*** a/doc/src/sgml/gin.sgml
--- b/doc/src/sgml/gin.sgml
***
*** 728,735 ****
      from the indexed item). As of <productname>PostgreSQL</productname> 8.4,
      <acronym>GIN</> is capable of postponing much of this work by inserting
      new tuples into a temporary, unsorted list of pending entries.
!     When the table is vacuumed, or if the pending list becomes too large
!     (larger than <xref linkend="guc-work-mem">), the entries are moved to the
      main <acronym>GIN</acronym> data structure using the same bulk insert
      techniques used during initial index creation. This greatly improves
      <acronym>GIN</acronym> index update speed, even counting the additional
--- 728,735 ----
      from the indexed item). As of <productname>PostgreSQL</productname> 8.4,
      <acronym>GIN</> is capable of postponing much of this work by inserting
      new tuples into a temporary, unsorted list of pending entries.
!     When the table is vacuumed, or if the pending list becomes larger than
!     <xref linkend="guc-pending-list-cleanup-size">, the entries are moved to the
      main <acronym>GIN</acronym> data structure using the same bulk insert
      techniques used during initial index creation. This greatly improves
      <acronym>GIN</acronym> index update speed, even counting the additional
***
*** 812,829 ****
     </varlistentry>

     <varlistentry>
!    <term><xref linkend="guc-work-mem"></term>
      <listitem>
       <para>
        During a series of insertions into an existing <acronym>GIN</acronym>
        index that has <literal>FASTUPDATE</> enabled, the
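For concreteness, this is how the two knobs proposed in the (truncated) patch above would be used - a hedged sketch only, since the patch is still under review and the reloption spelling and unit handling here are assumptions (the table and index names are made up):

-- cluster- or session-wide threshold, as a GUC
SET pending_list_cleanup_size = '8MB';

-- per-index override via the proposed storage parameter
CREATE INDEX docs_body_gin ON docs USING gin (to_tsvector('english', body))
  WITH (fastupdate = on, pending_list_cleanup_size = 4096);  -- assumed kB units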
Re: [HACKERS] BUG: *FF WALs under 9.2 (WAS: .ready files appearing on slaves)
On Wed, Oct 8, 2014 at 6:54 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 10/08/2014 10:44 AM, Michael Paquier wrote: On Fri, Sep 19, 2014 at 1:07 AM, Jehan-Guillaume de Rorthais j...@dalibo.com wrote: We kept the WAL files and log files for further analysis. How can we help regarding this issue?

Commit c2f79ba has added an assumption that the WAL receiver should always enforce the creation of .done files when WAL files are done being streamed (XLogWalRcvWrite and WalReceiverMain) or archived (KeepFileRestoredFromArchive). Then, using this assumption, 1bd42cd changed RemoveOldXlogFiles a bit, removing a check of whether the node is in recovery. Now, based on the information given here, yes, it happens that there are still cases where .done file creation is not correctly done, leading to those extra files. Even by looking at the code, I am not directly seeing any code paths where an extra call to XLogArchiveForceDone would be needed on the WAL receiver side, but... Something like the patch attached (which is clearly a band-aid) may help though, as it would allow files to be removed even if they are not marked as .done on a node in recovery. And this is consistent with the pre-1bd42cd behavior.

There are two mysteries here:

1. Where do the FF files come from? In 9.2, FF segments are not supposed to be created, ever. Since this only happens with streaming replication, the FF segments are probably being created by walreceiver. XLogWalRcvWrite is the function that opens the file. I don't see anything obviously wrong there. XLogWalRcvWrite opens the file corresponding to the start position in the message received from the master. There is no check that the start position is valid, though; if the master sends a start position in the FF segment, walreceiver will merrily write it. So the problem could be on the walsender side. However, I don't see anything wrong there either. I think we should add a check in walreceiver, to throw an error if the master sends an invalid WAL pointer, pointing to an FF segment.

2. Why are the .done files sometimes not being created? I may have an explanation for that. Walreceiver creates a .done file when it closes an old segment and opens a new one. However, it does this only when it's about to start writing to the new segment, and still has the old segment open. If you stream the FE segment fully, but drop the replication connection at exactly that point, the .done file is not created. That might sound unlikely, but it's actually pretty easy to trigger. Just do "select pg_switch_xlog()" in the master, followed by "pg_ctl stop -m i" and a restart.

The creation of the .done files seems quite unreliable anyway. If only a portion of a segment is streamed, we don't write a .done file for it, so we still have the original problem that we will try to archive the segment after failover, even though the master might already have archived it. I looked again at the thread where this was discussed: http://www.postgresql.org/message-id/flat/CAHGQGwHVYqbX=a+zo+avfbvhlgoypo9g_qdkbabexgxbvgd...@mail.gmail.com. I believe the idea was that the server that generates a WAL segment is always responsible for archiving it. A standby should never attempt to archive a WAL segment that was restored from the archive, or streamed from the master.

In that thread, it was not discussed what should happen to WAL files that an admin manually copies into pg_xlog of the standby. Should the standby archive them? I don't think so - the admin should copy them manually to the archive too, if he wants them archived.
It's a good and simple rule that the server that generates the WAL, archives the WAL. Instead of creating any .done files during recovery, we could scan pg_xlog at promotion, and create a .done file for every WAL segment that's present at that point. That would be more robust. And then apply your patch, to recycle old segments during archive recovery, ignoring .done files.

What happens if a user shuts down the standby, removes recovery.conf and starts the server as the master? In this case, no WAL files have .done status files, so the server will create .ready files and archive all of them. Probably this is problematic. So even if we adopt your idea, ISTM that it's better to create a .done file whenever a WAL file is fully streamed or restored.

Regards,

-- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
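Heikki's reproduction recipe from above, spelled out as commands (a sketch only; connection settings and the data directory are assumed to be configured, and the timing relative to the segment boundary still has to line up):

$ psql -c "SELECT pg_switch_xlog();"    # on the master: finish out the current segment
$ pg_ctl -D $PGDATA stop -m immediate   # drop the replication connection at exactly that point
$ pg_ctl -D $PGDATA start               # after restart, the fully-streamed segment on the standby has no .done file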
Re: [HACKERS] pg_background (and more parallelism infrastructure patches)
On 2014-09-29 13:38:37 -0400, Robert Haas wrote: On Mon, Sep 29, 2014 at 12:05 PM, Stephen Frost sfr...@snowman.net wrote: Lastly, I will say that I feel it'd be good to support bi-directional communication as I think it'll be needed eventually, but I'm not sure that has to happen now.

I agree we need bidirectional communication; I just don't agree that the other direction should use the libpq format. The data going from the worker to the process that launched it is stuff like errors and tuples, for which we already have a wire format. The data going in the other direction is going to be things like plan trees to be executed, for which we don't. But if we can defer the issue, so much the better. Things will become clearer as we get closer to being done.

I think that might be true for your use case, but not for others. It's perfectly conceivable that one might want to ship tuples to a couple of bgworkers using the COPY protocol or such. I don't think it needs to be fully implemented, but I think we should design it in a way that makes it unlikely to require larger changes to the code added here when we do add it.

Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_background (and more parallelism infrastructure patches)
On 2014-09-10 16:53:12 -0400, Robert Haas wrote:

diff --git a/src/include/libpq/libpq.h b/src/include/libpq/libpq.h
index 5da9d8d..0b8db42 100644
--- a/src/include/libpq/libpq.h
+++ b/src/include/libpq/libpq.h
@@ -37,6 +37,31 @@ typedef struct
 	} u;
 } PQArgBlock;

+typedef struct
+{
+	void (*comm_reset)(void);
+	int  (*flush)(void);
+	int  (*flush_if_writable)(void);
+	bool (*is_send_pending)(void);
+	int  (*putmessage)(char msgtype, const char *s, size_t len);
+	void (*putmessage_noblock)(char msgtype, const char *s, size_t len);
+	void (*startcopyout)(void);
+	void (*endcopyout)(bool errorAbort);
+} PQsendMethods;
+
+PQsendMethods *PqSendMethods;

WRT my complaint in the other subthread, I think PQsendMethods should just be renamed to PQcommMethods or similar. That, in combination with a /* at some point we might want to add methods for receiving data here */ comment, looks like it'd already satisfy my complaint ;)

Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Context lenses to set/get values in json values.
On 10/08/2014 04:38 AM, Paweł Cesar Sanjuan Szklarz wrote: Hello. I am interested in the json type on postgresql. I would like to implement additional operations on the json structure that may extract/insert table-like information from the json tree structure. I have an implementation in javascript that shows this type of operations. You can see examples on this page: https://github.com/paweld2/eelnss/wiki

Following the examples on the previous page, it may be possible to implement a function similar to json_populate_record to extract multiple records from a single json value, for example:

select * from json_populate_records_with_clen(null::myrowtype_users, 'app.users.{:uID}.(email,data.name,isActive)', '... nested json value ...')

may return:

 uID | email              | name          | isActive
-----+--------------------+---------------+----------
 u1  | ad...@pmsoft.eu    | administrator | true
 u2  | nor...@pmsoft.eu   | user          | true
 u3  | testu...@pmsoft.eu | testUser      | false

Also, assuming that we have a table User as above (uID, email, name, isActive), with context lenses it is very simple to map the table to a json object. I assume that an api similar to table_to_xml/query_to_xml may be provided:

table_to_json( Person, 'app.users.{:uID}.(email,data.name,isActive)');
query_to_json( 'select * from Person where ... ', 'app.users.{:uID}.(email,data.name,isActive)');

I don't know the details about the integration of functions/operators into sql queries, but because context lenses map between tables and tree objects, it may be possible to use a json column value as a separate table in queries. Assume the table:

create table Person (
  pID integer,
  address json
);

Then it may be possible to query:

select * from Person as P
  left join (
    select * from json_populate_records_with_clen(
      null::addressType,
      'address.(street.number,street.local,city.code,city.name)',
      P.address)
  ) as A on true;

A final api for such functions needs to be defined. If such functions would be useful, I can try to prepare an implementation in the postgres base code.

I don't think we need to import Mongo-type notation here. But there is probably a good case for some functions like:

json_table_agg(anyrecord) -> json

which would work like json_agg() but would return an array of arrays instead of an array of objects. The caller would be assumed to know which field is which in the array. That should take care of both the table_to_json and query_to_json suggestions above.

In the other direction, we could have something like:

json_populate_recordset_from_table(base anyrecord, fields text[], jsontable json) -> setof record

where jsontable is an array of arrays of values and fields is a corresponding array of field names.

I'm not sure how mainstream any of this is. Maybe an extension would be more appropriate?

cheers

andrew

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
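To make the shape of those two suggestions concrete, hypothetical invocations might look like the following (neither function exists; the names, signatures, and the users table are illustrative only):

-- array-of-arrays output, one inner array per row
SELECT json_table_agg(u) FROM users u;
-- e.g. [["u1","admin@example.com",true],["u2","user@example.com",false]]

-- and back again, with fields naming what each array position means
SELECT *
FROM json_populate_recordset_from_table(
       NULL::users,
       ARRAY['uid', 'email', 'isactive'],
       '[["u1","admin@example.com",true],["u2","user@example.com",false]]');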
Re: [HACKERS] BUG: *FF WALs under 9.2 (WAS: .ready files appearing on slaves)
On 10/08/2014 04:59 PM, Fujii Masao wrote: On Wed, Oct 8, 2014 at 6:54 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: Instead of creating any .done files during recovery, we could scan pg_xlog at promotion, and create a .done file for every WAL segment that's present at that point. That would be more robust. And then apply your patch, to recycle old segments during archive recovery, ignoring .done files. What happens if a user shutdowns the standby, removes recovery.conf and starts the server as the master? Um, that's not a safe thing to do anyway, is it? - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Context lenses to set/get values in json values.
On Wed, Oct 8, 2014 at 4:25 PM, Andrew Dunstan and...@dunslane.net wrote: [...] I'm not sure how mainstream any of this is. Maybe an extension would be more appropriate?

Hello. My personal interest is to send updates to a single json value in the server. What is the best way to make an update to a json value in postgres without a full rewrite of the already stored value? The -> operator extracts an internal value, but to update the value I don't see any operator.

I was not familiar with the extensions, but it looks like the best way to start is to create an extension with possible implementations of new functions. I will do so. In my project I considered using mongo, but in my case the core part of the model matches a relational schema perfectly.
I have some leaf concepts that will change frequently, and to avoid migrations I store that information in a json value. To make changes in such leaf values I would like to have a context-lenses-like api in the server. I will start with some toy extension and try to get a feel for whether this makes sense.

Regards,
Pawel.
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2
On Wed, Oct 8, 2014 at 8:47 AM, Andres Freund and...@2ndquadrant.com wrote: I don't see that as being relevant. The difference is an instruction or two - in the slow path we'll enter the kernel and sleep. This doesn't matter in comparison. And the code is *so* much more readable. I find the slist/dlist stuff actually quite difficult to get right compared to a hand-rolled linked list. But the really big problem is that the debugger can't do anything useful with it. You have to work out the structure-member offset in order to walk the list and manually cast to char *, adjust the pointer, and cast back. That sucks. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2
On 2014-10-08 13:13:33 -0400, Robert Haas wrote: On Wed, Oct 8, 2014 at 8:47 AM, Andres Freund and...@2ndquadrant.com wrote: I don't see that as being relevant. The difference is an instruction or two - in the slow path we'll enter the kernel and sleep. This doesn't matter in comparison. And the code is *so* much more readable. I find the slist/dlist stuff actually quite difficult to get right compared to a hand-rolled linked list. Really? I've spent more than a day debugging things with the current code. And Heikki introduced a bug in it. If you look at how the code looks before/after I find the difference pretty clear. But the really big problem is that the debugger can't do anything useful with it. You have to work out the structure-member offset in order to walk the list and manually cast to char *, adjust the pointer, and cast back. That sucks. Hm. I can just do that with the debugger here. Not sure if that's because I added the right thing to my .gdbinit or because I use the correct compiler flags. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2
Robert Haas wrote: On Wed, Oct 8, 2014 at 8:47 AM, Andres Freund and...@2ndquadrant.com wrote: I don't see that as being relevant. The difference is an instruction or two - in the slow path we'll enter the kernel and sleep. This doesn't matter in comparison. And the code is *so* much more readable.

I find the slist/dlist stuff actually quite difficult to get right compared to a hand-rolled linked list. But the really big problem is that the debugger can't do anything useful with it. You have to work out the structure-member offset in order to walk the list and manually cast to char *, adjust the pointer, and cast back. That sucks.

As far as I recall you can get gdb to understand those pointer games by defining some structs or macros. Maybe we can improve by documenting this.

-- Álvaro Herrera   http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Context lenses to set/get values in json values.
On 10/08/2014 12:13 PM, Paweł Cesar Sanjuan Szklarz wrote: [...] My personal interest is to send updates to a single json value in the server. What is the best way to make an update to a json value in postgres without a full rewrite of the already stored value? [...] I will start with some toy extension and try to get a feel for whether this makes sense.

There is work already being done on providing update operations.

cheers

andrew

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Log notice that checkpoint is to be written on shutdown
Hi,

On Saturday, 04.10.2014 at 15:05 -0500, Jim Nasby wrote: On 10/4/14, 1:21 PM, Jeff Janes wrote: On Thu, Oct 2, 2014 at 6:21 AM, Michael Banck wrote: we have seen repeatedly that users can be confused about why PostgreSQL is not shutting down even though they requested it. Usually, this is because `log_checkpoints' is not enabled and the final checkpoint is being written, delaying shutdown. As no message besides "shutting down" is written to the server log in this case, we even had users believing the server was hanging and pondering killing it manually.

Wouldn't a better place to write this message be the terminal from which pg_ctl stop was invoked, rather than the server log file?

Looking at it from a DBA perspective, this would indeed be better, yes. However, I see a few issues with that:

1. If you are using an init script (or another wrapper around pg_ctl), you don't get any of its output, it seems.

2. Having taken a quick look at pg_ctl, it seems to just kill the postmaster on shutdown and wait for its PID file to disappear. I don't see how it could figure out that PostgreSQL is waiting for a checkpoint to finish.

Or do both. I suspect elog(INFO, ...) might do that.

That would imply that pg_ctl receives and writes out log messages directed at clients, which I don't think is true. Even if it did, client_min_messages does not include an INFO level, and LOG is not logged to clients by default. So the first common level above the defaults of both client_min_messages and log_min_messages would be WARNING, which seems excessive to me. As I said, I only took a quick look at pg_ctl though, so I might well be missing something.

Michael

-- Michael Banck Projektleiter / Berater Tel.: +49 (2161) 4643-171 Fax: +49 (2161) 4643-100 Email: michael.ba...@credativ.de credativ GmbH, HRB Mönchengladbach 12080 USt-ID-Nummer: DE204566209 Hohenzollernstr. 133, 41061 Mönchengladbach Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2
On 2014-10-08 14:23:44 -0300, Alvaro Herrera wrote: As far as I recall you can get gdb to understand those pointer games by defining some structs or macros. Maybe we can improve by documenting this.

So, what makes it work for me (among other unrelated stuff) seems to be the following in .gdbinit, defining away some things that gdb doesn't handle:

macro define __builtin_offsetof(T, F) ((int) &(((T *) 0)->F))
macro define __extension__
macro define AssertVariableIsOfTypeMacro(x, y) ((void)0)

Additionally I have -ggdb -g3 in CFLAGS. That way gdb knows about postgres' macros. At least if you're in the right scope.

As an example, the following works:

(gdb) p dlist_is_empty(&BackendList) ? NULL : dlist_head_element(Backend, elem, &BackendList)

Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}
On Mon, Sep 29, 2014 at 7:21 AM, Simon Riggs si...@2ndquadrant.com wrote: Having said that, it would be much nicer to have a mode that allows you to just say the word UPDATE and have it copy the data into the correct columns, like MySQL does. That is very intuitive, even if it isn't very flexible. I thought about this, and at first I agreed, but now I'm not so sure. Consider the case where you write an INSERT ... ON CONFLICT UPDATE ALL query, or however we might spell this idea. 1. Developer writes the query, and it works fine. 2. Some time later, the DBA adds an inserted_at column (those are common). The DBA is not aware of the existence of this particular query. The new column has a default value of now(), say. Should we UPDATE the inserted_at column to be NULL? Or (more plausibly) the default value filled in by the INSERT? Or leave it be? I think there is a case to be made for all of these behaviors, and that tension makes me prefer to not do this at all. It's like encouraging SELECT * queries in production, only worse. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
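Peter's scenario, sketched with a hypothetical "UPDATE ALL" spelling (no such syntax exists in any patch; it is only here to make the ambiguity visible):

CREATE TABLE t (id int PRIMARY KEY, val text);
-- later, the DBA adds a column the query author never saw:
ALTER TABLE t ADD COLUMN inserted_at timestamptz DEFAULT now();

INSERT INTO t (id, val) VALUES (1, 'x')
  ON CONFLICT UPDATE ALL;  -- hypothetical spelling
-- If id = 1 already exists: should inserted_at become NULL, take the
-- INSERT's default now(), or keep the existing row's value? Each answer
-- is defensible, which is exactly the tension described above.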
Re: [HACKERS] Context lenses to set/get values in json values.
On 10/08/2014 02:04 PM, Thom Brown wrote: There is work already being done on providing update operations. I've been looking out for that. Has there been a discussion on how that would look yet that you could point me to? https://github.com/erthalion/jsonbx Note that a) it's an extension, and b) it's jsonb only. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.9
On 2014-10-08 15:23:22 -0400, Robert Haas wrote: On Wed, Oct 8, 2014 at 9:35 AM, Andres Freund and...@2ndquadrant.com wrote: 1) Convert PGPROC->lwWaitLink into a dlist. The old code was frail and verbose. This also does: * changes the logic in LWLockRelease() to release all shared lockers when waking up any. This can yield some significant performance improvements - and the fairness isn't really much worse than before, as we always allowed new shared lockers to jump the queue. * adds a memory pg_write_barrier() in the wakeup paths between dequeuing and unsetting ->lwWaiting. That was always required on weakly ordered machines, but f4077cda2 made it more urgent. I can reproduce crashes without it.

I think it's a really bad idea to mix a refactoring change (like converting PGPROC->lwWaitLink into a dlist) with an attempted performance enhancement (like changing the rules for jumping the lock queue) and a bug fix (like adding pg_write_barrier where needed). I'd suggest that the last of those be done first, and perhaps back-patched.

I think it makes sense to separate out the write barrier one. I don't really see the point of separating the other two changes. I've indeed previously started a thread (http://archives.postgresql.org/message-id/20140210134625.GA15246%40awork2.anarazel.de) about the barrier issue. IIRC you argued that it might be too expensive.

The current coding, using a hand-rolled list, touches shared memory fewer times. When many waiters are awoken at once, we clip them all out of the list at one go. Your revision moves them to a backend-private list one at a time, and then pops them off one at a time. The backend-private memory accesses don't seem like they matter much, but the shared memory accesses would be nice to avoid.

I can't imagine this to matter. We're entering the kernel for each PROC for the PGSemaphoreUnlock() and we're dirtying the cacheline for proc->lwWaiting = false anyway. This really is the slow path.

Does LWLockUpdateVar's wake-up loop need a write barrier per iteration, or just one before the loop starts? How about commenting the pg_write_barrier() with the read fence to which it pairs?

Hm. Are you picking out LWLockUpdateVar for a reason or just as an example? Because I don't see a difference between the different wakeup loops. It needs to be a barrier per iteration. Currently the loop looks like:

while (head != NULL)
{
    proc = head;
    head = proc->lwWaitLink;
    proc->lwWaitLink = NULL;
    proc->lwWaiting = false;
    PGSemaphoreUnlock(&proc->sem);
}

Consider what happens if either the compiler or the cpu reorders this to:

    proc->lwWaiting = false;
    head = proc->lwWaitLink;
    proc->lwWaitLink = NULL;
    PGSemaphoreUnlock(&proc->sem);

As soon as lwWaiting = false, 'proc' can wake up and acquire a new lock. Backends can wake up prematurely because proc->sem is used for other purposes than this (that's why the loops around PGSemaphoreLock exist). Then it could reset lwWaitLink while acquiring a new lock, and some processes wouldn't be woken up anymore.

The barrier it pairs with is the spinlock acquisition before requeuing. To be more obviously correct we could add a read barrier before

    if (!proc->lwWaiting)
        break;

but I don't think it's needed.

Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
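To make the required ordering concrete, here is the shape of the fixed loop - a sketch based on the pre-dlist code quoted above, with pg_write_barrier() placed where Andres says it is needed:

while (head != NULL)
{
    proc = head;
    head = proc->lwWaitLink;
    proc->lwWaitLink = NULL;

    /*
     * Ensure the list unlink above is visible to all CPUs before the
     * waiter can observe lwWaiting == false and start acquiring (and
     * requeuing on) other locks again.
     */
    pg_write_barrier();

    proc->lwWaiting = false;
    PGSemaphoreUnlock(&proc->sem);
}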
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.9
On Wed, Oct 8, 2014 at 9:35 AM, Andres Freund and...@2ndquadrant.com wrote: 2) Implement the wait free LW_SHARED algorithm.

+ * too high for workloads/locks that were locked in shared mode very

s/locked/taken/?

+ * frequently. Often we were spinning in the (obviously exlusive) spinlock,

exclusive.

+ * acquiration for locks that aren't exclusively locked.

acquisition.

+ * For exlusive lock acquisition we use an atomic compare-and-exchange on the

exclusive.

+ * lockcount variable swapping in EXCLUSIVE_LOCK/1<<31-1/0x7FFF if and only

Add comma after "variable". Find some way of describing the special value (maybe "a sentinel value, EXCLUSIVE_LOCK") just once, instead of three times.

+ * if the current value of lockcount is 0. If the swap was not successfull, we

successful.

+ * by 1 again. If so, we have to wait for the exlusive locker to release the

exclusive.

+ * The attentive reader probably might have noticed that naively doing the

"probably might" is redundant. Delete "probably".

+ * notice that we have to wait. Unfortunately until we have finished queuing,

until -> by the time

+ * Phase 2: Add us too the waitqueue of the lock

too -> to. And maybe us -> ourselves.

+ *get queued in Phase 2 and we can wake them up if neccessary or they will

necessary.

+ * When acquiring shared locks it's possible that we disturb an exclusive
+ * waiter. If that's a problem for the specific user, pass in a valid pointer
+ * for 'potentially_spurious'. Its value will be set to true if we possibly
+ * did so. The caller then has to handle that scenario.

"disturb" is not clear enough.

+/* yipeyyahee */

Although this will be clear to individuals with a good command of English, I suggest avoiding such usages.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}
On Wed, Oct 8, 2014 at 6:25 AM, Simon Riggs si...@2ndquadrant.com wrote: I change my view on this, after some more thought. (Hope that helps)

Great.

If we implement MERGE, I can see we may also wish to implement MERGE CONCURRENTLY one day. That would be different to UPSERT. So in the future I think we will need 3 commands: 1. MERGE 2. MERGE CONCURRENTLY 3. UPSERT So we no longer need to have the command start with the MERGE keyword.

As I've outlined, I don't see how MERGE CONCURRENTLY could ever work, but I'm glad that you agree that it should not block this effort (or indeed, some later effort to implement a MERGE that is comparable to the implementations of other database systems).

We will one day have MERGE according to the SQL Standard.

Agreed.

My opinion is that syntax for this should be similar to MERGE in the *body* of the command, rather than some completely different syntax. e.g. WHEN NOT MATCHED THEN INSERT WHEN MATCHED THEN UPDATE I'm happy that we put that to a vote on what the syntax should be, as long as we bear in mind that we will one day have MERGE as well.

While I am also happy with taking a vote, if we do so I vote against even this much less MERGE-like syntax. It's verbose, and makes much less sense when the mechanism is driven by would-be duplicate key violations rather than an outer join. I also like that when you UPSERT with the proposed ON CONFLICT UPDATE syntax, you get all the flexibility of an INSERT - you can use data-modifying CTEs, nest the statement in a data-modifying CTE, and write INSERT ... SELECT ... ON CONFLICT UPDATE .... And to be honest, it's much simpler to implement this whole feature as an adjunct to how INSERT statements are currently processed (during parse analysis, planning and execution); I don't want to make the syntax work against that. For example, consider how little I had to change the grammar to make all of this work:

$ git diff master --stat | grep gram
 src/backend/parser/gram.y | 72 ++-

The code footprint of this patch is relatively small, and I think we can all agree that that's a good thing.

-- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
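As an illustration of that composability, a data-modifying CTE feeding the proposed syntax might look like this (a sketch of the proposed, not committed, ON CONFLICT UPDATE and CONFLICTING() spellings; the tables are made up):

WITH moved AS (
    DELETE FROM staging_orders
    RETURNING id, quantity
)
INSERT INTO orders (id, quantity)
SELECT id, quantity FROM moved
ON CONFLICT UPDATE
SET quantity = quantity + CONFLICTING(quantity);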
Re: [HACKERS] pg_background (and more parallelism infrastructure patches)
On 08/10/14 21:03, Robert Haas wrote: On Wed, Oct 8, 2014 at 10:09 AM, Andres Freund and...@2ndquadrant.com wrote: WRT my complaint in the other subthread, I think PQsendMethods should just be renamed to PQcommMethods or similar. That, in combination with a /* at some point we might want to add methods for receiving data here */ comment, looks like it'd already satisfy my complaint ;)

Well, I'm happy enough to go ahead and change that, if that's all it takes to make you happy. Is it? So far the status of these patches AIUI is:

#1 - Just committed, per discussion last week.
#2 - No comments, possibly because it's pretty boring.

Yeah, there is nothing much to say there; it can just go in.

#4 - No comments.

I think I said I like it, by which I meant it's good as it is - no objections to committing that one either.

#6 - Updated per Amit's feedback. Not sure if anyone else wants to review. Still needs docs.

I'd like to see this one when it's updated, since I lost track a little of how exactly the changes look.

-- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax
On Mon, Oct 6, 2014 at 8:35 AM, Robert Haas robertmh...@gmail.com wrote: I think the problem is that it's not possible to respect the usual guarantees even at READ COMMITTED level when performing an INSERT OR UPDATE operation (however spelled). You may find that there's a tuple with the same PK which is committed but not visible to the snapshot you took at the beginning of the statement. Can you please comment on this, Kevin? It would be nice to converge on an agreement on syntax here (without necessarily working out the exact details right away, including for example the exact spelling of what I'm calling WITHIN). Since Simon has changed his mind on MERGE (while still wanting to retain a superficial flavor of MERGE, a flavor that does not include the MERGE keyword), I think that leaves you as the only advocate for the MERGE syntax. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}
On 08/10/14 11:28, Peter Geoghegan wrote: On Wed, Oct 8, 2014 at 1:36 AM, Marti Raudsepp ma...@juffo.org wrote: But the MERGE syntax, to me, strongly implies that insertion doesn't begin before determining whether a conflict exists or not. I think you're right. Another strike against the MERGE syntax, then, since as I said we cannot even know what to check prior to having before row insert triggers fire.

True, but to me it also seems to be a strike against using INSERT for this, as I don't really see how you can make triggers work in a sane way if the UPSERT is implemented as part of INSERT (at least I haven't seen any proposal that I would consider sane from the user point of view).

-- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}
Peter Geoghegan p...@heroku.com wrote: On Tue, Oct 7, 2014 at 5:23 AM, Simon Riggs si...@2ndquadrant.com wrote: IIRC it wasn't agreed that we needed to identify which indexes in the upsert SQL statement itself, since this would be possible in other ways and would require programmers to know which unique constraints are declared.

Kevin seemed quite concerned about that. That is something that seems hard to reconcile with supporting the MERGE syntax. Perhaps Kevin can comment on that, since he was in favor of both being able to specify user intent by accepting a unique index, while also being in favor of the MERGE syntax.

Well, I mostly wanted to make sure we properly considered what the implications were of using the standard syntax without other keywords or decorations before deciding to go the non-standard route. In spite of an alarming tendency for people to assume that meant that I didn't understand the desired semantics, I feel enough people have understood the question and weighed in in favor of an explicit choice between semantics, rather than inferring concurrency handling based on the availability of the index necessary for the slicker behavior. I'm willing to concede that overall consensus is leaning toward the view that UPSERT semantics should be conditioned on explicit syntax; I'll drop that much going forward.

Granting that, I will say that I lean toward either the MERGE syntax with CONCURRENTLY being the flag to use UPSERT semantics, or a separate UPSERT command which is as close to identical to the MERGE syntax (other than the opening verb) as possible. I see that as still needing the ON clause so that you can specify which values match which columns from the target table. I'm fine with starting with the syntax in the standard, which has no DELETE or IGNORE options (as of the latest version I've seen). So the syntax I'm suggesting is close to what Simon is suggesting, but a more compliant form would be:

MERGE CONCURRENTLY INTO foo
  USING (VALUES (valuelist) aliases)
  ON (conditions)
  WHEN NOT MATCHED
    INSERT [ (columnlist) ] VALUES (valuelist)
  WHEN MATCHED
    UPDATE SET colname = expression [, ...]

Rather than pseudo-randomly picking a unique index or using a constraint or index name, the ON condition would need to allow matching based on equality to all columns of a unique index which only referenced NOT NULL columns; we would pick an index which matched those conditions. In any event, the unique index would be required if CONCURRENTLY was specified. Using column matching to pick the index (like we do when specifying a FOREIGN KEY constraint) is more in keeping with other SQL statements, and seems generally safer to me. It would also make it fairly painless for people to switch concurrency techniques for what is, after all, exactly the same operation except for differences in handling of concurrent conflicting DML. As I said, I'm also OK with using UPSERT in place of MERGE CONCURRENTLY.

I also feel that if we could allow:

  USING (VALUES (valuelist) [, ...])

that would be great. In fact, I don't see why that can't be pretty much any relation, but it doesn't have to be for a first cut. A relation would allow a temporary table to be loaded with a batch of rows where the intent is to UPSERT every row in the batch, without needing to write a loop to do it.
-- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
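A concrete rendering of that batch case, in Kevin's proposed (entirely hypothetical) syntax - neither the command nor the tables exist:

CREATE TEMP TABLE incoming (id int, quantity int);
INSERT INTO incoming VALUES (123, 100), (124, 50);  -- load the batch

MERGE CONCURRENTLY INTO targettable t
  USING incoming x
  ON (t.id = x.id)
  WHEN NOT MATCHED
    INSERT (id, quantity) VALUES (x.id, x.quantity)
  WHEN MATCHED
    UPDATE SET quantity = t.quantity + x.quantity;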
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}
On 8 October 2014 21:16, Peter Geoghegan p...@heroku.com wrote: My opinion is that syntax for this should be similar to MERGE in the *body* of the command, rather than some completely different syntax. e.g. WHEN NOT MATCHED THEN INSERT WHEN MATCHED THEN UPDATE I'm happy that we put that to a vote on what the syntax should be, as long as we bear in mind that we will one day have MERGE as well.

While I am also happy with taking a vote, if we do so I vote against even this much less MERGE-like syntax. It's verbose, and makes much less sense when the mechanism is driven by would-be duplicate key violations rather than an outer join.

It wouldn't be driven by an outer join; I'm not sure where that comes from. MERGE is verbose, I agree. I don't always like the SQL Standard, I just think we should follow it as much as possible. You can't change the fact that MERGE exists, so I don't see a reason to have two variants of syntax that do roughly the same thing. MERGE syntax would allow many things, such as this:

WHEN NOT MATCHED AND x > 7 THEN INSERT
WHEN NOT MATCHED THEN INSERT
WHEN MATCHED AND y = 5 THEN DO NOTHING
WHEN MATCHED THEN UPDATE

etc.

I also like that when you UPSERT with the proposed ON CONFLICT UPDATE syntax, you get all the flexibility of an INSERT - you can use data-modifying CTEs, and nest the statement in a data-modifying CTE, and INSERT ... SELECT ... ON CONFLICT UPDATE .... And to be honest, it's much simpler to implement this whole feature as an adjunct to how INSERT statements are currently processed (during parse analysis, planning and execution); I don't want to make the syntax work against that.

I spoke to someone today who preferred a new command keyword, like UPSERT, because the semantics of triggers are weird. Having a before insert trigger fire when there is no insert is quite strange. Properly documenting that on hackers would help; have the comments made on the Django list been replayed here in some form?

I'm very scared by your comments about data-modifying CTEs etc. You have no definition of how they will work, nor tests of that. That part isn't looking like a benefit as things currently stand. I'm still waiting for some more docs to describe your intentions so they can be reviewed.

Also, it would be useful to hear that you're going to do something about the references to rows using conflicting(), since nobody has agreed with you there. Or, hopefully, even that you've listened and implemented something differently already. (We need that, whatever we do with other elements of syntax.)

Overall, I'm not seeing too many comments that indicate you are accepting review comments and acting upon them. If you want acceptance from others, you need to begin with some yourself.

-- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax
Peter Geoghegan p...@heroku.com wrote: On Mon, Oct 6, 2014 at 8:35 AM, Robert Haas robertmh...@gmail.com wrote: I think the problem is that it's not possible to respect the usual guarantees even at READ COMMITTED level when performing an INSERT OR UPDATE operation (however spelled). You may find that there's a tuple with the same PK which is committed but not visible to the snapshot you took at the beginning of the statement.

Can you please comment on this, Kevin? It would be nice to converge on an agreement on syntax here

Robert said "however spelled" -- which I take to mean that he at least sees that the MERGE-like UPSERT syntax can be turned into the desired semantics. I have no idea why anyone would think otherwise.

Although the last go-around does suggest that there is at least one point of difference on the semantics. You seem to want to fire the BEFORE INSERT triggers before determining whether this will be an INSERT or an UPDATE. That seems like a bad idea to me, but if the consensus is that we want to do that, it does argue for your plan of making UPSERT a variant of the INSERT command. That doesn't seem as clean to me as saying (for example):

UPSERT targettable t
  USING (VALUES (123, 100)) x(id, quantity)
  ON t.id = x.id
  WHEN NOT MATCHED
    INSERT (id, quantity, inserted_at) VALUES (x.id, x.quantity, now())
  WHEN MATCHED
    UPDATE SET quantity = t.quantity + x.quantity, updated_at = now();

As I understand it, you are proposing that would be:

INSERT INTO targettable(key, quantity, inserted_at)
  VALUES(123, quantity, now())
  ON CONFLICT WITHIN targettable_pkey
  UPDATE SET quantity = quantity + CONFLICTING(quantity), updated_at = now();

-- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] What exactly is our CRC algorithm?
On 2014-10-08 22:13:46 +0300, Heikki Linnakangas wrote: Hmm. So the simple, non-table-driven calculation gives the same result as using the lookup table with the reflected lookup code. That's expected; the lookup method is supposed to be the same, just faster. However, using the normal lookup code, but with a reflected lookup table, produces the same result as Postgres' algorithm. Indeed, that's what we do in PostgreSQL. But AFAICS, that's an incorrect combination. You're supposed to use the non-reflected lookup table with the non-reflected lookup code; you can't mix and match.

As far as I can tell, PostgreSQL's so-called CRC algorithm doesn't correspond to any bit-by-bit CRC variant and polynomial. My math skills are not strong enough to determine what the consequences of that are. It might still be a decent checksum. Or not. I couldn't tell if the good error detection properties of the normal CRC-32 polynomial apply to our algorithm or not.

Additional interesting datapoints are that hstore and ltree contain the same tables - but properly use the reflected computation.

Thoughts? It clearly seems like a bad idea to continue with this - I don't think anybody here knows which guarantees this gives us.

The question is how we can move away from this. There are unfortunately two places that embed PGC32 that are likely to prove problematic when fixing the algorithm: pg_trgm and tsgist both seem to include CRCs in their logic in a persistent way. I think we should provide INIT/COMP/FIN_PG32 using the current algorithm for these.

If we're switching to a saner computation, we should imo also switch to a better polynomial - CRC-32C has better error detection capabilities than CRC32 and is available in hardware. As we're paying the price of breaking compat anyway...

Arguably we could also say that, given that there have been few evident problems with the borked computation, we could switch to a much faster hash instead of continuing to use a CRC...

Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
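To make the mix-and-match point concrete, here is a self-contained sketch (call build_reflected_table() once before use). Both update steps below consume the same reflected lookup table; only the first pairs it with the matching reflected shift direction. The second pairs the reflected table with the normal, non-reflected step - the shape of the combination Heikki describes. This is an illustration, not a copy of the backend's macros.

#include <stdint.h>
#include <stddef.h>

/* Reflected CRC-32 lookup table (reversed polynomial 0xEDB88320). */
static uint32_t crc_table[256];

static void
build_reflected_table(void)
{
    for (uint32_t i = 0; i < 256; i++)
    {
        uint32_t c = i;
        for (int j = 0; j < 8; j++)
            c = (c & 1) ? (c >> 1) ^ 0xEDB88320u : c >> 1;
        crc_table[i] = c;
    }
}

/* Correct pairing: reflected table with the reflected update step. */
static uint32_t
crc32_reflected(const unsigned char *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    while (len--)
        crc = (crc >> 8) ^ crc_table[(crc ^ *buf++) & 0xFF];
    return crc ^ 0xFFFFFFFFu;
}

/* Mismatched pairing: reflected table fed to the normal (non-reflected)
 * update step, which no longer corresponds to any bit-by-bit CRC. */
static uint32_t
crc32_mismatched(const unsigned char *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    while (len--)
        crc = (crc << 8) ^ crc_table[((crc >> 24) ^ *buf++) & 0xFF];
    return crc ^ 0xFFFFFFFFu;
}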
Re: [HACKERS] Add CREATE support to event triggers
About patch 1, which I just pushed, there were two reviews:

Andres Freund wrote: On 2014-09-25 18:59:31 -0300, Alvaro Herrera wrote:

diff --git a/src/include/utils/ruleutils.h b/src/include/utils/ruleutils.h
new file mode 100644
index 000..520b066
--- /dev/null
+++ b/src/include/utils/ruleutils.h

I wondered for a minute whether any of these are likely to cause problems for code just including builtins.h - but I think there will be sufficiently few callers for them to make that not much of a concern.

Great.

Michael Paquier wrote: Patch 1: I still like this patch as it gives a clear separation of the built-in functions and the sub-functions of ruleutils.c that are completely independent. Have you considered adding the external declaration of quote_all_identifiers as well? It is true that this impacts extensions (some of my stuff as well), but my point is to bite the bullet and make the separation cleaner between builtins.h and ruleutils.h. Attached is a patch that can be applied on top of patch 1 doing so... Feel free to discard it for the potential breakage this would create, though.

Yes, I did consider moving the quoting function definitions and the global out of builtins.h. It is much more disruptive, however, because it affects a lot of external code, not only our own; and given the looks I got after people had to mess with adding the htup_details.h header to external projects due to the refactoring I did to htup.h in an earlier release, I'm not sure about it.

However, if I were to do it, I would instead create a quote.h file and would also add the quote_literal_cstr() prototype to it, perhaps even moving the implementations from their current ruleutils.c location to quote.c. (Why half the stuff is in ruleutils.c rather than quote.c is beyond me.)

-- Álvaro Herrera   http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Promise index tuples for UPSERT
Peter Geoghegan p...@heroku.com wrote: On Wed, Oct 8, 2014 at 1:25 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: Instead of naming the index, you should name the columns, and the system can look up the index or indexes that match those columns. +1 That is what I have been consistently requesting instead of index names, every time I notice it mentioned. It's not totally clear that we need *any* WITHIN clause, BTW. I'm not dead set on it. It was something I mainly added at Kevin's request. I do see the risks, though. What I said was in response to your assertion that your technique would *never* generate a duplicate key error. As others have again been pointing out, when there is more than one unique index you can have rows to apply which cannot be applied accurately without causing such an error. What I requested was that the behavior in those cases be clear and documented. I didn't take a position on whether you pick an index or ignore the input row (with a warning?), but we need to decide how it is handled. I have made the same point Heikki is making, though -- we have no business referencing an index name in the statement. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL
* Fabrízio de Royes Mello (fabriziome...@gmail.com) wrote: On Tue, Oct 7, 2014 at 1:24 PM, Simon Riggs si...@2ndquadrant.com wrote: I hope we can get pgAudit in as a module for 9.5. I also hope that it will stimulate the requirements/funding of further work in this area, rather than squash it. My feeling is we have more examples of feature sets that grow over time (replication, view handling, hstore/JSONB etc) than we have examples of things languishing in need of attention (partitioning).

+1

To this point, specifically, I'll volunteer to find time in November to review pgAudit for inclusion as a contrib module. I feel a bit responsible for it not being properly reviewed earlier due to my upgrade and similar concerns. Perhaps the latest version should be posted and added to the commitfest for 2014-10, and I'll put myself down as a reviewer? I don't see it there now.

I don't mean to be dismissive by suggesting it be added to the commitfest - I honestly don't see myself having time before November given the other things I'm involved in right now and pgConf.eu happening in a few weeks. Of course, if someone else has time to review, please do.

Thanks,

Stephen
Re: [HACKERS] Promise index tuples for UPSERT
On Wed, Oct 8, 2014 at 2:50 PM, Kevin Grittner kgri...@ymail.com wrote: What I said was in response to your assertion that your technique would *never* generate a duplicate key error. That is strictly true: the INSERT cannot raise a unique violation error, because to do so would cause it to take the UPDATE path (assuming there is no WITHIN clause). The UPDATE may get one, though. It doesn't have to get one for your statement to have effects that are surprising, though. As others have again been pointing out, when there is more than one unique index you can have rows to apply which cannot be applied accurately without causing such an error. It's simpler than that. You want to make sure that the right condition - the right unique index having a would-be duplicate violation - is the one taken *in practice*, the condition on which the UPDATE path is taken. What happens after that is not that interesting (e.g. whether or not there is an UPDATE-path duplicate violation), because either way it's broken. What I requested was that the behavior in those cases be clear and documented. I didn't take a position on whether you pick an index or ignore the input row (with a warning?), but we need to decide how it is handled. I have made the same point Heikki is making, though -- we have no business referencing an index name in the statement. I think that's fair enough. That's all fine - but actually doing that is quite tricky. Look at what I've said in relation to doing that. Consider the corner cases I name. They're certainly only a small minority of cases in practice, but as an implementer I still need to worry about them (maybe even just as much). Yes, I would like to just name columns, but it's hard to make that work fully generally. If you want me to do what you say, fine. But in order to do that, I need support for having the corner cases make parse analysis throw up its hands and error out. Is that a price you're willing to pay? If so, then I'll implement it. Or, alternatively, we could do WITHIN PRIMARY KEY/NOT WITHIN PRIMARY KEY. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] What exactly is our CRC algorithm?
On 09/10/14 10:13, Andres Freund wrote: On 2014-10-08 22:13:46 +0300, Heikki Linnakangas wrote: Hmm. So the simple, non-table-driven calculation gives the same result as using the lookup table with the reflected lookup code. That's expected; the lookup method is supposed to be the same, just faster. However, using the normal lookup code, but with a reflected lookup table, produces the same result as Postgres' algorithm. Indeed, that's what we do in PostgreSQL. But AFAICS, that's an incorrect combination. You're supposed to use the non-reflected lookup table with the non-reflected lookup code; you can't mix and match. As far as I can tell, PostgreSQL's so-called CRC algorithm doesn't correspond to any bit-by-bit CRC variant and polynomial. My math skills are not strong enough to determine what the consequences of that are. It might still be a decent checksum. Or not. I couldn't tell if the good error detection properties of the normal CRC-32 polynomial apply to our algorithm or not. Additional interesting datapoints are that hstore and ltree contain the same tables - but properly use the reflected computation. Thoughts? It clearly seems like a bad idea to continue with this - I don't think anybody here knows which guarantees this gives us. The question is how we can move away from this. There are unfortunately two places that embed PGC32 that are likely to prove problematic when fixing the algorithm: pg_trgm and tsgist both seem to include CRCs in their logic in a persistent way. I think we should provide INIT/COMP/FIN_PG32 using the current algorithm for these. If we're switching to a saner computation, we should imo also switch to a better polynomial - CRC-32C has better error detection capabilities than CRC32 and is available in hardware. As we're paying the price of breaking compat anyway... Arguably we could also say that, given that there have been few evident problems with the borked computation, we could switch to a much faster hash instead of continuing to use a CRC... Greetings, Andres Freund Could a 64-bit variant of some kind be useful as an option, in addition to a correct 32-bit one? Most people have 64-bit processors, storage is less constrained nowadays, and we tend to store much larger chunks of data. Cheers, Gavin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
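To make the reflected-versus-normal distinction concrete, here is a minimal standalone sketch (not PostgreSQL source; the function and table names are invented, and it assumes the standard CRC-32 polynomial, 0xEDB88320 in reflected form and 0x04C11DB7 in normal form):

    #include <stdint.h>
    #include <stddef.h>

    /* Build the table for the reflected (LSB-first) algorithm. */
    static void
    build_reflected_table(uint32_t table[256])
    {
        for (uint32_t i = 0; i < 256; i++)
        {
            uint32_t    r = i;

            for (int bit = 0; bit < 8; bit++)
                r = (r & 1) ? (r >> 1) ^ 0xEDB88320 : (r >> 1);
            table[i] = r;
        }
    }

    /* Build the table for the normal (MSB-first) algorithm. */
    static void
    build_normal_table(uint32_t table[256])
    {
        for (uint32_t i = 0; i < 256; i++)
        {
            uint32_t    r = i << 24;

            for (int bit = 0; bit < 8; bit++)
                r = (r & 0x80000000) ? (r << 1) ^ 0x04C11DB7 : (r << 1);
            table[i] = r;
        }
    }

    /* Reflected lookup code: only correct with the reflected table. */
    static uint32_t
    crc32_update_reflected(uint32_t crc, const unsigned char *p, size_t len,
                           const uint32_t table[256])
    {
        while (len-- > 0)
            crc = table[(crc ^ *p++) & 0xFF] ^ (crc >> 8);
        return crc;
    }

    /* Normal lookup code: only correct with the normal table. */
    static uint32_t
    crc32_update_normal(uint32_t crc, const unsigned char *p, size_t len,
                        const uint32_t table[256])
    {
        while (len-- > 0)
            crc = table[((crc >> 24) ^ *p++) & 0xFF] ^ (crc << 8);
        return crc;
    }

The combination described above amounts to crc32_update_normal() fed with the *reflected* table: every input bit still perturbs the state, but the result is no longer the remainder of any polynomial division, which is why the usual CRC error-detection guarantees cannot simply be assumed to hold.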
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}
On Wed, Oct 8, 2014 at 2:04 PM, Simon Riggs si...@2ndquadrant.com wrote: While I am also happy with taking a vote, if we do so I vote against even this much less MERGE-like syntax. It's verbose, and makes much less sense when the mechanism is driven by would-be duplicate key violations rather than an outer join. It wouldn't be driven by an outer join, not sure where that comes from. Right, I understood that it wouldn't be - which is the point. So with an UPSERT that retains influence from MERGE, NOT MATCHED means no conflict, MATCHED means conflict. That just seems like an odd way to spell the concept given, as you say, that we're not talking about an outer join. MERGE is verbose, I agree. I don't always like the SQL Standard, I just think we should follow it as much as possible. You can't change the fact that MERGE exists, so I don't see a reason to have two variants of syntax that do roughly the same thing. MERGE syntax would allow many things, such as this... WHEN NOT MATCHED AND x > 7 THEN INSERT WHEN NOT MATCHED THEN INSERT WHEN MATCHED AND y = 5 THEN DO NOTHING WHEN MATCHED THEN UPDATE etc. But then you can have before row insert triggers fire, which as you acknowledge is more surprising with this syntax. I spoke to someone today who preferred a new command keyword, like UPSERT, because the semantics of triggers are weird. Having a before insert trigger fire when there is no insert is quite strange. Properly documenting that on hackers would help; have the comments made on the Django list been replayed here in some form? Yes. It's also mentioned in the commit message of CONFLICTING() (patch 0003-*). And the documentation (both the proposed INSERT documentation, and the trigger documentation). There is a large comment on it in the code. So I've said it many times. I'm very scared by your comments about data-modifying CTEs etc. You have no definition of how they will work, nor tests of that. That part isn't looking like a benefit as things currently stand. Actually, I have a few smoke tests for that. But I don't see any need for special handling. When you have a data-modifying CTE, it can contain an INSERT, and there are no special restrictions on that INSERT (other than that it may not itself have a CTE, but that's true more generally). You can have data-modifying CTEs containing INSERTs, and data-modifying CTEs containing UPDATEs; what I've done is have data-modifying CTEs contain INSERTs that also happen to have an ON CONFLICT UPDATE clause. This new clause of INSERTs is in no more need of special documentation regarding interactions with data-modifying CTEs than UPDATE WHERE CURRENT OF is. The only possible exception I can think of would be cardinality violations, where a vanilla INSERT in one part of a command (one data-modifying CTE) gives problems to the UPSERT part of the same command (because we give a special cardinality violation message when we try to update the same tuple twice in the same command). But that's a pretty imaginative complaint, and I doubt it would really surprise someone. Why would you be surprised by the fact that a new clause for INSERT plays nicely with existing clauses? It's nothing special - there is no special handling. I'm still waiting for some more docs to describe your intentions so they can be reviewed. I think it would be useful to add several more isolation tests, highlighting some of the cases you talked about. I'll work on that. While the way forward for WITHIN isn't clear, I think a WITHIN PRIMARY KEY variant would certainly be useful. 
Maybe it would be okay to forget about naming a specific unique index, while supporting an (optional) WITHIN PRIMARY KEY/NOT WITHIN PRIMARY KEY. It doesn't totally solve the problems, but may be a good compromise that mostly satisfies people that want to be able to clearly indicate user intent (Kevin, in particular), and satisfies other people that don't want to name a unique index (Heikki, in particular). Certainly, the Django people would like that, since they said as much. Also, it would be useful to hear that you're going to do something about the references to rows using conflicting(), since nobody has agreed with you there. Or hopefully even that you've listened and implemented something differently already. (We need that, whatever we do with other elements of syntax). Do you really expect me to do major work on some aspect of the syntax like that, given, as you say, that nobody explicitly agreed with me (and only you disagreed with me)? The only remark I heard on that was from you (you'd prefer to use NEW.* and OLD.*). But you spent much more time talking about getting something MERGE-like, which NEW.*/OLD.* clearly isn't. CONFLICTING() is very close (identical?) to MySQL's use of ON DUPLICATE KEY UPDATE val = VALUES(val). I'm happy to discuss that, but it's news to me that people take particular issue with it. Overall, I'm not seeing too many comments
Re: [HACKERS] pg_background (and more parallelism infrastructure patches)
+/*
+ * Arrange to remove a dynamic shared memory mapping at cleanup time.
+ *
+ * dsm_keep_mapping() can be used to preserve a mapping for the entire
+ * lifetime of a process; this function reverses that decision, making
+ * the segment owned by the current resource owner. This may be useful
+ * just before performing some operation that will invalidate the segment
+ * for future use by this backend.
+ */
+void
+dsm_unkeep_mapping(dsm_segment *seg)
+{
+	Assert(seg->resowner == NULL);
+	ResourceOwnerEnlargeDSMs(CurrentResourceOwner);
+	seg->resowner = CurrentResourceOwner;
+	ResourceOwnerRememberDSM(seg->resowner, seg);
+}
Hm, I dislike the name unkeep. I guess you want to be symmetric to dsm_keep_mapping? dsm_manage_mapping(), dsm_ensure_mapping_cleanup(), dsm_remember_mapping()?
From c835a06f20792556d35a0eee4c2fa21f5f23e8a3 Mon Sep 17 00:00:00 2001 From: Robert Haas rh...@postgresql.org Date: Fri, 11 Jul 2014 09:53:40 -0400 Subject: [PATCH 6/6] pg_background: Run commands in a background worker, and get the results. The currently-active GUC values from the user session will be copied to the background worker. If the command returns a result set, you can retrieve the result set; if not, you can retrieve the command tags. If the command fails with an error, the same error will be thrown in the launching process when the results are retrieved. Warnings and other messages generated by the background worker, and notifications received by it, are also propagated to the foreground process.
I got to ask: Why is it helpful that we have this in contrib? I have a good share of blame to bear for that, but I think we need to stop diluting contrib evermore with test programs. These have a place, but I don't think it should be contrib.
+/* Table-of-contents constants for our dynamic shared memory segment. */
+#define PG_BACKGROUND_MAGIC				0x50674267
+#define PG_BACKGROUND_KEY_FIXED_DATA	0
+#define PG_BACKGROUND_KEY_SQL			1
+#define PG_BACKGROUND_KEY_GUC			2
+#define PG_BACKGROUND_KEY_QUEUE			3
+#define PG_BACKGROUND_NKEYS				4
+
+/* Fixed-size data passed via our dynamic shared memory segment. */
+typedef struct pg_background_fixed_data
+{
+	Oid			database_id;
+	Oid			authenticated_user_id;
+	Oid			current_user_id;
+	int			sec_context;
+	char		database[NAMEDATALEN];
+	char		authenticated_user[NAMEDATALEN];
+} pg_background_fixed_data;
Why not NameData?
+/* Private state maintained by the launching backend for IPC. */
+typedef struct pg_background_worker_info
+{
+	pid_t		pid;
+	dsm_segment *seg;
+	BackgroundWorkerHandle *handle;
+	shm_mq_handle *responseq;
+	bool		consumed;
+} pg_background_worker_info;
whitespace damage.
+static HTAB *worker_hash;
Hm. So the lifetime of this hash's contents is managed via on_dsm_detach(), do I understand that correctly?
+PG_FUNCTION_INFO_V1(pg_background_launch);
+PG_FUNCTION_INFO_V1(pg_background_result);
+PG_FUNCTION_INFO_V1(pg_background_detach);
+void		pg_background_worker_main(Datum);
+
+/*
+ * Start a dynamic background worker to run a user-specified SQL command.
+ */
+Datum
+pg_background_launch(PG_FUNCTION_ARGS)
+{
+	text	   *sql = PG_GETARG_TEXT_PP(0);
+	int32		queue_size = PG_GETARG_INT64(1);
+	int32		sql_len = VARSIZE_ANY_EXHDR(sql);
+	Size		guc_len;
+	Size		segsize;
+	dsm_segment *seg;
+	shm_toc_estimator e;
+	shm_toc    *toc;
+char	   *sqlp;
wrong indentation.
+	char	   *gucstate;
+	shm_mq	   *mq;
+	BackgroundWorker worker;
+	BackgroundWorkerHandle *worker_handle;
+	pg_background_fixed_data *fdata;
+	pid_t		pid;
+	shm_mq_handle *responseq;
+	MemoryContext oldcontext;
+
+	/* Ensure a valid queue size. */
+	if (queue_size < 0 || ((uint64) queue_size) < shm_mq_minimum_size)
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("queue size must be at least %zu bytes",
+						shm_mq_minimum_size)));
Hm. So every user can do this once the extension is created, as the functions are most likely to be PUBLIC. Is this a good idea?
+	/* Wait for the worker to start. */
+	switch (WaitForBackgroundWorkerStartup(worker_handle, &pid))
+	{
+		case BGWH_STARTED:
+			/* Success. */
+			break;
+		case BGWH_STOPPED:
+			pfree(worker_handle);
+			ereport(ERROR,
+					(errcode(ERRCODE_INSUFFICIENT_RESOURCES),
+					 errmsg("could not start background
Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax
On Wed, Oct 8, 2014 at 2:01 PM, Kevin Grittner kgri...@ymail.com wrote: Although the last go-around does suggest that there is at least one point of difference on the semantics. You seem to want to fire the BEFORE INSERT triggers before determining whether this will be an INSERT or an UPDATE. That seems like a bad idea to me, but if the consensus is that we want to do that, it does argue for your plan of making UPSERT a variant of the INSERT command. Well, it isn't that I'm doing it because I think that it is a great idea, with everything to recommend it. It's more like I don't see any practical alternative. We need the before row insert triggers to fire to figure out what to insert (or if we should update instead). No way around that. At the same time, those triggers are at liberty to do almost anything, and so in general we have no way of totally nullifying their effects (or side effects). Surely you see the dilemma. As I understand it you are proposing that would be: INSERT INTO targettable(key, quantity, inserted_at) VALUES(123, quantity, now()) ON CONFLICT WITHIN targettable_pkey UPDATE SET quantity = quantity + CONFLICTING(quantity), updated_at = now(); That's right, yes. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] BUG: *FF WALs under 9.2 (WAS: .ready files appearing on slaves)
On Wed, Oct 8, 2014 at 11:38 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 10/08/2014 04:59 PM, Fujii Masao wrote: On Wed, Oct 8, 2014 at 6:54 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: Instead of creating any .done files during recovery, we could scan pg_xlog at promotion, and create a .done file for every WAL segment that's present at that point. That would be more robust. And then apply your patch, to recycle old segments during archive recovery, ignoring .done files. What happens if a user shuts down the standby, removes recovery.conf and starts the server as the master? Um, that's not a safe thing to do anyway, is it? That's not safe, as it bypasses all the consistency checks of promotion. Now, it is also something that repmgr, for example, does as far as I recall to perform a node promotion. What if we simply document the problem properly then? The appearance of those phantom WAL files is scarier than a user or a utility that does a promotion with a server restart. Not to mention as well that users are free to add files themselves to pg_xlog. -- Michael
Re: [HACKERS] Add CREATE support to event triggers
On Thu, Oct 9, 2014 at 6:26 AM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: About patch 1, which I just pushed, there were two reviews: Andres Freund wrote: On 2014-09-25 18:59:31 -0300, Alvaro Herrera wrote: diff --git a/src/include/utils/ruleutils.h b/src/include/utils/ruleutils.h new file mode 100644 index 000..520b066 --- /dev/null +++ b/src/include/utils/ruleutils.h I wondered for a minute whether any of these are likely to cause problems for code just including builtins.h - but I think there will be sufficiently few callers for them to make that not much of a concern. Great. Michael Paquier wrote: Patch 1: I still like this patch as it gives a clear separation of the built-in functions and the sub-functions of ruleutils.c that are completely independent. Have you considered adding the external declaration of quote_all_identifiers as well? It is true that this impacts extensions (some of my stuff as well), but my point is to bite the bullet and make the separation cleaner between builtins.h and ruleutils.h. Attached is a patch that can be applied on top of patch 1 doing so... Feel free to discard it for the potential breakage this would create, though. Yes, I did consider moving the quoting function definitions and the global out of builtins.h. It is much more disruptive, however, because it affects a lot of external code, not only our own; and given the looks I got after people had to mess with adding the htup_details.h header to external projects due to the refactoring I did to htup.h in an earlier release, I'm not sure about it. However, if I were to do it, I would instead create a quote.h file and would also add the quote_literal_cstr() prototype to it, perhaps even move the implementations from their current ruleutils.c location to quote.c. (Why half the stuff is in ruleutils.c rather than quote.c is beyond me.) Yes, an extra header file would be cleaner. Now if we are worried about creating many incompatibilities with existing extensions (IMO that's not something core should worry much about), then it is better not to do it. Now, if I can vote on it, I think it is worth doing in order to have builtins.h contain only Datum foo(PG_FUNCTION_ARGS) declarations, and move all the quote-related things into quote.c. -- Michael
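For concreteness, the quote.h being proposed might look like this; a sketch only, based on the declarations currently living in builtins.h and ruleutils.c (no such file exists in the tree at this point):

    /*
     * quote.h
     *	  Declarations for identifier and literal quoting, split out of
     *	  builtins.h as discussed above.
     */
    #ifndef QUOTE_H
    #define QUOTE_H

    /* GUC: when true, quote_identifier() quotes all identifiers */
    extern bool quote_all_identifiers;

    extern const char *quote_identifier(const char *ident);
    extern char *quote_qualified_identifier(const char *qualifier,
                                            const char *ident);
    extern char *quote_literal_cstr(const char *rawstr);

    #endif   /* QUOTE_H */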
Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax
On Thu, Oct 9, 2014 at 1:51 AM, Peter Geoghegan p...@heroku.com wrote: On Wed, Oct 8, 2014 at 2:01 PM, Kevin Grittner kgri...@ymail.com wrote: Although the last go-around does suggest that there is at least one point of difference on the semantics. You seem to want to fire the BEFORE INSERT triggers before determining whether this will be an INSERT or an UPDATE. That seems like a bad idea to me Indeed, the current behavior breaks even the canonical "keep track of how many posts are in a thread" trigger example, because INSERT triggers are fired for an effective row UPDATE. I can't see how that's acceptable. Well, it isn't that I'm doing it because I think that it is a great idea, with everything to recommend it. It's more like I don't see any practical alternative. I proposed an alternative that avoids this surprise and might allow some other benefits. Can you please look into that? Even a "clarify this, this couldn't possibly work" or "you're not a major contributor so your arguments are invalid" response. ;) I admit I initially misunderstood some details about the current behavior. I can dig into the code and write a clearer and/or more detailed spec if I get even *some* feedback. Regards, Marti -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL
On 14/10/09 7:06, Stephen Frost wrote: * Fabrízio de Royes Mello (fabriziome...@gmail.com) wrote: On Tue, Oct 7, 2014 at 1:24 PM, Simon Riggs si...@2ndquadrant.com wrote: I hope we can get pgAudit in as a module for 9.5. I also hope that it will stimulate the requirements/funding of further work in this area, rather than squash it. My feeling is we have more examples of feature sets that grow over time (replication, view handling, hstore/JSONB etc) than we have examples of things languishing in need of attention (partitioning). +1 To this point, specifically, I'll volunteer to find time in November to review pgAudit for inclusion as a contrib module. I feel a bit responsible for it not being properly reviewed earlier due to my upgrade and similar concerns. Perhaps the latest version should be posted and added to the commitfest for 2014-10 and I'll put myself down as a reviewer..? I don't see it there now. I don't mean to be dismissive by suggesting it be added to the commitfest - I honestly don't see myself having time before November given the other things I'm involved in right now and pgConf.eu happening in a few weeks. Thanks :) We're updating pgAudit for submission to the upcoming commitfest; it will be added within the next few days. Regards Ian Barwick -- Ian Barwick http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax
On Wed, Oct 8, 2014 at 4:30 PM, Marti Raudsepp ma...@juffo.org wrote: On Thu, Oct 9, 2014 at 1:51 AM, Peter Geoghegan p...@heroku.com wrote: On Wed, Oct 8, 2014 at 2:01 PM, Kevin Grittner kgri...@ymail.com wrote: Although the last go-around does suggest that there is at least one point of difference on the semantics. You seem to want to fire the BEFORE INSERT triggers before determining whether this will be an INSERT or an UPDATE. That seems like a bad idea to me Indeed, the current behavior breaks even the canonical "keep track of how many posts are in a thread" trigger example, because INSERT triggers are fired for an effective row UPDATE. I can't see how that's acceptable. DB2 had the foresight to add the following restrictions to BEFORE ROW triggers - they cannot:
* Contain any INSERT, DELETE, or UPDATE operations, nor invoke any routine defined with MODIFIES SQL DATA, if it is not a compound SQL.
* Contain any DELETE or UPDATE operations on the trigger subject table, nor invoke any routine containing such operations, if it is a compound SQL.
* Reference a materialized query table defined with REFRESH IMMEDIATE.
* Reference a generated column other than the identity column in the NEW transition variable.
To get a sense of how doing fancy things in before triggers leads to trouble, consider the hardening Kevin added in commit 6868ed74. In short, use an AFTER trigger for this kind of thing, and all of these issues go away. Well, it isn't that I'm doing it because I think that it is a great idea, with everything to recommend it. It's more like I don't see any practical alternative. I proposed an alternative that avoids this surprise and might allow some other benefits. Can you please look into that? I looked at that. You're talking about making the case where before row triggers clearly are appropriate (when you just want to change the tuple being inserted) fail, in order to make an arguably inappropriate case for before row triggers work (when you don't change the tuple to be inserted, but you do want to do some other thing). That seems backwards. My proposed behavior seems far less surprising. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2
On Wed, Oct 8, 2014 at 2:04 PM, Andres Freund and...@2ndquadrant.com wrote: So, what makes it work for me (among other unrelated stuff) seems to be the following in .gdbinit, defining away some things that gdb doesn't handle:
macro define __builtin_offsetof(T, F) ((int) (((T *) 0)->F))
macro define __extension__
macro define AssertVariableIsOfTypeMacro(x, y) ((void)0)
Additionally I have -ggdb -g3 in CFLAGS. That way gdb knows about postgres' macros. At least if you're in the right scope. As an example, the following works:
(gdb) p dlist_is_empty(&BackendList) ? NULL : dlist_head_element(Backend, elem, &BackendList)
Ah, cool. I'll try that. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL
On Wed, Oct 8, 2014 at 2:55 AM, David Fetter da...@fetter.org wrote: On Wed, Oct 08, 2014 at 12:41:46AM -0300, Fabrízio de Royes Mello wrote: On Wed, Oct 8, 2014 at 12:36 AM, Josh Berkus j...@agliodbs.com wrote: On 10/07/2014 09:44 AM, Fabrízio de Royes Mello wrote: We can think in a mechanism to create properties / options and assign it to objects (table, index, column, schema, ...) like COMMENT does. A quickly thought: CREATE OPTION [ IF NOT EXISTS ] name VALIDATOR valfunction [ DEFAULT value ]; ALTER TABLE name SET OPTION optname { TO | = } { value | 'value' | DEFAULT }; It's just a simple thought of course. We must think better about the syntax and purposes. If we're going to do this (and it seems like a good idea), we really, really, really need to include some general system views which expose the options in a user-friendly format (like columns, JSON or an array). It's already very hard for users to get information about what reloptions have been set. Maybe into information_schema ?? Not there. The information schema is defined pretty precisely in the SQL standard and contains only some of this kind of information. You're absolutely right.. thanks... pg_catalog seems like a much more appropriate space for PostgreSQL-specific settings. We can add views and some functions to get this info too. Regards, -- Fabrízio de Royes Mello Consultoria/Coaching PostgreSQL Timbira: http://www.timbira.com.br Blog: http://fabriziomello.github.io Linkedin: http://br.linkedin.com/in/fabriziomello Twitter: http://twitter.com/fabriziomello Github: http://github.com/fabriziomello
Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax
On Thu, Oct 9, 2014 at 2:46 AM, Peter Geoghegan p...@heroku.com wrote: Indeed, the current behavior breaks even the canonical "keep track of how many posts are in a thread" trigger example use an AFTER trigger for this kind of thing, and all of these issues go away. In the latest patches from the CommitFest, the UPDATE case invokes BEFORE & AFTER INSERT triggers, not AFTER UPDATE. Is that going to be changed? If so, I concede that. I proposed an alternative that avoids this surprise and might allow some other benefits. Can you please look into that? I looked at that. Thanks! You're talking about making the case where before row triggers clearly are appropriate (when you just want to change the tuple being inserted) fail Only in case the trigger changes *key* columns necessary for atomicity (i.e. from the WITHIN index). Other columns are fair game. The restriction seems justifiable to me: it's unreasonable to be atomic with respect to values that change mid-way.
* It can distinguish between BEFORE INSERT and BEFORE UPDATE triggers, which your approach fundamentally cannot.
* It can probably avoid evaluating column defaults in the UPDATE case, like nextval() for unrelated serial columns = reduced overhead.
* The semantics match better with the MERGE-like syntax that seems to come up again and again.
* And it appears to solve (part of?) the problem you had with naming key *columns* in the WITHIN clause, rather than using an index name.
If you don't see any reasons why it can't be done, these benefits seem clear to me. I think the tradeoffs at least warrant wider discussion. Regards, Marti -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax
On Wed, Oct 8, 2014 at 6:12 PM, Marti Raudsepp ma...@juffo.org wrote: Oh, one more consideration: I believe you will run into the same issue if you want to implement BEFORE UPDATE triggers in any form. Skipping BEFORE UPDATE entirely seems to violate POLA. Good thing that the patch doesn't do that, then. I clearly documented this in a few places, including: http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/on-conflict-docs/trigger-definition.html It's common for applications to e.g. use triggers to keep track of the latest modified time for a row. With your proposal, every query needs to include logic for that to work. Wrong. If you don't see any reasons why it can't be done, these benefits seem clear to me. I think the tradeoffs at least warrant wider discussion. I don't. That's very surprising. One day, it will fail unexpectedly. As proposed, the way BEFORE INSERT triggers fire almost forces users to consider the issues up-front. Not necessarily up-front; as proposed, it causes existing triggers to change behavior when users adopt the upsert feature. And that adoption may even be transparent to the user due to ORM magic. There are potential surprises with both approaches. When you make the slightest effort to understand what my approach is, I might take your remarks seriously. Note that the CONFLICTING() behavior with respect to BEFORE INSERT triggers works the same as MySQL's INSERT ON DUPLICATE KEY UPDATE foo = VALUES(foo) thing. There was agreement that that was the right behavior, it seemed. MySQL gets away with lots of things; they have several other caveats with triggers. I don't think it's a good example to follow wrt trigger behavior. No true Scotsman. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax
On Thu, Oct 9, 2014 at 4:56 AM, Marti Raudsepp ma...@juffo.org wrote:
create trigger ev1 before insert on evt_type execute procedure ins();
create trigger ev2 before update on evt_type execute procedure upd();
create trigger ev3 after insert on evt_type execute procedure ins();
create trigger ev4 after update on evt_type execute procedure upd();
Oops, I missed "for each row" here. Sorry for wasting your time. Regards, Marti -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax
On Wed, Oct 8, 2014 at 7:04 PM, Marti Raudsepp ma...@juffo.org wrote: Oops, I missed "for each row" here. Note that I have an open item for what to do about statement-level triggers here: https://wiki.postgresql.org/wiki/UPSERT I'm not satisfied with the current behavior. It needs more thought. But I think row-level triggers are fine. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: PENDING_LIST_CLEANUP_SIZE - maximum size of GIN pending list Re: [HACKERS] HEAD seems to generate larger WAL regarding GIN index
(2014/10/08 22:51), Fujii Masao wrote: On Wed, Sep 24, 2014 at 11:10 AM, Etsuro Fujita fujita.ets...@lab.ntt.co.jp wrote: On Wed, Sep 10, 2014 at 10:37 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Fujii Masao wrote: On Wed, Sep 10, 2014 at 12:15 PM, Etsuro Fujita fujita.ets...@lab.ntt.co.jp wrote: PENDING_LIST_CLEANUP_SIZE and work_mem, for this setting. Wouldn't it be easier to use with only one parameter, PENDING_LIST_CLEANUP_SIZE? How about setting PENDING_LIST_CLEANUP_SIZE to work_mem as the default value when running the CREATE INDEX command? So what about introducing pending_list_cleanup_size also as a GUC? That is, users can set the threshold by using either the reloption or the GUC. Yes, I think having both a GUC and a reloption makes sense -- the GUC applies to all indexes, and can be tweaked for individual indexes using the reloption. OK, I'd vote for your idea of having both the GUC and the reloption. So, I think the patch needs to be updated. Fujii-san, what plan do you have for the patch? Please see the attached patch. In this patch, I introduced the GUC parameter, pending_list_cleanup_size. I chose 4MB as the default value of the parameter. But do you have any better idea about that default value? It seems reasonable to me that the GUC has the same default value as work_mem. So, +1 for the default value of 4MB. BTW, I moved the CommitFest entry of this patch to the next CF, 2014-10. OK, I'll review the patch in the CF. Thanks, Best regards, Etsuro Fujita -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Deferring some AtStart* allocations?
On Wed, Oct 8, 2014 at 11:22 PM, Robert Haas robertmh...@gmail.com wrote: On Sun, Jun 29, 2014 at 9:12 PM, Tom Lane t...@sss.pgh.pa.us wrote: Meh. Even SELECT 1 is going to be doing *far* more pallocs than that to get through raw parsing, parse analysis, planning, and execution startup. If you can find a few hundred pallocs we can avoid in trivial queries, it would get interesting; but I'll be astonished if saving 4 is measurable. I got nerd-sniped by this problem today, probably after staring at profiling data very similar to what led Andres to ask the question in the first place. The gains are indeed not measurable on a macrobenchmark, but I had to write the patch to figure that out, so here it is. Although the gain isn't a measurable percentage of total runtime, it does appear to be a significant percentage of the palloc traffic on trivial queries. I did 30-second SELECT-only pgbench runs at scale factor 10, using prepared queries. According to perf, on unpatched master, StartTransactionCommand accounts for 11.46% of the calls to AllocSetAlloc. (I imagine this is by runtime, not by call count, but it probably doesn't matter much either way.) But with this patch, StartTransactionCommand's share drops to 4.43%. Most of that is coming from AtStart_Inval(), which wouldn't be hard to fix either. I'm inclined to think that tamping down palloc traffic is worthwhile even if we can't directly measure the savings on a macrobenchmark. AllocSetAlloc is often at or near the top of profiling results, but there's rarely a single caller responsible for enough of those calls to make it worth optimizing. But that means that if we refuse to take patches that save just a few pallocs, we're basically giving up on ever improving anything in this area, and that doesn't seem like a good idea either; it's only going to get slowly worse over time as we add more features. Yeah, this makes sense; even otherwise, if somebody comes up with a much larger patch which reduces a few dozen pallocs and shows a noticeable gain, it will be difficult to quantify the patch based on which particular pallocs give the improvement, as reducing a few pallocs might not give any noticeable gain. BTW, if anyone's curious what the biggest source of AllocSetAlloc calls is on this workload, the answer is ExecInitIndexScan(), which looks to be indirectly responsible for almost half the total (or just over half, with the patch applied). That apparently calls a lot of different things that allocate small amounts of memory, and it accounts for nearly all of the AllocSetAlloc traffic that originates from PortalStart(). One way to dig into this problem is to look at each individual palloc in the PortalStart() path and try to see which ones can be optimized; another way is to introduce a concept of Prepared Portals, something similar to Prepared Statements, but for Portals. This will not only avoid the pallocs in the executor startup phase, but can improve performance by avoiding many other calls related to executor startup. I understand that this would be a much bigger project; however, the gains can also be substantial for prepared statements when all or most of the data fits in memory. It seems other databases also have similar concepts for execution (for example, Oracle uses CURSOR_SHARING/shared cursors to achieve the same). With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
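To illustrate the deferral pattern being discussed (all names here are invented for the sketch, not taken from the committed patch): instead of paying for bookkeeping arrays in every StartTransactionCommand(), allocate them lazily on first use, so trivial transactions never touch them.

    #include "postgres.h"
    #include "utils/memutils.h"

    typedef struct XactTodoItem		/* hypothetical per-transaction item */
    {
        int			kind;
    } XactTodoItem;

    #define XACT_TODO_INIT_SIZE 16

    static XactTodoItem *xact_todo = NULL;
    static int	xact_todo_count = 0;
    static int	xact_todo_capacity = 0;

    /*
     * Nothing is allocated at transaction start; the palloc happens only
     * if and when the first item is actually recorded.
     */
    static void
    xact_todo_add(XactTodoItem item)
    {
        if (xact_todo == NULL)
        {
            xact_todo_capacity = XACT_TODO_INIT_SIZE;
            xact_todo = (XactTodoItem *)
                MemoryContextAlloc(TopTransactionContext,
                                   xact_todo_capacity * sizeof(XactTodoItem));
        }
        else if (xact_todo_count >= xact_todo_capacity)
        {
            xact_todo_capacity *= 2;
            xact_todo = (XactTodoItem *)
                repalloc(xact_todo, xact_todo_capacity * sizeof(XactTodoItem));
        }
        xact_todo[xact_todo_count++] = item;
    }

    /*
     * At end of transaction, just reset the pointers; the memory itself
     * goes away when TopTransactionContext is reset, so the common case
     * where nothing was ever allocated costs nothing here either.
     */
    static void
    AtEOXact_XactTodo(void)
    {
        xact_todo = NULL;
        xact_todo_count = 0;
        xact_todo_capacity = 0;
    }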
Re: [HACKERS] pg_background (and more parallelism infrastructure patches)
On Thu, Oct 9, 2014 at 4:02 AM, Andres Freund and...@2ndquadrant.com wrote:
+/*
+ * Arrange to remove a dynamic shared memory mapping at cleanup time.
+ *
+ * dsm_keep_mapping() can be used to preserve a mapping for the entire
+ * lifetime of a process; this function reverses that decision, making
+ * the segment owned by the current resource owner. This may be useful
+ * just before performing some operation that will invalidate the segment
+ * for future use by this backend.
+ */
+void
+dsm_unkeep_mapping(dsm_segment *seg)
+{
+	Assert(seg->resowner == NULL);
+	ResourceOwnerEnlargeDSMs(CurrentResourceOwner);
+	seg->resowner = CurrentResourceOwner;
+	ResourceOwnerRememberDSM(seg->resowner, seg);
+}
Hm, I dislike the name unkeep. I also think the function name is not appropriate for its functionality. I guess you want to be symmetric to dsm_keep_mapping? dsm_manage_mapping(), dsm_ensure_mapping_cleanup(), dsm_remember_mapping()? Another could be dsm_change_mapping(). Yet another idea could be that we use a single function (dsm_manage_mapping() with an additional parameter to indicate the scope of the segment) instead of two different functions, dsm_keep_mapping() and dsm_unkeep_mapping(). With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
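As I read the patch, the intended pairing is something like the following sketch (the helper name is mine, and dsm_unkeep_mapping() is the function under discussion, not yet in the tree):

    #include "postgres.h"
    #include "storage/dsm.h"

    /* Hypothetical helper showing the keep/unkeep pairing. */
    static void
    use_segment_beyond_resowner(dsm_segment *seg)
    {
        /* Pin the mapping for the lifetime of the process. */
        dsm_keep_mapping(seg);

        /* ... the segment now survives resource-owner cleanup ... */

        /*
         * Just before an operation that invalidates the segment for this
         * backend, hand it back to the current resource owner, so that
         * even on error it gets cleaned up.
         */
        dsm_unkeep_mapping(seg);
        dsm_detach(seg);
    }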
Re: [HACKERS] pg_receivexlog --status-interval add fsync feedback
What set of options would you pass if you want to use it as a synchronous standby? And if you don't? Could we just have a single --synchronous flag, instead of -F and --reply-fsync? If you set synchronous_commit as remote_write, the options would be different. For the set of options in each case, see the following. Synchronous standby (synchronous_commit=on): --fsync-interval=-1 --reply-fsync --slot=slotname Synchronous standby (synchronous_commit=remote_write): --fsync-interval=-1 --reply-fsync Asynchronous: there are no relevant options. Well, if the response time delay (the value of --status-interval=interval) is acceptable, --reply-fsync is unnecessary. Instead of --reply-fsync, using --synchronous (which summarizes --reply-fsync and fsync-interval=-1) might be easier to understand. Although, in that case, --fsync-interval=interval would be a fixed value. Isn't there any problem? I think we should remove --fsync-interval and --reply-fsync, and just have a --synchronous option, which would imply the same behavior you get with --fsync-interval=-1 --reply-fsync. That leaves the question of whether pg_receivexlog should do fsyncs when it's not acting as a synchronous standby. There isn't any real need to do so. In asynchronous mode, there are no guarantees anyway, and even without an fsync, the OS will eventually flush the data to disk. But we could do even better than that. It would be best if you didn't need even the --synchronous flag. The server knows whether the client is a synchronous standby or not. It could tell the client. If we remove --fsync-interval, response time to the user will not be delayed. Although, fsync will be executed multiple times in a short period. And there is no way to solve that problem without --fsync-interval; what should we do about it? I'm sorry, I didn't understand that. In asynchronous mode, I think there is no problem since the specification is the same as the released version. The server knows whether the client is a synchronous standby or not. It could tell the client. When notifying the synchronous/asynchronous mode to the client from the server, do we need to change the format of the Message? Yeah. Or rather, add a new message type, to indicate the synchronous/asynchronous status. Thanks for the reply. If we remove --fsync-interval, response time to the user will not be delayed. Although, fsync will be executed multiple times in a short period. And there is no way to solve that problem without --fsync-interval; what should we do about it? I'm sorry, I didn't understand that. Here is an example. When WAL is sent at 100ms intervals, fsync() is executed 10 times per second. If --fsync-interval is set to 1 second, we have to wait for the SQL response (for statements that generate WAL) for 1 second, though fsync() won't be executed several times in 1 second. I think --fsync-interval meets the demands of people who want to keep fsync() from happening several times in a short period, but what do you think? Is it OK to delete --fsync-interval? The server knows whether the client is a synchronous standby or not. It could tell the client. When notifying the synchronous/asynchronous mode to the client from the server, do we need to change the format of the Message? Yeah. Or rather, add a new message type, to indicate the synchronous/asynchronous status. What kind of 'message type' do we have to add? Do we need to separate 'w' into two types, synchronous and asynchronous? 
Or add a new message type, kind of a 'notify synchronous', and inform pg_receivexlog of the synchronous status when it connects to the server. Regards, -- Furuya Osamu -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Escaping from blocked send() reprised.
Hello, simply inhibiting setting the retry flag when ProcDiePending in my_sock_write seems enough. But it returns with SSL_ERROR_SYSCALL, not SSL_ERROR_WANT_WRITE, so I modified patch 4 as in the attached patch. Finally, the attached patch works as expected with Andres's patches 1-3. regards, -- Kyotaro Horiguchi NTT Open Source Software Center
diff --git a/src/backend/libpq/be-secure-openssl.c b/src/backend/libpq/be-secure-openssl.c
index 6fc6903..2288fe2 100644
--- a/src/backend/libpq/be-secure-openssl.c
+++ b/src/backend/libpq/be-secure-openssl.c
@@ -750,7 +750,8 @@ my_sock_write(BIO *h, const char *buf, int size)
 	BIO_clear_retry_flags(h);
 	if (res <= 0)
 	{
-		if (errno == EINTR || errno == EWOULDBLOCK || errno == EAGAIN)
+		if (!ProcDiePending &&
+			(errno == EINTR || errno == EWOULDBLOCK || errno == EAGAIN))
 		{
 			BIO_set_retry_write(h);
 		}
diff --git a/src/backend/libpq/be-secure.c b/src/backend/libpq/be-secure.c
index 7b5b30f..ab9e122 100644
--- a/src/backend/libpq/be-secure.c
+++ b/src/backend/libpq/be-secure.c
@@ -199,6 +199,7 @@ secure_write(Port *port, void *ptr, size_t len)
 {
 	ssize_t		n;
 
+retry:
 #ifdef USE_SSL
 	if (port->ssl_in_use)
 	{
@@ -210,7 +211,26 @@ secure_write(Port *port, void *ptr, size_t len)
 		n = secure_raw_write(port, ptr, len);
 	}
 
-	/* XXX: We likely will want to process some interrupts here */
+	/*
+	 * We only want to process the interrupt here if socket writes are
+	 * blocking to increase the chance to get an error message to the
+	 * client. If we're not blocked there'll soon be a
+	 * CHECK_FOR_INTERRUPTS(). But if we're blocked we'll never get out of
+	 * that situation if the client has died.
+	 */
+	if (ProcDiePending && !port->noblock && n < 0)
+	{
+		/*
+		 * We're dying. It's safe (and sane) to handle that now.
+		 */
+		CHECK_FOR_INTERRUPTS();
+	}
+
+	/* retry after processing interrupts */
+	if (n < 0 && errno == EINTR)
+	{
+		goto retry;
+	}
 
 	return n;
 }
@@ -236,10 +256,19 @@ wloop:
 		 * don't do anything while (possibly) inside a ssl library.
 		 */
 		w = WaitLatchOrSocket(&MyProc->procLatch,
-							  WL_SOCKET_WRITEABLE,
+							  WL_LATCH_SET | WL_SOCKET_WRITEABLE,
 							  port->sock, 0);
 
-		if (w & WL_SOCKET_WRITEABLE)
+		if (w & WL_LATCH_SET)
+		{
+			ResetLatch(&MyProc->procLatch);
+
+			/*
+			 * Force a return, so interrupts can be processed when not
+			 * (possibly) underneath a ssl library.
+			 */
+			errno = EINTR;
+		}
+		else if (w & WL_SOCKET_WRITEABLE)
 		{
 			goto wloop;
 		}
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 3a6aa1c..61390aa 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -520,6 +520,13 @@ ProcessClientReadInterrupt(void)
 
 		errno = save_errno;
 	}
+	else if (ProcDiePending)
+	{
+		/*
+		 * We're dying. It's safe (and sane) to handle that now.
+		 */
+		CHECK_FOR_INTERRUPTS();
+	}
 }
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}
On 8 October 2014 23:24, Peter Geoghegan p...@heroku.com wrote: Also, it would be useful to hear that you're going to do something about the references to rows using conflicting(), since nobody has agreed with you there. Or hopefully even that you've listened and implemented something differently already. (We need that, whatever we do with other elements of syntax). Do you really expect me to do major work on some aspect of the syntax like that, given, as you say, that nobody explicitly agreed with me (and only you disagreed with me)? The only remark I heard on that was from you (you'd prefer to use NEW.* and OLD.*). But you spent much more time talking about getting something MERGE-like, which NEW.*/OLD.* clearly isn't. Yes, it is. Look at the AS clause. CONFLICTING() is very close (identical?) to MySQL's use of ON DUPLICATE KEY UPDATE val = VALUES(val). I'm happy to discuss that, but it's news to me that people take particular issue with it. 3 people have asked you questions or commented about the use of CONFLICTING() while I've been watching. It's clearly a non-standard mechanism and not in line with other Postgres usage. Nobody actually says "I object to this" - do they need to use that phrase before you take note? I'm beginning to feel that giving you review comments is being seen as some kind of negative action. Needing to repeat myself makes it clear that you aren't taking note. Yes, I expect you to do these things:
* collect other people's input, even if you personally disagree
* if there is disagreement amongst reviewers, seek to resolve that in a fair and reasonable manner
* publish a summary of changes requested
* do major work to address them
So yes, I really expect that. It doesn't matter that it is only Simon or only Kevin. **One** comment is enough for you to take note. If there is disagreement, publishing the summary of changes you plan to make in your next version will help highlight that. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers