Re: [HACKERS] Potential GIN vacuum bug
On Mon, Aug 17, 2015 at 6:23 AM, Jeff Janes jeff.ja...@gmail.com wrote: On Aug 16, 2015 11:49 PM, Heikki Linnakangas hlinn...@iki.fi wrote: On 08/16/2015 12:58 AM, Jeff Janes wrote: When ginbulkdelete gets called for the first time in a VACUUM (i.e. stats == NULL), one of the first things it does is call ginInsertCleanup to get rid of the pending list. It does this in lieu of vacuuming the pending list. This is important because if there are any dead tids still in the pending list, someone else could come along during the vacuum and post the dead tids into a part of the index that VACUUM has already passed over. The potential bug is that ginInsertCleanup exits early (ginfast.c lines 796, 860, 898) if it detects that someone else is cleaning up the pending list, without waiting for that someone else to finish the job. Isn't this a problem? Yep, I think you're right. When that code runs as part of VACUUM, it should not give up like that. Hmm, I see other race conditions in that code too. Even if VACUUM wins the race you spotted, and performs all the insertions, reaches the end of the pending items list, and deletes the pending list pages, it's possible that another backend started earlier, and is still processing the same items from the pending items list. It will add them to the tree, and after it's finished with that it will see that the pending list page was already deleted, and bail out. But if there is a dead tuple in the pending items list, you have trouble. The other backend will re-insert it, and that might happen after VACUUM had already removed it from the tree. Could the right to clean the pending list be represented by a self-conflicting heavyweight lock on the index? Vacuum could block on it, while user backends could try to get it conditionally and just give up on the cleanup if it is not available. Also, ginInsertCleanup() seems to assume that if another backend has just finished cleaning up the same page, it will see the page marked as deleted. 
But what if the page is not only marked as deleted, but also reused for something else already? Yeah. Which is possible but pretty unlikely now; but would be far more likely if we added these pages to the fsm more aggressively. The attached patch takes a ShareUpdateExclusiveLock on the index in order to clean the pending list. This fixes the problem I had been seeing when testing https://commitfest.postgresql.org/6/322/ (which was probably caused by the deleted-page situation you mentioned, not by the tids-getting-revived issue I originally brought up.) User backends attempt to take the lock conditionally, because otherwise they would cause an autovacuum already holding the lock to cancel itself, which seems quite bad. Note that this is a substantial behavior change, in that with this code the user backends which find the list already being cleaned will just add to the end of the pending list and go about their business. So if things are added to the list faster than they can be cleaned up, the size of the pending list can increase without bound. Under the existing code each concurrent user backend will try to clean the pending list at the same time. The work doesn't parallelize, so doing this just burns CPU (and possibly consumes up to maintenance_work_mem for *each* backend), but it does serve to throttle the insertion rate and so keep the list from growing without bound. This is just a proof-of-concept patch, because I don't know if this approach is the right approach. One potential problem is how it will interact with create index concurrently. Cheers, Jeff gin_pending_lock.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] More WITH
On Mon, Aug 17, 2015 at 1:41 PM, Tom Lane t...@sss.pgh.pa.us wrote: Peter Geoghegan p...@heroku.com writes: On Mon, Aug 17, 2015 at 10:22 AM, Josh Berkus j...@agliodbs.com wrote: Would be tricky. We don't currently have any way to wrap an EXPLAIN in any larger statement, do we? Would be very useful for automated query analysis, though. No. In the grammar, ExplainStmt expects the EXPLAIN to be at the top-level. Having it work any other way would require significant refactoring. You can use EXPLAIN as the source of rows in a plpgsql FOR-over-query loop, so there's a workaround available that way when you need to read EXPLAIN output programmatically. I'm not convinced there's sufficient value in trying to make EXPLAIN a full-fledged subquery otherwise. I think a lot of people would find that handy - I would - but I don't know how hard it is. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
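The workaround Tom describes can be packaged up as a set-returning plpgsql function; a sketch (the function name explain_lines is made up) that captures EXPLAIN output row by row via EXECUTE:

```sql
CREATE OR REPLACE FUNCTION explain_lines(query text)
RETURNS SETOF text LANGUAGE plpgsql AS $$
DECLARE
    ln text;
BEGIN
    -- EXPLAIN serves as the row source of a FOR-over-query loop
    FOR ln IN EXECUTE 'EXPLAIN ' || query LOOP
        RETURN NEXT ln;
    END LOOP;
END
$$;

-- The plan text can now be consumed by ordinary SQL:
SELECT * FROM explain_lines('SELECT 1');
```

This gives automated query analysis most of what wrapping EXPLAIN in a subquery would, without the grammar refactoring.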
Re: [HACKERS] Error message with plpgsql CONTINUE
Jim Nasby jim.na...@bluetreble.com writes: On 8/17/15 9:48 AM, Tom Lane wrote: I'm inclined to think that if we wanted to make this better, the way to improve it would be to detect the error*at compile time*, and get rid of this hack in plpgsql_exec_function altogether. So split PLPGSQL_NSTYPE_LABEL into PLPGSQL_NSTYPE_BLOCK_LABEL and PLPGSQL_NSTYPE_LOOP_LABEL, and split opt_block_label and opt_label the same way? I think using two NSTYPE codes would probably be a pain because there are numerous places that don't care about the distinction; it'd be better to have a secondary attribute distinguishing these cases. (It looks like you could perhaps reuse the itemno field for the purpose, since that seems to be going unused in LABEL items.) You likely do need to split opt_block_label into two productions, since that will be the easiest way to pass forward the knowledge of whether it's being called from a loop or non-loop construct. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
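For reference, a minimal case that exercises the error under discussion — CONTINUE naming a block label rather than a loop label (the label name is arbitrary):

```sql
DO $$
BEGIN
    <<blk>>                 -- a block label, not a loop label
    BEGIN
        FOR i IN 1..3 LOOP
            CONTINUE blk;   -- invalid: blk does not belong to a loop;
                            -- currently detected only at run time,
                            -- in plpgsql_exec_function
        END LOOP;
    END;
END
$$;
```

Detecting this at compile time, as proposed, would report the mistake when the function is first parsed rather than when the CONTINUE is reached.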
[HACKERS] Proposal: Implement failover on libpq connect level.
Rationale = Since the introduction of WAL-based replication into PostgreSQL, it has been possible to create high-availability and load-balancing clusters. However, there is no support for failover in the client libraries, so the only way to provide failover that is transparent to the client application is IP address migration. This approach has limitations: it requires that master and backup servers reside in the same subnet, or it may not be feasible for other reasons. Commercial RDBMSs, such as Oracle, employ a more flexible approach: they allow multiple servers to be specified in the connect string, so if the primary server is not available, the client library tries to connect to the other ones. This approach permits geographically distributed failover clusters and is also a cheap way to implement load balancing (which is not possible with IP address migration). Proposed change === Allow multiple hosts to be specified in the libpq connect string. Make libpq attempt to connect to all hosts simultaneously, or in random order, and use the server which successfully establishes a connection first. Syntax -- A libpq connect string can be either a set of keyword=value pairs or a URL. In the first form, the host keyword can simply be allowed to appear multiple times: host=main-server host=standby1 host=standby2 port=5432 dbname=database In the second form, the host can be specified either in the first part of the URL or in the query parameters: postgresql://user@host/database postgresql:///database?host=hostname&user=username If the host is specified as a parameter, it is also possible to allow multiple host parameters without breaking the existing syntax: postgresql:///database?host=main-server&host=standby1&host=standby2 In order to support load-balancing clusters, additional parameters should be added: readonly=boolean and loadbalancing=boolean. Support for these syntax extensions is added to the PQconnectdb, PQconnectdbParams, PQconnectStart and PQconnectStartParams functions, but not to PQsetdb/PQsetdblogin. 
Behavior If PQconnectdb encounters a connect string with multiple hosts specified, it attempts to establish a connection with all these hosts simultaneously, and begins to work with the server which responds first, unless the loadbalancing parameter is true. If the loadbalancing parameter is true, it tries the servers sequentially in random order. If the parameter readonly is false, after authenticating with the server it executes show transaction_read_only, to find out whether the current connection is to the master or to a hot standby, and the connection is considered successful only if the server allows read-write transactions. This makes it possible to have clients which write to the database and clients which perform read-only access. Read-only clients would be load-balanced between the master and slave servers, and read-write clients would connect only to the master (whichever server has this role at the moment of connection). Information about the alternate servers should be stored in the PGconn structure. The function PQreset should be able to take advantage of the new syntax and possibly open a connection to the new master, if failover occurred during the lifetime of the connection. Possible drawbacks == Compatibility - The proposed patch requires no modifications to the server or protocol, and modification of the synchronous functions (PQconnectdb, PQconnectdbParams) doesn't introduce incompatible changes to the client library. Even if a connect string with multiple hosts were erroneously used with a version of libpq which does not support this feature, it would not be an error: it would just use the last host specified in the connect string. There could be some compatibility problems with asynchronous connections created with the PQconnectStart functions. The problem is that we are trying to establish several connections at once, so there are several sockets which should be integrated into the application event loop. 
Even if we tried servers in some particular order (such as a randomized order during load balancing), the file descriptor of the socket can change during execution of PQconnectPoll, and existing applications are not prepared for that. Performance impact -- The performance impact seems to be negligible. 1. If the connect string contains only one host, the only complication is the maintenance of the data structure which can possibly hold more than one host name. The connection process itself would not be affected. 2. If there is a pure high-availability cluster, i.e. standby servers do not accept client connections on the specified port, there is no extra load on the standby servers, and almost none on the client (only several unsuccessful connect calls). 3. If there is a load-balancing cluster, there is no performance impact for read-only clients, but each read-write client causes the standby servers to process an extra connection up to the point where the server can report the read-only state of the transaction (i.e. including SSL handshake and
Re: [HACKERS] checkpointer continuous flushing
On 2015-08-11 17:15:22 +0200, Fabien COELHO wrote:

+void
+PerformFileFlush(FileFlushContext * context)
+{
+	if (context->ncalls != 0)
+	{
+		int rc;
+
+#if defined(HAVE_SYNC_FILE_RANGE)
+
+		/* Linux: tell the memory manager to move these blocks to io so
+		 * that they are considered for being actually written to disk.
+		 */
+		rc = sync_file_range(context->fd, context->offset, context->nbytes,
+							 SYNC_FILE_RANGE_WRITE);
+
+#elif defined(HAVE_POSIX_FADVISE)
+
+		/* Others: say that data should not be kept in memory...
+		 * This is not exactly what we want to say, because we want to write
+		 * the data for durability but we may need it later nevertheless.
+		 * It seems that Linux would free the memory *if* the data has
+		 * already been written to disk, else the dontneed call is ignored.
+		 * For FreeBSD this may have the desired effect of moving the
+		 * data to the io layer, although the system does not seem to
+		 * take into account the provided offset size, so it is rather
+		 * rough...
+		 */
+		rc = posix_fadvise(context->fd, context->offset, context->nbytes,
+						   POSIX_FADV_DONTNEED);
+
+#endif
+
+		if (rc < 0)
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not flush block " INT64_FORMAT
+							" on " INT64_FORMAT " blocks in file \"%s\": %m",
+							context->offset / BLCKSZ,
+							context->nbytes / BLCKSZ,
+							context->filename)));
+	}

I'm a bit wary that this might cause significant regressions on platforms not supporting sync_file_range, but supporting posix_fadvise(), for workloads that are bigger than shared_buffers. Consider what happens if the workload does *not* fit into shared_buffers but *does* fit into the OS's buffer cache. Suddenly reads will go to disk again, no? Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Raising our compiler requirements for 9.6
On 2015-08-16 03:31:48 -0400, Noah Misch wrote: As we see from the patches' need to add #include statements to .c files and from Robert's report of a broken EDB build, this change creates work for maintainers of non-core code, both backend code (modules) and frontend code (undocumented, but folks do it). That's to be expected and doesn't take a great deal of justification, but users should get benefits in connection with the work. This brings to mind the htup_details.h introduction, which created so much busy work in non-core code. I don't expect the lock.h split to be quite that disruptive. Statistics on PGXN modules: Four modules (imcs, pg_stat_kcache, query_histogram, query_recorder) mentioned LWLock without mentioning lwlock.h. These patches (trivially) break the imcs build. The other three failed to build for unrelated reasons, but I suspect these patches give those modules one more thing to update. What do users get out of this? They'll learn if their code is not portable to pademelon, but "#warning atomics.h in frontend code is not portable" would communicate the same without compelling non-core code to care. I'd love to make it a #warning instead of an error, but unfortunately that's not standard C :( Other than that benefit, making headers #error-on-FRONTEND mostly lets us congratulate ourselves for having introduced the start of a header layer distinction. I'd be for that if PostgreSQL were new, but I can't justify doing it at the user cost already apparent. That would be bad business. To me that's basically saying that we'll never ever have any better separation between frontend/backend headers since each incremental improvement won't be justifiable. Adding an explicit include which exists in all versions of postgres is really rather low cost; you don't even need a version-dependent define. The modules you name are, as you noticed, likely to require minor surgery for new major versions anyway, being rather low level. 
As you say, breaking C code over major releases doesn't have to meet a high barrier, and getting rid of the wart of lock.h being used that widely seems to be more than sufficient to meet that qualification. If others think this is the right way, ok, I can live with that and implement it, but I won't agree. Therefore, I urge you to instead add the aforementioned #warning and wrap with #ifdef FRONTEND the two #include port/atomics.h in headers. That tightly limits build breakage; it can only break frontend code, which is rare outside core. Hackers will surely notice if a patch makes the warning bleat, so mistakes won't survive long. Non-core frontend code, if willing to cede a shred of portability, can experiment with atomics. Folks could do nifty things with that. I don't think that's something worth keeping in mind from our side. If you don't care about portability you can just use C11 atomics or such. If somebody actually wants to add FRONTEND support for the fallback code - possibly falling back to pthread mutexes - ok, but before that... Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] checkpointer continuous flushing
Hi Fabien, On 2015-08-12 22:34:59 +0200, Fabien COELHO wrote: sort/flush : tps avg stddev (percent of time beyond 10.0 tps) on on : 631 +- 131 (0.1%) on off : 564 +- 303 (12.0%) off on : 167 +- 315 (76.8%) # stuck... off off : 177 +- 305 (71.2%) # ~ current pg What exactly do you mean with 'stuck'? - Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Warnings around booleans
Hi, On 2015-08-12 16:46:01 -0400, Stephen Frost wrote: From ab44660e9b8fc5b10f84ce6ba99a8340047ac8a5 Mon Sep 17 00:00:00 2001 From: Stephen Frost sfr...@snowman.net Date: Wed, 12 Aug 2015 15:50:54 -0400 Subject: [PATCH] In AlterRole, make bypassrls an int When reworking bypassrls in AlterRole to operate the same way the other attribute handling is done, I missed that the variable was incorrectly a bool rather than an int. This meant that on platforms with an unsigned char, we could end up with incorrect behavior during ALTER ROLE. Pointed out by Andres thanks to tests he did changing our bool to be the one from stdbool.h which showed this and a number of other issues. Add regression tests to test CREATE/ALTER role for the various role attributes. Arrange to leave roles behind for testing pg_dumpall, but none which have the LOGIN attribute. In passing, to avoid confusion, rename CreatePolicyStmt's 'cmd' to 'cmd_name', parse_policy_command's 'cmd' to 'polcmd', and AlterPolicy's 'cmd_datum' to 'polcmd_datum', per discussion with Noah and as a follow-up to his correction of copynodes/equalnodes handling of the CreatePolicyStmt 'cmd' field. I'd rather see those split into separate commits. Doing polishing and active bugs in one commit imo isn't a good idea if the polishing goes beyond a line or two. Otherwise this looks ok to me. Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb array-style subscripting
Peter Geoghegan p...@heroku.com writes: This patch does not add an operator at all, actually. It feels like there ought to be an operator, but in fact there is not. The parser is hard-coded to recognize array-style subscripts, which this uses. While I'm certainly glad that Dmitry took the time to work on this, I think we will need an operator, too. Or, more accurately, there should probably be a way to make something like this use some available GIN index: postgres=# explain analyze select * from testjsonb where p['a'] = '[1]'; Hm. There is definitely attraction to the idea that x[y] is some sort of operator-like syntactic sugar for invocation of an underlying function, rather than necessarily a hard-coded behavior. That would provide a framework for extending the feature to all container-like datatypes, whereas the approach Dmitry has prototyped doesn't look like it offers much help at all for other datatypes. But I do not think that that will change things very much as far as making it possible for things like the above to be indexed. You'd still have an expression tree that's a nest of two operators, which doesn't fit into our ideas about index opclasses. (That is, it still has to be a special case, so it doesn't matter that much if one of the nodes is some kind of NotAnArrayRef rather than a FuncExpr with some new CoercionForm value to make it print differently.) Also, the syntactic sugar would have to be able to deal with multiple subscripts, which makes things a lot more complicated. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.
In case of vacuum, I think we need to track the number of scanned heap pages at least, and the information about the index scans is the additional information. Actually, the progress of the heap page scan depends on the index scans, so complete VACUUM progress needs a count of index pages scanned too. Progress can then be calculated by measuring index_pages_scanned + heap_pages_scanned against total_index_pages + total_heap_pages. This can constitute the essential information, followed by additional per-phase information. The following fields, common across different commands, can be used to display progress:

Command | work done | total work | percent complete | message
VACUUM  |     x     |     y      |        z         | total progress
        |     u     |     v      |        w         | phase 1

The command code can be divided into distinct phases, and each phase's progress can be represented separately, with a summary of the entire command's progress as the first entry. The summary can be the summation of the individual phase entries. If a phase repeats during command execution, the previous entry for the phase is replaced (for example, the index scan in vacuum). The essential information has one numeric datum, which stores information about the command's processing. We may need more than one numeric datum, as mentioned above, to represent scanned blocks versus total blocks. The additional information has two data items, text and numeric; these are free-style data which each backend stores as it likes. If I understand your point correctly, I think you are missing the following: the amount of additional information for each command can differ, so we may need an array of text and numeric data to represent more additional information. Thank you, Rahila Syed
Re: [HACKERS] replication slot restart_lsn initialization
On Tue, Aug 18, 2015 at 4:46 AM, Andres Freund and...@anarazel.de wrote: On 2015-08-17 15:22:44 -0400, Peter Eisentraut wrote: On 8/14/15 3:54 AM, Andres Freund wrote: I'd name it RESERVE_WAL. Why reserve? Isn't it preserve? Hm. I honestly do not know. I was thinking of ticket reservations... Or retain? -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] checkpointer continuous flushing
On Tue, Aug 18, 2015 at 1:02 AM, Fabien COELHO coe...@cri.ensmp.fr wrote: Hello Andres, [...] posix_fadvise(). My current thinking is maybe yes, maybe no:-), as it may depend on the OS implementation of posix_fadvise, so it may differ between OSes. As long as fadvise has no 'undirty' option, I don't see how that problem goes away. You're telling the OS to throw the buffer away, so unless it ignores it that'll have consequences when you read the page back in. Yep, probably. Note that we are talking about checkpoints, which write buffers out *but* keep them nevertheless. As the buffer is kept, the OS page is a duplicate, and freeing it should not harm, at least immediately. This theory could make sense if we can predict in some way that the data we are flushing out of the OS cache won't be needed soon. After the flush, we can only rely, to an extent, on the data being found in shared_buffers if the usage_count is high; otherwise it could be replaced at any moment by a backend needing the buffer when there is no free buffer. Now, one way to think about it is that if the usage_count is low, then it's okay to assume the data won't be needed in the near future; however, I don't think relying only on usage_count for such a thing is a good idea. To sum up, I agree that it is indeed possible that flushing with posix_fadvise could reduce read OS-memory hits on some systems for some workloads, although not on Linux, see below. So the option is best kept as off for now; without further data, I'm fine with that. One point to think about here is on what basis the user can decide to turn this option on — is it predictable in any way? I think one case could be when the data set fits in shared_buffers. 
In general, providing an option is a good idea if the user can decide with ease when to use it, or if we can give some clear recommendation; otherwise one has to recommend "test your workload with this option and if it works, great, else don't use it", which might be okay in some cases, but it is better to be clear. One minor point: while glancing through the patch, I noticed that a couple of multiline comments are not written in the style usually used in the code (keep the first line empty):

+/* Status of buffers to checkpoint for a particular tablespace,
+ * used internally in BufferSync.
+ * - space: oid of the tablespace
+ * - num_to_write: number of checkpoint pages counted for this tablespace
+ * - num_written: number of pages actually written out
+ */

+/* entry structure for table space to count hashtable,
+ * used internally in BufferSync.
+ */

With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] [patch] psql tab completion for grant execute
On Tue, Aug 18, 2015 at 6:07 AM, Daniel Verite dan...@manitou-mail.org wrote: Hi, When tab-completing after GRANT EXECUTE, currently psql injects PROCEDURE, rather than the expected ON. The code for completing with ON is there, but it's not reached due to falling earlier into another branch, one that handles CREATE TRIGGER. A trivial patch is attached. It adds the condition that if EXECUTE is preceded by GRANT itself preceded by nothing, then that completion with PROCEDURE is skipped. I've looked at fixing it more directly, by testing if the EXECUTE is part of a CREATE TRIGGER, but it didn't seem fitting to go looking backwards that many words into the string (more than the 5 words suggested by the rest of the code). You should consider adding it to the next CF: https://commitfest.postgresql.org/6/ -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Potential GIN vacuum bug
Jeff Janes wrote: The attached patch takes a ShareUpdateExclusiveLock lock on the index in order to clean the pending list. Does it take a lock on the table also? Because if not ... One potential problem is how it will interact with create index concurrently. ... then I don't understand how you could have a problem here. Surely no pending list cleanup can happen concurrently with the index being created? -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] checkpointer continuous flushing
Hello Andres, On 2015-08-12 22:34:59 +0200, Fabien COELHO wrote: sort/flush : tps avg stddev (percent of time beyond 10.0 tps) on on : 631 +- 131 (0.1%) on off : 564 +- 303 (12.0%) off on : 167 +- 315 (76.8%) # stuck... off off : 177 +- 305 (71.2%) # ~ current pg What exactly do you mean with 'stuck'? I mean that during the I/O storms induced by the checkpoint, pgbench sometimes gets stuck, i.e. does not report its progression every second (I run with -P 1). This occurs when sort is off, either with or without flush; for instance, an extract from the off/off medium run:

progress: 573.0 s, 5.0 tps, lat 933.022 ms stddev 83.977
progress: 574.0 s, 777.1 tps, lat 7.161 ms stddev 37.059
progress: 575.0 s, 148.9 tps, lat 4.597 ms stddev 10.708
progress: 814.4 s, 0.0 tps, lat -nan ms stddev -nan
progress: 815.0 s, 0.0 tps, lat -nan ms stddev -nan
progress: 816.0 s, 0.0 tps, lat -nan ms stddev -nan
progress: 817.0 s, 0.0 tps, lat -nan ms stddev -nan
progress: 818.0 s, 0.0 tps, lat -nan ms stddev -nan
progress: 819.0 s, 0.0 tps, lat -nan ms stddev -nan
progress: 820.0 s, 0.0 tps, lat -nan ms stddev -nan
progress: 821.0 s, 0.0 tps, lat -nan ms stddev -nan
progress: 822.0 s, 0.0 tps, lat -nan ms stddev -nan
progress: 823.0 s, 0.0 tps, lat -nan ms stddev -nan
progress: 824.0 s, 0.0 tps, lat -nan ms stddev -nan
progress: 825.0 s, 0.0 tps, lat -nan ms stddev -nan
progress: 826.0 s, 0.0 tps, lat -nan ms stddev -nan

There is a 239.4 second gap in the pgbench output. This occurs from time to time and may represent a significant part of the run, and I count these stuck times as 0 tps. Sometimes pgbench is stuck performance-wise but nevertheless manages to report a 0.0 tps every second, as above after it became unstuck. The actual origin of the issue with a stuck client (pgbench, libpq, OS, postgres...) is unclear to me, but the whole system does not behave well under an I/O storm anyway, and I have not succeeded in understanding where pgbench is stuck when it does not report its progress. 
I tried some runs with gdb but it did not get stuck, and reported a lot of 0.0 tps during the storms. Here are a few more figures with the v8 version of the patch, on a host with 8 cores, 16 GB, RAID 1 HDD, under Ubuntu precise. I already reported the medium case; the small case was run afterwards.

small postgresql.conf: shared_buffers = 2GB, checkpoint_timeout = 300s # this is the default, checkpoint_completion_target = 0.8 # initialization: pgbench -i -s 120

medium postgresql.conf: ## ALREADY REPORTED shared_buffers = 4GB, checkpoint_timeout = 15min, checkpoint_completion_target = 0.8, max_wal_size = 4GB # initialization: pgbench -i -s 250

warmup: pgbench -T 1200 -M prepared -S -j 2 -c 4

# 400 tps throttled test
sh> pgbench -M prepared -N -P 1 -T 4000 -R 400 -L 100 -j 2 -c 4
options / percent of skipped/late transactions
sort/flush /  small medium
on  on  :   3.5    2.7
on  off :  24.6   16.2
off on  :  66.1   68.4
off off :  63.2   68.7

# 200 tps throttled test
sh> pgbench -M prepared -N -P 1 -T 4000 -R 200 -L 100 -j 2 -c 4
sort/flush /  small medium
on  on  :   1.9    2.7
on  off :  14.3    9.5
off on  :  45.6   47.4
off off :  47.4   48.8

# 100 tps throttled test
sh> pgbench -M prepared -N -P 1 -T 4000 -R 100 -L 100 -j 2 -c 4
sort/flush /  small medium
on  on  :   0.9    1.8
on  off :   9.3    7.9
off on  :   5.0   13.0
off off :  31.2   31.9

# full speed 1 client
sh> pgbench -M prepared -N -P 1 -T 4000
options / tps avg stddev (percent of time below 10.0 tps)
sort/flush /  small               medium
on  on  : 564 +- 148 ( 0.1%)   631 +- 131 ( 0.1%)
on  off : 470 +- 340 (21.7%)   564 +- 303 (12.0%)
off on  : 157 +- 296 (66.2%)   167 +- 315 (76.8%)
off off : 154 +- 251 (61.5%)   177 +- 305 (71.2%)

# full speed 2 threads 4 clients
sh> pgbench -M prepared -N -P 1 -T 4000 -j 2 -c 4
options / tps avg stddev (percent of time below 10.0 tps)
sort/flush /  small                medium
on  on  :  757 +- 417 ( 0.1%)   1058 +- 455 ( 0.1%)
on  off :  752 +- 893 (48.4%)   1056 +- 942 (32.8%)
off on  :  173 +- 521 (83.0%)    170 +- 500 (88.3%)
off off :  199 +- 512 (82.5%)    209 +- 506 (82.0%)

In all cases, sort=on flush=on provides the best results, with a tps speedup of 3-5x and overall high responsiveness (lower latency). -- Fabien. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Raising our compiler requirements for 9.6
On Wed, Jul 1, 2015 at 11:14 AM, Andres Freund and...@anarazel.de wrote: Hi, During the 9.5 cycle, and earlier, the topic of increasing our minimum bar for compilers came up a bunch of times. Specifically whether we still should continue to use C90 as a baseline. Minor question: is there any value to keeping the client-side code to older standards? On a previous project we compiled libpq on many legacy architectures because we were using it as a frontend to backup software. The project didn't end up taking off so it's no big deal to me, but just throwing it out there: libpq has traditionally enjoyed broader compiler support than the full project (borland, windows back in the day). merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Our trial to TPC-DS but optimizer made unreasonable plan
Here is one other thing I could learn from the TPC-DS benchmark. The attached query is Q4 of TPC-DS, run against SF=100. It took a long time to complete (about 30 min); please see the attached EXPLAIN ANALYZE output. Its core workload is in the CTE year_total. Its Append node has three underlying HashAggregate nodes, each with table joins beneath it. Below is the first HashAggregate node. It consumes 268M rows, then generates 9M rows. The underlying GpuJoin takes 74 sec to process the 268M rows, so we can estimate that HashAggregate consumed ~400 sec.

HashAggregate (cost=18194948.40..21477516.00 rows=262605408 width=178) (actual time=480652.856..488055.918 rows=9142442 loops=1)
  Group Key: customer.c_customer_id, customer.c_first_name, customer.c_last_name, customer.c_preferred_cust_flag, customer.c_birth_country, customer.c_login, customer.c_email_address, date_dim.d_year
  -> Custom Scan (GpuJoin) (cost=101342.54..9660272.64 rows=262605408 width=178) (actual time=2472.174..73908.894 rows=268562375 loops=1)
       Bulkload: On (density: 100.00%)
       Depth 1: Logic: GpuHashJoin, HashKeys: (ss_sold_date_sk), JoinQual: (ss_sold_date_sk = d_date_sk), nrows (287997024 -> 275041999, 95.50% expected 95.47%)
       Depth 2: Logic: GpuHashJoin, HashKeys: (ss_customer_sk), JoinQual: (ss_customer_sk = c_customer_sk), nrows (275041999 -> 268562375, 93.25% expected 91.18%)
       -> Custom Scan (BulkScan) on store_sales (cost=0.00..9649559.60 rows=287996960 width=38) (actual time=18.372..54522.803 rows=287997024 loops=1)
       -> Seq Scan on date_dim (cost=0.00..2705.49 rows=73049 width=16) (actual time=0.015..15.533 rows=73049 loops=1)
       -> Seq Scan on customer (cost=0.00..87141.74 rows=274 width=156) (actual time=0.018..582.373 rows=200 loops=1)

The other two HashAggregate nodes show similar behavior. The second one consumed 281 sec, including 30 sec in the underlying GpuJoin. The third one consumed 138 sec, including 25 sec in the underlying GpuJoin.
Apart from my custom join implementation, it seems to me the HashAggregate node consumed much more time than expected. One characteristic of this workload is that the aggregation takes eight grouping keys. I suspect the cost of function invocations for hash values and equality checks may be the culprit. Let's dive into nodeAgg.c. ExecAgg() calls agg_fill_hash_table() to fill up the hash table with rows fetched from the underlying tables. During construction of the hash table, it calls hash functions (at TupleHashTableHash) and equality-check functions (at execTuplesMatch) repeatedly. Both use the FunctionCallX() interface, which exchanges arguments through the FmgrInfo structure. That is usually not the best approach from a performance perspective. In particular, this workload has 268M input rows and 8 grouping keys, so 268M (rows) x 8 (grouping keys) x 2 (for hash/equal) = roughly 4.3B function calls via FmgrInfo will happen. I think the SortSupport logic provides a reasonable way to solve this kind of problem. For example, btint4sortsupport() supplies a function pointer to the fast version of the comparator (btint4fastcmp), which takes two Datum arguments without indirect memory references. The same mechanism would also make sense for the HashAggregate logic, to reduce the cost of function invocations. Please comment on the idea I noticed here. And, as an aside, if HashAggregate could pick up a hash value of the grouping keys from the target list of the underlying plan node (GpuJoin in this case), it may be possible to off-load the calculation of hash values to the GPU device.
:-) Thanks, -- NEC Business Creation Division / PG-Strom Project KaiGai Kohei kai...@ak.jp.nec.com -Original Message- From: pgsql-hackers-ow...@postgresql.org [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Kouhei Kaigai Sent: Thursday, August 13, 2015 8:23 PM To: Greg Stark Cc: PostgreSQL-development Subject: Re: [HACKERS] Our trial to TPC-DS but optimizer made unreasonable plan On Thu, Aug 13, 2015 at 2:49 AM, Kouhei Kaigai kai...@ak.jp.nec.com wrote: In fact, the cost of the HashJoin underlying the Sort node is: -> Hash Join (cost=621264.91..752685.48 rows=1 width=132) On the other hand, the NestLoop at the same place is: -> Nested Loop (cost=0.00..752732.26 rows=1 width=132) Probably, a small GUC adjustment may make the optimizer prefer HashJoin for these kinds of queries. With that kind of discrepancy I doubt adjusting GUCs will be sufficient Do you have a good idea? Do you have EXPLAIN ANALYZE from the plan that finishes? Are there any row estimates that are way off? Yes, EXPLAIN ANALYZE is attached. According to this, CTE year_total generates 384,208 rows. It is much smaller than the estimate (4.78M rows); however, the filter selectivity of the CTE Scan was not as large as expected. For example, the deepest CTE Scan returns 37923 rows and 26314 rows, even though 40 rows were expected. On the next level, relations join between 11324 rows and
Re: [HACKERS] Raising our compiler requirements for 9.6
On 2015-08-17 15:51:33 +0100, Greg Stark wrote: But I'm not clear from the discussion exactly which compilers we're thinking of ruling out. None.
Re: [HACKERS] Raising our compiler requirements for 9.6
On Wed, Jul 1, 2015 at 5:14 PM, Andres Freund and...@anarazel.de wrote: During the 9.5 cycle, and earlier, the topic of increasing our minimum bar for compilers came up a bunch of times. Specifically whether we still should continue to use C90 as a baseline. I think the time has come to rely at least on some newer features. I don't have much opinion on the topic (aside from it's nice that we run on old systems but that's neither here nor there). But I'm not clear from the discussion exactly which compilers we're thinking of ruling out. For GCC are we talking about bumping the minimum version required or are all current versions of GCC new enough and we're only talking about old HPUX/Solaris/etc compilers? -- greg
[HACKERS] what would tar file FDW look like?
I'm starting to work on a tar FDW as a proxy for a much more specific FDW. (It's the 'faster to build two and toss the first away' approach - tar lets me get the FDW stuff nailed down before attacking the more complex container.) It could also be useful in its own right, or as the basis for a zip file FDW. I have figured out that in one mode the FDW mapping would take the name of the tarball as an option and produce a relation that has all of the metadata for the contained files - filename, size, owner, timestamp, etc. I can use the same approach I used for the /etc/passwd FDW for that. (BTW the current version is at https://github.com/beargiles/passwd-fdw. It's skimpy on automated tests until I can figure out how to handle the user mapping, but it works.) The problem is the second mode, where I pull a single file out of the FDW. I've identified three approaches so far:

1. An FDW mapping specific to each file. It would take the name of the tarfile and the embedded file. Cleanest in some ways, but it would be a real pain if you're reading a tarball dynamically.

2. A user-defined function that takes the name of the tarball and file and returns a blob. This is the traditional approach, but why bother with an FDW then? It also brings up access control issues, since it requires disclosure of the tarball name to the user. An FDW could hide that.

3. A user-defined function that takes a tar FDW and the name of a file and returns a blob. I think this is the best approach, but I don't know if I can specify an FDW as a parameter or how to access it.

I've skimmed the existing list of FDWs but didn't find anything that can serve as a model. The foreign DB ones are closest but, again, they aren't designed for dynamic use where you want to do something with every file in an archive / table in a foreign DB. Is there an obvious approach? Or is it simply a bad match for FDW and should it be two standard UDFs? (One returns the metadata, the second returns the specific file.) Thanks, Bear
Re: [HACKERS] what would tar file FDW look like?
On Mon, Aug 17, 2015 at 3:14 PM, Bear Giles bgi...@coyotesong.com wrote: I'm starting to work on a tar FDW as a proxy for a much more specific FDW. (It's the 'faster to build two and toss the first away' approach - tar lets me get the FDW stuff nailed down before attacking the more complex container.) It could also be useful in its own right, or as the basis for a zip file FDW. Hm. tar may be a bad fit where zip may be much easier. Tar has no index or table of contents. You have to scan the entire file to find all the members. IIRC Zip does have a table of contents at the end of the file. The most efficient way to process a tar file is to describe exactly what you want to happen with each member and then process it linearly from start to end (or until you've found the members you're looking for). Trying to return meta info and then go looking for individual members will be quite slow and have a large startup cost. -- greg
Re: [HACKERS] checkpointer continuous flushing
Oops, stalled post, sorry wrong From, resent.. Hello Andres, + rc = posix_fadvise(context->fd, context->offset, [...] I'm a bit wary that this might cause significant regressions on platforms not supporting sync_file_range, but supporting posix_fadvise(), for workloads that are bigger than shared_buffers. Consider what happens if the workload does *not* fit into shared_buffers but *does* fit into the OS's buffer cache. Suddenly reads will go to disk again, no? That is an interesting question! My current thinking is maybe yes, maybe no :-), as it may depend on the OS implementation of posix_fadvise, so it may differ between OSes. This is a reason why I think that flushing should be kept a guc, even if the sort guc is removed and always on. The sync_file_range implementation is clearly always very beneficial for Linux, but posix_fadvise may or may not induce good behavior depending on the underlying system. This is also a reason why the default value for the flush guc is currently set to false in the patch. The documentation should advise turning it on for Linux and testing otherwise. Or, if Linux is assumed to often be the host, then maybe set the default to on and suggest that on some systems it may be better to have it off. (Another reason to keep it off is that I'm not sure what happens with such HD flushing features on virtual servers.) Overall, I'm not pessimistic, because I've seen I/O storms on a FreeBSD host and it was as bad as Linux (namely the database and even the box were offline for long minutes...), and if you can avoid that, having to read back some data may not be that bad a down payment. The issue is largely mitigated if the data is not removed from shared_buffers, because the OS buffer is then just a copy of already-held data. What I would do on such systems is to increase shared_buffers and keep flushing on, that is, to count less on the system cache and more on postgres' own cache.

Overall, I'm not convinced that the practice of relying on the OS cache is a good one, given what it does with it, at least on Linux. Now, if someone could provide a dedicated box with posix_fadvise (say FreeBSD, maybe others...) for testing, that would allow us to provide data instead of speculating... and then maybe decide to change its default value. -- Fabien.
Re: [HACKERS] Error message with plpgsql CONTINUE
Jim Nasby jim.na...@bluetreble.com writes: Calling CONTINUE with a label that's not a loop produces an error message with no context info [1]. True. I think err_stmt should probably only be reset in the non-return case a bit below that. I'm not sure about err_text though. That is not going to help, as you'd soon find if you experimented: given your example, the produced error message would be

ERROR:  CONTINUE cannot be used outside a loop
CONTEXT:  PL/pgSQL function inline_code_block line 2 at statement block

rather than pointing at the CONTINUE. To get where you needed to be, you'd need to have some complicated and fragile rules about where err_stmt is reset or not reset as a statement nest gets unwound. I'm inclined to think that if we wanted to make this better, the way to improve it would be to detect the error *at compile time*, and get rid of this hack in plpgsql_exec_function altogether. pl_gram.y already successfully detects cases where CONTINUE mentions a label that doesn't exist or isn't surrounding the CONTINUE. What it is missing is that we don't distinguish labels on loops from labels on non-loop statements, and thus it can't tell if CONTINUE is referencing a non-loop label or has no label but is not inside any loop-type statement. Seems like that detail could be added to the PLpgSQL_nsitem data structure without a huge amount of work. regards, tom lane
Re: [HACKERS] jsonb array-style subscripting
Hi 2015-08-17 21:12 GMT+02:00 Jim Nasby jim.na...@bluetreble.com: On 8/17/15 12:57 PM, Dmitry Dolgov wrote: * is it interesting for the community? We definitely need better ways to manipulate JSON. * is that a good idea to extend the `ArrayRef` for jsonb? If it's appropriate, probably we can rename it to `ArrayJsonbRef` or something. * what can be improved in the code at the top level (function placement, probably, functionality duplication, etc.)? * are there any special cases that I should take care of in this implementation? How would this work when you have a JSON array? Postgres array syntax suddenly becoming key/value syntax for JSON seems like a pretty bad idea to me. Could a different syntax (maybe {}) be used instead? I don't understand why '{}' should be better than '[]'? Many modern languages don't differentiate between arrays and hashes. Regards Pavel I'm not sure having the UPDATE you show cause objects to spring to life is so great either. -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Data in Trouble? Get it in Treble! http://BlueTreble.com
Re: [HACKERS] Test code is worth the space
On Mon, Aug 17, 2015 at 02:04:40PM -0500, Jim Nasby wrote: On 8/16/15 8:48 AM, Greg Stark wrote: On Sun, Aug 16, 2015 at 7:33 AM, Noah Misch n...@leadboat.com wrote: When I've just spent awhile implementing a behavior change, the test diffs are a comforting sight. They confirm that the test suite exercises the topic I just changed. Furthermore, most tests today do not qualify under this stringent metric you suggest. The nature of pg_regress opposes it. It's not a comforting sight for me. It means I need to change the test results and then of course the tests will pass. So they didn't really test anything and I don't really know whether the changes broke anything. Any test I had to adjust for my change was effectively worthless. This is precisely my point about pg_regress and why it's the reason there's pressure not to have extensive tests. It's useful to have some end-to-end black box tests like this but it's self-limiting. You can't test many details without tying the tests to specific behaviour. I have worked with insanely extensive testing suites that didn't have this problem. But they were unit tests for code that was structured around testability. When the API is designed to be testable and you have good infrastructure for mocking up the environment and comparing function results in a much narrower way than just looking at text output you can test the correctness of each function without tying the test to the specific behaviour. *That* gives a real comfort. When you change a function to behave entirely differently and can still run all the tests and see that all the code that depends on it still operate correctly then you know you haven't broken anything. In that case it actually *speeds up* development rather than slowing it down. It lets newbie developers hack on code where they don't necessarily know all the downstream dependent code and not be paralyzed by fear that they'll break something. 
As someone who oversaw a database test suite with almost 500 test files (each testing multiple cases), I completely agree. Our early framework was actually similar to pg_regress and that worked OK up to about 200 test files, but it got very unwieldy very quickly. We switched to pgTap and it was far easier to work with. My own position is based on having maintained a pg_regress suite an order of magnitude larger than that. I don't know why that outcome was so different. I suspect any effort to significantly improve Postgres test coverage is doomed until there's an alternative to pg_regress. There is the src/test/perl/TestLib.pm harness.
Re: [HACKERS] missing documentation for partial WAL files
On Tue, Aug 18, 2015 at 3:57 AM, Peter Geoghegan p...@heroku.com wrote: On Mon, Aug 17, 2015 at 11:50 AM, Peter Eisentraut pete...@gmx.net wrote: The commit message for de76884 contains some important information about the purpose and use of the new .partial WAL files. But I don't see anything about this in the documentation or another user-visible place. We should probably add something. This makes sense; those files are exposed in the user's archives when the end of a timeline is reached at promotion. I think that this should be added in "Continuous archiving in standby" as a new paragraph, as the first paragraph argues about archive_mode = 'always' and the second about 'on'. What about the attached? Uh, some documentation around .ready files would be nice too. Why? Users normally need no knowledge of those; the status files are managed only by the backend. -- Michael

diff --git a/doc/src/sgml/high-availability.sgml b/doc/src/sgml/high-availability.sgml
index 37aa047..fe161b6 100644
--- a/doc/src/sgml/high-availability.sgml
+++ b/doc/src/sgml/high-availability.sgml
@@ -1259,6 +1259,18 @@ primary_slot_name = 'node_a_slot'
     When a server is not in recovery mode, there is no difference between
     <literal>on</literal> and <literal>always</literal> modes.
    </para>
+
+   <para>
+    When a standby is promoted and <varname>archive_mode</varname> is set to
+    <literal>on</literal> or <literal>always</literal>, it will archive the
+    last, partial and incomplete segment from the previous timeline with the
+    suffix <filename>.partial</filename>. There is no automatic mechanism to
+    detect and use <filename>.partial</filename> files at recovery, so they
+    will go unused unless they are renamed and placed in
+    <filename>pg_xlog</filename>. This segment would not be needed when
+    recovering on the new timeline, as the first segment of the new timeline
+    would be used instead; still, it may be useful for debugging purposes.
+   </para>
  </sect2>
 </sect1>
Re: [HACKERS] Potential GIN vacuum bug
On Mon, Aug 17, 2015 at 3:02 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Jeff Janes wrote: The attached patch takes a ShareUpdateExclusiveLock lock on the index in order to clean the pending list. Does it take a lock on the table also? Because if not ... There must be some kind of lock held on the table already (either RowExclusive at least for user backends or ShareRowExclusive for vacuum backends), but I don't do anything in this patch with it. One potential problem is how it will interact with create index concurrently. ... then I don't understand how you could have a problem here. Surely no pending list cleanup can happen concurrently with the index being created? While grepping through the code looking for precedent, I noticed that create index concurrently takes a ShareUpdateExclusiveLock on both table and index. I don't know why it needs one on the index, I didn't investigate it thoroughly. During the last stage of the create index concurrently, inserters and updaters are obliged to maintain the index even though they can't use it yet. So that would mean adding to the pending list, which might get big enough to need cleaning. Cheers, Jeff
Re: [HACKERS] jsonb array-style subscripting
On 2015-08-17 22:33, Josh Berkus wrote: So, both perl and python do not allow deep nesting of assignments. For example:

d = { "a" : { } }
d["a"]["a1"]["a2"] = 42
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'a1'

Not sure I understand what you mean. In Perl you'd do

$ perl -e '%d = (a => {}); $d{a}{a1}{a2} = 42; print $d{a}{a1}{a2}'
42

which looks pretty much like what's proposed. /kaare
Re: [HACKERS] what would tar file FDW look like?
On 08/17/2015 10:14 AM, Bear Giles wrote: I'm starting to work on a tar FDW as a proxy for a much more specific FDW. (It's the 'faster to build two and toss the first away' approach - tar lets me get the FDW stuff nailed down before attacking the more complex container.) It could also be useful in its own right, or as the basis for a zip file FDW. I have figured out that in one mode the FDW mapping that would take the name of the tarball as an option and produce a relation that has all of the metadata for the contained files - filename, size, owner, timestamp, etc. I can use the same approach I used for the /etc/passwd FDW for that. (BTW the current version is at https://github.com/beargiles/passwd-fdw. It's skimpy on automated tests until I can figure out how to handle the user mapping but it works.) The problem is the second mode where I pull a single file out of the FDW. I've identified three approachs so far: 1. A FDW mapping specific to each file. It would take the name of the tarfile and the embedded file. Cleanest in some ways but it would be a real pain if you're reading a tarball dynamically. 2. A user-defined function that takes the name of the tarball and file and returns a blob. This is the traditional approach but why bother with a FDW then? It also brings up access control issues since it requires disclosure of the tarball name to the user. A FDW could hide that. 3. A user-defined function that takes a tar FDW and the name of a file and returns a blob. I think this is the best approach but I don't know if I can specify a FDW as a parameter or how to access it. I've skimmed the existing list of FDW but didn't find anything that can serve as a model. The foreign DB are closest but, again, they aren't designed for dynamic use where you want to do something with every file in an archive / table in a foreign DB. Is there an obvious approach? Or is it simply a bad match for FDW and should be two standard UDF? 
(One returns the metadata, the second returns the specific file.) I would probably do something like this: In this mode, define a table that has path, blob. To get the blob for a single file, just do select blob from fdwtable where path = '/path/to/foo'. Make sure you process the qual in the FDW. e.g. create foreign table tarblobs (path text, blob bytea) server tarfiles options (filename '/path/to/tarball', mode 'contents'); cheers andrew
Re: [HACKERS] what would tar file FDW look like?
I've written readers for both from scratch. Tar isn't that bad since it's blocked - you read the header, skip forward N blocks, continue. The hardest part is setting up the decompression libraries if you want to support tar.gz or tar.bz2 files. Zip files are more complex. You have (iirc) 5 control blocks - start of archive, start of file, end of file, start of index, end of archive, and the information in the control block is pretty limited. That's not a huge burden since there's support for extensions for things like the unix file metadata. One complication is that you need to support compression from the start. Zip files support two types of encryption. There's a really weak version that almost nobody supports and a much stronger modern version that's subject to license restrictions. (Some people use the weak version on embedded systems because of legal requirements to /do something/, no matter how lame.) There are third-party libraries, of course, but that introduces dependencies. Both formats are simple enough to write from scratch. I guess my bigger question is if there's an interest in either or both for real use. I'm doing this as an exercise but am willing to contrib the code if there's a general interest in it. (BTW the more complex object I'm working on is the .p12 keystore for digital certificates and private keys. We have everything we need in the openssl library so there's no additional third-party dependencies. I have a minimal FDW for the digital certificate itself and am now working on a way to access keys stored in a standard format on the filesystem instead of in the database itself. A natural fit is a specialized archive FDW. Unlike tar and zip it will have two payloads, the digital certificate and the (optionally encrypted) private key. It has searchable metadata, e.g., finding all records with a specific subject.) 
Bear On Mon, Aug 17, 2015 at 8:29 AM, Greg Stark st...@mit.edu wrote: On Mon, Aug 17, 2015 at 3:14 PM, Bear Giles bgi...@coyotesong.com wrote: I'm starting to work on a tar FDW as a proxy for a much more specific FDW. (It's the 'faster to build two and toss the first away' approach - tar lets me get the FDW stuff nailed down before attacking the more complex container.) It could also be useful in its own right, or as the basis for a zip file FDW. Hm. tar may be a bad fit where zip may be much easier. Tar has no index or table of contents. You have to scan the entire file to find all the members. IIRC Zip does have a table of contents at the end of the file. The most efficient way to process a tar file is to describe exactly what you want to happen with each member and then process it linearly from start to end (or until you've found the members you're looking for). Trying to return meta info and then go looking for individual members will be quite slow and have a large startup cost. -- greg
Re: [HACKERS] Raising our compiler requirements for 9.6
On Sat, Aug 15, 2015 at 8:03 PM, Andres Freund and...@anarazel.de wrote: That gave me new respect for STATIC_IF_INLINE. While it does add tedious work to the task of introducing a new batch of inline functions, the work is completely mechanical. Anyone can write it; anyone can review it; there's one correct way to write it. What's the advantage of using STATIC_IF_INLINE over just #ifndef FRONTEND? That doesn't work well for entire headers in my opinion, but for individual functions it shouldn't be a problem? In fact we've done it for years for MemoryContextSwitchTo(). And by the problem definition such functions cannot be required by frontend code. STATIC_IF_INLINE is really tedious because to know whether it works you essentially need to recompile postgres with inlines enabled/disabled. In fact STATIC_IF_INLINE does *not* even help here unless we also detect compilers that support inlining but balk when inline functions reference symbols not available. There was an example of that in the buildfarm as of 2014, c.f. a9baeb361d63 . We'd have to disable inlining for such compilers. The advantage of STATIC_IF_INLINE is that we had everything working in that model. We seem to be replacing problems that we had solved and code that worked on our entire buildfarm with new problems that we haven't solved yet and which don't seem to be a whole lot simpler than the ones they replaced. As far as I can see, the anticipated benefits of what we're doing here are: - Get a cleaner separation of frontend and backend headers (this could also be done independently of STATIC_IF_INLINE, but removing STATIC_IF_INLINE increases the urgency). - Eliminate multiple evaluations hazards. - Modest improvements to code generation. And the costs are: - Lots of warnings with compilers that are not smart about static inline, and probably significantly worse code generation. 
- The possibility that we may repeatedly break #define FRONTEND compilation when we add static inline functions, where instead adding macros would not have caused breakage, thus resulting in continual tinkering with the header files. What I'm basically worried about is that second one. Actually, what I'm specifically worried about is that we will break somebody's #ifdef FRONTEND code, they will eventually complain, and we will refuse to fix it because we don't think their use case is valid. If that happens even a few times, it will lead me to think that this whole effort was misguided. :-( -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] More WITH
On Mon, Aug 17, 2015 at 10:22:11AM -0700, Josh Berkus wrote: EXPLAIN [ANALYZE] Would be tricky. We don't currently have any way to wrap an EXPLAIN in any larger statement, do we? We do, but it's kinda horrible.

CREATE OR REPLACE FUNCTION get_something_from_explain(your_query text)
RETURNS TEXT
LANGUAGE plpgsql /* uh oh */
AS $$
DECLARE
    foo JSON;
BEGIN
    EXECUTE format('EXPLAIN (FORMAT json) %s', your_query) INTO foo;
    RETURN foo #>> '{bar,baz,quux}';
END;
$$;

Would be very useful for automated query analysis, though. Among many other things, certainly :) SHOW Not very useful, easy to work around (pg_settings). This particular one is just about being consistent, or the way I look at it, about avoiding surprising users with inconsistencies. Cheers, David. -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: [HACKERS] More WITH
On 08/17/2015 01:30 PM, Peter Geoghegan wrote: On Mon, Aug 17, 2015 at 10:22 AM, Josh Berkus j...@agliodbs.com wrote: Would be tricky. We don't currently have any way to wrap an EXPLAIN in any larger statement, do we? Would be very useful for automated query analysis, though. No. In the grammar, ExplainStmt expects the EXPLAIN to be at the top-level. Having it work any other way would require significant refactoring. Slightly apropos, I have wrapped EXPLAIN calls inside a function, such as one that gets back the result and then sends it off to http://explain.depesz.com, returning the URL cheers andrew
Re: [HACKERS] Configure with thread sanitizer fails the thread test
Ewan Higgs wrote: So I changed volatile to _Atomic and continued (patch is in thread_test_atomic.patch). I then ran it against sqlsmith. The good news: I didn't happen to find any problems in normal use. The bad news: I did find a lot of warnings about improper use of functions like malloc and free from signal handlers. There's a reason why we don't offer a threaded server ... The postmaster process in particular runs in a rather unusual arrangement, where most of the interesting stuff does happen in signal handlers. I doubt there's any chance that we would make it run in a threaded environment. -- Álvaro Herrera  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services
Re: [HACKERS] Configure with thread sanitizer fails the thread test
On 2015-08-17 07:37:45 +0000, Ewan Higgs wrote:
> So I changed volatile to _Atomic and continued (patch is in
> thread_test_atomic.patch). I then ran it against sqlsmith. The good
> news: I didn't happen to find any problems in normal use. The bad news:
> I did find a lot of warnings about improper use of functions like
> malloc and free from signal handlers. I attached the log under
> 'errors.log'.

Aren't pretty much all of those false positives? I checked the first few
and they all look like:

==
WARNING: ThreadSanitizer: signal-unsafe call inside of a signal (pid=26767)
    #0 free <null> (libtsan.so.0+0x00025d09)
    #1 AllocSetDelete /home/ehiggs/src/download/postgresql/src/backend/utils/mmgr/aset.c:643 (postgres+0x009ece3d)
    #2 MemoryContextDelete /home/ehiggs/src/download/postgresql/src/backend/utils/mmgr/mcxt.c:226 (postgres+0x009ef7cc)
    #3 MemoryContextDeleteChildren /home/ehiggs/src/download/postgresql/src/backend/utils/mmgr/mcxt.c:246 (postgres+0x009ef83b)
    #4 MemoryContextDelete /home/ehiggs/src/download/postgresql/src/backend/utils/mmgr/mcxt.c:209 (postgres+0x009ef798)
    #5 StartChildProcess /home/ehiggs/src/download/postgresql/src/backend/postmaster/postmaster.c:5211 (postgres+0x007b2e72)
    #6 reaper /home/ehiggs/src/download/postgresql/src/backend/postmaster/postmaster.c:2717 (postgres+0x007b44ac)
    #7 <null> <null> (libtsan.so.0+0x000236e3)
    #8 ServerLoop /home/ehiggs/src/download/postgresql/src/backend/postmaster/postmaster.c:1631 (postgres+0x007b6f78)
    #9 PostmasterMain /home/ehiggs/src/download/postgresql/src/backend/postmaster/postmaster.c:1274 (postgres+0x007b6f78)
    #10 main /home/ehiggs/src/download/postgresql/src/backend/main/main.c:223 (postgres+0x006f2da9)

This is after a fork. And fork is an async-signal-safe function. So it
seems tsan doesn't properly recognize that we actually escaped the
signal handler and are in a new process.
>  #include <stdio.h>
>  #include <stdlib.h>
> +#include <stdatomic.h>
>  #include <unistd.h>
>  #include <netdb.h>
>  #include <sys/types.h>
> @@ -84,11 +85,11 @@ static void func_call_2(void);
>
>  static pthread_mutex_t init_mutex = PTHREAD_MUTEX_INITIALIZER;
>
> -static volatile int thread1_done = 0;
> -static volatile int thread2_done = 0;
> +static _Atomic int thread1_done = 0;
> +static _Atomic int thread2_done = 0;
>
> -static volatile int errno1_set = 0;
> -static volatile int errno2_set = 0;
> +static _Atomic int errno1_set = 0;
> +static _Atomic int errno2_set = 0;
>
>  #ifndef HAVE_STRERROR_R
>  static char *strerror_p1;

We can't do that because we have to work on older compilers as well. My
suggestion for now would be to disable tsan during configure and only
enable it during the actual build (you can do stuff like
make COPT='-fsanitize...').

Greetings,

Andres Freund
[HACKERS] jsonb array-style subscripting
Hi,

Some time ago the array-style subscripting for the jsonb data type was
discussed in this mailing list. I think it will be quite convenient to
have such a nice syntax to update jsonb objects, so I'm trying to
implement this. I created a patch that allows doing something like this:

=# create TEMP TABLE test_jsonb_subscript (
       id int,
       test_json jsonb
   );

=# insert into test_jsonb_subscript values (1, '{}'), (2, '{}');

=# update test_jsonb_subscript set test_json['a']['a1']['a2'] = 42;

=# select * from test_jsonb_subscript;
 id |         test_json
----+---------------------------
  1 | {"a": {"a1": {"a2": 42}}}
  2 | {"a": {"a1": {"a2": 42}}}
(2 rows)

=# select test_json['a']['a1'] from test_jsonb_subscript;
 test_json
------------
 {"a2": 42}
 {"a2": 42}
(2 rows)

This patch has a status "work in progress" of course. Generally
speaking, this implementation extends the `ArrayRef` usage for the
jsonb. And I need some sort of advice about several questions:

* is it interesting for the community?
* is that a good idea to extend the `ArrayRef` for jsonb? If it's
  appropriate, probably we can rename it to `ArrayJsonbRef` or something.
* what can be improved in the code at the top level (function placement,
  probably, functionality duplication, etc.)?
* are there any special cases that I should take care of in this
  implementation?

jsonb_subscript2.patch
Description: Binary data
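For readers skimming the example above: assigning through a subscript
path creates any missing intermediate objects along the way, which is how
the '{}' rows end up as nested documents. A rough Python analogue of that
behavior (the helper name is invented for illustration; this is not the
patch's actual C implementation):

```python
def jsonb_path_set(doc, path, value):
    """Set a nested key, creating intermediate objects as needed,
    mirroring: update ... set test_json['a']['a1']['a2'] = 42."""
    node = doc
    for key in path[:-1]:
        node = node.setdefault(key, {})  # create missing levels
    node[path[-1]] = value
    return doc

doc = {}  # corresponds to the '{}' rows in test_jsonb_subscript
jsonb_path_set(doc, ["a", "a1", "a2"], 42)
print(doc)  # {'a': {'a1': {'a2': 42}}}
```

Existing sibling keys are left alone, which matches the intuition that
only the addressed path is replaced.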
Re: [HACKERS] More WITH
Peter Geoghegan p...@heroku.com writes:
> On Mon, Aug 17, 2015 at 10:22 AM, Josh Berkus j...@agliodbs.com wrote:
>> Would be tricky.  We don't currently have any way to wrap an EXPLAIN in
>> any larger statement, do we?  Would be very useful for automated query
>> analysis, though.

> No. In the grammar, ExplainStmt expects the EXPLAIN to be at the
> top-level. Having it work any other way would require significant
> refactoring.

You can use EXPLAIN as the source of rows in a plpgsql FOR-over-query
loop, so there's a workaround available that way when you need to read
EXPLAIN output programmatically.  I'm not convinced there's sufficient
value in trying to make EXPLAIN a full-fledged subquery otherwise.

			regards, tom lane
Re: [HACKERS] Raising our compiler requirements for 9.6
On 2015-08-17 12:30:56 -0400, Robert Haas wrote:
> As far as I can see, the anticipated benefits of what we're doing here are:
> - Get a cleaner separation of frontend and backend headers (this could
>   also be done independently of STATIC_IF_INLINE, but removing
>   STATIC_IF_INLINE increases the urgency).
> - Eliminate multiple evaluations hazards.
> - Modest improvements to code generation.

Plus:

* Not having 7k long macros, that e.g. need extra flags to even be
  supported. C.f.
  http://archives.postgresql.org/message-id/4407.1435763473%40sss.pgh.pa.us
* Easier development due to actual type checking and such.

Compare the errors from heap_getattr as a macro being passed a boolean
with the same from an inline function:

Inline:

/home/andres/src/postgresql/src/backend/executor/spi.c: In function ‘SPI_getvalue’:
/home/andres/src/postgresql/src/backend/executor/spi.c:883:46: error: incompatible type for argument 4 of ‘heap_getattr’
  val = heap_getattr(tuple, fnumber, tupdesc, isnull);
                                              ^
In file included from /home/andres/src/postgresql/src/backend/executor/spi.c:17:0:
/home/andres/src/postgresql/src/include/access/htup_details.h:765:1: note: expected ‘_Bool *’ but argument is of type ‘_Bool’
 heap_getattr(HeapTuple tup, int attnum, TupleDesc tupleDesc,
 ^

Macro:

In file included from /home/andres/src/postgresql/src/backend/executor/spi.c:17:0:
/home/andres/src/postgresql/src/backend/executor/spi.c: In function ‘SPI_getvalue’:
/home/andres/src/postgresql/src/include/access/htup_details.h:750:6: error: invalid type argument of unary ‘*’ (have ‘int’)
   (*(isnull) = true), \
      ^
/home/andres/src/postgresql/src/backend/executor/spi.c:883:8: note: in expansion of macro ‘heap_getattr’
  val = heap_getattr(tuple, fnumber, tupdesc, isnull);
        ^
/home/andres/src/postgresql/src/include/access/htup_details.h:750:23: warning: left-hand operand of comma expression has no effect [-Wunused-value]
   (*(isnull) = true), \
                     ^
/home/andres/src/postgresql/src/backend/executor/spi.c:883:8: note: in expansion of macro ‘heap_getattr’
  val = heap_getattr(tuple, fnumber, tupdesc, isnull);
        ^
/home/andres/src/postgresql/src/include/access/htup_details.h:697:3: error: invalid type argument of unary ‘*’ (have ‘int’)
   (*(isnull) = false), \
   ^
/home/andres/src/postgresql/src/include/access/htup_details.h:754:5: note: in expansion of macro ‘fastgetattr’
     fastgetattr((tup), (attnum), (tupleDesc), (isnull)) \
     ^
/home/andres/src/postgresql/src/backend/executor/spi.c:883:8: note: in expansion of macro ‘heap_getattr’
  val = heap_getattr(tuple, fnumber, tupdesc, isnull);
        ^
/home/andres/src/postgresql/src/include/access/htup_details.h:713:5: error: invalid type argument of unary ‘*’ (have ‘int’)
     (*(isnull) = true), \
     ^
/home/andres/src/postgresql/src/include/access/htup_details.h:754:5: note: in expansion of macro ‘fastgetattr’
     fastgetattr((tup), (attnum), (tupleDesc), (isnull)) \
     ^
/home/andres/src/postgresql/src/backend/executor/spi.c:883:8: note: in expansion of macro ‘heap_getattr’
  val = heap_getattr(tuple, fnumber, tupdesc, isnull);
        ^
/home/andres/src/postgresql/src/include/access/htup_details.h:713:22: warning: left-hand operand of comma expression has no effect [-Wunused-value]
     (*(isnull) = true), \
                      ^
/home/andres/src/postgresql/src/include/access/htup_details.h:754:5: note: in expansion of macro ‘fastgetattr’
     fastgetattr((tup), (attnum), (tupleDesc), (isnull)) \
     ^
/home/andres/src/postgresql/src/backend/executor/spi.c:883:8: note: in expansion of macro ‘heap_getattr’
  val = heap_getattr(tuple, fnumber, tupdesc, isnull);
        ^
/home/andres/src/postgresql/src/include/access/htup_details.h:697:21: warning: left-hand operand of comma expression has no effect [-Wunused-value]
   (*(isnull) = false), \
                     ^
/home/andres/src/postgresql/src/include/access/htup_details.h:754:5: note: in expansion of macro ‘fastgetattr’
     fastgetattr((tup), (attnum), (tupleDesc), (isnull)) \
     ^
/home/andres/src/postgresql/src/backend/executor/spi.c:883:8: note: in expansion of macro ‘heap_getattr’
  val = heap_getattr(tuple, fnumber, tupdesc, isnull);
        ^
/home/andres/src/postgresql/src/include/access/htup_details.h:757:50: warning: passing argument 4 of ‘heap_getsysattr’ makes pointer from integer without a cast [-Wint-conversion]
     heap_getsysattr((tup), (attnum), (tupleDesc), (isnull)) \
                                                  ^
/home/andres/src/postgresql/src/backend/executor/spi.c:883:8: note: in expansion of macro ‘heap_getattr’
  val = heap_getattr(tuple, fnumber, tupdesc, isnull);
        ^
/home/andres/src/postgresql/src/include/access/htup_details.h:771:14: note: expected ‘bool * {aka char *}’ but argument is of type ‘bool {aka char}’
 extern Datum
[HACKERS] More WITH
Folks,

In the interest of consistency, which is to say, of not hitting barriers
that are essentially implementation details, I'd like to propose that we
allow the rest of the row-returning commands inside WITH clauses.

We currently have:

    SELECT
    VALUES
    INSERT/UPDATE/DELETE ... RETURNING

We don't yet have:

    EXPLAIN [ANALYZE]
    SHOW
    FETCH

A little further out there, although this would be an API change, we
might consider allowing the results of VACUUM and ANALYZE as row sets,
which would also be good to wrap in WITH.

Is there a good reason, or more than one, why we shouldn't have all the
row-returning commands in WITH?

Cheers,
David.
--
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fet...@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: [HACKERS] More WITH
> EXPLAIN [ANALYZE]

Would be tricky.  We don't currently have any way to wrap an EXPLAIN in
any larger statement, do we?

Would be very useful for automated query analysis, though.

> SHOW

Not very useful, easy to work around (pg_settings).

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
Re: [HACKERS] More WITH
On Mon, Aug 17, 2015 at 10:22 AM, Josh Berkus j...@agliodbs.com wrote:
> Would be tricky.  We don't currently have any way to wrap an EXPLAIN in
> any larger statement, do we?  Would be very useful for automated query
> analysis, though.

No. In the grammar, ExplainStmt expects the EXPLAIN to be at the
top-level. Having it work any other way would require significant
refactoring.

--
Peter Geoghegan
Re: [HACKERS] Raising our compiler requirements for 9.6
On 2015-08-17 12:30:56 -0400, Robert Haas wrote:
> - The possibility that may repeatedly break #define FRONTEND compilation
>   when we add static inline functions, where instead adding macros would
>   not have caused breakage, thus resulting in continual tinkering with
>   the header files.

Again, that's really independent. Inlines have that problem, even with
STATIC_IF_INLINE. C.f. MemoryContextSwitch() and a9baeb361d.

Greetings,

Andres Freund
Re: [HACKERS] Memory allocation in spi_printtup()
Neil Conway neil.con...@gmail.com writes:

Hi Neil!  Long time no see.

> spi_printtup() has the following code (spi.c:1798):
>
>     if (tuptable->free == 0)
>     {
>         tuptable->free = 256;
>         tuptable->alloced += tuptable->free;
>         tuptable->vals = (HeapTuple *) repalloc(tuptable->vals,
>                                tuptable->alloced * sizeof(HeapTuple));
>     }
>
> i.e., it grows the size of the tuptable by a fixed amount when it runs
> out of space. That seems odd; doubling the size of the table would be
> more standard. Does anyone know if there is a rationale for this
> behavior?

Seems like it must be just legacy code.  We're only allocating pointers
here; the actual tuples will likely be significantly larger.  So there's
not a lot of reason not to use the normal doubling rule.

> Attached is a one-liner to double the size of the table when space is
> exhausted.

I think this could use a comment, but otherwise seems OK.

Should we back-patch this change?  Seems like it's arguably a
performance bug.

			regards, tom lane
[HACKERS] DTrace build dependency rules
Hi,

There seems to be a bug in the make rules when DTrace is enabled. It
causes "dtrace -G" to be invoked twice when building PostgreSQL as a
FreeBSD port: once during the build itself, and once during installation.
For a long time this has been worked around on FreeBSD with a change to
libdtrace itself, but this workaround is proving problematic and I'd
like to fix the problem properly. I'm not sure whether the problem has
been observed on other operating systems that support DTrace.

The bug is in src/backend/Makefile. probes.o, the dtrace(1)-generated
object file, depends on the objfiles.txt for each of the backend
subdirs. These files depend in turn on the object files themselves; if
objfiles.txt is out of date with respect to one of its object files, the
mtime of objfiles.txt is updated with touch (see backend/common.mk). The
problem is that "dtrace -G", which runs at the end of the build,
modifies a number of object files (it overwrites their probe sites with
NOPs), thus making their corresponding objfiles.txt out of date. Then,
when "make install" traverses the backend subdirs, it updates
objfiles.txt, which causes probes.o to be rebuilt, resulting in an error
from dtrace(1).

The attached patch fixes the problem by having probes.o depend on object
files directly, rather than on objfiles.txt. I've tested it with
PostgreSQL 9.0-9.4 on FreeBSD CURRENT.

Thanks!
-Mark

diff --git a/src/backend/Makefile b/src/backend/Makefile
index 98b978f..b1e3969 100644
--- a/src/backend/Makefile
+++ b/src/backend/Makefile
@@ -180,8 +180,8 @@ $(top_builddir)/src/include/utils/probes.h: utils/probes.h
 	$(LN_S) ../../../$(subdir)/utils/probes.h .
 
-utils/probes.o: utils/probes.d $(SUBDIROBJS)
-	$(DTRACE) $(DTRACEFLAGS) -C -G -s $(call expand_subsys,$^) -o $@
+utils/probes.o: utils/probes.d $(call expand_subsys,$(SUBDIROBJS))
+	$(DTRACE) $(DTRACEFLAGS) -C -G -s $< $(call expand_subsys,$(SUBDIROBJS)) -o $@
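The mtime cascade described above is easy to demonstrate outside
PostgreSQL. The following is a toy reproduction (not the real PostgreSQL
makefiles): "final" stands in for probes.o, "objfiles.txt" for a
per-subdirectory object list, and touching a.o plays the role of
"dtrace -G" rewriting an object file after the build finished.

```shell
set -e
dir=$(mktemp -d)
cd "$dir"

# Build the toy Makefile with printf so recipe lines get literal tabs.
printf 'final: objfiles.txt\n\tcp objfiles.txt final\n' > Makefile
printf 'objfiles.txt: a.o\n\ttouch objfiles.txt\n' >> Makefile
printf 'a.o:\n\ttouch a.o\n' >> Makefile

make -s final        # clean build: everything is now up to date
sleep 1
touch a.o            # simulate dtrace -G modifying the object file

# Dry-run the next make invocation (as "make install" would trigger):
# a.o is newer than objfiles.txt, so the rebuild cascades up to "final".
rebuild=$(make -sn final | grep -c 'cp objfiles.txt final')
echo "spurious rebuild steps: $rebuild"
```

With the patched dependency (probes.o depending on the object files
directly), there is no intermediate touch-ed file, so the second make
sees nothing to regenerate beyond the one genuinely changed prerequisite.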
Re: [HACKERS] Memory allocation in spi_printtup()
On Mon, Aug 17, 2015 at 7:56 AM, Tom Lane t...@sss.pgh.pa.us wrote:
> Hi Neil!  Long time no see.

Likewise :)

>> Attached is a one-liner to double the size of the table when space is
>> exhausted.
>
> I think this could use a comment, but otherwise seems OK.

Attached is a revised patch with a comment.

> Should we back-patch this change?  Seems like it's arguably a
> performance bug.

Sounds good to me.

Neil

diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index d544ad9..05a2b21 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -1797,7 +1797,8 @@ spi_printtup(TupleTableSlot *slot, DestReceiver *self)
 
 	if (tuptable->free == 0)
 	{
-		tuptable->free = 256;
+		/* Double the size of the table */
+		tuptable->free = tuptable->alloced;
 		tuptable->alloced += tuptable->free;
 		tuptable->vals = (HeapTuple *) repalloc(tuptable->vals,
 							tuptable->alloced * sizeof(HeapTuple));
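The difference between the two growth policies is easy to quantify in
isolation. A back-of-the-envelope sketch (Python, not the actual C code;
the 128-slot starting size is an illustrative assumption and does not
change the conclusion): fixed-increment growth copies O(n^2) pointers
over n insertions, while doubling keeps total copies under 2n.

```python
def count_copies(n, grow):
    """Count elements copied by repalloc-style reallocation while
    appending n items, under a given growth policy."""
    alloced, used, copies = 128, 0, 0  # assume 128 initial slots
    for _ in range(n):
        if used == alloced:
            copies += used             # repalloc copies the old pointers
            alloced = grow(alloced)
        used += 1
    return copies

n = 100_000
fixed = count_copies(n, lambda a: a + 256)   # old spi_printtup policy
doubled = count_copies(n, lambda a: 2 * a)   # patched policy
print(fixed, doubled)  # fixed-increment copies vastly more than doubling
assert doubled < 2 * n < fixed
```

This is why Tom calls it "arguably a performance bug": for a large SPI
result set the old policy spends quadratic time just shuffling pointers.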
Re: [HACKERS] checkpointer continuous flushing
On 2015-08-17 15:21:22 +0200, Fabien COELHO wrote:
> My current thinking is maybe yes, maybe no:-), as it may depend on the
> OS implementation of posix_fadvise, so it may differ between OS.

As long as fadvise has no 'undirty' option, I don't see how that problem
goes away. You're telling the OS to throw the buffer away, so unless it
ignores it that'll have consequences when you read the page back in.

> This is a reason why I think that flushing should be kept a guc, even
> if the sort guc is removed and always on. The sync_file_range
> implementation is clearly always very beneficial for Linux, and the
> posix_fadvise may or may not induce a good behavior depending on the
> underlying system.

That's certainly an argument.

> This is also a reason why the default value for the flush guc is
> currently set to false in the patch. The documentation should advise to
> turn it on for Linux and to test otherwise. Or if Linux is assumed to
> be often a host, then maybe to set the default to on and to suggest
> that on some systems it may be better to have it off.

I'd say it should then be an os-specific default. No point in making
people work for it needlessly on linux and/or elsewhere.

> (Another reason to keep it off is that I'm not sure about what happens
> with such HD flushing features on virtual servers).

I don't see how that matters? Either the host will entirely ignore
flushing, and thus the sync_file_range and the fsync won't cost much, or
fsync will be honored, in which case the pre-flushing is helpful.

> Overall, I'm not pessimistic, because I've seen I/O storms on a FreeBSD
> host and it was as bad as Linux (namely the database and even the box
> was offline for long minutes...), and if you can avoid that, having to
> read back some data may be not that bad a down payment.

I don't see how that'd alleviate my fear. Sure, the latency for many
workloads will be better, but I don't see how that argument says
anything about the reads? And we'll not just use this in cases it'd be
beneficial...

> The issue is largely mitigated if the data is not removed from
> shared_buffers, because the OS buffer is just a copy of already held
> data. What I would do on such systems is to increase shared_buffers and
> keep flushing on, that is to count less on the system cache and more on
> postgres own cache.

That doesn't work that well for a bunch of reasons. For one it's
completely non-adaptive. With the OS's page cache you can rely on free
memory being used for caching *and* it being available should a query or
another program need lots of memory.

> Overall, I'm not convinced that the practice of relying on the OS cache
> is a good one, given what it does with it, at least on Linux.

The alternatives aren't super realistic near-term though. Using direct
IO efficiently on the set of operating systems we support is *hard*.
It's more or less trivial to hack pg up to use direct IO for
relations/shared_buffers, but it'll perform utterly horribly in many
many cases. To pick one thing out: Without the OS buffering writes, any
write will have to wait for the disks, instead of being asynchronous.
That'll make writes performed by backends a massive bottleneck.

> Now, if someone could provide a dedicated box with posix_fadvise (say
> FreeBSD, maybe others...) for testing, that would allow to provide data
> instead of speculating... and then maybe to decide to change its
> default value.

Testing, as an approximation, how it turns out to work on linux would be
a good step.

Greetings,

Andres Freund
[HACKERS] missing documentation for partial WAL files
The commit message for de76884 contains some important information about the purpose and use of the new .partial WAL files. But I don't see anything about this in the documentation or another user-visible place. We should probably add something.
Re: [HACKERS] Test code is worth the space
On 8/15/15 4:45 AM, Petr Jelinek wrote:
> We could fix a) by adding ORDER BY to those queries but I don't see how
> to fix the rest easily or at all without sacrificing some test coverage.

Hopefully at some point we'll have test frameworks that don't depend on
capturing raw psql output, which presumably makes dealing with a lot of
this easier.

Maybe some of this could be handled by factoring BLKSZ into the
queries... A really crude solution would be to have expected output be
BLKSZ dependent. No one will remember to test that by hand, but if we
have CI setup then maybe it's workable...

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Data in Trouble? Get it in Treble! http://BlueTreble.com
Re: [HACKERS] Raising our compiler requirements for 9.6
On Mon, Aug 17, 2015 at 12:36 PM, Andres Freund and...@anarazel.de wrote:
> On 2015-08-17 12:30:56 -0400, Robert Haas wrote:
>> - The possibility that may repeatedly break #define FRONTEND
>>   compilation when we add static inline functions, where instead adding
>>   macros would not have caused breakage, thus resulting in continual
>>   tinkering with the header files.
>
> Again, that's really independent. Inlines have that problem, even with
> STATIC_IF_INLINE. C.f. MemoryContextSwitch() and a9baeb361d.

Inlines, yes, but macros don't. I'm not saying we shouldn't do this, but
I *am* saying that we need to be prepared to treat breaking FRONTEND
compilation as a problem, not just today and tomorrow, but way off into
the future. It's not at all a stretch to think that we could still be
hitting fallout from these changes in 2-3 years time.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Test code is worth the space
On 8/16/15 8:48 AM, Greg Stark wrote: On Sun, Aug 16, 2015 at 7:33 AM, Noah Misch n...@leadboat.com wrote: When I've just spent awhile implementing a behavior change, the test diffs are a comforting sight. They confirm that the test suite exercises the topic I just changed. Furthermore, most tests today do not qualify under this stringent metric you suggest. The nature of pg_regress opposes it. It's not a comforting sight for me. It means I need to change the test results and then of course the tests will pass. So they didn't really test anything and I don't really know whether the changes broke anything. Any test I had to adjust for my change was effectively worthless. This is precisely my point about pg_regress and why it's the reason there's pressure not to have extensive tests. It's useful to have some end-to-end black box tests like this but it's self-limiting. You can't test many details without tying the tests to specific behaviour. I have worked with insanely extensive testing suites that didn't have this problem. But they were unit tests for code that was structured around testability. When the API is designed to be testable and you have good infrastructure for mocking up the environment and comparing function results in a much narrower way than just looking at text output you can test the correctness of each function without tying the test to the specific behaviour. *That* gives a real comfort. When you change a function to behave entirely differently and can still run all the tests and see that all the code that depends on it still operate correctly then you know you haven't broken anything. In that case it actually *speeds up* development rather than slowing it down. It lets newbie developers hack on code where they don't necessarily know all the downstream dependent code and not be paralyzed by fear that they'll break something. As someone who oversaw a database test suite with almost 500 test files (each testing multiple cases), I completely agree. 
Our early framework was actually similar to pg_regress and that worked OK
up to about 200 test files, but it got very unwieldy very quickly. We
switched to pgTAP and it was far easier to work with.

I suspect any effort to significantly improve Postgres test coverage is
doomed until there's an alternative to pg_regress.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Data in Trouble? Get it in Treble! http://BlueTreble.com
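To make the contrast in this subthread concrete, here is a toy
illustration (Python, with an invented function; nothing PostgreSQL-
specific): a pg_regress-style test pins the exact output text, so any
cosmetic change breaks it, while a contract-style test pins only the
behavior, so a refactor that preserves behavior still passes.

```python
def format_size(nbytes):
    """Toy function under test (a stand-in for any internal routine)."""
    units = ["B", "kB", "MB"]
    i = 0
    while nbytes >= 1024 and i < len(units) - 1:
        nbytes /= 1024
        i += 1
    return f"{nbytes:g} {units[i]}"

# pg_regress style: diff captured output against a stored expected file.
# A cosmetic change to the text breaks this without telling you whether
# behavior actually changed.
expected_output = "2 kB\n"
assert format_size(2048) + "\n" == expected_output

# Contract style: assert properties that must hold however the function
# is implemented; an internal rewrite that preserves behavior stays green.
assert format_size(0) == "0 B"
assert format_size(2048).endswith("kB")
assert float(format_size(1536).split()[0]) == 1.5
print("all checks passed")
```

The second style is what frameworks like pgTAP (and unit tests against a
mockable API) make practical, which is the point being argued above.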
Re: [HACKERS] missing documentation for partial WAL files
On Mon, Aug 17, 2015 at 11:50 AM, Peter Eisentraut pete...@gmx.net wrote:
> The commit message for de76884 contains some important information
> about the purpose and use of the new .partial WAL files. But I don't
> see anything about this in the documentation or another user-visible
> place. We should probably add something.

Uh, some documentation around .ready files would be nice too.

--
Peter Geoghegan
Re: [HACKERS] Configure with thread sanitizer fails the thread test
On 2015-08-17 14:31:24 -0300, Alvaro Herrera wrote:
> The postmaster process in particular runs in a rather unusual
> arrangement, where most of the interesting stuff does happen in signal
> handlers.

FWIW, I think it might be worthwhile to convert postmaster into a loop
over a process local latch, with that latch being set in signal
handlers. My feeling is that that'd simplify the code rather
significantly. I'm not 100% sure it's worth the code churn, but it'd
definitely be easier to understand.

Thread sanitizer isn't the first analysis tool that has problems coping
with forks in signal handlers btw; valgrind on amd64 for a long while
had misaligned stacks in the children afterwards, leading to very odd
crashes.

Greetings,

Andres Freund
Re: [HACKERS] pgbench bug in head
On 08/11/2015 06:44 PM, Fabien COELHO wrote:
> Probably since 1bc90f7a (shame on me) pgbench does not report skipped
> transactions (-L) counts properly: data are not aggregated over all
> threads as they should be, either under --progress or in the end of run
> report.
>
> The attached patch fixes these regressions.

Applied, thanks.

- Heikki
Re: [HACKERS] Autonomous Transaction is back
Noah Misch wrote:
> If the autonomous transaction can interact with uncommitted work in a
> way that other backends could not, crazy things happen when the
> autonomous transaction commits and the suspended transaction aborts:
>
> CREATE TABLE t (c) AS SELECT 1;
> BEGIN;
> UPDATE t SET c = 2 WHERE c = 1;
> BEGIN_AUTONOMOUS;
> UPDATE t SET c = 3 WHERE c = 1;
> UPDATE t SET c = 4 WHERE c = 2;
> COMMIT_AUTONOMOUS;
> ROLLBACK;
>
> If you replace the autonomous transaction with a savepoint, the c=3
> update finds no rows, and the c=4 update changes one row. When the
> outer transaction aborts, only the original c=1 row remains live. If
> you replace the autonomous transaction with a dblink/pg_background
> call, the c=3 update waits indefinitely for c=2 to commit or abort, an
> undetected deadlock.
>
> My starting expectation is that the semantics of an autonomous
> transaction will be exactly those of dblink/pg_background. (I said that
> during the unconference session.) The application would need to read
> data from tables before switching to the autonomous section. Autonomous
> transactions are then a performance and syntactic help, not a source of
> new semantics. Does any database have autonomous transactions that do
> otherwise?

Oracle behaves like that, i.e. it deadlocks with your example:

SQL> SELECT * FROM t;

         C
----------
         1

SQL> CREATE PROCEDURE proc2 IS
  2  PRAGMA AUTONOMOUS_TRANSACTION;
  3  BEGIN
  4     UPDATE t SET c = 3 WHERE c = 1;
  5     UPDATE t SET c = 4 WHERE c = 2;
  6     COMMIT;
  7  END;
  8  /

Procedure created.

SQL> CREATE PROCEDURE proc1 IS
  2  BEGIN
  3     UPDATE t SET c = 2 WHERE c = 1;
  4     proc2;
  5     ROLLBACK;
  6  END;
  7  /

Procedure created.

SQL> CALL proc1();
CALL proc1()
     *
ERROR at line 1:
ORA-00060: deadlock detected while waiting for resource
ORA-06512: at "LAURENZ.PROC2", line 4
ORA-06512: at "LAURENZ.PROC1", line 4

Yours,
Laurenz Albe
Re: [HACKERS] pgbench stats per script & other stuff
v7 is a rebase after another small bug fix in pgbench. v8 is a rebase after yet another small bug fix in pgbench. -- Fabien.diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml index 2517a3a..99670d4 100644 --- a/doc/src/sgml/ref/pgbench.sgml +++ b/doc/src/sgml/ref/pgbench.sgml @@ -261,6 +261,23 @@ pgbench optional replaceableoptions/ /optional replaceabledbname/ benchmarking arguments: variablelist + varlistentry + termoption-b/ replaceablescriptname[@weight]//term + termoption--builtin/ replaceablescriptname[@weight]//term + listitem + para +Add the specified builtin script to the list of executed scripts. +An optional integer weight after literal@/ allows to adjust the +probability of drawing the test. +Available builtin scripts are: literaltpcb-like/, +literalsimple-update/ and literalselect-only/. +The provided repleacablescriptname/ needs only to be a prefix +of the builtin name, hence literalsimp/ would be enough to select +literalsimple-update/. + /para + /listitem + /varlistentry + varlistentry termoption-c/option replaceableclients//term @@ -307,14 +324,15 @@ pgbench optional replaceableoptions/ /optional replaceabledbname/ /varlistentry varlistentry - termoption-f/option replaceablefilename//term - termoption--file=/optionreplaceablefilename//term + termoption-f/ replaceablefilename[@weight]//term + termoption--file=/replaceablefilename[@weight]//term listitem para -Read transaction script from replaceablefilename/. +Add a transaction script read from replaceablefilename/ to +the list of executed scripts. +An optional integer weight after literal@/ allows to adjust the +probability of drawing the test. See below for details. -option-N/option, option-S/option, and option-f/option -are mutually exclusive. 
/para /listitem /varlistentry @@ -404,10 +422,7 @@ pgbench optional replaceableoptions/ /optional replaceabledbname/ termoption--skip-some-updates/option/term listitem para -Do not update structnamepgbench_tellers/ and -structnamepgbench_branches/. -This will avoid update contention on these tables, but -it makes the test case even less like TPC-B. +Shorthand for option-b simple-update@1/. /para /listitem /varlistentry @@ -499,9 +514,9 @@ pgbench optional replaceableoptions/ /optional replaceabledbname/ Report the specified scale factor in applicationpgbench/'s output. With the built-in tests, this is not necessary; the correct scale factor will be detected by counting the number of -rows in the structnamepgbench_branches/ table. However, when testing -custom benchmarks (option-f/ option), the scale factor -will be reported as 1 unless this option is used. +rows in the structnamepgbench_branches/ table. +However, when testing only custom benchmarks (option-f/ option), +the scale factor will be reported as 1 unless this option is used. /para /listitem /varlistentry @@ -511,7 +526,7 @@ pgbench optional replaceableoptions/ /optional replaceabledbname/ termoption--select-only/option/term listitem para -Perform select-only transactions instead of TPC-B-like test. +Shorthand for option-b select-only@1/. /para /listitem /varlistentry @@ -567,6 +582,16 @@ pgbench optional replaceableoptions/ /optional replaceabledbname/ /varlistentry varlistentry + termoption--per-script-stats/option/term + listitem + para +Report some statistics per script run by pgbench. 
+ /para + /listitem + /varlistentry + + + varlistentry termoption--sampling-rate=replaceablerate//option/term listitem para @@ -661,7 +686,20 @@ pgbench optional replaceableoptions/ /optional replaceabledbname/ titleWhat is the quoteTransaction/ Actually Performed in pgbench?/title para - The default transaction script issues seven commands per transaction: + Pgbench executes test scripts chosen randomly from a specified list. + They include built-in scripts with option-b/ and + user-provided custom scripts with option-f/. + Each script may be given a relative weight specified after a + literal@/ so as to change its drawing probability. + The default weight is literal1/. + /para + + para + The default builtin transaction script (also invoked with option-b tpcb-like/) + issues seven commands per transaction over randomly chosen literalaid/, + literaltid/, literalbid/ and literalbalance/. + The scenario is inspired by the TPC-B benchmark, but is not actually TPC-B, + hence the name. /para orderedlist @@ -675,9
Re: [HACKERS] replication slot restart_lsn initialization
On 8/14/15 3:54 AM, Andres Freund wrote: I'd name it RESERVE_WAL. My feeling is that the options for the logical case are geared towards the output plugin, not the walsender. I think it'd be confusing to use (slot_options) differently for physical slots. Why reserve? Isn't it preserve? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb array-style subscripting
On 08/17/2015 03:26 PM, Merlin Moncure wrote: I'm not sure if this: update test_jsonb_subscript set test_json['a']['a1']['a2'] = 42; ...is a good idea. postgres operators tend to return immutable copies of the item they are referring to. In other words, you'd never see a column operator on the 'left' side of the equals in an update statement. I think you need to look at a function to get the behavior you want: update test_jsonb_subscript set test_json = jsonb_modify(test_json, '[a][a1][a2] = 42'); ...as a hypothetical example. The idea is you need to make a function that provides the ability to make the complete json you want. Update statements always make a copy of the record anyways. Why should jsonb be different from an array? You can assign to an array element, using exactly this syntax, except that the index expressions have to be integers. This was discussed at pgcon and generally met with approval. There is some demand for it. If Dmitry hadn't done this I would probably have done it myself. cheers andrew
Re: [HACKERS] Error message with plpgsql CONTINUE
On 8/17/15 9:48 AM, Tom Lane wrote: I'm inclined to think that if we wanted to make this better, the way to improve it would be to detect the error *at compile time*, and get rid of this hack in plpgsql_exec_function altogether. pl_gram.y already successfully detects cases where CONTINUE mentions a label that doesn't exist or isn't surrounding the CONTINUE. What it is missing is that we don't distinguish labels on loops from labels on non-loop statements, and thus it can't tell if CONTINUE is referencing a non-loop label or has no label but is not inside any loop-type statement. Seems like that detail could be added to the PLpgSQL_nsitem data structure without a huge amount of work. So split PLPGSQL_NSTYPE_LABEL into PLPGSQL_NSTYPE_BLOCK_LABEL and PLPGSQL_NSTYPE_LOOP_LABEL, and split opt_block_label and opt_label the same way? -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Data in Trouble? Get it in Treble! http://BlueTreble.com
Re: [HACKERS] jsonb array-style subscripting
On Mon, Aug 17, 2015 at 12:26 PM, Merlin Moncure mmonc...@gmail.com wrote: I'm not sure if this: update test_jsonb_subscript set test_json['a']['a1']['a2'] = 42; ...is a good idea. This kind of targetlist indirection is already possible with arrays and composite types. -- Peter Geoghegan
Re: [HACKERS] jsonb array-style subscripting
On 8/17/15 12:57 PM, Dmitry Dolgov wrote: * is it interesting for the community? We definitely need better ways to manipulate JSON. * is that a good idea to extend the `ArrayRef` for jsonb? If it's appropriate, probably we can rename it to `ArrayJsonbRef` or something. * what can be improved in the code at the top level (function placement, probably, functionality duplication, etc.)? * are there any special cases, that I should take care of in this implementation? How would this work when you have a JSON array? Postgres array syntax suddenly becoming key/value syntax for JSON seems like a pretty bad idea to me. Could a different syntax (maybe {}) be used instead? I'm not sure having the UPDATE you show cause objects to spring to life is so great either. -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Data in Trouble? Get it in Treble! http://BlueTreble.com
Re: [HACKERS] jsonb array-style subscripting
On Mon, Aug 17, 2015 at 12:57 PM, Dmitry Dolgov 9erthali...@gmail.com wrote: Hi, Some time ago the array-style subscripting for the jsonb data type was discussed in this mailing list. I think it will be quite convenient to have such a nice syntax to update jsonb objects, so I'm trying to implement this. I created a patch, that allows doing something like this:

=# create TEMP TABLE test_jsonb_subscript ( id int, test_json jsonb );
=# insert into test_jsonb_subscript values (1, '{}'), (2, '{}');
=# update test_jsonb_subscript set test_json['a']['a1']['a2'] = 42;
=# select * from test_jsonb_subscript;
 id |         test_json
----+---------------------------
  1 | {"a": {"a1": {"a2": 42}}}
  2 | {"a": {"a1": {"a2": 42}}}
(2 rows)

=# select test_json['a']['a1'] from test_jsonb_subscript;
 test_json
------------
 {"a2": 42}
 {"a2": 42}
(2 rows)

This patch has a status "work in progress" of course. Generally speaking, this implementation extends the `ArrayRef` usage for the jsonb. And I need some sort of advice about several questions: * is it interesting for the community? * is that a good idea to extend the `ArrayRef` for jsonb? If it's appropriate, probably we can rename it to `ArrayJsonbRef` or something. * what can be improved in the code at the top level (function placement, probably, functionality duplication, etc.)? * are there any special cases, that I should take care of in this implementation? I'm not sure if this: update test_jsonb_subscript set test_json['a']['a1']['a2'] = 42; ...is a good idea. postgres operators tend to return immutable copies of the item they are referring to. In other words, you'd never see a column operator on the 'left' side of the equals in an update statement. I think you need to look at a function to get the behavior you want: update test_jsonb_subscript set test_json = jsonb_modify(test_json, '[a][a1][a2] = 42'); ...as a hypothetical example. The idea is you need to make a function that provides the ability to make the complete json you want. Update statements always make a copy of the record anyways.
merlin
Re: [HACKERS] checkpointer continuous flushing
Hello Andres, [...] posix_fadvise(). My current thinking is maybe yes, maybe no:-), as it may depend on the OS implementation of posix_fadvise, so it may differ between OSes. As long as fadvise has no 'undirty' option, I don't see how that problem goes away. You're telling the OS to throw the buffer away, so unless it ignores it that'll have consequences when you read the page back in. Yep, probably. Note that we are talking about checkpoints, which write buffers out *but* keep them nevertheless. As the buffer is kept, the OS page is a duplicate, and freeing it should not harm, at least immediately. The situation is different if the memory is reused in between, which is the work of the bgwriter I think, based on LRU/LFU heuristics, but such writes are not flushed by the current patch. Now, if a buffer was recently updated it should not be selected by the bgwriter, if the LRU/LFU heuristics works as expected, which mitigates the issue somehow... To sum up, I agree that it is indeed possible that flushing with posix_fadvise could reduce read OS-memory hits on some systems for some workloads, although not on Linux, see below. So the option is best kept as off for now, without further data, I'm fine with that. [...] I'd say it should then be an os-specific default. No point in making people work for it needlessly on linux and/or elsewhere. Ok. Version 9 attached does that, on for Linux, off for others because of the potential issues you mentioned. (Another reason to keep it off is that I'm not sure about what happens with such HD flushing features on virtual servers). I don't see how that matters? Either the host will entirely ignore flushing, and thus the sync_file_range and the fsync won't cost much, or fsync will be honored, in which case the pre-flushing is helpful. Possibly.
I know that I do not know:-) The distance between the database and real hardware is so great in VM, that I think that it may have any effect, including good, bad or none:-) Overall, I'm not pessimistic, because I've seen I/O storms on a FreeBSD host and it was as bad as Linux (namely the database and even the box was offline for long minutes...), and if you can avoid that having to read back some data may be not that bad a down payment. I don't see how that'd alleviate my fear. I'm trying to mitigate your fears, not to alleviate them:-) Sure, the latency for many workloads will be better, but I don't how that argument says anything about the reads? It just says that there may be a compromise, better in some case, possibly not so in others, because posix_fadvise does not really say what the database would like to say to the OS, this is why I wrote such a large comment about it in the source file in the first place. And we'll not just use this in cases it'd be beneficial... I'm fine if it is off by default for some systems. If people want to avoid write stalls they can use the option, but it may have adverse effect on the tps in some cases, that's life? Not using the option also has adverse effects in some cases, because you have write stalls... and currently you do not have the choice, so it would be a progress. The issue is largely mitigated if the data is not removed from shared_buffers, because the OS buffer is just a copy of already hold data. What I would do on such systems is to increase shared_buffers and keep flushing on, that is to count less on the system cache and more on postgres own cache. That doesn't work that well for a bunch of reasons. For one it's completely non-adaptive. With the OS's page cache you can rely on free memory being used for caching *and* it be available should a query or another program need lots of memory. Yep. I was thinking about a dedicated database server, not a shared one. 
Overall, I'm not convinced that the practice of relying on the OS cache is a good one, given what it does with it, at least on Linux. The alternatives aren't super realistic near-term though. Using direct IO efficiently on the set of operating systems we support is *hard*. [...] Sure. This is not necessarily what I had in mind. Currently pg writes stuff to the OS, and then suddenly calls fsync out of the blue, hoping that in between the OS will actually have done a good job with the underlying hardware. This is pretty naive: the fsync generates write storms, and the database is offline: trying to improve these things is the motivation for this patch. Now if you think of the bgwriter, it does pretty much the same, and probably may generate plenty of random I/Os, because the underlying LRU/LFU heuristics used to select buffers does not care about the file structures. So I think that to get good performance the database must take some control over the OS. That does not mean that direct I/O needs to be involved, although maybe it could, but this patch shows that it is not needed to improve things. Now, if someone could provide a
Re: [HACKERS] jsonb array-style subscripting
Dmitry Dolgov wrote: Some time ago the array-style subscripting for the jsonb data type was discussed in this mailing list. I think it will be quite convenient to have such a nice syntax to update jsonb objects, so I'm trying to implement this. I created a patch, that allows doing something like this: =# update test_jsonb_subscript set test_json['a']['a1']['a2'] = 42; FWIW we discussed exactly this during the unconference https://wiki.postgresql.org/wiki/PgCon_2015_Developer_Unconference#Direction_of_json_and_jsonb and Andrew and Tom seemed okay with this syntax. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services
Re: [HACKERS] replication slot restart_lsn initialization
On 2015-08-17 15:22:44 -0400, Peter Eisentraut wrote: On 8/14/15 3:54 AM, Andres Freund wrote: I'd name it RESERVE_WAL. Why reserve? Isn't it preserve? Hm. I honestly do not know. I was thinking of ticket reservations...
[HACKERS] [patch] psql tab completion for grant execute
Hi, When tab-completing after GRANT EXECUTE, currently psql injects PROCEDURE, rather than the expected ON. The code for completing with ON is there, but it's not reached due to falling earlier into another branch, one that handles CREATE TRIGGER. A trivial patch is attached. It adds the condition that if EXECUTE is preceded by GRANT itself preceded by nothing, then that completion with PROCEDURE is skipped. I've looked at fixing it more directly, by testing if the EXECUTE is part of a CREATE TRIGGER, but it didn't seem fitting to go looking backwards that many words into the string (more than the 5 words suggested by the rest of the code). Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite

diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 62cb721..816deda 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -2622,6 +2622,7 @@ psql_completion(const char *text, int start, int end)
 		COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_views, NULL);
 	/* complete CREATE TRIGGER ... EXECUTE with PROCEDURE */
 	else if (pg_strcasecmp(prev_wd, "EXECUTE") == 0 &&
+			 !(pg_strcasecmp(prev2_wd, "GRANT") == 0 && prev3_wd[0] == '\0') &&
 			 prev2_wd[0] != '\0')
 		COMPLETE_WITH_CONST("PROCEDURE");
Re: [HACKERS] jsonb array-style subscripting
On 8/17/15 3:33 PM, Josh Berkus wrote: Again, how do we handle missing keys? Just return NULL? or ERROR? I'd prefer the former, but there will be arguments the other way. I've been wondering if we should add some kind of strict JSON. My big concern is throwing an error if you try to provide duplicate keys, but it seems reasonable that json_strict would throw an error if you try to reference something that doesn't exist. This patch has a status "work in progress" of course. Generally speaking, this implementation extends the `ArrayRef` usage for the jsonb. And I need some sort of advice about several questions: * is it interesting for the community? * is that a good idea to extend the `ArrayRef` for jsonb? If it's appropriate, probably we can rename it to `ArrayJsonbRef` or something. * what can be improved in the code at the top level (function placement, probably, functionality duplication, etc.)? * are there any special cases, that I should take care of in this implementation? array/key ambiguity is going to be painful. JSON keys are required to be strings, so maybe it's OK to differentiate based on whether the index is a string or a number. Or perhaps we use different nomenclature (ie: {} for objects). We should also think about what would work for hstore. Adding this for hstore is clearly separate work, but it'd be bad to close that door here. -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Data in Trouble? Get it in Treble! http://BlueTreble.com
Re: [HACKERS] jsonb array-style subscripting
Dmitry Dolgov 9erthali...@gmail.com writes: * is that a good idea to extend the `ArrayRef` for jsonb? No. Make a new expression node type. (Salesforce did something similar for an internal feature, and it was a disaster both for code modularity and performance. We had to change it to a separate node type, which I just got finished doing. Don't go down that path. While you're at it, I'd advise that fetch and assignment be two different node types rather than copying ArrayRef's bad precedent of using only one.) regards, tom lane
Re: [HACKERS] jsonb array-style subscripting
On 08/17/2015 02:18 PM, Jim Nasby wrote: On 8/17/15 3:33 PM, Josh Berkus wrote: Again, how do we handle missing keys? Just return NULL? or ERROR? I'd prefer the former, but there will be arguments the other way. I've been wondering if we should add some kind of strict JSON. My big concern is throwing an error if you try to provide duplicate keys, but it seems reasonable that json_strict would throw an error if you try to reference something that doesn't exist. Only if there's demand for it. Is there? array/key ambiguity is going to be painful. JSON keys are required to be strings, so maybe it's OK to differentiate based on whether the index is a string or a number. Or perhaps we use different nomenclature (ie: {} for objects). Well, we did get rid of all of those implicit conversions for a reason. So maybe that's good enough? i.e.

json['a']['b'][1] = 5    -- assign 5 as the first element in the array 'b' of object 'a'
json['a']['b']['1'] = 5  -- assign 5 to key '1' of object 'b' of object 'a'

-- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
[HACKERS] Memory allocation in spi_printtup()
spi_printtup() has the following code (spi.c:1798):

	if (tuptable->free == 0)
	{
		tuptable->free = 256;
		tuptable->alloced += tuptable->free;
		tuptable->vals = (HeapTuple *) repalloc(tuptable->vals,
									  tuptable->alloced * sizeof(HeapTuple));
	}

i.e., it grows the size of the tuptable by a fixed amount when it runs out of space. That seems odd; doubling the size of the table would be more standard. Does anyone know if there is a rationale for this behavior? Attached is a one-liner to double the size of the table when space is exhausted. Constructing a simple test case in which a large result is materialized via SPI_execute() (e.g., by passing two large queries to crosstab() from contrib/tablefunc), this change reduces the time required to exceed the palloc size limit from ~300 seconds to ~93 seconds on my laptop. Of course, using SPI_execute() rather than cursors for queries with large result sets is not a great idea, but there is demonstrably code that does this (e.g., contrib/tablefunc -- I'll send a patch for that shortly). Neil

diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index d544ad9..2573b3f 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -1797,7 +1797,7 @@ spi_printtup(TupleTableSlot *slot, DestReceiver *self)
 	if (tuptable->free == 0)
 	{
-		tuptable->free = 256;
+		tuptable->free = tuptable->alloced;
 		tuptable->alloced += tuptable->free;
 		tuptable->vals = (HeapTuple *) repalloc(tuptable->vals,
									  tuptable->alloced * sizeof(HeapTuple));
Re: [HACKERS] Configure checks and optimizations/crc32 tests
On 08/14/2015 07:42 PM, Andres Freund wrote: Going over my vpaths I noticed another problem with the test. With gcc I get slice-by-8 instead of the runtime variant:

checking for builtin __atomic int64 atomic operations... yes
checking for __get_cpuid... yes
checking for __cpuid... no
checking for _mm_crc32_u8 and _mm_crc32_u32 with CFLAGS=... no
checking for _mm_crc32_u8 and _mm_crc32_u32 with CFLAGS=-msse4.2... no
checking which CRC-32C implementation to use... slicing-by-8

That's because I get a warning

conftest.c:179:1: warning: old-style function definition [-Wold-style-definition]
 main ()
 ^

and PGAC_SSE42_CRC32_INTRINSICS turns on ac_c_werror_flag. Now I can work around this by , but I don't really see why that test needs to turn on -Werror? Isn't the point of using a linker test that we'd get a proper linker failure if the functions don't exist? Yeah, it was probably just copy-pasted from the other macros in c-compiler.m4 without thinking. - Heikki
Re: [HACKERS] Error message with plpgsql CONTINUE
2015-08-17 6:19 GMT+02:00 Jim Nasby jim.na...@bluetreble.com: Calling CONTINUE with a label that's not a loop produces an error message with no context info [1]. This is because of

	rc = exec_stmt_block(&estate, func->action);
	if (rc != PLPGSQL_RC_RETURN)
	{
		estate.err_stmt = NULL;
		estate.err_text = NULL;

I trawled through git blame a bit and it looks like it's been that way for a very long time. I think err_stmt should probably only be reset in the non-return case a bit below that. I'm not sure about err_text though. Also, the code treats PLPGSQL_RC_OK and PLPGSQL_RC_EXIT the same, which seems like a bug; I would think PLPGSQL_RC_EXIT should be handled the same way as CONTINUE. If someone can confirm this and tell me what to do about err_text I'll submit a patch. maybe during function exit ? Regards Pavel

[1] decibel@decina.attlocal/50703=# do $$
begin
  <<outer>>
  for a in 1..3 loop
    <<sub>>
    BEGIN
      <<inner>>
      for b in 8..9 loop
        if a=2 then
          continue sub;
        end if;
        raise notice '% %', a, b;
      end loop inner;
    END sub;
  end loop outer;
end; $$;
NOTICE:  1 8
NOTICE:  1 9
ERROR:  CONTINUE cannot be used outside a loop
CONTEXT:  PL/pgSQL function inline_code_block
decibel@decina.attlocal/50703=#

[2] https://github.com/postgres/postgres/blob/83604cc42353b6c0de2a3f3ac31f94759a9326ae/src/pl/plpgsql/src/pl_exec.c#L438 -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Data in Trouble? Get it in Treble! http://BlueTreble.com
Re: [HACKERS] Potential GIN vacuum bug
On 08/16/2015 12:58 AM, Jeff Janes wrote: When ginbulkdelete gets called for the first time in a VACUUM (i.e. stats == NULL), one of the first things it does is call ginInsertCleanup to get rid of the pending list. It does this in lieu of vacuuming the pending list. This is important because if there are any dead tids still in the pending list, someone else could come along during the vacuum and post the dead tids into a part of the index that VACUUM has already passed over. The potential bug is that ginInsertCleanup exits early (ginfast.c lines 796, 860, 898) if it detects that someone else is cleaning up the pending list, without waiting for that someone else to finish the job. Isn't this a problem? Yep, I think you're right. When that code runs as part of VACUUM, it should not give up like that. Hmm, I see other race conditions in that code too. Even if VACUUM wins the race you spotted, and performs all the insertions, reaches the end of the pending items list, and deletes the pending list pages, it's possible that another backend started earlier, and is still processing the same items from the pending items list. It will add them to the tree, and after it's finished with that it will see that the pending list page was already deleted, and bail out. But if there is a dead tuple in the pending items list, you have trouble. The other backend will re-insert it, and that might happen after VACUUM had already removed it from the tree. Also, ginInsertCleanup() seems to assume that if another backend has just finished cleaning up the same page, it will see the page marked as deleted. But what if the page is not only marked as deleted, but also reused for something else already? - Heikki
Re: [HACKERS] jsonb array-style subscripting
On Mon, Aug 17, 2015 at 12:26 PM, Merlin Moncure mmonc...@gmail.com wrote: ...is a good idea. postgres operators tend to return immutable copies of the item they are referring to. This patch does not add an operator at all, actually. It feels like there ought to be an operator, but in fact there is not. The parser is hard-coded to recognize array-style subscripts, which this uses. While I'm certainly glad that Dmitry took the time to work on this, I think we will need an operator, too. Or, more accurately, there should probably be a way to make something like this use some available GIN index:

postgres=# explain analyze select * from testjsonb where p['a'] = '[1]';
                                            QUERY PLAN
-----------------------------------------------------------------------------------------------------
 Seq Scan on testjsonb  (cost=0.00..27.00 rows=7 width=32) (actual time=0.022..0.023 rows=1 loops=1)
   Filter: (p['a'] = '[1]'::jsonb)
 Planning time: 0.070 ms
 Execution time: 0.054 ms
(4 rows)

This doesn't really matter with arrays, but ISTM that it matters here. I have no strong feelings on how it should work, but certain things do seem to suggest themselves. For example, maybe the parser can be made to create a query tree that uses an indexable operator based on special-case logic. Although maybe that's a kludge too far, since I can imagine it breaking other legitimate things. My sense is that this will need to be discussed. -- Peter Geoghegan
Re: [HACKERS] jsonb array-style subscripting
On Mon, Aug 17, 2015 at 2:44 PM, Andrew Dunstan and...@dunslane.net wrote: On 08/17/2015 03:26 PM, Merlin Moncure wrote: I'm not sure if this: update test_jsonb_subscript set test_json['a']['a1']['a2'] = 42; ...is a good idea. postgres operators tend to return immutable copies of the item they are referring to. In other words, you'd never see a column operator on the 'left' side of the equals in an update statement. I think you need to look at a function to get the behavior you want: update test_jsonb_subscript set test_json = jsonb_modify(test_json, '[a][a1][a2] = 42'); ...as a hypothetical example. The idea is you need to make a function that provides the ability to make the complete json you want. Update statements always make a copy of the record anyways. Why should jsonb be different from an array? You can assign to an array element, using exactly this syntax, except that the index expressions have to be integers. This was discussed at pgcon and generally met with approval. There is some demand for it. If Dmitry hadn't done this I would probably have done it myself. Yeah, this all makes sense. I withdraw the statement. merlin
Re: [HACKERS] jsonb array-style subscripting
On 08/17/2015 10:57 AM, Dmitry Dolgov wrote: Hi, Some time ago the array-style subscripting for the jsonb data type was discussed in this mailing list. I think it will be quite convenient to have such a nice syntax to update jsonb objects, so I'm trying to implement this. I created a patch, that allows doing something like this: Yaaay!

=# create TEMP TABLE test_jsonb_subscript ( id int, test_json jsonb );
=# insert into test_jsonb_subscript values (1, '{}'), (2, '{}');
=# update test_jsonb_subscript set test_json['a']['a1']['a2'] = 42;
=# select * from test_jsonb_subscript;
 id |         test_json
----+---------------------------
  1 | {"a": {"a1": {"a2": 42}}}
  2 | {"a": {"a1": {"a2": 42}}}
(2 rows)

So, both perl and python do not allow deep nesting of assignments. For example:

d = { "a" : { } }
d["a"]["a1"]["a2"] = 42
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'a1'

... you have to append one key level at a time. Your approach, on the other hand, feels more user-friendly to me; I can't tell you the number of if 'a2' in dic[key] tests I've written. So, is there any reason why consistency with perl/python behavior would be more desirable than user-friendliness? I'm thinking no, but figured that it's something which needs to come up. There is one ambiguous case you need to address:

testjson = '{ "a" : { } }'
SET testjson['a']['a1']['1'] = 42

... so in this case, is '1' a key, or the first item of an array? how do we determine that? How does the user assign something to an array?

=# select test_json['a']['a1'] from test_jsonb_subscript;
 test_json
------------
 {"a2": 42}
 {"a2": 42}
(2 rows)

Again, how do we handle missing keys? Just return NULL? or ERROR? I'd prefer the former, but there will be arguments the other way. This patch has a status "work in progress" of course. Generally speaking, this implementation extends the `ArrayRef` usage for the jsonb. And I need some sort of advice about several questions: * is it interesting for the community? * is that a good idea to extend the `ArrayRef` for jsonb? If it's appropriate, probably we can rename it to `ArrayJsonbRef` or something. * what can be improved in the code at the top level (function placement, probably, functionality duplication, etc.)? * are there any special cases, that I should take care of in this implementation? array/key ambiguity is going to be painful. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com