Re: [HACKERS] On partitioning

2014-12-05 Thread Amit Kapila
On Fri, Dec 5, 2014 at 12:27 PM, Amit Langote langote_amit...@lab.ntt.co.jp
wrote:
 From: Amit Kapila [mailto:amit.kapil...@gmail.com]
 On Thu, Dec 4, 2014 at 10:46 AM, Amit Langote 
langote_amit...@lab.ntt.co.jp wrote:
 
   The more SQL way would be records (composite types). That would make
   catalog inspection a LOT easier and presumably make it easier to
change the
   partitioning key (I'm assuming ALTER TYPE cascades to stored data).
Records
   are stored internally as tuples; not sure if that would be faster
than a List of
   Consts or a pg_node_tree. Nodes would theoretically allow using
things other
   than Consts, but I suspect that would be a bad idea.
  
 
  While I couldn't find an example in system catalogs where a
record/composite type is used, there are instances of pg_node_tree at a
number of places like in pg_attrdef and others. Could you please point me
to such a usage for reference?
 

  I think you can check the same by manually creating a table
  with a user-defined type.

  Create type typ as (f1 int, f2 text);
  Create table part_tab(c1 int, c2 typ);

 Is there such a custom-defined type used in some system catalog? Just not
sure how one would put together a custom type to use in a system catalog
given the way a system catalog is created. That's my concern but it may not
be valid.


I think you are right.  I think in this case we need something similar
to column pg_index.indexprs, which is of type pg_node_tree (which
seems to be already suggested by Robert). So maybe we can proceed
with this type and see if anyone else has a better idea.
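
For anyone skimming the thread, here is a minimal sketch (not from any posted
patch) of what flattening such values into a pg_node_tree column boils down
to; nodeToString()/stringToNode() are the existing backend helpers, while the
List-of-Consts shape and the function name are purely illustrative:

#include "postgres.h"
#include "catalog/pg_type.h"
#include "nodes/makefuncs.h"
#include "nodes/pg_list.h"
#include "utils/builtins.h"

/* Illustration: a list partition accepting the int4 values 1 and 2. */
static Datum
partition_values_to_nodetree(void)
{
	Const	   *v1 = makeConst(INT4OID, -1, InvalidOid, sizeof(int32),
							   Int32GetDatum(1), false, true);
	Const	   *v2 = makeConst(INT4OID, -1, InvalidOid, sizeof(int32),
							   Int32GetDatum(2), false, true);
	List	   *values = list_make2(v1, v2);

	/* nodeToString() produces the text that a pg_node_tree column stores */
	return CStringGetTextDatum(nodeToString(values));
}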


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] On partitioning

2014-12-05 Thread Amit Langote

From: Amit Kapila [mailto:amit.kapil...@gmail.com] 
On Fri, Dec 5, 2014 at 12:27 PM, Amit Langote langote_amit...@lab.ntt.co.jp 
wrote:
 From: Amit Kapila [mailto:amit.kapil...@gmail.com]
 On Thu, Dec 4, 2014 at 10:46 AM, Amit Langote langote_amit...@lab.ntt.co.jp 
 wrote:
 
   The more SQL way would be records (composite types). That would make
   catalog inspection a LOT easier and presumably make it easier to change 
   the
   partitioning key (I'm assuming ALTER TYPE cascades to stored data). 
   Records
   are stored internally as tuples; not sure if that would be faster than a 
   List of
   Consts or a pg_node_tree. Nodes would theoretically allow using things 
   other
   than Consts, but I suspect that would be a bad idea.
  
 
  While I couldn't find an example in system catalogs where a 
  record/composite type is used, there are instances of pg_node_tree at a 
  number of places like in pg_attrdef and others. Could you please point me 
  to such a usage for reference?
 

  I think you can check the same by manually creating a table
  with a user-defined type.

  Create type typ as (f1 int, f2 text);
  Create table part_tab(c1 int, c2 typ);

 Is there such a custom-defined type used in some system catalog? Just not 
 sure how one would put together a custom type to use in a system catalog 
 given the way a system catalog is created. That's my concern but it may not 
 be valid.


 I think you are right.  I think in this case we need something similar
 to column pg_index.indexprs, which is of type pg_node_tree (which
 seems to be already suggested by Robert). So maybe we can proceed
 with this type and see if anyone else has a better idea.

Yeah, with that, I was thinking we may be able to do something like dump a Node 
that describes the range partition bounds or list of allowed values (say, 
RangePartitionValues, ListPartitionValues).

Thanks,
Amit






Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}

2014-12-05 Thread Peter Geoghegan
On Thu, Dec 4, 2014 at 10:27 PM, Anssi Kääriäinen
anssi.kaariai...@thl.fi wrote:
 For Django's use case this is a requirement. We must inform the user if
 the save() action created a new row or if it modified an existing one.

Can you explain that in more detail, please?

 Another way to do this would be to expose the excluded alias in the
 returning clause. All columns of the excluded alias would be null in
 the case of insert (especially the primary key column), and thus if a
 query
 insert into foobar values(2, '2') on conflict (id) update set 
 other_col=excluded.other_col returning excluded.id
 returns a non-null value, then it was an update.

I don't like that idea much, TBH. Consider this:

postgres=# update upsert u set key = 1 from upsert i returning key;
ERROR:  42702: column reference "key" is ambiguous
LINE 1: update upsert u set key = 1 from upsert i returning key;
^

So, suppose this example was actually an ON CONFLICT UPDATE query. If
I similarly make the aliases in the ON CONFLICT UPDATE
(target/excluded) visible in the returning list, it becomes
necessary to qualify every column - an ambiguity is introduced by
making both aliases visible, since any non-qualified column in the
RETURNING clause could be from either the target or excluded
alias/RTE. This is particularly annoying for the common, simple cases.
Also, when there was an update in respect of any given slot, how, in
general, can I be sure that *any* visible excluded.* attribute is not
null (which you suggest as a reliable proxy for the update path having
been taken)? For one thing, the unique index that arbitrates whether
or not we take the alternative path is not restricted to covering
non-nullable attributes. So does the user end up specifying
system/hidden attributes, just to make what you outline work? That
seems sort of messy.

-- 
Peter Geoghegan




Re: [HACKERS] SSL regression test suite

2014-12-05 Thread Noah Misch
On Thu, Dec 04, 2014 at 02:42:41PM +0200, Heikki Linnakangas wrote:
 On 10/06/2014 04:21 PM, Heikki Linnakangas wrote:
 This probably needs some further cleanup before it's ready for
 committing. One issues is that it creates a temporary cluster that
 listens for TCP connections on localhost, which isn't safe on a
 multi-user system.
 
 This issue remains. There isn't much we can do about it; SSL doesn't work
 over Unix domain sockets. We could make it work, but that's a whole
 different feature.

A large subset of the test suite could be made secure.  Omit or lock down
trustdb, and skip affected tests.  (Perhaps have an --unsafe-tests option to
reactivate them.)  Instead of distributing frozen keys, generate all keys
on-demand.  Ensure that key files have secure file modes from the start.
Having said that, ...

 How do people feel about including this test suite in the source tree? It's
 probably not suitable for running as part of make check-world, but it's
 extremely handy if you're working on a patch related to SSL. I'd like to
 commit this, even if it has some rough edges. That way we can improve it
 later, rather than have it fall into oblivion. Any objections?

... +1 for having this suite in the tree, even if check-world ignores it.
Echoing Tom's comment, the README should mention its security weakness.

Thanks,
nm




Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}

2014-12-05 Thread Anssi Kääriäinen
On Fri, 2014-12-05 at 00:21 -0800, Peter Geoghegan wrote:
 On Thu, Dec 4, 2014 at 10:27 PM, Anssi Kääriäinen
 anssi.kaariai...@thl.fi wrote:
  For Django's use case this is a requirement. We must inform the user if
  the save() action created a new row or if it modified an existing one.
 
 Can you explain that in more detail, please?

Django has a post_save signal. The signal provides information about the
save operation. One piece of information is a created boolean flag. When
set to True the operation was an insert; when False it was an update.
See
https://docs.djangoproject.com/en/1.7/ref/signals/#django.db.models.signals.post_save
for details.

The created flag is typically used to perform some related action. An
example is the User and UserProfile models: each user must have a
UserProfile, so post_save can be used to create an empty UserProfile on
creation of a user.

If Django is going to use the INSERT ... ON CONFLICT UPDATE variant for
the existing save() method, then it needs to know if the result was an
UPDATE or an INSERT. If we are going to use this for other operations
(for example a bulk merge of rows into the database), it would be very
convenient to have per-row updated/created information available so that
we can fire the post_save signals for the rows. If we don't have that
information available, we can't fire the signals, and without signals we
can't use the bulk merge operation internally in places where the
signals were fired before.

Outside of Django there are likely similar reasons to want to know if
the result of an operation was the creation of a new row. The reason
could be creation of a related row, doing some action in the application
layer, or just a UI message saying "object created successfully" vs.
"object updated successfully".

 - Anssi





Re: [HACKERS] New wal format distorts pg_xlogdump --stats

2014-12-05 Thread Heikki Linnakangas

On 12/05/2014 02:31 AM, Andres Freund wrote:

On 2014-12-05 08:58:33 +0900, Michael Paquier wrote:

On Fri, Dec 5, 2014 at 8:09 AM, Andres Freund and...@2ndquadrant.com
wrote:


On 2014-12-04 16:26:02 +0200, Heikki Linnakangas wrote:

Yeah, that's broken.

I propose the attached. Or does anyone want to argue for adding an
XLogRecGetFPILen() accessor macro for the hole_length in xlogreader.h.
It's not something a redo routine would need, nor most XLOG-reading
applications, so I thought it's better to just have pg_xlogdump peek into
the DecodedBkpBlock struct directly.


I think both would be justifiable, so I don't really care for now. One
slight reason for wrapping it in xlogreader.h is that we might add
compression of some form or another soon and it'd possibly be better to
have it in xlogreader.h so pg_xlogdump doesn't have to be changed. But
that's really rather minor.


Good point about the compression patch.

I committed this as posted. It's not yet clear how the compression patch 
is going to do the compression. If we just compress all records above a 
certain size, pg_xlogdump should display the compressed and uncompressed 
sizes separately, and an accessor macro for the backup block's size 
wouldn't help much. The compression patch will almost certainly need to 
modify pg_xlogdump --stats anyway, so there's not much point trying to 
guess now what it's going to need.



If we go this road and want to be complete, you may as well add access
macros for the image offset, the block image and for the block data stuff.
That would be handy and consistent with the rest, now both approaches are
fine as long as DecodedBkpBlock is in xlogreader.h.


I don't see the point. Let's introduce that if (which I doubt a bit)
there's a user.


Agreed.

- Heikki





Re: [HACKERS] On partitioning

2014-12-05 Thread Amit Langote


From: Amit Kapila [mailto:amit.kapil...@gmail.com] 
Sent: Friday, December 05, 2014 5:10 PM
To: Amit Langote
Cc: Jim Nasby; Robert Haas; Andres Freund; Alvaro Herrera; Bruce Momjian; Pg 
Hackers
Subject: Re: [HACKERS] On partitioning

On Fri, Dec 5, 2014 at 12:27 PM, Amit Langote langote_amit...@lab.ntt.co.jp 
wrote:
 From: Amit Kapila [mailto:amit.kapil...@gmail.com]
 On Thu, Dec 4, 2014 at 10:46 AM, Amit Langote langote_amit...@lab.ntt.co.jp 
 wrote:
 
   The more SQL way would be records (composite types). That would make
   catalog inspection a LOT easier and presumably make it easier to change 
   the
   partitioning key (I'm assuming ALTER TYPE cascades to stored data). 
   Records
   are stored internally as tuples; not sure if that would be faster than a 
   List of
   Consts or a pg_node_tree. Nodes would theoretically allow using things 
   other
   than Consts, but I suspect that would be a bad idea.
  
 
  While I couldn't find an example in system catalogs where a 
  record/composite type is used, there are instances of pg_node_tree at a 
  number of places like in pg_attrdef and others. Could you please point me 
  to such a usage for reference?
 

  I think you can check the same by manually creating a table
  with a user-defined type.

  Create type typ as (f1 int, f2 text);
  Create table part_tab(c1 int, c2 typ);

 Is there such a custom-defined type used in some system catalog? Just not 
 sure how one would put together a custom type to use in a system catalog 
 given the way a system catalog is created. That's my concern but it may not 
 be valid.


  I think you are right.  I think in this case we need something similar
 to column pg_index.indexprs, which is of type pg_node_tree (which
 seems to be already suggested by Robert). So maybe we can proceed
 with this type and see if anyone else has a better idea.

One point raised about/against pg_node_tree was that the values represented 
therein would turn out to be too generalized to be used to advantage during 
planning. But it seems we could deserialize it in advance back to the internal 
form (like an array of a struct) as part of the cached relation data. This 
overhead would only be incurred in the case of partitioned tables. Perhaps this 
is what Robert suggested elsewhere.
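
A minimal sketch of that deserialization step (only heap_getattr(),
TextDatumGetCString() and stringToNode() are existing backend calls; the
attribute number would come from whatever catalog ends up holding the values):

#include "postgres.h"
#include "access/htup_details.h"
#include "nodes/pg_list.h"
#include "utils/builtins.h"

/* Rebuild the partition-values Node tree from a pg_node_tree attribute. */
static List *
load_partition_values(HeapTuple parttup, TupleDesc tupdesc, int valuesattno)
{
	bool		isnull;
	Datum		datum = heap_getattr(parttup, valuesattno, tupdesc, &isnull);

	if (isnull)
		return NIL;

	/* stringToNode() turns the stored text back into the original Node tree */
	return (List *) stringToNode(TextDatumGetCString(datum));
}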

Thanks,
Amit






Re: [HACKERS] XLOG_PARAMETER_CHANGE (WAL record) missing two params in its desc routine

2014-12-05 Thread Heikki Linnakangas

On 12/05/2014 04:54 AM, Michael Paquier wrote:

Hi all,

While reading the code in this area this morning, I noticed that
wal_log_hints and track_commit_timestamp are not mentioned in the desc
routine of XLOG_PARAMETER_CHANGE. Also, it is not mentioned in
postgresql.conf.sample that a value update of wal_log_hints requires a
system restart.

In order to fix those things, attached are two patches:
- patch 1 should be applied back to 9.4 where wal_log_hints has been added


You got the wal_level and wal_log_hints strings backwards:

rmgr: XLOGlen (rec/tot): 24/50, tx:  0, lsn: 
0/07172488, prev 0/07172418, desc: PARAMETER_CHANGE max_connections=100 
max_worker_processes=8 max_prepared_xacts=0 max_locks_per_xact=64 
wal_level=false wal_log_hints=hot_standby


Fixed that, and changed true/false to on/off, as that's more 
common phrasing for boolean GUCs.



- patch 2 is master-only, and needs to be applied on top of patch 1.


Same here.

Committed both patches with those fixes, thanks!

- Heikki





Re: [HACKERS] check-world failure: dummy_seclabel

2014-12-05 Thread Heikki Linnakangas

On 12/05/2014 07:29 AM, Adam Brightwell wrote:

All,

I've noticed that 'check-world' fails for dummy_seclabel after a 'clean'.
I believe that in commit da34731, the EXTRA_CLEAN statement should have
been removed from 'src/test/modules/dummy_seclabel/Makefile' as well.


Ah, that's why those files kept disappearing.


Attached is a proposed patch that addresses this issue.


Applied, thanks!

- Heikki





Re: [HACKERS] libpq pipelining

2014-12-05 Thread Heikki Linnakangas

On 12/05/2014 02:30 AM, Matt Newell wrote:



The explanation of PQgetFirstQuery makes it sound pretty hard to match
up the result with the query. You have to pay attention to PQisBusy.


PQgetFirstQuery should also be valid after
calling PQgetResult and then you don't have to worry about PQisBusy, so I
should probably change the documentation to indicate that is the preferred
usage, or maybe make that the only guaranteed usage, and say the results
are undefined if you call it before calling PQgetResult.  That usage also
makes it consistent with PQgetLastQuery being called immediately after
PQsendQuery.


I changed my second example to call PQgetFirstQuery after PQgetResult instead
of before, and that removes the need to call PQconsumeInput and PQisBusy when
you don't mind blocking.  It makes the example super simple:

PQsendQuery(conn, "INSERT INTO test(id) VALUES (DEFAULT),(DEFAULT) RETURNING id");
query1 = PQgetLastQuery(conn);

/* Duplicate primary key error */
PQsendQuery(conn, "UPDATE test SET id=2 WHERE id=1");
query2 = PQgetLastQuery(conn);

PQsendQuery(conn, "SELECT * FROM test");
query3 = PQgetLastQuery(conn);

while( (result = PQgetResult(conn)) != NULL )
{
curQuery = PQgetFirstQuery(conn);

if (curQuery == query1)
checkResult(conn,result,curQuery,PGRES_TUPLES_OK);
if (curQuery == query2)
checkResult(conn,result,curQuery,PGRES_FATAL_ERROR);
if (curQuery == query3)
checkResult(conn,result,curQuery,PGRES_TUPLES_OK);
}

Note that the curQuery == queryX check will work no matter how many results a
query produces.


Oh, that's how the PQgetLastQuery/PQgetNextQuery functions work! I 
didn't understand that before. I'd suggest renaming them to something 
like PQgetSentQuery() and PQgetResultQuery(). The first/last/next names 
made me think that they're used to iterate a list of queries, but in 
fact they're supposed to be used at very different stages.


- Heikki




Re: [HACKERS] [Windows,PATCH] Use faster, higher precision timer API

2014-12-05 Thread David Rowley
On 2 December 2014 at 15:36, Craig Ringer cr...@2ndquadrant.com wrote:

 On 12/01/2014 09:51 PM, Marco Nenciarini wrote:
  I think this is a leftover, as you don't use elog afterwards.

 Good catch, fixed.


I've looked over this again and tested it on a Windows 8.1 machine. I
cannot find any problems.

The only comments about the code I have would maybe be to use some
constants like:

#define FILETIME_PER_SEC 10000000L
#define FILETIME_PER_USEC 10

I had to read the Microsoft documentation to see that "A file time is a
64-bit value that represents the number of 100-nanosecond intervals that
have elapsed since 12:00 A.M. January 1, 1601 Coordinated Universal Time
(UTC)".

http://msdn.microsoft.com/en-us/library/windows/desktop/dn553408%28v=vs.85%29.aspx

The attached patch gets rid of those magic numbers, and hopefully makes it
a bit easier to see what's going on.

I agree with the lack of real need to log any sort of errors
if init_win32_gettimeofday() gets any unexpected errors while trying to
lookup GetSystemTimePreciseAsFileTime.

I'm marking this as ready for committer. It seems worth going in just for
the performance improvement alone, never mind the increased clock accuracy.

I'll leave it up to the committer to decide if it's better with or without
the attached patch.

Regards

David Rowley
diff --git a/src/port/gettimeofday.c b/src/port/gettimeofday.c
index ab4f491..f4d8393 100644
--- a/src/port/gettimeofday.c
+++ b/src/port/gettimeofday.c
@@ -35,6 +35,13 @@
 static const unsigned __int64 epoch = UINT64CONST(116444736000000000);
 
 /*
+ * FILETIME represents the number of 100-nanosecond intervals since
+ * January 1, 1601 (UTC).
+ */
+#define FILETIME_PER_SEC   10000000L
+#define FILETIME_PER_USEC  10
+
+/*
  * Both GetSystemTimeAsFileTime and GetSystemTimePreciseAsFileTime share a
  * signature, so we can just store a pointer to whichever we find. This
  * is the pointer's type.
@@ -98,8 +105,9 @@ gettimeofday(struct timeval * tp, struct timezone * tzp)
ularge.LowPart = file_time.dwLowDateTime;
ularge.HighPart = file_time.dwHighDateTime;
 
-   tp->tv_sec = (long) ((ularge.QuadPart - epoch) / 10000000L);
-   tp->tv_usec = (long) (((ularge.QuadPart - epoch) % 10000000L) / 10);
+   tp->tv_sec = (long) ((ularge.QuadPart - epoch) / FILETIME_PER_SEC);
+   tp->tv_usec = (long) (((ularge.QuadPart - epoch) % FILETIME_PER_SEC)
+   / FILETIME_PER_USEC);
 
return 0;
 }



Re: [HACKERS] Testing DDL deparsing support

2014-12-05 Thread Bruce Momjian
On Fri, Nov 28, 2014 at 01:43:36PM +0900, Ian Barwick wrote:
 Standard regression tests are helpful because patch authors include new test
 cases in the patches that stress their new options or commands.  This new test
 framework needs to be something that internally runs the regression tests and
 exercises DDL deparsing as the regression tests execute DDL.  That would mean
 that new commands and options would automatically be deparse-tested by our new
 framework as soon as patch authors add standard regress support.

Are you saying every time a new option is added to a command that a new
regression test needs to be added?  We don't normally do that, and it
could easily bloat the regression tests over time.  In summary, this
testing will help, but it will not be fully reliable.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] Testing DDL deparsing support

2014-12-05 Thread Alvaro Herrera
Bruce Momjian wrote:
 On Fri, Nov 28, 2014 at 01:43:36PM +0900, Ian Barwick wrote:
  Standard regression tests are helpful because patch authors include new test
  cases in the patches that stress their new options or commands.  This new 
  test
  framework needs to be something that internally runs the regression tests 
  and
  exercises DDL deparsing as the regression tests execute DDL.  That would 
  mean
  that new commands and options would automatically be deparse-tested by our 
  new
  framework as soon as patch authors add standard regress support.
 
 Are you saying every time a new option is added to a command that a new
 regression test needs to be added?

Not necessarily -- an existing test could be modified, as well.

 We don't normally do that,

I sure hope we do have all options covered by tests.

 and it could easily bloat the regression tests over time.

We had 103 regression tests in 8.2 and we have 145 in 9.4.  Does this
qualify as bloat?

 In summary, this testing will help, but it will not be fully reliable.

No testing is ever fully reliable.  If it were, there would never be
bugs.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training  Services




Re: [HACKERS] initdb: Improve error recovery.

2014-12-05 Thread Heikki Linnakangas

On 11/27/2014 02:28 PM, Mats Erik Andersson wrote:

Hello there,

I would like to improve error recovery of initdb when the
password file is empty. The present code declares "Error 0"
as the cause of failure. It suffices to use ferror() since
fgets() returns NULL also at a premature EOF.


Thanks, committed.


In addition, a minor point: a line feed in order to print
the error message on a line of its own seems desirable.


Hmm. If you've piped stdout somewhere else, that produces a spurious 
empty line in stderr:


$ bin/initdb -D data-foo > /dev/null

initdb: input file /home/heikki/pgsql.master/share/postgres.bki does 
not belong to PostgreSQL 9.5devel

Check your installation or specify the correct path using the option -L.
initdb: removing data directory data-foo

I think the newline needs to go to stdout instead.

But in any case, that's just one error out of many similar ones that can 
happen during initdb. If we care enough to fix this, we should fix them 
all. Not sure it's worth the churn, though, unless you can come up with 
some kind of a simple wholesale solution, rather than adding 
fprintf(stdout, "\n") calls to every error condition.
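
(One possible wholesale approach, purely as a sketch: route every error
through a small helper that owns the newline handling, instead of touching
each call site. The helper name is made up, not something in initdb today.)

#include <stdarg.h>
#include <stdio.h>

/* Hypothetical helper: all initdb error reports funnel through here. */
static void
initdb_error(const char *fmt, ...)
{
	va_list		ap;

	/* finish any pending progress line on stdout, not stderr */
	fputc('\n', stdout);
	fflush(stdout);

	va_start(ap, fmt);
	vfprintf(stderr, fmt, ap);
	va_end(ap);
	fputc('\n', stderr);
}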

- Heikki





Re: [HACKERS] superuser() shortcuts

2014-12-05 Thread Stephen Frost
* Robert Haas (robertmh...@gmail.com) wrote:
 On Thu, Dec 4, 2014 at 3:59 PM, Stephen Frost sfr...@snowman.net wrote:
  I have a hard time wrapping my head around what a *lot* of our users ask
  when they see a given error message, but if our error message is 'you
  must have a pear-shaped object to run this command' then I imagine that
  a new-to-PG user might think well, I can't create pear shaped objects
  in PG, so I guess this just isn't supported.  That's not necessarily
  any fault of ours, but I do think permission denied reduces the chance
  that there's any confusion about the situation.
 
 This is just ridiculous.  You're postulating the existence of a person
 who thinks that a replication role is something that they buy at
 Wendi's rather than something granted by the database administrator.
 Give me a break.

I was pointing out that it might not be recognizable as a permission by
a person new to PG, yes.  I did not characterize that individual as a
database administrator, nor would they characterize themselves that way
(at least, the person I'm thinking of).

  I do prefer my version but it's not an unfounded preference.
 
 I never said that your preference was unfounded.  I said that I
 disagreed with it.

And this doesn't seem like it's going to change, certainly not over
email, so I'm going to go with Alvaro's approach and simply drop it.
I'll finish up the changes that I commented on above and move on.

Thanks,

Stephen




Re: [HACKERS] compress method for spgist - 2

2014-12-05 Thread Heikki Linnakangas

On 12/01/2014 02:44 PM, Teodor Sigaev wrote:


Initial message:
http://www.postgresql.org/message-id/5447b3ff.2080...@sigaev.ru

Second version fixes a forgotten changes in pg_am.



+   /* Get the information we need about each relevant datatypes */
+
+   if (OidIsValid(cache->config.leafType) &&
+       cache->config.leafType != atttype)
+   {
+       if (!OidIsValid(index_getprocid(index, 1, SPGIST_COMPRESS_PROC)))
+           ereport(ERROR,
+                   (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+                    errmsg("compress method is not defined although input and leaf types are differ")));
+
+       fillTypeDesc(&cache->attType, cache->config.leafType);
+   }
+   else
+   {
+       fillTypeDesc(&cache->attType, atttype);
+   }
+


For some datatypes, the compress method might be useful even if the leaf 
type is the same as the column type. For example, you could allow 
indexing text datums larger than the page size, with a compress function 
that just truncates the input.


Could you find some use for this in one of the built-in or contrib 
types? Just to have something that exercises it as part of the 
regression suite. How about creating an opclass for the built-in polygon 
type that stores the bounding box, like the PostGIS guys are doing?
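
(To sketch what that could look like with this patch, assuming the compress
proc simply maps the input datum to the leaf datum; the function name is
invented, and POLYGON already carries its bounding box:)

#include "postgres.h"
#include "fmgr.h"
#include "utils/geo_decls.h"

PG_FUNCTION_INFO_V1(spgist_poly_compress);

/* Hypothetical SP-GiST compress proc: keep only the polygon's bounding box. */
Datum
spgist_poly_compress(PG_FUNCTION_ARGS)
{
	POLYGON    *poly = PG_GETARG_POLYGON_P(0);
	BOX		   *leaf = (BOX *) palloc(sizeof(BOX));

	*leaf = poly->boundbox;		/* bounding box is precomputed in the POLYGON */

	PG_RETURN_BOX_P(leaf);
}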


The documentation needs to be updated.

- Heikki





Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2014-12-05 Thread Rahila Syed
I attempted quick review and could not come up with much except this

+   /*
+* Calculate the amount of FPI data in the record. Each backup block
+* takes up BLCKSZ bytes, minus the hole length.
+*
+* XXX: We peek into xlogreader's private decoded backup blocks for the
+* hole_length. It doesn't seem worth it to add an accessor macro for
+* this.
+*/
+   fpi_len = 0;
+   for (block_id = 0; block_id <= record->max_block_id; block_id++)
+   {
+   if (XLogRecHasCompressedBlockImage(record, block_id))
+   fpi_len += BLCKSZ - record->blocks[block_id].compress_len;


IIUC, fpi_len in the case of a compressed block image should be

fpi_len = record->blocks[block_id].compress_len;

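To spell out the suggested change against the quoted hunk (field names as in
the patch under review; this is just how I read the fix, not a tested diff):

	fpi_len = 0;
	for (block_id = 0; block_id <= record->max_block_id; block_id++)
	{
		if (XLogRecHasCompressedBlockImage(record, block_id))
			/* compressed image: count the bytes actually stored in WAL */
			fpi_len += record->blocks[block_id].compress_len;
		else if (XLogRecHasBlockImage(record, block_id))
			/* uncompressed image: full block minus the skipped hole */
			fpi_len += BLCKSZ - record->blocks[block_id].hole_length;
	}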

Thank you,
Rahila Syed 






Re: [HACKERS] Review of GetUserId() Usage

2014-12-05 Thread Stephen Frost
* Stephen Frost (sfr...@snowman.net) wrote:
 * Stephen Frost (sfr...@snowman.net) wrote:
   3. It messes around with pg_signal_backend().  There are currently no
   cases in which pg_signal_backend() throws an error, which is good,
   because it lets you write queries against pg_stat_activity() that
   don't fail halfway through, even if you are missing permissions on
   some things.  This patch introduces such a case, which is bad.
  
  Good point, I'll move that check up into the other functions, which will
  allow for a more descriptive error as well.
 
 Err, I'm missing something here, as pg_signal_backend() is a misc.c
 static internal function?  How would you be calling it from a query
 against pg_stat_activity()?
 
 I'm fine making the change anyway, just curious..

Updated patch attached which moves the ereport() out of
pg_signal_backend() and into its callers, as the other permissions
checks are done, and includes the documentation changes.  The error
messages are minimally changed to match the new behavior.

Thanks,

Stephen
From c60718e7a7ecb8d671627271533d6bd2b6878782 Mon Sep 17 00:00:00 2001
From: Stephen Frost sfr...@snowman.net
Date: Mon, 1 Dec 2014 11:13:09 -0500
Subject: [PATCH] GetUserId() changes to has_privs_of_role()

The pg_stat and pg_signal-related functions have been using GetUserId()
instead of has_privs_of_role() for checking if the current user should
be able to see details in pg_stat_activity or signal other processes,
requiring a user to do 'SET ROLE' for inherited roles for a permissions
check, unlike other permissions checks.

This patch changes that behavior to, instead, act like most other
permission checks and use has_privs_of_role(), removing the 'SET ROLE'
need.  Documentation and error messages updated accordingly.

Per discussion with Alvaro, Peter, Adam (though not using Adam's patch),
and Robert.
---
 doc/src/sgml/func.sgml  | 13 ++---
 src/backend/utils/adt/misc.c| 39 +
 src/backend/utils/adt/pgstatfuncs.c | 19 +-
 3 files changed, 47 insertions(+), 24 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 62ec275..dbb61dc 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -16121,9 +16121,9 @@ SELECT set_config('log_statement_stats', 'off', false);
  <literal><function>pg_cancel_backend(<parameter>pid</parameter> <type>int</type>)</function></literal>
  </entry>
  <entry><type>boolean</type></entry>
-  <entry>Cancel a backend's current query.  You can execute this against
-   another backend that has exactly the same role as the user calling the
-   function.  In all other cases, you must be a superuser.
+  <entry>Cancel a backend's current query.  This is also allowed if the
+   calling role is a member of the role whose backend is being cancelled,
+   however only superusers can cancel superuser backends.
   </entry>
  </row>
  <row>
@@ -16145,10 +16145,9 @@ SELECT set_config('log_statement_stats', 'off', false);
  <literal><function>pg_terminate_backend(<parameter>pid</parameter> <type>int</type>)</function></literal>
  </entry>
  <entry><type>boolean</type></entry>
-  <entry>Terminate a backend.  You can execute this against
-   another backend that has exactly the same role as the user
-   calling the function.  In all other cases, you must be a
-   superuser.
+  <entry>Terminate a backend.  This is also allowed if the calling role
+   is a member of the role whose backend is being terminated, however only
+   superusers can terminate superuser backends.
   </entry>
  </row>
 </tbody>
diff --git a/src/backend/utils/adt/misc.c b/src/backend/utils/adt/misc.c
index 67539ec..2e74eb9 100644
--- a/src/backend/utils/adt/misc.c
+++ b/src/backend/utils/adt/misc.c
@@ -37,6 +37,7 @@
 #include "utils/lsyscache.h"
 #include "utils/ruleutils.h"
 #include "tcop/tcopprot.h"
+#include "utils/acl.h"
 #include "utils/builtins.h"
 #include "utils/timestamp.h"
 
@@ -81,7 +82,9 @@ current_query(PG_FUNCTION_ARGS)
  * role as the backend being signaled. For dangerous signals, an explicit
  * check for superuser needs to be done prior to calling this function.
  *
- * Returns 0 on success, 1 on general failure, and 2 on permission error.
+ * Returns 0 on success, 1 on general failure, 2 on normal permission error
+ * and 3 if the caller needs to be a superuser.
+ *
  * In the event of a general failure (return code 1), a warning message will
  * be emitted. For permission errors, doing that is the responsibility of
  * the caller.
@@ -89,6 +92,7 @@ current_query(PG_FUNCTION_ARGS)
 #define SIGNAL_BACKEND_SUCCESS 0
 #define SIGNAL_BACKEND_ERROR 1
 #define SIGNAL_BACKEND_NOPERMISSION 2
+#define SIGNAL_BACKEND_NOSUPERUSER 3
 static int
 pg_signal_backend(int pid, int sig)
 {
@@ -113,7 +117,12 @@ pg_signal_backend(int pid, int sig)
 		return SIGNAL_BACKEND_ERROR;
 	}
 
-	if (!(superuser() || 

Re: [HACKERS] Review of GetUserId() Usage

2014-12-05 Thread Andres Freund
On 2014-12-05 09:28:13 -0500, Stephen Frost wrote:
  static int
  pg_signal_backend(int pid, int sig)
  {
 @@ -113,7 +117,12 @@ pg_signal_backend(int pid, int sig)
   return SIGNAL_BACKEND_ERROR;
   }
  
 - if (!(superuser() || proc->roleId == GetUserId()))
 + /* Only allow superusers to signal superuser-owned backends. */
 + if (superuser_arg(proc->roleId) && !superuser())
 + return SIGNAL_BACKEND_NOSUPERUSER;
 +
 + /* Users can signal backends they have role membership in. */
 + if (!has_privs_of_role(GetUserId(), proc->roleId))
   return SIGNAL_BACKEND_NOPERMISSION;
  
   /*
 @@ -141,35 +150,49 @@ pg_signal_backend(int pid, int sig)
  }

Is the 'Only allow superusers to signal superuser-owned backends' check
actually safe that way? I personally try to never use a superuser role
as the login user, but grant my account a superuser role that doesn't
inherit. But IIRC PGPROC->roleId won't change, even if a user does SET
ROLE.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services




Re: [HACKERS] Review of GetUserId() Usage

2014-12-05 Thread Stephen Frost
* Andres Freund (and...@2ndquadrant.com) wrote:
 Is the 'Only allow superusers to signal superuser-owned backends' check
 actually safe that way? I personally try to never use a superuser role
 as the login user, but grant my account a superuser role that doesn't
 inherit. But IIRC PGPROC->roleId won't change, even if a user does SET
 ROLE.

You're correct- but it's exactly the same as it is today.  If you grant
another user your role and then they 'SET ROLE' to you, they can cancel
any of your queries or terminate your backends, regardless of if those
roles have done some other 'SET ROLE'.  This change only removes the
need for those users to 'SET ROLE' to your user first.

The backend isn't considered 'superuser-owned' unless it's the login
role that's a superuser.  It might be interesting to change that to mean
'when a SET ROLE to superuser has been done', but what about security
definer functions or other transient escalation to superuser?  Would
those calls have to muck with PGPROC->roleId?

If we want to go there, it should definitely be a different patch.

Thanks,

Stephen




Re: [HACKERS] XLOG_PARAMETER_CHANGE (WAL record) missing two params in its desc routine

2014-12-05 Thread Michael Paquier
On Fri, Dec 5, 2014 at 7:12 PM, Heikki Linnakangas hlinnakan...@vmware.com
wrote:

 On 12/05/2014 04:54 AM, Michael Paquier wrote:

 Hi all,

 While reading the code in this area this morning, I noticed that
 wal_log_hints and track_commit_timestamp are not mentioned in the desc
 routine of XLOG_PARAMETER_CHANGE. Also, it is not mentioned in
 postgresql.conf.sample that a value update of wal_log_hints requires a
 system restart.

 In order to fix those things, attached are two patches:
 - patch 1 should be applied back to 9.4 where wal_log_hints has been added


 You got the wal_level and wal_log_hints strings backwards:

Oops. Thanks for double-checking.
-- 
Michael


Re: [HACKERS] Parallel Seq Scan

2014-12-05 Thread Stephen Frost
José,

* José Luis Tallón (jltal...@adv-solutions.net) wrote:
 On 12/04/2014 07:35 AM, Amit Kapila wrote:
 The number of worker backends that can be used for
 parallel seq scan can be configured by using a new GUC
 parallel_seqscan_degree, the default value of which is zero
 and it means parallel seq scan will not be considered unless
 user configures this value.
 
 The number of parallel workers should be capped (of course!) at the
 maximum amount of processors (cores/vCores, threads/hyperthreads)
 available.
 
 More over, when load goes up, the relative cost of parallel working
 should go up as well.
 Something like:
 p = number of cores
 l = 1min-load
 
 additional_cost = tuple estimate * cpu_tuple_cost * (l+1)/(c-1)
 
 (for c > 1, of course)

While I agree in general that we'll need to come up with appropriate
acceptance criteria, etc, I don't think we want to complicate this patch
with that initially.  A SUSET GUC which caps the parallel GUC would be
enough for an initial implementation, imv.
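
(For illustration, the sort of cap being described could be as simple as
clamping the user-settable value with a superuser-only limit when building
the plan; both variable names below are hypothetical, not from the patch:)

/* Hypothetical GUCs: per-session request vs. administrator-imposed cap. */
int			parallel_seqscan_degree = 0;		/* USERSET */
int			max_parallel_seqscan_degree = 0;	/* SUSET */

static int
effective_parallel_degree(void)
{
	int			degree = parallel_seqscan_degree;

	if (degree > max_parallel_seqscan_degree)
		degree = max_parallel_seqscan_degree;

	return degree;		/* 0 means: don't consider a parallel seq scan */
}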

 Not directly (I haven't had the time to read the code yet), but I'm
 thinking about the ability to simply *replace* executor methods from
 an extension.

You probably want to look at the CustomScan thread+patch directly then..

Thanks,

Stephen




Re: [HACKERS] Parallel Seq Scan

2014-12-05 Thread Stephen Frost
Amit,

* Amit Kapila (amit.kapil...@gmail.com) wrote:
 postgres=# explain select c1 from t1;
   QUERY PLAN
 --
  Seq Scan on t1  (cost=0.00..101.00 rows=100 width=4)
 (1 row)
 
 
 postgres=# set parallel_seqscan_degree=4;
 SET
 postgres=# explain select c1 from t1;
   QUERY PLAN
 --
  Parallel Seq Scan on t1  (cost=0.00..25.25 rows=100 width=4)
Number of Workers: 4
Number of Blocks Per Workers: 25
 (3 rows)

This is all great and interesting, but I feel like folks might be
waiting to see just what kind of performance results come from this (and
what kind of hardware is needed to see gains..).  There are likely to be
situations where this change is an improvement as well as cases
where it makes things worse.

One really interesting case would be parallel seq scans which are
executing against foreign tables/FDWs..

Thanks!

Stephen




Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2014-12-05 Thread Andres Freund
On 2014-12-06 00:10:11 +0900, Michael Paquier wrote:
 On Sat, Dec 6, 2014 at 12:06 AM, Michael Paquier michael.paqu...@gmail.com
 wrote:
 
 
 
 
  On Fri, Dec 5, 2014 at 11:10 PM, Rahila Syed rahilasyed...@gmail.com
  wrote:
 
  I attempted quick review and could not come up with much except this
 
  +   /*
  +* Calculate the amount of FPI data in the record. Each backup block
  +* takes up BLCKSZ bytes, minus the hole length.
  +*
  +* XXX: We peek into xlogreader's private decoded backup blocks for
  the
  +* hole_length. It doesn't seem worth it to add an accessor macro for
  +* this.
  +*/
  +   fpi_len = 0;
  +   for (block_id = 0; block_id <= record->max_block_id; block_id++)
  +   {
  +   if (XLogRecHasCompressedBlockImage(record, block_id))
  +   fpi_len += BLCKSZ - record->blocks[block_id].compress_len;
 
 
  IIUC, fpi_len in case of compressed block image should be
 
  fpi_len = record->blocks[block_id].compress_len;
 
  Yep, true. Patches need a rebase btw as Heikki fixed a commit related to
  the stats of pg_xlogdump.
 
 
 In any case, any opinions about switching this patch to Ready for committer?

Needing a rebase is an obvious conflict with that... But I guess some wider
looks afterwards won't hurt.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services




Re: [HACKERS] postgres_fdw does not see enums

2014-12-05 Thread Merlin Moncure
On Wed, Dec 3, 2014 at 5:17 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 David Fetter da...@fetter.org writes:
 On Wed, Dec 03, 2014 at 05:52:03PM -0500, Tom Lane wrote:
 What do you mean reconstruct the enum?

 Capture its state at the time when IMPORT FOREIGN SCHEMA is executed.
 Right now, if you try IMPORT SCHEMA on a foreign table with an enum in
 it, postgresql_fdw errors out rather than trying to notice that
 there's an enum definition which should precede creation and execute
 it in the correct order.

 Oh, you think IMPORT FOREIGN SCHEMA should try to import enums?
 I doubt it.  What happens if the enum already exists locally?
 And why enums, and not domains, ranges, composite types, etc?

Probably IMPORT FOREIGN SCHEMA should not attempt to include type
dependencies.

However, if they are present in the importer (that is, the type exists
by name), it should assume that they are correct come what may.
Something like 'IMPORT FOREIGN TYPE'  would probably be needed to
translate a type between servers.  Unless the SQL standard has it or
gets it I doubt it will ever appear but the status quo isn't too bad
IMO.  Personally I have no issues with the risks involved with type
synchronization; the problems faced are mostly academic with clean
errors and easily managed unless binary format is used with postgres
to postgres transfer (which IIRC the postgres fdw does not utilize at
this time).

User-created types can't be transmitted between servers with the
existing binary format; you have to transmit them as text and hope the
structures agree.  Binary format transmission in postgres tends to be
quite a bit faster depending on the nature of the types involved;
things like ints, numerics, and timestamps tend to be much faster.

merlin




Re: [HACKERS] Add shutdown_at_recovery_target option to recovery.conf

2014-12-05 Thread Robert Haas
On Thu, Dec 4, 2014 at 8:45 AM, Michael Paquier
michael.paqu...@gmail.com wrote:
 Here is patch which renames action_at_recovery_target to
 recovery_target_action everywhere.
 Thanks, Looks good to me.

 A couple of things that would be good to document as well in
 recovery-config.sgml:
 - the fact that pause_at_recovery_target is deprecated.

Why not just remove it altogether?  I don't think the
backward-compatibility break is enough to get excited about, or to
justify the annoyance of carrying an extra parameter that does (part
of) the same thing.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}

2014-12-05 Thread Robert Haas
On Thu, Dec 4, 2014 at 1:27 PM, Peter Geoghegan p...@heroku.com wrote:
 On Thu, Dec 4, 2014 at 3:04 AM, Craig Ringer cr...@2ndquadrant.com wrote:
 Yes, I think that's pretty important. With a negative attno so it's
 treated as a hidden col that must be explicitly named to be shown and
 won't be confused with user columns.

 I think that the standard for adding a new system attribute ought to
 be enormous. The only case where a new one was added post-Postgres95
 was tableoid. I'm pretty sure that others aren't going to want to do
 it that way.

+1.  System attributes are a mess; a negative attribute number implies
not only that the column is by default hidden from view but also that
it is stored in the tuple header rather than the tuple data.  Using
one here, where we're not even talking about a property of the tuple
per se, would be a pretty goofy solution.

 Besides, I'm not entirely convinced that this is actually
 an important distinction to expose.

I think it's probably an important distinction, for the kinds of
reasons Anssi mentions, but we should look for some method other than
a system column of indicating it.  Maybe there's a magic function that
returns a Boolean which you can call, or maybe some special clause, as
with WITH ORDINALITY.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Add shutdown_at_recovery_target option to recovery.conf

2014-12-05 Thread Michael Paquier
On Sat, Dec 6, 2014 at 12:49 AM, Robert Haas robertmh...@gmail.com wrote:

 On Thu, Dec 4, 2014 at 8:45 AM, Michael Paquier
 michael.paqu...@gmail.com wrote:
  Here is patch which renames action_at_recovery_target to
  recovery_target_action everywhere.
  Thanks, Looks good to me.
 
  A couple of things that would be good to document as well in
  recovery-config.sgml:
  - the fact that pause_at_recovery_target is deprecated.

 Why not just remove it altogether?  I don't think the
 backward-compatibility break is enough to get excited about, or to
 justify the annoyance of carrying an extra parameter that does (part
 of) the same thing.

At least we won't forget removing in the future something that has been
marked as deprecated for years. So +1 here for a simple removal, and a
mention in the future release notes.
-- 
Michael


Re: [HACKERS] On partitioning

2014-12-05 Thread Robert Haas
On Tue, Dec 2, 2014 at 10:43 PM, Amit Langote
langote_amit...@lab.ntt.co.jp wrote:
 So, we're going to support exactly two levels of partitioning?
 partitions with partissub=false and subpartitions with partissub=true?
  Why not support only one level of partitioning here but then let the
 children have their own pg_partitioned_rel entries if they are
 subpartitioned?  That seems like a cleaner design and lets us support
 an arbitrary number of partitioning levels if we ever need them.

 Yeah, that's what I thought at some point in favour of dropping partissub 
 altogether. However, not that this design solves it, there is one question - 
 if we would want to support defining for a table both partition key and 
 sub-partition key in advance? That is, without having defined a first level 
 partition yet; in that case, what level do we associate sub-(sub-) 
 partitioning key with or more to the point where do we keep it?

Do we really need to allow that?  I think you let people partition a
toplevel table, and then partition its partitions once they've been
created.  I'm not sure there's a good reason to associate the
subpartitioning scheme with the toplevel table.  For one thing, that
forces all subpartitions to be partitioned the same way - do we want
to insist on that?  If we do, then I agree that we need to think a
little harder here.

 That would be a default partition. That is, where the tuples that don't 
 belong elsewhere (other defined partitions) go. VALUES clause of the 
 definition for such a partition would look like:

 (a range partition) ... VALUES LESS THAN MAXVALUE
 (a list partition) ... VALUES DEFAULT

 There has been discussion about whether there shouldn't be such a place for 
 tuples to go. That is, it should generate an error if a tuple can't go 
 anywhere (or support auto-creating a new one like in interval partitioning?)

I think Alvaro's response further down the thread is right on target.
But to go into a bit more detail, let's consider the three possible
cases:

- Hash partitioning.  Every key value gets hashed to some partition.
The concept of an overflow or default partition doesn't even make
sense.

- List partitioning.  Each key for which the user has defined a
mapping gets sent to the corresponding partition.  The keys that
aren't mapped anywhere can either (a) cause an error or (b) get mapped
to some default partition.  It's probably useful to offer both
behaviors.  But I don't think it requires a partitionisoverflow
column, because you can represent it some other way, such as by making
partitionvalues NULL, which is otherwise meaningless.

- Range partitioning.  In this case, what you've basically got is a
list of partition bounds and a list of target partitions.   Suppose
there are N partition bounds; then there will be N+1 targets.  Some of
those targets can be undefined, meaning an attempt to insert a key
with that value will error out.  For example, suppose the user defines
a partition for values 1-3 and 10-13.  Then your list of partition
bounds looks like this:

1,3,10,13

And your list of destinations looks like this:

undefined,firstpartition,undefined,secondpartition,undefined

More commonly, the ranges will be contiguous, so that there are no
gaps.  If you have everything below 10 in the first partition, everything
from 10 up to 20 in the second partition, and everything else in a third
partition, then you have bounds 10,20 and destinations
firstpartition,secondpartition,thirdpartition.  If you want values
greater than 20 to error out, then you have bounds 10,20 and
destinations firstpartition,secondpartition,undefined.

In none of this do you really have an overflow partition.  Rather,
the first and last destinations, if defined, catch everything that has
a key lower than the lowest key or higher than the highest key.  If
not defined, you error out.

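To make the bounds/destinations idea concrete, here is a tiny sketch of the
lookup (plain ints and invented names; a real implementation would compare
Datums using the partition key's operator class):

/*
 * nbounds ordered bounds define nbounds+1 destination slots: slot 0 catches
 * keys below bounds[0], slot i catches keys in [bounds[i-1], bounds[i]), and
 * slot nbounds catches keys at or above the last bound.  An "undefined" slot
 * means the insert should error out.
 */
static int
range_destination_index(const int *bounds, int nbounds, int key)
{
	int			i;

	for (i = 0; i < nbounds; i++)
	{
		if (key < bounds[i])
			return i;
	}
	return nbounds;
}

With the 1,3,10,13 example above, a key of 2 maps to slot 1 (firstpartition)
and a key of 5 maps to slot 2, which is undefined and therefore an error.
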
 I wonder if your suggestion of pg_node_tree plays well here. This then could 
 be a list of CONSTs or some such... And I am thinking it's a concern only for 
 range partitions, no? (that is, a multicolumn partition key)

I guess you could list or hash partition on multiple columns, too.
And yes, this is why I thought of pg_node_tree.

  * DDL syntax (no multi-column partitioning, sub-partitioning support as 
  yet):
 
  -- create partitioned table and child partitions at once.
  CREATE TABLE parent (...)
  PARTITION BY [ RANGE | LIST ] (key_column) [ opclass ]
  [ (
   PARTITION child
 {
 VALUES LESS THAN { ... | MAXVALUE } -- for RANGE
   | VALUES [ IN ] ( { ... | DEFAULT } ) -- for LIST
 }
 [ WITH ( ... ) ] [ TABLESPACE tbs ]
   [, ...]
) ] ;

 How are you going to dump and restore this, bearing in mind that you
 have to preserve a bunch of OIDs across pg_upgrade?  What if somebody
 wants to do pg_dump --table name_of_a_partition?

 Assuming everything's (including partitioned relation and partitions at all 
 levels) got a pg_class entry of its own, would OIDs be a problem? Or 

Re: [HACKERS] On partitioning

2014-12-05 Thread Robert Haas
On Fri, Dec 5, 2014 at 2:18 AM, Amit Kapila amit.kapil...@gmail.com wrote:
 Do we really need to support dml or pg_dump for individual partitions?

I think we do.  It's quite reasonable for a DBA (or developer or
whatever) to want to dump all the data that's in a single partition;
for example, maybe they have the table partitioned, but also spread
across several servers.  When the data on one machine grows too big,
they want to dump that partition, move it to a new machine, and drop
the partition from the old machine.  That needs to be easy and
efficient.

More generally, with inheritance, I've seen the ability to reference
individual inheritance children be a real life-saver on any number of
occasions.  Now, a new partitioning system that is not as clunky as
constraint exclusion will hopefully be fast enough that people don't
need to do it very often any more.  But I would be really cautious
about removing the option.  That is the equivalent of installing a new
fire suppression system and then boarding up the emergency exit.
Yeah, you *hope* the new fire suppression system is good enough that
nobody will ever need to go out that way any more.  But if you're
wrong, people will die, so getting rid of it isn't prudent.  The
stakes are not quite so high here, but the principle is the same.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] On partitioning

2014-12-05 Thread Robert Haas
On Fri, Dec 5, 2014 at 3:11 AM, Amit Langote
langote_amit...@lab.ntt.co.jp wrote:
  I think you are right.  I think in this case we need something similar
  to column pg_index.indexprs, which is of type pg_node_tree (which
  seems to be already suggested by Robert). So maybe we can proceed
  with this type and see if anyone else has a better idea.

 Yeah, with that, I was thinking we may be able to do something like dump a 
 Node that describes the range partition bounds or list of allowed values 
 (say, RangePartitionValues, ListPartitionValues).

That's exactly the kind of thing I was thinking about.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2014-12-05 Thread Robert Haas
On Fri, Dec 5, 2014 at 1:49 AM, Rahila Syed rahilasyed...@gmail.com wrote:
If that's really true, we could consider having no configuration any
time, and just compressing always.  But I'm skeptical that it's
actually true.

 I was referring to this for CPU utilization:
 http://www.postgresql.org/message-id/1410414381339-5818552.p...@n5.nabble.com
 http://

 The above tests were performed on machine with configuration as follows
 Server specifications:
 Processors:Intel® Xeon ® Processor E5-2650 (2 GHz, 8C/16T, 20 MB) * 2 nos
 RAM: 32GB
 Disk : HDD  450GB 10K Hot Plug 2.5-inch SAS HDD * 8 nos
 1 x 450 GB SAS HDD, 2.5-inch, 6Gb/s, 10,000 rpm

I think that measurement methodology is not very good for assessing
the CPU overhead, because you are only measuring the percentage CPU
utilization, not the absolute amount of CPU utilization.  It's not
clear whether the duration of the tests was the same for all the
configurations you tried - in which case the number of transactions
might have been different - or whether the number of operations was
exactly the same - in which case the runtime might have been
different.  Either way, it could obscure an actual difference in
absolute CPU usage per transaction.  It's unlikely that both the
runtime and the number of transactions were identical for all of your
tests, because that would imply that the patch makes no difference to
performance; if that were true, you wouldn't have bothered writing
it

What I would suggest is instrument the backend with getrusage() at
startup and shutdown and have it print the difference in user time and
system time.  Then, run tests for a fixed number of transactions and
see how the total CPU usage for the run differs.
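
(A bare-bones version of that instrumentation; where exactly to hook the two
calls into the backend is left out here:)

#include <stdio.h>
#include <sys/resource.h>

static struct rusage start_usage;

static void
cpu_usage_start(void)
{
	getrusage(RUSAGE_SELF, &start_usage);
}

static void
cpu_usage_report(void)
{
	struct rusage end_usage;

	getrusage(RUSAGE_SELF, &end_usage);
	fprintf(stderr, "user: %.6f s, system: %.6f s\n",
			(end_usage.ru_utime.tv_sec - start_usage.ru_utime.tv_sec) +
			(end_usage.ru_utime.tv_usec - start_usage.ru_utime.tv_usec) / 1000000.0,
			(end_usage.ru_stime.tv_sec - start_usage.ru_stime.tv_sec) +
			(end_usage.ru_stime.tv_usec - start_usage.ru_stime.tv_usec) / 1000000.0);
}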

Last cycle, Amit Kapila did a bunch of work trying to compress the WAL
footprint for updates, and we found that compression was pretty darn
expensive there in terms of CPU time.  So I am suspicious of the
finding that it is free here.  It's not impossible that there's some
effect which causes us to recoup more CPU time than we spend
compressing in this case that did not apply in that case, but the
projects are awfully similar, so I tend to doubt it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}

2014-12-05 Thread Josh Berkus
On 12/05/2014 07:59 AM, Robert Haas wrote:
 I think it's probably an important distinction, for the kinds of
 reasons Anssi mentions, but we should look for some method other than
 a system column of indicating it.  Maybe there's a magic function that
 returns a Boolean which you can call, or maybe some special clause, as
 with WITH ORDINALITY.

I thought the point of INSERT ... ON CONFLICT update was so that you
didn't have to care if it was a new row or not?

If you do care, it seems like it makes more sense to do your own INSERTs
and UPDATEs, as Django currently does.

I wouldn't be *opposed* to having a pseudocolumn in the RETURNed stuff
which let me know updated|inserted|ignored, but I also don't see it as a
feature requirement for 9.5.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Role Attribute Bitmask Catalog Representation

2014-12-05 Thread Adam Brightwell
All,

I have attached an updated patch that incorporates the feedback and
recommendations provided.

Thanks,
Adam


-- 
Adam Brightwell - adam.brightw...@crunchydatasolutions.com
Database Engineer - www.crunchydatasolutions.com
diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
new file mode 100644
index 133143d..ccdff09
*** a/src/backend/access/transam/xlogfuncs.c
--- b/src/backend/access/transam/xlogfuncs.c
***
*** 22,32 
--- 22,34 
  #include access/xlog_internal.h
  #include access/xlogutils.h
  #include catalog/catalog.h
+ #include catalog/pg_authid.h
  #include catalog/pg_type.h
  #include funcapi.h
  #include miscadmin.h
  #include replication/walreceiver.h
  #include storage/smgr.h
+ #include utils/acl.h
  #include utils/builtins.h
  #include utils/numeric.h
  #include utils/guc.h
*** pg_start_backup(PG_FUNCTION_ARGS)
*** 54,60 
  
  	backupidstr = text_to_cstring(backupid);
  
! 	if (!superuser() && !has_rolreplication(GetUserId()))
  		ereport(ERROR,
  (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
  		   errmsg("must be superuser or replication role to run a backup")));
--- 56,62 
  
  	backupidstr = text_to_cstring(backupid);
  
! 	if (!superuser() && !check_role_attribute(GetUserId(), ROLE_ATTR_REPLICATION))
  		ereport(ERROR,
  (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
  		   errmsg("must be superuser or replication role to run a backup")));
*** pg_stop_backup(PG_FUNCTION_ARGS)
*** 82,88 
  {
  	XLogRecPtr	stoppoint;
  
! 	if (!superuser() && !has_rolreplication(GetUserId()))
  		ereport(ERROR,
  (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
  		 (errmsg("must be superuser or replication role to run a backup"))));
--- 84,90 
  {
  	XLogRecPtr	stoppoint;
  
! 	if (!superuser() && !check_role_attribute(GetUserId(), ROLE_ATTR_REPLICATION))
  		ereport(ERROR,
  (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
  		 (errmsg("must be superuser or replication role to run a backup"))));
diff --git a/src/backend/catalog/aclchk.c b/src/backend/catalog/aclchk.c
new file mode 100644
index d30612c..0b23ba0
*** a/src/backend/catalog/aclchk.c
--- b/src/backend/catalog/aclchk.c
*** aclcheck_error_type(AclResult aclerr, Oi
*** 3423,3448 
  }
  
  
- /* Check if given user has rolcatupdate privilege according to pg_authid */
- static bool
- has_rolcatupdate(Oid roleid)
- {
- 	bool		rolcatupdate;
- 	HeapTuple	tuple;
- 
- 	tuple = SearchSysCache1(AUTHOID, ObjectIdGetDatum(roleid));
- 	if (!HeapTupleIsValid(tuple))
- 		ereport(ERROR,
- (errcode(ERRCODE_UNDEFINED_OBJECT),
-  errmsg("role with OID %u does not exist", roleid)));
- 
- 	rolcatupdate = ((Form_pg_authid) GETSTRUCT(tuple))->rolcatupdate;
- 
- 	ReleaseSysCache(tuple);
- 
- 	return rolcatupdate;
- }
- 
  /*
   * Relay for the various pg_*_mask routines depending on object kind
   */
--- 3423,3428 
*** pg_class_aclmask(Oid table_oid, Oid role
*** 3630,3636 
  	if ((mask & (ACL_INSERT | ACL_UPDATE | ACL_DELETE | ACL_TRUNCATE | ACL_USAGE)) &&
  		IsSystemClass(table_oid, classForm) &&
  		classForm->relkind != RELKIND_VIEW &&
! 		!has_rolcatupdate(roleid) &&
  		!allowSystemTableMods)
  	{
  #ifdef ACLDEBUG
--- 3610,3616 
  	if ((mask & (ACL_INSERT | ACL_UPDATE | ACL_DELETE | ACL_TRUNCATE | ACL_USAGE)) &&
  		IsSystemClass(table_oid, classForm) &&
  		classForm->relkind != RELKIND_VIEW &&
! 		!has_role_attribute(roleid, ROLE_ATTR_CATUPDATE) &&
  		!allowSystemTableMods)
  	{
  #ifdef ACLDEBUG
*** pg_extension_ownercheck(Oid ext_oid, Oid
*** 5051,5102 
  }
  
  /*
!  * Check whether specified role has CREATEROLE privilege (or is a superuser)
   *
!  * Note: roles do not have owners per se; instead we use this test in
!  * places where an ownership-like permissions test is needed for a role.
!  * Be sure to apply it to the role trying to do the operation, not the
!  * role being operated on!	Also note that this generally should not be
!  * considered enough privilege if the target role is a superuser.
!  * (We don't handle that consideration here because we want to give a
!  * separate error message for such cases, so the caller has to deal with it.)
   */
  bool
! has_createrole_privilege(Oid roleid)
  {
! 	bool		result = false;
! 	HeapTuple	utup;
! 
! 	/* Superusers bypass all permission checking. */
! 	if (superuser_arg(roleid))
  		return true;
  
! 	utup = SearchSysCache1(AUTHOID, ObjectIdGetDatum(roleid));
! 	if (HeapTupleIsValid(utup))
! 	{
! 		result = ((Form_pg_authid) GETSTRUCT(utup))->rolcreaterole;
! 		ReleaseSysCache(utup);
! 	}
! 	return result;
  }
  
  bool
! has_bypassrls_privilege(Oid roleid)
  {
! 	bool		result = false;
! 	HeapTuple	utup;
  
! 	/* Superusers bypass all permission checking. */
! 	if (superuser_arg(roleid))
! 		return true;
  
! 	utup = SearchSysCache1(AUTHOID, ObjectIdGetDatum(roleid));
! 	if (HeapTupleIsValid(utup))
! 	{
! 		result = ((Form_pg_authid) GETSTRUCT(utup))->rolbypassrls;
! 		

Re: [HACKERS] GSSAPI, SSPI - include_realm default

2014-12-05 Thread Stephen Frost
* Magnus Hagander (mag...@hagander.net) wrote:
 On Wed, Nov 26, 2014 at 8:01 PM, Stephen Frost sfr...@snowman.net wrote:
As such, I'd like to propose changing the default to be
'include_realm=1'.
 
 Per our previous discussions, but to make sure it's also on record for
 others, +1 for this suggestion.

Patch attached which does this for master.

This would be done for 9.5 and we would need to note it in the release
notes, of course.
 
 I suggest we also backpatch some documentation suggesting that people
 manually change the include_realm parameter (perhaps also with a note
 saying that the default will change in 9.5).

I'll work on a patch for back-branches if everyone is alright with this
patch against master.  Given my recent track record for changing
wording around, it seems prudent to get agreement on this first.

Thanks,

Stephen
From fe4d7a01f5d31fac565a8de2485cd9d113a9cbb4 Mon Sep 17 00:00:00 2001
From: Stephen Frost sfr...@snowman.net
Date: Fri, 5 Dec 2014 12:54:05 -0500
Subject: [PATCH] Change default for include_realm to 1

The default behavior for GSS and SSPI authentication methods has long
been to strip the realm off of the principal, however, this is not a
secure approach in multi-realm environments and the use-case for the
parameter at all has been superseded by the regex-based mapping support
available in pg_ident.conf.

Change the default for include_realm to be '1', meaning that we do
NOT remove the realm from the principal by default.  Any installations
which depend on the existing behavior will need to update their
configurations (ideally by leaving include_realm on and adding a mapping
in pg_ident.conf).  Note that the mapping capability exists in all
currently supported versions of PostgreSQL and so this change can be
done today.  Barring that, existing users can update their
configurations today to explicitly set include_realm=0 to ensure that
the prior behavior is maintained when they upgrade.

This needs to be noted in the release notes.

Per discussion with Magnus and Peter.
---
 doc/src/sgml/client-auth.sgml | 66 ---
 src/backend/libpq/hba.c   | 13 +
 2 files changed, 57 insertions(+), 22 deletions(-)

diff --git a/doc/src/sgml/client-auth.sgml b/doc/src/sgml/client-auth.sgml
index 7704f73..69517dd 100644
--- a/doc/src/sgml/client-auth.sgml
+++ b/doc/src/sgml/client-auth.sgml
@@ -943,15 +943,24 @@ omicron bryanh  guest1
   </para>
 
   <para>
-Client principals must have their <productname>PostgreSQL</> database user
-name as their first component, for example
-<literal>pgusername@realm</>.  Alternatively, you can use a user name
-mapping to map from the first component of the principal name to the
-database user name.  By default, the realm of the client is
-not checked by <productname>PostgreSQL</>. If you have cross-realm
-authentication enabled and need to verify the realm, use the
-<literal>krb_realm</> parameter, or enable <literal>include_realm</>
-and use user name mapping to check the realm.
+Client principals can be mapped to different <productname>PostgreSQL</>
+database user names with <filename>pg_ident.conf</>.  For example,
+<literal>pgusername@realm</> could be mapped to just <literal>pgusername</>.
+Alternatively, you can use the full <literal>username@realm</> principal as
+the role name in <productname>PostgreSQL</> without any mapping.
+   </para>
+
+   <para>
+<productname>PostgreSQL</> also supports a parameter to strip the realm from
+the principal.  This method is supported for backwards compatibility and is
+strongly discouraged as it is then impossible to distinguish different users
+with the same username but coming from different realms.  To enable this,
+set <literal>include_realm</> to zero.  For simple single-realm
+installations, <literal>include_realm</> combined with the
+<literal>krb_realm</> parameter (which checks that the realm provided
+matches exactly what is in the krb_realm parameter) would be a secure but
+less capable option compared to specifying an explicit mapping in
+<filename>pg_ident.conf</>.
   </para>
 
   <para>
@@ -993,10 +1002,13 @@ omicron bryanh  guest1
   <term><literal>include_realm</literal></term>
   <listitem>
    <para>
-If set to 1, the realm name from the authenticated user
-principal is included in the system user name that's passed through
-user name mapping (<xref linkend="auth-username-maps">). This is
-useful for handling users from multiple realms.
+If set to 0, the realm name from the authenticated user principal is
+stripped off before being passed through the user name mapping
+(<xref linkend="auth-username-maps">). This is discouraged and is
+primarily available for backwards compatibility as it is not secure
+in multi-realm environments unless krb_realm is also used.  Users

Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}

2014-12-05 Thread Peter Geoghegan
On Fri, Dec 5, 2014 at 10:00 AM, Josh Berkus j...@agliodbs.com wrote:
 I thought the point of INSERT ... ON CONFLICT update was so that you
 didn't have to care if it was a new row or not?

 If you do care, it seems like it makes more sense to do your own INSERTs
 and UPDATEs, as Django currently does.

 I wouldn't be *opposed* to having a pseudocolumn in the RETURNed stuff
 which let me know updated|inserted|ignored, but I also don't see it as a
 feature requirement for 9.5.

Agreed. Importantly, we won't have painted ourselves into a corner
where we cannot add it later, now that RETURNING projects updated
tuples, too (V1.5 established that). I'm pretty confident that it
would be a mistake to try and make the inner excluded and target
aliases visible, in any case, because of the annoying ambiguity it
creates for the common, simple cases. I think it has to be a magic
function, or something like that.

I really hoped we'd be having a more technical, implementation-level
discussion at this point. Simon rightly emphasized the importance of
getting the semantics right, and the importance of discussing that up
front, but I'm concerned: that's almost all that has been discussed
here so far, which is surely not balanced either.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}

2014-12-05 Thread Robert Haas
On Fri, Dec 5, 2014 at 1:07 PM, Peter Geoghegan p...@heroku.com wrote:
 Agreed. Importantly, we won't have painted ourselves into a corner
 where we cannot add it later, now that RETURNING projects updates
 tuples, too (V1.5 established that).

Yeah, it seems fine to postpone that to a later version, as long as we
haven't painted ourselves into a corner.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Role Attribute Bitmask Catalog Representation

2014-12-05 Thread Stephen Frost
Adam,

* Adam Brightwell (adam.brightw...@crunchydatasolutions.com) wrote:
 I have attached an updated patch that incorporates the feedback and
 recommendations provided.

Thanks.  Comments below.

 diff --git a/src/backend/access/transam/xlogfuncs.c 
 b/src/backend/access/transam/xlogfuncs.c
 --- 56,62 
   
   backupidstr = text_to_cstring(backupid);
   
 ! if (!superuser() && !check_role_attribute(GetUserId(), 
 ROLE_ATTR_REPLICATION))
   ereport(ERROR,
   (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
  errmsg(must be superuser or replication role to run a 
 backup)));

The point of has_role_attribute() was to avoid the need to explicitly
say !superuser() everywhere.  The idea with check_role_attribute() is
that we want to present the user (in places like pg_roles) with the
values that are *actually* set.

In other words, the above should just be:

if (!has_role_attribute(GetUserId(), ROLE_ATTR_REPLICATION))

 --- 84,90 
   {
   XLogRecPtr  stoppoint;
   
 ! if (!superuser() && !check_role_attribute(GetUserId(), 
 ROLE_ATTR_REPLICATION))
   ereport(ERROR,
   (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
(errmsg(must be superuser or replication role to run a 
 backup;

Ditto here.

 diff --git a/src/backend/catalog/aclchk.c b/src/backend/catalog/aclchk.c
 --- 5031,5081 
   }
   
   /*
 !  * has_role_attribute
!  *   Check if the role with the specified id has a specific role attribute.
 !  *   This function will always return true for roles with superuser 
 privileges
 !  *   unless the attribute being checked is CATUPDATE.
*
 !  * roleid - the oid of the role to check.
 !  * attribute - the attribute to check.
*/
   bool
 ! has_role_attribute(Oid roleid, RoleAttr attribute)
   {
 ! /*
 !  * Superusers bypass all permission checking except in the case of 
 CATUPDATE.
 !  */
 ! if (!(attribute & ROLE_ATTR_CATUPDATE) && superuser_arg(roleid))
   return true;
   
 ! return check_role_attribute(roleid, attribute);
   }
   
 + /*
 +  * check_role_attribute
 +  *   Check if the role with the specified id has been assigned a specific
 +  *   role attribute.  This function does not allow any superuser bypass.

I don't think we need to say that it doesn't allow superuser bypass.
Instead, I'd comment that has_role_attribute() should be used for
permissions checks while check_role_attribute is for checking what the
value in the rolattr bitmap is and isn't for doing permissions checks
directly.
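
In other words, the intended division of labor looks roughly like this (a
sketch only, reusing the patch's RoleAttr type and ROLE_ATTR_* constants,
with a hypothetical helper standing in for the syscache lookup of the new
rolattr column):

/* check_role_attribute(): report what is actually set in the rolattr bitmap
 * (suitable for display in pg_roles and friends, no superuser bypass). */
static bool
check_role_attribute_sketch(Oid roleid, RoleAttr attribute)
{
	RoleAttr	attributes = fetch_rolattr(roleid);	/* hypothetical syscache lookup */

	return (attributes & attribute) != 0;
}

/* has_role_attribute(): what permission checks should call; it layers the
 * superuser bypass (except for CATUPDATE) on top of the bitmap check. */
static bool
has_role_attribute_sketch(Oid roleid, RoleAttr attribute)
{
	if (!(attribute & ROLE_ATTR_CATUPDATE) && superuser_arg(roleid))
		return true;

	return check_role_attribute_sketch(roleid, attribute);
}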

 diff --git a/src/backend/commands/user.c b/src/backend/commands/user.c
 *** CreateRole(CreateRoleStmt *stmt)
 *** 82,93 
   boolencrypt_password = Password_encryption; /* encrypt 
 password? */
   charencrypted_password[MD5_PASSWD_LEN + 1];
   boolissuper = false;/* Make the user a superuser? */
 ! boolinherit = true; /* Auto inherit privileges? */
   boolcreaterole = false; /* Can this user create 
 roles? */
   boolcreatedb = false;   /* Can the user create 
 databases? */
   boolcanlogin = false;   /* Can this user login? 
 */
   boolisreplication = false;  /* Is this a replication role? 
 */
   boolbypassrls = false;  /* Is this a row 
 security enabled role? */
   int connlimit = -1; /* maximum connections allowed 
 */
   List   *addroleto = NIL;/* roles to make this a member of */
   List   *rolemembers = NIL;  /* roles to be members of this 
 role */
 --- 74,86 
   boolencrypt_password = Password_encryption; /* encrypt 
 password? */
   charencrypted_password[MD5_PASSWD_LEN + 1];
   boolissuper = false;/* Make the user a superuser? */
 ! boolinherit = true; /* Auto inherit privileges? */
   boolcreaterole = false; /* Can this user create 
 roles? */
   boolcreatedb = false;   /* Can the user create 
 databases? */
   boolcanlogin = false;   /* Can this user login? 
 */
   boolisreplication = false;  /* Is this a replication role? 
 */
   boolbypassrls = false;  /* Is this a row 
 security enabled role? */
 + RoleAttrattributes = ROLE_ATTR_NONE;/* role attributes, 
 initialized to none. */
   int connlimit = -1; /* maximum connections allowed 
 */
   List   *addroleto = NIL;/* roles to make this a member of */
   List   *rolemembers = NIL;  /* roles to be members of this 
 role */

Please clean up the whitespace changes; there's no need for the
'inherit' line to change.

 diff --git 

Re: [HACKERS] Parallel Seq Scan

2014-12-05 Thread Jim Nasby

On 12/5/14, 9:08 AM, José Luis Tallón wrote:


Moreover, when load goes up, the relative cost of parallel working should go 
up as well.
Something like:
 p = number of cores
 l = 1min-load

 additional_cost = tuple estimate * cpu_tuple_cost * (l+1)/(c-1)

(for c>1, of course)


...


The parallel seq scan nodes are definitively the best approach for parallel 
query, since the planner can optimize them based on cost.
I'm wondering about the ability to modify the implementation of some methods themselves 
once at execution time: given a previously planned query, chances are that, at execution 
time (I'm specifically thinking about prepared statements here), a different 
implementation of the same node might be more suitable and could be used 
instead while the condition holds.


These comments got me wondering... would it be better to decide on parallelism 
during execution instead of at plan time? That would allow us to dynamically 
scale parallelism based on system load. If we don't even consider parallelism 
until we've pulled some number of tuples/pages from a relation, this would also 
eliminate all parallel overhead on small relations.
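
As a toy illustration of folding current load into that decision, along the
lines of the (l+1)/(c-1) penalty quoted above (here 'c' is read as the worker
count, which the quoted mail leaves ambiguous, and getloadavg() plus the
constants in main() are just assumptions for the sketch):

/* Sketch: scale a parallel-scan cost penalty by the 1-minute load average. */
#include <stdio.h>
#include <stdlib.h>

static double
parallel_penalty(double tuple_estimate, double cpu_tuple_cost, int workers)
{
	double		load[1];

	if (workers <= 1 || getloadavg(load, 1) != 1)
		return 0.0;				/* no parallelism, or load unavailable */

	/* additional_cost = tuple_estimate * cpu_tuple_cost * (l + 1) / (c - 1) */
	return tuple_estimate * cpu_tuple_cost * (load[0] + 1.0) / (workers - 1);
}

int
main(void)
{
	/* e.g. 1M tuples, default cpu_tuple_cost of 0.01, 4 workers */
	printf("penalty: %.2f\n", parallel_penalty(1000000.0, 0.01, 4));
	return 0;
}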
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Testing DDL deparsing support

2014-12-05 Thread Bruce Momjian
On Fri, Dec  5, 2014 at 09:29:59AM -0300, Alvaro Herrera wrote:
 Bruce Momjian wrote:
  On Fri, Nov 28, 2014 at 01:43:36PM +0900, Ian Barwick wrote:
   Standard regression tests are helpful because patch authors include new 
   test
   cases in the patches that stress their new options or commands.  This new 
   test
   framework needs to be something that internally runs the regression tests 
   and
   exercises DDL deparsing as the regression tests execute DDL.  That would 
   mean
   that new commands and options would automatically be deparse-tested by 
   our new
   framework as soon as patch authors add standard regress support.
  
  Are you saying every time a new option is added to a command that a new
  regression test needs to be added?
 
 Not necessarily -- an existing test could be modified, as well.
 
  We don't normally do that,
 
 I sure hope we do have all options covered by tests.

Are you saying that every combination of ALTER options is tested?  We
have rejected simple regression test additions on the basis that the
syntax works and is unlikely to break once tested once by the developer.

  and it could easily bloat the regression tests over time.
 
 We had 103 regression tests in 8.2 and we have 145 in 9.4.  Does this
 qualify as bloat?

No, that seems fine.  I am worried about having to have a test for every
syntax change, which we currently don't do.  Was that issue not clear in
my first email?

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Testing DDL deparsing support

2014-12-05 Thread Alvaro Herrera
Bruce Momjian wrote:
 On Fri, Dec  5, 2014 at 09:29:59AM -0300, Alvaro Herrera wrote:
  Bruce Momjian wrote:
   On Fri, Nov 28, 2014 at 01:43:36PM +0900, Ian Barwick wrote:
Standard regression tests are helpful because patch authors include new 
test
cases in the patches that stress their new options or commands.  This 
new test
framework needs to be something that internally runs the regression 
tests and
exercises DDL deparsing as the regression tests execute DDL.  That 
would mean
that new commands and options would automatically be deparse-tested by 
our new
framework as soon as patch authors add standard regress support.
   
   Are you saying every time a new option is added to a command that a new
   regression test needs to be added?
  
  Not necessarily -- an existing test could be modified, as well.
  
   We don't normally do that,
  
  I sure hope we do have all options covered by tests.
 
 Are you saying that every combination of ALTER options is tested?

Well, ALTER TABLE is special: you can give several subcommands, and each
subcommand can be one of a rather long list of possible subcommands.
Testing every combination would mean a combinatorial explosion, which
would indeed be too large.  But surely we want a small bunch of tests to
prove that having several subcommands works fine, and also at least one
test for every possible subcommand.

 We have rejected simple regression test additions on the basis that
 the syntax works and is unlikely to break once tested once by the
 developer.

This rationale doesn't sound so good to me.  Something might work fine
the minute it is committed, but someone else might break it
inadvertently later; this has actually happened.  Having no tests at all
for a feature isn't good.  

I know we have recently rejected patches that added tests only to
improve the coverage percent, for instance in CREATE DATABASE, because
the runtime of the tests got too large.  Are there other examples of
rejected tests?

   and it could easily bloat the regression tests over time.
  
  We had 103 regression tests in 8.2 and we have 145 in 9.4.  Does this
  qualify as bloat?
 
 No, that seems fine.  I am worried about having to have a test for every
 syntax change, which we currently don't do?  Was that issue not clear in
 my first email?

Well, if with "every syntax change" you mean "every feature addition",
then I think we should have at least one test for each, yes.  It's not
like we add new syntax every day anyway.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] On partitioning

2014-12-05 Thread Jim Nasby

On 12/5/14, 3:42 AM, Amit Langote wrote:

  I think you are right.  I think in this case we need something similar
to column pg_index.indexprs which is of type pg_node_tree(which
seems to be already suggested by Robert). So may be we can proceed
with this type and see if any one else has better idea.

One point raised about/against pg_node_tree was the values represented therein 
would turn out to be too generalized to be used with advantage during planning. 
But, it seems we could deserialize it in advance back to the internal form 
(like an array of a struct) as part of the cached relation data. This overhead 
would only be incurred in case of partitioned tables. Perhaps this is what 
Robert suggested elsewhere.


In order to store a composite type in a catalog, we would need to have one field that has 
the typid of the composite, and the field that stores the actual composite data would 
need to be a dumb varlena that stores the composite HeapTupleHeader.
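
For what it's worth, building such a value is already straightforward with
the existing tuple machinery; a sketch, assuming a two-column int4 row type
whose OID is already known:

/* Sketch: build a composite Datum whose varlena payload is a HeapTupleHeader. */
#include "postgres.h"
#include "access/htup_details.h"
#include "access/tupdesc.h"
#include "funcapi.h"
#include "utils/typcache.h"

static Datum
make_bound_datum(Oid comptypid, int32 lower, int32 upper)
{
	TupleDesc	tupdesc = lookup_rowtype_tupdesc(comptypid, -1);
	Datum		values[2];
	bool		nulls[2] = {false, false};
	HeapTuple	tup;
	Datum		result;

	values[0] = Int32GetDatum(lower);
	values[1] = Int32GetDatum(upper);

	/* heap_form_tuple() fills in the datum length, type OID and typmod. */
	tup = heap_form_tuple(tupdesc, values, nulls);

	/* The resulting datum's payload is the HeapTupleHeader itself. */
	result = HeapTupleGetDatum(tup);

	ReleaseTupleDesc(tupdesc);
	return result;
}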
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] On partitioning

2014-12-05 Thread Jim Nasby

On 12/5/14, 1:22 PM, Jim Nasby wrote:

On 12/5/14, 3:42 AM, Amit Langote wrote:

  I think you are right.  I think in this case we need something similar
to column pg_index.indexprs which is of type pg_node_tree(which
seems to be already suggested by Robert). So may be we can proceed
with this type and see if any one else has better idea.

One point raised about/against pg_node_tree was the values represented therein 
would turn out to be too generalized to be used with advantage during planning. 
But, it seems we could deserialize it in advance back to the internal form 
(like an array of a struct) as part of the cached relation data. This overhead 
would only be incurred in case of partitioned tables. Perhaps this is what 
Robert suggested elsewhere.


In order to store a composite type in a catalog, we would need to have one field that has 
the typid of the composite, and the field that stores the actual composite data would 
need to be a dumb varlena that stores the composite HeapTupleHeader.


On further thought; if we disallow NULL as a partition boundary, we don't need 
a separate rowtype; we could just use the one associated with the relation 
itself. Presumably that would make comparing tuples to the relation list a lot 
easier.

I was hung up on how that would work in the case of ALTER TABLE, but we'd have 
the same problem with using pg_node_tree: if you alter a table in such a way 
that *might* affect your partitioning, you have to do some kind of revalidation 
anyway.

The other option would be to use some custom rowtype to store boundary values 
and have a method that can form a boundary tuple from a real one. Either way, I 
suspect this is better than frequently evaluating pg_node_trees.

There may be one other option. If range partitions are defined in terms of an 
expression that is different for every partition (ie: (substr(product_key, 1, 
4), date_trunc('month', sales_date))) then we could use a hash of that 
expression to identify a partition. In other words, range partitioning becomes 
a special case of hash partitioning. I do think we need a programmatic means to 
identify the range of an individual partition and hash won't solve that, but 
the performance of that case isn't critical so we could use pretty much 
whatever we wanted to there.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] On partitioning

2014-12-05 Thread Robert Haas
On Fri, Dec 5, 2014 at 2:52 PM, Jim Nasby jim.na...@bluetreble.com wrote:
 The other option would be to use some custom rowtype to store boundary
 values and have a method that can form a boundary tuple from a real one.
 Either way, I suspect this is better than frequently evaluating
 pg_node_trees.

On what basis do you expect that?  Every time you use a view, you're
using a pg_node_tree.  Nobody's ever complained that having to reload
the pg_node_tree column was too slow, and I see no reason to suppose
that things would be any different here.

I mean, we can certainly invent something new if there is a reason to
do so.  But you (and a few other people) seem to be trying pretty hard
to avoid using the massive amount of infrastructure that we already
have to do almost this exact thing, which puzzles the heck out of me.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}

2014-12-05 Thread Peter Geoghegan
On Fri, Dec 5, 2014 at 1:01 AM, Anssi Kääriäinen
anssi.kaariai...@thl.fi wrote:
 If Django is going to use the INSERT ... ON CONFLICT UPDATE variant in
 Django for the existing save() method, then it needs to know if the
 result was an UPDATE or INSERT. If we are going to use this for other
 operations (for example bulk merge of rows to the database), it would be
 very convenient to have per-row updated/created information available so
 that we can fire the post_save signals for the rows. If we don't have
 that information available, it means we can't fire signals, and no
 signals means we can't use the bulk merge operation internally as we
 have to fire the signals where that happened before.

 Outside of Django there are likely similar reasons to want to know if
 the result of an operation was a creation of a new row. The reason could
 be creation of related row, doing some action in application layer, or
 just UI message telling object created successfully vs object updated
 successfully.

It probably isn't ideal, but you'd at least be able to do something
with row level triggers in the absence of a standard way of directly
telling if an insert or update was performed.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] On partitioning

2014-12-05 Thread Jim Nasby

On 12/5/14, 2:02 PM, Robert Haas wrote:

On Fri, Dec 5, 2014 at 2:52 PM, Jim Nasby jim.na...@bluetreble.com wrote:

The other option would be to use some custom rowtype to store boundary
values and have a method that can form a boundary tuple from a real one.
Either way, I suspect this is better than frequently evaluating
pg_node_trees.


On what basis do you expect that?  Every time you use a view, you're
using a pg_node_tree.  Nobody's ever complained that having to reload
the pg_node_tree column was too slow, and I see no reason to suppose
that things would be any different here.

I mean, we can certainly invent something new if there is a reason to
do so.  But you (and a few other people) seem to be trying pretty hard
to avoid using the massive amount of infrastructure that we already
have to do almost this exact thing, which puzzles the heck out of me.


My concern is how to do the routing of incoming tuples. I'm assuming it'd be 
significantly faster to compare two tuples than to run each tuple through a 
bunch of nodetrees.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] On partitioning

2014-12-05 Thread Robert Haas
On Fri, Dec 5, 2014 at 3:05 PM, Jim Nasby jim.na...@bluetreble.com wrote:
 On what basis do you expect that?  Every time you use a view, you're
 using a pg_node_tree.  Nobody's ever complained that having to reload
 the pg_node_tree column was too slow, and I see no reason to suppose
 that things would be any different here.

 I mean, we can certainly invent something new if there is a reason to
 do so.  But you (and a few other people) seem to be trying pretty hard
 to avoid using the massive amount of infrastructure that we already
 have to do almost this exact thing, which puzzles the heck out of me.

 My concern is how to do the routing of incoming tuples. I'm assuming it'd be
 significantly faster to compare two tuples than to run each tuple through a
 bunch of nodetrees.

As I said before, that's a completely unrelated problem.

To quickly route tuples for range or list partitioning, you're going
to want to have an array of Datums in memory and bsearch it.  That says
nothing about how they should be stored on disk.  Whatever the on-disk
representation looks like, the relcache is going to need to reassemble
it into an array that can be binary-searched.  As long as that's not
hard to do - and none of the proposals here would make it hard to do -
there's no reason to care about this from that point of view.

At least, not that I can see.
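
To make that concrete, a sketch of the in-memory side only (plain int32
bounds and C comparisons are assumptions to keep it short; a real version
would compare Datums through the partition key's btree support function):

/* Sketch: route a value by binary-searching an in-memory array of upper
 * bounds.  How the bounds were stored on disk (pg_node_tree or otherwise)
 * doesn't matter, as long as the relcache can rebuild this array. */
#include <stdint.h>

typedef struct RangeBoundsSketch
{
	int			nparts;			/* number of partitions */
	int32_t	   *upper_bounds;	/* sorted, exclusive upper bound per partition */
} RangeBoundsSketch;

/* Return the index of the partition that should receive "value",
 * or -1 if it falls beyond the last bound. */
static int
route_value(const RangeBoundsSketch *bounds, int32_t value)
{
	int			lo = 0;
	int			hi = bounds->nparts - 1;

	while (lo <= hi)
	{
		int			mid = lo + (hi - lo) / 2;

		if (value < bounds->upper_bounds[mid])
			hi = mid - 1;		/* this partition could still take it; look left */
		else
			lo = mid + 1;		/* value is at or past this bound; look right */
	}
	return (lo < bounds->nparts) ? lo : -1;
}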

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Doing better at HINTing an appropriate column within errorMissingColumn()

2014-12-05 Thread Robert Haas
On Wed, Dec 3, 2014 at 9:21 PM, Peter Geoghegan p...@heroku.com wrote:
 On Tue, Dec 2, 2014 at 1:11 PM, Robert Haas robertmh...@gmail.com wrote:
 Basically, the case in which I think it's helpful to issue a
 suggestion here is when the user has used the table name rather than
 the alias name.  I wonder if it's worth checking for that case
 specifically, in lieu of what you've done here, and issuing a totally
 different hint in that case (HINT: You must refer to this column
 as prime_minister.id rather than cameron.id).

 Well, if an alias is used, and you refer to an attribute using a
 non-alias name (i.e. the original table name), then you'll already get
 an error suggesting that the alias be used instead -- of course,
 that's nothing new. It doesn't matter to the existing hinting
 mechanism if the attribute name is otherwise wrong. Once you fix the
 code to use the alias suggested, you'll then get this new
 Levenshtein-based hint.

In that case, I think I favor giving no hint at all when the RTE name
is specified but doesn't match exactly.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Doing better at HINTing an appropriate column within errorMissingColumn()

2014-12-05 Thread Peter Geoghegan
On Fri, Dec 5, 2014 at 12:33 PM, Robert Haas robertmh...@gmail.com wrote:
 Well, if an alias is used, and you refer to an attribute using a
 non-alias name (i.e. the original table name), then you'll already get
 an error suggesting that the alias be used instead -- of course,
 that's nothing new. It doesn't matter to the existing hinting
 mechanism if the attribute name is otherwise wrong. Once you fix the
 code to use the alias suggested, you'll then get this new
 Levenshtein-based hint.

 In that case, I think I favor giving no hint at all when the RTE name
 is specified but doesn't match exactly.

I don't follow. The existing mechanism only concerns what to do when
the original table name was used when an alias should have been used
instead. What does that have to do with this patch?

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Testing DDL deparsing support

2014-12-05 Thread Bruce Momjian
On Fri, Dec  5, 2014 at 04:10:12PM -0300, Alvaro Herrera wrote:
 Well, ALTER TABLE is special: you can give several subcommands, and each
 subcommand can be one of a rather long list of possible subcommands.
 Testing every combination would mean a combinatorial explosion, which
 would indeed be too large.  But surely we want a small bunch of tests to
 prove that having several subcommands works fine, and also at least one
 test for every possible subcommand.
 
  We have rejected simple regression test additions on the basis that
  the syntax works and is unlikely to break once tested once by the
  developer.
 
 This rationale doesn't sound so good to me.  Something might work fine
 the minute it is committed, but someone else might break it
 inadvertently later; this has actually happened.  Having no tests at all
 for a feature isn't good.  
 
 I know we have recently rejected patches that added tests only to
 improve the coverage percent, for instance in CREATE DATABASE, because
 the runtime of the tests got too large.  Are there other examples of
 rejected tests?

Yes, there are many cases we have added options or keywords but didn't
add a regression test.

and it could easily bloat the regression tests over time.
   
   We had 103 regression tests in 8.2 and we have 145 in 9.4.  Does this
   qualify as bloat?
  
  No, that seems fine.  I am worried about having to have a test for every
  syntax change, which we currently don't do?  Was that issue not clear in
  my first email?
 
 Well, if with every syntax change you mean every feature addition,
 then I think we should have at least one test for each, yes.  It's not
 like we add new syntax every day anyway.

Well, my point is that this is a new behavior we have to do, at least to
test the logical DDL behavior --- I suppose it could be removed after
testing.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Elusive segfault with 9.3.5 query cancel

2014-12-05 Thread Josh Berkus
Hackers,

This is not a complete enough report for a diagnosis.  I'm posting it
here just in case someone else sees something like it, and having an
additional report will help figure out the underlying issue.

* 700GB database with around 5,000 writes per second
* 8 replicas handling around 10,000 read queries per second each
* replicas are slammed (40-70% utilization)
* replication produces lots of replication query cancels

In this scenario, a specific query against some of the less busy and
fairly small tables would produce a segfault (signal 11) once every 1-4
days randomly.  This query could have 100's of successful runs for every
segfault. This was not reproducible manually, and the segfaults never
happened on the master.  Nor did we ever see a segfault based on any
other query, including against the tables which were generally the
source of the query cancels.

In case it's relevant, the query included use of regexp_split_to_array()
and ORDER BY random(), neither of which are generally used in the user's
other queries.

We made some changes which decreased query cancel (optimizing queries,
turning on hot_standby_feedback) and we haven't seen a segfault since
then.  As far as the user is concerned, this solves the problem, so I'm
never going to get a trace or a core dump file.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Elusive segfault with 9.3.5 query cancel

2014-12-05 Thread Josh Berkus
On 12/05/2014 12:54 PM, Josh Berkus wrote:
 Hackers,
 
 This is not a complete enough report for a diagnosis.  I'm posting it
 here just in case someone else sees something like it, and having an
 additional report will help figure out the underlying issue.
 
 * 700GB database with around 5,000 writes per second
 * 8 replicas handling around 10,000 read queries per second each
 * replicas are slammed (40-70% utilization)
 * replication produces lots of replication query cancels
 
 In this scenario, a specific query against some of the less busy and
 fairly small tables would produce a segfault (signal 11) once every 1-4
 days randomly.  This query could have 100's of successful runs for every
 segfault. This was not reproducible manually, and the segfaults never
 happened on the master.  Nor did we ever see a segfault based on any
 other query, including against the tables which were generally the
 source of the query cancels.
 
 In case it's relevant, the query included use of regexp_split_to_array()
 and ORDER BY random(), neither of which are generally used in the user's
 other queries.
 
 We made some changes which decreased query cancel (optimizing queries,
 turning on hot_standby_feedback) and we haven't seen a segfault since
 then.  As far as the user is concerned, this solves the problem, so I'm
 never going to get a trace or a core dump file.

Forgot a major piece of evidence as to why I think this is related to
query cancel:  in each case, the segfault was preceded by a
multi-backend query cancel 3ms to 30ms beforehand.  It is possible that
the backend running the query which segfaulted might have been the only
backend *not* cancelled due to query conflict concurrently.
Contradicting this, there are other multi-backend query cancels in the
logs which do NOT produce a segfault.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Elusive segfault with 9.3.5 query cancel

2014-12-05 Thread Peter Geoghegan
On Fri, Dec 5, 2014 at 1:29 PM, Josh Berkus j...@agliodbs.com wrote:
 We made some changes which decreased query cancel (optimizing queries,
 turning on hot_standby_feedback) and we haven't seen a segfault since
 then.  As far as the user is concerned, this solves the problem, so I'm
 never going to get a trace or a core dump file.

 Forgot a major piece of evidence as to why I think this is related to
 query cancel:  in each case, the segfault was preceded by a
 multi-backend query cancel 3ms to 30ms beforehand.  It is possible that
 the backend running the query which segfaulted might have been the only
 backend *not* cancelled due to query conflict concurrently.
 Contradicting this, there are other multi-backend query cancels in the
 logs which do NOT produce a segfault.

I wonder if it would be useful to add additional instrumentation so
that even without a core dump, there was some cursory information
about the nature of a segfault.

Yes, doing something with a SIGSEGV handler is very scary, and there
are major portability concerns (e.g.
https://bugs.ruby-lang.org/issues/9654), but I believe it can be made
robust on Linux. For what it's worth, this open source project offers
that kind of functionality in the form of a library:
https://github.com/vmarkovtsev/DeathHandler

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Role Attribute Bitmask Catalog Representation

2014-12-05 Thread Adam Brightwell
Stephen,

Thanks for the feedback.


  diff --git a/src/backend/access/transam/xlogfuncs.c
 b/src/backend/access/transam/xlogfuncs.c
  --- 56,62 
 
backupidstr = text_to_cstring(backupid);
 
 ! if (!superuser() && !check_role_attribute(GetUserId(),
 ROLE_ATTR_REPLICATION))
ereport(ERROR,
(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
   errmsg(must be superuser or replication role to run a
 backup)));

 The point of has_role_attribute() was to avoid the need to explicitly
 say !superuser() everywhere.  The idea with check_role_attribute() is
 that we want to present the user (in places like pg_roles) with the
 values that are *actually* set.

 In other words, the above should just be:

 if (!has_role_attribute(GetUserId(), ROLE_ATTR_REPLICATION))


Yes, I understand.  My original thought here was that I was replacing
'has_rolreplication' which didn't perform any superuser check and that I
was trying to adhere to minimal changes.  But I agree this would be the
appropriate solution.  Fixed.


  + /*
  +  * check_role_attribute
  +  *   Check if the role with the specified id has been assigned a
 specific
  +  *   role attribute.  This function does not allow any superuser
 bypass.

 I don't think we need to say that it doesn't allow superuser bypass.
 Instead, I'd comment that has_role_attribute() should be used for
 permissions checks while check_role_attribute is for checking what the
 value in the rolattr bitmap is and isn't for doing permissions checks
 directly.


Ok. Understood.  Fixed.

 diff --git a/src/backend/utils/adt/acl.c b/src/backend/utils/adt/acl.c
  *** pg_role_aclcheck(Oid role_oid, Oid rolei
  *** 4602,4607 
  --- 4603,4773 
return ACLCHECK_NO_PRIV;
}
 
  + /*
  +  * pg_has_role_attribute_id_attr

 I'm trying to figure out what the point of the trailing _attr in the
 function name is..?  Doesn't seem necessary to have that for these
 functions and it'd look a bit cleaner without it, imv.


So, I was trying to follow what I perceived as the following convention for
these functions: pg_function_name_arg1_arg2_argN.  So, what _attr
represents is the second argument of the function which is the attribute to
check.  I could agree that might be unnecessary, but that was my thought
process on it.  At any rate, I've removed it.

 ! #define ROLE_ATTR_ALL  255 /* or (1 << N_ROLE_ATTRIBUTES) - 1 */

 I'd say "equals" or something rather than "or" since that kind of
 implies that it's an alternative, but we can't do that as discussed in
 the comment (which I like).


Agreed. Fixed.


  ! /* role attribute check routines */
  ! extern bool has_role_attribute(Oid roleid, RoleAttr attribute);
  ! extern bool check_role_attribute(Oid roleid, RoleAttr attribute);

 With all the 'has_role_attribute(GetUserId(), ROLEATTR_BLAH)' cases, I'd
 suggest doing the same as 'superuser()' and also provide a simpler
 version: 'have_role_attribute(ROLEATTR_BLAH)' which handles doing the
 GetUserId() itself.  That'd simplify quite a few of the above calls.


Good point.  Added.

Attached is an updated patch.

-Adam


-- 
Adam Brightwell - adam.brightw...@crunchydatasolutions.com
Database Engineer - www.crunchydatasolutions.com
diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
new file mode 100644
index 133143d..8475985
*** a/src/backend/access/transam/xlogfuncs.c
--- b/src/backend/access/transam/xlogfuncs.c
***
*** 22,32 
--- 22,34 
  #include access/xlog_internal.h
  #include access/xlogutils.h
  #include catalog/catalog.h
+ #include catalog/pg_authid.h
  #include catalog/pg_type.h
  #include funcapi.h
  #include miscadmin.h
  #include replication/walreceiver.h
  #include storage/smgr.h
+ #include utils/acl.h
  #include utils/builtins.h
  #include utils/numeric.h
  #include utils/guc.h
*** pg_start_backup(PG_FUNCTION_ARGS)
*** 54,60 
  
  	backupidstr = text_to_cstring(backupid);
  
! 	if (!superuser() && !has_rolreplication(GetUserId()))
  		ereport(ERROR,
  (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
  		   errmsg("must be superuser or replication role to run a backup")));
--- 56,62 
  
  	backupidstr = text_to_cstring(backupid);
  
! 	if (!have_role_attribute(ROLE_ATTR_REPLICATION))
  		ereport(ERROR,
  (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
  		   errmsg("must be superuser or replication role to run a backup")));
*** pg_stop_backup(PG_FUNCTION_ARGS)
*** 82,88 
  {
  	XLogRecPtr	stoppoint;
  
! 	if (!superuser() && !has_rolreplication(GetUserId()))
  		ereport(ERROR,
  (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
  		 (errmsg("must be superuser or replication role to run a backup"))));
--- 84,90 
  {
  	XLogRecPtr	stoppoint;
  
! 	if (!have_role_attribute(ROLE_ATTR_REPLICATION))
  		ereport(ERROR,
  (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
  		 (errmsg(must be superuser or replication role to run a 

Re: [HACKERS] Elusive segfault with 9.3.5 query cancel

2014-12-05 Thread Jim Nasby

On 12/5/14, 4:11 PM, Peter Geoghegan wrote:

On Fri, Dec 5, 2014 at 1:29 PM, Josh Berkus j...@agliodbs.com wrote:

We made some changes which decreased query cancel (optimizing queries,
turning on hot_standby_feedback) and we haven't seen a segfault since
then.  As far as the user is concerned, this solves the problem, so I'm
never going to get a trace or a core dump file.


Forgot a major piece of evidence as to why I think this is related to
query cancel:  in each case, the segfault was preceded by a
multi-backend query cancel 3ms to 30ms beforehand.  It is possible that
the backend running the query which segfaulted might have been the only
backend *not* cancelled due to query conflict concurrently.
Contradicting this, there are other multi-backend query cancels in the
logs which do NOT produce a segfault.


I wonder if it would be useful to add additional instrumentation so
that even without a core dump, there was some cursory information
about the nature of a segfault.

Yes, doing something with a SIGSEGV handler is very scary, and there
are major portability concerns (e.g.
https://bugs.ruby-lang.org/issues/9654), but I believe it can be made
robust on Linux. For what it's worth, this open source project offers
that kind of functionality in the form of a library:
https://github.com/vmarkovtsev/DeathHandler


Perhaps we should also officially recommend production servers be setup to 
create core files. AFAIK the only downside is the time it would take to write a 
core that's huge because of shared buffers, but perhaps there's some way to 
avoid writing those? (That means the core won't help if the bug is due to 
something in a buffer, but that seems unlikely enough that the tradeoff is 
worth it...)
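
On Linux there is at least one mechanism for exactly that:
madvise(MADV_DONTDUMP) (kernel 3.4+) excludes a mapping from core dumps.
PostgreSQL doesn't do this today; the following is only a sketch of the call:

/* Sketch: exclude a large shared mapping from core dumps on Linux. */
#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>

static void
exclude_from_core(void *region, size_t region_size)
{
	/* MADV_DONTDUMP (Linux >= 3.4): these pages are skipped when a core is written. */
	if (madvise(region, region_size, MADV_DONTDUMP) != 0)
		perror("madvise(MADV_DONTDUMP)");
}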
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Elusive segfault with 9.3.5 query cancel

2014-12-05 Thread Peter Geoghegan
On Fri, Dec 5, 2014 at 2:41 PM, Jim Nasby jim.na...@bluetreble.com wrote:
 Perhaps we should also officially recommend production servers be setup to
 create core files. AFAIK the only downside is the time it would take to
 write a core that's huge because of shared buffers

I don't think that's ever going to be practical.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Elusive segfault with 9.3.5 query cancel

2014-12-05 Thread Tom Lane
Peter Geoghegan p...@heroku.com writes:
 On Fri, Dec 5, 2014 at 2:41 PM, Jim Nasby jim.na...@bluetreble.com wrote:
 Perhaps we should also officially recommend production servers be setup to
 create core files. AFAIK the only downside is the time it would take to
 write a core that's huge because of shared buffers

 I don't think that's ever going to be practical.

I'm fairly sure that on some distros (Red Hat, at least) there is distro
policy against having daemons produce core dumps by default, for multiple
reasons including possible disk space consumption and leakage of secure
information.  So even if we recommended this, the recommendation would be
overridden by some/many packagers.

There is much to be said though for trying to emit at least a minimal
stack trace into the postmaster log file.  I'm pretty sure glibc has a
function for that; dunno if it's going to be practical on other platforms.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Elusive segfault with 9.3.5 query cancel

2014-12-05 Thread Josh Berkus
On 12/05/2014 02:41 PM, Jim Nasby wrote:
 Perhaps we should also officially recommend production servers be setup
 to create core files. AFAIK the only downside is the time it would take
 to write a core that's huge because of shared buffers, but perhaps
 there's some way to avoid writing those? (That means the core won't help
 if the bug is due to something in a buffer, but that seems unlikely
 enough that the tradeoff is worth it...)

Not practical in a lot of cases.  For example, this user was unwilling
to enable core dumps on the production replicas because writing out the
16GB of shared buffers they had took over 10 minutes in a test.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Elusive segfault with 9.3.5 query cancel

2014-12-05 Thread Peter Geoghegan
On Fri, Dec 5, 2014 at 3:49 PM, Josh Berkus j...@agliodbs.com wrote:
 to enable core dumps on the production replicas because writing out the
 16GB of shared buffers they had took over 10 minutes in a test.

No one ever thinks it'll happen to them anyway - recommending enabling
core dumps seems like a waste of time, since as Tom mentioned package
managers shouldn't be expected to get on board with that plan. I think
a zero overhead backtrace feature from within a SIGSEGV handler (with
appropriate precautions around corrupt/exhausted call stacks) using
glibc is the right thing here.

Indeed, glibc does have infrastructure that can be used to get a
backtrace [1], which is probably what we'd end up using, but even
POSIX has infrastructure like sigaltstack(). It can be done.

[1] https://www.gnu.org/software/libc/manual/html_node/Backtraces.html
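
A bare-bones sketch of that approach (glibc-specific; a production version
would need the portability and corrupt-stack precautions discussed above,
and this one just writes to stderr rather than the postmaster log):

/* Sketch: print a backtrace from a SIGSEGV handler on an alternate stack. */
#include <execinfo.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static char altstack[64 * 1024];	/* alternate stack, in case the main one is gone */

static void
segv_handler(int signo)
{
	void	   *frames[64];
	int			nframes = backtrace(frames, 64);

	/* backtrace_symbols_fd() writes straight to the fd and avoids malloc(). */
	backtrace_symbols_fd(frames, nframes, STDERR_FILENO);

	/* Restore the default action and re-raise so the process still dies. */
	signal(signo, SIG_DFL);
	raise(signo);
}

static void
install_segv_handler(void)
{
	stack_t		ss;
	struct sigaction sa;
	void	   *preload[1];

	backtrace(preload, 1);		/* force libgcc to load before any crash */

	ss.ss_sp = altstack;
	ss.ss_size = sizeof(altstack);
	ss.ss_flags = 0;
	sigaltstack(&ss, NULL);

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = segv_handler;
	sa.sa_flags = SA_ONSTACK;	/* run the handler on the alternate stack */
	sigemptyset(&sa.sa_mask);
	sigaction(SIGSEGV, &sa, NULL);
}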
-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Elusive segfault with 9.3.5 query cancel

2014-12-05 Thread Jim Nasby

On 12/5/14, 5:49 PM, Josh Berkus wrote:

On 12/05/2014 02:41 PM, Jim Nasby wrote:

Perhaps we should also officially recommend production servers be setup
to create core files. AFAIK the only downside is the time it would take
to write a core that's huge because of shared buffers, but perhaps
there's some way to avoid writing those? (That means the core won't help
if the bug is due to something in a buffer, but that seems unlikely
enough that the tradeoff is worth it...)


Not practical in a lot of cases.  For example, this user was unwilling
to enable core dumps on the production replicas because writing out the
16GB of shared buffers they had took over 10 minutes in a test.


Which is why I wondered if there's a way to avoid writing out shared buffers...

But at least getting a stack trace would be a big start.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Seq Scan

2014-12-05 Thread Amit Kapila
On Fri, Dec 5, 2014 at 8:38 PM, José Luis Tallón jltal...@adv-solutions.net
wrote:

 On 12/04/2014 07:35 AM, Amit Kapila wrote:

 [snip]

 The number of worker backends that can be used for
 parallel seq scan can be configured by using a new GUC
 parallel_seqscan_degree, the default value of which is zero
 and it means parallel seq scan will not be considered unless
 user configures this value.


 The number of parallel workers should be capped (of course!) at the
maximum number of processors (cores/vCores, threads/hyperthreads)
available.


Also, it should consider MaxConnections configured by user.

 Moreover, when load goes up, the relative cost of parallel working
should go up as well.
 Something like:
 p = number of cores
 l = 1min-load

 additional_cost = tuple estimate * cpu_tuple_cost * (l+1)/(c-1)

 (for c>1, of course)


How will you identify load in the above formula, and what exactly is 'c'
(is it the number of parallel workers involved)?

For now, I have managed this simply by having a configuration
variable, and it seems to me that that should be good enough for a
first version.  We can definitely enhance it in a future version by
dynamically allocating the number of workers based on their
availability and the needs of the query, but I think let's leave that
for another day.


 In ExecutorStart phase, initiate the required number of workers
 as per parallel seq scan plan and setup dynamic shared memory and
 share the information required for worker to execute the scan.
 Currently I have just shared the relId, targetlist and number
 of blocks to be scanned by worker, however I think we might want
 to generate a plan for each of the workers in master backend and
 then share the same to individual worker.

 [snip]

 Attached patch is just to facilitate the discussion about the
 parallel seq scan and may be some other dependent tasks like
 sharing of various states like combocid, snapshot with parallel
 workers.  It is by no means ready to do any complex test; of course
 I will work towards making it more robust both in terms of adding
 more stuff and doing performance optimizations.

 Thoughts/Suggestions?


 Not directly (I haven't had the time to read the code yet), but I'm
thinking about the ability to simply *replace* executor methods from an
extension.
 This could be an alternative to providing additional nodes that the
planner can include in the final plan tree, ready to be executed.

 The parallel seq scan nodes are definitely the best approach for
parallel query, since the planner can optimize them based on cost.
 I'm wondering about the ability to modify the implementation of some
methods themselves once at execution time: given a previously planned
query, chances are that, at execution time (I'm specifically thinking about
prepared statements here), a different implementation of the same node
might be more suitable and could be used instead while the condition holds.


The idea sounds interesting, and a different implementation of the same
node probably would help in some cases.  However, at this stage, if we
focus on one kind of implementation (one that is a win for a reasonable
number of cases) and make it successful, then doing alternative
implementations later will be comparatively easier and have a better
chance of success.

 If this latter line of thinking is too off-topic for this thread and
there is any interest, we can move the comments to another thread and I'd
begin work on a PoC patch. It might also make sense to implement the
executor overloading mechanism alongside the custom plan API, though.


Sure, please go ahead whichever way you would like to proceed.
If you want to contribute in this area/patch, you are
welcome.

 Any comments appreciated.


 Thank you for your work, Amit

Many thanks to you as well for showing interest.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Parallel Seq Scan

2014-12-05 Thread David Rowley
On 4 December 2014 at 19:35, Amit Kapila amit.kapil...@gmail.com wrote:


 Attached patch is just to facilitate the discussion about the
 parallel seq scan and may be some other dependent tasks like
 sharing of various states like combocid, snapshot with parallel
 workers.  It is by no means ready to do any complex test; of course
 I will work towards making it more robust both in terms of adding
 more stuff and doing performance optimizations.

 Thoughts/Suggestions?


This is good news!
I've not gotten to look at the patch yet, but I thought you may be able to
make use of the attached at some point.

It's bare-bones core support for allowing aggregate states to be merged
together with another aggregate state. I would imagine that if a query such
as:

SELECT MAX(value) FROM bigtable;

was run, then a series of parallel workers could go off and each find the
max value from their portion of the table and then perhaps some other node
type would then take all the intermediate results from the workers, once
they're finished, and join all of the aggregate states into one and return
that. Naturally, you'd need to check that all aggregates used in the
targetlist had a merge function first.
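
To make that concrete, the merge step for something like MAX(int4) is
trivial; a hedged sketch (not code from the attached patch) of what such
a merge function could look like:

#include "postgres.h"
#include "fmgr.h"

/*
 * Merge two partial MAX(int4) states: simply keep the larger value, i.e.
 * the same logic as the existing transition function.
 */
Datum
int4_max_merge(PG_FUNCTION_ARGS)
{
    int32       a = PG_GETARG_INT32(0);    /* partial max from one worker */
    int32       b = PG_GETARG_INT32(1);    /* partial max from another worker */

    PG_RETURN_INT32((a > b) ? a : b);
}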

This is just a few hours of work. I've not really tested the pg_dump
support or anything yet. I've also not added any new functions to allow
AVG() or COUNT() to work, I've really just re-used existing functions where
I could, as things like MAX() and BOOL_OR() can just make use of the
existing transition function. I thought that this might be enough for early
tests.

I'd imagine such a workload, ignoring IO overhead, should scale pretty much
linearly with the number of worker processes. Of course, if there was a
GROUP BY clause then the merger code would have to perform more work.

If you think you might be able to make use of this, then I'm willing to go
off and write all the other merge functions required for the other
aggregates.

Regards

David Rowley


merge_aggregate_state_v1.patch
Description: Binary data



Re: [HACKERS] Parallel Seq Scan

2014-12-05 Thread Amit Kapila
On Fri, Dec 5, 2014 at 8:46 PM, Stephen Frost sfr...@snowman.net wrote:

 Amit,

 * Amit Kapila (amit.kapil...@gmail.com) wrote:
  postgres=# explain select c1 from t1;
QUERY PLAN
  -------------------------------------------------------
   Seq Scan on t1  (cost=0.00..101.00 rows=100 width=4)
  (1 row)
 
 
  postgres=# set parallel_seqscan_degree=4;
  SET
  postgres=# explain select c1 from t1;
QUERY PLAN
  --------------------------------------------------------------
   Parallel Seq Scan on t1  (cost=0.00..25.25 rows=100 width=4)
 Number of Workers: 4
 Number of Blocks Per Workers: 25
  (3 rows)

 This is all great and interesting, but I feel like folks might be
 waiting to see just what kind of performance results come from this (and
 what kind of hardware is needed to see gains..).

Initially I was thinking that we should first discuss whether the design
and ideas used in the patch are sane, but since you have asked, and
Robert has asked me the same off-list, I will collect performance data
next week.  (Another reason I have not collected any data yet is that
the work to push qualifications down to the workers is still pending,
which I feel is quite important.)  However, I would still appreciate
feedback on some of the basic points below.

1. As the patch currently stands, it just shares the relevant
data (relid, target list, the block range each worker should
work on, etc.) with the worker; the worker receives that data,
forms the planned statement, executes it, and sends the results
back to the master backend.  So the question here is: do you
think that is reasonable, or should we try to form the complete
plan for each worker and then share that, perhaps along with
other information such as the range table entries that are
required?  My personal gut feeling is that in the long term it
might be better to form each worker's complete plan in the
master and share it; however, the current approach in the patch
(which admittedly needs some improvement) is not bad either and
is considerably easier to implement.
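
For illustration, the kind of state being shared might look roughly like
the sketch below; the struct and field names are mine, not the ones used
in the patch:

#include "postgres.h"
#include "storage/block.h"

typedef struct ParallelScanWorkerInfo
{
    Oid         scanrelid;          /* relation to scan */
    BlockNumber start_block;        /* first block assigned to this worker */
    BlockNumber num_blocks;         /* how many blocks it should scan */
    Size        targetlist_len;     /* length of the serialized target list */
    /* nodeToString() output of the target list follows the struct */
} ParallelScanWorkerInfo;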

2. The next question, related to the above, is what the output
of EXPLAIN should be.  As each worker is currently responsible
for forming its own plan, EXPLAIN is not able to show the
detailed plan for each worker.  Is that okay?

3. Some places where optimizations are possible:
- Currently, after getting a tuple from the heap, the worker deforms it
and sends it via the message queue to the master backend; the master
backend then forms the tuple and sends it to the upper layer, which
deforms it yet again via slot_getallattrs(slot) before sending it to the
frontend.
- The master backend currently receives the data from multiple workers
serially.  We could optimize this so that it checks the other queues
whenever there is no data in the current queue (see the sketch below).
- The master backend is just responsible for coordination among workers:
it shares the required information with the workers and then fetches the
data they have processed.  With some more logic, we might be able to have
the master backend also fetch data from the heap rather than doing just
coordination among workers.

I think we can optimize in all of the above places, but that can also be
done later, unless it hurts performance badly for the cases people care
about most.
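
As an example of the second point above, a round-robin, non-blocking poll
over the worker queues might look like the sketch below.  This is only an
illustration using the existing shm_mq API, not code from the patch, and
it assumes the queue handles were attached elsewhere:

#include "postgres.h"
#include "miscadmin.h"
#include "storage/latch.h"
#include "storage/proc.h"
#include "storage/shm_mq.h"

static bool
fetch_next_chunk(shm_mq_handle **queues, int nqueues, Size *nbytes, void **data)
{
    for (;;)
    {
        bool        all_detached = true;
        int         i;

        for (i = 0; i < nqueues; i++)
        {
            shm_mq_result res;

            if (queues[i] == NULL)
                continue;               /* this worker already finished */

            res = shm_mq_receive(queues[i], nbytes, data, true);
            if (res == SHM_MQ_SUCCESS)
                return true;
            if (res == SHM_MQ_DETACHED)
                queues[i] = NULL;       /* worker is done; skip it from now on */
            else
                all_detached = false;   /* SHM_MQ_WOULD_BLOCK: try the others */
        }

        if (all_detached)
            return false;               /* no more data from any worker */

        /* sleep until a worker sets our latch, then poll all queues again */
        WaitLatch(&MyProc->procLatch, WL_LATCH_SET, 0);
        ResetLatch(&MyProc->procLatch);
        CHECK_FOR_INTERRUPTS();
    }
}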

4. Should the parallel_seqscan_degree value be dependent on other
backend-process settings such as MaxConnections, max_worker_processes
and autovacuum_max_workers, or should it be independent, like
max_wal_senders?

I think it is better to make it dependent on the other backend-process
settings; however, for simplicity, I have kept it similar to
max_wal_senders for now.
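
If we do decide to tie it to those settings, one simple way would be a
GUC check hook along the lines of the sketch below; this is only an
illustration, not the patch's actual code:

#include "postgres.h"
#include "miscadmin.h"
#include "utils/guc.h"

static bool
check_parallel_seqscan_degree(int *newval, void **extra, GucSource source)
{
    if (*newval > max_worker_processes)
    {
        GUC_check_errdetail("parallel_seqscan_degree cannot exceed "
                            "max_worker_processes (currently %d).",
                            max_worker_processes);
        return false;
    }

    return true;
}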

 There are likely to be
 situations where this change is an improvement, as well as cases
 where it makes things worse.

Agreed, and I think that will become clearer after some
performance tests.

 One really interesting case would be parallel seq scans which are
 executing against foreign tables/FDWs..


Sure.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Parallel Seq Scan

2014-12-05 Thread Amit Kapila
On Fri, Dec 5, 2014 at 8:43 PM, Stephen Frost sfr...@snowman.net wrote:

 José,

 * José Luis Tallón (jltal...@adv-solutions.net) wrote:
  On 12/04/2014 07:35 AM, Amit Kapila wrote:
  The number of worker backends that can be used for
  parallel seq scan can be configured by using a new GUC
  parallel_seqscan_degree, whose default value is zero,
  which means parallel seq scan will not be considered unless
  the user configures this value.
 
  The number of parallel workers should be capped (of course!) at the
  maximum number of processors (cores/vCores, threads/hyperthreads)
  available.
 
  Moreover, when load goes up, the relative cost of parallel work
  should go up as well.
  Something like:
  p = number of cores
  l = 1min-load
 
  additional_cost = tuple estimate * cpu_tuple_cost * (l+1)/(c-1)
 
  (for c > 1, of course)

 While I agree in general that we'll need to come up with appropriate
 acceptance criteria, etc, I don't think we want to complicate this patch
 with that initially.

A SUSET GUC which caps the parallel GUC would be
 enough for an initial implementation, imv.


This is exactly what I have done in the patch.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Parallel Seq Scan

2014-12-05 Thread Amit Kapila
On Sat, Dec 6, 2014 at 12:27 AM, Jim Nasby jim.na...@bluetreble.com wrote:
 On 12/5/14, 9:08 AM, José Luis Tallón wrote:


 Moreover, when load goes up, the relative cost of parallel work
should go up as well.
 Something like:
  p = number of cores
  l = 1min-load

  additional_cost = tuple estimate * cpu_tuple_cost * (l+1)/(c-1)

 (for c > 1, of course)


 ...

 The parallel seq scan nodes are definitely the best approach for
parallel query, since the planner can optimize them based on cost.
 I'm wondering about the ability to modify the implementation of some
methods themselves once at execution time: given a previously planned
query, chances are that, at execution time (I'm specifically thinking about
prepared statements here), a different implementation of the same node
might be more suitable and could be used instead while the condition holds.


 These comments got me wondering... would it be better to decide on
parallelism during execution instead of at plan time? That would allow us
to dynamically scale parallelism based on system load. If we don't even
consider parallelism until we've pulled some number of tuples/pages from a
relation, this would also eliminate all parallel overhead on small
relations.
 --

I think we have access to this information in the planner (RelOptInfo->pages);
if we want, we can use that to exclude small relations from parallelism.
But the question is how big a relation needs to be before we consider it
for parallelism.  One way to decide is via tests, which I am planning to
run; do you have in mind any heuristic we could use to decide how big a
relation should be to be considered for parallelism?
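
Just to make the idea concrete, the check itself would be trivial, as in
the sketch below; the threshold GUC named here is hypothetical and not
part of the patch:

#include "postgres.h"
#include "nodes/relation.h"
#include "storage/block.h"

/* hypothetical threshold (could be a GUC); not something in the patch */
static int  parallel_seqscan_min_pages = 1000;

static bool
consider_parallel_seqscan(RelOptInfo *rel)
{
    /* use the planner's existing pages estimate for the relation */
    return rel->pages >= (BlockNumber) parallel_seqscan_min_pages;
}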




With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Parallel Seq Scan

2014-12-05 Thread Amit Kapila
On Sat, Dec 6, 2014 at 10:43 AM, David Rowley dgrowle...@gmail.com wrote:
 On 4 December 2014 at 19:35, Amit Kapila amit.kapil...@gmail.com wrote:

 Attached patch is just to facilitate the discussion about the
 parallel seq scan and may be some other dependent tasks like
 sharing of various states like combocid, snapshot with parallel
 workers.  It is by no means ready to do any complex test; of course
 I will work towards making it more robust both in terms of adding
 more stuff and doing performance optimizations.

 Thoughts/Suggestions?


 This is good news!

Thanks.

 I've not gotten to look at the patch yet, but I thought you may be able
to make use of the attached at some point.


I also think so; it can be used in the near future to enhance
the parallel scan feature and provide more value.  Thanks
for taking the initiative to do the leg-work for supporting
aggregates.

 It's bare-bones core support for allowing aggregate states to be merged
together with another aggregate state. I would imagine that if a query such
as:

 SELECT MAX(value) FROM bigtable;

 was run, then a series of parallel workers could go off and each find the
max value from their portion of the table and then perhaps some other node
type would then take all the intermediate results from the workers, once
they're finished, and join all of the aggregate states into one and return
that. Naturally, you'd need to check that all aggregates used in the
targetlist had a merge function first.


The direction sounds right.

 This is just a few hours of work. I've not really tested the pg_dump
support or anything yet. I've also not added any new functions to allow
AVG() or COUNT() to work, I've really just re-used existing functions where
I could, as things like MAX() and BOOL_OR() can just make use of the
existing transition function. I thought that this might be enough for early
tests.

 I'd imagine such a workload, ignoring IO overhead, should scale pretty
much linearly with the number of worker processes. Of course, if there was
a GROUP BY clause then the merger code would have to perform more work.


Agreed.

 If you think you might be able to make use of this, then I'm willing to
go off and write all the other merge functions required for the other
aggregates.


Don't you think we should first stabilize the basic parallel
scan (a target list and quals that can be evaluated independently
by the workers) and then move on to such enhancements?

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com