Re: [HACKERS] Final Patch for GROUPING SETS

2014-12-31 Thread Noah Misch
On Tue, Dec 23, 2014 at 02:29:58AM -0500, Noah Misch wrote:
 On Mon, Dec 22, 2014 at 10:46:16AM -0500, Tom Lane wrote:
  I still find the ChainAggregate approach too ugly at a system structural
  level to accept, regardless of Noah's argument about number of I/O cycles
  consumed.  We'll be paying for that in complexity and bugs into the
  indefinite future, and I wonder if it isn't going to foreclose some other
  performance opportunities as well.
 
 Among GROUPING SETS GroupAggregate implementations, I bet there's a nonempty
 intersection between those having maintainable design and those having optimal
 I/O usage, optimal memory usage, and optimal number of sorts.  Let's put more
 effort into finding it.  I'm hearing that the shared tuplestore is
 ChainAggregate's principal threat to system structure; is that right?

The underlying algorithm, if naively expressed in terms of our executor
concepts, would call ExecProcNode() on a SortState, then feed the resulting
slot to both a GroupAggregate and to another Sort.  That implies a non-tree
graph of executor nodes, which isn't going to fly anytime soon.  The CTE
approach bypasses that problem by eliminating cooperation between sorts,
instead reading 2N+1 copies of the source data for N sorts.  ChainAggregate is
a bit like a node having two parents, a Sort and a GroupAggregate.  However,
the graph edge between ChainAggregate and its GroupAggregate is a tuplestore
instead of the usual, synchronous ExecProcNode().

Suppose one node orchestrated all sorting and aggregation.  Call it a
MultiGroupAggregate for now.  It wouldn't harness Sort nodes, because it
performs aggregation between tuplesort_puttupleslot() calls.  Instead, it
would directly manage two Tuplesortstate, CUR and NEXT.  The node would have
an initial phase similar to ExecSort(), in which it drains the outer node to
populate the first CUR.  After that, it looks more like agg_retrieve_direct(),
except that CUR is the input source, and each tuple drawn is also put into
NEXT.  When done with one CUR, swap CUR with NEXT and reinitialize NEXT.  This
design does not add I/O consumption or require a nonstandard communication
channel between executor nodes.  Tom, Andrew, does that look satisfactory?
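
For illustration, here is a rough sketch of that loop; the node name, struct
fields, and the two helper functions are invented for this description, and
only ExecProcNode() and the tuplesort_* calls are existing APIs:

/*
 * Sketch of the MultiGroupAggregate loop described above.  CUR is drained
 * while NEXT is filled, then the two are swapped and NEXT is re-created
 * with the sort order needed by the following grouping-set pass.
 */
static void
multi_group_agg_run(MultiGroupAggState *mgstate)
{
	Tuplesortstate *cur = mgstate->cur_sort;
	TupleTableSlot *slot = mgstate->sortslot;
	int			setno;

	/* Initial phase, like ExecSort(): drain the outer plan into CUR. */
	for (;;)
	{
		TupleTableSlot *outerslot = ExecProcNode(outerPlanState(mgstate));

		if (TupIsNull(outerslot))
			break;
		tuplesort_puttupleslot(cur, outerslot);
	}
	tuplesort_performsort(cur);

	for (setno = 0; setno < mgstate->numsortorders; setno++)
	{
		bool			last = (setno == mgstate->numsortorders - 1);
		Tuplesortstate *next;

		/* NEXT is sorted in the order required by the following pass. */
		next = last ? NULL : multi_group_agg_begin_sort(mgstate, setno + 1);

		/* Like agg_retrieve_direct(), except CUR is the input source. */
		while (tuplesort_gettupleslot(cur, true, slot))
		{
			multi_group_agg_advance(mgstate, setno, slot);
			if (next)
				tuplesort_puttupleslot(next, slot);	/* feed the next pass */
		}

		tuplesort_end(cur);
		if (next)
		{
			tuplesort_performsort(next);
			cur = next;			/* swap CUR and NEXT */
		}
	}
}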

Thanks,
nm




Re: [HACKERS] Final Patch for GROUPING SETS

2014-12-31 Thread Atri Sharma
 ChainAggregate is a bit like a node having two parents, a Sort and a
 GroupAggregate.  However, the graph edge between ChainAggregate and its
 GroupAggregate is a tuplestore instead of the usual, synchronous
 ExecProcNode().


Well, I don't buy the two-parents theory. The Sort nodes are stacked in
between the ChainAggregate nodes, so there is still a single edge.
However, as you rightly said, there is a shared tuplestore, but note that
only the head-of-chain ChainAggregate has the top GroupAggregate as its
parent.


 Suppose one node orchestrated all sorting and aggregation.  Call it a
 MultiGroupAggregate for now.  It wouldn't harness Sort nodes, because it
 performs aggregation between tuplesort_puttupleslot() calls.  Instead, it
 would directly manage two Tuplesortstate, CUR and NEXT.  The node would
 have an initial phase similar to ExecSort(), in which it drains the outer
 node to populate the first CUR.  After that, it looks more like
 agg_retrieve_direct(), except that CUR is the input source, and each tuple
 drawn is also put into NEXT.  When done with one CUR, swap CUR with NEXT
 and reinitialize NEXT.  This design does not add I/O consumption or require
 a nonstandard communication channel between executor nodes.  Tom, Andrew,
 does that look satisfactory?


So you are essentially proposing merging ChainAggregate and its
corresponding Sort node?

So the structure would be something like:

GroupAggregate
-> MultiGroupAgg (a,b)
   -> MultiGroupAgg (c,d) ...

I am not sure if I understand you correctly. Only the top-level
GroupAggregate node projects the result of the entire operation. The key to
ChainAggregate nodes is that each ChainAggregate node handles grouping sets
that fit a single ROLLUP list, i.e. ones that can be done by a single sort
order. There can be multiple lists of this type in a single GROUPING SETS
operation; however, our current design has only a single top GroupAggregate
node but a ChainAggregate node + Sort node per sort order. If you are
proposing replacing the GroupAggregate node + the entire ChainAggregate +
Sort node stack with a single MultiGroupAggregate node, I am not able to
understand how it will handle all the multiple sort orders. If you are
proposing replacing only a ChainAggregate + Sort node pair with a single
MultiGroupAgg node, that still shares the tuplestore with the top-level
GroupAggregate node.

I am pretty sure I have messed up my understanding of your proposal. Please
correct me if I am wrong.

Regards,

Atri


-- 
Regards,

Atri
*l'apprenant*


Re: [HACKERS] BUG: *FF WALs under 9.2 (WAS: .ready files appearing on slaves)

2014-12-31 Thread Michael Paquier
On Wed, Dec 31, 2014 at 4:44 PM, Guillaume Lelarge
guilla...@lelarge.info wrote:
 2014-12-12 14:58 GMT+01:00 Heikki Linnakangas hlinnakan...@vmware.com:
 Now, what do we do with the back-branches? I'm not sure. Changing the
 behaviour in back-branches could cause nasty surprises. Perhaps it's best to
 just leave it as it is, even though it's buggy.


 As long as master is fixed, I don't actually care. But I agree with Dennis
 that it's hard to see what's been committed for all the different issues
 found, and if any commits were done, in which branch. I'd like to be able to
 tell my customers: update to this minor release to see if it's fixed, but I
 can't even do that.
This bug does not endanger data consistency at all, as only old WAL
files remain on the standby, so I'm fine as well with just a clean fix
on master and nothing done on the back-branches.
-- 
Michael




Re: [HACKERS] Redesigning checkpoint_segments

2014-12-31 Thread Heikki Linnakangas

(reviving an old thread)

On 08/24/2013 12:53 AM, Josh Berkus wrote:

On 08/23/2013 02:08 PM, Heikki Linnakangas wrote:


Here's a bigger patch, which does more. It is based on the ideas in the
post I started this thread with, with feedback incorporated from the
long discussion. With this patch, WAL disk space usage is controlled by
two GUCs:

min_recycle_wal_size
checkpoint_wal_size


snip


These settings are fairly intuitive for a DBA to tune. You begin by
figuring out how much disk space you can afford to spend on WAL, and set
checkpoint_wal_size to that (with some safety margin, of course). Then
you set checkpoint_timeout based on how long you're willing to wait for
recovery to finish. Finally, if you have infrequent batch jobs that need
a lot more WAL than the system otherwise needs, you can set
min_recycle_wal_size to keep enough WAL preallocated for the spikes.


We'll want to rename them to make it even *more* intuitive.


Sure, I'm all ears.


But ... do I understand things correctly that checkpoint wouldn't kick
in until you hit checkpoint_wal_size?  If that's the case, isn't real
disk space usage around 2X checkpoint_wal_size if spread checkpoint is
set to 0.9?  Or does checkpoint kick in sometime earlier?


It kicks in earlier, so that the checkpoint *completes* just when 
checkpoint_wal_size of WAL is used up. So the real disk usage is 
checkpoint_wal_size.


There is still an internal variable called CheckPointSegments that 
triggers the checkpoint, but it is now derived from checkpoint_wal_size 
(see CalculateCheckpointSegments function):


CheckPointSegments = (checkpoint_wal_size / 16 MB) / (2 + 
checkpoint_completion_target)


This is the same formula we've always had in the manual for calculating 
the amount of WAL space used, but in reverse. I.e. we calculate 
CheckPointSegments based on the desired disk space usage, not the other 
way round.
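
For concreteness, a sketch of that derivation in C (using the GUC names from
this thread; the truncation and the minimum of one segment are my own
assumptions, not necessarily what the patch does):

/*
 * Sketch only: derive CheckPointSegments from checkpoint_wal_size using the
 * formula above.  checkpoint_wal_size is assumed to be expressed in MB and
 * the segment size to be 16 MB; checkpoint_wal_size,
 * CheckPointCompletionTarget and CheckPointSegments are assumed to be the
 * usual GUC-backed globals.
 */
static void
CalculateCheckpointSegments(void)
{
	double		target;

	target = ((double) checkpoint_wal_size / 16) /
		(2.0 + CheckPointCompletionTarget);

	CheckPointSegments = (int) target;
	if (CheckPointSegments < 1)
		CheckPointSegments = 1;
}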



As a note, pgBench would be a terrible test for this patch; we really
need something which creates uneven traffic.  I'll see if I can devise
something.


Attached is a rebased version of this patch. Everyone, please try this 
out on whatever workloads you have, and let me know:


a) How does the auto-tuning of the number of recycled segments work? 
Does pg_xlog reach a steady-state size, or does it fluctuate a lot?


b) Are the two GUCs, checkpoint_wal_size, and min_recycle_wal_size, 
intuitive to set?


- Heikki

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 6bcb106..34f9466 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -1325,7 +1325,7 @@ include_dir 'conf.d'
 40% of RAM to varnameshared_buffers/varname will work better than a
 smaller amount.  Larger settings for varnameshared_buffers/varname
 usually require a corresponding increase in
-varnamecheckpoint_segments/varname, in order to spread out the
+varnamecheckpoint_wal_size/varname, in order to spread out the
 process of writing large quantities of new or changed data over a
 longer period of time.
/para
@@ -2394,18 +2394,21 @@ include_dir 'conf.d'
  titleCheckpoints/title
 
 variablelist
- varlistentry id=guc-checkpoint-segments xreflabel=checkpoint_segments
-  termvarnamecheckpoint_segments/varname (typeinteger/type)
+ varlistentry id=guc-checkpoint-wal-size xreflabel=checkpoint_wal_size
+  termvarnamecheckpoint_wal_size/varname (typeinteger/type)/term
   indexterm
-   primaryvarnamecheckpoint_segments/ configuration parameter/primary
+   primaryvarnamecheckpoint_wal_size/ configuration parameter/primary
   /indexterm
   /term
   listitem
para
-Maximum number of log file segments between automatic WAL
-checkpoints (each segment is normally 16 megabytes). The default
-is three segments.  Increasing this parameter can increase the
-amount of time needed for crash recovery.
+Maximum size to let the WAL grow to between automatic WAL
+checkpoints. This is a soft limit; WAL size can exceed
+varnamecheckpoint_wal_size/ under special circumstances, like
+under heavy load, a failing varnamearchive_command/, or a high
+varnamewal_keep_segments/ setting. The default is 128 MB.
+Increasing this parameter can increase the amount of time needed for
+crash recovery.
 This parameter can only be set in the filenamepostgresql.conf/
 file or on the server command line.
/para
@@ -2458,7 +2461,7 @@ include_dir 'conf.d'
 Write a message to the server log if checkpoints caused by
 the filling of checkpoint segment files happen closer together
 than this many seconds (which suggests that
-varnamecheckpoint_segments/ ought to be raised).  The default is
+varnamecheckpoint_wal_size/ ought to be raised).  The default is
 30 seconds (literal30s/).  Zero disables the warning.
 No 

Re: [HACKERS] Redesigning checkpoint_segments

2014-12-31 Thread Heikki Linnakangas

On 09/01/2013 10:37 AM, Amit Kapila wrote:

On Sat, Aug 24, 2013 at 2:38 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:

a.
In XLogFileInit(),
/*
!  * XXX: What should we use as max_segno? We used to use XLOGfileslop when
!  * that was a constant, but that was always a bit dubious: normally, at a
!  * checkpoint, XLOGfileslop was the offset from the checkpoint record,
!  * but here, it was the offset from the insert location. We can't do the
!  * normal XLOGfileslop calculation here because we don't have access to
!  * the prior checkpoint's redo location. So somewhat arbitrarily, just
!  * use CheckPointSegments.
!  */
! max_segno = logsegno + CheckPointSegments;
   if (!InstallXLogFileSegment(&installed_segno, tmppath,
! *use_existent, max_segno,
   use_lock))

Earlier, max_advance was the same whether InstallXLogFileSegment was called
from RemoveOldXlogFiles() or XLogFileInit(), but now the two values will be
different (and there seems to be no direct relation between them). So will
it be okay, in the scenario where someone else has created the file while
this function was filling it, that the file needs to be restored as a
future segment, which will be decided based on max_segno?


I haven't really thought hard about the above. As the comment says, 
passing the same max_advance value here and in RemoveOldXlogFiles() was 
a bit dubious too, because the reference point was different.


I believe it's quite rare that two processes create a new WAL segment 
concurrently, so it isn't terribly important what we do here.



b. Does CreateRestartPoint need to update CheckPointDistanceEstimate? When
it tries to remove old xlog files, it needs recycleSegNo, which is
calculated using CheckPointDistanceEstimate.


Yeah, you're right, it should. I haven't tested this with archive 
recovery or replication at all yet.



As a developer, I would love to have a configuration knob such as
min_recycle_wal_size, but I'm not sure how many users will be comfortable
setting this value; actually, the few users I have talked to about this
earlier are interested in setting a max WAL size, which allows them to set
an upper limit on the space required by WAL.
Can't we do the calculation of files to recycle based only on
CheckPointDistanceEstimate?


You can always just leave min_recycle_wal_size to the default. It sets a 
minimum for the number of preallocated segments, which can help if you 
have spikes that consume a lot of WAL, like nightly batch jobs. But if 
you don't have such spikes, or the overhead of creating new segments 
when such a spike happens isn't too large, you don't need to set it.


One idea is to try to make the creation of new WAL segments faster. Then 
it wouldn't hurt so much if you run out of preallocated/recycled 
segments and need to suddenly create a lot of new ones. Then we might 
not need a minimum setting at all.


- Heikki





Re: [HACKERS] add modulo (%) operator to pgbench

2014-12-31 Thread Fabien COELHO


Here is a review & updated version of the patch.

Patch applies and compiles without problem on current head,
and worked for the various tests I made with custom scripts.

This patch is a good thing, and I warmly recommend its inclusion. It greatly
extends pgbench's custom-script capabilities by allowing arbitrary integer
expressions, and will allow other functions to be added (I'll send abs() &
hash(), and possibly others).


I'm not sure about what Makefile changes could be necessary for
various environments, it looks ok to me as it is.

I have included the following additional changes in v2:

(1) a small update to the pgbench documentation. English proofreading is
welcome.

(2) improved error reporting to display the file and line from which an
error is raised, and also the column for syntax errors (yyline is always
1...). Previously there was no way to get this information, which was quite
annoying.  It could probably be improved further.

(3) spacing fixed in a few places.

If Robert is ok with these changes, I think it could be applied.

--
Fabien.diff --git a/contrib/pgbench/.gitignore b/contrib/pgbench/.gitignore
index 489a2d6..aae819e 100644
--- a/contrib/pgbench/.gitignore
+++ b/contrib/pgbench/.gitignore
@@ -1 +1,3 @@
+/exprparse.c
+/exprscan.c
 /pgbench
diff --git a/contrib/pgbench/Makefile b/contrib/pgbench/Makefile
index b8e2fc8..67e2bf6 100644
--- a/contrib/pgbench/Makefile
+++ b/contrib/pgbench/Makefile
@@ -4,7 +4,7 @@ PGFILEDESC = pgbench - a simple program for running benchmark tests
 PGAPPICON = win32
 
 PROGRAM = pgbench
-OBJS	= pgbench.o $(WIN32RES)
+OBJS	= pgbench.o exprparse.o $(WIN32RES)
 
 PG_CPPFLAGS = -I$(libpq_srcdir)
 PG_LIBS = $(libpq_pgport) $(PTHREAD_LIBS)
@@ -18,8 +18,22 @@ subdir = contrib/pgbench
 top_builddir = ../..
 include $(top_builddir)/src/Makefile.global
 include $(top_srcdir)/contrib/contrib-global.mk
+
+distprep: exprparse.c exprscan.c
 endif
 
 ifneq ($(PORTNAME), win32)
 override CFLAGS += $(PTHREAD_CFLAGS)
 endif
+
+# There is no correct way to write a rule that generates two files.
+# Rules with two targets don't have that meaning, they are merely
+# shorthand for two otherwise separate rules.  To be safe for parallel
+# make, we must chain the dependencies like this.  The semicolon is
+# important, otherwise make will choose the built-in rule for
+# gram.y=gram.c.
+
+exprparse.h: exprparse.c ;
+
+# exprscan is compiled as part of exprparse
+exprparse.o: exprscan.c
diff --git a/contrib/pgbench/exprparse.y b/contrib/pgbench/exprparse.y
new file mode 100644
index 000..243c6b9
--- /dev/null
+++ b/contrib/pgbench/exprparse.y
@@ -0,0 +1,96 @@
+%{
+/*-
+ *
+ * exprparse.y
+ *	  bison grammar for a simple expression syntax
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *-
+ */
+
+#include "postgres_fe.h"
+
+#include "pgbench.h"
+
+PgBenchExpr *expr_parse_result;
+
+static PgBenchExpr *make_integer_constant(int64 ival);
+static PgBenchExpr *make_variable(char *varname);
+static PgBenchExpr *make_op(char operator, PgBenchExpr *lexpr,
+		PgBenchExpr *rexpr);
+
+%}
+
+%expect 0
+%name-prefix="expr_yy"
+
+%union
+{
+	int64		ival;
+	char	   *str;
+	PgBenchExpr *expr;
+}
+
+%type <expr> expr
+%type <ival> INTEGER
+%type <str> VARIABLE
+%token INTEGER VARIABLE
+%token CHAR_ERROR /* never used, will raise a syntax error */
+
+%left	'+' '-'
+%left	'*' '/' '%'
+%right	UMINUS
+
+%%
+
+result: expr{ expr_parse_result = $1; }
+
+expr: '(' expr ')'			{ $$ = $2; }
+	| '+' expr %prec UMINUS	{ $$ = $2; }
+	| '-' expr %prec UMINUS	{ $$ = make_op('-', make_integer_constant(0), $2); }
+	| expr '+' expr			{ $$ = make_op('+', $1, $3); }
+	| expr '-' expr			{ $$ = make_op('-', $1, $3); }
+	| expr '*' expr			{ $$ = make_op('*', $1, $3); }
+	| expr '/' expr			{ $$ = make_op('/', $1, $3); }
+	| expr '%' expr			{ $$ = make_op('%', $1, $3); }
+	| INTEGER{ $$ = make_integer_constant($1); }
+	| VARIABLE { $$ = make_variable($1); }
+	;
+
+%%
+
+static PgBenchExpr *
+make_integer_constant(int64 ival)
+{
+	PgBenchExpr *expr = pg_malloc(sizeof(PgBenchExpr));
+
+	expr->etype = ENODE_INTEGER_CONSTANT;
+	expr->u.integer_constant.ival = ival;
+	return expr;
+}
+
+static PgBenchExpr *
+make_variable(char *varname)
+{
+	PgBenchExpr *expr = pg_malloc(sizeof(PgBenchExpr));
+
+	expr->etype = ENODE_VARIABLE;
+	expr->u.variable.varname = varname;
+	return expr;
+}
+
+static PgBenchExpr *
+make_op(char operator, PgBenchExpr *lexpr, PgBenchExpr *rexpr)
+{
+	PgBenchExpr *expr = pg_malloc(sizeof(PgBenchExpr));
+
+	expr->etype = ENODE_OPERATOR;
+	expr->u.operator.operator = operator;
+	expr->u.operator.lexpr = lexpr;
+	expr->u.operator.rexpr = rexpr;
+	return expr;
+}
+
+#include "exprscan.c"
diff --git a/contrib/pgbench/exprscan.l 

Re: [HACKERS] Initdb-cs_CZ.WIN-1250 buildfarm failures

2014-12-31 Thread Tomas Vondra
On 28.12.2014 00:46, Noah Misch wrote:
 On Tue, Dec 23, 2014 at 03:32:59PM +0100, Tomas Vondra wrote:
 On 23.12.2014 15:21, Andrew Dunstan wrote:

 No, config_opts is what's passed to configure. Try something like:

 if ($branch eq 'REL9_0_STABLE')
 {
 $skip_steps{'pl-install-check'} = 1;
 }

 Applied to all three animals.
 
 As of the time of this change, the animals ceased to report in.

The strange thing is that while the first run worked fine (as
illustrated by the results submitted to pgbuildfarm.org), all the
following runs fail like this:

Global symbol %skip_steps requires explicit package name at
build-farm.conf line 308.
Compilation failed in require at ./run_branches.pl line 55.

Apparently, something is broken, but my Perl-fu is limited so I have no
idea what/why :-(

regards
Tomas





Re: [HACKERS] Additional role attributes superuser review

2014-12-31 Thread José Luis Tallón

On 12/30/2014 04:16 PM, Stephen Frost wrote:

[snip]
The approach I was thinking was to document and implement this as
impliciting granting exactly USAGE and SELECT rights, no more (not
BYPASSRLS) and no less (yes, the role could execute functions).  I agree
that doing so would be strictly more than what pg_dump actually requires
but it's also what we actually have support for in our privilege system.


Hmmm coupled with your comments below, I'd say some tweaking of the 
existing privileges is actually needed if we want to add these new 
capabilities.


BTW, since this is getting a bit bigger than originally considered:
would it be interesting to support some extension-defined capabilities, too?
(for things that can't be easily controlled by the existing USAGE /
SELECT / ... rights, I mean)



it would only give you COPY access. (And also
COPY != SELECT in the way that the rule system applies, I think? And this
one could be for COPY only)

COPY certainly does equal SELECT rights..  We haven't got an independent
COPY privilege and I don't think it makes sense to invent one.


FWIW, a COPY (DUMP?) privilege different from SELECT would make sense.
Considering your comments below, it would be better that it not imply
BYPASSRLS, though.



It
sounds like you're suggesting that we add a hack directly into COPY for
this privilege, but my thinking is that the right place to change for
this is in the privilege system and not hack it into individual
commands..  I'm also a bit nervous that we'll end up painting ourselves
into a corner if we hack this to mean exactly what pg_dump needs today.

Agreed.


Lastly, I've been considering other use-cases for this privilege beyond
the pg_dump one (thinking about auditing, for example).


ISTR there was something upthread about an AUDIT role, right?
This might be the moment to add it...



[snip]

Similar concerns would exist for the existing REPLICATION role for example
- that one clearly lets you bypass RLS as well, just not with a SQL
statement.

I'm not sure that I see the point you're making here.  Yes, REPLICATION
allows you to do a filesystem copy of the entire database and that
clearly bypasses RLS and *all* of our privilege system.  I'm suggesting
that this role attribute work as an implicit grant of privileges we
already have.  That strikes me as easy to document and very clear for
users.


+1

[snip]
So you're saying a privilege that would allow you to do
pg_start_backup()/pg_stop_backup() but *not* actually use pg_basebackup?
Yes.


That would be EXCLUSIVEBACKUP or something like that, to be consistent
with existing terminology though.

Ok.  I agree that working out the specific naming is difficult and would
like to focus on that, but we probably need to agree on the specific
capabilities first. :)


Yes :)
For the record, LOGBACKUP vs PHYSBACKUP was suggested a couple of days ago.
I'd favor DUMP (logical) and BACKUP (physical) --- for lack of better names.
The above reasoning would have pg_basebackup requiring REPLICATION,
which is a logical consequence of the implementation but strikes me as a
bit surprising in light of these other privileges.




[snip]

Personalyl I think using the DUMP name makes that a lot more clear. Maybe
we need to avoid using BACKUP alone as well, to make sure it doesn't go the
other way - using BASEBACKUP and EXCLUSIVEBACKUP for those two different
ones perhaps?

DUMP - implicitly grants existing rights, to facilitate pg_dump and
perhaps some other use-cases
BASEBACKUP - allows pg_basebackup, which means bypassing all in-DB
  privilege systems
EXCLUSIVEBACKUP - just allows pg_start/stop_backup and friends

I'm still not entirely happy with the naming, but I feel like we're
getting there.  One thought I had was using ARCHIVE somewhere, but I
kind of like BASEBACKUP meaning what's needed to run pg_basebackup, and,
well, we say 'backup' in the pg_start/stop_backup function names, so
it's kind of hard to avoid that.  EXCLUSIVEBACKUP seems really long tho.


ARCHIVE, though completely appropriate for the EXCLUSIVEBACKUP case
above (so this would become DUMP/BASEBACKUP/ARCHIVE + REPLICATION), might
end up causing quite some confusion... (what? WAL archiving is NOT
granted by the ARCHIVE privilege, but requires a superuser to turn it
on (via ALTER SYSTEM)?)


POLA again...

I had defined them when I started the thread:

pg_start_backup
pg_stop_backup
pg_switch_xlog
pg_create_restore_point


... for BACKUP / EXCLUSIVEBACKUP (or, actually, FSBACKUP/PHYSBACKUP ...)

Later I added:

pg_xlog_replay_pause
pg_xlog_replay_resume

Though I'd be fine if the xlog_replay ones were their own privilege (e.g.:
REPLAY or something).

+1

I think just calling them xlog related functions is doing us a disservice
there. Definitely once we have an actual documentation to write for it, but
also in this discussion.

[snip]

If it's for replicas, then why are we not using the REPLICATION privilege
which is extremely similar to this?

I 

Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]

2014-12-31 Thread Amit Kapila
On Mon, Dec 29, 2014 at 11:10 AM, Dilip kumar dilip.ku...@huawei.com
wrote:

 On 29 December 2014 10:22 Amit Kapila Wrote,


 I think nothing more to be handled from my side, you can go ahead with
review..


The patch looks good to me.  I have made a couple of
cosmetic changes (fixed spelling mistakes, improved some comments,
etc.); please check them once, and if you are okay, we can move
ahead.



With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


vacuumdb_parallel_v21.patch
Description: Binary data



Re: [HACKERS] Additional role attributes superuser review

2014-12-31 Thread Magnus Hagander
On Tue, Dec 30, 2014 at 4:16 PM, Stephen Frost sfr...@snowman.net wrote:

 * Magnus Hagander (mag...@hagander.net) wrote:
  On Mon, Dec 29, 2014 at 11:01 PM, Stephen Frost sfr...@snowman.net
 wrote:
   That said, a 'DUMP' privilege which allows the user to dump the
 contents
   of the entire database is entirely reasonable.  We need to be clear in
   the documentation though- such a 'DUMP' privilege is essentially
   granting USAGE on all schemas and table-level SELECT on all tables and
 
  sequences (anything else..?).  To be clear, a user with this privilege
   can utilize that access without using pg_dump.
 
  Well, it would not give you full USAGE - granted USAGE on a schema, you
 can
  execute functions in there for example (provided permissions). This
  privilege would not do that,

 The approach I was thinking was to document and implement this as
 implicitly granting exactly USAGE and SELECT rights, no more (not
 BYPASSRLS) and no less (yes, the role could execute functions).  I agree


If the role could execute functions, it's *not* exactly USAGE and SELECT
rights, it's now USAGE and SELECT and EXECUTE rights... Just to be
nitpicking, but see below.



 that doing so would be strictly more than what pg_dump actually requires
 but it's also what we actually have support for in our privilege system.


Yeah, but would it also be what people would actually *want*?

I think having it do exactly what pg_dump needs, and not things like
execute functions etc, would be the thing people want for a 'DUMP'
privilege.

We might *also* want a USAGEANDSELECTANDEXECUTEANDWHATNOTBUTNOBYPASSURL
(crap, it has to fit within NAMEDATALEN?) privilege, but I think that's a
different thing.


 it would only give you COPY access. (And also
  COPY != SELECT in the way that the rule system applies, I think? And this
  one could be for COPY only)

 COPY certainly does equal SELECT rights..  We haven't got an independent
 COPY privilege and I don't think it makes sense to invent one.  It


We don't, but I'm saying it might make sense to implement this. Not as a
regular privilege, but as a role attribute. I'm not sure it's the right
thing, but it might actually be interesting to entertain.



 sounds like you're suggesting that we add a hack directly into COPY for
 this privilege, but my thinking is that the right place to change for
 this is in the privilege system and not hack it into individual
 commands..  I'm also a bit nervous that we'll end up painting ourselves
 into a corner if we hack this to mean exactly what pg_dump needs today.


One point would be that if we define it as exactly what pg_dump needs,
that definition can change in a future major version.


 One other point is that this shouldn't imply any other privileges, imv.
   I'm specifically thinking of BYPASSRLS- that's independently grantable
   and therefore should be explicitly set, if it's intended.  Things
 
  I think BYPASSRLS would have to be implicitly granted by the DUMP
  privilege. Without that, the DUMP privilege is more or less meaningless
  (for anybody who uses RLS - but if they don't use RLS, then having it
  include BYPASSRLS makes no difference). Worse than that, you may end up
  with a dump that you cannot restore.

 The dump would fail, as I mentioned before.  I don't see why BYPASSRLS
 has to be implicitly granted by the DUMP privilege and can absolutely
 see use-cases for it to *not* be.  Those use-cases would require passing
 pg_dump the --enable-row-security option, but that's why it's there.


So you're basically saying that if RLS is in use anywhere, this privilege
alone would be useless, correct? And you would get a hard failure at *dump
time*, not at reload time? That would at least make it acceptable, as you
would know it's wrong, and we could make the error messages shown
specifically hint that you need to grant both privileges to make it useful.

We could/should also throw a WARNING if DUMP is granted to a role without
BYPASSRLS in case row-level security is enabled in the system, I think. But
that's more of an implementation detail.



Implementations which don't use RLS are not impacted either way, so we
 don't need to consider them.  Implementations which *do* use RLS, in my
 experience, would certainly understand and expect BYPASSRLS to be
 required if they want this role to be able to dump all data out
 regardless of the RLS policies.  What does implicitly including
 BYPASSRLS in this privilege gain us?  A slightly less irritated DBA who
 forgot to include BYPASSRLS and ended up getting a pg_dump error because
 of it.  On the other hand, it walls off any possibility of using this
 privilege for roles who shouldn't be allowed to bypass RLS or for
 pg_dumps to be done across the entire system for specific RLS policies.


Yeah, I think that's acceptable as long as we get the error at dump, and
not at reload (when it's too late to fix it).


 Similar concerns would exist for the existing REPLICATION role for example
  - that 

[HACKERS] Performance improvement for joins where outer side is unique

2014-12-31 Thread David Rowley
Hi,

I've been hacking a bit at the join code again... This time I've been
putting some effort into  optimising the case where the inner side of the
join is known to be unique.
For example, given the tables:

create table t1 (id int primary key);
create table t2 (id int primary key);

And query such as:

select * from t1 left outer join t2 on t1.id=t2.id;

It is possible to deduce at planning time that for each row in the outer
relation, only 0 or 1 rows can exist in the inner relation (the inner being
t2).

In all of our join algorithms in the executor, if the join type is SEMI,
 we skip to the next outer row once we find a matching inner row. This is
because we don't want to allow duplicate rows in the inner side to
duplicate outer rows in the result set. Obviously this is required per SQL
spec. I believe we can also skip to the next outer row in this case when
we've managed to prove that no other row can possibly exist that matches
the current outer row, due to a unique index or group by/distinct clause
(for subqueries).
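
As a purely illustrative, self-contained toy (not the executor code, and not
what the attached patch does internally), the idea is just an extra early
exit from the inner loop of a nested-loop-style join:

#include <stdio.h>
#include <stdbool.h>

/*
 * Toy nested-loop left join over integer arrays.  When the inner side is
 * known to be unique on the join key, we can stop scanning it after the
 * first match, exactly as the executor already does for SEMI joins.
 */
static void
nestloop_left_join(const int *outer, int nouter,
                   const int *inner, int ninner, bool unique_inner)
{
    for (int i = 0; i < nouter; i++)
    {
        bool        matched = false;

        for (int j = 0; j < ninner; j++)
        {
            if (outer[i] == inner[j])
            {
                printf("(%d, %d)\n", outer[i], inner[j]);
                matched = true;
                if (unique_inner)
                    break;      /* no further inner tuple can match */
            }
        }
        if (!matched)
            printf("(%d, NULL)\n", outer[i]);   /* left-join padding */
    }
}

int
main(void)
{
    int         t1[] = {1, 2, 3};
    int         t2[] = {2, 3, 4};   /* unique, e.g. a primary key */

    nestloop_left_join(t1, 3, t2, 3, true);
    return 0;
}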

I've attached a patch which aims to prove this idea works. Although so far
I've only gotten around to implementing this for outer joins.

Since code already existed in analyzejoins.c which aimed to determine if a
join to a relation was unique on the join's condition, the patch is pretty
much just some extra plumbing and a small rewiring of analyzejoins.c, which
just aims to get the unique_inner bool value down to the join node.

The performance improvement is somewhere around 5-10% for hash joins, and a
bit less for merge join. In theory it could be 50% for nested loop joins.
In my life in the OLTP world, these unique joins pretty much cover all
joins that ever happen ever. Perhaps the OLAP world is different, so from
my point of view this is going to be a very nice performance gain.

I'm seeing this patch (or a more complete version) as the first step to a
whole number of other planner improvements:

A couple of examples of those are:

1.
explain select * from sales s inner join product p on p.productid =
s.productid order by s.productid,p.name;

The current plan for this looks like:
                              QUERY PLAN
------------------------------------------------------------------------
 Sort  (cost=149.80..152.80 rows=1200 width=46)
   Sort Key: s.productid, p.name
   ->  Hash Join  (cost=37.00..88.42 rows=1200 width=46)
         Hash Cond: (s.productid = p.productid)
         ->  Seq Scan on sales s  (cost=0.00..31.40 rows=2140 width=8)
         ->  Hash  (cost=22.00..22.00 rows=1200 width=38)
               ->  Seq Scan on product p  (cost=0.00..22.00 rows=1200 width=38)

But in reality we could have executed this with a simple merge join using
the PK of product (productid) to provide presorted input. The extra sort
column on p.name is redundant due to productid being unique.

The UNION planning is crying out for help for cases like this. Where it
could provide sorted input on a unique index, provided the union's
targetlist contained all of the unique index's columns, and we also managed
to find an index on the other part of the union on the same set of columns,
we could perform a Merge Append and a Unique. This would give a
significant improvement in execution time for these types of queries, as
the current planner does an append/sort/unique, which especially sucks for
plans with a LIMIT clause.

I think this should solve some of the problems that Kyotarosan encountered
in his episode of dancing with indices over here -
http://www.postgresql.org/message-id/20131031.194310.212107585.horiguchi.kyot...@lab.ntt.co.jp
where he was unable to prove that he could trim down sort nodes once all of
the columns of a unique index had been seen in the order by due to not
knowing if joins were going to duplicate the outer rows.

2.
It should also be possible to reuse a join in situations like:
create view product_sales as select p.productid, sum(s.qty) soldqty from
product p inner join sales s on p.productid = s.productid group by p.productid;

Where the consumer does:

select ps.*,p.current_price from product_sales ps inner join product p on
ps.productid = p.productid;

Here we'd currently join the product table twice, both times on the same
condition, but we could safely not bother doing that if we knew that the
join could never match more than 1 row on the inner side. Unfortunately I
deal with horrid situations like this daily, where people have used views
from within views, and all the same tables end up getting joined multiple
times :-(

Of course, both 1 and 2 should be considered separately from the attached
patch, this was just to show where it might lead.

Performance of the patch is as follows:


Test case:
create table t1 (id int primary key);
create table t2 (id int primary key);

insert into t1 select x.x from generate_series(1,100) x(x);
insert into t2 select x.x from generate_series(1,100) x(x);

vacuum analyze;

Query:
select count(t2.id) from t1 left outer join t2 on t1.id=t2.id;

Re: [HACKERS] Using 128-bit integers for sum, avg and statistics aggregates

2014-12-31 Thread David Rowley
On 31 December 2014 at 18:20, Robert Haas robertmh...@gmail.com wrote:

 On Fri, Dec 26, 2014 at 7:57 AM, David Rowley dgrowle...@gmail.com
 wrote:
  1. Do we need to keep the 128 byte aggregate state size for machines
 without
  128 bit ints? This has been reduced to 48 bytes in the patch, which is in
  favour code being compiled with a compiler which has 128 bit ints.  I
 kind
  of think that we need to keep the 128 byte estimates for compilers that
  don't support int128, but I'd like to hear any counter arguments.

 I think you're referring to the estimated state size in pg_aggregate
 here, and I'd say it's probably not a big deal one way or the other.
 Presumably, at some point, 128-bit arithmetic will become common
 enough that we'll want to change that estimate, but I don't know
 whether we've reached that point or not.


Yeah, that's what I was talking about.
I'm just looking at the code which uses this size estimate
in choose_hashed_grouping(). I'd be a bit worried, given the difference
between 48 and 128, that we'd underestimate the hash size too much and end
up going to disk during hashagg.

I think the patch should be modified so that it uses 128 bytes for the size
estimate on machines that don't support int128, but I'm not quite sure at
this stage if that causes issues for replication: if one machine had support
and one didn't, would it matter if the pg_aggregate row on the replica said
48 bytes instead of 128? Is this worth worrying about?
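
For reference, the sort of transition-state layout that gives the ~48-byte
figure is roughly the following; the struct and field names here are my own
illustration of the idea, not necessarily the patch's exact declarations:

#ifdef HAVE_INT128
/*
 * Fixed-size transition state usable when the compiler provides a native
 * 128-bit integer type; with alignment padding this comes to about 48
 * bytes.  Without int128 support, the aggregates fall back to numeric
 * accumulators, which is where the old 128-byte estimate comes from.
 */
typedef struct Int128AggState
{
	bool		calcSumX2;		/* calculate sum of squares? */
	int64		N;				/* count of processed numbers */
	int128		sumX;			/* sum of processed numbers */
	int128		sumX2;			/* sum of squares of processed numbers */
} Int128AggState;
#endif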



  2. References to int16 meaning 16 bytes. I'm really in two minds about
 this,
  it's quite nice to keep the natural flow, int4, int8, int16, but I can't
  help think that this will confuse someone one day. I think it'll be a
 long
  time before it confused anyone if we called it int128 instead, but I'm
 not
  that excited about seeing it renamed either. I'd like to hear what others
  have to say... Is there a chance that some sql standard in the distant
  future will have HUGEINT and we might regret not getting the internal
 names
  nailed down?

 Yeah, I think using int16 to mean 16-bytes will be confusing to
 someone almost immediately.


Hmm, I think it should be changed to int128 then.  Pity int4 was selected
as a name instead of int32 back in the day...

I'm going to mark the patch as waiting on author, pending those two changes.

My view with the size estimates change is that if a committer deems it
overkill, then they can rip it out, but at least it's been thought of and
considered rather than forgotten about.

Regards

David Rowley


Re: [HACKERS] Additional role attributes superuser review

2014-12-31 Thread Stephen Frost
* Magnus Hagander (mag...@hagander.net) wrote:
 On Tue, Dec 30, 2014 at 4:16 PM, Stephen Frost sfr...@snowman.net wrote:
  The approach I was thinking was to document and implement this as
  implicitly granting exactly USAGE and SELECT rights, no more (not
  BYPASSRLS) and no less (yes, the role could execute functions).  I agree
 
 If the role could execute functions, it's *not* exactly USAGE and SELECT
 rights, it's now USAGE and SELECT and EXECUTE rights... Just to be
 nitpicking, but see below.

We seem to be coming at it from two different directions, so I'll try to
clarify.  What I'm suggesting is that this role attribute be strictly
*additive*.  Any other privileges granted to the role with this role
attribute would still be applicable, including grants made to PUBLIC.

This role attribute would *not* include EXECUTE rights, by that logic.
However, if PUBLIC was granted EXECUTE rights for the function, then
this role could execute that function.

What it sounds like is you're imagining this role attribute to mean the
role has *only* USAGE and SELECT (or COPY or whatever) rights across the
board and that any grants done explicitly to this role or to public
wouldn't be respected.  In my view, that moves this away from a role
*attribute* to being a pre-defined *role*, as such a role would not be
usable for anything else.

  that doing so would be strictly more than what pg_dump actually requires
  but it's also what we actually have support for in our privilege system.
 
 Yeah, but would it also be what people would actually *want*?

I can certainly see reasons why you'd want such a role to be able to
execute functions- in particular, consider xlog_pause and xlog_resume.
If this role can't execute those functions then they probably can't
successfully run pg_dump against a replica.

 I think having it do exactly what pg_dump needs, and not things like
 execute functions etc, would be the thing people want for a 'DUMP'
 privilege.

What if we want pg_dump in 9.6 to have an option to execute xlog_pause
and xlog_resume for you?  You wouldn't be able to run that against a 9.5
database (or at least, that option wouldn't work).

 We might *also* want a USAGEANDSELECTANDEXECUTEANDWHATNOTBUTNOBYPASSURL
 (crap, it has to fit within NAMEDATALEN?) privilege, but I think that's a
 different thing.

Our privilege system doesn't currently allow for negative grants (deny
user X the ability to run functions, even if EXECUTE is granted to
PUBLIC).  I'm not against that idea, but I don't see it as the same as
this.

  it would only give you COPY access. (And also
   COPY != SELECT in the way that the rule system applies, I think? And this
   one could be for COPY only)
 
  COPY certainly does equal SELECT rights..  We haven't got an independent
  COPY privilege and I don't think it makes sense to invent one.  It
 
 We don't, but I'm saying it might make sense to implement this. Not as a
 regular privilige, but as a role attribute. I'm not sure it's the right
 thing, but it might actually be interesting to entertain.

We've discussed having a role attribute for COPY-from-filesystem, but
pg_dump doesn't use that ever, it only uses COPY TO STDOUT.  I'm not
a fan of making a COPY_TO_STDOUT-vs-SELECT distinction just for this..

  sounds like you're suggesting that we add a hack directly into COPY for
  this privilege, but my thinking is that the right place to change for
  this is in the privilege system and not hack it into individual
  commands..  I'm also a bit nervous that we'll end up painting ourselves
  into a corner if we hack this to mean exactly what pg_dump needs today.
 
 One point would be that if we define it as exactly what pg_dump needs,
 that definition can change in a future major version.

Sure, but that then gets a bit ugly because we encourage running the
latest version of pg_dump against the prior-major-version.

   I think BYPASSRLS would have to be implicitly granted by the DUMP
   privilege. Without that, the DUMP privilege is more or less meaningless
   (for anybody who uses RLS - but if they don't use RLS, then having it
   include BYPASSRLS makes no difference). Worse than that, you may end up
   with a dump that you cannot restore.
 
  The dump would fail, as I mentioned before.  I don't see why BYPASSRLS
  has to be implicitly granted by the DUMP privilege and can absolutely
  see use-cases for it to *not* be.  Those use-cases would require passing
  pg_dump the --enable-row-security option, but that's why it's there.
 
 So you're basically saying that if RLS is in use anywhere, this priviliege
 alone would be useless, correct?

No, it would still be *extremely* useful.  We have an option to pg_dump
to say please dump with RLS enabled.  What that means is that you'd be
able to dump the entire database for all data you're allowed to see
through RLS policies.  How is that useless?  I certainly think it'd be
very handy and a good way to get *segregated* dumps according to policy.

 And you would 

Re: [HACKERS] Using 128-bit integers for sum, avg and statistics aggregates

2014-12-31 Thread Andres Freund
On 2015-01-01 03:00:50 +1300, David Rowley wrote:
   2. References to int16 meaning 16 bytes. I'm really in two minds about
  this,
   it's quite nice to keep the natural flow, int4, int8, int16, but I can't
   help think that this will confuse someone one day. I think it'll be a
  long
   time before it confused anyone if we called it int128 instead, but I'm
  not
   that excited about seeing it renamed either. I'd like to hear what others
   have to say... Is there a chance that some sql standard in the distant
   future will have HUGEINT and we might regret not getting the internal
  names
   nailed down?
 
  Yeah, I think using int16 to mean 16-bytes will be confusing to
  someone almost immediately.
 
 
 hmm, I think it should be changed to int128 then.  Pitty int4 was selected
 as a name instead of int32 back in the day...

Note that the C datatype has been int32/int64 for a while now; it's just
the SQL datatype and the names of its support functions that still use the
old style. Given that, afaiu, we're talking about the C datatype, it seems
pretty clear that it should be named int128.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Parallel Seq Scan

2014-12-31 Thread Thom Brown
On 18 December 2014 at 16:03, Amit Kapila amit.kapil...@gmail.com wrote:



 On Thu, Dec 18, 2014 at 9:22 PM, Amit Kapila amit.kapil...@gmail.com
 wrote:
 
  On Mon, Dec 8, 2014 at 10:40 AM, Amit Kapila amit.kapil...@gmail.com
 wrote:
  
   On Sat, Dec 6, 2014 at 5:37 PM, Stephen Frost sfr...@snowman.net
 wrote:
   
  
   So to summarize my understanding, below are the set of things
   which I should work on and in the order they are listed.
  
   1. Push down qualification
   2. Performance Data
   3. Improve the way to push down the information related to worker.
   4. Dynamic allocation of work for workers.
  
  
 
  I have worked on the patch to accomplish above mentioned points
  1, 2 and partly 3 and would like to share the progress with community.

 Sorry forgot to attach updated patch in last mail, attaching it now.


When attempting to recreate the plan in your example, I get an error:

 ➤ psql://thom@[local]:5488/pgbench

# create table t1(c1 int, c2 char(500)) with (fillfactor=10);
CREATE TABLE
Time: 13.653 ms

 ➤ psql://thom@[local]:5488/pgbench

# insert into t1 values(generate_series(1,100),'amit');
INSERT 0 100
Time: 4.796 ms

 ➤ psql://thom@[local]:5488/pgbench

# explain select c1 from t1;
ERROR:  could not register background process
HINT:  You may need to increase max_worker_processes.
Time: 1.659 ms

 ➤ psql://thom@[local]:5488/pgbench

# show max_worker_processes ;
 max_worker_processes
--
 8
(1 row)

Time: 0.199 ms

# show parallel_seqscan_degree ;
 parallel_seqscan_degree
-
 10
(1 row)


Should I really need to increase max_worker_processes to >=
parallel_seqscan_degree?  If so, shouldn't there be a hint here along with
the error message pointing this out?  And should the error be produced when
only a *plan* is being requested?

Also, I noticed that where a table is partitioned, the plan isn't
parallelised:

# explain select distinct bid from pgbench_accounts;


                                QUERY PLAN
--------------------------------------------------------------------------
 HashAggregate  (cost=1446639.00..1446643.99 rows=499 width=4)
   Group Key: pgbench_accounts.bid
   ->  Append  (cost=0.00..1321639.00 rows=5001 width=4)
         ->  Seq Scan on pgbench_accounts  (cost=0.00..0.00 rows=1 width=4)
         ->  Seq Scan on pgbench_accounts_1  (cost=0.00..4279.00 rows=10 width=4)
         ->  Seq Scan on pgbench_accounts_2  (cost=0.00..2640.00 rows=10 width=4)
         ->  Seq Scan on pgbench_accounts_3  (cost=0.00..2640.00 rows=10 width=4)
         ->  Seq Scan on pgbench_accounts_4  (cost=0.00..2640.00 rows=10 width=4)
         ->  Seq Scan on pgbench_accounts_5  (cost=0.00..2640.00 rows=10 width=4)
         ->  Seq Scan on pgbench_accounts_6  (cost=0.00..2640.00 rows=10 width=4)
         ->  Seq Scan on pgbench_accounts_7  (cost=0.00..2640.00 rows=10 width=4)
...
         ->  Seq Scan on pgbench_accounts_498  (cost=0.00..2640.00 rows=10 width=4)
         ->  Seq Scan on pgbench_accounts_499  (cost=0.00..2640.00 rows=10 width=4)
         ->  Seq Scan on pgbench_accounts_500  (cost=0.00..2640.00 rows=10 width=4)
(504 rows)

Is this expected?

Thom


[HACKERS] Small doc patch about pg_service.conf

2014-12-31 Thread David Fetter
Folks,

There was a slash missing, which I've added.  Where is the default
directory on Windows, or is there one?

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index d829a4b..de272c5 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -6952,7 +6952,7 @@ myEventProc(PGEventId evtId, void *evtInfo, void 
*passThrough)
    at <filename>~/.pg_service.conf</filename> or the location
    specified by the environment variable <envar>PGSERVICEFILE</envar>,
    or it can be a system-wide file
-   at <filename>etc/pg_service.conf</filename> or in the directory
+   at <filename>/etc/pg_service.conf</filename> or in the directory
    specified by the environment variable
    <envar>PGSYSCONFDIR</envar>.  If service definitions with the same
name exist in the user and the system file, the user file takes



Re: [HACKERS] Failure on markhor with CLOBBER_CACHE_ALWAYS for test brin

2014-12-31 Thread Alvaro Herrera
Alvaro Herrera wrote:
 Michael Paquier wrote:
  HI all.
  
  markhor has run for the first time in 8 days, and there is something
  in range e703261..72dd233 making the regression test of brin crashing.
  See here:
  http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=markhordt=2014-12-30%2020%3A58%3A49
 
 This shows that the crash was in the object_address test, not brin.
 Will research.

I can reproduce the crash in a CLOBBER_CACHE_ALWAYS build in
the object_address test.  The backtrace is pretty strange:

#0  0x7f08ce674107 in __GI_raise (sig=sig@entry=6)
at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x7f08ce6754e8 in __GI_abort () at abort.c:89
#2  0x007ac071 in ExceptionalCondition (
    conditionName=conditionName@entry=0x800f28 "!(keylen < 64)", 
    errorType=errorType@entry=0x7e724f "FailedAssertion", 
    fileName=fileName@entry=0x800ef0 
    "/pgsql/source/master/src/backend/access/hash/hashfunc.c", 
    lineNumber=lineNumber@entry=147)
at /pgsql/source/master/src/backend/utils/error/assert.c:54
#3  0x00494a93 in hashname (fcinfo=fcinfo@entry=0x7fff244324a0)
at /pgsql/source/master/src/backend/access/hash/hashfunc.c:147
#4  0x007b450d in DirectFunctionCall1Coll (func=0x494a50 hashname, 
collation=collation@entry=0, arg1=optimized out)
at /pgsql/source/master/src/backend/utils/fmgr/fmgr.c:1027
#5  0x00797aca in CatalogCacheComputeHashValue 
(cache=cache@entry=0x10367d8, 
nkeys=optimized out, cur_skey=cur_skey@entry=0x7fff244328e0)
at /pgsql/source/master/src/backend/utils/cache/catcache.c:212
#6  0x00798ff7 in SearchCatCache (cache=0x10367d8, v1=18241016, v2=6, 
v3=11, v4=0)
at /pgsql/source/master/src/backend/utils/cache/catcache.c:1149
#7  0x007a67ae in GetSysCacheOid (cacheId=cacheId@entry=15, 
key1=optimized out, 
key2=key2@entry=6, key3=key3@entry=11, key4=key4@entry=0)
at /pgsql/source/master/src/backend/utils/cache/syscache.c:988
#8  0x00504699 in get_collation_oid (name=name@entry=0x11655c0, 
missing_ok=missing_ok@entry=0 '\000')
at /pgsql/source/master/src/backend/catalog/namespace.c:3323
#9  0x0050d8dc in get_object_address 
(objtype=objtype@entry=OBJECT_COLLATION, 
objname=objname@entry=0x11655c0, objargs=objargs@entry=0x0, 
relp=relp@entry=0x7fff24432c28, lockmode=lockmode@entry=1, 
missing_ok=missing_ok@entry=0 '\000')
at /pgsql/source/master/src/backend/catalog/objectaddress.c:704


-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Additional role attributes superuser review

2014-12-31 Thread Stephen Frost
Amit,

* Amit Kapila (amit.kapil...@gmail.com) wrote:
 On Tue, Dec 30, 2014 at 6:35 PM, Stephen Frost sfr...@snowman.net wrote:
  I'm pretty sure we've agreed that having a catch-all role attribute like
  'DBA' is a bad idea because we'd likely be adding privileges to it later
  which would expand the rights of users with that attribute, possibly
  beyond what is intended.
 
 Yes, that could happen, but do you want to say that is the only reason
 to consider server-level roles (such as DBA) a bad idea?

No, the other reason is that having more granular permissions is better
than catch-all's.  'superuser' is that catch-all today.

 Can't we make
 users aware of such things via documentation, and then there will be
 some onus on users to grant such a privilege with care?

Perhaps, but if we're going to go in that direction, I'd rather have a
hierarchy which is built upon more granular options rather than assuming
we know exactly what the 'DBA' role should have for every environment.

 On looking around, it seems many of the databases provides such
 roles
 https://dbbulletin.wordpress.com/2013/05/29/backup-privileges-needed-to-backup-databases/
 
 In the link, though they are talking about physical (file level) backup,
 there is mention about such roles in other databases.

Most of this discussion is about non-physical-level-backups.  We have
the REPLICATION role attribute for physical-level-backups and I don't
think anyone wants to get rid of that in favor of pulling it into some
'DBA' role attribute.

 The other way as discussed on this thread is to use something like
 DUMPALL (or DUMPFULL) privilege which also sounds to be a good
 way, apart from the fact that the same privilege can be used for
 non-dump purposes to extract the information from database/cluster.

I think we're probably going to go with DUMPAUTH as the specific role
attribute privilege, to make it clear that it includes authentication
information.  That's really the only distinction between DUMP (or
whatever) and a privilege to support pg_dumpall.

This does make me think that we need to consider if this role attribute
implicitly grants CONNECT to all databases also.

   How about PHYSICAL BACKUP (for basebackup) and LOGICAL BACKUP
   for pg_dump?
 
  Again, this makes it look like the read-all right would be specific to
  users executing pg_dump, which is incorrect and misleading.  PHYSICAL
  might also imply the ability to do pg_basebackup.
 
 That's right; however, having unrelated privileges for doing a physical
 backup via pg_basebackup and via the pg_start*/pg_stop* method also
 doesn't sound like the best way (it can be slightly difficult for
 users to correlate after reading the docs).

We should definitely segregate the privilege to run start/stop backup
and run pg_basebackup.  One allows you to read the database files off
the filesystem more-or-less directly and the other doesn't.  That's a
big difference.

 Don't you think in this case
 we should have some form of hierarchy for privileges (like user
 having privilege to do pg_basebackup can also perform the
 backup via pg_start*/pg_stop* method or some other way to relate
 both forms of physical backup)?

If we had a specific privilege for pg_basebackup then I would agree that
it would allow calling pg_start/stop_backup.  The same is not true in
reverse though.

Thanks,

Stephen




Re: [HACKERS] Failure on markhor with CLOBBER_CACHE_ALWAYS for test brin

2014-12-31 Thread Andres Freund
On 2014-12-31 10:02:40 -0300, Alvaro Herrera wrote:
 Alvaro Herrera wrote:
  Michael Paquier wrote:
   HI all.
   
   markhor has run for the first time in 8 days, and there is something
   in range e703261..72dd233 making the regression test of brin crashing.
   See here:
   http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=markhordt=2014-12-30%2020%3A58%3A49
  
  This shows that the crash was in the object_address test, not brin.
  Will research.
 
 I can reproduce the crash in a CLOBBER_CACHE_ALWAYS build in
 the object_address test.  The backtrace is pretty strange:

Hard to say without more detail, but my guess is that the argument to
get_collation_oid() isn't actually valid. For one, that'd explain the
error, for another, the pointer's value (name=name@entry=0x11655c0) is
suspiciously low.

 #8  0x00504699 in get_collation_oid (name=name@entry=0x11655c0, 
 missing_ok=missing_ok@entry=0 '\000')
 at /pgsql/source/master/src/backend/catalog/namespace.c:3323
 #9  0x0050d8dc in get_object_address 
 (objtype=objtype@entry=OBJECT_COLLATION, 
 objname=objname@entry=0x11655c0, objargs=objargs@entry=0x0, 
 relp=relp@entry=0x7fff24432c28, lockmode=lockmode@entry=1, 
 missing_ok=missing_ok@entry=0 '\000')
 at /pgsql/source/master/src/backend/catalog/objectaddress.c:704

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Additional role attributes superuser review

2014-12-31 Thread Magnus Hagander
On Wed, Dec 31, 2014 at 3:08 PM, Stephen Frost sfr...@snowman.net wrote:

 * Magnus Hagander (mag...@hagander.net) wrote:
  On Tue, Dec 30, 2014 at 4:16 PM, Stephen Frost sfr...@snowman.net
 wrote:
   The approach I was thinking was to document and implement this as
   implicitly granting exactly USAGE and SELECT rights, no more (not
   BYPASSRLS) and no less (yes, the role could execute functions).  I
 agree
 
  If the role could execute functions, it's *not* exactly USAGE and SELECT
  rights, it's now USAGE and SELECT and EXECUTE rights... Just to be
  nitpicking, but see below.

 We seem to be coming at it from two different directions, so I'll try to
 clarify.  What I'm suggesting is that this role attribute be strictly
 *additive*.  Any other privileges granted to the role with this role
 attribute would still be applicable, including grants made to PUBLIC.

 This role attribute would *not* include EXECUTE rights, by that logic.
 However, if PUBLIC was granted EXECUTE rights for the function, then
 this role could execute that function.
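
 A minimal illustration of the additive behaviour described above (object
 names made up):

     CREATE FUNCTION report_status() RETURNS text
         LANGUAGE sql AS $$ SELECT 'ok'::text $$;
     GRANT EXECUTE ON FUNCTION report_status() TO PUBLIC;
     -- a role carrying the proposed read-all attribute could still call this,
     -- purely because of the PUBLIC grant, not because of the attribute itself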


Ah, OK, misunderstood that part.


What it sounds like is you're imagining this role attribute to mean the
 role has *only* USAGE and SELECT (or COPY or whatever) rights across the
 board and that any grants done explicitly to this role or to public
 wouldn't be respected.  In my view, that moves this away from a role
 *attribute* to being a pre-defined *role*, as such a role would not be
 usable for anything else.


No, I meant additive as well. I misread you.


  that doing so would be strictly more than what pg_dump actually requires
   but it's also what we actually have support for in our privilege
 system.
 
  Yeah, but would it also be what people would actually *want*?

 I can certainly see reasons why you'd want such a role to be able to
 execute functions - in particular, consider xlog_pause and xlog_resume.

If this role can't execute those functions then they probably can't
 successfully run pg_dump against a replica.


uh, pg_dump doesn't run those commands :P I don't see why that's a
requirement at all.  And you can still always grant those things on top of
whatever the privilege gives you.


 I think having it do exactly what pg_dump needs, and not things like
  execute functions etc, would be the thing people want for a 'DUMP'
  privilege.

 What if we want pg_dump in 9.6 to have an option to execute xlog_pause
 and xlog_resume for you?  You wouldn't be able to run that against a 9.5
 database (or at least, that option wouldn't work).


It would if you added an explicit grant for it, which would have to be
documented.

I don't see how that's different if the definition is "allows select on all
tables". That wouldn't automatically give it the ability to do xlog_pause
in an old version either.


 We might *also* want a USAGEANDSELECTANDEXECUTEANDWHATNOTBUTNOBYPASSURL
  (crap, it has to fit within NAMEDATALEN?) privilege, but I think that's a
  different thing.

 Our privilege system doesn't currently allow for negative grants (deny
 user X the ability to run functions, even if EXECUTE is granted to
 PUBLIC).  I'm not against that idea, but I don't see it as the same as
 this.


Agreed. That's what I said - different thing :)


  it would only give you COPY access. (And also
COPY != SELECT in the way that the rule system applies, I think? And
 this
one could be for COPY only)
  
   COPY certainly does equal SELECT rights..  We haven't got an
 independent
   COPY privilege and I don't think it makes sense to invent one.  It
 
  We don't, but I'm saying it might make sense to implement this. Not as a
  regular privilege, but as a role attribute. I'm not sure it's the right
  thing, but it might actually be interesting to entertain.

 We've discussed having a role attribute for COPY-from-filesystem, but
 pg_dump doesn't use that ever, it only uses COPY TO STDOUT.  I'm not
 a fan of making a COPY_TO_STDOUT-vs-SELECT distinction just for this..
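
For concreteness, what pg_dump issues for table data is of this form (table
name made up), which needs exactly the same read access as a plain SELECT:

    COPY public.accounts TO STDOUT;
    -- data-equivalent to: SELECT * FROM public.accounts;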


Yeah, it's probably going overboard with it, since AFAICT the only thing
that would actually be affected is RULEs on SELECT, which I bet most people
don't use on their tables.


  sounds like you're suggesting that we add a hack directly into COPY for
   this privilege, but my thinking is that the right place to change for
   this is in the privilege system and not hack it into individual
   commands..  I'm also a bit nervous that we'll end up painting ourselves
   into a corner if we hack this to mean exactly what pg_dump needs today.
 
  One point would be that if we define it as exactly what pg_dump needs,
  that definition can change in a future major version.

 Sure, but that then gets a bit ugly because we encourage running the
 latest version of pg_dump against the prior-major-version.


But the latest version of pg_dump *knows* how the prior major version
behaved with these things, and can either adapt, or give the user a message
about what they need to do to adapt.


   I think BYPASSRLS would have 

Re: [HACKERS] documentation update for doc/src/sgml/func.sgml

2014-12-31 Thread Fabien COELHO


Here is a slight update so that type names are treated homogeneously 
between both added paragraphs.


ISTM that this patch should be committed without further ado.
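
For illustration of the behaviour the first added paragraph documents (the
float result can vary with the platform's rounding behaviour):

    SELECT round(0.5::numeric) AS num, round(0.5::double precision) AS flt;
    -- typically: num = 1, flt = 0, because IEEE floating point rounds
    -- halfway cases to the nearest even value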

--
Fabien.

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 2016c5a..f91033b 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -939,6 +939,26 @@
     </tgroup>
    </table>
 
+   <para>
+    For functions like <function>round()</>, <function>log()</> and
+    <function>sqrt()</> which run against either fixed-precision
+    (<type>numeric</>) or floating-point numbers (e.g. <type>real</>),
+    note that the results of these operations will differ according
+    to the input type due to rounding. This is most observable with
+    <function>round()</>, which can end up rounding down as well as up for
+    any <literal>0.5</> value. <productname>PostgreSQL</productname>'s
+    handling of floating-point values depends on the operating system, which
+    may or may not follow the IEEE floating-point standard.
+   </para>
+
+   <para>
+    The bitwise operators work only on integral data types, whereas
+    the others are available for all numeric data types. The bitwise
+    operators are also available for the bit string types
+    <type>bit</> and <type>bit varying</>, as
+    shown in <xref linkend="functions-bit-string-op-table">.
+   </para>
+
    <para>
     <xref linkend="functions-math-random-table"> shows functions for
     generating random numbers.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Additional role attributes superuser review

2014-12-31 Thread Stephen Frost
José,

* José Luis Tallón (jltal...@adv-solutions.net) wrote:
 On 12/30/2014 04:16 PM, Stephen Frost wrote:
 The approach I was thinking was to document and implement this as
 implicitly granting exactly USAGE and SELECT rights, no more (not
 BYPASSRLS) and no less (yes, the role could execute functions).  I agree
 that doing so would be strictly more than what pg_dump actually requires
 but it's also what we actually have support for in our privilege system.
 
 Hmmm coupled with your comments below, I'd say some tweaking of
 the existing privileges is actually needed if we want to add these
 new capabilities.

Not sure I see how..?  Can you clarify what you think needs to be
changed in the existing privilege system?

 BTW, and since this is getting a bit bigger than originally
 considered: would it be interesting to support some
 extension-defined capabilities, too?
 (for things that can't be easily controlled by the existing USAGE /
 SELECT / ... rights, I mean)

Not entirely following what you mean here either, but to the extent that
it's independent from the current discussion, perhaps it deserves to be
on its own thread?

 it would only give you COPY access. (And also
 COPY != SELECT in the way that the rule system applies, I think? And this
 one could be for COPY only)
 COPY certainly does equal SELECT rights..  We haven't got an independent
 COPY privilege and I don't think it makes sense to invent one.
 
 FWIW, a COPY(DUMP?) privilege different from SELECT would make sense.
 Considering your below comments it would be better that it not imply
 BYPASSRLS, though.

How would a COPY-to-STDOUT privilege be different from SELECT?

 Lastly, I've been considering other use-cases for this privilege beyond
 the pg_dump one (thinking about auditing, for example).
 
 ISTR there was something upthread on an AUDIT role, right?
 This might be the moment to add it

One of the challenges to adding such a role is defining what it means.
What privileges do you think such a role would have?  I can see that
perhaps it would include read-only access to everything, but I'd want it
to also have read access to the logs..

 That would be EXCLUSIVEBACKUP or something like that, to be consistent
 with existing terminology though.
 Ok.  I agree that working out the specific naming is difficult and would
 like to focus on that, but we probably need to agree on the specific
 capabilities first. :)
 
 Yes :)
 For the record, LOGBACKUP vs PHYSBACKUP was suggested a couple days
 ago. I'd favor DUMP(logical) and BACKUP(physical) --- for lack of a
 better name.

I think I'm coming around to the EXCLUSIVEBACKUP option, per the
discussion with Magnus.  I don't particularly like LOGBACKUP and don't
think it really makes sense, while PHYSBACKUP seems like it'd cover what
REPLICATION already does.

 The above reasoning would have pg_basebackup requiring REPLICATION,
 which is a logical consequence of the implementation but strikes me
 as a bit surprising in the light of these other privileges.

I see what you mean that it's a bit strange for pg_basebackup to require
the REPLICATION role attribute, but, well, that's already the case, no?
Don't think we're going to change that..

 ARCHIVE, though completely appropriate for the exclusivebackup
 case above (so this would become DUMP/BASEBACKUP/ARCHIVE
 +REPLICATION) might end up causing quite some confusion ... (what?
 WAL archiving is NOT granted by the archive privilege, but
 requires a superuser to turn it on (via ALTER SYSTEM)?

Yeah, Magnus didn't like ARCHIVE either and I understand his reasoning.

 pg_xlog_replay_pause
 pg_xlog_replay_resume
 
 Though I'd be find if the xlog_replay ones were their own privilege (eg:
 REPLAY or something).
 +1

Yeah, Magnus was also agreeing that they should be independent.

 I think just calling them xlog related functions is doing us a disservice
 there. Definitely once we have actual documentation to write for it, but
 also in this discussion.
 [snip]
 If it's for replicas, then why are we not using the REPLICATION privilege
 which is extremely similar to this?
 I don't actually see REPLICATION as being the same.
 
 REPLICATION (ability to replicate) vs REPLICAOPERATOR (ability to
 control *the replica* or the R/O behaviour of the server, for that
 matter...)

Right, I think Magnus and I clarified what was meant there.

 Yes.  My thinking on how to approach this was to forcibly set all of
 their transactions to read-only.
 
 So READONLY ?

Right.

 Agreed.  That's actually something that I think would be *really* nice-
 an option to dump the necessary globals for whatever database you're
 running pg_dump against.  We have existing problems in this area
 (database-level GUCs aren't considered database-specific and therefore
 aren't picked up by pg_dump, for example..), but being able to also dump
 the roles which are used in a given database with pg_dump would be
 *really* nice..
 
 Ok, so this would imply modifying pg_dump to include 

Re: [HACKERS] Failure on markhor with CLOBBER_CACHE_ALWAYS for test brin

2014-12-31 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 On 2014-12-31 10:02:40 -0300, Alvaro Herrera wrote:
 I can reproduce the crash in a CLOBBER_CACHE_ALWAYS build in
 the object_address test.  The backtrace is pretty strange:

 Hard to say without more detail, but my guess is that the argument to
 get_collation_oid() isn't actually valid. For one, that'd explain the
 error, for another, the pointer's value (name=name@entry=0x11655c0) is
 suspiciously low.

Given that CLOBBER_CACHE_ALWAYS seems to make it fail reliably, the
obvious explanation is that what's being passed is a pointer into
catcache or relcache storage that isn't guaranteed to be valid for
long enough.  The given backtrace doesn't go down far enough to show
where the bogus input came from, but I'm betting that something is
returning to SQL a string it got from cache without pstrdup'ing it.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] add modulo (%) operator to pgbench

2014-12-31 Thread Fabien COELHO



I'm not sure what Makefile changes might be necessary for
various environments; it looks OK to me as it is.


Found one. EXTRA_CLEAN is needed for generated C files. Added to this v3.

--
Fabien.

diff --git a/contrib/pgbench/.gitignore b/contrib/pgbench/.gitignore
index 489a2d6..aae819e 100644
--- a/contrib/pgbench/.gitignore
+++ b/contrib/pgbench/.gitignore
@@ -1 +1,3 @@
+/exprparse.c
+/exprscan.c
 /pgbench
diff --git a/contrib/pgbench/Makefile b/contrib/pgbench/Makefile
index b8e2fc8..2d4033b 100644
--- a/contrib/pgbench/Makefile
+++ b/contrib/pgbench/Makefile
@@ -4,7 +4,9 @@ PGFILEDESC = pgbench - a simple program for running benchmark tests
 PGAPPICON = win32
 
 PROGRAM = pgbench
-OBJS	= pgbench.o $(WIN32RES)
+OBJS	= pgbench.o exprparse.o $(WIN32RES)
+
+EXTRA_CLEAN	= exprparse.c exprscan.c
 
 PG_CPPFLAGS = -I$(libpq_srcdir)
 PG_LIBS = $(libpq_pgport) $(PTHREAD_LIBS)
@@ -18,8 +20,22 @@ subdir = contrib/pgbench
 top_builddir = ../..
 include $(top_builddir)/src/Makefile.global
 include $(top_srcdir)/contrib/contrib-global.mk
+
+distprep: exprparse.c exprscan.c
 endif
 
 ifneq ($(PORTNAME), win32)
 override CFLAGS += $(PTHREAD_CFLAGS)
 endif
+
+# There is no correct way to write a rule that generates two files.
+# Rules with two targets don't have that meaning, they are merely
+# shorthand for two otherwise separate rules.  To be safe for parallel
+# make, we must chain the dependencies like this.  The semicolon is
+# important, otherwise make will choose the built-in rule for
+# gram.y=gram.c.
+
+exprparse.h: exprparse.c ;
+
+# exprscan is compiled as part of exprparse
+exprparse.o: exprscan.c
diff --git a/contrib/pgbench/exprparse.y b/contrib/pgbench/exprparse.y
new file mode 100644
index 000..243c6b9
--- /dev/null
+++ b/contrib/pgbench/exprparse.y
@@ -0,0 +1,96 @@
+%{
+/*-
+ *
+ * exprparse.y
+ *	  bison grammar for a simple expression syntax
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *-
+ */
+
+#include "postgres_fe.h"
+
+#include "pgbench.h"
+
+PgBenchExpr *expr_parse_result;
+
+static PgBenchExpr *make_integer_constant(int64 ival);
+static PgBenchExpr *make_variable(char *varname);
+static PgBenchExpr *make_op(char operator, PgBenchExpr *lexpr,
+		PgBenchExpr *rexpr);
+
+%}
+
+%expect 0
+%name-prefix=expr_yy
+
+%union
+{
+	int64		ival;
+	char	   *str;
+	PgBenchExpr *expr;
+}
+
+%type <expr> expr
+%type <ival> INTEGER
+%type <str> VARIABLE
+%token INTEGER VARIABLE
+%token CHAR_ERROR /* never used, will raise a syntax error */
+
+%left	'+' '-'
+%left	'*' '/' '%'
+%right	UMINUS
+
+%%
+
+result: expr{ expr_parse_result = $1; }
+
+expr: '(' expr ')'			{ $$ = $2; }
+	| '+' expr %prec UMINUS	{ $$ = $2; }
+	| '-' expr %prec UMINUS	{ $$ = make_op('-', make_integer_constant(0), $2); }
+	| expr '+' expr			{ $$ = make_op('+', $1, $3); }
+	| expr '-' expr			{ $$ = make_op('-', $1, $3); }
+	| expr '*' expr			{ $$ = make_op('*', $1, $3); }
+	| expr '/' expr			{ $$ = make_op('/', $1, $3); }
+	| expr '%' expr			{ $$ = make_op('%', $1, $3); }
+	| INTEGER{ $$ = make_integer_constant($1); }
+	| VARIABLE { $$ = make_variable($1); }
+	;
+
+%%
+
+static PgBenchExpr *
+make_integer_constant(int64 ival)
+{
+	PgBenchExpr *expr = pg_malloc(sizeof(PgBenchExpr));
+
+	expr->etype = ENODE_INTEGER_CONSTANT;
+	expr->u.integer_constant.ival = ival;
+	return expr;
+}
+
+static PgBenchExpr *
+make_variable(char *varname)
+{
+	PgBenchExpr *expr = pg_malloc(sizeof(PgBenchExpr));
+
+	expr->etype = ENODE_VARIABLE;
+	expr->u.variable.varname = varname;
+	return expr;
+}
+
+static PgBenchExpr *
+make_op(char operator, PgBenchExpr *lexpr, PgBenchExpr *rexpr)
+{
+	PgBenchExpr *expr = pg_malloc(sizeof(PgBenchExpr));
+
+	expr->etype = ENODE_OPERATOR;
+	expr->u.operator.operator = operator;
+	expr->u.operator.lexpr = lexpr;
+	expr->u.operator.rexpr = rexpr;
+	return expr;
+}
+
+#include "exprscan.c"
diff --git a/contrib/pgbench/exprscan.l b/contrib/pgbench/exprscan.l
new file mode 100644
index 000..4c9229c
--- /dev/null
+++ b/contrib/pgbench/exprscan.l
@@ -0,0 +1,105 @@
+%{
+/*-
+ *
+ * exprscan.l
+ *	  a lexical scanner for a simple expression syntax
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *-
+ */
+
+/* line and column number for error reporting */
+static int	yyline = 0, yycol = 0;
+
+/* Handles to the buffer that the lexer uses internally */
+static YY_BUFFER_STATE scanbufhandle;
+static char *scanbuf;
+static int	scanbuflen;
+
+/* flex 2.5.4 doesn't bother with a decl for this */
+int expr_yylex(void);
+
+%}
+
+%option 8bit

Re: [HACKERS] [GENERAL] ON_ERROR_ROLLBACK

2014-12-31 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes:
 On 12/30/2014 09:20 AM, Tom Lane wrote:
 In one light this is certainly a bug fix, but in another it's just
 definitional instability.
 
 If we'd gotten a field bug report we might well have chosen to back-patch,
 though, and perhaps your client's complaint counts as that.

 I got caught by this with ON_ERROR_ROLLBACK on 9.3 just this afternoon 
 before remembering this thread. So there's a field report :-)

 +0.75 for backpatching (It's hard to imagine someone relying on the bad 
 behaviour, but you never know).

It seems like there's a consensus in favor of back-patching this change,
so I'll go ahead and do that.
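
For readers who don't use the feature: it is a psql setting, enabled per
session along these lines, and the back-patched change affects how psql
handles errors while it is active:

    \set ON_ERROR_ROLLBACK interactive   -- or 'on'; psql then wraps each
                                         -- statement in a savepoint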

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Additional role attributes superuser review

2014-12-31 Thread Stephen Frost
* Magnus Hagander (mag...@hagander.net) wrote:
 On Wed, Dec 31, 2014 at 3:08 PM, Stephen Frost sfr...@snowman.net wrote:
  * Magnus Hagander (mag...@hagander.net) wrote:
   that doing so would be strictly more than what pg_dump actually requires
but it's also what we actually have support for in our privilege
  system.
  
   Yeah, but would it also be what people would actually *want*?
 
  I can certainly see reasons why you'd want such a role to be able to
  execute functions - in particular, consider xlog_pause and xlog_resume.
 
 If this role can't execute those functions then they probably can't
  successfully run pg_dump against a replica.
 
 uh, pg_dump doesn't run those commands :P I don't see why that's a
 requirement at all.  And you can still always grant those things on top of
 whatever the privilege gives you.

Ok, so, first off, this is all about the discussion around whether this is
additive or not.  Since we just agreed that it's additive, I'm not
sure I see the need to discuss the EXECUTE privileges..

Having said that though..

If you can't pause/resume then you can end up with your pg_dump
transaction getting killed.  I'm aware of folks who already do this
today, by hand with shell scripts..  I agree that pg_dump doesn't do it,
but I do think it'd be nice to have and I can certainly see the use-case
for them..
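
A sketch of that by-hand approach against a standby (these recovery-control
functions exist today but require superuser):

    SELECT pg_xlog_replay_pause();
    -- run pg_dump against the standby from a shell here
    SELECT pg_xlog_replay_resume();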

  I think having it do exactly what pg_dump needs, and not things like
   execute functions etc, would be the thing people want for a 'DUMP'
   privilege.
 
  What if we want pg_dump in 9.6 to have an option to execute xlog_pause
  and xlog_resume for you?  You wouldn't be able to run that against a 9.5
  database (or at least, that option wouldn't work).
 
 It would if you added an explicit grant for it, which would have to be
 documented.

Huh?  An explicit grant for xlog_pause/xlog_resume won't work as we
check role attribute rights inside the function..

 I don't see how that's different if the definition is allows select on all
 tables. That wouldn't automatically give it the ability to do xlog_pause
 in an old version either.

Ok, that I agree with- you don't automatically get xlog_pause rights if
you have the 'select on all tables' right.

  We've discussed having a role attribute for COPY-from-filesystem, but
  pg_dump doesn't use that ever, it only uses COPY TO STDOUT.  I'm not
  a fan of making a COPY_TO_STDOUT-vs-SELECT distinction just for this..
 
 Yeah, it's probably going overboard with it, since AFAICT the only thing
 that would actually be affected is RULEs on SELECT, which I bet most people
 don't use on their tables.

Well, we could make SELECT not work, but if you've got COPY then you can
still get all the data, so, yeah, not much different.  I seriously doubt
many people are using rules..

   One point would be that if we define it as exactly what pg_dump needs,
   that definition can change in a future major version.
 
  Sure, but that then gets a bit ugly because we encourage running the
  latest version of pg_dump against the prior-major-version.
 
 But the latest version of pg_dump *knows* how the prior major version
 behaved with these things, and can either adapt, or give the user a message
 about what they need to do to adapt.

Yes, that's true.

  No, it would still be *extremely* useful.  We have an option to pg_dump
  to say please dump with RLS enabled.  What that means is that you'd be
  able to dump the entire database for all data you're allowed to see
  through RLS policies.  How is that useless?  I certainly think it'd be
  very handy and a good way to get *segregated* dumps according to policy.
 
 Hmm. Yeah, I guess - my mindset was in the database backup mode, where a
 silently partial backup is a really scary thing rather than a feature :)

Agreed, we don't ever want that.

 I guess as long as you still dump all the parts, RLS itself ensures that
 foreign keys etc will actually be valid upon a reload?

If you set up your policies correctly, yes.  RLS is flexible enough that
you could create policies which fail, but you have the same problem
today to some extent (consider tables you don't have SELECT rights on).

   We could/should also throw a WARNING if DUMP Is granted to a role without
   BYPASSRLS in case row level security is enabled in the system, I think.
  But
   that's more of an implementation detail.
 
  That's a bit ugly and RLS could be added to a relation after the DUMP
  privilege is granted.
 
 Yes, it's not going to be all-covering, but it can still be a useful
 hint/warning in the cases where it *does* that. We obviously still need
 pg_dump to give the error in both scenarios.

I'm not against doing it, personally, but I suspect others won't like it
(or at least, that's been the case in the past with other things..).

 What I think this part of the discussion is getting at is that there's a
  lot of people who view pg_dump as explicitly for dumping the ENTIRE
  database.  While that's one use-case, it is certainly not the only 

Re: [HACKERS] pg_basebackup -x/X doesn't play well with archive_mode & wal_keep_segments

2014-12-31 Thread Andres Freund
Hi,

On 2014-12-05 16:18:02 +0900, Fujii Masao wrote:
 On Fri, Dec 5, 2014 at 9:28 AM, Andres Freund and...@2ndquadrant.com wrote:
  So I think we just need to make pg_basebackup create the .ready
  files.
 
 s/.ready/.done? If yes, +1.

That unfortunately requires changes to both backend and pg_basebackup to
support fetch and stream modes respectively.

I've attached a preliminary patch for this. I'd appreciate feedback. I
plan to commit it in a couple of days, after some more
testing/rereading.

  Given that the walreceiver and restore_command already
  unconditionally do XLogArchiveForceDone() I think we'd follow the
  established precedent. Arguably it could make sense to archive files
  again on the standby after a promotion as they aren't guaranteed to have
  been on the then primary. But we don't have any infrastructure anyway
  for that and walsender doesn't do so, so it doesn't seem to make any
  sense to do that for pg_basebackup.
 
  Independent from this bug, there's also some debatable behaviour about
  what happens if a node with a high wal_keep_segments turns on
  archive_mode. Suddenly all those old files are archived... I think it
  might be a good idea to simply always create .done files when
  archive_mode is disabled while a wal segment is finished.
 
 +1

I tend to think that's a master only change. Agreed?

Greetings,

Andres Freund
From 3db116bc5b9465f555957bb11ac6cb8b20c18405 Mon Sep 17 00:00:00 2001
From: Andres Freund and...@anarazel.de
Date: Wed, 31 Dec 2014 14:50:57 +0100
Subject: [PATCH 1/2] Add pg_string_endswith as the start of a string helper
 library in src/common.

Backpatch to 9.3, where src/common was introduced, because a bugfix that
needs to be backpatched requires the function. Earlier branches will
have to duplicate the code.
---
 src/backend/replication/slot.c | 21 ++---
 src/common/Makefile|  2 +-
 src/common/string.c| 43 ++
 src/include/common/string.h| 15 +++
 src/tools/msvc/Mkvcbuild.pm|  2 +-
 5 files changed, 62 insertions(+), 21 deletions(-)
 create mode 100644 src/common/string.c
 create mode 100644 src/include/common/string.h

diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 937b669..698ca6b 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -40,6 +40,7 @@
 #include <sys/stat.h>
 
 #include "access/transam.h"
+#include "common/string.h"
 #include "miscadmin.h"
 #include "replication/slot.h"
 #include "storage/fd.h"
@@ -780,24 +781,6 @@ CheckSlotRequirements(void)
 }
 
 /*
- * Returns whether the string `str' has the postfix `end'.
- */
-static bool
-string_endswith(const char *str, const char *end)
-{
-	size_t		slen = strlen(str);
-	size_t		elen = strlen(end);
-
-	/* can't be a postfix if longer */
-	if (elen > slen)
-		return false;
-
-	/* compare the end of the strings */
-	str += slen - elen;
-	return strcmp(str, end) == 0;
-}
-
-/*
  * Flush all replication slots to disk.
  *
  * This needn't actually be part of a checkpoint, but it's a convenient
@@ -864,7 +847,7 @@ StartupReplicationSlots(void)
 			continue;
 
 		/* we crashed while a slot was being setup or deleted, clean up */
-		if (string_endswith(replication_de->d_name, ".tmp"))
+		if (pg_string_endswith(replication_de->d_name, ".tmp"))
 		{
 			if (!rmtree(path, true))
 			{
diff --git a/src/common/Makefile b/src/common/Makefile
index 7edbaaa..e5c345d 100644
--- a/src/common/Makefile
+++ b/src/common/Makefile
@@ -23,7 +23,7 @@ include $(top_builddir)/src/Makefile.global
 override CPPFLAGS := -DFRONTEND $(CPPFLAGS)
 LIBS += $(PTHREAD_LIBS)
 
-OBJS_COMMON = exec.o pgfnames.o psprintf.o relpath.o rmtree.o username.o wait_error.o
+OBJS_COMMON = exec.o pgfnames.o psprintf.o relpath.o rmtree.o string.o username.o wait_error.o
 
 OBJS_FRONTEND = $(OBJS_COMMON) fe_memutils.o
 
diff --git a/src/common/string.c b/src/common/string.c
new file mode 100644
index 000..07a2aaf
--- /dev/null
+++ b/src/common/string.c
@@ -0,0 +1,43 @@
+/*-
+ *
+ * string.c
+ *		string handling helpers
+ *
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *	  src/common/string.c
+ *
+ *-
+ */
+
+
+#ifndef FRONTEND
+#include "postgres.h"
+#else
+#include "postgres_fe.h"
+#endif
+
+#include "common/string.h"
+
+
+/*
+ * Returns whether the string `str' has the postfix `end'.
+ */
+bool
+pg_string_endswith(const char *str, const char *end)
+{
+	size_t		slen = strlen(str);
+	size_t		elen = strlen(end);
+
+	/* can't be a postfix if longer */
+	if (elen > slen)
+		return false;
+
+	/* compare the end of the strings */
+	str += slen - elen;
+	return strcmp(str, end) == 0;
+}
diff --git a/src/include/common/string.h 

Re: [HACKERS] psql tab completion: fix COMMENT ON ... IS IS IS

2014-12-31 Thread Robert Haas
On Sun, Dec 28, 2014 at 7:44 PM, Ian Barwick i...@2ndquadrant.com wrote:
 Currently tab completion for 'COMMENT ON {object} foo IS' will result in the
 'IS'
 being duplicated up to two times; not a world-shattering issue I know, but
 the
fix is trivial and I stumble over it often enough for it to mildly annoy
 me.
 Patch attached.

I've noticed that in the past, too.  Committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] The return value of allocate_recordbuf()

2014-12-31 Thread Robert Haas
On Mon, Dec 29, 2014 at 6:14 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 Hmm. There is no way to check beforehand if a palloc() will fail because of
 OOM. We could check for MaxAllocSize, though.

I think we need a version of palloc that returns NULL instead of
throwing an error.  The error-throwing behavior is for the best in
almost every case, but I think the no-error version would find enough
users to be worthwhile.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Seq Scan

2014-12-31 Thread Thom Brown
On 31 December 2014 at 14:20, Thom Brown t...@linux.com wrote:

 On 18 December 2014 at 16:03, Amit Kapila amit.kapil...@gmail.com wrote:



 On Thu, Dec 18, 2014 at 9:22 PM, Amit Kapila amit.kapil...@gmail.com
 wrote:
 
  On Mon, Dec 8, 2014 at 10:40 AM, Amit Kapila amit.kapil...@gmail.com
 wrote:
  
   On Sat, Dec 6, 2014 at 5:37 PM, Stephen Frost sfr...@snowman.net
 wrote:
   
  
   So to summarize my understanding, below are the set of things
   which I should work on and in the order they are listed.
  
   1. Push down qualification
   2. Performance Data
   3. Improve the way to push down the information related to worker.
   4. Dynamic allocation of work for workers.
  
  
 
  I have worked on the patch to accomplish above mentioned points
  1, 2 and partly 3 and would like to share the progress with community.

 Sorry forgot to attach updated patch in last mail, attaching it now.


 When attempting to recreate the plan in your example, I get an error:

  ➤ psql://thom@[local]:5488/pgbench

 # create table t1(c1 int, c2 char(500)) with (fillfactor=10);
 CREATE TABLE
 Time: 13.653 ms

  ➤ psql://thom@[local]:5488/pgbench

 # insert into t1 values(generate_series(1,100),'amit');
 INSERT 0 100
 Time: 4.796 ms

  ➤ psql://thom@[local]:5488/pgbench

 # explain select c1 from t1;
 ERROR:  could not register background process
 HINT:  You may need to increase max_worker_processes.
 Time: 1.659 ms

  ➤ psql://thom@[local]:5488/pgbench

 # show max_worker_processes ;
  max_worker_processes
 --
  8
 (1 row)

 Time: 0.199 ms

 # show parallel_seqscan_degree ;
  parallel_seqscan_degree
 -
  10
 (1 row)


 Do I really need to increase max_worker_processes to >=
 parallel_seqscan_degree?  If so, shouldn't there be a hint here along with
 the error message pointing this out?  And should the error be produced when
 only a *plan* is being requested?

 Also, I noticed that where a table is partitioned, the plan isn't
 parallelised:

 # explain select distinct bid from pgbench_accounts;


                                  QUERY PLAN
 --------------------------------------------------------------------------------
  HashAggregate  (cost=1446639.00..1446643.99 rows=499 width=4)
Group Key: pgbench_accounts.bid
    ->  Append  (cost=0.00..1321639.00 rows=5001 width=4)
          ->  Seq Scan on pgbench_accounts  (cost=0.00..0.00 rows=1 width=4)
          ->  Seq Scan on pgbench_accounts_1  (cost=0.00..4279.00 rows=10 width=4)
          ->  Seq Scan on pgbench_accounts_2  (cost=0.00..2640.00 rows=10 width=4)
          ->  Seq Scan on pgbench_accounts_3  (cost=0.00..2640.00 rows=10 width=4)
          ->  Seq Scan on pgbench_accounts_4  (cost=0.00..2640.00 rows=10 width=4)
          ->  Seq Scan on pgbench_accounts_5  (cost=0.00..2640.00 rows=10 width=4)
          ->  Seq Scan on pgbench_accounts_6  (cost=0.00..2640.00 rows=10 width=4)
          ->  Seq Scan on pgbench_accounts_7  (cost=0.00..2640.00 rows=10 width=4)
 ...
          ->  Seq Scan on pgbench_accounts_498  (cost=0.00..2640.00 rows=10 width=4)
          ->  Seq Scan on pgbench_accounts_499  (cost=0.00..2640.00 rows=10 width=4)
          ->  Seq Scan on pgbench_accounts_500  (cost=0.00..2640.00 rows=10 width=4)
 (504 rows)

 Is this expected?


Another issue (FYI, pgbench2 initialised with: pgbench -i -s 100 -F 10
pgbench2):

 ➤ psql://thom@[local]:5488/pgbench2

# explain select distinct bid from pgbench_accounts;
                                  QUERY PLAN
---------------------------------------------------------------------------------
 HashAggregate  (cost=245833.38..245834.38 rows=100 width=4)
   Group Key: bid
   ->  Parallel Seq Scan on pgbench_accounts  (cost=0.00..220833.38 rows=1000 width=4)
 Number of Workers: 8
 Number of Blocks Per Workers: 208333
(5 rows)

Time: 7.476 ms

 ➤ psql://thom@[local]:5488/pgbench2

# explain (analyse, buffers, verbose) select distinct bid from
pgbench_accounts;
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
Time: 14897.991 ms

The logs say:

2014-12-31 15:21:42 GMT [9164]: [240-1] user=,db=,client= LOG:  registering
background worker backend_worker
2014-12-31 15:21:42 GMT [9164]: [241-1] user=,db=,client= LOG:  registering
background worker backend_worker
2014-12-31 15:21:42 GMT [9164]: [242-1] user=,db=,client= LOG:  registering
background worker backend_worker
2014-12-31 15:21:42 GMT [9164]: [243-1] user=,db=,client= LOG:  registering
background worker backend_worker
2014-12-31 15:21:42 GMT [9164]: [244-1] user=,db=,client= LOG:  registering
background worker backend_worker
2014-12-31 15:21:42 GMT [9164]: [245-1] user=,db=,client= LOG:  registering
background worker backend_worker
2014-12-31 15:21:42 GMT [9164]: [246-1] 

Re: [HACKERS] Publish autovacuum informations

2014-12-31 Thread Robert Haas
On Mon, Dec 29, 2014 at 11:03 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Either one of those approaches would cripple our freedom to change those
 data structures; which we've done repeatedly in the past and will surely
 want to do again.  So I'm pretty much -1 on exposing them.

We could instead add a view of this information to core --
pg_stat_autovacuum, or whatever.

But to be honest, I'm more in favor of Guillaume's proposal.  I will
repeat my recent assertion that we -- you in particular -- are too
reluctant to expose internal data structures to authors of C
extensions, and that this is developer-hostile.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Failure on markhor with CLOBBER_CACHE_ALWAYS for test brin

2014-12-31 Thread Alvaro Herrera
Tom Lane wrote:

 Given that CLOBBER_CACHE_ALWAYS seems to make it fail reliably, the
 obvious explanation is that what's being passed is a pointer into
 catcache or relcache storage that isn't guaranteed to be valid for
 long enough.  The given backtrace doesn't go down far enough to show
 where the bogus input came from, but I'm betting that something is
 returning to SQL a string it got from cache without pstrdup'ing it.

Yep, that was it -- the bug was in getObjectIdentityParts.  I noticed
three other cases of missing pstrdup(), also fixed.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Initdb-cs_CZ.WIN-1250 buildfarm failures

2014-12-31 Thread Andrew Dunstan


On 12/31/2014 07:49 AM, Tomas Vondra wrote:

On 28.12.2014 00:46, Noah Misch wrote:

On Tue, Dec 23, 2014 at 03:32:59PM +0100, Tomas Vondra wrote:

On 23.12.2014 15:21, Andrew Dunstan wrote:

No, config_opts is what's passed to configure. Try something like:

 if ($branch eq 'REL9_0_STABLE')
 {
 $skip_steps{'pl-install-check'} = 1;
 }

Applied to all three animals.

As of the time of this change, the animals ceased to report in.

The strange thing is that while the first run worked fine (as
illustrated by the results submitted to pgbuildfarm.org), all the
following runs fail like this:

Global symbol "%skip_steps" requires explicit package name at
build-farm.conf line 308.
Compilation failed in require at ./run_branches.pl line 55.

Apparently, something is broken, but my Perl-fu is limited so I have no
idea what/why :-(





Sorry, I should have tested it. This seems to work:

   if ($branch eq 'REL9_0_STABLE')
   {
$PGBuild::Options::skip_steps .= ' pl-install-check';
$main::skip_steps{'pl-install-check'} = 1;
   }

cheers

andrew




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Using 128-bit integers for sum, avg and statistics aggregates

2014-12-31 Thread Robert Haas
On Wed, Dec 31, 2014 at 9:00 AM, David Rowley dgrowle...@gmail.com wrote:
 Yeah, that's what I was talking about.
 I'm just looking at the code which uses this size estimate in
 choose_hashed_grouping(). I'd be a bit worried, given the difference between
 48 and 128, that we'd underestimate the hash size too much and end up going
 to disk during hashagg.

That's an issue, but the problem is that...

 I think the patch should be modified so that it uses 128 bytes for the size
 estimate on machines that don't support int128, but I'm not quite sure at
 this stage if that causes issues for replication: if one machine had support
 and the other didn't, would it matter if the pg_aggregate row on the replica was
 48 bytes instead of 128? Is this worth worrying about?

...if we do this, then yes, there is an incompatibility between
binaries compiled with this option and binaries compiled without it.
They generate different system catalog contents after initdb and we
shouldn't use one set of binaries with an initdb done the other way.
That seems like serious overkill, especially since this could not
inconceivably flip between one recompile and the next if the
administrator has run 'yum update' in the meantime.  So I'd argue for
picking one estimate and using it across the board, and living with
the fact that it's sometimes going to be wrong.  Such is the lot in
life of an estimate.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Final Patch for GROUPING SETS

2014-12-31 Thread Andrew Gierth
 Noah == Noah Misch n...@leadboat.com writes:

 Noah Suppose one node orchestrated all sorting and aggregation.

Well, that has the downside of making it into an opaque blob, without
actually gaining much.

 Noah Call it a MultiGroupAggregate for now.  It wouldn't harness
 Noah Sort nodes, because it performs aggregation between
 Noah tuplesort_puttupleslot() calls.  Instead, it would directly
 Noah manage two Tuplesortstate, CUR and NEXT.  The node would have
 Noah an initial phase similar to ExecSort(), in which it drains the
 Noah outer node to populate the first CUR.  After that, it looks
 Noah more like agg_retrieve_direct(),

agg_retrieve_direct is already complex enough, and this would be
substantially more so, as compared to agg_retrieve_chained which is
substantially simpler.

A more serious objection is that this forecloses (or at least makes
much more complex) the future possibility of doing some grouping sets
by sorting and others by hashing. The chained approach specifically
allows for the future possibility of using a HashAggregate as the
chain head, so that for example cube(a,b) can be implemented as a
sorted agg for (a,b) and (a) and a hashed agg for (b) and (), allowing
it to be done with one sort even if the result size for (a,b) is too
big to hash.
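
For concreteness, the kind of query that split is aimed at (names made up):

    SELECT a, b, count(*) FROM t GROUP BY CUBE (a, b);
    -- grouping sets here are (a, b), (a), (b) and (); the idea above is to
    -- sort once for (a, b) and (a), and hash (b) and ()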

 Noah Tom, Andrew, does that look satisfactory?

Not to me.

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Initdb-cs_CZ.WIN-1250 buildfarm failures

2014-12-31 Thread Tomas Vondra
On 31.12.2014 17:29, Andrew Dunstan wrote:
 
 Sorry, I should have tested it. This seems to work:
 
if ($branch eq 'REL9_0_STABLE')
{
 $PGBuild::Options::skip_steps .= ' pl-install-check';
 $main::skip_steps{'pl-install-check'} = 1;
}
 
 cheers

Meh, no problem. We've fixed it in 2014, so it's OK. To quote one of my
colleagues: "I haven't tested it, I believe it works correctly." ;-)

Any ideas why it worked for the first run?

regards
Tomas


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Initdb-cs_CZ.WIN-1250 buildfarm failures

2014-12-31 Thread Andrew Dunstan


On 12/31/2014 12:26 PM, Tomas Vondra wrote:

On 31.12.2014 17:29, Andrew Dunstan wrote:

Sorry, I should have tested it. This seems to work:

if ($branch eq 'REL9_0_STABLE')
{
 $PGBuild::Options::skip_steps .= ' pl-install-check';
 $main::skip_steps{'pl-install-check'} = 1;
}

cheers

Meh, no problem. We've fixed it in 2014, so it's OK. To quote one of my
colleagues: "I haven't tested it, I believe it works correctly." ;-)

Any ideas why it worked for the first run?



No. It failed for me right off the bat.

cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Publish autovacuum informations

2014-12-31 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Mon, Dec 29, 2014 at 11:03 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Either one of those approaches would cripple our freedom to change those
 data structures; which we've done repeatedly in the past and will surely
 want to do again.  So I'm pretty much -1 on exposing them.

 We could instead add a view of this information to core --
 pg_stat_autovacuum, or whatever.

 But to be honest, I'm more in favor of Guillaume's proposal.  I will
 repeat my recent assertion that we -- you in particular -- are too
 reluctant to expose internal data structures to authors of C
 extensions, and that this is developer-hostile.

Well, the core question there is whether we have a policy of not breaking
extension-visible APIs.  While we will very often do things like adding
parameters to existing functions, I think we've tended to refrain from
making wholesale semantic revisions to exposed data structures.

I'd be all right with putting the data structure declarations in a file
named something like autovacuum_private.h, especially if it carried an
annotation that if you depend on this, don't be surprised if we break
your code in future.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] orangutan seizes up during isolation-check

2014-12-31 Thread Noah Misch
On Sun, Dec 28, 2014 at 07:20:04PM -0500, Andrew Dunstan wrote:
 On 12/28/2014 04:58 PM, Noah Misch wrote:
 The gettext maintainer was open to implementing the setlocale_native_forked()
 technique in gettext, though the last visible progress was in October.  In
 any event, PostgreSQL builds will see older gettext for several years.  If
 setlocale-darwin-fork-v1.patch is not wanted, I suggest making the postmaster
 check during startup whether it has become multithreaded.  If multithreaded:
 
FATAL: postmaster became multithreaded during startup
HINT: Set the LC_ALL environment variable to a valid locale.

 I would like to go ahead and commit setlocale-main-harden-v1.patch, which is
 a good thing to have regardless of what happens with gettext.
 
 
 I'm OK with this, but on its own it won't fix orangutan's problems, will it?

Right; setlocale-main-harden-v1.patch fixes a bug not affecting orangutan at
all.  None of the above will make orangutan turn green.  Checking
multithreading during startup would merely let it fail cleanly.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] orangutan seizes up during isolation-check

2014-12-31 Thread Noah Misch
On Wed, Dec 31, 2014 at 12:32:37AM -0500, Robert Haas wrote:
 On Sun, Dec 28, 2014 at 4:58 PM, Noah Misch n...@leadboat.com wrote:
  I wondered whether to downgrade FATAL to LOG in back branches.  Introducing 
  a
  new reason to block startup is disruptive for a minor release, but having 
  the
  postmaster deadlock at an unpredictable later time is even more disruptive. 
   I
  am inclined to halt startup that way in all branches.
 
 Jeepers.  I'd rather not do that.  From your report, this problem has
 been around for years.  Yet, as far as I know, it's bothering very few
 real users, some of whom might be far more bothered by the postmaster
 suddenly failing to start.  I'm fine with a FATAL in master, but I'd
 vote against doing anything that might prevent startup in the
 back-branches without more compelling justification.

Clusters hosted on OS X fall into these categories:

1) Unaffected configuration.  This includes everyone setting a valid messages
   locale via LANG, LC_ALL or LC_MESSAGES.
2) Affected configuration.  Through luck and light use, the cluster would not
   experience the crashes/hangs.
3) Cluster would experience the crashes/hangs.

DBAs in (3) want the FATAL at startup, but those in (2) want a LOG message
instead.  DBAs in (1) don't care.  Since intermittent postmaster hangs are far
worse than startup failure, if (2) and (3) have similar population, FATAL is
the better bet.  If (2) is sufficiently more populous than (3), then the many
small pricks from startup failure do add up to hurt more than the occasional
postmaster hang.  Who knows how that calculation plays out.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Final Patch for GROUPING SETS

2014-12-31 Thread Noah Misch
On Wed, Dec 31, 2014 at 02:45:29PM +0530, Atri Sharma wrote:
  Suppose one node orchestrated all sorting and aggregation.  Call it a
  MultiGroupAggregate for now.  It wouldn't harness Sort nodes, because it
  performs aggregation between tuplesort_puttupleslot() calls.  Instead, it
  would directly manage two Tuplesortstate, CUR and NEXT.  The node would
  have
  an initial phase similar to ExecSort(), in which it drains the outer node
  to
  populate the first CUR.  After that, it looks more like
  agg_retrieve_direct(),
  except that CUR is the input source, and each tuple drawn is also put into
  NEXT.  When done with one CUR, swap CUR with NEXT and reinitialize NEXT.
  This
  design does not add I/O consumption or require a nonstandard communication
  channel between executor nodes.  Tom, Andrew, does that look satisfactory?
 
 
 So you are essentially proposing merging ChainAggregate and its
 corresponding Sort node?
 
 So the structure would be something like:
 
 GroupAggregate
 -- MultiGroupAgg (a,b)
  MultiGroupAgg (c,d) ...

No.

 If you are proposing
 replacing GroupAggregate node + entire ChainAggregate + Sort nodes stack
 with a single MultiGroupAggregate node, I am not able to understand how it
 will handle all the multiple sort orders.

Yes, I was proposing that.  My paragraph that you quoted above was the attempt
to explain how the node would manage multiple sort orders.  If you have
specific questions about it, feel free to ask.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Final Patch for GROUPING SETS

2014-12-31 Thread Noah Misch
On Wed, Dec 31, 2014 at 05:33:43PM +, Andrew Gierth wrote:
  Noah == Noah Misch n...@leadboat.com writes:
 
  Noah Suppose one node orchestrated all sorting and aggregation.
 
 Well, that has the downside of making it into an opaque blob, without
 actually gaining much.

The opaque-blob criticism is valid.  As for not gaining much, well, the gain I
sought was to break this stalemate.  You and Tom have expressed willingness to
accept the read I/O multiplier of the CTE approach.  You and I are willing to
swallow an architecture disruption, namely a tuplestore acting as a side
channel between executor nodes.  Given your NACK, I agree that it fails to
move us toward consensus and therefore does not gain much.  Alas.

 A more serious objection is that this forecloses (or at least makes
 much more complex) the future possibility of doing some grouping sets
 by sorting and others by hashing. The chained approach specifically
 allows for the future possibility of using a HashAggregate as the
 chain head, so that for example cube(a,b) can be implemented as a
 sorted agg for (a,b) and (a) and a hashed agg for (b) and (), allowing
 it to be done with one sort even if the result size for (a,b) is too
 big to hash.

That's a fair criticism, too.  Ingesting nodeSort.c into nodeAgg.c wouldn't be
too bad, because nodeSort.c is a thin wrapper around tuplesort.c.  Ingesting
nodeHash.c is not so tidy; that could entail extracting a module similar in
level to tuplesort.c, to be consumed by both executor nodes.  This does raise
the good point that the GROUPING SETS _design_ ought to consider group and
hash aggregation together.  Designing one in isolation carries too high of a
risk of painting the other into a corner.

Thanks,
nm


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Compression of full-page-writes

2014-12-31 Thread Bruce Momjian
On Tue, Dec 30, 2014 at 01:27:44PM +0100, Andres Freund wrote:
 On 2014-12-30 21:23:38 +0900, Michael Paquier wrote:
  On Tue, Dec 30, 2014 at 6:21 PM, Jeff Davis pg...@j-davis.com wrote:
   On Fri, 2013-08-30 at 09:57 +0300, Heikki Linnakangas wrote:
   Speeding up the CRC calculation obviously won't help with the WAL volume
   per se, ie. you still generate the same amount of WAL that needs to be
   shipped in replication. But then again, if all you want to do is to
   reduce the volume, you could just compress the whole WAL stream.
  
   Was this point addressed?
  Compressing the whole record is interesting for multi-insert records,
  but as we need to keep the compressed data in a pre-allocated buffer
  until WAL is written, we can only compress things within a given size
  range. The point is, even if we define a  lower bound, compression is
  going to perform badly with an application that generates for example
  many small records that are just higher than the lower bound...
  Unsurprisingly for small records this was bad:
 
 So why are you bringing it up? That's not an argument for anything,
 except not doing it in such a simplistic way.

I still don't understand the value of adding WAL compression, given the
high CPU usage and minimal performance improvement.  The only big
advantage is WAL storage, but again, why not just compress the WAL file
when archiving?

I thought we used to see huge performance benefits from WAL compression,
but not any more?  Has the UPDATE WAL compression removed that benefit? 
Am I missing something?

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: pg_event_trigger_dropped_objects: Add name/args output columns

2014-12-31 Thread Alvaro Herrera
Tom Lane wrote:

 But is there any reason to think the failure on dunlin has anything to do
 with default ACLs?  I think you'd better work on understanding why there
 is a platform dependency here, before you consider either removing the
 regression test case or adding support for object types that it shouldn't
 be hitting.

Thanks for the commit.  Now dunlin is green, but treepie displayed the
failure, so we know it's not a platform dependency but probably a timing
dependency.  The failure from treepie is

*** 1323,1328 
--- 1323,1329 
  
  DROP SCHEMA testns CASCADE;
  NOTICE:  drop cascades to table testns.acltest1
+ ERROR:  requested object address for unsupported object class 28: text result 
for role regressuser1 in schema testns on types
  SELECT d.* -- check that entries went away
FROM pg_default_acl d LEFT JOIN pg_namespace n ON defaclnamespace = n.oid
WHERE nspname IS NULL AND defaclnamespace != 0;

where 28 is OCLASS_DEFACL, which is consistent with the text result.  I
have no idea why this is being invoked but it seems clear to me now that
we need to support this case.  I will work on that on Friday, and also
check whether we need to add the AMPROC/AMOP cases and USER_MAPPING.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: pg_event_trigger_dropped_objects: Add name/args output columns

2014-12-31 Thread Tom Lane
I wrote:
 Alvaro Herrera alvhe...@2ndquadrant.com writes:
 Alvaro Herrera wrote:
 pg_event_trigger_dropped_objects: Add name/args output columns

 This is causing buildfarm member dunlin to fail:
 ...
 No other member so far has reported a problem here.  Not sure if that's
 the strangest bit, or the fact that dunlin is reporting anything in the
 first place.  I can reproduce the problem quite simply by creating an
 event trigger on sql_drop and then try to drop an object not supported
 by getObjectIdentityParts -- in this case it's a default ACL for tables
 in the schema being dropped.

 But is there any reason to think the failure on dunlin has anything to do
 with default ACLs?

I substituted a more detailed version of the error message, which soon
confirmed your guess that this had something to do with default ACLs:

http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=treepiedt=2014-12-31%2021%3A15%3A49

What seems to be happening is that the rowsecurity test installs an event
trigger, and if that happens to be active when the DROP SCHEMA testns
in privileges.sql executes, you get a failure because of the default
privileges attached to the schema.  The event trigger is very short-lived,
so the failure is hard to hit; nonetheless this is at least the third such
failure in the buildfarm.
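
For concreteness, a short-lived sql_drop trigger of the sort involved looks
roughly like this (names made up; the real one in rowsecurity is
policy-related):

    CREATE FUNCTION log_drops() RETURNS event_trigger LANGUAGE plpgsql AS $$
    DECLARE
        obj record;
    BEGIN
        FOR obj IN SELECT * FROM pg_event_trigger_dropped_objects() LOOP
            RAISE NOTICE 'dropped %', obj.object_identity;
        END LOOP;
    END $$;
    CREATE EVENT TRIGGER short_lived ON sql_drop EXECUTE PROCEDURE log_drops();
    -- ...and dropped again shortly afterwards:
    DROP EVENT TRIGGER short_lived;
    DROP FUNCTION log_drops();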

I've fixed this by the expedient of making rowsecurity not run as part
of a parallel group.  We cannot have regression tests that trigger such
irreproducible failures.

We should not leave it there though.  The event triggers in rowsecurity
think they only fire for policy-related events, so how come we're seeing
getObjectIdentityParts invoked on a default ACL?  And if that is expected
behavior, how is it you figure it'd be OK to not have an implementation
for every possible object type?

In the long run it might be interesting to have a test module that runs
the entire core regression suite with some appropriate event trigger in
place.  We are clearly far away from being able to do that yet, though,
and in the meantime this is not how I want to find out about event-trigger
bugs.  Creating short-lifespan event triggers has to be something that
happens only in regression tests that run by themselves.

regards, tom lane




Re: [HACKERS] psql tab completion: fix COMMENT ON ... IS IS IS

2014-12-31 Thread Ian Barwick
On 15/01/01 1:07, Robert Haas wrote:
 On Sun, Dec 28, 2014 at 7:44 PM, Ian Barwick i...@2ndquadrant.com wrote:
 Currently tab completion for 'COMMENT ON {object} foo IS' will result in
 the 'IS' being duplicated up to two times; not a world-shattering issue I
 know, but the fix is trivial and I stumble over it often enough for it to
 mildly annoy me.  Patch attached.
 
 I've noticed that in the past, too.  Committed.

Thanks.

Regards

Ian Barwick


-- 
 Ian Barwick   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Redesigning checkpoint_segments

2014-12-31 Thread Josh Berkus
Heikki,

Thanks for getting back to this!  I really look forward to simplifying
WAL tuning for users.

 min_recycle_wal_size
 checkpoint_wal_size

 snip

 These settings are fairly intuitive for a DBA to tune. You begin by
 figuring out how much disk space you can afford to spend on WAL, and set
 checkpoint_wal_size to that (with some safety margin, of course). Then
 you set checkpoint_timeout based on how long you're willing to wait for
 recovery to finish. Finally, if you have infrequent batch jobs that need
 a lot more WAL than the system otherwise needs, you can set
 min_recycle_wal_size to keep enough WAL preallocated for the spikes.

 We'll want to rename them to make it even *more* intuitive.
 
 Sure, I'm all ears.

My suggestion:

max_wal_size
min_wal_size

... these would be very easy to read & understand for users:  Set
max_wal_size based on the amount of space you have available for the
transaction log, or about 10% of the space available for your database
if you don't have a specific allocation for the log.  If your database
involves large batch imports, you may want to increase min_wal_size to
be at least the size of your largest batch.

Suggested defaults:

max_wal_size: 256MB
min_wal_size: 64MB
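
If the rename goes in, tuning could look something like this (a sketch that
assumes the proposed names become ordinary GUCs settable with ALTER SYSTEM;
they are not parameters that exist today):

    -- Cap WAL at roughly the disk space budgeted for it ...
    ALTER SYSTEM SET max_wal_size = '256MB';
    -- ... but keep enough preallocated to absorb batch-load spikes.
    ALTER SYSTEM SET min_wal_size = '64MB';
    SELECT pg_reload_conf();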

Please remind me because I'm having trouble finding this in the
archives: how does wal_keep_segments interact with the new settings?

 But ... do I understand things correctly that checkpoint wouldn't kick
 in until you hit checkpoint_wal_size?  If that's the case, isn't real
 disk space usage around 2X checkpoint_wal_size if spread checkpoint is
 set to 0.9?  Or does checkpoint kick in sometime earlier?
 
 It kicks in earlier, so that the checkpoint *completes* just when
 checkpoint_wal_size of WAL is used up. So the real disk usage is
 checkpoint_wal_size.

Awesome.  This makes me very happy.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Compression of full-page-writes

2014-12-31 Thread Amit Kapila
On Thu, Jan 1, 2015 at 2:39 AM, Bruce Momjian br...@momjian.us wrote:

 On Tue, Dec 30, 2014 at 01:27:44PM +0100, Andres Freund wrote:
  On 2014-12-30 21:23:38 +0900, Michael Paquier wrote:
   On Tue, Dec 30, 2014 at 6:21 PM, Jeff Davis pg...@j-davis.com wrote:
On Fri, 2013-08-30 at 09:57 +0300, Heikki Linnakangas wrote:
Speeding up the CRC calculation obviously won't help with the WAL volume
per se, ie. you still generate the same amount of WAL that needs to be
shipped in replication. But then again, if all you want to do is to
reduce the volume, you could just compress the whole WAL stream.
   
Was this point addressed?
   Compressing the whole record is interesting for multi-insert records,
   but as we need to keep the compressed data in a pre-allocated buffer
   until WAL is written, we can only compress things within a given size
   range. The point is, even if we define a  lower bound, compression is
   going to perform badly with an application that generates for example
   many small records that are just higher than the lower bound...
   Unsurprisingly for small records this was bad:
 
  So why are you bringing it up? That's not an argument for anything,
  except not doing it in such a simplistic way.

 I still don't understand the value of adding WAL compression, given the
 high CPU usage and minimal performance improvement.  The only big
 advantage is WAL storage, but again, why not just compress the WAL file
 when archiving.

 I thought we used to see huge performance benefits from WAL compression,
 but not any more?

I think there can be a performance benefit in cases where the data
is compressible, but it would be a loss otherwise.  The main issue is
that the current compression algorithm (pg_lz) is not so favorable
for non-compressible data.

Has the UPDATE WAL compression removed that benefit?

Good question.  I think there might be some impact due to that, but in
general, with page-level compression there will still be much more to
compress.

In general, I think this idea has merit for compressible data, and to
guard against the cases where it will not perform well, there is an
on/off switch for the feature; if PostgreSQL gains a better compression
method in the future, we can consider that as well.  One thing we need
to think about is whether users can easily decide when to enable this
global switch.
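
As an illustration of the decision the user faces, enabling the switch might
look like this (purely a sketch; the parameter name below is an assumption,
since the patch's final GUC name is still under discussion):

    -- Hypothetical name for the full-page-write compression switch.
    ALTER SYSTEM SET wal_compression = on;
    SELECT pg_reload_conf();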


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Compression of full-page-writes

2014-12-31 Thread Michael Paquier
On Thu, Jan 1, 2015 at 2:10 PM, Amit Kapila amit.kapil...@gmail.com wrote:
 On Thu, Jan 1, 2015 at 2:39 AM, Bruce Momjian br...@momjian.us wrote:
  So why are you bringing it up? That's not an argument for anything,
  except not doing it in such a simplistic way.

 I still don't understand the value of adding WAL compression, given the
 high CPU usage and minimal performance improvement.  The only big
 advantage is WAL storage, but again, why not just compress the WAL file
 when archiving.
When doing some tests with pgbench for a fixed number of transactions,
I also noticed a reduction in replay time; see for example some results
here:
http://www.postgresql.org/message-id/CAB7nPqRv6RaSx7hTnp=g3dyqou++fel0uioyqpllbdbhayb...@mail.gmail.com

 I thought we used to see huge performance benefits from WAL compression,
 but not any more?

 I think there can be a performance benefit in cases where the data
 is compressible, but it would be a loss otherwise.  The main issue is
 that the current compression algorithm (pg_lz) is not so favorable
 for non-compressible data.
Yes, definitely. Switching to a different algorithm would be the next
step forward. We have mainly been discussing lz4, which has a friendly
license; I think it would be worth studying other options as well once
we have all the infrastructure in place.

Has the UPDATE WAL compression removed that benefit?

 Good question.  I think there might be some impact due to that, but in
 general, with page-level compression there will still be much more to
 compress.
That may be a good thing to put a number on. We could try a build with
a3115f0d reverted and measure the difference in WAL size that it makes.
Thoughts?

 In general, I think this idea has merit for compressible data, and to
 guard against the cases where it will not perform well, there is an
 on/off switch for the feature; if PostgreSQL gains a better compression
 method in the future, we can consider that as well.  One thing we need
 to think about is whether users can easily decide when to enable this
 global switch.
The opposite is true as well: we shouldn't force the user to have data
compressed when the switch is disabled.
-- 
Michael




Re: [HACKERS] Parallel Seq Scan

2014-12-31 Thread Amit Kapila
On Wed, Dec 31, 2014 at 7:50 PM, Thom Brown t...@linux.com wrote:


 When attempting to recreate the plan in your example, I get an error:

  ➤ psql://thom@[local]:5488/pgbench

 # create table t1(c1 int, c2 char(500)) with (fillfactor=10);
 CREATE TABLE
 Time: 13.653 ms

  ➤ psql://thom@[local]:5488/pgbench

 # insert into t1 values(generate_series(1,100),'amit');
 INSERT 0 100
 Time: 4.796 ms

  ➤ psql://thom@[local]:5488/pgbench

 # explain select c1 from t1;
 ERROR:  could not register background process
 HINT:  You may need to increase max_worker_processes.
 Time: 1.659 ms

  ➤ psql://thom@[local]:5488/pgbench

 # show max_worker_processes ;
  max_worker_processes
----------------------
  8
 (1 row)

 Time: 0.199 ms

 # show parallel_seqscan_degree ;
  parallel_seqscan_degree
-------------------------
  10
 (1 row)


 Should I really need to increase max_worker_processes to >= parallel_seqscan_degree?

Yes; the parallel workers are implemented as dynamic bgworkers, so the
feature depends on max_worker_processes.


 If so, shouldn't there be a hint here along with the error message
pointing this out?  And should the error be produced when only a *plan* is
being requested?


I think one thing we could do to minimize the chance of such an error
is to cap the number of parallel workers used in the plan at
max_worker_processes when parallel_seqscan_degree is greater than
max_worker_processes.  Even then, such an error can still occur if a
user has registered a bgworker before we start parallel plan execution.
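
Concretely, the workaround on Thom's setup would be something along these
lines (a sketch; parallel_seqscan_degree is this patch's GUC, and
max_worker_processes only changes after a restart):

    -- Raise the bgworker limit so it is at least parallel_seqscan_degree,
    -- then restart the server for the new value to take effect.
    ALTER SYSTEM SET max_worker_processes = 16;
    -- (Alternatively, lower parallel_seqscan_degree to 8 or less.)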

 Also, I noticed that where a table is partitioned, the plan isn't
parallelised:


 Is this expected?


Yes; to keep the initial implementation simple, a parallel plan is only
allowed when the query involves a single table.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] What exactly is our CRC algorithm?

2014-12-31 Thread Abhijit Menon-Sen
Hi.

OK, here are the patches with the various suggestions applied.

I found that the alignment didn't seem to make much difference for the
CRC32* instructions, so I changed the code to process (len/8) eight-byte
chunks followed by (len%8) single bytes, the way the Linux kernel does.

-- Abhijit
From 82de6abbc05afabbf575941743b0ee355d2888ed Mon Sep 17 00:00:00 2001
From: Abhijit Menon-Sen a...@2ndquadrant.com
Date: Tue, 30 Dec 2014 12:41:19 +0530
Subject: Implement slice-by-8 CRC calculation

The COMP_CRC32C macro now calls pg_comp_crc32c(), which processes eight
data bytes at a time. Its output is identical to the byte-at-a-time CRC
code. (This change does not apply to the LEGACY_CRC32 computation.)

Reviewers: Andres Freund, Heikki Linnakangas

Author: Abhijit Menon-Sen
---
 src/include/utils/pg_crc.h        |   6 +-
 src/include/utils/pg_crc_tables.h | 594 +-
 src/port/pg_crc.c                 |  86 ++
 3 files changed, 619 insertions(+), 67 deletions(-)

diff --git a/src/include/utils/pg_crc.h b/src/include/utils/pg_crc.h
index f871cba..55934e5 100644
--- a/src/include/utils/pg_crc.h
+++ b/src/include/utils/pg_crc.h
@@ -41,6 +41,8 @@
 
 typedef uint32 pg_crc32;
 
+extern pg_crc32 pg_comp_crc32c(pg_crc32 crc, const void *data, size_t len);
+
 /*
  * CRC calculation using the CRC-32C (Castagnoli) polynomial.
  *
@@ -51,7 +53,7 @@ typedef uint32 pg_crc32;
 #define INIT_CRC32C(crc) ((crc) = 0xFFFFFFFF)
 #define FIN_CRC32C(crc)	((crc) ^= 0xFFFFFFFF)
 #define COMP_CRC32C(crc, data, len)	\
-	COMP_CRC32_NORMAL_TABLE(crc, data, len, pg_crc32c_table)
+	((crc) = pg_comp_crc32c((crc), (char *) (data), (len)))
 #define EQ_CRC32C(c1, c2) ((c1) == (c2))
 
 /*
@@ -115,7 +117,7 @@ do {			  \
 } while (0)
 
 /* Constant tables for CRC-32C and CRC-32 polynomials */
-extern CRCDLLIMPORT const uint32 pg_crc32c_table[];
+extern CRCDLLIMPORT const uint32 pg_crc32c_table[8][256];
 extern CRCDLLIMPORT const uint32 pg_crc32_table[];
 
 #endif   /* PG_CRC_H */
diff --git a/src/include/utils/pg_crc_tables.h b/src/include/utils/pg_crc_tables.h
index cb6b470..707b363 100644
--- a/src/include/utils/pg_crc_tables.h
+++ b/src/include/utils/pg_crc_tables.h
@@ -28,71 +28,535 @@
  * This table is based on the so-called Castagnoli polynomial (the same
  * that is used e.g. in iSCSI).
  */
-const uint32 pg_crc32c_table[256] = {
-	0x, 0xF26B8303, 0xE13B70F7, 0x1350F3F4,
-	0xC79A971F, 0x35F1141C, 0x26A1E7E8, 0xD4CA64EB,
-	0x8AD958CF, 0x78B2DBCC, 0x6BE22838, 0x9989AB3B,
-	0x4D43CFD0, 0xBF284CD3, 0xAC78BF27, 0x5E133C24,
-	0x105EC76F, 0xE235446C, 0xF165B798, 0x030E349B,
-	0xD7C45070, 0x25AFD373, 0x36FF2087, 0xC494A384,
-	0x9A879FA0, 0x68EC1CA3, 0x7BBCEF57, 0x89D76C54,
-	0x5D1D08BF, 0xAF768BBC, 0xBC267848, 0x4E4DFB4B,
-	0x20BD8EDE, 0xD2D60DDD, 0xC186FE29, 0x33ED7D2A,
-	0xE72719C1, 0x154C9AC2, 0x061C6936, 0xF477EA35,
-	0xAA64D611, 0x580F5512, 0x4B5FA6E6, 0xB93425E5,
-	0x6DFE410E, 0x9F95C20D, 0x8CC531F9, 0x7EAEB2FA,
-	0x30E349B1, 0xC288CAB2, 0xD1D83946, 0x23B3BA45,
-	0xF779DEAE, 0x05125DAD, 0x1642AE59, 0xE4292D5A,
-	0xBA3A117E, 0x4851927D, 0x5B016189, 0xA96AE28A,
-	0x7DA08661, 0x8FCB0562, 0x9C9BF696, 0x6EF07595,
-	0x417B1DBC, 0xB3109EBF, 0xA0406D4B, 0x522BEE48,
-	0x86E18AA3, 0x748A09A0, 0x67DAFA54, 0x95B17957,
-	0xCBA24573, 0x39C9C670, 0x2A993584, 0xD8F2B687,
-	0x0C38D26C, 0xFE53516F, 0xED03A29B, 0x1F682198,
-	0x5125DAD3, 0xA34E59D0, 0xB01EAA24, 0x42752927,
-	0x96BF4DCC, 0x64D4CECF, 0x77843D3B, 0x85EFBE38,
-	0xDBFC821C, 0x2997011F, 0x3AC7F2EB, 0xC8AC71E8,
-	0x1C661503, 0xEE0D9600, 0xFD5D65F4, 0x0F36E6F7,
-	0x61C69362, 0x93AD1061, 0x80FDE395, 0x72966096,
-	0xA65C047D, 0x5437877E, 0x4767748A, 0xB50CF789,
-	0xEB1FCBAD, 0x197448AE, 0x0A24BB5A, 0xF84F3859,
-	0x2C855CB2, 0xDEEEDFB1, 0xCDBE2C45, 0x3FD5AF46,
-	0x7198540D, 0x83F3D70E, 0x90A324FA, 0x62C8A7F9,
-	0xB602C312, 0x44694011, 0x5739B3E5, 0xA55230E6,
-	0xFB410CC2, 0x092A8FC1, 0x1A7A7C35, 0xE811FF36,
-	0x3CDB9BDD, 0xCEB018DE, 0xDDE0EB2A, 0x2F8B6829,
-	0x82F63B78, 0x709DB87B, 0x63CD4B8F, 0x91A6C88C,
-	0x456CAC67, 0xB7072F64, 0xA457DC90, 0x563C5F93,
-	0x082F63B7, 0xFA44E0B4, 0xE9141340, 0x1B7F9043,
-	0xCFB5F4A8, 0x3DDE77AB, 0x2E8E845F, 0xDCE5075C,
-	0x92A8FC17, 0x60C37F14, 0x73938CE0, 0x81F80FE3,
-	0x55326B08, 0xA759E80B, 0xB4091BFF, 0x466298FC,
-	0x1871A4D8, 0xEA1A27DB, 0xF94AD42F, 0x0B21572C,
-	0xDFEB33C7, 0x2D80B0C4, 0x3ED04330, 0xCCBBC033,
-	0xA24BB5A6, 0x502036A5, 0x4370C551, 0xB11B4652,
-	0x65D122B9, 0x97BAA1BA, 0x84EA524E, 0x7681D14D,
-	0x2892ED69, 0xDAF96E6A, 0xC9A99D9E, 0x3BC21E9D,
-	0xEF087A76, 0x1D63F975, 0x0E330A81, 0xFC588982,
-	0xB21572C9, 0x407EF1CA, 0x532E023E, 0xA145813D,
-	0x758FE5D6, 0x87E466D5, 0x94B49521, 0x66DF1622,
-	0x38CC2A06, 0xCAA7A905, 0xD9F75AF1, 0x2B9CD9F2,
-	0xFF56BD19, 0x0D3D3E1A, 0x1E6DCDEE, 0xEC064EED,
-	0xC38D26C4, 0x31E6A5C7, 0x22B65633, 0xD0DDD530,
-	0x0417B1DB, 0xF67C32D8, 0xE52CC12C, 0x1747422F,
-	0x49547E0B, 0xBB3FFD08, 0xA86F0EFC, 0x5A048DFF,
-	0x8ECEE914, 0x7CA56A17, 0x6FF599E3, 0x9D9E1AE0,
-	0xD3D3E1AB, 0x21B862A8, 

Re: [HACKERS] add modulo (%) operator to pgbench

2014-12-31 Thread David Rowley
On 1 January 2015 at 04:02, Fabien COELHO coe...@cri.ensmp.fr wrote:


  I'm not sure what Makefile changes could be necessary for
 various environments; it looks ok to me as it is.


 Found one. EXTRA_CLEAN is needed for generated C files. Added to this v3.


I was about to go and look at this, but I had a problem when attempting to
compile with MSVC.

The attached patch seems to fix it.

Regards

David Rowley


pgbench-expr-msvc_fix.patch
Description: Binary data
