Re: [HACKERS] Final Patch for GROUPING SETS

2014-12-31 Thread Noah Misch
On Tue, Dec 23, 2014 at 02:29:58AM -0500, Noah Misch wrote:
> On Mon, Dec 22, 2014 at 10:46:16AM -0500, Tom Lane wrote:
> > I still find the ChainAggregate approach too ugly at a system structural
> > level to accept, regardless of Noah's argument about number of I/O cycles
> > consumed.  We'll be paying for that in complexity and bugs into the
> > indefinite future, and I wonder if it isn't going to foreclose some other
> > "performance opportunities" as well.
> 
> Among GROUPING SETS GroupAggregate implementations, I bet there's a nonempty
> intersection between those having maintainable design and those having optimal
> I/O usage, optimal memory usage, and optimal number of sorts.  Let's put more
> effort into finding it.  I'm hearing that the shared tuplestore is
> ChainAggregate's principal threat to system structure; is that right?

The underlying algorithm, if naively expressed in terms of our executor
concepts, would call ExecProcNode() on a SortState, then feed the resulting
slot to both a GroupAggregate and to another Sort.  That implies a non-tree
graph of executor nodes, which isn't going to fly anytime soon.  The CTE
approach bypasses that problem by eliminating cooperation between sorts,
instead reading 2N+1 copies of the source data for N sorts.  ChainAggregate is
a bit like a node having two parents, a Sort and a GroupAggregate.  However,
the graph edge between ChainAggregate and its GroupAggregate is a tuplestore
instead of the usual, synchronous ExecProcNode().

Suppose one node orchestrated all sorting and aggregation.  Call it a
MultiGroupAggregate for now.  It wouldn't harness Sort nodes, because it
performs aggregation between tuplesort_puttupleslot() calls.  Instead, it
would directly manage two Tuplesortstate, CUR and NEXT.  The node would have
an initial phase similar to ExecSort(), in which it drains the outer node to
populate the first CUR.  After that, it looks more like agg_retrieve_direct(),
except that CUR is the input source, and each tuple drawn is also put into
NEXT.  When done with one CUR, swap CUR with NEXT and reinitialize NEXT.  This
design does not add I/O consumption or require a nonstandard communication
channel between executor nodes.  Tom, Andrew, does that look satisfactory?
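
In rough pseudo-code, and glossing over many details (the calls below
approximate the existing tuplesort/nodeAgg APIs; this is only a sketch of
the idea, not patch code):

    Tuplesortstate *cur;    /* input source for the current grouping set */
    Tuplesortstate *next;   /* collects tuples for the next sort order */
    TupleTableSlot *slot;

    /* initial phase, as in ExecSort(): drain the outer plan into CUR */
    cur = tuplesort_begin_heap(...);    /* first grouping set's sort order */
    while (!TupIsNull(slot = ExecProcNode(outerPlan)))
        tuplesort_puttupleslot(cur, slot);
    tuplesort_performsort(cur);

    /* then, for each sort order in turn: */
    next = tuplesort_begin_heap(...);   /* next grouping set's sort order */
    while (tuplesort_gettupleslot(cur, true, slot))
    {
        /* aggregate the tuple, as in agg_retrieve_direct() ... */
        tuplesort_puttupleslot(next, slot); /* ... and also feed NEXT */
    }
    tuplesort_end(cur);
    cur = next;             /* swap CUR with NEXT, reinitialize NEXT */
    tuplesort_performsort(cur);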

Thanks,
nm




Re: [HACKERS] Final Patch for GROUPING SETS

2014-12-31 Thread Atri Sharma
> ChainAggregate is a bit like a node having two parents, a Sort and a
> GroupAggregate.  However, the graph edge between ChainAggregate and its
> GroupAggregate is a tuplestore instead of the usual, synchronous
> ExecProcNode().
>

Well, I don't buy the two-parents theory. The Sort nodes are stacked
between successive ChainAggregate nodes, so there is still a single edge.
However, as you rightly said, there is a shared tuplestore; but note that
only the head-of-chain ChainAggregate has the top GroupAggregate as its
parent.

>
> Suppose one node orchestrated all sorting and aggregation.  Call it a
> MultiGroupAggregate for now.  It wouldn't harness Sort nodes, because it
> performs aggregation between tuplesort_puttupleslot() calls.  Instead, it
> would directly manage two Tuplesortstate, CUR and NEXT.  The node would
> have an initial phase similar to ExecSort(), in which it drains the outer
> node to populate the first CUR.  After that, it looks more like
> agg_retrieve_direct(), except that CUR is the input source, and each
> tuple drawn is also put into NEXT.  When done with one CUR, swap CUR with
> NEXT and reinitialize NEXT.  This design does not add I/O consumption or
> require a nonstandard communication channel between executor nodes.  Tom,
> Andrew, does that look satisfactory?
>
So you are essentially proposing merging ChainAggregate and its
corresponding Sort node?

So the structure would be something like:

GroupAggregate
  -> MultiGroupAgg (a,b)
       -> MultiGroupAgg (c,d) ...

I am not sure if I understand you correctly. Only the top-level
GroupAggregate node projects the result of the entire operation. The key to
ChainAggregate nodes is that each ChainAggregate node handles the grouping
sets that fit a single ROLLUP list, i.e. those that can be computed from a
single sort order. There can be multiple such lists in a single GROUPING
SETS operation; however, our current design has only a single top
GroupAggregate node, plus a ChainAggregate node + Sort node per sort order.
If you are proposing to replace the GroupAggregate node + the entire
ChainAggregate + Sort node stack with a single MultiGroupAggregate node, I
am not able to understand how it will handle all the multiple sort orders.
If you are proposing to replace only a ChainAggregate + Sort node pair with
a single MultiGroupAgg node, that still shares the tuplestore with the
top-level GroupAggregate node.

I am pretty sure I have messed up my understanding of your proposal. Please
correct me if I am wrong.

Regards,

Atri


-- 
Regards,

Atri
*l'apprenant*


Re: [HACKERS] BUG: *FF WALs under 9.2 (WAS: .ready files appearing on slaves)

2014-12-31 Thread Michael Paquier
On Wed, Dec 31, 2014 at 4:44 PM, Guillaume Lelarge
 wrote:
> 2014-12-12 14:58 GMT+01:00 Heikki Linnakangas :
>> Now, what do we do with the back-branches? I'm not sure. Changing the
>> behaviour in back-branches could cause nasty surprises. Perhaps it's best to
>> just leave it as it is, even though it's buggy.
>>
>
> As long as master is fixed, I don't actually care. But I agree with Dennis
> that it's hard to see what's been committed with all the different issues
> found, and if any commits were done, in which branch. I'd like to be able
> to tell my customers: update to this minor release to see if it's fixed,
> but I can't even do that.
This bug does not endanger data consistency at all, since only old WAL
files remain on the standby, so I'm fine as well with just a clean fix
on master, and nothing done on back-branches.
-- 
Michael




Re: [HACKERS] Redesigning checkpoint_segments

2014-12-31 Thread Heikki Linnakangas

(reviving an old thread)

On 08/24/2013 12:53 AM, Josh Berkus wrote:

On 08/23/2013 02:08 PM, Heikki Linnakangas wrote:


Here's a bigger patch, which does more. It is based on the ideas in the
post I started this thread with, with feedback incorporated from the
long discussion. With this patch, WAL disk space usage is controlled by
two GUCs:

min_recycle_wal_size
checkpoint_wal_size

These settings are fairly intuitive for a DBA to tune. You begin by
figuring out how much disk space you can afford to spend on WAL, and set
checkpoint_wal_size to that (with some safety margin, of course). Then
you set checkpoint_timeout based on how long you're willing to wait for
recovery to finish. Finally, if you have infrequent batch jobs that need
a lot more WAL than the system otherwise needs, you can set
min_recycle_wal_size to keep enough WAL preallocated for the spikes.
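
For instance, one might end up with something like this (the values here
are purely illustrative, not recommendations):

    checkpoint_wal_size = 1GB       # disk space budgeted for WAL, with margin
    checkpoint_timeout = 10min      # bounds the expected crash recovery time
    min_recycle_wal_size = 256MB    # WAL kept preallocated for batch spikes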


We'll want to rename them to make it even *more* intuitive.


Sure, I'm all ears.


But ... do I understand things correctly that checkpoint wouldn't "kick
in" until you hit checkpoint_wal_size?  If that's the case, isn't real
disk space usage around 2X checkpoint_wal_size if spread checkpoint is
set to 0.9?  Or does checkpoint kick in sometime earlier?


It kicks in earlier, so that the checkpoint *completes* just when 
checkpoint_wal_size of WAL is used up. So the real disk usage is 
checkpoint_wal_size.


There is still an internal variable called CheckPointSegments that 
triggers the checkpoint, but it is now derived from checkpoint_wal_size 
(see CalculateCheckpointSegments function):


CheckPointSegments = (checkpoint_wal_size / 16MB) / (2 + checkpoint_completion_target)


This is the same formula we've always had in the manual for calculating 
the amount of WAL space used, but in reverse. I.e. we calculate 
CheckPointSegments based on the desired disk space usage, not the other 
way round.



As a note, pgBench would be a terrible test for this patch; we really
need something which creates uneven traffic.  I'll see if I can devise
something.


Attached is a rebased version of this patch. Everyone, please try this 
out on whatever workloads you have, and let me know:


a) How does the auto-tuning of the number of recycled segments work? 
Does pg_xlog reach a steady-state size, or does it fluctuate a lot?


b) Are the two GUCs, checkpoint_wal_size, and min_recycle_wal_size, 
intuitive to set?


- Heikki

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 6bcb106..34f9466 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -1325,7 +1325,7 @@ include_dir 'conf.d'
 40% of RAM to shared_buffers will work better than a
 smaller amount.  Larger settings for shared_buffers
 usually require a corresponding increase in
-checkpoint_segments, in order to spread out the
+checkpoint_wal_size, in order to spread out the
 process of writing large quantities of new or changed data over a
 longer period of time.

@@ -2394,18 +2394,21 @@ include_dir 'conf.d'
  Checkpoints
 
 
- 
-  checkpoint_segments (integer)
+ 
+  checkpoint_wal_size (integer)
   
-   checkpoint_segments configuration parameter
+   checkpoint_wal_size configuration parameter
   
   
   

-Maximum number of log file segments between automatic WAL
-checkpoints (each segment is normally 16 megabytes). The default
-is three segments.  Increasing this parameter can increase the
-amount of time needed for crash recovery.
+Maximum size to let the WAL grow to between automatic WAL
+checkpoints. This is a soft limit; WAL size can exceed
+checkpoint_wal_size under special circumstances, like
+under heavy load, a failing archive_command, or a high
+wal_keep_segments setting. The default is 128 MB.
+Increasing this parameter can increase the amount of time needed for
+crash recovery.
 This parameter can only be set in the postgresql.conf
 file or on the server command line.

@@ -2458,7 +2461,7 @@ include_dir 'conf.d'
 Write a message to the server log if checkpoints caused by
 the filling of checkpoint segment files happen closer together
 than this many seconds (which suggests that
-checkpoint_segments ought to be raised).  The default is
+checkpoint_wal_size ought to be raised).  The default is
 30 seconds (30s).  Zero disables the warning.
 No warnings will be generated if checkpoint_timeout
 is less than checkpoint_warning.
@@ -2468,6 +2471,24 @@ include_dir 'conf.d'
   
  
 
+ 
+  min_recycle_wal_size (integer)
+  
+   min_recycle_wal_size configuration parameter
+  
+  
+   
+As long as WAL disk usage stays below this setting, old WAL files are
+always recycled for future use at a checkpoint, rather than remo

Re: [HACKERS] Redesigning checkpoint_segments

2014-12-31 Thread Heikki Linnakangas

On 09/01/2013 10:37 AM, Amit Kapila wrote:

On Sat, Aug 24, 2013 at 2:38 AM, Heikki Linnakangas
 wrote:

a.
In XLogFileInit(),
/*
!  * XXX: What should we use as max_segno? We used to use XLOGfileslop when
!  * that was a constant, but that was always a bit dubious: normally, at a
!  * checkpoint, XLOGfileslop was the offset from the checkpoint record,
!  * but here, it was the offset from the insert location. We can't do the
!  * normal XLOGfileslop calculation here because we don't have access to
!  * the prior checkpoint's redo location. So somewhat arbitrarily, just
!  * use CheckPointSegments.
!  */
! max_segno = logsegno + CheckPointSegments;
   if (!InstallXLogFileSegment(&installed_segno, tmppath,
! *use_existent, max_segno,
   use_lock))

Earlier, max_advance was the same whether InstallXLogFileSegment was called
from RemoveOldXlogFiles() or from XLogFileInit(), but now the two values
will be different (and there seems to be no direct relation between them).
Will that be okay in the scenario where someone else has created the file
while this function was filling it, since the file then needs to be
installed as a future segment, which is decided based on max_segno?


I haven't really thought hard about the above. As the comment says, 
passing the same max_advance value here and in RemoveOldXlogFiles() was 
a bit dubious too, because the reference point was different.


I believe it's quite rare that two processes create a new WAL segment 
concurrently, so it isn't terribly important what we do here.



b. Does CreateRestartPoint need to update CheckPointDistanceEstimate?
When it tries to remove old xlog files, it needs recycleSegNo, which is
calculated using CheckPointDistanceEstimate.


Yeah, you're right, it should. I haven't tested this with archive 
recovery or replication at all yet.



As a developer, I would love to have a configuration knob such as
min_recycle_wal_size, but I am not sure how many users will be comfortable
setting this value; actually, the few users I have talked to about this
earlier are interested in setting a max WAL size, which allows them to set
an upper limit on the space required by WAL.
Can't we do the calculation of files to recycle based only on
CheckPointDistanceEstimate?


You can always just leave min_recycle_wal_size to the default. It sets a 
minimum for the number of preallocated segments, which can help if you 
have spikes that consume a lot of WAL, like nightly batch jobs. But if 
you don't have such spikes, or the overhead of creating new segments 
when such a spike happens isn't too large, you don't need to set it.


One idea is to try to make the creation of new WAL segments faster. Then 
it wouldn't hurt so much if you run out of preallocated/recycled 
segments and need to suddenly create a lot of new ones. Then we might 
not need a minimum setting at all.


- Heikki





Re: [HACKERS] add modulo (%) operator to pgbench

2014-12-31 Thread Fabien COELHO


Here is a review & updated version of the patch.

The patch applies and compiles without problem on current head,
and worked for the various tests I made with custom scripts.

This patch is a good thing, and I warmly recommend its inclusion. It
greatly extends pgbench's custom capabilities by allowing arbitrary integer
expressions, and will allow other functions to be added (I'll send abs() & a
hash(), and possibly others).


I'm not sure about what Makefile changes could be necessary for
various environments; it looks ok to me as it is.

I have included the following additional changes in v2:

(1) A small update to the pgbench documentation. English proofreading is
welcome.

(2) Improved error reporting to display the file and line from which an
error is raised, and also the column on syntax errors (yyline is always
1...). The prior status was that there was no way to get this information,
which was quite annoying. It could probably be improved further.

(3) Spacing fixed in a few places.

If Robert is ok with these changes, I think it could be applied.
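
For illustration, here is the kind of thing the expression syntax makes
possible in a custom script (variable names invented; the operators are
those from the grammar below):

    \setrandom aid 1 100000
    \set bid :aid % 16 + 1
    \set delta (:aid * 3 - 1) / 2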

--
Fabien.

diff --git a/contrib/pgbench/.gitignore b/contrib/pgbench/.gitignore
index 489a2d6..aae819e 100644
--- a/contrib/pgbench/.gitignore
+++ b/contrib/pgbench/.gitignore
@@ -1 +1,3 @@
+/exprparse.c
+/exprscan.c
 /pgbench
diff --git a/contrib/pgbench/Makefile b/contrib/pgbench/Makefile
index b8e2fc8..67e2bf6 100644
--- a/contrib/pgbench/Makefile
+++ b/contrib/pgbench/Makefile
@@ -4,7 +4,7 @@ PGFILEDESC = "pgbench - a simple program for running benchmark tests"
 PGAPPICON = win32
 
 PROGRAM = pgbench
-OBJS	= pgbench.o $(WIN32RES)
+OBJS	= pgbench.o exprparse.o $(WIN32RES)
 
 PG_CPPFLAGS = -I$(libpq_srcdir)
 PG_LIBS = $(libpq_pgport) $(PTHREAD_LIBS)
@@ -18,8 +18,22 @@ subdir = contrib/pgbench
 top_builddir = ../..
 include $(top_builddir)/src/Makefile.global
 include $(top_srcdir)/contrib/contrib-global.mk
+
+distprep: exprparse.c exprscan.c
 endif
 
 ifneq ($(PORTNAME), win32)
 override CFLAGS += $(PTHREAD_CFLAGS)
 endif
+
+# There is no correct way to write a rule that generates two files.
+# Rules with two targets don't have that meaning, they are merely
+# shorthand for two otherwise separate rules.  To be safe for parallel
+# make, we must chain the dependencies like this.  The semicolon is
+# important, otherwise make will choose the built-in rule for
+# gram.y=>gram.c.
+
+exprparse.h: exprparse.c ;
+
+# exprscan is compiled as part of exprparse
+exprparse.o: exprscan.c
diff --git a/contrib/pgbench/exprparse.y b/contrib/pgbench/exprparse.y
new file mode 100644
index 000..243c6b9
--- /dev/null
+++ b/contrib/pgbench/exprparse.y
@@ -0,0 +1,96 @@
+%{
+/*-
+ *
+ * exprparse.y
+ *	  bison grammar for a simple expression syntax
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *-
+ */
+
+#include "postgres_fe.h"
+
+#include "pgbench.h"
+
+PgBenchExpr *expr_parse_result;
+
+static PgBenchExpr *make_integer_constant(int64 ival);
+static PgBenchExpr *make_variable(char *varname);
+static PgBenchExpr *make_op(char operator, PgBenchExpr *lexpr,
+		PgBenchExpr *rexpr);
+
+%}
+
+%expect 0
+%name-prefix="expr_yy"
+
+%union
+{
+	int64		ival;
+	char	   *str;
+	PgBenchExpr *expr;
+}
+
+%type <expr> expr
+%type <ival> INTEGER
+%type <str> VARIABLE
+%token INTEGER VARIABLE
+%token CHAR_ERROR /* never used, will raise a syntax error */
+
+%left	'+' '-'
+%left	'*' '/' '%'
+%right	UMINUS
+
+%%
+
+result: expr			{ expr_parse_result = $1; }
+
+expr: '(' expr ')'			{ $$ = $2; }
+	| '+' expr %prec UMINUS	{ $$ = $2; }
+	| '-' expr %prec UMINUS	{ $$ = make_op('-', make_integer_constant(0), $2); }
+	| expr '+' expr			{ $$ = make_op('+', $1, $3); }
+	| expr '-' expr			{ $$ = make_op('-', $1, $3); }
+	| expr '*' expr			{ $$ = make_op('*', $1, $3); }
+	| expr '/' expr			{ $$ = make_op('/', $1, $3); }
+	| expr '%' expr			{ $$ = make_op('%', $1, $3); }
+	| INTEGER				{ $$ = make_integer_constant($1); }
+	| VARIABLE				{ $$ = make_variable($1); }
+	;
+
+%%
+
+static PgBenchExpr *
+make_integer_constant(int64 ival)
+{
+	PgBenchExpr *expr = pg_malloc(sizeof(PgBenchExpr));
+
+	expr->etype = ENODE_INTEGER_CONSTANT;
+	expr->u.integer_constant.ival = ival;
+	return expr;
+}
+
+static PgBenchExpr *
+make_variable(char *varname)
+{
+	PgBenchExpr *expr = pg_malloc(sizeof(PgBenchExpr));
+
+	expr->etype = ENODE_VARIABLE;
+	expr->u.variable.varname = varname;
+	return expr;
+}
+
+static PgBenchExpr *
+make_op(char operator, PgBenchExpr *lexpr, PgBenchExpr *rexpr)
+{
+	PgBenchExpr *expr = pg_malloc(sizeof(PgBenchExpr));
+
+	expr->etype = ENODE_OPERATOR;
+	expr->u.operator.operator = operator;
+	expr->u.operator.lexpr = lexpr;
+	expr->u.operator.rexpr = rexpr;
+	return expr;
+}
+
+#include "exprscan.c"
diff --git a/contrib/pgbench/exprscan.l b/contrib/pgbench/

Re: [HACKERS] Initdb-cs_CZ.WIN-1250 buildfarm failures

2014-12-31 Thread Tomas Vondra
On 28.12.2014 00:46, Noah Misch wrote:
> On Tue, Dec 23, 2014 at 03:32:59PM +0100, Tomas Vondra wrote:
>> On 23.12.2014 15:21, Andrew Dunstan wrote:
>>>
>>> No, config_opts is what's passed to configure. Try something like:
>>>
>>> if ($branch eq 'REL9_0_STABLE')
>>> {
>>> $skip_steps{'pl-install-check'} = 1;
>>> }
>>
>> Applied to all three animals.
> 
> As of the time of this change, the animals ceased to report in.

The strange thing is that while the first run worked fine (as
illustrated by the results submitted to pgbuildfarm.org), all the
following runs fail like this:

Global symbol "%skip_steps" requires explicit package name at
build-farm.conf line 308.
Compilation failed in require at ./run_branches.pl line 55.

Apparently, something is broken, but my Perl-fu is limited so I have no
idea what/why :-(
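
For reference, this is the generic shape of that error under "use strict"
(a minimal illustration, not the actual buildfarm code):

    use strict;
    our %skip_steps = ('pl-install-check' => 1);   # OK: declared with "our"
    # without the "our", strict vars fails at compile time with:
    #   Global symbol "%skip_steps" requires explicit package name

so presumably the hash needs to be declared (or package-qualified) in
build-farm.conf before it is assigned to.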

regards
Tomas





Re: [HACKERS] Additional role attributes && superuser review

2014-12-31 Thread José Luis Tallón

On 12/30/2014 04:16 PM, Stephen Frost wrote:

[snip]
The approach I was thinking was to document and implement this as
impliciting granting exactly USAGE and SELECT rights, no more (not
BYPASSRLS) and no less (yes, the role could execute functions).  I agree
that doing so would be strictly more than what pg_dump actually requires
but it's also what we actually have support for in our privilege system.


Hmm... coupled with your comments below, I'd say some tweaking of the
existing privileges is actually needed if we want to add these new
"capabilities".


BTW, and since this is getting a bit bigger than originally considered:
would it be interesting to support some extension-defined capabilities, too?
(for things that can't be easily controlled by the existing USAGE /
SELECT / ... rights, I mean)



it would only give you COPY access. (And also
COPY != SELECT in the way that the rule system applies, I think? And this
one could be for COPY only)

COPY certainly does equal SELECT rights..  We haven't got an independent
COPY privilege and I don't think it makes sense to invent one.


FWIW, a COPY (DUMP?) privilege different from SELECT would make sense.
Considering your comments below, it would be better that it not imply
BYPASSRLS, though.



It
sounds like you're suggesting that we add a hack directly into COPY for
this privilege, but my thinking is that the right place to change for
this is in the privilege system and not hack it into individual
commands..  I'm also a bit nervous that we'll end up painting ourselves
into a corner if we hack this to mean exactly what pg_dump needs today.

Agreed.


Lastly, I've been considering other use-cases for this privilege beyond
the pg_dump one (thinking about auditing, for example).


ISTR there was something upthread on an AUDIT role, right?
This might be the moment to add it...



[snip]

Similar concerns would exist for the existing REPLICATION role for example
- that one clearly lets you bypass RLS as well, just not with a SQL
statement.

I'm not sure that I see the point you're making here.  Yes, REPLICATION
allows you to do a filesystem copy of the entire database and that
clearly bypasses RLS and *all* of our privilege system.  I'm suggesting
that this role attribute work as an implicit grant of privileges we
already have.  That strikes me as easy to document and very clear for
users.


+1

[snip]
So you're saying a privilege that would allow you to do
pg_start_backup()/pg_stop_backup() but *not* actually use pg_basebackup?
Yes.


That would be "EXCLUSIVEBACKUP" or something like that, to be consistent
with existing terminology though.

Ok.  I agree that working out the specific naming is difficult and would
like to focus on that, but we probably need to agree on the specific
capabilities first. :)


Yes :)
For the record, LOGBACKUP vs PHYSBACKUP was suggested a couple of days ago.
I'd favor DUMP (logical) and BACKUP (physical) --- for lack of better names.
The above reasoning would have pg_basebackup requiring REPLICATION,
which is a logical consequence of the implementation but strikes me as a
bit surprising in light of these other privileges.




[snip]

Personalyl I think using the DUMP name makes that a lot more clear. Maybe
we need to avoid using BACKUP alone as well, to make sure it doesn't go the
other way - using BASEBACKUP and EXCLUSIVEBACKUP for those two different
ones perhaps?

DUMP - implicitly grants existing rights, to facilitate pg_dump and
perhaps some other use-cases
BASEBACKUP - allows pg_basebackup, which means bypassing all in-DB
  privilege systems
EXCLUSIVEBACKUP - just allows pg_start/stop_backup and friends

I'm still not entirely happy with the naming, but I feel like we're
getting there.  One thought I had was using ARCHIVE somewhere, but I
kind of like BASEBACKUP meaning what's needed to run pg_basebackup, and,
well, we say 'backup' in the pg_start/stop_backup function names, so
it's kind of hard to avoid that.  EXCLUSIVEBACKUP seems really long tho.


ARCHIVE, though completely appropriate for the "exclusivebackup" case
above (so this would become DUMP/BASEBACKUP/ARCHIVE + REPLICATION), might
end up causing quite some confusion... ("what? WAL archiving is NOT
granted by the "archive" privilege, but requires a superuser to turn it
on (via ALTER SYSTEM)?")

POLA again...
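
Presumably, if we go this way, the grants would end up looking something
like the following (syntax modeled on the existing REPLICATION attribute;
none of these keywords exist today):

    ALTER ROLE dump_user WITH DUMP;              -- read-everything, for pg_dump
    ALTER ROLE backup_agent WITH BASEBACKUP;     -- may run pg_basebackup
    ALTER ROLE script_user WITH EXCLUSIVEBACKUP; -- pg_start/stop_backup only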

I had defined them when I started the thread:

pg_start_backup
pg_stop_backup
pg_switch_xlog
pg_create_restore_point


... for "BACKUP" / EXCLUSIVEBACKUP (or, actually, FSBACKUP/PHYSBACKUP ...)

Later I added:

pg_xlog_replay_pause
pg_xlog_replay_resume

Though I'd be fine if the xlog_replay ones were their own privilege (eg:
REPLAY or something).

+1

I think just calling them "xlog related functions" is doing us a disservice
there. Definitely once we have actual documentation to write for it, but
also in this discussion.

[snip]

If it's for replicas, then why are we not using the REPLICATION privilege
which is extremely similar

Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]

2014-12-31 Thread Amit Kapila
On Mon, Dec 29, 2014 at 11:10 AM, Dilip kumar 
wrote:
>
> On 29 December 2014 10:22 Amit Kapila Wrote,
>
>
> I think there is nothing more to be handled from my side; you can go
> ahead with the review.
>

The patch looks good to me.  I have made a couple of
cosmetic changes (fixed spelling mistakes, improved some comments,
etc.); please check them once, and if you are okay with them, we can
move ahead.



With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


vacuumdb_parallel_v21.patch
Description: Binary data



Re: [HACKERS] Additional role attributes && superuser review

2014-12-31 Thread Magnus Hagander
On Tue, Dec 30, 2014 at 4:16 PM, Stephen Frost  wrote:

> * Magnus Hagander (mag...@hagander.net) wrote:
> > On Mon, Dec 29, 2014 at 11:01 PM, Stephen Frost 
> wrote:
> > > That said, a 'DUMP' privilege which allows the user to dump the
> > > contents of the entire database is entirely reasonable.  We need to
> > > be clear in the documentation though- such a 'DUMP' privilege is
> > > essentially granting USAGE on all schemas and table-level SELECT on
> > > all tables and sequences (anything else..?)  To be clear, a user with
> > > this privilege can utilize that access without using pg_dump.
> >
> > Well, it would not give you full USAGE - granted USAGE on a schema, you
> > can execute functions in there for example (provided permissions). This
> > privilege would not do that,
>
> The approach I was thinking was to document and implement this as
> implicitly granting exactly USAGE and SELECT rights, no more (not
> BYPASSRLS) and no less (yes, the role could execute functions).  I agree
>

If the role could execute functions, it's *not* "exactly USAGE and SELECT
rights", it's now "USAGE and SELECT and EXECUTE" rights... Just to be
nitpicking, but see below.



> that doing so would be strictly more than what pg_dump actually requires
> but it's also what we actually have support for in our privilege system.
>

Yeah, but would it also be what people would actually *want*?

I think having it do exactly what pg_dump needs, and not things like
execute functions etc, would be the thing people want for a 'DUMP'
privilege.

We might *also* want a USAGEANDSELECTANDEXECUTEANDWHATNOTBUTNOBYPASSURL
(crap, it has to fit within NAMEDATALEN?) privilege, but I think that's a
different thing.


> it would only give you COPY access. (And also
> > COPY != SELECT in the way that the rule system applies, I think? And this
> > one could be for COPY only)
>
> COPY certainly does equal SELECT rights..  We haven't got an independent
> COPY privilege and I don't think it makes sense to invent one.  It
>

We don't, but I'm saying it might make sense to implement this. Not as a
regular privilige, but as a role attribute. I'm not sure it's the right
thing, but it might actually be interesting to entertain.



> sounds like you're suggesting that we add a hack directly into COPY for
> this privilege, but my thinking is that the right place to change for
> this is in the privilege system and not hack it into individual
> commands..  I'm also a bit nervous that we'll end up painting ourselves
> into a corner if we hack this to mean exactly what pg_dump needs today.
>

One point would be that if we define it as "exactly what pg_dump needs",
that definition can change in a future major version.


> One other point is that this shouldn't imply any other privileges, imv.
> > > I'm specifically thinking of BYPASSRLS- that's independently grantable
> > > and therefore should be explicitly set, if it's intended.  Things
> >
> > I think BYPASSRLS would have to be implicitly granted by the DUMP
> > privilege. Without that, the DUMP privilege is more or less meaningless
> > (for anybody who uses RLS - but if they don't use RLS, then having it
> > include BYPASSRLS makes no difference). Worse than that, you may end up
> > with a dump that you cannot restore.
>
> The dump would fail, as I mentioned before.  I don't see why BYPASSRLS
> has to be implicitly granted by the DUMP privilege and can absolutely
> see use-cases for it to *not* be.  Those use-cases would require passing
> pg_dump the --enable-row-security option, but that's why it's there.
>

So you're basically saying that if RLS is in use anywhere, this privilege
alone would be useless, correct? And you would get a hard failure at *dump
time*, not at reload time? That would at least make it acceptable, as you
would know it's wrong, and we could make the error messages shown
specifically hint that you need to grant both privileges to make it useful.

We could/should also throw a WARNING if DUMP is granted to a role without
BYPASSRLS in case row-level security is enabled in the system, I think. But
that's more of an implementation detail.



Implementations which don't use RLS are not impacted either way, so we
> don't need to consider them.  Implementations which *do* use RLS, in my
> experience, would certainly understand and expect BYPASSRLS to be
> required if they want this role to be able to dump all data out
> regardless of the RLS policies.  What does implicitly including
> BYPASSRLS in this privilege gain us?  A slightly less irritated DBA who
> forgot to include BYPASSRLS and ended up getting a pg_dump error because
> of it.  On the other hand, it walls off any possibility of using this
> privilege for roles who shouldn't be allowed to bypass RLS or for
> pg_dumps to be done across the entire system for specific RLS policies.
>

Yeah, I think that's acceptable as long as we get the error at dump, and
not at reload (when it's too late to fix it).


> Similar concerns would 

[HACKERS] Performance improvement for joins where outer side is unique

2014-12-31 Thread David Rowley
Hi,

I've been hacking a bit at the join code again... This time I've been
putting some effort into  optimising the case where the inner side of the
join is known to be unique.
For example, given the tables:

create table t1 (id int primary key);
create table t2 (id int primary key);

And query such as:

select * from t1 left outer join t2 on t1.id=t2.id;

It is possible to deduce at planning time that "for each row in the outer
relation, only 0 or 1 rows can exist in the inner relation" (the inner
being t2).

In all of our join algorithms in the executor, if the join type is SEMI,
 we skip to the next outer row once we find a matching inner row. This is
because we don't want to allow duplicate rows in the inner side to
duplicate outer rows in the result set. Obviously this is required per SQL
spec. I believe we can also skip to the next outer row in this case when
we've managed to prove that no other row can possibly exist that matches
the current outer row, due to a unique index or group by/distinct clause
(for subqueries).
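
In the executor the change is tiny in principle; for the hash join case it
amounts to something like this (a sketch against the existing
HJ_NEED_NEW_OUTER machinery; "unique_inner" being the flag plumbed down
from the planner):

    /* after emitting a matched row in ExecHashJoin() */
    if (node->js.jointype == JOIN_SEMI || node->unique_inner)
        node->hj_JoinState = HJ_NEED_NEW_OUTER;  /* no second match possible */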

I've attached a patch which aims to prove this idea works, although so far
I've only gotten around to implementing this for outer joins.

Since code already existed in analyzejoins.c which aimed to determine if a
join to a relation was unique on the join's condition, the patch is pretty
much just some extra plumbing and a small rewiring of analyzejoins.c, which
just aims to get the "unique_inner" bool value down to the join node.

The performance improvement is somewhere around 5-10% for hash joins, and a
bit less for merge joins. In theory it could be 50% for nested loop joins.
In my life in the OLTP world, these "unique joins" pretty much cover all
joins that ever happen. Perhaps the OLAP world is different, so from my
point of view this is going to be a very nice performance gain.

I'm seeing this patch (or a more complete version) as the first step
towards a whole range of other planner improvements:

A couple of examples of those are:

1.
explain select * from sales s inner join product p on p.productid =
s.productid order by s.productid,p.name;

The current plan for this looks like:
                              QUERY PLAN
------------------------------------------------------------------------
 Sort  (cost=149.80..152.80 rows=1200 width=46)
   Sort Key: s.productid, p.name
   ->  Hash Join  (cost=37.00..88.42 rows=1200 width=46)
         Hash Cond: (s.productid = p.productid)
         ->  Seq Scan on sales s  (cost=0.00..31.40 rows=2140 width=8)
         ->  Hash  (cost=22.00..22.00 rows=1200 width=38)
               ->  Seq Scan on product p  (cost=0.00..22.00 rows=1200 width=38)

But in reality we could have executed this with a simple merge join using
the PK of product (productid) to provide presorted input. The extra sort
column on p.name is redundant due to productid being unique.

The UNION planning is crying out for help for cases like this. Where it
could provide sorted input from a unique index, provided the union's
targetlist contained all of the unique index's columns, and we also managed
to find an index on the other part of the union on the same set of columns,
then we could perform a Merge Append and a Unique. This would cause a
significant improvement in execution time for these types of queries, as
the current planner does an append/sort/unique, which especially sucks for
plans with a LIMIT clause.

I think this should solve some of the problems that Kyotarosan encountered
in his episode of dancing with indices over here ->
http://www.postgresql.org/message-id/20131031.194310.212107585.horiguchi.kyot...@lab.ntt.co.jp
where he was unable to prove that he could trim down sort nodes once all of
the columns of a unique index had been seen in the order by due to not
knowing if joins were going to duplicate the outer rows.

2.
It should also be possible to reuse a join in situations like:
create view product_sales as select p.productid,sum(s.qty) soldqty from
product p inner join sales s on p.productid = s.productid group by
p.productid;

Where the consumer does:

select ps.*,p.current_price from product_sales ps inner join product p on
ps.productid = p.productid;

Here we'd currently join the product table twice, both times on the same
condition, but we could safely not bother doing that if we knew that the
join could never match more than 1 row on the inner side. Unfortunately I
deal with horrid situations like this daily, where people have used views
from within views, and all the same tables end up getting joined multiple
times :-(

Of course, both 1 and 2 should be considered separately from the attached
patch, this was just to show where it might lead.

Performance of the patch are as follows:


Test case:
create table t1 (id int primary key);
create table t2 (id int primary key);

insert into t1 select x.x from generate_series(1,100) x(x);
insert into t2 select x.x from generate_series(1,100) x(x);

vacuum analyze;

Query:
select count(t2.id) from t1 left outer join t2 on t1

Re: [HACKERS] Using 128-bit integers for sum, avg and statistics aggregates

2014-12-31 Thread David Rowley
On 31 December 2014 at 18:20, Robert Haas  wrote:

> On Fri, Dec 26, 2014 at 7:57 AM, David Rowley 
> wrote:
> > 1. Do we need to keep the 128 byte aggregate state size for machines
> without
> > 128 bit ints? This has been reduced to 48 bytes in the patch, which is in
> > favour code being compiled with a compiler which has 128 bit ints.  I
> kind
> > of think that we need to keep the 128 byte estimates for compilers that
> > don't support int128, but I'd like to hear any counter arguments.
>
> I think you're referring to the estimated state size in pg_aggregate
> here, and I'd say it's probably not a big deal one way or the other.
> Presumably, at some point, 128-bit arithmetic will become common
> enough that we'll want to change that estimate, but I don't know
> whether we've reached that point or not.
>
>
Yeah, that's what I was talking about.
I'm just looking at the code which uses this size estimate
in choose_hashed_grouping(). I'd be a bit worried, given the difference
between 48 and 128, that we'd underestimate the hash size too much and end
up going to disk during hashagg.

I think the patch should be modified so that it uses 128 bytes for the size
estimate on machines that don't support int128, but I'm not quite sure at
this stage if that causes issues for replication: if one machine had support
and one didn't, would it matter if the pg_aggregate row on the replica was
48 bytes instead of 128? Is this worth worrying about?



> > 2. References to int16 meaning 16 bytes. I'm really in two minds about
> this,
> > it's quite nice to keep the natural flow, int4, int8, int16, but I can't
> > help think that this will confuse someone one day. I think it'll be a
> long
> > time before it confused anyone if we called it int128 instead, but I'm
> not
> > that excited about seeing it renamed either. I'd like to hear what others
> > have to say... Is there a chance that some sql standard in the distant
> > future will have HUGEINT and we might regret not getting the internal
> names
> > nailed down?
>
> Yeah, I think using int16 to mean 16-bytes will be confusing to
> someone almost immediately.
>
>
Hmm, I think it should be changed to int128 then.  Pity int4 was selected
as a name instead of int32 back in the day...

I'm going to mark the patch as waiting on author, pending those two changes.

My view with the size estimates change is that if a committer deems it
overkill, then they can rip it out, but at least it's been thought of and
considered rather than forgotten about.

Regards

David Rowley


Re: [HACKERS] Additional role attributes && superuser review

2014-12-31 Thread Stephen Frost
* Magnus Hagander (mag...@hagander.net) wrote:
> On Tue, Dec 30, 2014 at 4:16 PM, Stephen Frost  wrote:
> > The approach I was thinking was to document and implement this as
> > implicitly granting exactly USAGE and SELECT rights, no more (not
> > BYPASSRLS) and no less (yes, the role could execute functions).  I agree
> 
> If the role could execute functions, it's *not* "exactly USAGE and SELECT
> rights", it's now "USAGE and SELECT and EXECUTE" rights... Just to be
> nitpicking, but see below.

We seem to be coming at it from two different directions, so I'll try to
clarify.  What I'm suggesting is that this role attribute be strictly
*additive*.  Any other privileges granted to the role with this role
attribute would still be applicable, including grants made to PUBLIC.

This role attribute would *not* include EXECUTE rights, by that logic.
However, if PUBLIC was granted EXECUTE rights for the function, then
this role could execute that function.

What it sounds like is you're imagining this role attribute to mean the
role has *only* USAGE and SELECT (or COPY or whatever) rights across the
board and that any grants done explicitly to this role or to public
wouldn't be respected.  In my view, that moves this away from a role
*attribute* to being a pre-defined *role*, as such a role would not be
usable for anything else.
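
To illustrate the additive behavior (with DUMP standing in for the
proposed attribute; this is not existing syntax):

    -- role "dumper" has the DUMP attribute: implicit USAGE + SELECT
    GRANT EXECUTE ON FUNCTION pg_xlog_replay_pause() TO dumper;
    -- dumper can now also pause replay, due to the explicit grant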

> > that doing so would be strictly more than what pg_dump actually requires
> > but it's also what we actually have support for in our privilege system.
> 
> Yeah, but would it also be what people would actually *want*?

I can certainly see reasons why you'd want such a role to be able to
execute functions- in particular, consider xlog_pause and xlog_resume.
If this role can't execute those functions then they probably can't
successfully run pg_dump against a replica.

> I think having it do exactly what pg_dump needs, and not things like
> execute functions etc, would be the thing people want for a 'DUMP'
> privilege.

What if we want pg_dump in 9.6 to have an option to execute xlog_pause
and xlog_resume for you?  You wouldn't be able to run that against a 9.5
database (or at least, that option wouldn't work).

> We might *also* want a USAGEANDSELECTANDEXECUTEANDWHATNOTBUTNOBYPASSURL
> (crap, it has to fit within NAMEDATALEN?) privilege, but I think that's a
> different thing.

Our privilege system doesn't currently allow for negative grants (deny
user X the ability to run functions, even if EXECUTE is granted to
PUBLIC).  I'm not against that idea, but I don't see it as the same as
this.

> > it would only give you COPY access. (And also
> > > COPY != SELECT in the way that the rule system applies, I think? And this
> > > one could be for COPY only)
> >
> > COPY certainly does equal SELECT rights..  We haven't got an independent
> > COPY privilege and I don't think it makes sense to invent one.  It
> 
> We don't, but I'm saying it might make sense to implement this. Not as a
> regular privilige, but as a role attribute. I'm not sure it's the right
> thing, but it might actually be interesting to entertain.

We've discussed having a role attribute for COPY-from-filesystem, but
pg_dump doesn't use that ever, it only uses COPY TO STDOUT.  I'm not
a fan of making a COPY_TO_STDOUT-vs-SELECT distinction just for this..

> > sounds like you're suggesting that we add a hack directly into COPY for
> > this privilege, but my thinking is that the right place to change for
> > this is in the privilege system and not hack it into individual
> > commands..  I'm also a bit nervous that we'll end up painting ourselves
> > into a corner if we hack this to mean exactly what pg_dump needs today.
> 
> One point would be that if we define it as "exactly what pg_dump needs",
> that definition can change in a future major version.

Sure, but that then gets a bit ugly because we encourage running the
latest version of pg_dump against the prior-major-version.

> > > I think BYPASSRLS would have to be implicitly granted by the DUMP
> > > privilege. Without that, the DUMP privilege is more or less meaningless
> > > (for anybody who uses RLS - but if they don't use RLS, then having it
> > > include BYPASSRLS makes no difference). Worse than that, you may end up
> > > with a dump that you cannot restore.
> >
> > The dump would fail, as I mentioned before.  I don't see why BYPASSRLS
> > has to be implicitly granted by the DUMP privilege and can absolutely
> > see use-cases for it to *not* be.  Those use-cases would require passing
> > pg_dump the --enable-row-security option, but that's why it's there.
> 
> So you're basically saying that if RLS is in use anywhere, this priviliege
> alone would be useless, correct?

No, it would still be *extremely* useful.  We have an option to pg_dump
to say "please dump with RLS enabled".  What that means is that you'd be
able to dump the entire database for all data you're allowed to see
through RLS policies.  How is that useless?  I certainly think it'd be
very handy and a g

Re: [HACKERS] Using 128-bit integers for sum, avg and statistics aggregates

2014-12-31 Thread Andres Freund
On 2015-01-01 03:00:50 +1300, David Rowley wrote:
> > > 2. References to int16 meaning 16 bytes. I'm really in two minds about
> > this,
> > > it's quite nice to keep the natural flow, int4, int8, int16, but I can't
> > > help think that this will confuse someone one day. I think it'll be a
> > long
> > > time before it confused anyone if we called it int128 instead, but I'm
> > not
> > > that excited about seeing it renamed either. I'd like to hear what others
> > > have to say... Is there a chance that some sql standard in the distant
> > > future will have HUGEINT and we might regret not getting the internal
> > names
> > > nailed down?
> >
> > Yeah, I think using int16 to mean 16-bytes will be confusing to
> > someone almost immediately.
> >
> >
> Hmm, I think it should be changed to int128 then.  Pity int4 was selected
> as a name instead of int32 back in the day...

Note that the C datatype has been int32/int64 for a while now; it's just
the SQL datatypes and the names of their support functions that still use
the int4/int8 convention. Given that, afaiu, we're talking about the C
datatype, it seems pretty clear that it should be named int128.
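
Concretely, I'd expect something along these lines (a sketch, assuming a
configure test sets the macro):

    #ifdef HAVE_INT128
    typedef __int128 int128;           /* C-level name, parallel to int32/int64 */
    typedef unsigned __int128 uint128;
    #endif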

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Parallel Seq Scan

2014-12-31 Thread Thom Brown
On 18 December 2014 at 16:03, Amit Kapila  wrote:

>
>
> On Thu, Dec 18, 2014 at 9:22 PM, Amit Kapila 
> wrote:
> >
> > On Mon, Dec 8, 2014 at 10:40 AM, Amit Kapila 
> wrote:
> > >
> > > On Sat, Dec 6, 2014 at 5:37 PM, Stephen Frost 
> wrote:
> > > >
> > >
> > > So to summarize my understanding, below are the set of things
> > > which I should work on and in the order they are listed.
> > >
> > > 1. Push down qualification
> > > 2. Performance Data
> > > 3. Improve the way to push down the information related to worker.
> > > 4. Dynamic allocation of work for workers.
> > >
> > >
> >
> > I have worked on the patch to accomplish above mentioned points
> > 1, 2 and partly 3 and would like to share the progress with community.
>
> Sorry forgot to attach updated patch in last mail, attaching it now.
>

When attempting to recreate the plan in your example, I get an error:

 ➤ psql://thom@[local]:5488/pgbench

# create table t1(c1 int, c2 char(500)) with (fillfactor=10);
CREATE TABLE
Time: 13.653 ms

 ➤ psql://thom@[local]:5488/pgbench

# insert into t1 values(generate_series(1,100),'amit');
INSERT 0 100
Time: 4.796 ms

 ➤ psql://thom@[local]:5488/pgbench

# explain select c1 from t1;
ERROR:  could not register background process
HINT:  You may need to increase max_worker_processes.
Time: 1.659 ms

 ➤ psql://thom@[local]:5488/pgbench

# show max_worker_processes ;
 max_worker_processes
----------------------
 8
(1 row)

Time: 0.199 ms

# show parallel_seqscan_degree ;
 parallel_seqscan_degree
-------------------------
 10
(1 row)


Do I really need to increase max_worker_processes to >=
parallel_seqscan_degree?  If so, shouldn't there be a hint here along with
the error message pointing this out?  And should the error be produced when
only a *plan* is being requested?

Also, I noticed that where a table is partitioned, the plan isn't
parallelised:

# explain select distinct bid from pgbench_accounts;
                                    QUERY PLAN
----------------------------------------------------------------------------------
 HashAggregate  (cost=1446639.00..1446643.99 rows=499 width=4)
   Group Key: pgbench_accounts.bid
   ->  Append  (cost=0.00..1321639.00 rows=5001 width=4)
         ->  Seq Scan on pgbench_accounts  (cost=0.00..0.00 rows=1 width=4)
         ->  Seq Scan on pgbench_accounts_1  (cost=0.00..4279.00 rows=10 width=4)
         ->  Seq Scan on pgbench_accounts_2  (cost=0.00..2640.00 rows=10 width=4)
         ->  Seq Scan on pgbench_accounts_3  (cost=0.00..2640.00 rows=10 width=4)
         ->  Seq Scan on pgbench_accounts_4  (cost=0.00..2640.00 rows=10 width=4)
         ->  Seq Scan on pgbench_accounts_5  (cost=0.00..2640.00 rows=10 width=4)
         ->  Seq Scan on pgbench_accounts_6  (cost=0.00..2640.00 rows=10 width=4)
         ->  Seq Scan on pgbench_accounts_7  (cost=0.00..2640.00 rows=10 width=4)
...
         ->  Seq Scan on pgbench_accounts_498  (cost=0.00..2640.00 rows=10 width=4)
         ->  Seq Scan on pgbench_accounts_499  (cost=0.00..2640.00 rows=10 width=4)
         ->  Seq Scan on pgbench_accounts_500  (cost=0.00..2640.00 rows=10 width=4)
(504 rows)

Is this expected?

Thom


[HACKERS] Small doc patch about pg_service.conf

2014-12-31 Thread David Fetter
Folks,

There was a slash missing, which I've added.  Where is the default
directory on Windows, or is there one?

Cheers,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index d829a4b..de272c5 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -6952,7 +6952,7 @@ myEventProc(PGEventId evtId, void *evtInfo, void *passThrough)
at ~/.pg_service.conf or the location
specified by the environment variable PGSERVICEFILE,
or it can be a system-wide file
-   at etc/pg_service.conf or in the directory
+   at /etc/pg_service.conf or in the directory
specified by the environment variable
PGSYSCONFDIR.  If service definitions with the same
name exist in the user and the system file, the user file takes



Re: [HACKERS] Failure on markhor with CLOBBER_CACHE_ALWAYS for test brin

2014-12-31 Thread Alvaro Herrera
Alvaro Herrera wrote:
> Michael Paquier wrote:
> > Hi all.
> > 
> > markhor has run for the first time in 8 days, and there is something
> > in range e703261..72dd233 making the regression test of brin crashing.
> > See here:
> > http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=markhor&dt=2014-12-30%2020%3A58%3A49
> 
> This shows that the crash was in the object_address test, not brin.
> Will research.

I can reproduce the crash in a CLOBBER_CACHE_ALWAYS build in
the object_address test.  The backtrace is pretty strange:

#0  0x7f08ce674107 in __GI_raise (sig=sig@entry=6)
at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x7f08ce6754e8 in __GI_abort () at abort.c:89
#2  0x007ac071 in ExceptionalCondition (
conditionName=conditionName@entry=0x800f28 "!(keylen < 64)", 
errorType=errorType@entry=0x7e724f "FailedAssertion", 
fileName=fileName@entry=0x800ef0 
"/pgsql/source/master/src/backend/access/hash/hashfunc.c", 
lineNumber=lineNumber@entry=147)
at /pgsql/source/master/src/backend/utils/error/assert.c:54
#3  0x00494a93 in hashname (fcinfo=fcinfo@entry=0x7fff244324a0)
at /pgsql/source/master/src/backend/access/hash/hashfunc.c:147
#4  0x007b450d in DirectFunctionCall1Coll (func=0x494a50 , 
collation=collation@entry=0, arg1=)
at /pgsql/source/master/src/backend/utils/fmgr/fmgr.c:1027
#5  0x00797aca in CatalogCacheComputeHashValue 
(cache=cache@entry=0x10367d8, 
nkeys=, cur_skey=cur_skey@entry=0x7fff244328e0)
at /pgsql/source/master/src/backend/utils/cache/catcache.c:212
#6  0x00798ff7 in SearchCatCache (cache=0x10367d8, v1=18241016, v2=6, 
v3=11, v4=0)
at /pgsql/source/master/src/backend/utils/cache/catcache.c:1149
#7  0x007a67ae in GetSysCacheOid (cacheId=cacheId@entry=15, 
key1=, 
key2=key2@entry=6, key3=key3@entry=11, key4=key4@entry=0)
at /pgsql/source/master/src/backend/utils/cache/syscache.c:988
#8  0x00504699 in get_collation_oid (name=name@entry=0x11655c0, 
missing_ok=missing_ok@entry=0 '\000')
at /pgsql/source/master/src/backend/catalog/namespace.c:3323
#9  0x0050d8dc in get_object_address 
(objtype=objtype@entry=OBJECT_COLLATION, 
objname=objname@entry=0x11655c0, objargs=objargs@entry=0x0, 
relp=relp@entry=0x7fff24432c28, lockmode=lockmode@entry=1, 
missing_ok=missing_ok@entry=0 '\000')
at /pgsql/source/master/src/backend/catalog/objectaddress.c:704


-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Additional role attributes && superuser review

2014-12-31 Thread Stephen Frost
Amit,

* Amit Kapila (amit.kapil...@gmail.com) wrote:
> On Tue, Dec 30, 2014 at 6:35 PM, Stephen Frost  wrote:
> > I'm pretty sure we've agreed that having a catch-all role attribute like
> > 'DBA' is a bad idea because we'd likely be adding privileges to it later
> > which would expand the rights of users with that attribute, possibly
> > beyond what is intended.
> 
> Yes, that could happen, but do you want to say that is the only reason
> to consider server-level roles (such as DBA) a bad idea?

No, the other reason is that having more granular permissions is better
than catch-all's.  'superuser' is that catch-all today.

> Can't we make
> users aware of such things via documentation, so that there will be
> some onus on users to grant such a privilege with care?

Perhaps, but if we're going to go in that direction, I'd rather have a
hierarchy which is built upon more granular options rather than assuming
we know exactly what the 'DBA' role should have for every environment.

> On looking around, it seems many of the databases provides such
> roles
> https://dbbulletin.wordpress.com/2013/05/29/backup-privileges-needed-to-backup-databases/
> 
> In the link, though they are talking about physical (file level) backup,
> there is mention about such roles in other databases.

Most of this discussion is about non-physical-level-backups.  We have
the REPLICATION role attribute for physical-level-backups and I don't
think anyone wants to get rid of that in favor of pulling it into some
'DBA' role attribute.

> The other way as discussed on this thread is to use something like
> DUMPALL (or DUMPFULL) privilege which also sounds to be a good
> way, apart from the fact that the same privilege can be used for
> non-dump purposes to extract the information from database/cluster.

I think we're probably going to go with DUMPAUTH as the specific role
attribute privilege, to make it clear that it includes authentication
information.  That's really the only distinction between DUMP (or
whatever) and a privilege to support pg_dumpall.

This does make me think that we need to consider if this role attribute
implicitly grants CONNECT to all databases also.

> > > How about PHYSICAL BACKUP (for basebackup) and LOGICAL BACKUP
> > > for pg_dump?
> >
> > Again, this makes it look like the read-all right would be specific to
> > users executing pg_dump, which is incorrect and misleading.  "PHYSICAL"
> > might also imply the ability to do pg_basebackup.
> 
> That's right; however, having unrelated privileges for doing physical
> backup via pg_basebackup and via the pg_start*/pg_stop* method also
> doesn't sound like the best way (it can be slightly difficult for
> users to correlate after reading the docs).

We should definitely segregate the privilege to run start/stop backup
and run pg_basebackup.  One allows you to read the database files off
the filesystem more-or-less directly and the other doesn't.  That's a
big difference.

> Don't you think in this case
> we should have some form of hierarchy for privileges (like a user
> having the privilege to do pg_basebackup also being able to perform the
> backup via the pg_start*/pg_stop* method, or some other way to relate
> both forms of physical backup)?

If we had a specific privilege for pg_basebackup then I would agree that
it would allow calling pg_start/stop_backup.  The same is not true in
reverse though.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Failure on markhor with CLOBBER_CACHE_ALWAYS for test brin

2014-12-31 Thread Andres Freund
On 2014-12-31 10:02:40 -0300, Alvaro Herrera wrote:
> Alvaro Herrera wrote:
> > Michael Paquier wrote:
> > > Hi all.
> > > 
> > > markhor has run for the first time in 8 days, and there is something
> > > in range e703261..72dd233 making the regression test of brin crashing.
> > > See here:
> > > http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=markhor&dt=2014-12-30%2020%3A58%3A49
> > 
> > This shows that the crash was in the object_address test, not brin.
> > Will research.
> 
> I can reproduce the crash in a CLOBBER_CACHE_ALWAYS build in
> the object_address test.  The backtrace is pretty strange:

Hard to say without more detail, but my guess is that the argument to
get_collation_oid() isn't actually valid. For one, that'd explain the
error; for another, the pointer's value (name=name@entry=0x11655c0) is
suspiciously low.

> #8  0x00504699 in get_collation_oid (name=name@entry=0x11655c0, 
> missing_ok=missing_ok@entry=0 '\000')
> at /pgsql/source/master/src/backend/catalog/namespace.c:3323
> #9  0x0050d8dc in get_object_address 
> (objtype=objtype@entry=OBJECT_COLLATION, 
> objname=objname@entry=0x11655c0, objargs=objargs@entry=0x0, 
> relp=relp@entry=0x7fff24432c28, lockmode=lockmode@entry=1, 
> missing_ok=missing_ok@entry=0 '\000')
> at /pgsql/source/master/src/backend/catalog/objectaddress.c:704

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Additional role attributes && superuser review

2014-12-31 Thread Magnus Hagander
On Wed, Dec 31, 2014 at 3:08 PM, Stephen Frost  wrote:

> * Magnus Hagander (mag...@hagander.net) wrote:
> > On Tue, Dec 30, 2014 at 4:16 PM, Stephen Frost 
> wrote:
> > > The approach I was thinking was to document and implement this as
> > > implicitly granting exactly USAGE and SELECT rights, no more (not
> > > BYPASSRLS) and no less (yes, the role could execute functions).  I
> agree
> >
> > If the role could execute functions, it's *not* "exactly USAGE and SELECT
> > rights", it's now "USAGE and SELECT and EXECUTE" rights... Just to be
> > nitpicking, but see below.
>
> We seem to be coming at it from two different directions, so I'll try to
> clarify.  What I'm suggesting is that this role attribute be strictly
> *additive*.  Any other privileges granted to the role with this role
> attribute would still be applicable, including grants made to PUBLIC.
>
> This role attribute would *not* include EXECUTE rights, by that logic.
> However, if PUBLIC was granted EXECUTE rights for the function, then
> this role could execute that function.
>

Ah, ok, misunderstood that part.


> What it sounds like is you're imagining this role attribute to mean the
> role has *only* USAGE and SELECT (or COPY or whatever) rights across the
> board and that any grants done explicitly to this role or to public
> wouldn't be respected.  In my view, that moves this away from a role
> *attribute* to being a pre-defined *role*, as such a role would not be
> usable for anything else.
>

No, I meant additive as well. I misread you.


> > > that doing so would be strictly more than what pg_dump actually requires
> > > but it's also what we actually have support for in our privilege
> > > system.
> >
> > Yeah, but would it also be what people would actually *want*?
>
> I can certainly see reasons why you'd want such a role to be able to
> execute functions - in particular, consider xlog_pause and xlog_resume.

> If this role can't execute those functions then they probably can't
> successfully run pg_dump against a replica.
>

uh, pg_dump doesn't run those commands :P I don't see why that's a
requirement at all.  And you can still always grant those things on top of
whatever the privilege gives you.


> > I think having it do exactly what pg_dump needs, and not things like
> > execute functions etc, would be the thing people want for a 'DUMP'
> > privilege.
>
> What if we want pg_dump in 9.6 to have an option to execute xlog_pause
> and xlog_resume for you?  You wouldn't be able to run that against a 9.5
> database (or at least, that option wouldn't work).
>

It would if you added an explicit grant for it, which would have to be
documented.

I don't see how that's different if the definition is "allows select on all
tables". That wouldn't automatically give it the ability to do xlog_pause
in an old version either.


> > We might *also* want a USAGEANDSELECTANDEXECUTEANDWHATNOTBUTNOBYPASSRLS
> > (crap, it has to fit within NAMEDATALEN?) privilege, but I think that's a
> > different thing.
>
> Our privilege system doesn't currently allow for negative grants (deny
> user X the ability to run functions, even if EXECUTE is granted to
> PUBLIC).  I'm not against that idea, but I don't see it as the same as
> this.
>

Agreed. That's what I said - different thing :)


> > it would only give you COPY access. (And also
> > > > COPY != SELECT in the way that the rule system applies, I think? And
> this
> > > > one could be for COPY only)
> > >
> > > COPY certainly does equal SELECT rights..  We haven't got an
> independent
> > > COPY privilege and I don't think it makes sense to invent one.  It
> >
> > We don't, but I'm saying it might make sense to implement this. Not as a
> > regular privilige, but as a role attribute. I'm not sure it's the right
> > thing, but it might actually be interesting to entertain.
>
> We've discussed having a role attribute for COPY-from-filesystem, but
> pg_dump doesn't use that ever, it only uses COPY TO STDOUT.  I'm not
> a fan of making a COPY_TO_STDOUT-vs-SELECT distinction just for this..
>

Yeah, it's probably going overboard with it, since AFAICT the only thing
that would actually be affected is RULEs on SELECT, which I bet most people
don't use on their tables.


> > sounds like you're suggesting that we add a hack directly into COPY for
> > > this privilege, but my thinking is that the right place to change for
> > > this is in the privilege system and not hack it into individual
> > > commands..  I'm also a bit nervous that we'll end up painting ourselves
> > > into a corner if we hack this to mean exactly what pg_dump needs today.
> >
> > One point would be that if we define it as "exactly what pg_dump needs",
> > that definition can change in a future major version.
>
> Sure, but that then gets a bit ugly because we encourage running the
> latest version of pg_dump against the prior-major-version.
>

But the latest version of pg_dump *knows* how the prior major version
behaved with these things, and can either adapt, or give the user a
message about what they need to do to adapt.

Re: [HACKERS] documentation update for doc/src/sgml/func.sgml

2014-12-31 Thread Fabien COELHO


Here is a slight update so that type names are treated homogeneously 
between both added paragraphs.


ISTM that this patch should be committed without further ado.

--
Fabien.diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 2016c5a..f91033b 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -939,6 +939,26 @@
 

 
+   
+For functions like round(), log() and
+sqrt() which run against either fixed-precision
+(numeric) or floating-point numbers (e.g. real),
+note that the results of these operations will differ according
+to the input type due to rounding. This is most observable with
+round(), which can end up rounding down as well as up for
+any 0.5 value. PostgreSQL's
+handling of floating-point values depends on the operating system, which
+may or may not follow the IEEE floating-point standard.
+  
+
+  
+The bitwise operators work only on integral data types, whereas
+the others are available for all numeric data types. The bitwise
+operators are also available for the bit string types
+bit and bit varying, as
+shown in .
+   
+
   
  shows functions for
 generating random numbers.
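
To see the float-side behaviour the first added paragraph describes,
here is a quick standalone check; a sketch for illustration only,
assuming an IEEE-754 platform in the default round-to-nearest-even mode
(the rint() that, as far as I can tell, PostgreSQL's float8 round()
ends up calling):

	#include <math.h>
	#include <stdio.h>

	int
	main(void)
	{
		/* halves go to the nearest *even* integer: down as well as up */
		printf("rint(0.5) = %g\n", rint(0.5));	/* 0 */
		printf("rint(1.5) = %g\n", rint(1.5));	/* 2 */
		printf("rint(2.5) = %g\n", rint(2.5));	/* 2 */
		printf("rint(3.5) = %g\n", rint(3.5));	/* 4 */
		return 0;
	}

round() on numeric, by contrast, always rounds 0.5 away from zero.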



Re: [HACKERS] Additional role attributes && superuser review

2014-12-31 Thread Stephen Frost
José,

* José Luis Tallón (jltal...@adv-solutions.net) wrote:
> On 12/30/2014 04:16 PM, Stephen Frost wrote:
> >The approach I was thinking was to document and implement this as
> >implicitly granting exactly USAGE and SELECT rights, no more (not
> >BYPASSRLS) and no less (yes, the role could execute functions).  I agree
> >that doing so would be strictly more than what pg_dump actually requires
> >but it's also what we actually have support for in our privilege system.
> 
> Hmmm, coupled with your comments below, I'd say some tweaking of
> the existing privileges is actually needed if we want to add these
> new "capabilities".

Not sure I see how..?  Can you clarify what you think needs to be
changed in the existing privilege system?

> BTW, and since this is getting a bit bigger than originally
> considered: would it be interesting to support some
> extension-defined capabilities, too?
> (for things that can't be easily controlled by the existing USAGE /
> SELECT / ... rights, I mean)

Not entirely following what you mean here either, but to the extent that
it's independent from the current discussion, perhaps it deserves to be
on its own thread?

> >>it would only give you COPY access. (And also
> >>COPY != SELECT in the way that the rule system applies, I think? And this
> >>one could be for COPY only)
> >COPY certainly does equal SELECT rights..  We haven't got an independent
> >COPY privilege and I don't think it makes sense to invent one.
> 
> FWIW, a COPY(DUMP?) privilege different from SELECT would make sense.
> Considering your below comments it would be better that it not imply
> BYPASSRLS, though.

How would a COPY-to-STDOUT privilege be different from SELECT?

> >Lastly, I've been considering other use-cases for this privilege beyond
> >the pg_dump one (thinking about auditing, for example).
> 
> ISTR there was something upthread on an AUDIT role, right?
> This might be the moment to add it

One of the challenges to adding such a role is defining what it means.
What privileges do you think such a role would have?  I can see that
perhaps it would include read-only access to everything, but I'd want it
to also have read access to the logs..

> >>That would be "EXCLUSIVEBACKUP" or something like that, to be consistent
> >>with existing terminology though.
> >Ok.  I agree that working out the specific naming is difficult and would
> >like to focus on that, but we probably need to agree on the specific
> >capabilities first. :)
> 
> Yes :)
> For the record, LOGBACKUP vs PHYSBACKUP was suggested a couple days
> ago. I'd favor DUMP(logical) and BACKUP(physical) --- for lack of a
> better name.

I think I'm coming around to the EXCLUSIVEBACKUP option, per the
discussion with Magnus.  I don't particularly like LOGBACKUP and don't
think it really makes sense, while PHYSBACKUP seems like it'd cover what
REPLICATION already does.

> The above reasoning would have pg_basebackup requiring REPLICATION,
> which is a logical consequence of the implementation but strikes me
> as a bit surprising in the light of these other privileges.

I see what you mean that it's a bit strange for pg_basebackup to require
the REPLICATION role attribute, but, well, that's already the case, no?
Don't think we're going to change that..

> ARCHIVE, though completely appropriate for the "exclusivebackup"
> case above (so this would become DUMP/BASEBACKUP/ARCHIVE
> +REPLICATION), might end up causing quite some confusion ... ("what?
> WAL archiving is NOT granted by the "archive" privilege, but
> requires a superuser to turn it on (via ALTER SYSTEM)?")

Yeah, Magnus didn't like ARCHIVE either and I understand his reasoning.

> >pg_xlog_replay_pause
> >pg_xlog_replay_resume
> >
> >Though I'd be find if the xlog_replay ones were their own privilege (eg:
> >REPLAY or something).
> +1

Yeah, Magnus was also agreeing that they should be independent.

> >>I think just calling them "xlog related functions" is doing us a disservice
> >>there. Definitely once we have an actual documentation to write for it, but
> >>also in this discussion.
> >[snip]
> >>If it's for replicas, then why are we not using the REPLICATION privilege
> >>which is extremely similar to this?
> >I don't actually see REPLICATION as being the same.
> 
> REPLICATION (ability to replicate) vs REPLICAOPERATOR (ability to
> control *the replica* or the R/O behaviour of the server, for that
> matter...)

Right, I think Magnus and I clarified what was meant there.

> >Yes.  My thinking on how to approach this was to forcibly set all of
> >their transactions to read-only.
> 
> So "READONLY" ?

Right.

> >Agreed.  That's actually something that I think would be *really* nice-
> >an option to dump the necessary globals for whatever database you're
> >running pg_dump against.  We have existing problems in this area
> >(database-level GUCs aren't considered database-specific and therefore
> >aren't picked up by pg_dump, for example..), but being able to also dump
> >the roles whi

Re: [HACKERS] Failure on markhor with CLOBBER_CACHE_ALWAYS for test brin

2014-12-31 Thread Tom Lane
Andres Freund  writes:
> On 2014-12-31 10:02:40 -0300, Alvaro Herrera wrote:
>> I can reproduce the crash in a CLOBBER_CACHE_ALWAYS build in
>> the object_address test.  The backtrace is pretty strange:

> Hard to say without more detail, but my guess is that the argument to
> get_collation_oid() isn't actually valid. For one, that'd explain the
> error, for another, the pointer's value (name=name@entry=0x11655c0) is
> suspiciously low.

Given that CLOBBER_CACHE_ALWAYS seems to make it fail reliably, the
obvious explanation is that what's being passed is a pointer into
catcache or relcache storage that isn't guaranteed to be valid for
long enough.  The given backtrace doesn't go down far enough to show
where the bogus input came from, but I'm betting that something is
returning to SQL a string it got from cache without pstrdup'ing it.
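
For onlookers, the bug pattern looks roughly like this (a sketch, not
the actual faulty call site; CLOBBER_CACHE_ALWAYS merely forces the
cache flush to happen immediately, which is why it fails reliably
there):

	HeapTuple	tup;
	char	   *name;

	tup = SearchSysCache1(COLLOID, ObjectIdGetDatum(colloid));
	if (!HeapTupleIsValid(tup))
		elog(ERROR, "cache lookup failed for collation %u", colloid);

	/* WRONG: points into catcache storage, gone after the next flush */
	name = NameStr(((Form_pg_collation) GETSTRUCT(tup))->collname);

	/* RIGHT: take a private palloc'd copy before releasing the entry */
	name = pstrdup(NameStr(((Form_pg_collation) GETSTRUCT(tup))->collname));

	ReleaseSysCache(tup);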

regards, tom lane




Re: [HACKERS] add modulo (%) operator to pgbench

2014-12-31 Thread Fabien COELHO



I'm not sure what Makefile changes could be necessary for
various environments; it looks OK to me as it is.


Found one. EXTRA_CLEAN is needed for generated C files. Added to this v3.

--
Fabien.diff --git a/contrib/pgbench/.gitignore b/contrib/pgbench/.gitignore
index 489a2d6..aae819e 100644
--- a/contrib/pgbench/.gitignore
+++ b/contrib/pgbench/.gitignore
@@ -1 +1,3 @@
+/exprparse.c
+/exprscan.c
 /pgbench
diff --git a/contrib/pgbench/Makefile b/contrib/pgbench/Makefile
index b8e2fc8..2d4033b 100644
--- a/contrib/pgbench/Makefile
+++ b/contrib/pgbench/Makefile
@@ -4,7 +4,9 @@ PGFILEDESC = "pgbench - a simple program for running benchmark tests"
 PGAPPICON = win32
 
 PROGRAM = pgbench
-OBJS	= pgbench.o $(WIN32RES)
+OBJS	= pgbench.o exprparse.o $(WIN32RES)
+
+EXTRA_CLEAN	= exprparse.c exprscan.c
 
 PG_CPPFLAGS = -I$(libpq_srcdir)
 PG_LIBS = $(libpq_pgport) $(PTHREAD_LIBS)
@@ -18,8 +20,22 @@ subdir = contrib/pgbench
 top_builddir = ../..
 include $(top_builddir)/src/Makefile.global
 include $(top_srcdir)/contrib/contrib-global.mk
+
+distprep: exprparse.c exprscan.c
 endif
 
 ifneq ($(PORTNAME), win32)
 override CFLAGS += $(PTHREAD_CFLAGS)
 endif
+
+# There is no correct way to write a rule that generates two files.
+# Rules with two targets don't have that meaning, they are merely
+# shorthand for two otherwise separate rules.  To be safe for parallel
+# make, we must chain the dependencies like this.  The semicolon is
+# important, otherwise make will choose the built-in rule for
+# gram.y=>gram.c.
+
+exprparse.h: exprparse.c ;
+
+# exprscan is compiled as part of exprparse
+exprparse.o: exprscan.c
diff --git a/contrib/pgbench/exprparse.y b/contrib/pgbench/exprparse.y
new file mode 100644
index 000..243c6b9
--- /dev/null
+++ b/contrib/pgbench/exprparse.y
@@ -0,0 +1,96 @@
+%{
+/*-
+ *
+ * exprparse.y
+ *	  bison grammar for a simple expression syntax
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *-
+ */
+
+#include "postgres_fe.h"
+
+#include "pgbench.h"
+
+PgBenchExpr *expr_parse_result;
+
+static PgBenchExpr *make_integer_constant(int64 ival);
+static PgBenchExpr *make_variable(char *varname);
+static PgBenchExpr *make_op(char operator, PgBenchExpr *lexpr,
+		PgBenchExpr *rexpr);
+
+%}
+
+%expect 0
+%name-prefix="expr_yy"
+
+%union
+{
+	int64		ival;
+	char	   *str;
+	PgBenchExpr *expr;
+}
+
+%type <expr> expr
+%type <ival> INTEGER
+%type <str> VARIABLE
+%token INTEGER VARIABLE
+%token CHAR_ERROR /* never used, will raise a syntax error */
+
+%left	'+' '-'
+%left	'*' '/' '%'
+%right	UMINUS
+
+%%
+
+result: expr			{ expr_parse_result = $1; }
+
+expr: '(' expr ')'			{ $$ = $2; }
+	| '+' expr %prec UMINUS	{ $$ = $2; }
+	| '-' expr %prec UMINUS	{ $$ = make_op('-', make_integer_constant(0), $2); }
+	| expr '+' expr			{ $$ = make_op('+', $1, $3); }
+	| expr '-' expr			{ $$ = make_op('-', $1, $3); }
+	| expr '*' expr			{ $$ = make_op('*', $1, $3); }
+	| expr '/' expr			{ $$ = make_op('/', $1, $3); }
+	| expr '%' expr			{ $$ = make_op('%', $1, $3); }
+	| INTEGER				{ $$ = make_integer_constant($1); }
+	| VARIABLE				{ $$ = make_variable($1); }
+	;
+
+%%
+
+static PgBenchExpr *
+make_integer_constant(int64 ival)
+{
+	PgBenchExpr *expr = pg_malloc(sizeof(PgBenchExpr));
+
+	expr->etype = ENODE_INTEGER_CONSTANT;
+	expr->u.integer_constant.ival = ival;
+	return expr;
+}
+
+static PgBenchExpr *
+make_variable(char *varname)
+{
+	PgBenchExpr *expr = pg_malloc(sizeof(PgBenchExpr));
+
+	expr->etype = ENODE_VARIABLE;
+	expr->u.variable.varname = varname;
+	return expr;
+}
+
+static PgBenchExpr *
+make_op(char operator, PgBenchExpr *lexpr, PgBenchExpr *rexpr)
+{
+	PgBenchExpr *expr = pg_malloc(sizeof(PgBenchExpr));
+
+	expr->etype = ENODE_OPERATOR;
+	expr->u.operator.operator = operator;
+	expr->u.operator.lexpr = lexpr;
+	expr->u.operator.rexpr = rexpr;
+	return expr;
+}
+
+#include "exprscan.c"
diff --git a/contrib/pgbench/exprscan.l b/contrib/pgbench/exprscan.l
new file mode 100644
index 000..4c9229c
--- /dev/null
+++ b/contrib/pgbench/exprscan.l
@@ -0,0 +1,105 @@
+%{
+/*-
+ *
+ * exprscan.l
+ *	  a lexical scanner for a simple expression syntax
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *-
+ */
+
+/* line and column number for error reporting */
+static int	yyline = 0, yycol = 0;
+
+/* Handles to the buffer that the lexer uses internally */
+static YY_BUFFER_STATE scanbufhandle;
+static char *scanbuf;
+static int	scanbuflen;
+
+/* flex 2.5.4 doesn't bother with a decl for this */
+int expr_yylex(void);
+
+%}
+
+%opti

Re: [HACKERS] [GENERAL] ON_ERROR_ROLLBACK

2014-12-31 Thread Tom Lane
Andrew Dunstan  writes:
> On 12/30/2014 09:20 AM, Tom Lane wrote:
>> In one light this is certainly a bug fix, but in another it's just
>> definitional instability.
>> 
>> If we'd gotten a field bug report we might well have chosen to back-patch,
>> though, and perhaps your client's complaint counts as that.

> I got caught by this with ON_ERROR_ROLLBACK on 9.3 just this afternoon 
> before remembering this thread. So there's a field report :-)

> +0.75 for backpatching (It's hard to imagine someone relying on the bad 
> behaviour, but you never know).

It seems like there's a consensus in favor of back-patching this change,
so I'll go ahead and do that.

regards, tom lane




Re: [HACKERS] Additional role attributes && superuser review

2014-12-31 Thread Stephen Frost
* Magnus Hagander (mag...@hagander.net) wrote:
> On Wed, Dec 31, 2014 at 3:08 PM, Stephen Frost  wrote:
> > * Magnus Hagander (mag...@hagander.net) wrote:
> > > > that doing so would be strictly more than what pg_dump actually requires
> > > > but it's also what we actually have support for in our privilege
> > > > system.
> > >
> > > Yeah, but would it also be what people would actually *want*?
> >
> > I can certainly see reasons why you'd want such a role to be able to
> > execute functions - in particular, consider xlog_pause and xlog_resume.
> 
> > If this role can't execute those functions then they probably can't
> > successfully run pg_dump against a replica.
> 
> uh, pg_dump doesn't run those commands :P I don't see why that's a
> requirement at all.  And you can still always grant those things on top of
> whatever the privilege gives you.

Ok, so, first off, this is all about the discussion around "is this
additive or not"..  Since we just agreed that it's additive, I'm not
sure I see the need to discuss the EXECUTE privileges..

Having said that though..

If you can't pause/resume then you can end up with your pg_dump
transaction getting killed.  I'm aware of folks who already do this
today, by hand with shell scripts..  I agree that pg_dump doesn't do it,
but I do think it'd be nice to have and I can certainly see the use-case
for them..

> > > I think having it do exactly what pg_dump needs, and not things like
> > > execute functions etc, would be the thing people want for a 'DUMP'
> > > privilege.
> >
> > What if we want pg_dump in 9.6 to have an option to execute xlog_pause
> > and xlog_resume for you?  You wouldn't be able to run that against a 9.5
> > database (or at least, that option wouldn't work).
> 
> It would if you added an explicit grant for it, which would have to be
> documented.

Huh?  An explicit grant for xlog_pause/xlog_resume won't work as we
check role attribute rights inside the function..

> I don't see how that's different if the definition is "allows select on all
> tables". That wouldn't automatically give it the ability to do xlog_pause
> in an old version either.

Ok, that I agree with- you don't automatically get xlog_pause rights if
you have the 'select on all tables' right.

> > We've discussed having a role attribute for COPY-from-filesystem, but
> > pg_dump doesn't use that ever, it only uses COPY TO STDOUT.  I'm not
> > a fan of making a COPY_TO_STDOUT-vs-SELECT distinction just for this..
> 
> Yeah, it's probably going overboard with it, since AFAICT the only thing
> that would actually be affected is RULEs on SELECT, which I bet most people
> don't use on their tables.

Well, we could make SELECT not work, but if you've got COPY then you can
still get all the data, so, yeah, not much different.  I seriously doubt
many people are using rules..

> > > One point would be that if we define it as "exactly what pg_dump needs",
> > > that definition can change in a future major version.
> >
> > Sure, but that then gets a bit ugly because we encourage running the
> > latest version of pg_dump against the prior-major-version.
> 
> But the latest version of pg_dump *knows* how the prior major version
> behaved with these things, and can either adapt, or give the user a message
> about what they need to do to adapt.

Yes, that's true.

> > No, it would still be *extremely* useful.  We have an option to pg_dump
> > to say "please dump with RLS enabled".  What that means is that you'd be
> > able to dump the entire database for all data you're allowed to see
> > through RLS policies.  How is that useless?  I certainly think it'd be
> > very handy and a good way to get *segregated* dumps according to policy.
> 
> Hmm. Yeah, I guess - my mindset was in the "database backup" mode, where a
> "silently partial" backup is a really scary thing rather than a feature :)

Agreed, we don't ever want that.

> I guess as long as you still dump all the parts, RLS itself ensures that
> foreign keys etc will actually be valid upon a reload?

If you set up your policies correctly, yes.  RLS is flexible enough that
you could create policies which fail, but you have the same problem
today to some extent (consider tables you don't have SELECT rights on).

> > > We could/should also throw a WARNING if DUMP Is granted to a role without
> > > BYPASSRLS in case row level security is enabled in the system, I think.
> > But
> > > that's more of an implementation detail.
> >
> > That's a bit ugly and RLS could be added to a relation after the DUMP
> > privilege is granted.
> 
> Yes, it's not going to be all-covering, but it can still be a useful
> hint/warning in the cases where it *does* that. We obviously still need
> pg_dump to give the error in both scenarios.

I'm not against doing it, personally, but I suspect others won't like it
(or at least, that's been the case in the past with other things..).

> What I think this part of the discussion is getting at is that there's a
> > lot of people who view p

Re: [HACKERS] pg_basebackup -x/X doesn't play well with archive_mode & wal_keep_segments

2014-12-31 Thread Andres Freund
Hi,

On 2014-12-05 16:18:02 +0900, Fujii Masao wrote:
> On Fri, Dec 5, 2014 at 9:28 AM, Andres Freund  wrote:
> > So I think we just need to make pg_basebackup create the .ready
> > files.
> 
> s/.ready/.done? If yes, +1.

That unfortunately requires changes to both backend and pg_basebackup to
support fetch and stream modes respectively.

I've attached a preliminary patch for this. I'd appreciate feedback. I
plan to commit it in a couple of days, after some more
testing/rereading.

> > Given that the walreceiver and restore_command already
> > unconditionally do XLogArchiveForceDone() I think we'd follow the
> > established precedent. Arguably it could make sense to archive files
> > again on the standby after a promotion as they aren't guaranteed to have
> > been on the then primary. But we don't have any infrastructure anyway
> > for that and walsender doesn't do so, so it doesn't seem to make any
> > sense to do that for pg_basebackup.
> >
> > Independent from this bug, there's also some debatable behaviour about
> > what happens if a node with a high wal_keep_segments turns on
> > archive_mode. Suddenly all those old files are archived... I think it
> > might be a good idea to simply always create .done files when
> > archive_mode is disabled while a wal segment is finished.
> 
> +1

I tend to think that's a master-only change. Agreed?
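
For that archive_mode-disabled case, the sketch would be to
short-circuit segment completion roughly like this
(XLogArchiveForceDone() and XLogArchiveNotify() are the existing
helpers; the exact placement is what the patch still has to work out):

	/* finished a segment: never create .ready if archiving is off */
	if (XLogArchivingActive())
		XLogArchiveNotify(xlogfname);
	else
		XLogArchiveForceDone(xlogfname);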

Greetings,

Andres Freund
>From 3db116bc5b9465f555957bb11ac6cb8b20c18405 Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Wed, 31 Dec 2014 14:50:57 +0100
Subject: [PATCH 1/2] Add pg_string_endswith as the start of a string helper
 library in src/common.

Backpatch to 9.3, where src/common was introduced, because a bugfix that
needs to be backpatched requires the function.  Earlier branches will
have to duplicate the code.
---
 src/backend/replication/slot.c | 21 ++---
 src/common/Makefile|  2 +-
 src/common/string.c| 43 ++
 src/include/common/string.h| 15 +++
 src/tools/msvc/Mkvcbuild.pm|  2 +-
 5 files changed, 62 insertions(+), 21 deletions(-)
 create mode 100644 src/common/string.c
 create mode 100644 src/include/common/string.h

diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 937b669..698ca6b 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -40,6 +40,7 @@
 #include 
 
 #include "access/transam.h"
+#include "common/string.h"
 #include "miscadmin.h"
 #include "replication/slot.h"
 #include "storage/fd.h"
@@ -780,24 +781,6 @@ CheckSlotRequirements(void)
 }
 
 /*
- * Returns whether the string `str' has the postfix `end'.
- */
-static bool
-string_endswith(const char *str, const char *end)
-{
-	size_t		slen = strlen(str);
-	size_t		elen = strlen(end);
-
-	/* can't be a postfix if longer */
-	if (elen > slen)
-		return false;
-
-	/* compare the end of the strings */
-	str += slen - elen;
-	return strcmp(str, end) == 0;
-}
-
-/*
  * Flush all replication slots to disk.
  *
  * This needn't actually be part of a checkpoint, but it's a convenient
@@ -864,7 +847,7 @@ StartupReplicationSlots(void)
 			continue;
 
 		/* we crashed while a slot was being setup or deleted, clean up */
-		if (string_endswith(replication_de->d_name, ".tmp"))
+		if (pg_string_endswith(replication_de->d_name, ".tmp"))
 		{
 			if (!rmtree(path, true))
 			{
diff --git a/src/common/Makefile b/src/common/Makefile
index 7edbaaa..e5c345d 100644
--- a/src/common/Makefile
+++ b/src/common/Makefile
@@ -23,7 +23,7 @@ include $(top_builddir)/src/Makefile.global
 override CPPFLAGS := -DFRONTEND $(CPPFLAGS)
 LIBS += $(PTHREAD_LIBS)
 
-OBJS_COMMON = exec.o pgfnames.o psprintf.o relpath.o rmtree.o username.o wait_error.o
+OBJS_COMMON = exec.o pgfnames.o psprintf.o relpath.o rmtree.o string.o username.o wait_error.o
 
 OBJS_FRONTEND = $(OBJS_COMMON) fe_memutils.o
 
diff --git a/src/common/string.c b/src/common/string.c
new file mode 100644
index 000..07a2aaf
--- /dev/null
+++ b/src/common/string.c
@@ -0,0 +1,43 @@
+/*-
+ *
+ * string.c
+ *		string handling helpers
+ *
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *	  src/common/string.c
+ *
+ *-
+ */
+
+
+#ifndef FRONTEND
+#include "postgres.h"
+#else
+#include "postgres_fe.h"
+#endif
+
+#include "common/string.h"
+
+
+/*
+ * Returns whether the string `str' has the postfix `end'.
+ */
+bool
+pg_string_endswith(const char *str, const char *end)
+{
+	size_t		slen = strlen(str);
+	size_t		elen = strlen(end);
+
+	/* can't be a postfix if longer */
+	if (elen > slen)
+		return false;
+
+	/* compare the end of the strings */
+	str += slen - elen;
+	return strcmp(str, end) == 0;
+}
diff --git a/src/include/common/string.h b/

Re: [HACKERS] psql tab completion: fix COMMENT ON ... IS IS IS

2014-12-31 Thread Robert Haas
On Sun, Dec 28, 2014 at 7:44 PM, Ian Barwick  wrote:
> Currently tab completion for 'COMMENT ON {object} foo IS' will result in the
> 'IS'
> being duplicated up to two times; not a world-shattering issue I know, but
> the
> fix is trivial and I stumble over it often enough for it to mildly annoy
> me.
> Patch attached.

I've noticed that in the past, too.  Committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] The return value of allocate_recordbuf()

2014-12-31 Thread Robert Haas
On Mon, Dec 29, 2014 at 6:14 AM, Heikki Linnakangas
 wrote:
> Hmm. There is no way to check beforehand if a palloc() will fail because of
> OOM. We could check for MaxAllocSize, though.

I think we need a version of palloc that returns NULL instead of
throwing an error.  The error-throwing behavior is for the best in
almost every case, but I think the no-error version would find enough
users to be worthwhile.
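
Something like the following is what I have in mind; the name and
signature are entirely hypothetical (no such function exists today),
but it would let allocate_recordbuf() degrade gracefully:

	extern void *palloc_noerror(Size size);	/* returns NULL on OOM */

	newbuf = (char *) palloc_noerror(newSize);
	if (newbuf == NULL)
	{
		/* report "out of memory" through the normal xlogreader
		 * error path and let the caller decide what to do */
		return false;
	}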

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Parallel Seq Scan

2014-12-31 Thread Thom Brown
On 31 December 2014 at 14:20, Thom Brown  wrote:

> On 18 December 2014 at 16:03, Amit Kapila  wrote:
>
>>
>>
>> On Thu, Dec 18, 2014 at 9:22 PM, Amit Kapila 
>> wrote:
>> >
>> > On Mon, Dec 8, 2014 at 10:40 AM, Amit Kapila 
>> wrote:
>> > >
>> > > On Sat, Dec 6, 2014 at 5:37 PM, Stephen Frost 
>> wrote:
>> > > >
>> > >
>> > > So to summarize my understanding, below are the set of things
>> > > which I should work on and in the order they are listed.
>> > >
>> > > 1. Push down qualification
>> > > 2. Performance Data
>> > > 3. Improve the way to push down the information related to worker.
>> > > 4. Dynamic allocation of work for workers.
>> > >
>> > >
>> >
>> > I have worked on the patch to accomplish above mentioned points
>> > 1, 2 and partly 3 and would like to share the progress with community.
>>
>> Sorry forgot to attach updated patch in last mail, attaching it now.
>>
>
> When attempting to recreate the plan in your example, I get an error:
>
>  ➤ psql://thom@[local]:5488/pgbench
>
> # create table t1(c1 int, c2 char(500)) with (fillfactor=10);
> CREATE TABLE
> Time: 13.653 ms
>
>  ➤ psql://thom@[local]:5488/pgbench
>
> # insert into t1 values(generate_series(1,100),'amit');
> INSERT 0 100
> Time: 4.796 ms
>
>  ➤ psql://thom@[local]:5488/pgbench
>
> # explain select c1 from t1;
> ERROR:  could not register background process
> HINT:  You may need to increase max_worker_processes.
> Time: 1.659 ms
>
>  ➤ psql://thom@[local]:5488/pgbench
>
> # show max_worker_processes ;
>  max_worker_processes
> --
>  8
> (1 row)
>
> Time: 0.199 ms
>
> # show parallel_seqscan_degree ;
>  parallel_seqscan_degree
> -
>  10
> (1 row)
>
>
> Should I really need to increase max_worker_processes to >=
> parallel_seqscan_degree?  If so, shouldn't there be a hint here along with
> the error message pointing this out?  And should the error be produced when
> only a *plan* is being requested?
>
> Also, I noticed that where a table is partitioned, the plan isn't
> parallelised:
>
> # explain select distinct bid from pgbench_accounts;
>
>                                  QUERY PLAN
> --------------------------------------------------------------------------------
>  HashAggregate  (cost=1446639.00..1446643.99 rows=499 width=4)
>    Group Key: pgbench_accounts.bid
>    ->  Append  (cost=0.00..1321639.00 rows=5001 width=4)
>          ->  Seq Scan on pgbench_accounts  (cost=0.00..0.00 rows=1 width=4)
>          ->  Seq Scan on pgbench_accounts_1  (cost=0.00..4279.00 rows=10 width=4)
>          ->  Seq Scan on pgbench_accounts_2  (cost=0.00..2640.00 rows=10 width=4)
>          ->  Seq Scan on pgbench_accounts_3  (cost=0.00..2640.00 rows=10 width=4)
>          ->  Seq Scan on pgbench_accounts_4  (cost=0.00..2640.00 rows=10 width=4)
>          ->  Seq Scan on pgbench_accounts_5  (cost=0.00..2640.00 rows=10 width=4)
>          ->  Seq Scan on pgbench_accounts_6  (cost=0.00..2640.00 rows=10 width=4)
>          ->  Seq Scan on pgbench_accounts_7  (cost=0.00..2640.00 rows=10 width=4)
> ...
>          ->  Seq Scan on pgbench_accounts_498  (cost=0.00..2640.00 rows=10 width=4)
>          ->  Seq Scan on pgbench_accounts_499  (cost=0.00..2640.00 rows=10 width=4)
>          ->  Seq Scan on pgbench_accounts_500  (cost=0.00..2640.00 rows=10 width=4)
> (504 rows)
>
> Is this expected?
>

Another issue (FYI, pgbench2 initialised with: pgbench -i -s 100 -F 10
pgbench2):

 ➤ psql://thom@[local]:5488/pgbench2

# explain select distinct bid from pgbench_accounts;
                                 QUERY PLAN
---------------------------------------------------------------------------------
 HashAggregate  (cost=245833.38..245834.38 rows=100 width=4)
   Group Key: bid
   ->  Parallel Seq Scan on pgbench_accounts  (cost=0.00..220833.38 rows=1000 width=4)
         Number of Workers: 8
         Number of Blocks Per Workers: 208333
(5 rows)

Time: 7.476 ms

 ➤ psql://thom@[local]:5488/pgbench2

# explain (analyse, buffers, verbose) select distinct bid from
pgbench_accounts;
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
Time: 14897.991 ms

The logs say:

2014-12-31 15:21:42 GMT [9164]: [240-1] user=,db=,client= LOG:  registering background worker "backend_worker"
2014-12-31 15:21:42 GMT [9164]: [241-1] user=,db=,client= LOG:  registering background worker "backend_worker"
2014-12-31 15:21:42 GMT [9164]: [242-1] user=,db=,client= LOG:  registering background worker "backend_worker"
2014-12-31 15:21:42 GMT [9164]: [243-1] user=,db=,client= LOG:  registering background worker "backend_worker"
2014-12-31 15:21:42 GMT [9164]: [244-1] user=,db=,client= LOG:  registering background worker "backend_worker"
2014-12-31 15:21:42 GMT [9164]: [245-1] user=,db=,client

Re: [HACKERS] Publish autovacuum informations

2014-12-31 Thread Robert Haas
On Mon, Dec 29, 2014 at 11:03 AM, Tom Lane  wrote:
> Either one of those approaches would cripple our freedom to change those
> data structures; which we've done repeatedly in the past and will surely
> want to do again.  So I'm pretty much -1 on exposing them.

We could instead add a view of this information to core --
pg_stat_autovacuum, or whatever.

But to be honest, I'm more in favor of Guillaume's proposal.  I will
repeat my recent assertion that we -- you in particular -- are too
reluctant to expose internal data structures to authors of C
extensions, and that this is developer-hostile.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Failure on markhor with CLOBBER_CACHE_ALWAYS for test brin

2014-12-31 Thread Alvaro Herrera
Tom Lane wrote:

> Given that CLOBBER_CACHE_ALWAYS seems to make it fail reliably, the
> obvious explanation is that what's being passed is a pointer into
> catcache or relcache storage that isn't guaranteed to be valid for
> long enough.  The given backtrace doesn't go down far enough to show
> where the bogus input came from, but I'm betting that something is
> returning to SQL a string it got from cache without pstrdup'ing it.

Yep, that was it -- the bug was in getObjectIdentityParts.  I noticed
three other cases of missing pstrdup(), also fixed.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Initdb-cs_CZ.WIN-1250 buildfarm failures

2014-12-31 Thread Andrew Dunstan


On 12/31/2014 07:49 AM, Tomas Vondra wrote:

On 28.12.2014 00:46, Noah Misch wrote:

On Tue, Dec 23, 2014 at 03:32:59PM +0100, Tomas Vondra wrote:

On 23.12.2014 15:21, Andrew Dunstan wrote:

No, config_opts is what's passed to configure. Try something like:

 if ($branch eq 'REL9_0_STABLE')
 {
     $skip_steps{'pl-install-check'} = 1;
 }

Applied to all three animals.

As of the time of this change, the animals ceased to report in.

The strange thing is that while the first run worked fine (as
illustrated by the results submitted to pgbuildfarm.org), all the
following runs fail like this:

Global symbol "%skip_steps" requires explicit package name at
build-farm.conf line 308.
Compilation failed in require at ./run_branches.pl line 55.

Apparently, something is broken, but my Perl-fu is limited so I have no
idea what/why :-(





Sorry, I should have tested it. This seems to work:

   if ($branch eq 'REL9_0_STABLE')
   {
       $PGBuild::Options::skip_steps .= ' pl-install-check';
       $main::skip_steps{'pl-install-check'} = 1;
   }

cheers

andrew






Re: [HACKERS] Using 128-bit integers for sum, avg and statistics aggregates

2014-12-31 Thread Robert Haas
On Wed, Dec 31, 2014 at 9:00 AM, David Rowley  wrote:
> Yeah, that's what I was talking about.
> I'm just looking at the code which uses this size estimate in
> choose_hashed_grouping().  I'd be a bit worried, given the difference
> between 48 and 128, that we'd underestimate the hash size too much and
> end up going to disk during hashagg.

That's an issue, but the problem is that...

> I think the patch should be modified so that it uses 128 bytes for the size
> estimate on machines that don't support int128, but I'm not quite sure at
> this stage if that causes issues for replication, if 1 machine had support
> and one didn't, would it matter if the pg_aggregate row on the replica was
> 48 bytes instead of 128? Is this worth worrying about?

...if we do this, then yes, there is an incompatibility between
binaries compiled with this option and binaries compiled without it.
They generate different system catalog contents after initdb and we
shouldn't use one set of binaries with an initdb done the other way.
That seems like serious overkill, especially since this could not
inconceivably flip between one recompile and the next if the
administrator has run 'yum update' in the meantime.  So I'd argue for
picking one estimate and using it across the board, and living with
the fact that it's sometimes going to be wrong.  Such is the lot in
life of an estimate.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Final Patch for GROUPING SETS

2014-12-31 Thread Andrew Gierth
> "Noah" == Noah Misch  writes:

 Noah> Suppose one node orchestrated all sorting and aggregation.

Well, that has the downside of making it into an opaque blob, without
actually gaining much.

 Noah> Call it a MultiGroupAggregate for now.  It wouldn't harness
 Noah> Sort nodes, because it performs aggregation between
 Noah> tuplesort_puttupleslot() calls.  Instead, it would directly
 Noah> manage two Tuplesortstate, CUR and NEXT.  The node would have
 Noah> an initial phase similar to ExecSort(), in which it drains the
 Noah> outer node to populate the first CUR.  After that, it looks
 Noah> more like agg_retrieve_direct(),

agg_retrieve_direct is already complex enough, and this would be
substantially more so, as compared to agg_retrieve_chained which is
substantially simpler.

A more serious objection is that this forecloses (or at least makes
much more complex) the future possibility of doing some grouping sets
by sorting and others by hashing. The chained approach specifically
allows for the future possibility of using a HashAggregate as the
chain head, so that for example cube(a,b) can be implemented as a
sorted agg for (a,b) and (a) and a hashed agg for (b) and (), allowing
it to be done with one sort even if the result size for (a,b) is too
big to hash.

 Noah> Tom, Andrew, does that look satisfactory?

Not to me.

-- 
Andrew (irc:RhodiumToad)




Re: [HACKERS] Initdb-cs_CZ.WIN-1250 buildfarm failures

2014-12-31 Thread Tomas Vondra
On 31.12.2014 17:29, Andrew Dunstan wrote:
> 
> Sorry, I should have tested it. This seems to work:
> 
>if ($branch eq 'REL9_0_STABLE')
>{
> $PGBuild::Options::skip_steps .= ' pl-install-check';
> $main::skip_steps{'pl-install-check'} = 1;
>}
> 
> cheers

Meh, no problem. We've fixed it in 2014, so it's OK. To quote one of my
colleagues "I haven't tested it, I believe it works correctly." ;-)

Any ideas why it worked for the first run?

regards
Tomas




Re: [HACKERS] Initdb-cs_CZ.WIN-1250 buildfarm failures

2014-12-31 Thread Andrew Dunstan


On 12/31/2014 12:26 PM, Tomas Vondra wrote:

On 31.12.2014 17:29, Andrew Dunstan wrote:

Sorry, I should have tested it. This seems to work:

if ($branch eq 'REL9_0_STABLE')
{
    $PGBuild::Options::skip_steps .= ' pl-install-check';
    $main::skip_steps{'pl-install-check'} = 1;
}

cheers

Meh, no problem. We've fixed it in 2014, so it's OK. To quote one of my
colleagues "I haven't tested it, I believe it works correctly." ;-)

Any ideas why it worked for the first run?



No. It failed for me right off the bat.

cheers

andrew




Re: [HACKERS] Publish autovacuum informations

2014-12-31 Thread Tom Lane
Robert Haas  writes:
> On Mon, Dec 29, 2014 at 11:03 AM, Tom Lane  wrote:
>> Either one of those approaches would cripple our freedom to change those
>> data structures; which we've done repeatedly in the past and will surely
>> want to do again.  So I'm pretty much -1 on exposing them.

> We could instead add a view of this information to core --
> pg_stat_autovacuum, or whatever.

> But to be honest, I'm more in favor of Guillaume's proposal.  I will
> repeat my recent assertion that we -- you in particular -- are too
> reluctant to expose internal data structures to authors of C
> extensions, and that this is developer-hostile.

Well, the core question there is whether we have a policy of not breaking
extension-visible APIs.  While we will very often do things like adding
parameters to existing functions, I think we've tended to refrain from
making wholesale semantic revisions to exposed data structures.

I'd be all right with putting the data structure declarations in a file
named something like autovacuum_private.h, especially if it carried an
annotation that "if you depend on this, don't be surprised if we break
your code in future".
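
A skeleton of what I mean, just to make it concrete (the file is
hypothetical; the struct names are the ones currently private to
autovacuum.c):

	/*
	 * autovacuum_private.h
	 *
	 * Autovacuum implementation details.  If you depend on the
	 * contents of this file, don't be surprised if we break your
	 * code in future releases.
	 */
	#ifndef AUTOVACUUM_PRIVATE_H
	#define AUTOVACUUM_PRIVATE_H

	/* WorkerInfoData, AutoVacuumShmemStruct, etc. would move here
	 * unchanged from autovacuum.c */

	#endif   /* AUTOVACUUM_PRIVATE_H */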

regards, tom lane




Re: [HACKERS] orangutan seizes up during isolation-check

2014-12-31 Thread Noah Misch
On Sun, Dec 28, 2014 at 07:20:04PM -0500, Andrew Dunstan wrote:
> On 12/28/2014 04:58 PM, Noah Misch wrote:
> >The gettext maintainer was open to implementing the setlocale_native_forked()
> >technique in gettext, though the last visible progress was in October.  In 
> >any
> >event, PostgreSQL builds will see older gettext for several years.  If
> >setlocale-darwin-fork-v1.patch is not wanted, I suggest making the postmaster
> >check during startup whether it has become multithreaded.  If multithreaded:
> >
> >   FATAL: postmaster became multithreaded during startup
> >   HINT: Set the LC_ALL environment variable to a valid locale.

> >I would like to go ahead and commit setlocale-main-harden-v1.patch, which is 
> >a
> >good thing to have regardless of what happens with gettext.
> >
> 
> I'm OK with this, but on its own it won't fix orangutan's problems, will it?

Right; setlocale-main-harden-v1.patch fixes a bug not affecting orangutan at
all.  None of the above will make orangutan turn green.  Checking
multithreading during startup would merely let it fail cleanly.
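
The check itself could be as small as this sketch (the placement and
the configure test are hand-waved; pthread_is_threaded_np() is the
Darwin call that reports whether the process has ever started a
thread):

	#ifdef HAVE_PTHREAD_IS_THREADED_NP
		if (pthread_is_threaded_np() != 0)
			ereport(FATAL,
					(errmsg("postmaster became multithreaded during startup"),
					 errhint("Set the LC_ALL environment variable to a valid locale.")));
	#endif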




Re: [HACKERS] orangutan seizes up during isolation-check

2014-12-31 Thread Noah Misch
On Wed, Dec 31, 2014 at 12:32:37AM -0500, Robert Haas wrote:
> On Sun, Dec 28, 2014 at 4:58 PM, Noah Misch  wrote:
> > I wondered whether to downgrade FATAL to LOG in back branches.  Introducing 
> > a
> > new reason to block startup is disruptive for a minor release, but having 
> > the
> > postmaster deadlock at an unpredictable later time is even more disruptive. 
> >  I
> > am inclined to halt startup that way in all branches.
> 
> Jeepers.  I'd rather not do that.  From your report, this problem has
> been around for years.  Yet, as far as I know, it's bothering very few
> real users, some of whom might be far more bothered by the postmaster
> suddenly failing to start.  I'm fine with a FATAL in master, but I'd
> vote against doing anything that might prevent startup in the
> back-branches without more compelling justification.

Clusters hosted on OS X fall into these categories:

1) Unaffected configuration.  This includes everyone setting a valid messages
   locale via LANG, LC_ALL or LC_MESSAGES.
2) Affected configuration.  Through luck and light use, the cluster would not
   experience the crashes/hangs.
3) Cluster would experience the crashes/hangs.

DBAs in (3) want the FATAL at startup, but those in (2) want a LOG message
instead.  DBAs in (1) don't care.  Since intermittent postmaster hangs are far
worse than startup failure, if (2) and (3) have similar population, FATAL is
the better bet.  If (2) is sufficiently more populous than (3), then the many
small pricks from startup failure do add up to hurt more than the occasional
postmaster hang.  Who knows how that calculation plays out.




Re: [HACKERS] Final Patch for GROUPING SETS

2014-12-31 Thread Noah Misch
On Wed, Dec 31, 2014 at 02:45:29PM +0530, Atri Sharma wrote:
> > Suppose one node orchestrated all sorting and aggregation.  Call it a
> > MultiGroupAggregate for now.  It wouldn't harness Sort nodes, because it
> > performs aggregation between tuplesort_puttupleslot() calls.  Instead, it
> > would directly manage two Tuplesortstate, CUR and NEXT.  The node would
> > have
> > an initial phase similar to ExecSort(), in which it drains the outer node
> > to
> > populate the first CUR.  After that, it looks more like
> > agg_retrieve_direct(),
> > except that CUR is the input source, and each tuple drawn is also put into
> > NEXT.  When done with one CUR, swap CUR with NEXT and reinitialize NEXT.
> > This
> > design does not add I/O consumption or require a nonstandard communication
> > channel between executor nodes.  Tom, Andrew, does that look satisfactory?
> >
> >
> So you are essentially proposing merging ChainAggregate and its
> corresponding Sort node?
> 
> So the structure would be something like:
> 
> GroupAggregate
> --> MultiGroupAgg (a,b)
> --> MultiGroupAgg (c,d) ...

No.

> If you are proposing
> replacing GroupAggregate node + entire ChainAggregate + Sort nodes stack
> with a single MultiGroupAggregate node, I am not able to understand how it
> will handle all the multiple sort orders.

Yes, I was proposing that.  My paragraph that you quoted above was the attempt
to explain how the node would manage multiple sort orders.  If you have
specific questions about it, feel free to ask.
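
In pseudocode, with illustrative names only (this is a sketch of the
CUR/NEXT rotation, not a worked-out executor node):

	/* initial fill, as in ExecSort() */
	cur = tuplesort_begin_heap(<ordering for sort 0>);
	while ((slot = ExecProcNode(outerPlan)) != NULL)
		tuplesort_puttupleslot(cur, slot);
	tuplesort_performsort(cur);

	for (sortno = 0; sortno < numsorts; sortno++)
	{
		next = (sortno + 1 < numsorts)
			? tuplesort_begin_heap(<ordering for sort sortno + 1>)
			: NULL;

		/* CUR is the input source, as in agg_retrieve_direct() */
		while (tuplesort_gettupleslot(cur, true, slot))
		{
			advance_aggregates_for(sortno, slot);	/* illustrative */
			if (next != NULL)
				tuplesort_puttupleslot(next, slot);	/* feed NEXT */
		}

		tuplesort_end(cur);
		if (next != NULL)
			tuplesort_performsort(next);
		cur = next;		/* swap CUR and NEXT */
	}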




Re: [HACKERS] Final Patch for GROUPING SETS

2014-12-31 Thread Noah Misch
On Wed, Dec 31, 2014 at 05:33:43PM +, Andrew Gierth wrote:
> > "Noah" == Noah Misch  writes:
> 
>  Noah> Suppose one node orchestrated all sorting and aggregation.
> 
> Well, that has the downside of making it into an opaque blob, without
> actually gaining much.

The opaque-blob criticism is valid.  As for not gaining much, well, the gain I
sought was to break this stalemate.  You and Tom have expressed willingness to
accept the read I/O multiplier of the CTE approach.  You and I are willing to
swallow an architecture disruption, namely a tuplestore acting as a side
channel between executor nodes.  Given your NACK, I agree that it fails to
move us toward consensus and therefore does not gain much.  Alas.

> A more serious objection is that this forecloses (or at least makes
> much more complex) the future possibility of doing some grouping sets
> by sorting and others by hashing. The chained approach specifically
> allows for the future possibility of using a HashAggregate as the
> chain head, so that for example cube(a,b) can be implemented as a
> sorted agg for (a,b) and (a) and a hashed agg for (b) and (), allowing
> it to be done with one sort even if the result size for (a,b) is too
> big to hash.

That's a fair criticism, too.  Ingesting nodeSort.c into nodeAgg.c wouldn't be
too bad, because nodeSort.c is a thin wrapper around tuplesort.c.  Ingesting
nodeHash.c is not so tidy; that could entail extracting a module similar in
level to tuplesort.c, to be consumed by both executor nodes.  This does raise
the good point that the GROUPING SETS _design_ ought to consider group and
hash aggregation together.  Designing one in isolation carries too high of a
risk of painting the other into a corner.

Thanks,
nm




Re: [HACKERS] Compression of full-page-writes

2014-12-31 Thread Bruce Momjian
On Tue, Dec 30, 2014 at 01:27:44PM +0100, Andres Freund wrote:
> On 2014-12-30 21:23:38 +0900, Michael Paquier wrote:
> > On Tue, Dec 30, 2014 at 6:21 PM, Jeff Davis  wrote:
> > > On Fri, 2013-08-30 at 09:57 +0300, Heikki Linnakangas wrote:
> > >> Speeding up the CRC calculation obviously won't help with the WAL volume
> > >> per se, ie. you still generate the same amount of WAL that needs to be
> > >> shipped in replication. But then again, if all you want to do is to
> > >> reduce the volume, you could just compress the whole WAL stream.
> > >
> > > Was this point addressed?
> > Compressing the whole record is interesting for multi-insert records,
> > but as we need to keep the compressed data in a pre-allocated buffer
> > until WAL is written, we can only compress things within a given size
> > range. The point is, even if we define a lower bound, compression is
> > going to perform badly with an application that generates, for example,
> > many small records that are just above the lower bound...
> > Unsurprisingly for small records this was bad:
> 
> So why are you bringing it up? That's not an argument for anything,
> except not doing it in such a simplistic way.

I still don't understand the value of adding WAL compression, given the
high CPU usage and minimal performance improvement.  The only big
advantage is WAL storage, but again, why not just compress the WAL file
when archiving?

I thought we used to see huge performance benefits from WAL compression,
but not any more?  Has the UPDATE WAL compression removed that benefit? 
Am I missing something?

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] Re: [COMMITTERS] pgsql: pg_event_trigger_dropped_objects: Add name/args output columns

2014-12-31 Thread Alvaro Herrera
Tom Lane wrote:

> But is there any reason to think the failure on dunlin has anything to do
> with default ACLs?  I think you'd better work on understanding why there
> is a platform dependency here, before you consider either removing the
> regression test case or adding support for object types that it shouldn't
> be hitting.

Thanks for the commit.  Now dunlin is green, but treepie displayed the
failure, so we know it's not a platform dependency but probably a timing
dependency.  The failure from treepie is

*** 1323,1328 
--- 1323,1329 
  
  DROP SCHEMA testns CASCADE;
  NOTICE:  drop cascades to table testns.acltest1
+ ERROR:  requested object address for unsupported object class 28: text result "for role regressuser1 in schema testns on types"
  SELECT d.* -- check that entries went away
FROM pg_default_acl d LEFT JOIN pg_namespace n ON defaclnamespace = n.oid
WHERE nspname IS NULL AND defaclnamespace != 0;

where 28 is OCLASS_DEFACL, which is consistent with the text result.  I
have no idea why this is being invoked but it seems clear to me now that
we need to support this case.  I will work on that on Friday, and also
check whether we need to add the AMPROC/AMOP cases and USER_MAPPING.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Re: [COMMITTERS] pgsql: pg_event_trigger_dropped_objects: Add name/args output columns

2014-12-31 Thread Tom Lane
I wrote:
> Alvaro Herrera  writes:
>> Alvaro Herrera wrote:
>>> pg_event_trigger_dropped_objects: Add name/args output columns

>> This is causing buildfarm member dunlin to fail:
>> ...
>> No other member so far has reported a problem here.  Not sure if that's
>> the strangest bit, or the fact that dunlin is reporting anything in the
>> first place.  I can reproduce the problem quite simply by creating an
>> event trigger on sql_drop and then try to drop an object not supported
>> by getObjectIdentityParts -- in this case it's a default ACL for tables
>> in the schema being dropped.

> But is there any reason to think the failure on dunlin has anything to do
> with default ACLs?

I substituted a more detailed version of the error message, which soon
confirmed your guess that this had something to do with default ACLs:

http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=treepie&dt=2014-12-31%2021%3A15%3A49

What seems to be happening is that the rowsecurity test installs an event
trigger, and if that happens to be active when the "DROP SCHEMA testns"
in privileges.sql executes, you get a failure because of the default
privileges attached to the schema.  The event trigger is very short-lived,
so the failure is hard to hit; nonetheless this is at least the third such
failure in the buildfarm.

I've "fixed" this by the expedient of making rowsecurity not run as part
of a parallel group.  We cannot have regression tests that trigger such
irreproducible failures.

We should not leave it there though.  The event triggers in rowsecurity
think they only fire for policy-related events, so how come we're seeing
getObjectIdentityParts invoked on a default ACL?  And if that is expected
behavior, how is it you figure it'd be OK to not have an implementation
for every possible object type?

In the long run it might be interesting to have a test module that runs
the entire core regression suite with some appropriate event trigger in
place.  We are clearly far away from being able to do that yet, though,
and in the meantime this is not how I want to find out about event-trigger
bugs.  Creating short-lifespan event triggers has to be something that
happens only in regression tests that run by themselves.

regards, tom lane


Re: [HACKERS] psql tab completion: fix COMMENT ON ... IS IS IS

2014-12-31 Thread Ian Barwick
On 15/01/01 1:07, Robert Haas wrote:
> On Sun, Dec 28, 2014 at 7:44 PM, Ian Barwick  wrote:
>> Currently tab completion for 'COMMENT ON {object} foo IS' will result in
>> the 'IS' being duplicated up to two times; not a world-shattering issue
>> I know, but the fix is trivial and I stumble over it often enough for it
>> to mildly annoy me.  Patch attached.
> 
> I've noticed that in the past, too.  Committed.

Thanks.
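
For the archives, the idea behind the fix, as a standalone toy rather
than the committed diff (tab-complete.c expresses this with its prevN_wd
variables and pg_strcasecmp()):

#include <stdio.h>
#include <strings.h>

/*
 * Toy illustration, my code: offer "IS" only when the word just typed
 * isn't already IS, so repeated tabs can't pile up "IS IS IS".
 */
static const char *
complete_comment_on(const char *prev_wd)
{
	if (strcasecmp(prev_wd, "IS") != 0)
		return "IS";
	return NULL;
}

int
main(void)
{
	printf("%s\n", complete_comment_on("foo"));	/* -> IS */
	printf("%s\n", complete_comment_on("IS") ? "IS" : "(no completion)");
	return 0;
}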

Regards

Ian Barwick


-- 
 Ian Barwick   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Redesigning checkpoint_segments

2014-12-31 Thread Josh Berkus
Heikki,

Thanks for getting back to this!  I really look forward to simplifying
WAL tuning for users.

>>> min_recycle_wal_size
>>> checkpoint_wal_size
>>>
>>> These settings are fairly intuitive for a DBA to tune. You begin by
>>> figuring out how much disk space you can afford to spend on WAL, and set
>>> checkpoint_wal_size to that (with some safety margin, of course). Then
>>> you set checkpoint_timeout based on how long you're willing to wait for
>>> recovery to finish. Finally, if you have infrequent batch jobs that need
>>> a lot more WAL than the system otherwise needs, you can set
>>> min_recycle_wal_size to keep enough WAL preallocated for the spikes.
>>
>> We'll want to rename them to make it even *more* intuitive.
> 
> Sure, I'm all ears.

My suggestion:

max_wal_size
min_wal_size

... these would be very easy to read & understand for users:  "Set
max_wal_size based on the amount of space you have available for the
transaction log, or about 10% of the space available for your database
if you don't have a specific allocation for the log.  If your database
involves large batch imports, you may want to increase min_wal_size to
be at least the size of your largest batch."

Suggested defaults:

max_wal_size: 256MB
min_wal_size: 64MB

Please remind me because I'm having trouble finding this in the
archives: how does wal_keep_segments interact with the new settings?

>> But ... do I understand things correctly that checkpoint wouldn't "kick
>> in" until you hit checkpoint_wal_size?  If that's the case, isn't real
>> disk space usage around 2X checkpoint_wal_size if spread checkpoint is
>> set to 0.9?  Or does checkpoint kick in sometime earlier?
> 
> It kicks in earlier, so that the checkpoint *completes* just when
> checkpoint_wal_size of WAL is used up. So the real disk usage is
> checkpoint_wal_size.

Awesome.  This makes me very happy.
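
For the archives, here's the arithmetic as I understand it, as a toy C
sketch (all names are mine, not the patch's):

#include <stdio.h>

int
main(void)
{
	/*
	 * If d is the WAL distance between checkpoint starts and t the
	 * completion target, then just before a spread checkpoint finishes
	 * we still need WAL back to the previous checkpoint's redo pointer:
	 * about (1 + t) * d.  Old segments are recycled rather than deleted,
	 * which keeps roughly another d of preallocated files around, so the
	 * peak disk footprint is near (2 + t) * d.  Triggering a checkpoint
	 * every budget / (2 + t) of WAL therefore keeps usage at about
	 * checkpoint_wal_size, not 2x it.
	 */
	double		checkpoint_wal_size = 256.0;	/* MB, assumed budget */
	double		completion_target = 0.9;

	printf("start a checkpoint every %.0f MB of WAL\n",
		   checkpoint_wal_size / (2.0 + completion_target));
	return 0;
}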

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


Re: [HACKERS] Compression of full-page-writes

2014-12-31 Thread Amit Kapila
On Thu, Jan 1, 2015 at 2:39 AM, Bruce Momjian  wrote:
>
> On Tue, Dec 30, 2014 at 01:27:44PM +0100, Andres Freund wrote:
> > On 2014-12-30 21:23:38 +0900, Michael Paquier wrote:
> > > On Tue, Dec 30, 2014 at 6:21 PM, Jeff Davis  wrote:
> > > > On Fri, 2013-08-30 at 09:57 +0300, Heikki Linnakangas wrote:
> > > >> Speeding up the CRC calculation obviously won't help with the WAL
> > > >> volume per se, ie. you still generate the same amount of WAL that
> > > >> needs to be shipped in replication. But then again, if all you want
> > > >> to do is to reduce the volume, you could just compress the whole WAL
> > > >> stream.
> > > >
> > > > Was this point addressed?
> > > Compressing the whole record is interesting for multi-insert records,
> > > but as we need to keep the compressed data in a pre-allocated buffer
> > > until WAL is written, we can only compress things within a given size
> > > range. The point is, even if we define a lower bound, compression is
> > > going to perform badly with an application that generates for example
> > > many small records that are just higher than the lower bound...
> > > Unsurprisingly for small records this was bad:
> >
> > So why are you bringing it up? That's not an argument for anything,
> > except not doing it in such a simplistic way.
>
> I still don't understand the value of adding WAL compression, given the
> high CPU usage and minimal performance improvement.  The only big
> advantage is WAL storage, but again, why not just compress the WAL file
> when archiving.
>
> I thought we used to see huge performance benefits from WAL compression,
> but not any more?

I think there can be a performance benefit for the cases when the data
is compressible, but it would be a loss otherwise.  The main thing is
that the current compression algorithm (pg_lz) is not so favorable for
non-compressible data.

> Has the UPDATE WAL compression removed that benefit?

Good question.  I think there might be some impact due to that, but in
general, for page-level compression there will still be much more to
compress.

In general, I think this idea has merit with respect to compressible data,
and to cover the cases where it will not perform well, there is an on/off
switch for this feature; in future, if PostgreSQL gets a better
compression method, we can consider that as well.  One thing we need to
think about is whether users can decide with ease when to enable this
global switch.
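
To illustrate, a sketch of the kind of guard this implies; here
compress_block() is a made-up stand-in for a pglz-style compressor that
returns the compressed length, or -1 when the data doesn't shrink:

#include <stdbool.h>

/* hypothetical compressor: compressed length, or -1 if no gain */
extern int	compress_block(const char *src, int srclen, char *dst);

static bool
maybe_compress_page(const char *page, int page_size,
					char *out, int *out_len)
{
	int			clen = compress_block(page, page_size, out);

	if (clen < 0 || clen >= page_size)
		return false;			/* store the original page as-is */

	*out_len = clen;
	return true;				/* store the compressed image, set a flag */
}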


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Compression of full-page-writes

2014-12-31 Thread Michael Paquier
On Thu, Jan 1, 2015 at 2:10 PM, Amit Kapila  wrote:
> On Thu, Jan 1, 2015 at 2:39 AM, Bruce Momjian  wrote:
>> > So why are you bringing it up? That's not an argument for anything,
>> > except not doing it in such a simplistic way.
>>
>> I still don't understand the value of adding WAL compression, given the
>> high CPU usage and minimal performance improvement.  The only big
>> advantage is WAL storage, but again, why not just compress the WAL file
>> when archiving.
When doing some tests with pgbench for a fixed number of transactions,
I also noticed a reduction in replay time; see for example some results
here:
http://www.postgresql.org/message-id/CAB7nPqRv6RaSx7hTnp=g3dyqou++fel0uioyqpllbdbhayb...@mail.gmail.com

>> I thought we used to see huge performance benefits from WAL compression,
>> but not any more?
>
> I think there can be a performance benefit for the cases when the data
> is compressible, but it would be a loss otherwise.  The main thing is
> that the current compression algorithm (pg_lz) is not so favorable for
> non-compressible data.
Yes, definitely. Switching to a different algorithm would be the next
step forward. We have mainly been discussing lz4, which has a friendly
license; I think it would be worth studying other options as well once
we have all the infrastructure in place.

>> Has the UPDATE WAL compression removed that benefit?
>
> Good question,  I think there might be some impact due to that, but in
> general for page level compression still there will be much more to
> compress.
That may be a good thing to put a number on. We could try to patch a
build with a revert of a3115f0d and measure the difference in WAL size
that it creates. Thoughts?

> In general, I think this idea has merit with respect to compressible data,
> and to cover the cases where it will not perform well, there is an on/off
> switch for this feature; in future, if PostgreSQL gets a better
> compression method, we can consider that as well.  One thing we need to
> think about is whether users can decide with ease when to enable this
> global switch.
The opposite is true as well: we shouldn't force the user to have data
compressed when the switch is disabled.
-- 
Michael


Re: [HACKERS] Parallel Seq Scan

2014-12-31 Thread Amit Kapila
On Wed, Dec 31, 2014 at 7:50 PM, Thom Brown  wrote:
>
>
> When attempting to recreate the plan in your example, I get an error:
>
>  ➤ psql://thom@[local]:5488/pgbench
>
> # create table t1(c1 int, c2 char(500)) with (fillfactor=10);
> CREATE TABLE
> Time: 13.653 ms
>
>  ➤ psql://thom@[local]:5488/pgbench
>
> # insert into t1 values(generate_series(1,100),'amit');
> INSERT 0 100
> Time: 4.796 ms
>
>  ➤ psql://thom@[local]:5488/pgbench
>
> # explain select c1 from t1;
> ERROR:  could not register background process
> HINT:  You may need to increase max_worker_processes.
> Time: 1.659 ms
>
>  ➤ psql://thom@[local]:5488/pgbench
>
> # show max_worker_processes ;
>  max_worker_processes
> ----------------------
>  8
> (1 row)
>
> Time: 0.199 ms
>
> # show parallel_seqscan_degree ;
>  parallel_seqscan_degree
> -------------------------
>  10
> (1 row)
>
>
> Do I really need to increase max_worker_processes to >=
> parallel_seqscan_degree?

Yes; the parallel workers are implemented as dynamic bgworkers, so this
depends on max_worker_processes.
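
Roughly what happens underneath, in simplified form (this is my sketch,
not the patch's actual code):

#include "postgres.h"
#include "postmaster/bgworker.h"

static void
launch_one_parallel_worker(void)
{
	BackgroundWorker worker;
	BackgroundWorkerHandle *handle;

	memset(&worker, 0, sizeof(worker));
	snprintf(worker.bgw_name, BGW_MAXLEN, "parallel worker");
	worker.bgw_flags = BGWORKER_SHMEM_ACCESS |
		BGWORKER_BACKEND_DATABASE_CONNECTION;
	worker.bgw_start_time = BgWorkerStart_ConsistentState;

	/* fails once all max_worker_processes slots are taken */
	if (!RegisterDynamicBackgroundWorker(&worker, &handle))
		ereport(ERROR,
				(errmsg("could not register background process"),
				 errhint("You may need to increase max_worker_processes.")));
}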


> If so, shouldn't there be a hint here along with the error message
> pointing this out?  And should the error be produced when only a *plan*
> is being requested?
>

I think one thing we could do to minimize the chance of such an error is
to set the number of parallel workers used for the plan equal to
max_worker_processes when parallel_seqscan_degree is greater than
max_worker_processes.  Even if we do this, such an error can still occur
if the user has registered a bgworker before we start parallel plan
execution.
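
Roughly this, with names of my choosing:

static int
planned_parallel_degree(int parallel_seqscan_degree, int max_worker_processes)
{
	if (parallel_seqscan_degree > max_worker_processes)
		return max_worker_processes;	/* clamp at what can actually run */
	return parallel_seqscan_degree;
}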

> Also, I noticed that where a table is partitioned, the plan isn't
> parallelised:
>
>
> Is this expected?
>

Yes; to keep the initial implementation simple, it only allows a
parallel plan when there is a single table in the query.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] What exactly is our CRC algorithm?

2014-12-31 Thread Abhijit Menon-Sen
Hi.

OK, here are the patches with the various suggestions applied.

I found that alignment didn't seem to make much difference for the
CRC32* instructions, so I changed the code to process (len / 8)
eight-byte chunks followed by (len % 8) single bytes, the way the Linux
kernel does.
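
For anyone curious, the loop structure is essentially the standard
little-endian slicing-by-8 arrangement; here is a condensed sketch (not
the patch code itself):

#include <stddef.h>
#include <stdint.h>

/* eight 256-entry tables, laid out as pg_crc32c_table[8][256] below */
extern const uint32_t crc_table[8][256];

uint32_t
crc32c_sliceby8(uint32_t crc, const unsigned char *p, size_t len)
{
	/* main loop: eight input bytes per iteration, one lookup per byte */
	while (len >= 8)
	{
		uint32_t	low = crc ^ ((uint32_t) p[0] |
								 (uint32_t) p[1] << 8 |
								 (uint32_t) p[2] << 16 |
								 (uint32_t) p[3] << 24);
		uint32_t	high = (uint32_t) p[4] |
						   (uint32_t) p[5] << 8 |
						   (uint32_t) p[6] << 16 |
						   (uint32_t) p[7] << 24;

		crc = crc_table[7][low & 0xFF] ^
			  crc_table[6][(low >> 8) & 0xFF] ^
			  crc_table[5][(low >> 16) & 0xFF] ^
			  crc_table[4][low >> 24] ^
			  crc_table[3][high & 0xFF] ^
			  crc_table[2][(high >> 8) & 0xFF] ^
			  crc_table[1][(high >> 16) & 0xFF] ^
			  crc_table[0][high >> 24];
		p += 8;
		len -= 8;
	}

	/* tail: the remaining len % 8 bytes, one at a time */
	while (len-- > 0)
		crc = crc_table[0][(crc ^ *p++) & 0xFF] ^ (crc >> 8);

	return crc;
}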

-- Abhijit
From 82de6abbc05afabbf575941743b0ee355d2888ed Mon Sep 17 00:00:00 2001
From: Abhijit Menon-Sen 
Date: Tue, 30 Dec 2014 12:41:19 +0530
Subject: Implement slice-by-8 CRC calculation

The COMP_CRC32C macro now calls pg_comp_crc32c(), which processes eight
data bytes at a time. Its output is identical to the byte-at-a-time CRC
code. (This change does not apply to the LEGACY_CRC32 computation.)

Reviewers: Andres Freund, Heikki Linnakangas

Author: Abhijit Menon-Sen
---
 src/include/utils/pg_crc.h        |   6 +-
 src/include/utils/pg_crc_tables.h | 594 +-
 src/port/pg_crc.c                 |  86 ++
 3 files changed, 619 insertions(+), 67 deletions(-)

diff --git a/src/include/utils/pg_crc.h b/src/include/utils/pg_crc.h
index f871cba..55934e5 100644
--- a/src/include/utils/pg_crc.h
+++ b/src/include/utils/pg_crc.h
@@ -41,6 +41,8 @@
 
 typedef uint32 pg_crc32;
 
+extern pg_crc32 pg_comp_crc32c(pg_crc32 crc, const void *data, size_t len);
+
 /*
  * CRC calculation using the CRC-32C (Castagnoli) polynomial.
  *
@@ -51,7 +53,7 @@ typedef uint32 pg_crc32;
 #define INIT_CRC32C(crc) ((crc) = 0xFFFFFFFF)
 #define FIN_CRC32C(crc)	((crc) ^= 0xFFFFFFFF)
 #define COMP_CRC32C(crc, data, len)	\
-	COMP_CRC32_NORMAL_TABLE(crc, data, len, pg_crc32c_table)
+	((crc) = pg_comp_crc32c((crc), (char *) (data), (len)))
 #define EQ_CRC32C(c1, c2) ((c1) == (c2))
 
 /*
@@ -115,7 +117,7 @@ do {			  \
 } while (0)
 
 /* Constant tables for CRC-32C and CRC-32 polynomials */
-extern CRCDLLIMPORT const uint32 pg_crc32c_table[];
+extern CRCDLLIMPORT const uint32 pg_crc32c_table[8][256];
 extern CRCDLLIMPORT const uint32 pg_crc32_table[];
 
 #endif   /* PG_CRC_H */
diff --git a/src/include/utils/pg_crc_tables.h b/src/include/utils/pg_crc_tables.h
index cb6b470..707b363 100644
--- a/src/include/utils/pg_crc_tables.h
+++ b/src/include/utils/pg_crc_tables.h
@@ -28,71 +28,535 @@
  * This table is based on the so-called Castagnoli polynomial (the same
  * that is used e.g. in iSCSI).
  */
-const uint32 pg_crc32c_table[256] = {
-	0x00000000, 0xF26B8303, 0xE13B70F7, 0x1350F3F4,
-	0xC79A971F, 0x35F1141C, 0x26A1E7E8, 0xD4CA64EB,
-	0x8AD958CF, 0x78B2DBCC, 0x6BE22838, 0x9989AB3B,
-	0x4D43CFD0, 0xBF284CD3, 0xAC78BF27, 0x5E133C24,
-	0x105EC76F, 0xE235446C, 0xF165B798, 0x030E349B,
-	0xD7C45070, 0x25AFD373, 0x36FF2087, 0xC494A384,
-	0x9A879FA0, 0x68EC1CA3, 0x7BBCEF57, 0x89D76C54,
-	0x5D1D08BF, 0xAF768BBC, 0xBC267848, 0x4E4DFB4B,
-	0x20BD8EDE, 0xD2D60DDD, 0xC186FE29, 0x33ED7D2A,
-	0xE72719C1, 0x154C9AC2, 0x061C6936, 0xF477EA35,
-	0xAA64D611, 0x580F5512, 0x4B5FA6E6, 0xB93425E5,
-	0x6DFE410E, 0x9F95C20D, 0x8CC531F9, 0x7EAEB2FA,
-	0x30E349B1, 0xC288CAB2, 0xD1D83946, 0x23B3BA45,
-	0xF779DEAE, 0x05125DAD, 0x1642AE59, 0xE4292D5A,
-	0xBA3A117E, 0x4851927D, 0x5B016189, 0xA96AE28A,
-	0x7DA08661, 0x8FCB0562, 0x9C9BF696, 0x6EF07595,
-	0x417B1DBC, 0xB3109EBF, 0xA0406D4B, 0x522BEE48,
-	0x86E18AA3, 0x748A09A0, 0x67DAFA54, 0x95B17957,
-	0xCBA24573, 0x39C9C670, 0x2A993584, 0xD8F2B687,
-	0x0C38D26C, 0xFE53516F, 0xED03A29B, 0x1F682198,
-	0x5125DAD3, 0xA34E59D0, 0xB01EAA24, 0x42752927,
-	0x96BF4DCC, 0x64D4CECF, 0x77843D3B, 0x85EFBE38,
-	0xDBFC821C, 0x2997011F, 0x3AC7F2EB, 0xC8AC71E8,
-	0x1C661503, 0xEE0D9600, 0xFD5D65F4, 0x0F36E6F7,
-	0x61C69362, 0x93AD1061, 0x80FDE395, 0x72966096,
-	0xA65C047D, 0x5437877E, 0x4767748A, 0xB50CF789,
-	0xEB1FCBAD, 0x197448AE, 0x0A24BB5A, 0xF84F3859,
-	0x2C855CB2, 0xDEEEDFB1, 0xCDBE2C45, 0x3FD5AF46,
-	0x7198540D, 0x83F3D70E, 0x90A324FA, 0x62C8A7F9,
-	0xB602C312, 0x44694011, 0x5739B3E5, 0xA55230E6,
-	0xFB410CC2, 0x092A8FC1, 0x1A7A7C35, 0xE811FF36,
-	0x3CDB9BDD, 0xCEB018DE, 0xDDE0EB2A, 0x2F8B6829,
-	0x82F63B78, 0x709DB87B, 0x63CD4B8F, 0x91A6C88C,
-	0x456CAC67, 0xB7072F64, 0xA457DC90, 0x563C5F93,
-	0x082F63B7, 0xFA44E0B4, 0xE9141340, 0x1B7F9043,
-	0xCFB5F4A8, 0x3DDE77AB, 0x2E8E845F, 0xDCE5075C,
-	0x92A8FC17, 0x60C37F14, 0x73938CE0, 0x81F80FE3,
-	0x55326B08, 0xA759E80B, 0xB4091BFF, 0x466298FC,
-	0x1871A4D8, 0xEA1A27DB, 0xF94AD42F, 0x0B21572C,
-	0xDFEB33C7, 0x2D80B0C4, 0x3ED04330, 0xCCBBC033,
-	0xA24BB5A6, 0x502036A5, 0x4370C551, 0xB11B4652,
-	0x65D122B9, 0x97BAA1BA, 0x84EA524E, 0x7681D14D,
-	0x2892ED69, 0xDAF96E6A, 0xC9A99D9E, 0x3BC21E9D,
-	0xEF087A76, 0x1D63F975, 0x0E330A81, 0xFC588982,
-	0xB21572C9, 0x407EF1CA, 0x532E023E, 0xA145813D,
-	0x758FE5D6, 0x87E466D5, 0x94B49521, 0x66DF1622,
-	0x38CC2A06, 0xCAA7A905, 0xD9F75AF1, 0x2B9CD9F2,
-	0xFF56BD19, 0x0D3D3E1A, 0x1E6DCDEE, 0xEC064EED,
-	0xC38D26C4, 0x31E6A5C7, 0x22B65633, 0xD0DDD530,
-	0x0417B1DB, 0xF67C32D8, 0xE52CC12C, 0x1747422F,
-	0x49547E0B, 0xBB3FFD08, 0xA86F0EFC, 0x5A048DFF,
-	0x8ECEE914, 0x7CA56A17, 0x6FF599E3, 0x9D9E1AE0,
-	0xD3D3E1AB, 0x21B862A8, 0x32E8915C, 0xC083125F,
-	0x14...

[remainder of patch truncated in archive]

Re: [HACKERS] add modulo (%) operator to pgbench

2014-12-31 Thread David Rowley
On 1 January 2015 at 04:02, Fabien COELHO  wrote:

>> I'm not sure about what Makefile changes could be necessary for
>> various environments, it looks ok to me as it is.
>
> Found one. EXTRA_CLEAN is needed for generated C files. Added to this v3.

I was about to go and look at this, but I had a problem when attempting to
compile with MSVC.

The attached patch seems to fix it.

Regards

David Rowley


pgbench-expr-msvc_fix.patch
Description: Binary data
