Re: [HACKERS] postgres_fdw join pushdown (was Re: Custom/Foreign-Join-APIs)

2016-02-14 Thread Etsuro Fujita

On 2016/02/10 4:16, Robert Haas wrote:

On Tue, Feb 9, 2016 at 8:39 AM, Ashutosh Bapat wrote:

Thanks Jeevan for your review and comments. PFA the patch which fixes those.



Committed with a couple more small adjustments.


Thanks for working on this, Robert, Ashutosh, and everyone involved!

I happened to notice that this code in foreign_join_ok():

switch (jointype)
{
    case JOIN_INNER:
        fpinfo->remote_conds = list_concat(fpinfo->remote_conds,
                                           fpinfo_i->remote_conds);
        fpinfo->remote_conds = list_concat(fpinfo->remote_conds,
                                           fpinfo_o->remote_conds);
        break;

    case JOIN_LEFT:
        fpinfo->joinclauses = list_concat(fpinfo->joinclauses,
                                          fpinfo_i->remote_conds);
        fpinfo->remote_conds = list_concat(fpinfo->remote_conds,
                                           fpinfo_o->remote_conds);
        break;

    case JOIN_RIGHT:
        fpinfo->joinclauses = list_concat(fpinfo->joinclauses,
                                          fpinfo_o->remote_conds);
        fpinfo->remote_conds = list_concat(fpinfo->remote_conds,
                                           fpinfo_i->remote_conds);
        break;

    case JOIN_FULL:
        fpinfo->joinclauses = list_concat(fpinfo->joinclauses,
                                          fpinfo_i->remote_conds);
        fpinfo->joinclauses = list_concat(fpinfo->joinclauses,
                                          fpinfo_o->remote_conds);
        break;

    default:
        /* Should not happen, we have just checked this above */
        elog(ERROR, "unsupported join type %d", jointype);
}

would break the list fpinfo_i->remote_conds in the INNER JOIN and FULL JOIN 
cases: the first list_concat() links the cells of fpinfo_i->remote_conds 
onto the destination list, so the destination's tail cell is then 
fpinfo_i->remote_conds' tail cell, and the second list_concat() links 
fpinfo_o->remote_conds' cells right behind that tail cell, so iterating 
over fpinfo_i->remote_conds afterwards walks into fpinfo_o's cells as 
well.  You can see the list breakage from e.g., the following queries on 
an Assert-enabled build:


postgres=# create extension postgres_fdw;
CREATE EXTENSION
postgres=# create server myserver foreign data wrapper postgres_fdw 
options (dbname 'mydatabase');

CREATE SERVER
postgres=# create user mapping for current_user server myserver;
CREATE USER MAPPING
postgres=# create foreign table foo (a int) server myserver options 
(table_name 'foo');

CREATE FOREIGN TABLE
postgres=# create foreign table bar (a int) server myserver options 
(table_name 'bar');

CREATE FOREIGN TABLE
postgres=# create foreign table baz (a int) server myserver options 
(table_name 'baz');

CREATE FOREIGN TABLE
postgres=# select * from foo, bar, baz where foo.a = bar.a and bar.a = 
baz.a and foo.a < 10 and bar.a < 10 and baz.a < 10;
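
(To spell out the aliasing at the C level, here is a minimal sketch using
the List implementation of that time; clause_x, clause_a and clause_b are
hypothetical stand-ins for real clause nodes:)

    List   *dest = list_make1(clause_x);        /* the join's own conds */
    List   *inner_conds = list_make1(clause_a); /* fpinfo_i->remote_conds */
    List   *outer_conds = list_make1(clause_b); /* fpinfo_o->remote_conds */

    /* links inner_conds' cells onto dest; dest->tail == inner_conds->tail */
    dest = list_concat(dest, inner_conds);

    /*
     * links outer_conds' head cell right behind inner_conds' tail cell, so
     * iterating over inner_conds now runs into clause_b as well, and
     * check_list_invariants() trips on an assert-enabled build
     */
    dest = list_concat(dest, outer_conds);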


Attached is a patch to avoid the breakage.

Best regards,
Etsuro Fujita
*** a/contrib/postgres_fdw/postgres_fdw.c
--- b/contrib/postgres_fdw/postgres_fdw.c
***
*** 3488,3495  foreign_join_ok(PlannerInfo *root, RelOptInfo *joinrel, JoinType jointype,
  		case JOIN_INNER:
  			fpinfo->remote_conds = list_concat(fpinfo->remote_conds,
  			   fpinfo_i->remote_conds);
! 			fpinfo->remote_conds = list_concat(fpinfo->remote_conds,
! 			   fpinfo_o->remote_conds);
  			break;
  
  		case JOIN_LEFT:
--- 3488,3496 
  		case JOIN_INNER:
  			fpinfo->remote_conds = list_concat(fpinfo->remote_conds,
  			   fpinfo_i->remote_conds);
! 			if (fpinfo_o->remote_conds)
! 			if (fpinfo_o->remote_conds)
! 				fpinfo->remote_conds = list_concat(list_copy(fpinfo->remote_conds),
! 												   fpinfo_o->remote_conds);
  			break;
  
  		case JOIN_LEFT:
***
*** 3509,3516  foreign_join_ok(PlannerInfo *root, RelOptInfo *joinrel, JoinType jointype,
  		case JOIN_FULL:
  			fpinfo->joinclauses = list_concat(fpinfo->joinclauses,
  			  fpinfo_i->remote_conds);
! 			fpinfo->joinclauses = list_concat(fpinfo->joinclauses,
! 			  fpinfo_o->remote_conds);
  			break;
  
  		default:
--- 3510,3518 
  		case JOIN_FULL:
  			fpinfo->joinclauses = list_concat(fpinfo->joinclauses,
  			  fpinfo_i->remote_conds);
! 			if (fpinfo_o->remote_conds)
! 			if (fpinfo_o->remote_conds)
! 				fpinfo->joinclauses = list_concat(list_copy(fpinfo->joinclauses),
! 												  fpinfo_o->remote_conds);
  			break;
  
  		default:



Re: [HACKERS] extend pgbench expressions with functions

2016-02-14 Thread Michael Paquier
On Mon, Feb 15, 2016 at 5:21 AM, Robert Haas  wrote:
> On Sun, Feb 14, 2016 at 11:28 AM, Fabien COELHO  wrote:
>> Here is a 3 part v29:
>>
>> a) add support for integer functions in pgbench, including turning
>>    operators into functions, as well as some minimal infrastructure for
>>    additional types (this allows minimizing the diffs with the next
>>    patch, which adds doubles).
>>
>> b) add support for doubles, including setting double variables.
>>    Note that variables are not explicitly typed. Add some
>>    double-related functions; the most interesting of them for pgbench
>>    are the randoms.
>>
>> c) remove \setrandom support (as thanks to part b \set x random(...) does
>>    the same). Prints an error pointing to the replacement if \setrandom is
>>    used in a pgbench script.
>
> Thanks, I'll review these as soon as I can get to it.

I got around to looking at (a) in this set.

+   if ((PGBENCH_FUNCTIONS[fnumber].nargs >= 0 &&
+        PGBENCH_FUNCTIONS[fnumber].nargs != elist_length(args)) ||
+       /* check at least one arg for min & max */
+       (PGBENCH_FUNCTIONS[fnumber].nargs == -1 &&
+        elist_length(args) == 0))
+       expr_yyerror_more("unexpected number of arguments",
+                         PGBENCH_FUNCTIONS[fnumber].fname);
We could split that into two parts, each one with a more precise error message:
- For functions that expect at least one argument: "at least one
argument was expected, none found".
- For functions that expect N arguments: "N arguments expected, but M found"
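
Something like this, perhaps (a sketch only, reusing the patch's
PGBENCH_FUNCTIONS table and its elist_length()/expr_yyerror_more() helpers):

    if (PGBENCH_FUNCTIONS[fnumber].nargs == -1 &&
        elist_length(args) == 0)
        expr_yyerror_more("at least one argument was expected, none found",
                          PGBENCH_FUNCTIONS[fnumber].fname);
    else if (PGBENCH_FUNCTIONS[fnumber].nargs >= 0 &&
             PGBENCH_FUNCTIONS[fnumber].nargs != elist_length(args))
        expr_yyerror_more("unexpected number of arguments",
                          PGBENCH_FUNCTIONS[fnumber].fname);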

+   "\\set aid random(1, " CppAsString2(naccounts) " * :scale)\n"
"SELECT abalance FROM pgbench_accounts WHERE aid = :aid;\n"
}
 };

-
 /* Function prototypes */
Noise.

+ * Recursive evaluation of int or double expressions
+ *
+ * Note that currently only integer variables are available, with values
+ * stored as text.
This comment is incorrect; we only care about integers in this patch.

Taking patch 1 as a completely independent thing, there is no need to
introduce PgBenchValueType yet. The same remark applies to setIntValue and
coerceToInt. They become useful later, when doubles are introduced, to
handle double input parameters for the gaussian and other functions.

-   /*
-    * INT64_MIN / -1 is problematic, since the result
-    * can't be represented on a two's-complement machine.
-    * Some machines produce INT64_MIN, some produce zero,
-    * some throw an exception. We can dodge the problem
-    * by recognizing that division by -1 is the same as
-    * negation.
-    */
-   if (rval == -1)
+   if (coerceToInt(&rval) == -1)
    {
-       *retval = -lval;
-
-       /* overflow check (needed for INT64_MIN) */
-       if (lval == PG_INT64_MIN)
-       {
-           fprintf(stderr, "bigint out of range\n");
-           return false;
-       }
+       setIntValue(retval, 0);
+       return true;
    }
(INT64_MIN / -1) should error. (INT64_MIN % -1) should result in 0.
This is missing the division handling.
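
For instance, something along these lines would keep the old division
behavior while adding the modulo case (a sketch only; "op" is a
hypothetical discriminator, the patch actually uses separate cases for
the two operators, and lval/rval/retval are as in the patch):

    if (coerceToInt(&rval) == -1)
    {
        int64   l = coerceToInt(&lval);

        if (op == '%')
        {
            /* any x % -1 is 0, including INT64_MIN % -1 */
            setIntValue(retval, 0);
        }
        else
        {
            /* division by -1 is the same as negation */
            if (l == PG_INT64_MIN)
            {
                /* overflow check (needed for INT64_MIN) */
                fprintf(stderr, "bigint out of range\n");
                return false;
            }
            setIntValue(retval, -l);
        }
        return true;
    }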
-- 
Michael




Re: [HACKERS] innocuous: pgbench does FD_ISSET on invalid socket

2016-02-14 Thread Michael Paquier
On Mon, Feb 15, 2016 at 3:20 PM, Michael Paquier wrote:
> On Sat, Feb 13, 2016 at 6:25 AM, Alvaro Herrera wrote:
>> I noticed that pgbench calls FD_ISSET on a socket returned by
>> PQsocket() without first checking that it's not invalid.  I don't think
>> there's a real problem here because the socket was already checked a few
>> lines above, but I think applying the same coding pattern to both places
>> is cleaner.
>>
>> Any objections to changing it like this?  I'd probably backpatch to 9.5,
>> but no further (even though this pattern actually appears all the way
>> back.)
>
> Not really, +1 for consistency here, and this makes the code clearer.
>
> Different issues in the same area:
> 1) In vacuumdb.c, init_slot() does not check for the return value of 
> PQsocket():
> slot->sock = PQsocket(conn);
> 2) In isolationtester.c, try_complete_step() should do the same.
> 3) In pg_recvlogical.c for StreamLogicalLog() I am spotting the same problem.
> I guess those ones should be fixed as well, no?

With a patch, that's even better.
-- 
Michael
diff --git a/src/bin/pg_basebackup/pg_recvlogical.c b/src/bin/pg_basebackup/pg_recvlogical.c
index 832f9f9..b4325b0 100644
--- a/src/bin/pg_basebackup/pg_recvlogical.c
+++ b/src/bin/pg_basebackup/pg_recvlogical.c
@@ -360,6 +360,14 @@ StreamLogicalLog(void)
 			struct timeval timeout;
 			struct timeval *timeoutptr = NULL;
 
+			if (PQsocket(conn) < 0)
+			{
+				fprintf(stderr,
+						_("%s: bad socket: \"%s\"\n"),
+						progname, strerror(errno));
+				goto error;
+			}
+
 			FD_ZERO(&input_mask);
 			FD_SET(PQsocket(conn), &input_mask);
 
diff --git a/src/bin/scripts/vacuumdb.c b/src/bin/scripts/vacuumdb.c
index c6afcd5..e81123f 100644
--- a/src/bin/scripts/vacuumdb.c
+++ b/src/bin/scripts/vacuumdb.c
@@ -70,7 +70,7 @@ static void DisconnectDatabase(ParallelSlot *slot);
 
 static int	select_loop(int maxFd, fd_set *workerset, bool *aborting);
 
-static void init_slot(ParallelSlot *slot, PGconn *conn);
+static void init_slot(ParallelSlot *slot, PGconn *conn, const char *progname);
 
 static void help(const char *progname);
 
@@ -421,14 +421,14 @@ vacuum_one_database(const char *dbname, vacuumingOptions *vacopts,
 	 * array contains the connection.
 	 */
 	slots = (ParallelSlot *) pg_malloc(sizeof(ParallelSlot) * concurrentCons);
-	init_slot(slots, conn);
+	init_slot(slots, conn, progname);
 	if (parallel)
 	{
 		for (i = 1; i < concurrentCons; i++)
 		{
 			conn = connectDatabase(dbname, host, port, username, prompt_password,
    progname, false, true);
-			init_slot(slots + i, conn);
+			init_slot(slots + i, conn, progname);
 		}
 	}
 
@@ -917,11 +917,18 @@ select_loop(int maxFd, fd_set *workerset, bool *aborting)
 }
 
 static void
-init_slot(ParallelSlot *slot, PGconn *conn)
+init_slot(ParallelSlot *slot, PGconn *conn, const char *progname)
 {
 	slot->connection = conn;
 	slot->isFree = true;
 	slot->sock = PQsocket(conn);
+
+	if (slot->sock < 0)
+	{
+		fprintf(stderr, _("%s: bad socket: %s\n"), progname,
+				strerror(errno));
+		exit(1);
+	}
 }
 
 static void
diff --git a/src/test/isolation/isolationtester.c b/src/test/isolation/isolationtester.c
index 0a9d25c..3e13a39 100644
--- a/src/test/isolation/isolationtester.c
+++ b/src/test/isolation/isolationtester.c
@@ -720,6 +720,12 @@ try_complete_step(Step *step, int flags)
 	PGresult   *res;
 	bool		canceled = false;
 
+	if (sock < 0)
+	{
+		fprintf(stderr, "bad socket: %s\n", strerror(errno));
+		exit_nicely();
+	}
+
 	gettimeofday(&start_time, NULL);
 	FD_ZERO(&read_set);
 



Re: [HACKERS] innocuous: pgbench does FD_ISSET on invalid socket

2016-02-14 Thread Michael Paquier
On Sat, Feb 13, 2016 at 6:25 AM, Alvaro Herrera  wrote:
> I noticed that pgbench calls FD_ISSET on a socket returned by
> PQsocket() without first checking that it's not invalid.  I don't think
> there's a real problem here because the socket was already checked a few
> lines above, but I think applying the same coding pattern to both places
> is cleaner.
>
> Any objections to changing it like this?  I'd probably backpatch to 9.5,
> but no further (even though this pattern actually appears all the way
> back.)

Not really, +1 for consistency here, and this makes the code clearer.

Different issues in the same area:
1) In vacuumdb.c, init_slot() does not check for the return value of PQsocket():
slot->sock = PQsocket(conn);
2) In isolationtester.c, try_complete_step() should do the same.
3) In pg_recvlogical.c for StreamLogicalLog() I am spotting the same problem.
I guess those ones should be fixed as well, no?
-- 
Michael




Re: [HACKERS] Optimization for updating foreign tables in Postgres FDW

2016-02-14 Thread Rushabh Lathia
On Fri, Feb 12, 2016 at 5:40 PM, Etsuro Fujita wrote:

> Hi Rushabh and Thom,
>
> Thanks for the review!
>
> On 2016/02/10 22:37, Thom Brown wrote:
>
>> On 10 February 2016 at 08:00, Rushabh Lathia wrote:
>>
>>> Fujita-san, I am attaching an updated version of the patch, which adds
>>> the documentation update.
>>>
>>
> Thanks for updating the patch!
>
>>> Once we finalize this, I feel good with the patch and think that we
>>> could mark this as ready for committer.
>>>
>>
>> I find this wording a bit confusing:
>>
>> "If the IsForeignRelUpdatable pointer is set to NULL, foreign tables
>> are assumed to be insertable, updatable, or deletable either the FDW
>> provides ExecForeignInsert,ExecForeignUpdate, or ExecForeignDelete
>> respectively or if the FDW optimizes a foreign table update on a
>> foreign table using PlanDMLPushdown (PlanDMLPushdown still needs to
>> provide BeginDMLPushdown, IterateDMLPushdown and EndDMLPushdown to
>> execute the optimized update.)."
>>
>> This is a very long sentence, and the word "either" doesn't work here.
>>
>
> Agreed.
>
> As a result of our discussions, we reached a conclusion that the DML
> pushdown APIs should be provided together with existing APIs such as
> ExecForeignInsert, ExecForeignUpdate or ExecForeignDelete, IIUC.  So, how
> about (1) leaving the description for the existing APIs as-is and (2)
> adding a new description for the DML pushdown APIs in parentheses, like
> this?:
>
>  If the IsForeignRelUpdatable pointer is set to
>  NULL, foreign tables are assumed to be insertable,
> updatable,
>  or deletable if the FDW provides ExecForeignInsert,
>  ExecForeignUpdate, or ExecForeignDelete
>  respectively.
>  (If the FDW attempts to optimize a foreign table update, it still
>  needs to provide PlanDMLPushdown, BeginDMLPushdown,
>  IterateDMLPushdown and EndDMLPushdown.)
>
> Actually, if the FDW provides the DML pushdown APIs, (pushdown-able)
> foreign table updates can be done without ExecForeignInsert,
> ExecForeignUpdate or ExecForeignDelete.  So, the above docs are not
> necessarily correct.  But we don't recommend doing that without the
> existing APIs, so I'm not sure it's worth complicating the docs.


Adding a new description for the DML pushdown APIs seems like a good idea.
I would suggest adding it as a separate paragraph rather than in
parentheses.


>> Also:
>>
>> "When the query doesn't has the clause, the FDW must also increment
>> the row count for the ForeignScanState node in the EXPLAIN ANALYZE
>> case."
>>
>> Should read "doesn't have"
>>
>
> Will fix.
>
> Best regards,
> Etsuro Fujita


-- 
Rushabh Lathia


Re: [HACKERS] Refactoring of receivelog.c

2016-02-14 Thread Craig Ringer
On 15 February 2016 at 04:48, Magnus Hagander  wrote:

> I was working on adding the tar streaming functionality we talked about at
> the developer meeting to pg_basebackup, and rapidly ran across the issue
> that Andres has been complaining about for a while. The code in
> receivelog.c just passes an insane number of parameters around. Adding or
> changing even a small thing ends up touching a huge number of places.
>


Other than the lack of comments on the fields in StreamCtl to indicate
their functions, I think this looks good.

I didn't find any mistakes, but I do admit my eyes started glazing over
after a bit.

I'd rather not have StreamCtl as a typedef of an anonymous struct if it's
exposed in a header though. It should have a name that can be used in
forward declarations etc.
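
Something along these lines, say (a sketch; the fields shown are
illustrative, not the actual StreamCtl contents, and the function is just
one of receivelog.c's entry points):

    /* A named tag lets other headers forward-declare the struct. */
    typedef struct StreamCtl
    {
        XLogRecPtr  startpos;   /* starting position of the stream */
        uint32      timeline;   /* timeline to stream from */
    } StreamCtl;

    /* elsewhere, without needing the full definition: */
    struct StreamCtl;
    extern bool ReceiveXlogStream(PGconn *conn, struct StreamCtl *stream);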

After recently working with the XLogReader I can't emphasise enough how
important *useful* comments on the fields in these sorts of structs are:
the XLogReaderState has ReadRecPtr, readSegNo + readOff + readLen,
currRecPtr AND latestPagePtr. Those actually do have comments, just not
super helpful ones.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] pgbench - allow backslash-continuations in custom scripts

2016-02-14 Thread Kyotaro HORIGUCHI
Hello, thank you for reviewing this.

> On Thu, Jan 7, 2016 at 3:36 AM, Kyotaro HORIGUCHI wrote:
> > - 0001-Prepare-for-sharing-psqlscan-with-pgbench.patch
> >
> >   This diff looks a bit large, but most of it is cut-and-paste
> >   work, and the substantial change is rather small.
> >
> >   This refactors psqlscan.l into two .l files. The additional
> >   psqlscan_slash.l is a bit tricky in that it recreates the
> >   scan_state on transitions to and from psqlscan.l.
>
> I've looked at this patch a few times now but find it rather hard to
> verify.  I am wondering if you would be willing to separate 0001 into
> subpatches.  For example, maybe there could be one or two patches that
> ONLY move code around and then one or more patches that make the
> changes to that code.  Right now, for example, psql_scan_setup() is
> getting three additional arguments, but it's also being moved to a
> different file.  Perhaps those two things could be done one at a time.

I tried to split it into what I thought were meaningful steps, but I'll
rearrange them if the current split is not easy to read.

> I also think this patch could really benefit from a detailed set of
> submission notes that specifically lay out why each change was made
> and why.  For instance, I see that psqlscan.l used yyless() while
> psqlscanbody.l uses a new my_yyless() you've defined.  There is
> probably a great reason for that and I'm sure if I stare at this for
> long enough I can figure out what that reason is, but it would be
> better if you had a list of bullet points explaining what was changed
> and why.

I'm sorry, but I don't understand exactly what 'submission notes' means
here. Do you mean precise descriptions in source comments, or the commit
message of a git commit?

> I would really like to see this patch committed; my problem is that I
> don't have enough braincells to be sure that it's correct in the
> present form.

Thank you. I'll send the rearranged patch soon.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center







Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-02-14 Thread Michael Paquier
On Mon, Feb 15, 2016 at 2:11 PM, Kyotaro HORIGUCHI wrote:
> Surprisingly, yes. The list is handled as an identifier list and
> parsed by SplitIdentifierString, so it can accept double-quoted
> names.

Good point. I was not aware of this trick.
-- 
Michael




Re: [HACKERS] Optimization for updating foreign tables in Postgres FDW

2016-02-14 Thread Etsuro Fujita

On 2016/02/12 21:46, Robert Haas wrote:

On Fri, Feb 12, 2016 at 7:19 AM, Etsuro Fujita
 wrote:

I think that displaying target lists would be confusing for users.  Here is
an example:

EXPLAIN (verbose, costs off)
DELETE FROM rem1; -- can be pushed down
  QUERY PLAN
---------------------------------------------
  Delete on public.rem1
->  Foreign Delete on public.rem1
  Output: ctid
  Remote SQL: DELETE FROM public.loc1
(4 rows)

Should we output the "Output" line?



I see your point, but what if there's a RETURNING clause?


I think that would be confusing even in that case.  Here is an example:

EXPLAIN (verbose, costs off)
DELETE FROM rem1 RETURNING *; -- can be pushed down
  QUERY PLAN
------------------------------------------------------
 Delete on public.rem1
   Output: f1, f2
   ->  Foreign Delete on public.rem1
 Output: ctid
 Remote SQL: DELETE FROM public.loc1 RETURNING f1, f2
(5 rows)

The Output line beneath the ForeignScan node doesn't match the RETURNING 
expressions in the remote query the way the Output line beneath the 
ModifyTable node does, so I think displaying it would be confusing even 
in that case.


Another example:

postgres=# explain verbose update foo set a = a + 1 returning *;
  QUERY PLAN
----------------------------------------------------------------------
 Update on public.foo  (cost=100.00..137.50 rows=1000 width=10)
   Output: a
   ->  Foreign Update on public.foo  (cost=100.00..137.50 rows=1000 
width=10)

 Output: (a + 1), ctid
 Remote SQL: UPDATE public.foo SET a = (a + 1) RETURNING a
(5 rows)

The same applies here.

As for the INSERT .. RETURNING .. case, I guess there is no such 
mismatch, but I'm not sure that displaying the target list is that 
helpful, honestly, so I'd vote for suppressing it in all cases, for 
consistency.


Best regards,
Etsuro Fujita






Re: [HACKERS] Incorrect formula for SysV IPC parameters

2016-02-14 Thread Kyotaro HORIGUCHI
Thanks for looking at this.

At Fri, 12 Feb 2016 23:19:45 +0900, Fujii Masao wrote:
> >> ISTM that you also need to change the descriptions about SEMMNI and SEMMNS
> >> under the Table 17-1.
> >
> > Oops! Thank you for pointing it out.
> >
> > The original description doesn't mention the magic-number ('5' in
> > the above formulas, which previously was '4') so I haven't added
> > anything about it.
> >
> > Processes whose number is limited by max_worker_processes are called
> > 'background processes' (not 'backend workers') in the documentation,
> > so I used that name in the additional description.
> >
> > The difference in the body text for 9.2, 9.3 is only a literal
> > '4' to '5' in the formula.
> 
> Thanks for updating the patches!
> 
> They look good to me except that the formulas don't include the number of
> background processes requesting shared memory access, i.e.,
> GetNumShmemAttachedBgworkers(), in 9.3. Isn't it better to use the following
> formula in 9.3?
> 
>   ceil((max_connections + autovacuum_max_workers + number of
> background processes + 5) / 16)
> 
> Attached patch uses the above formula for 9.3. I'm thinking to push your
> patches to 9.2, 9.4, 9.5, master, also push the attached one to 9.3.
> Comments?
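
(For illustration: with, say, max_connections = 100,
autovacuum_max_workers = 3 and 4 background processes, the proposed
formula gives ceil((100 + 3 + 4 + 5) / 16) = ceil(112 / 16) = 7.)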

One concern is that users don't have any simple way to know how
many bg-workers a server runs in their current configuration.
The formula doesn't make sense without it.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center






Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-02-14 Thread Kyotaro HORIGUCHI
Hello,

At Wed, 10 Feb 2016 18:36:43 +0900, Michael Paquier wrote:
> On Wed, Feb 10, 2016 at 5:34 PM, Kyotaro HORIGUCHI
>  wrote:
> > > > +sync_node_group:
> > > > +   sync_list                        { $$ = create_group_node(1, $1); }
> > > > +   |   sync_element_ast             { $$ = create_group_node(1, $1); }
> > > > +   |   INT '[' sync_list ']'        { $$ = create_group_node($1, $3); }
> > > > +   |   INT '[' sync_element_ast ']' { $$ = create_group_node($1, $3); }
> > > > We may want to be careful with the use of '[' in application_name. I am
> > > > not much thrilled with forbidding the use of []() in application_name,
> > > > so we may want to recommend users to use a backslash in s_s_names when
> > > > a group is defined.
> >
> > Mmmm. I found that application_name can contain
> > commas. Furthermore, there seems to be no limitation on the
> > characters in the name.
> >
> > postgres=# set application_name='ho,ge';
> > postgres=# select application_name from pg_stat_activity;
> >  application_name
> > --
> >  ho,ge
> >
> > check_application_name() allows all characters in the range
> > between 32 to 126 in ascii. All other characters are replaced
> > with '?'.
> 
> Actually I was thinking about that a couple of hours ago. If the
> application_name of a node has a comma, it cannot become a sync
> replica, no? Wouldn't we need special handling in s_s_names, like
> '\,' making a comma part of an application name? Or just ban commas from
> the list of supported characters in the application name?

Surprisingly, yes. The list is handled as an identifier list and
parsed by SplitIdentifierString, so it can accept double-quoted
names.

s_s_names='abc, def, " abc,""def"'

Result list is ["abc", "def", " abc,\"def"]

Simply supporting the same notation addresses the problem and
accepts strings like the following.

s_s_names='2["comma,name", "foo[bar,baz]"]'


It is currently undocumented behavior, but I doubt the necessity of an
explicit mention.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center






Re: [HACKERS] Using quicksort for every external sort run

2016-02-14 Thread Peter Geoghegan
On Sun, Feb 7, 2016 at 4:50 PM, Peter Geoghegan  wrote:
>> I'm not even sure this is necessary. The idea of missing out on
>> producing a single sorted run sounds bad but in practice since we
>> normally do the final merge on the fly there doesn't seem like there's
>> really any difference between reading one tape or reading two or three
>> tapes when outputting the final results. There will be the same amount
>> of I/O happening and a 2-way or 3-way merge for most data types should
>> be basically free.
>
> I basically agree with you, but it seems possible to fix the
> regression (generally misguided though those regressed cases are).
> It's probably easiest to just fix it.

Here is a benchmark on my laptop:

$ pgbench -i -s 500 --unlogged

This results in a ~1GB accounts PK:

postgres=# \di+ pgbench_accounts_pkey
List of relations
─[ RECORD 1 ]──
Schema  │ public
Name│ pgbench_accounts_pkey
Type│ index
Owner   │ pg
Table   │ pgbench_accounts
Size│ 1071 MB
Description │

The query I'm testing is: "reindex index pgbench_accounts_pkey;"

Now, with a maintenance_work_mem of 5MB, the most recent revision of
my patch takes about 54.2 seconds to complete this, as compared to
master's 44.4 seconds. So, clearly a noticeable regression there of
just under 20%. I did not see a regression with a 5MB
maintenance_work_mem when pgbench scale was 100, though. And, with the
default maintenance_work_mem of 64MB, it's a totally different story
-- my patch takes about 28.3 seconds, whereas master takes 48.5
seconds (i.e. longer than with 5MB). My patch needs a 56-way final
merge with the 64MB maintenance_work_mem case, and 47 distinct merge
steps, plus a final on-the-fly merge for the 5MB maintenance_work_mem
case. So, a huge amount of merging, but RS still hardly pays for
itself. With the regressed case for my patch, we finish sorting *runs*
about 15 seconds in to a 54.2 second operation -- very early. So it
isn't "quicksort vs replacement selection", so much as "polyphase
merge vs replacement selection". There is a good reason to think that
we can make progress on fixing that regression by doubling down on the
general strategy of improving cache characteristics, and being
cleverer about memory use during non-final merging, too.

I looked at what it would take to make the heap a smaller part of
memtuples, along the lines Robert and I talked about, and I think it's
non-trivial because it needs to make the top of the heap something
other than memtuples[0]. I'd need to change the heap code, which
already has 3 reasons for existing (RS, merging, and top-N heap). I'll
find it really hard to justify the effort, and especially the risk of
adding bugs, for a benefit that there is *scant* evidence for. My
guess is that the easiest, and most sensible way to fix the ~20%
regression seen here is to introduce batch memory allocation to
non-final merge steps, which is where most time was spent. (For
simplicity, that currently only happens during the final merge phase,
but I could revisit that -- seems not that hard).

Now, I accept that the cost model has to go. So, what I think would be
best is if we still added a GUC, like the replacement_sort_mem
suggestion that Robert made. This would be a threshold for using what
is currently called "quicksort with spillover". There'd be no cost
model. Jeff Janes also suggested something like this.

The only regression that I find concerning is the one reported by Jeff
Janes [1]. That didn't even involve a correlation, though, so no
reason to think that it would be at all helped by what Robert and I
talked about. It seemed like the patch happened to have the effect of
tickling a pre-existing problem with polyphase merge -- what Jeff
called an "anti-sweetspot". Jeff had a plausible theory for why that
is.

So, what if we try to fix polyphase merge? That would be easier. We
could look at the tape buffer size, and the number of tapes, as well
as memory access patterns. We might even make more fundamental changes
to polyphase merge, since we don't use the more advanced variant that
Knuth describes; I think correlation is a red herring here. Knuth
suggests that his algorithm 5.4.3, cascade merge, has more efficient
distribution of runs.

The bottom line is that there will always be some regression
somewhere. I'm not sure what the guiding principle for when that
becomes unacceptable is, but you did seem sympathetic to the idea that
really low work_mem settings (e.g. 4MB) with really big inputs were
not too concerning [2]. I'm emphasizing Jeff's case now because I,
like you [2], am much more worried about maintenance_work_mem default
cases with regressions than anything else, and that *was* such a case.

Like Jeff Janes, I don't care about his other regression of about 5%
[3], which involved a 4MB work_mem + 100 million tuples. The other
case (the one I do care about) was 64MB +  400 million tuples, and was
a much bigger regression, which is suggestive of the unpredictable

Re: [HACKERS] WIP: SCRAM authentication

2016-02-14 Thread Michael Paquier
On Mon, Feb 15, 2016 at 10:51 AM, Stephen Frost  wrote:
> * Tom Lane (t...@sss.pgh.pa.us) wrote:
>> Stephen Frost  writes:
>> > Why do we need pg_shadow or pg_user or related views at all..?
>>
>> A lot of code looks at those just to get usernames.  I am not in favor of
>> breaking such stuff without need.
>
> Alright.
>
>> How about we just say that the password in these old views always reads
>> out as '' even when there is a password, and we invent new views
>> that carry real auth information?  (Hopefully in an extensible way.)
>
> I'd be alright with that approach, I'd just rather that any clients
> which actually want to read the password field be updated to look at the
> extensible and sensible base catalogs, and not some hacked up array that
> we shoved into that field.

Well, then let's mask it and just have pg_auth_verifiers. Another
possible problem that I can see with this patch is what we do with
valid_until: the last set of patches sent did not make this field
settable per verifier. I would consider it the saner approach to keep
things simple and still do that. Allowing multiple verifiers per
protocol is a problem, and having a solution for it would be nice.
Should this be prioritized before adding more protocols like SCRAM?

FWIW, browsing through pgbouncer, it looks at pg_shadow for users'
passwords to build a basic configuration file.

(My mistake: while pg_user is world-readable, that's not the case for pg_shadow.)
-- 
Michael




Re: [HACKERS] WIP: SCRAM authentication

2016-02-14 Thread Tom Lane
Stephen Frost  writes:
> * Tom Lane (t...@sss.pgh.pa.us) wrote:
>> How about we just say that the password in these old views always reads
>> out as '' even when there is a password, and we invent new views
>> that carry real auth information?  (Hopefully in an extensible way.)

> I'd be alright with that approach, I'd just rather that any clients
> which actually want to read the password field be updated to look at the
> extensible and sensible base catalogs, and not some hacked up array that
> we shoved into that field.

Yeah, I'm good with that.  I just don't think the breakage needs to extend
to clients that aren't trying to read auth-related information.

BTW, if we haven't learned this lesson by now: I'm pretty sure that every
single one of these views is an attempt to emulate what *used* to be the
real base catalog, in some previous release.  Maybe we should stop
expecting clients to read the real catalog, ever, in favor of a sanitized
view?  Although I don't know exactly what that would lead to in terms of
what we'd expose that's different from what the base catalog is.  But it's
worth thinking about whether there is a way to avoid having this same
discussion again in five or ten years.

regards, tom lane




Re: [HACKERS] WIP: SCRAM authentication

2016-02-14 Thread Stephen Frost
* Michael Paquier (michael.paqu...@gmail.com) wrote:
> On Mon, Feb 15, 2016 at 10:23 AM, Stephen Frost  wrote:
> > I would start by pointing out that pg_user currently uses pg_shadow..
> > Why do we need pg_shadow or pg_user or related views at all..?
> 
> pg_user/pg_shadow have the advantage of being world-readable and
> masking password values.

New views would have that same advantage, should we implement them that
way.  Tom's approach is also workable, though, where we give the existing
views a reduced charter, mainly providing user lists, and simply don't
include any info about password verifiers or the like.

Thanks!

Stephen




Re: [HACKERS] WIP: SCRAM authentication

2016-02-14 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Stephen Frost  writes:
> > Why do we need pg_shadow or pg_user or related views at all..?
> 
> A lot of code looks at those just to get usernames.  I am not in favor of
> breaking such stuff without need.

Alright.

> How about we just say that the password in these old views always reads
> out as '' even when there is a password, and we invent new views
> that carry real auth information?  (Hopefully in an extensible way.)

I'd be alright with that approach, I'd just rather that any clients
which actually want to read the password field be updated to look at the
extensible and sensible base catalogs, and not some hacked up array that
we shoved into that field.

Thanks!

Stephen




Re: [HACKERS] WIP: SCRAM authentication

2016-02-14 Thread Michael Paquier
On Mon, Feb 15, 2016 at 10:23 AM, Stephen Frost  wrote:
> I would start by pointing out that pg_user currently uses pg_shadow..
> Why do we need pg_shadow or pg_user or related views at all..?

pg_user/pg_shadow have the advantage of being world-readable and masking
password values.
-- 
Michael




Re: [HACKERS] WIP: SCRAM authentication

2016-02-14 Thread Tom Lane
Stephen Frost  writes:
> Why do we need pg_shadow or pg_user or related views at all..?

A lot of code looks at those just to get usernames.  I am not in favor of
breaking such stuff without need.

How about we just say that the password in these old views always reads
out as '' even when there is a password, and we invent new views
that carry real auth information?  (Hopefully in an extensible way.)

regards, tom lane




Re: [HACKERS] WIP: SCRAM authentication

2016-02-14 Thread Stephen Frost
Michael,

* Michael Paquier (michael.paqu...@gmail.com) wrote:
> It seems to me that applications are going to need a refresh anyway...

Indeed.

> Among the other possibilities I can foresee:
> - Add a column "protocol" to pg_shadow and produce one row per
> protocol, so one user will be listed once for each protocol it has. Any
> application could then filter out things with an additional WHERE
> clause.
> - Nuke passwd from pg_shadow and have a new view pg_shadow_verifiers
> made of the user OID, the protocol and the verifier. This maps quite
> well with pg_auth_verifiers.
> - Give up and nuke pg_shadow, which is here for compatibility down to
> 8.1, and add a protocol column to pg_user, or even better create a new
> view pg_user_verifiers that has all the data of all the protocols. If
> we care a lot about backward-compatibility, pg_user could as well map
> with pg_auth_verifiers with the md5 protocol.
> I would go with the last one.

I would start by pointing out that pg_user currently uses pg_shadow..

Why do we need pg_shadow or pg_user or related views at all..?
Applications will need to be updated; we might as well simply nuke them
and expect applications to use the new catalogs.  Perhaps there is a
useful view or two which we can provide over the new catalogs, but I'd
rather consider how to create brand new, useful views over the new
catalogs than any kind of way to provide backwards-compatible-ish views.

Thanks!

Stephen




Re: [HACKERS] WIP: SCRAM authentication

2016-02-14 Thread Michael Paquier
On Mon, Feb 15, 2016 at 9:56 AM, Stephen Frost  wrote:
> * Michael Paquier (michael.paqu...@gmail.com) wrote:
>> We'd also need to switch pg_shadow to have an array of elements
>> made of protocol:identifier instead of a single password field. There
>> can be only one valid identifier per protocol, user and valid_until
>> for a single point in time, and I can't believe that we should
>> recommend only one authentication protocol per single major version of
>> Postgres.
>
> Ugh, that sounds pretty grotty to me.
>
> Applications which consider these fields will need to be updated, one
> way or the other, and I'd much rather they be updated to work with
> reasonable structures rather than something we've hacked together in
> some faint hope that it'd be useful.  An array in pg_shadow for a field
> which used to be a text field does *not* sound like a simpler solution
> to me, and I'd rather simply do away with those views entirely, or at
> least nuke the fields which are at issue, than try to come up with
> something between wholesale change and no change that ends up being
> worse than both.

It seems to me that applications are going to need a refresh anyway...
Among the other possibilities I can foresee:
- Add a column "protocol" to pg_shadow and produce one row per
protocol, so one user will be listed once for each protocol it has. Any
application could then filter out things with an additional WHERE
clause.
- Nuke passwd from pg_shadow and have a new view pg_shadow_verifiers
made of the user OID, the protocol and the verifier. This maps quite
well with pg_auth_verifiers.
- Give up and nuke pg_shadow, which is here for compatibility down to
8.1, and add a protocol column to pg_user, or even better create a new
view pg_user_verifiers that has all the data of all the protocols. If
we care a lot about backward-compatibility, pg_user could as well map
with pg_auth_verifiers with the md5 protocol.
I would go with the last one.
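
(To make the last option concrete, a sketch only; the column names here
are assumptions for illustration, not an actual catalog definition from
the patch set:)

    -- hypothetical base catalog: one verifier per (role, protocol)
    -- pg_auth_verifiers(roleid oid, protocol text, verifier text,
    --                   valuntil timestamptz)

    CREATE VIEW pg_user_verifiers AS
        SELECT r.rolname AS usename,
               v.protocol,
               '********'::text AS verifier,   -- masked, as pg_user does
               v.valuntil
          FROM pg_authid r
               JOIN pg_auth_verifiers v ON v.roleid = r.oid;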
-- 
Michael




Re: [HACKERS] WIP: SCRAM authentication

2016-02-14 Thread Stephen Frost
Michael,

* Michael Paquier (michael.paqu...@gmail.com) wrote:
> On Mon, Feb 15, 2016 at 9:17 AM, Stephen Frost  wrote:
> > That said, per various discussions, we'd really want a more-or-less
> > fully formed patch to land prior to the last CF, for this to have any
> > chance.  Perhaps that means it's not going to happen, which would be
> > unfortunate, but it's not beyond the possible, in my view, at least.
> 
> Well, I could send a rebased patch with the new things proposed
> upthread, and with things split in as many patches as I can get out,
> roughly:
> 1) One patch for the catalog split
> 2) One for the GUC param controlling recommended protocols
> 3) One for the pg_upgrade function cleaning up automatically the
> entries of unsupported protocols
> 4) SCRAM on top of the rest, which is at more or less 75% something
> that Heikki produced.

That sounds like a pretty reasonable split, to me at least.

> >> > In addition, I would prefer to maintain the current structure of the
> >> > pg_authid table and use the rolpassword and rolvaliduntil columns to
> >> > store only the md5 verifier which will also be stored in
> >> > pg_auth_verifiers.  This would provide a smoother migration path with
> >> > the idea that rolpassword and rolvaliduntil will be removed from
> >> > pg_authid in a future version of Postgres.
> >>
> >> Actually, I am of the opinion that both rolpassword and rolvaliduntil
> >> should be directly part of pg_auth_verifiers. Being able to handle
> >> multiple verifiers for the same protocol is a feature that is being
> >> asked for with a given password handling policy (was pinged again
> >> about that in Moscow last week). Rolling in new verifiers currently
> >> requires extra roles to be created.
> >
> > I'm on Michael's side here.  I don't believe it makes sense to try and
> > maintain the exact structure of pg_authid.  We are certainly happy to
> > whack around the other catalogs, and I'm unimpressed with my prior
> > efforts to provide backwards-compatible catalog (see pg_user, et al) for
> > just a few releases- we seem unable to get rid of them now, even though
> > we should have years ago, really.  Indeed, I'd be just as happy if we
> > got rid of them during this work..
> 
> We'd also need to switch pg_shadow to have an array of elements
> made of protocol:identifier instead of a single password field. There
> can be only one valid identifier per protocol, user and valid_until
> for a single point in time, and I can't believe that we should
> recommend only one authentication protocol per single major version of
> Postgres.

Ugh, that sounds pretty grotty to me.

Applications which consider these fields will need to be updated, one
way or the other, and I'd much rather they be updated to work with
reasonable structures rather than something we've hacked together in
some faint hope that it'd be useful.  An array in pg_shadow for a field
which used to be a text field does *not* sound like a simpler solution
to me, and I'd rather simply do away with those views entirely, or at
least nuke the fields which are at issue, than try to come up with
something between wholesale change and no change that ends up being
worse than both.

Thanks!

Stephen




Re: [HACKERS] WIP: SCRAM authentication

2016-02-14 Thread Michael Paquier
On Mon, Feb 15, 2016 at 9:17 AM, Stephen Frost  wrote:
> Michael,
>
> * Michael Paquier (michael.paqu...@gmail.com) wrote:
>> On Sat, Feb 13, 2016 at 3:05 AM, David Steele  wrote:
>> > On 11/16/15 8:53 AM, Michael Paquier wrote:
>> >> On Sat, Sep 5, 2015 at 9:31 AM, Bruce Momjian wrote:
>> >>> On Fri, Sep  4, 2015 at 04:51:33PM -0400, Stephen Frost wrote:
>> > Coming in late, but can you explain how multiple passwords allow for
>> > easier automated credential rotation?  If you have five applications
>> > with stored passwords, I imagine you can't change them all at once, so
>> > with multiples you could change it on one, then go to the others and
>> > change it there, and finally, remove the old password.  Is that the
>> > process?  I am now realizing that without multiple passwords, this is
>> > a hard problem.
>>  That's exactly the process if multiple passwords can be used.  If
>>  there's only one account and one password supported then you have to
>>  change all the systems all at once and that certainly can be a hard
>>  problem.
>> 
>>  One way to deal with this is to have a bunch of different accounts, but
>>  that's certainly not simple either and can get quite painful.
>> >>> OK, for me, if we can explain the benefit for users, it seems worth
>> >>> doing just to allow that.
>> >> Reviving an old thread for a patch still registered in this commit
>> >> fest, to move the discussion along.
>> >
>> > I was wondering if this patch was going to be submitted for the 2016-03 CF?
>>
>> For 9.6 I am afraid this is too late, per the rule that there cannot
>> be huge patches in the last CF of a development cycle. But I have
>> plans for this set of features afterwards, starting with the first CF
>> of 9.7, and I was planning to talk about it at PgCon's unconference if
>> I can get there, to gather some feedback. There is still a dire need
>> for it on my side.
>
> There's a lot of good reason to get SCRAM added as a protocol,
> considering our current password-based implementation is rather..
> lacking.  Regarding the specific comment about the timing, that rule is
> specifically to prevent large patches from landing in the last CF
> without any prior discussion or review, as I recall, so I'm not sure it
> really applies here as there's been quite a bit of discussion and review
> already.

Honestly, I don't know how to answer that.

> That said, per various discussions, we'd really want a more-or-less
> fully formed patch to land prior to the last CF, for this to have any
> chance.  Perhaps that means it's not going to happen, which would be
> unfortunate, but it's not beyond the possible, in my view, at least.

Well, I could send a rebased patch with the new things proposed
upthread, and with things split in as many patches as I can get out,
roughly:
1) One patch for the catalog split
2) One for the GUC param controlling recommended protocols
3) One for the pg_upgrade function cleaning up automatically the
entries of unsupported protocols
4) SCRAM on top of the rest, which is at more or less 75% something
that Heikki produced.

>> > In addition, I would prefer to maintain the current structure of the
>> > pg_authid table and use the rolpassword and rolvaliduntil columns to
>> > store only the md5 verifier which will also be stored in
>> > pg_auth_verifiers.  This would provide a smoother migration path with
>> > the idea that rolpassword and rolvaliduntil will be removed from
>> > pg_authid in a future version of Postgres.
>>
>> Actually, I am of the opinion that both rolpassword and rolvaliduntil
>> should be directly part of pg_auth_verifiers. Being able to handle
>> multiple verifiers for the same protocol is a feature that is being
>> asked for with a given password handling policy (was pinged again
> about that in Moscow last week). Rolling in new verifiers currently
> requires extra roles to be created.
>
> I'm on Michael's side here.  I don't believe it makes sense to try and
> maintain the exact structure of pg_authid.  We are certainly happy to
> whack around the other catalogs, and I'm unimpressed with my prior
> efforts to provide backwards-compatible catalog (see pg_user, et al) for
> just a few releases- we seem unable to get rid of them now, even though
> we should have years ago, really.  Indeed, I'd be just as happy if we
> got rid of them during this work..

We'd also need to switch pg_shadow to have an array of elements
made of protocol:identifier instead of a single password field. There
can be only one valid identifier per protocol, user and valid_until
for a single point in time, and I can't believe that we should
recommend only one authentication protocol per single major version of
Postgres.
-- 
Michael




Re: [HACKERS] WIP: SCRAM authentication

2016-02-14 Thread Stephen Frost
Michael,

* Michael Paquier (michael.paqu...@gmail.com) wrote:
> On Sat, Feb 13, 2016 at 3:05 AM, David Steele  wrote:
> > On 11/16/15 8:53 AM, Michael Paquier wrote:
> >> On Sat, Sep 5, 2015 at 9:31 AM, Bruce Momjian wrote:
> >>> On Fri, Sep  4, 2015 at 04:51:33PM -0400, Stephen Frost wrote:
> > Coming in late, but can you explain how multiple passwords allow for
> > easier automated credential rotation?  If you have five applications
> > with stored passwords, I imagine you can't change them all at once, so
> > with multiples you could change it on one, then go to the others and
> > change it there, and finally, remove the old password.  Is that the
> > process?  I am now realizing that without multiple passwords, this is a
> > hard problem.
>  That's exactly the process if multiple passwords can be used.  If
>  there's only one account and one password supported then you have to
>  change all the systems all at once and that certainly can be a hard
>  problem.
> 
>  One way to deal with this is to have a bunch of different accounts, but
>  that's certainly not simple either and can get quite painful.
> >>> OK, for me, if we can explain the benefit for users, it seems worth
> >>> doing just to allow that.
> >> Reviving an old thread for a patch still registered in this commit
> >> fest, to move the discussion along.
> >
> > I was wondering if this patch was going to be submitted for the 2016-03 CF?
> 
> For 9.6 I am afraid this is too late, per the rule that there cannot
> be huge patches in the last CF of a development cycle. But I have
> plans for this set of features afterwards, starting with the first CF
> of 9.7, and I was planning to talk about it at PgCon's unconference if
> I can get there, to gather some feedback. There is still a dire need
> for it on my side.

There's a lot of good reason to get SCRAM added as a protocol,
considering our current password-based implementation is rather..
lacking.  Regarding the specific comment about the timing, that rule is
specifically to prevent large patches from landing in the last CF
without any prior discussion or review, as I recall, so I'm not sure it
really applies here as there's been quite a bit of discussion and review
already.

That said, per various discussions, we'd really want a more-or-less
fully formed patch to land prior to the last CF, for this to have any
chance.  Perhaps that means it's not going to happen, which would be
unfortunate, but it's not beyond the possible, in my view, at least.

> > In addition, I would prefer to maintain the current structure of the
> > pg_authid table and use the rolpassword and rolvaliduntil columns to
> > store only the md5 verifier which will also be stored in
> > pg_auth_verifiers.  This would provide a smoother migration path with
> > the idea that rolpassword and rolvaliduntil will be removed from
> > pg_authid in a future version of Postgres.
> 
> Actually, I am of the opinion that both rolpassword and rolvaliduntil
> should be directly part of pg_auth_verifiers. Being able to handle
> multiple verifiers for the same protocol is a feature that is being
> asked for with a given password handling policy (was pinged again
> about that in Moscow last week). Rolling in new verifiers currently
> requires extra roles to be created.

I'm on Michael's side here.  I don't believe it makes sense to try and
maintain the exact structure of pg_authid.  We are certainly happy to
whack around the other catalogs, and I'm unimpressed with my prior
efforts to provide backwards-compatible catalog (see pg_user, et al) for
just a few releases- we seem unable to get rid of them now, even though
we should have years ago, really.  Indeed, I'd be just as happy if we
got rid of them during this work..

Thanks!

Stephen




Re: [HACKERS] Bool btree_gin index not chosen on equality contraint, but on greater/lower?

2016-02-14 Thread Tom Lane
Patric Bechtel  writes:
> Tom Lane wrote on 14.02.2016 at 17:51:
>> I think your problem is that the planner won't apply
>> match_boolean_index_clause() or expand_boolean_index_clause(), because
>> it has a hard-wired idea of which index opclasses could possibly
>> benefit from that, cf IsBooleanOpfamily().

> If someone could give me a hint where to look, I'd be grateful.

In principle, instead of using the hard-wired IsBooleanOpfamily test,
we could check "op_in_opfamily(BooleanEqualOperator, opfamily)" to see
whether those transforms would produce something that works with the
index.  The reason I didn't do it that way is that those tests are made
for every WHERE clause the planner ever considers, so it seemed like
adding a syscache lookup that is usually going to accomplish nothing
would be pretty annoying overhead.  The trick to getting an acceptable
patch here would be to figure out how to arrange things to not cost
much when the optimization doesn't apply.

One idea worth exploring is to redefine IsBooleanOpfamily to inspect
the index's AM OID as well as the opfamily, and code it along the
lines of

((relam) == BTREE_AM_OID ? (opfamily) == BOOL_BTREE_FAM_OID :
 (relam) == HASH_AM_OID ? (opfamily) == BOOL_HASH_FAM_OID :
 op_in_opfamily(BooleanEqualOperator, opfamily))

so that the extra syscache lookup doesn't have to happen at all for
btree and hash indexes.  But I doubt that moves the needle far enough.

Another idea is to try to rearrange match_clause_to_indexcol and
expand_indexqual_conditions so that IsBooleanOpfamily() doesn't get hit
quite so easily.  I don't remember at the moment exactly why that's the
first test rather than something further down.  Probably at least part of
the reason is an assumption that IsBooleanOpfamily() is cheap.  It's
possible that match_boolean_index_clause() is actually cheaper than the
full-fledged version of IsBooleanOpfamily() and so the order of those
tests should be reversed; moving them down to after other tests might also
be appropriate.

Another line of thought is to try to cache the result of the
IsBooleanOpfamily() test in the IndexOptInfo structs so it doesn't
have to get done so many times.  But that would be more invasive
and I'm not sure that it helps in simple cases.

regards, tom lane




Re: [HACKERS] Re: [COMMITTERS] pgsql: Introduce group locking to prevent parallel processes from deadl

2016-02-14 Thread Tom Lane
Robert Haas  writes:
> On Sun, Feb 14, 2016 at 1:33 PM, Tom Lane  wrote:
>> pgprocno of the specific PGPROC, or of the group leader?  If it's
>> the former, I'm pretty suspicious that there are deadlock and/or
>> linked-list-corruption hazards here.  If it's the latter, at least
>> the comments around this are misleading.

> Leader.  The other way would be nuts.

... and btw, either BecomeLockGroupMember() has got this backwards, or
I'm still fundamentally confused.

Also, after fixing that it would be good to add a comment explaining why
it's not fundamentally unsafe for BecomeLockGroupMember() to examine
leader->pgprocno without having any relevant lock.  AFAICS, that's only
okay because the pgprocno fields are never changed after InitProcGlobal,
even when a PGPROC is recycled.

regards, tom lane




Re: [HACKERS] Bool btree_gin index not chosen on equality contraint, but on greater/lower?

2016-02-14 Thread Patric Bechtel

Hi Tom,

Tom Lane wrote on 14.02.2016 at 17:51:
> Patric Bechtel  writes:
> >> I tried to add bool support to the btree_gin contrib module, and as
> >> far as I can tell, it seems to work (wasn't that complicated,
> >> actually). But now I'm stuck, as PostgreSQL doesn't seem to want to
> >> use my new index with equality or inequality, just with greater and
> >> lower than.
> 
> I think your problem is that the planner won't apply
> match_boolean_index_clause() or expand_boolean_index_clause(), because
> it has a hard-wired idea of which index opclasses could possibly
> benefit from that, cf IsBooleanOpfamily().

oh, sh*t...

My motivation was the size of the bool indexes; they are tiny and really
fast. They feel almost like bitmap indexes.

I hope that's not too far over my head already... but I'll take a look.

If someone could give me a hint where to look, I'd be grateful.

Thanks a lot for the hint,

Patric




Re: [HACKERS] Re: [COMMITTERS] pgsql: Introduce group locking to prevent parallel processes from deadl

2016-02-14 Thread Tom Lane
Robert Haas  writes:
> On Sun, Feb 14, 2016 at 1:33 PM, Tom Lane  wrote:
>> Robert Haas  writes:
>>> ... the lock manager lock that protects the fields in a
>>> given PGPROC is chosen by taking pgprocno modulo the number of lock
>>> manager partitions.

>> pgprocno of the specific PGPROC, or of the group leader?  If it's
>> the former, I'm pretty suspicious that there are deadlock and/or
>> linked-list-corruption hazards here.  If it's the latter, at least
>> the comments around this are misleading.

> Leader.  The other way would be nuts.

On closer inspection, that's actually still somewhat problematic,
because that essentially means that no one can inspect another backend's
lockGroupLeader etc fields unless they take *all* the lock partition
LWLocks first.  You can't take just the relevant one because you don't
know which one that is, at least not without dereferencing lockGroupLeader
which is entirely unsafe if you haven't got the appropriate lock.

This isn't a problem for members of the lock group, since they already
know who the leader is and hence which partition lock to use.  And it's
not a problem for the deadlock detector either since it'll take all those
partition locks anyway.  But it makes life difficult for status inspection
code.  I'm finding that the only way to write a lock-group-aware function
that tells whether A is blocked by B is to hold both the ProcArrayLock and
*all* of the lock partition locks throughout inspection of the data
structures.  I'd hoped to be able to restrict it to locking just the
partition holding the lock A is blocked on, but that ain't working.
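Concretely, the only safe shape I can find for such an inspection function
is something like this (sketch, assuming the usual partition-lock helper
macros):

    int         i;

    LWLockAcquire(ProcArrayLock, LW_SHARED);
    for (i = 0; i < NUM_LOCK_PARTITIONS; i++)
        LWLockAcquire(LockHashPartitionLockByIndex(i), LW_SHARED);

    /* ... only now can lockGroupLeader etc be examined consistently ... */

    for (i = NUM_LOCK_PARTITIONS; --i >= 0;)
        LWLockRelease(LockHashPartitionLockByIndex(i));
    LWLockRelease(ProcArrayLock);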

Maybe this is all okay, but it makes me nervous that the data structures
may have been over-optimized.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Removing Functionally Dependent GROUP BY Columns

2016-02-14 Thread David Rowley
On 14/02/2016 5:11 pm, "Tom Lane"  wrote:
>
> David Rowley  writes:
> > On 12/02/2016 12:01 am, "Tom Lane"  wrote:
> > I can't quite understand what you're seeing here.
>
> The second loop is iterating through the original GROUP BY list, so it
> will see again any outer Vars that were excluded by the first loop.
> It needs to exclude them exactly the same, because they are outside
> the scope of its data structures.  Consider something like (admittedly
> pretty silly, but legal SQL)
>
> create table up (u1 int, u2 int, u3 int);
> create table down (f1 int primary key, f2 int);
>
> select * from othertable, up
> where u1 in (select f2 from down group by f1, f2, up.u3);
>
> up.u3 would have varlevelsup = 1, varno = 2, varattno = 3.
> If you don't skip it then the surplusvars[var->varno] access
> will be trying to fetch off the end of the surplusvars[] array,
> because there is only one RTE in the subquery's rangetable
> though there are two in the outer query's rangetable.

Thanks for explaining this. Clearly I missed the case of the varno pointing
off the end of the array. Thanks for noticing and fixing.
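For reference, the guard being discussed looks roughly like this (a sketch
loosely following remove_useless_groupby_columns(); details from memory,
not verbatim from the committed code):

    foreach(lc, parse->groupClause)
    {
        SortGroupClause *sgc = (SortGroupClause *) lfirst(lc);
        TargetEntry *tle = get_sortgroupclause_tle(sgc, parse->targetList);
        Var        *var = (Var *) tle->expr;

        /*
         * Keep anything that is not a current-level Var: an outer Var's
         * varno indexes the outer query's rangetable, so
         * surplusvars[var->varno] is not ours to consult.
         */
        if (!IsA(var, Var) ||
            var->varlevelsup > 0 ||
            !bms_is_member(var->varattno - FirstLowInvalidHeapAttributeNumber,
                           surplusvars[var->varno]))
            new_groupby = lappend(new_groupby, sgc);
    }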


[HACKERS] Refactoring of receivelog.c

2016-02-14 Thread Magnus Hagander
I was working on adding the tar streaming functionality we talked about at
the developer meeting to pg_basebackup, and rapidly ran across the issue
that Andres has been complaining about for a while. The code in
receivelog.c just passes an insane number of parameters around. Adding or
changing even a small thing ends up touching a huge number of places.

Here's an attempt to refactor the code to instead pass around a control
structure. I think it's a definite win already, and we can't just keep
adding new functionality on top of the current interface.

I'll proceed to work on the actual functionality I was working on to go on
top of this separately, but would appreciate a review of this part
independently. It's mostly mechanical, but there may definitely be mistakes
- or thinkos in the whole idea...

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/
*** a/src/bin/pg_basebackup/pg_basebackup.c
--- b/src/bin/pg_basebackup/pg_basebackup.c
***
*** 372,381  typedef struct
  static int
  LogStreamerMain(logstreamer_param *param)
  {
! 	if (!ReceiveXlogStream(param->bgconn, param->startptr, param->timeline,
! 		   param->sysidentifier, param->xlogdir,
! 		   reached_end_position, standby_message_timeout,
! 		   NULL, false, true))
  
  		/*
  		 * Any errors will already have been reported in the function process,
--- 372,391 
  static int
  LogStreamerMain(logstreamer_param *param)
  {
! 	StreamCtl	stream;
! 
! 	MemSet(&stream, 0, sizeof(stream));
! 	stream.startpos = param->startptr;
! 	stream.timeline = param->timeline;
! 	stream.sysidentifier = param->sysidentifier;
! 	stream.stream_stop = reached_end_position;
! 	stream.standby_message_timeout = standby_message_timeout;
! 	stream.synchronous = false;
! 	stream.mark_done = true;
! 	stream.basedir = param->xlogdir;
! 	stream.partial_suffix = NULL;
! 
! 	if (!ReceiveXlogStream(param->bgconn, &stream))
  
  		/*
  		 * Any errors will already have been reported in the function process,
*** a/src/bin/pg_basebackup/pg_receivexlog.c
--- b/src/bin/pg_basebackup/pg_receivexlog.c
***
*** 276,285  FindStreamingStart(uint32 *tli)
  static void
  StreamLog(void)
  {
! 	XLogRecPtr	startpos,
! serverpos;
! 	TimeLineID	starttli,
! servertli;
  
  	/*
  	 * Connect in replication mode to the server
--- 276,286 
  static void
  StreamLog(void)
  {
! 	XLogRecPtr	serverpos;
! 	TimeLineID	servertli;
! 	StreamCtl	stream;
! 
! 	MemSet(&stream, 0, sizeof(stream));
  
  	/*
  	 * Connect in replication mode to the server
***
*** 311,327  StreamLog(void)
  	/*
  	 * Figure out where to start streaming.
  	 */
! 	startpos = FindStreamingStart(&starttli);
! 	if (startpos == InvalidXLogRecPtr)
  	{
! 		startpos = serverpos;
! 		starttli = servertli;
  	}
  
  	/*
  	 * Always start streaming at the beginning of a segment
  	 */
! 	startpos -= startpos % XLOG_SEG_SIZE;
  
  	/*
  	 * Start the replication
--- 312,328 
  	/*
  	 * Figure out where to start streaming.
  	 */
! 	stream.startpos = FindStreamingStart(&stream.timeline);
! 	if (stream.startpos == InvalidXLogRecPtr)
  	{
! 		stream.startpos = serverpos;
! 		stream.timeline = servertli;
  	}
  
  	/*
  	 * Always start streaming at the beginning of a segment
  	 */
! 	stream.startpos -= stream.startpos % XLOG_SEG_SIZE;
  
  	/*
  	 * Start the replication
***
*** 329,340  StreamLog(void)
  	if (verbose)
  		fprintf(stderr,
  _("%s: starting log streaming at %X/%X (timeline %u)\n"),
! progname, (uint32) (startpos >> 32), (uint32) startpos,
! starttli);
  
! 	ReceiveXlogStream(conn, startpos, starttli, NULL, basedir,
! 	  stop_streaming, standby_message_timeout, ".partial",
! 	  synchronous, false);
  
  	PQfinish(conn);
  	conn = NULL;
--- 330,346 
  	if (verbose)
  		fprintf(stderr,
  _("%s: starting log streaming at %X/%X (timeline %u)\n"),
! 		progname, (uint32) (stream.startpos >> 32), (uint32) stream.startpos,
! stream.timeline);
! 
! 	stream.stream_stop = stop_streaming;
! 	stream.standby_message_timeout = standby_message_timeout;
! 	stream.synchronous = synchronous;
! 	stream.mark_done = false;
! 	stream.basedir = basedir;
! 	stream.partial_suffix = ".partial";
  
! 	ReceiveXlogStream(conn, &stream);
  
  	PQfinish(conn);
  	conn = NULL;
*** a/src/bin/pg_basebackup/receivelog.c
--- b/src/bin/pg_basebackup/receivelog.c
***
*** 33,59  static XLogRecPtr lastFlushPosition = InvalidXLogRecPtr;
  
  static bool still_sending = true;		/* feedback still needs to be sent? */
  
! static PGresult *HandleCopyStream(PGconn *conn, XLogRecPtr startpos,
!  uint32 timeline, char *basedir,
! 			   stream_stop_callback stream_stop, int standby_message_timeout,
!  char *partial_suffix, XLogRecPtr *stoppos,
!  bool synchronous, bool mark_done);
  static int	CopyStreamPoll(PGconn *conn, long timeout_ms);
  static int	CopyStreamReceive(PGconn *conn, long t

Re: [HACKERS] pl/pgSQL, get diagnostics and big data

2016-02-14 Thread Christian Ullrich

* Andreas 'ads' Scherbaum wrote:


> Attached is a new version of the patch, with %lu replaced by %zu.
> I re-ran all the tests, especially the long test with 2^32+x rows, and
> it produces the same result as before.


To paraphrase Twain: "Sire, the Board finds this patch perfect in all 
the requirements and qualifications for inclusion into core, and doth 
hold his case open for decision after due examination by his committer."


The Windows issue is corrected, and all regression tests pass on Windows 
and FreeBSD. I can find no further fault with this patch.


Sorry it took so long, my other PostgreSQL issue happened just when I 
was going to test the updated patch.


--
Christian



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Freeze avoidance of very large table.

2016-02-14 Thread Robert Haas
On Sun, Feb 14, 2016 at 12:19 AM, Masahiko Sawada  wrote:
> Thank you for reviewing this patch.
> I've divided the 000 patch into two patches, and attached the latest 4
> patches in total.

Thank you!  I'll go through this again as soon as I have a free moment.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] extend pgbench expressions with functions

2016-02-14 Thread Robert Haas
On Sun, Feb 14, 2016 at 11:28 AM, Fabien COELHO  wrote:
> Here is a 3 part v29:
>
> a) add support for integer functions in pgbench, including turning
>    operators into functions, as well as some minimal infrastructure for
>    additional types (this allows minimizing the diffs with the next
>    patch, which adds doubles).
>
> b) add support for doubles, including setting double variables.
>    Note that variables are not explicitly typed. Add some
>    double-related functions, the most interesting of which for pgbench
>    are the random functions.
>
> c) remove \setrandom support (since, thanks to part b, \set x random(...)
>    does the same). An error pointing to the replacement is printed if
>    \setrandom is used in a pgbench script.

Thanks, I'll review these as soon as I can get to it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: [COMMITTERS] pgsql: Introduce group locking to prevent parallel processes from deadl

2016-02-14 Thread Robert Haas
On Sun, Feb 14, 2016 at 1:33 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> First, the overall concept here is that processes can either be a
>> member of a lock group or a member of no lock group.
>
> Check.
>
>> Second, I can review the data structure changes and the associated
>> invariants.
>
> The data structure additions seemed relatively straightforward, though
> I did wonder why you added groupLeader to PROCLOCK.  Why is that not
> redundant with tag.myProc->lockGroupLeader?  If it isn't redundant,
> it's inadequately documented.  If it is, seems better to lose it.

That's a good question.  There are locking considerations: the lock
manager lock for a given PROCLOCK isn't necessarily the same one that
protects tag.myProc->lockGroupLeader.  I can go and look back through
the contexts where we use the PROCLOCK field and see if we can get by
without this, but I suspect it's too much of a pain.

> Also, isn't the comment on this PGPROC field incorrect:
>
> PGPROC *lockGroupLeader;    /* lock group leader, if I'm a follower */
>
> ie shouldn't you s/follower/member/ ?

Yes, that would be better.

>> ... the lock manager lock that protects the fields in a
>> given PGPROC is chosen by taking pgprocno modulo the number of lock
>> manager partitions.
>
> pgprocno of the specific PGPROC, or of the group leader?  If it's
> the former, I'm pretty suspicious that there are deadlock and/or
> linked-list-corruption hazards here.  If it's the latter, at least
> the comments around this are misleading.

Leader.  The other way would be nuts.

>> Each PROCLOCK now has a new groupLeader flag.  While a PGPROC's
>> lockGroupLeader may be NULL if that process is not involved in group
>> locking, a PROCLOCK's groupLeader always has a non-NULL value; it
>> points back to the PGPROC that acquired the lock.
>
> This is not what the comment on it says; and your prose explanation
> here implies that it should actually be equal to tag.myProc, or else
> you are using some definition of "acquired the lock" that I don't
> follow at all.  There could be lots of PGPROCs that have acquired
> a lock in one sense or another; which one do you mean?

The PROCLOCK's groupLeader will be equal to
tag.myProc->lockGroupLeader unless that is NULL.  In that case it will
be equal to tag.myProc.  Or at least that's the intention, modulo
bugs.
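Expressed as an assertion, the intended invariant would be (sketch):

    Assert(proclock->groupLeader ==
           (proclock->tag.myProc->lockGroupLeader != NULL ?
            proclock->tag.myProc->lockGroupLeader :
            proclock->tag.myProc));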

>> With respect to pg_locks - and for that matter also pg_stat_activity -
>> I think you are right that improvement is needed.
>
> This is really the core of my concern at the moment.  I think that
> isolationtester is probably outright broken for any situation where the
> queries-under-test are being parallel executed, and so will be any other
> client that's trying to identify who blocks whom from pg_locks.

OK.

>> The simplest thing we could do to make that easier is, in
>> pg_stat_activity, have parallel workers advertise the PID that
>> launched them in a new field; and in pg_locks, have members of a lock
>> group advertise the leader's PID in a new field.
>
> That would be simple for us, but it would break every existing client-side
> query that tries to identify blockers/blockees; and not only would those
> queries need work but they would become very substantially more complex
> and slower (probably at least 4-way joins not 2-way).  We already know
> that isolationtester's query has performance problems in the buildfarm.
> I think more thought is needed here, and that this area should be
> considered a release blocker.  It's certainly a blocker for any thought
> of turning parallel query on by default.

I don't quite see why this would cause the joins to get more complex.
It's also not really clear what the alternative is.  You could show
all the locks under the leader's PID, but that's horribly confusing
because there might be duplicates, and because if somebody's waiting
you really want to be able to tell which process to target with gdb.
You can show everybody under their own PIDs as now, but then you can't
tell which processes are in groups together.  So I don't see what
there is except to show both values.  Maybe there's some whole-new way
of displaying this information that would be more clear but I don't
really see what it is.
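To illustrate, with a hypothetical leader_pid column in pg_locks (NULL for
a backend that is in no lock group; the column name is an assumption, not
committed behavior), a group-aware blocked-by query might be sketched as:

    SELECT w.pid AS waiting_pid,
           b.pid AS blocking_pid
    FROM pg_locks w
    JOIN pg_locks b
      ON b.locktype = w.locktype
     AND b.database IS NOT DISTINCT FROM w.database
     AND b.relation IS NOT DISTINCT FROM w.relation
     AND b.page IS NOT DISTINCT FROM w.page
     AND b.tuple IS NOT DISTINCT FROM w.tuple
     AND b.transactionid IS NOT DISTINCT FROM w.transactionid
    WHERE NOT w.granted
      AND b.granted
      AND coalesce(b.leader_pid, b.pid) <> coalesce(w.leader_pid, w.pid);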

>> I hope this helps and please let me know what more you think I should do.
>
> At the least you need to make another pass over the comments and README
> addition.  I highlighted above a few critical inaccuracies, and it's not
> hard to find a lot of less-critical typos in the comments in that commit.
> Transposing some of what you've written here into the README would likely
> be a good thing too.

I'm getting on a long plane flight shortly but will go back over this when
I can find some time to do that in an appropriately careful way.
Please feel free to make such corrections to comments and so forth as
may seem urgent to you in the meanwhile.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [COMMITTERS] pgsql: Introduce group locking to prevent parallel processes from deadl

2016-02-14 Thread Tom Lane
Robert Haas  writes:
> First, the overall concept here is that processes can either be a
> member of a lock group or a member of no lock group.

Check.

> Second, I can review the data structure changes and the associated
> invariants.

The data structure additions seemed relatively straightforward, though
I did wonder why you added groupLeader to PROCLOCK.  Why is that not
redundant with tag.myProc->lockGroupLeader?  If it isn't redundant,
it's inadequately documented.  If it is, seems better to lose it.

Also, isn't the comment on this PGPROC field incorrect:

PGPROC *lockGroupLeader;    /* lock group leader, if I'm a follower */

ie shouldn't you s/follower/member/ ?

> ... the lock manager lock that protects the fields in a
> given PGPROC is chosen by taking pgprocno modulo the number of lock
> manager partitions.

pgprocno of the specific PGPROC, or of the group leader?  If it's
the former, I'm pretty suspicious that there are deadlock and/or
linked-list-corruption hazards here.  If it's the latter, at least
the comments around this are misleading.

> Each PROCLOCK now has a new groupLeader flag.  While a PGPROC's
> lockGroupLeader may be NULL if that process is not involved in group
> locking, a PROCLOCK's groupLeader always has a non-NULL value; it
> points back to the PGPROC that acquired the lock.

This is not what the comment on it says; and your prose explanation
here implies that it should actually be equal to tag.myProc, or else
you are using some definition of "acquired the lock" that I don't
follow at all.  There could be lots of PGPROCs that have acquired
a lock in one sense or another; which one do you mean?

> With respect to pg_locks - and for that matter also pg_stat_activity -
> I think you are right that improvement is needed.

This is really the core of my concern at the moment.  I think that
isolationtester is probably outright broken for any situation where the
queries-under-test are being parallel executed, and so will be any other
client that's trying to identify who blocks whom from pg_locks.

> The simplest thing we could do to make that easier is, in
> pg_stat_activity, have parallel workers advertise the PID that
> launched them in a new field; and in pg_locks, have members of a lock
> group advertise the leader's PID in a new field.

That would be simple for us, but it would break every existing client-side
query that tries to identify blockers/blockees; and not only would those
queries need work but they would become very substantially more complex
and slower (probably at least 4-way joins not 2-way).  We already know
that isolationtester's query has performance problems in the buildfarm.
I think more thought is needed here, and that this area should be
considered a release blocker.  It's certainly a blocker for any thought
of turning parallel query on by default.

> I hope this helps and please let me know what more you think I should do.

At the least you need to make another pass over the comments and README
addition.  I highlighted above a few critical inaccuracies, and it's not
hard to find a lot of less-critical typos in the comments in that commit.
Transposing some of what you've written here into the README would likely
be a good thing too.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: enhancing slow query log, and autoexplain log about waiting on lock before query exec time

2016-02-14 Thread Pavel Stehule
2016-02-14 17:46 GMT+01:00 Tom Lane :

> Pavel Stehule  writes:
> > We have a patch that injects log entries about time spent waiting on locks
> > before query execution. This feature helps us a lot, and I hope it can be
> > generally useful.
>
> Doesn't log_lock_waits cover that territory already?
>

It does. But it creates a separate log entry, and it can be hard to join a
slow query with a log entry that may appear many lines earlier. This proposal
is about getting the important information in one convenient place, which
makes log parsing and processing simpler.
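For illustration, today the two pieces of information arrive as separate
entries that have to be correlated by PID and timestamp (formats
approximate):

    LOG:  process 12345 still waiting for ShareLock on transaction 678 after 1000.123 ms
    LOG:  process 12345 acquired ShareLock on transaction 678 after 8456.390 ms
    LOG:  duration: 9520.331 ms  statement: UPDATE accounts SET abalance = ...

The proposal would attach the lock-wait time to the slow-query entry itself.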

Regards

Pavel


> regards, tom lane
>


Re: [HACKERS] Bool btree_gin index not chosen on equality constraint, but on greater/lower?

2016-02-14 Thread Tom Lane
Patric Bechtel  writes:
> I tried to add bool support to the btree_gin contrib module, and as far as I
> can tell, it seems to work (wasn't that complicated, actually).
> But now I'm stuck, as PostgreSQL doesn't seem to want to use my new index
> if I use equality or inequality, only with greater than and less than.

I think your problem is that the planner won't apply
match_boolean_index_clause() or expand_boolean_index_clause(),
because it has a hard-wired idea of which index opclasses could
possibly benefit from that, cf IsBooleanOpfamily().

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: enhancing slow query log, and autoexplain log about waiting on lock before query exec time

2016-02-14 Thread Tom Lane
Pavel Stehule  writes:
> We have a patch that injects log entries about time spent waiting on locks
> before query execution. This feature helps us a lot, and I hope it can be
> generally useful.

Doesn't log_lock_waits cover that territory already?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] extend pgbench expressions with functions

2016-02-14 Thread Fabien COELHO


Hello Michaël,


>> I'll be happy if you do the review of the resulting split.


> OK, I am fine with this scenario as well. I have luckily done nothing yet.


Here is a 3 part v29:

a) add support for integer functions in pgbench, including turning
   operators into functions, as well as some minimal infrastructure for
   additional types (this allows minimizing the diffs with the next
   patch, which adds doubles).

b) add support for doubles, including setting double variables.
   Note that variables are not explicitly typed. Add some
   double-related functions, the most interesting of which for pgbench
   are the random functions.

c) remove \setrandom support (since, thanks to part b, \set x random(...)
   does the same). An error pointing to the replacement is printed if
   \setrandom is used in a pgbench script.
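For example, with all three parts applied, a script fragment like this (a
sketch; function names as per the documentation table in the patch)
replaces the old \setrandom:

    \set aid random(1, 100000 * :scale)
    \set delta random(-5000, 5000)
    \set lo min(:aid, 50000)
    SELECT :aid, :delta, :lo;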

--
Fabien.

diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index ade1b53..9d5eb32 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -798,8 +798,11 @@ pgbench  options  dbname
   The expression may contain integer constants such as 5432,
   references to variables :variablename,
   and expressions composed of unary (-) or binary operators
-  (+, -, *, /, %)
-  with their usual associativity, and parentheses.
+  (+, -, *, /,
+  %) with their usual associativity, function calls and
+  parentheses.
+   shows the available
+  functions.
  
 
  
@@ -965,6 +968,52 @@ f(x) = exp(-parameter * (x - min) / (max - min + 1)) / (1.0 - exp(-parameter))

   
 
+   
+   
+PgBench Functions
+
+ 
+  
+   Function
+   Return Type
+   Description
+   Example
+   Result
+  
+ 
+ 
+  
+   abs(a)
+   same as a
+   integer value
+   abs(-17)
+   17
+  
+  
+   debug(a)
+   same as a
+   print to stderr the given argument
+   debug(5432)
+   5432
+  
+  
+   max(i [, ... ] )
+   integer
+   maximum value
+   max(5, 4, 3, 2)
+   5
+  
+  
+   min(i [, ... ] )
+   integer
+   minimum value
+   min(5, 4, 3, 2)
+   2
+  
+ 
+ 
+   
+
   
As an example, the full definition of the built-in TPC-B-like
transaction is:
diff --git a/src/bin/pgbench/exprparse.y b/src/bin/pgbench/exprparse.y
index 06ee04b..93c6173 100644
--- a/src/bin/pgbench/exprparse.y
+++ b/src/bin/pgbench/exprparse.y
@@ -16,10 +16,13 @@
 
 PgBenchExpr *expr_parse_result;
 
+static PgBenchExprList *make_elist(PgBenchExpr *exp, PgBenchExprList *list);
 static PgBenchExpr *make_integer_constant(int64 ival);
 static PgBenchExpr *make_variable(char *varname);
-static PgBenchExpr *make_op(char operator, PgBenchExpr *lexpr,
+static PgBenchExpr *make_op(const char *operator, PgBenchExpr *lexpr,
 		PgBenchExpr *rexpr);
+static int find_func(const char *fname);
+static PgBenchExpr *make_func(const int fnumber, PgBenchExprList *args);
 
 %}
 
@@ -31,13 +34,15 @@ static PgBenchExpr *make_op(char operator, PgBenchExpr *lexpr,
 	int64		ival;
 	char	   *str;
 	PgBenchExpr *expr;
+	PgBenchExprList *elist;
 }
 
+%type  elist
 %type  expr
-%type  INTEGER
-%type  VARIABLE
+%type  INTEGER function
+%type  VARIABLE FUNCTION
 
-%token INTEGER VARIABLE
+%token INTEGER VARIABLE FUNCTION
 %token CHAR_ERROR /* never used, will raise a syntax error */
 
 /* Precedence: lowest to highest */
@@ -49,16 +54,25 @@ static PgBenchExpr *make_op(char operator, PgBenchExpr *lexpr,
 
 result: expr{ expr_parse_result = $1; }
 
+elist:  	{ $$ = NULL; }
+	| expr 	{ $$ = make_elist($1, NULL); }
+	| elist ',' expr		{ $$ = make_elist($3, $1); }
+	;
+
 expr: '(' expr ')'			{ $$ = $2; }
 	| '+' expr %prec UMINUS	{ $$ = $2; }
-	| '-' expr %prec UMINUS	{ $$ = make_op('-', make_integer_constant(0), $2); }
-	| expr '+' expr			{ $$ = make_op('+', $1, $3); }
-	| expr '-' expr			{ $$ = make_op('-', $1, $3); }
-	| expr '*' expr			{ $$ = make_op('*', $1, $3); }
-	| expr '/' expr			{ $$ = make_op('/', $1, $3); }
-	| expr '%' expr			{ $$ = make_op('%', $1, $3); }
+	| '-' expr %prec UMINUS	{ $$ = make_op("-", make_integer_constant(0), $2); }
+	| expr '+' expr			{ $$ = make_op("+", $1, $3); }
+	| expr '-' expr			{ $$ = make_op("-", $1, $3); }
+	| expr '*' expr			{ $$ = make_op("*", $1, $3); }
+	| expr '/' expr			{ $$ = make_op("/", $1, $3); }
+	| expr '%' expr			{ $$ = make_op("%", $1, $3); }
 	| INTEGER{ $$ = make_integer_constant($1); }
 	| VARIABLE { $$ = make_variable($1); }
+	| function '(' elist ')'{ $$ = make_func($1, $3); }
+	;
+
+function: FUNCTION			{ $$ = find_func($1); pg_free($1); }
 	;
 
 %%
@@ -68,8 +82,9 @@ make_integer_constant(int64 ival)
 {
 	PgBenchExpr *expr = pg_malloc(sizeof(PgBenchExpr));
 
-	expr->etype = ENODE_INTEGER_CONSTANT;
-	expr->u.integer_constant.ival = ival;
+	expr->etype = ENODE_CONSTANT;
+	expr->u.constant.type = PGBT_INT;
+	expr->u.constant.u.ival = ival;
 	return expr;
 }
 
@@ -84,14 +99,128 

Re: [HACKERS] Removing Functionally Dependent GROUP BY Columns

2016-02-14 Thread Tom Lane
David Rowley  writes:
> On 12/02/2016 12:01 am, "Tom Lane"  wrote:
>> Um, AFAICS, you *do* need to check again in the second loop, else you'll
>> be accessing a surplusvars[] entry that might not exist at all, and in
>> any case might falsely tell you that you can exclude the outer var from
>> the new GROUP BY list.

> I can't quite understand what you're seeing here.

The second loop is iterating through the original GROUP BY list, so it
will see again any outer Vars that were excluded by the first loop.
It needs to exclude them exactly the same, because they are outside
the scope of its data structures.  Consider something like (admittedly
pretty silly, but legal SQL)

create table up (u1 int, u2 int, u3 int);
create table down (f1 int primary key, f2 int);

select * from othertable, up
where u1 in (select f2 from down group by f1, f2, up.u3);

up.u3 would have varlevelsup = 1, varno = 2, varattno = 3.
If you don't skip it then the surplusvars[var->varno] access
will be trying to fetch off the end of the surplusvars[] array,
because there is only one RTE in the subquery's rangetable
though there are two in the outer query's rangetable.

When I trace through this example, it manages not to crash because
in point of fact the outer Var has already been replaced by a Param.
But I don't think this code should be assuming that; it's an artifact
of the detailed order of processing in subquery_planner(), and could
easily change in the future, for example if we tried to move the
whole affair to the jointree preprocessing stage as we discussed earlier.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Defaults for replication/backup

2016-02-14 Thread Magnus Hagander
On Sun, Feb 14, 2016 at 2:00 AM, Robert Haas  wrote:

> On Sat, Feb 13, 2016 at 11:31 AM, Andres Freund 
> wrote:
> > On 2016-02-13 11:10:58 -0500, Tom Lane wrote:
> >> Magnus Hagander  writes:
> >> > The big thing is, IIRC, that we turn off the optimizations in
> >> > min_wal_level. *most* people will see no impact of their regular
> >> > application runtime from that, but it might definitely have an effect
> on
> >> > data loads and such. For normal runtime, there should be very close
> to zero
> >> > difference, no?
> >>
> >> I was asking for a demonstration of that, not just handwaving.  Even if
> >> it was measured years ago, I wouldn't assume the comparison would be
> >> the same on current Postgres.
> >
> > Well, let's look at what the difference between wal_level's are:
> > 1) the (currently broken) optimization of not WAL logging COPY et al,
> >for newly created relations.
> > 2) relation AccessExclusiveLocks are WAL logged on >= hot_standby
> > 3) Subtransaction assignment records are generated for >= hot_standby
> >after 64 records.
> > 4) checkpoints and bgwriter occasionally generate XLOG_RUNNING_XACTS
> >records
> > 5) btreevacuum() and _bt_getbuf() sometimes do additional WAL logging on
> >>= hot_standby
> > 6) Once per vacuum we issue a XLOG_HEAP2_CLEANUP_INFO
> >
> >
> > 1) obviously can have performance impact; but only in a relatively
> > narrow set of cases. I doubt any of the others really play that major a
> > role.  But I really think minor efficiency differences are besides the
> > point. Making backups and replication easier has a far bigger positive
> > impact, for far more users.
>
> I think that there are definitely people for whom #1 is an issue, but
> maybe that's not a sufficient reason not to change the default.
>

There definitely are people. I'd say most of those would already be tuning
their config in other ways as well, so changing it is a lot lower cost for
them. And there are fewer of them.



> As a thought experiment, how about teaching initdb how to tailor the
> configuration to a couple of common scenarios via some new flag?  I'll
> call it --setup although that's probably not best.
>
> --setup=replication   --- Preconfigure for replication.
> --setup=standalone  --- Preconfigure for standalone mode.
> --setup=ephemeral  --- Preconfigure for an ephemeral instance that
> doesn't need durability because we'll blow it up soon.
>
> Whichever mode we make the default, I think this kind of thing would
> make life easier for users.
>

I'd like to reiterate that this is not just about replication, it's even
more about decent backups. As soon as your database grows to the point that
pg_dump is not a great solution for backup (and that happens pretty quickly,
given the performance of pg_restore), the response is "oh, go change these 3
parameters in your config and then restart the db, disconnecting all your
users", which gets interesting reactions from people...

I could go with something like
--setup=small
--setup=normal
--setup=ephemeral

which would be a more proper mapping I think. Of course, this would also
give us the ability to bikeshed about three different colors, one in each
predefined set! :)

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


[HACKERS] Bool btree_gin index not chosen on equality constraint, but on greater/lower?

2016-02-14 Thread Patric Bechtel
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Hi,

I tried to add bool support to the btree_gin contrib module, and as far as I
can tell, it seems to work (wasn't that complicated, actually).

But now I'm stuck, as PostgreSQL doesn't seem to want to use my new index if I
use equality or inequality, only with greater than and less than.

My test subject is a table with 13690993 rows; one of its columns (bar) is a
boolean, 376442 rows are true, the others false, no nulls. The index on bar is
a btree_gin index. The table is vacuum analyzed and all, so the statistics are
fresh and usable, as the estimates within the plans show.

Here's the plan if I ask for the rows where bar is true, as in "select id
from foo where bar":

 Seq Scan on foo  (cost=0.00..684709.82 rows=385495 width=8) (actual 
time=0.014..2657.326
rows=376442 loops=1)
   Filter: bar
   Rows Removed by Filter: 13314551
 Planning time: 0.309 ms
 Execution time: 2672.559 ms

But if I query "select id from foo where bar>'f'":

 Bitmap Heap Scan on foo  (cost=7955.59..313817.94 rows=385495 width=8) (actual
time=220.631..365.299 rows=376442 loops=1)
   Recheck Cond: (bar > false)
   Heap Blocks: exact=104100
   ->  Bitmap Index Scan on ix_foo_gin  (cost=0.00..7859.21 rows=385495 
width=0) (actual
time=193.192..193.192 rows=376442 loops=1)
 Index Cond: (bar > false)
 Planning time: 0.400 ms
 Execution time: 377.518 ms

It starts using the index. The rule seems to be: as long as I'm using
<, <=, >= or >, it chooses the index. If I use = or !=, it doesn't.

Here's my definition of the bool_ops for the gin index (it's very similar to
the other indexes in the btree_gin extension):

CREATE OPERATOR CLASS bool_ops
DEFAULT FOR TYPE bool USING gin
AS
    OPERATOR    1   <,
    OPERATOR    2   <=,
    OPERATOR    3   =,
    OPERATOR    4   >=,
    OPERATOR    5   >,
    FUNCTION    1   btboolcmp(bool, bool),
    FUNCTION    2   gin_extract_value_bool(bool, internal),
    FUNCTION    3   gin_extract_query_bool(bool, internal, int2, internal, internal),
    FUNCTION    4   gin_btree_consistent(internal, int2, anyelement, int4, internal, internal),
    FUNCTION    5   gin_compare_prefix_bool(bool, bool, int2, internal),
STORAGE bool;

What am I overlooking?

- -- 
Patric
-BEGIN PGP SIGNATURE-
Version: GnuPG v2.0.22 (GNU/Linux)
Comment: GnuPT 2.5.2

iEYEARECAAYFAlbAdg0ACgkQfGgGu8y7ypBHZwCg0g1JSgZTc0OBYsMzrj6w4Zy6
DTQAn38gk8hfqFf86N8hWEzwqc9afjar
=SLMC
-END PGP SIGNATURE-


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] extend pgbench expressions with functions

2016-02-14 Thread Michael Paquier
On Sun, Feb 14, 2016 at 4:42 PM, Fabien COELHO  wrote:
>
>> So I would be fine to do a portion of the legwork and extract from this
>> patch something smaller that adds only functions as a first step, with the
>> minimum set of functions I mentioned upthread. Robert, Alvaro, Fabien, does
>> that sound fine to you?
>
>
> Thanks, but this is my c*, I have a few hours of travelling this evening,
> I'll do it then.
>
> I'll be happy if you do the review of the resulting split.

OK, I am fine with this scenario as well. I have luckily done nothing yet.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Removing Functionally Dependent GROUP BY Columns

2016-02-14 Thread David Rowley
On 12/02/2016 12:01 am, "Tom Lane"  wrote:
>
> David Rowley  writes:
> > [ prune_group_by_clause_ab4f491_2016-01-23.patch ]
> > [ check_functional_grouping_refactor.patch ]
>
> I've committed this with mostly-cosmetic revisions (principally, rewriting
> a lot of the comments, which seemed on the sloppy side).

Many thanks for committing this and improving the comments.

> >> * Both of the loops iterating over the groupClause neglect to check
> >> varlevelsup, thus leading to assert failures or worse if an outer-level
> >> Var is present in the GROUP BY list.  (I'm pretty sure outer Vars can
> >> still be present at this point, though I might be wrong.)
>
> > Fixed in the first loop, and the way I've rewritten the code to use
> > bms_difference, there's no need to check again in the 2nd loop.
>
> Um, AFAICS, you *do* need to check again in the second loop, else you'll
> be accessing a surplusvars[] entry that might not exist at all, and in
> any case might falsely tell you that you can exclude the outer var from
> the new GROUP BY list.
>

I can't quite understand what you're seeing here. As far as I can tell from
reading the code again (on my phone) the varlevelsup check in the 2nd loop is
not required. Any Var with varlevelsup above zero won't make it into the
surplus bitmap for the relation, as the bms_difference call just removes the
PK vars from the varlevelsup = 0 vars. So the bms_is_member check should be
fine. I can't see what I'm missing. In either case the test does no harm.

> That was the only actual bug I found, though I changed some other stuff.


[HACKERS] proposal: enhancing slow query log, and autoexplain log about waiting on lock before query exec time

2016-02-14 Thread Pavel Stehule
Hi,

the interpretation of slow queries or entries from the auto-explain log can
be difficult at times, because a large part of the query's elapsed time is
spent waiting on a lock, and that number isn't in the related entry. Our
situation is a little difficult because we don't have direct access to the
PostgreSQL logs; we work with log aggregators.

We have a patch that injects log entries about time spent waiting on locks
before query execution. This feature helps us a lot, and I hope it can be
generally useful.

Is this feature interesting for community? What do you think about it?

Regards

Pavel