Re: [HACKERS] VS 2015 support in src/tools/msvc

2016-04-18 Thread Michael Paquier
On Tue, Apr 19, 2016 at 2:42 PM, Alvaro Herrera
 wrote:
> Michael Paquier wrote:
>> On Mon, Apr 11, 2016 at 3:25 PM, Michael Paquier
>>  wrote:
>> > Now, I have produced two patches:
>> > - 0001-Support-for-VS-2015-locale-hack.patch, which makes use of
>> > __crt_locale_data_public in ucrt/corecrt.h. This is definitely an ugly
>> > hack, though I am coming to think that this may be the approach that
>> > would us the less harm, and that's closer to what is done for VS 2012
>> > and 2013.
>> > - 0001-Support-for-VS-2015-getlocaleinfoex.patch, which make use of
>> > GetLocaleInfoEx, this requires us to lower a bit the support grid for
>> > Windows, basically that's throwing support for XP if compilation is
>> > done with VS 2015.
>> > Based on my tests, both are working with short and long local names,
>> > testing via initdb --locale.
>>
>> The first patch is actually not what I wanted to send. Here are the
>> correct ones...
>
> This thread seems to have stalled.  I thought we were going to consider
> these patches for 9.6.  Should we simply push them to see what the
> buildfarm thinks?

We first need to choose which one we'd like to have; it would be great
to get some input from Andrew, or even Magnus, who I thought would get
the last word here.

> Has anyone other than Michael actually tested it in VS2015?

It doesn't seem to be the case...
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] VS 2015 support in src/tools/msvc

2016-04-18 Thread Alvaro Herrera
Michael Paquier wrote:
> On Mon, Apr 11, 2016 at 3:25 PM, Michael Paquier
>  wrote:
> > Now, I have produced two patches:
> > - 0001-Support-for-VS-2015-locale-hack.patch, which makes use of
> > __crt_locale_data_public in ucrt/corecrt.h. This is definitely an ugly
> > hack, though I am coming to think that this may be the approach that
> > would do us the least harm, and it is closer to what is done for VS
> > 2012 and 2013.
> > - 0001-Support-for-VS-2015-getlocaleinfoex.patch, which makes use of
> > GetLocaleInfoEx. This requires us to narrow the Windows support grid a
> > bit: basically, it drops support for XP when compilation is done with
> > VS 2015.
> > Based on my tests, both are working with short and long locale names,
> > tested via initdb --locale.
> 
> The first patch is actually not what I wanted to send. Here are the
> correct ones...

This thread seems to have stalled.  I thought we were going to consider
these patches for 9.6.  Should we simply push them to see what the
buildfarm thinks?  Has anyone other than Michael actually tested it in
VS2015?

-- 
Álvaro Herrera    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Optimization for updating foreign tables in Postgres FDW

2016-04-18 Thread Etsuro Fujita

On 2016/04/19 13:59, Michael Paquier wrote:

On Tue, Apr 19, 2016 at 1:14 PM, Etsuro Fujita
 wrote:

What do you think about that?



+   /* Wait for the result */
+   res = pgfdw_get_result(conn, query);
+   if (res == NULL)
+   pgfdw_report_error(ERROR, NULL, conn, false, query);
+   last_res = res;
+
+   /*
+* Verify that there are no more results
+*
+* We don't use a PG_TRY block here, so be careful not to throw error
+* without releasing the PGresult.
+*/
+   res = pgfdw_get_result(conn, query);
+   if (res != NULL)
+   {
+   PQclear(last_res);
+   pgfdw_report_error(ERROR, res, conn, true, query);
+   }

But I have a strong objection to that, because it weakens the current
logic postgres_fdw is based on: PQexec returns the last result to the
caller. I'd rather not break that logic, for the sake of 9.6's
stability.


IIUC, I think each query submitted by PQexec in postgres_fdw.c contains 
just a single command.  Maybe I'm missing something, though.



An even better proof of that is the following, which just emulates what
your version of pgfdw_get_result is doing when consuming the results.
+   /* Verify that there are no more results */
+   res = pgfdw_get_result(fmstate->conn, fmstate->query);
+   if (res != NULL)
+   pgfdw_report_error(ERROR, res, fmstate->conn, true, fmstate->query);
This could even lead to incorrect errors in the future if multiple
queries are combined with those DMLs for one reason or another.


I'd like to leave such enhancements for future work...

Thanks for the comment!

Best regards,
Etsuro Fujita






Re: [HACKERS] snapshot too old, configured by time

2016-04-18 Thread Michael Paquier
On Tue, Apr 19, 2016 at 3:14 AM, Tom Lane  wrote:
> Or in short: this is a whole lot further than I'm prepared to go to
> satisfy one customer with a badly-designed application.  And from what
> I can tell from the Feb 2015 discussion, that's what this has been
> written for.

This holds true. I imagine that a lot of people, at least on this list,
have already spent some time tracking down long-running transactions in
someone's application and then tuned the application so that bloat is
reduced and things perform better for other, shorter transactions,
without the need for this feature.
-- 
Michael




Re: [HACKERS] Optimization for updating foreign tables in Postgres FDW

2016-04-18 Thread Michael Paquier
On Tue, Apr 19, 2016 at 1:14 PM, Etsuro Fujita
 wrote:
> The comment "We don't use a PG_TRY block here ..." seems to be wrongly
> placed, so I moved it.  Also, I think it'd be better to call
> pgfdw_report_error() with the clear argument set to false, not true, since
> we don't need to clear the PGresult.  Same for postgresExecForeignUpdate,
> postgresExecForeignDelete, and prepare_foreign_modify.

No objection to moving those comment blocks to where pgfdw_get_result is called.

> What do you think about that?

+   /* Wait for the result */
+   res = pgfdw_get_result(conn, query);
+   if (res == NULL)
+   pgfdw_report_error(ERROR, NULL, conn, false, query);
+   last_res = res;
+
+   /*
+* Verify that there are no more results
+*
+* We don't use a PG_TRY block here, so be careful not to throw error
+* without releasing the PGresult.
+*/
+   res = pgfdw_get_result(conn, query);
+   if (res != NULL)
+   {
+   PQclear(last_res);
+   pgfdw_report_error(ERROR, res, conn, true, query);
+   }

But I have a strong objection to that, because it weakens the current
logic postgres_fdw is based on: PQexec returns the last result to the
caller. I'd rather not break that logic, for the sake of 9.6's
stability.

An even better proof of that is the following, which just emulates what
your version of pgfdw_get_result is doing when consuming the results.
+   /* Verify that there are no more results */
+   res = pgfdw_get_result(fmstate->conn, fmstate->query);
+   if (res != NULL)
+   pgfdw_report_error(ERROR, res, fmstate->conn, true, fmstate->query);
This could even lead to incorrect errors in the future if multiple
queries are combined with those DMLs for one reason or another.
-- 
Michael




Re: [HACKERS] Optimization for updating foreign tables in Postgres FDW

2016-04-18 Thread Etsuro Fujita

On 2016/04/19 12:45, Etsuro Fujita wrote:

On 2016/04/19 12:26, Michael Paquier wrote:

Note for Robert: pgfdw_get_result copycats PQexec by discarding all
PQresult received except the last one. I think that's fine for the
purposes of postgres_fdw, but perhaps you have a different opinion on
the matter.



That seemed reasonable to me, but sorry, on second thought, I'm not sure
it's still a good idea.  One reason is that I think it's better for the
functions in postgres_fdw.c that use pgfdw_get_result to verify, each on
its own, that there are no more results; I think that would improve the
robustness of those functions.  Another reason is that I don't think
pgfdw_report_error, which is used in pgfdw_get_result, works well for
cases where the query contains multiple SQL commands.  So, +1 for the
idea of simply encapsulating the while (PQisBusy(...)) loop into a new
function, pgfdw_get_result().


Here is a proposed patch for that.

Other changes:

 * We don't use a PG_TRY block here, so be careful not to throw error
 * without releasing the PGresult.
 */
-   res = PQexecPrepared(fmstate->conn,
-fmstate->p_name,
-fmstate->p_nums,
-p_values,
-NULL,
-NULL,
-0);
+   if (!PQsendQueryPrepared(fmstate->conn,
+fmstate->p_name,
+fmstate->p_nums,
+p_values,
+NULL,
+NULL,
+0))
+   pgfdw_report_error(ERROR, NULL, fmstate->conn, true, 
fmstate->query);

The comment "We don't use a PG_TRY block here ..." seems to be wrongly
placed, so I moved it.  Also, I think it'd be better to call
pgfdw_report_error() with the clear argument set to false, not true,
since we don't need to clear the PGresult.  Same for
postgresExecForeignUpdate, postgresExecForeignDelete, and
prepare_foreign_modify.


What do you think about that?

Best regards,
Etsuro Fujita
*** a/contrib/postgres_fdw/connection.c
--- b/contrib/postgres_fdw/connection.c
***
*** 17,22 
--- 17,23 
  #include "access/xact.h"
  #include "mb/pg_wchar.h"
  #include "miscadmin.h"
+ #include "storage/latch.h"
  #include "utils/hsearch.h"
  #include "utils/memutils.h"
  
***
*** 448,453  GetPrepStmtNumber(PGconn *conn)
--- 449,537 
  }
  
  /*
+  * Send a query using PQsendQuery, and wait for the results.
+  *
+  * This function is interruptible by signals.
+  *
+  * Note: we assume that the query string contains a single SQL command.
+  */
+ PGresult *
+ pgfdw_exec_query(PGconn *conn, const char *query)
+ {
+ 	PGresult   *res;
+ 	PGresult   *last_res = NULL;
+ 
+ 	/*
+ 	 * Submit a query.  Since we don't use non-blocking mode, this also can
+ 	 * block.  But its risk is relatively small, so we ignore that for now.
+ 	 */
+ 	if (!PQsendQuery(conn, query))
+ 		pgfdw_report_error(ERROR, NULL, conn, false, query);
+ 
+ 	/* Wait for the result */
+ 	res = pgfdw_get_result(conn, query);
+ 	if (res == NULL)
+ 		pgfdw_report_error(ERROR, NULL, conn, false, query);
+ 	last_res = res;
+ 
+ 	/*
+ 	 * Verify that there are no more results
+ 	 *
+ 	 * We don't use a PG_TRY block here, so be careful not to throw error
+ 	 * without releasing the PGresult.
+ 	 */
+ 	res = pgfdw_get_result(conn, query);
+ 	if (res != NULL)
+ 	{
+ 		PQclear(last_res);
+ 		pgfdw_report_error(ERROR, res, conn, true, query);
+ 	}
+ 
+ 	return last_res;
+ }
+ 
+ /*
+  * Wait for the next result from a prior asynchronous execution function call,
+  * and return it.
+  *
+  * This function stays responsive by checking for interrupts.
+  *
+  * Caller is responsible for the error handling on the fetched result.
+  *
+  * Note: we assume that the query string contains a single SQL command.
+  */
+ PGresult *
+ pgfdw_get_result(PGconn *conn, const char *query)
+ {
+ 	/*
+ 	 * Receive data until PQgetResult is ready to get the result without
+ 	 * blocking.
+ 	 */
+ 	while (PQisBusy(conn))
+ 	{
+ 		int		wc;
+ 
+ 		/* Sleep until there's something to do */
+ 		wc = WaitLatchOrSocket(MyLatch,
+ 			   WL_LATCH_SET | WL_SOCKET_READABLE,
+ 			   PQsocket(conn),
+ 			   -1L);
+ 		ResetLatch(MyLatch);
+ 
+ 		CHECK_FOR_INTERRUPTS();
+ 
+ 		/* Data available in socket */
+ 		if (wc & WL_SOCKET_READABLE)
+ 		{
+ 			if (!PQconsumeInput(conn))
+ pgfdw_report_error(ERROR, NULL, conn, false, query);
+ 		}
+ 	}
+ 
+ 	return PQgetResult(conn);
+ }
+ 
+ /*
   * Report an error we got from the remote 

[HACKERS] Re: [COMMITTERS] pgsql: Add trigonometric functions that work in degrees.

2016-04-18 Thread Noah Misch
On Mon, Apr 18, 2016 at 11:56:55PM -0400, Tom Lane wrote:
> Noah Misch  writes:
> > On Mon, Apr 18, 2016 at 10:22:59PM -0400, Tom Lane wrote:
> >> We could alternatively set extra_float_digits to its max value and hope
> >> that off-by-one-in-the-last-place values would get printed as something
> >> visibly different from the exact result.  I'm not sure I want to trust
> >> that that works reliably; but maybe it would be worth printing the
> >> result both ways, just to provide additional info when there's a failure.
> 
> > We'd have an independent problem if extra_float_digits=3 prints the same
> > digits for distinguishable float values, so I wouldn't mind relying on it 
> > not
> > to do that.  But can we expect the extra_float_digits=3 representation of
> > those particular values to be the same for every implementation?
> 
> Hm?  The expected answer is exact (30, 45, or whatever) in each case.
> If we get some residual low-order digits then it's a failure, so we don't
> need to worry about whether it's the same failure everywhere.

Does something forbid snprintf implementations from printing '45'::float8 as
45.0001 under extra_float_digits=3?




Re: [HACKERS] [COMMITTERS] pgsql: Add trigonometric functions that work in degrees.

2016-04-18 Thread Tom Lane
Noah Misch  writes:
> On Mon, Apr 18, 2016 at 10:22:59PM -0400, Tom Lane wrote:
>> We could alternatively set extra_float_digits to its max value and hope
>> that off-by-one-in-the-last-place values would get printed as something
>> visibly different from the exact result.  I'm not sure I want to trust
>> that that works reliably; but maybe it would be worth printing the
>> result both ways, just to provide additional info when there's a failure.

> We'd have an independent problem if extra_float_digits=3 prints the same
> digits for distinguishable float values, so I wouldn't mind relying on it not
> to do that.  But can we expect the extra_float_digits=3 representation of
> those particular values to be the same for every implementation?

Hm?  The expected answer is exact (30, 45, or whatever) in each case.
If we get some residual low-order digits then it's a failure, so we don't
need to worry about whether it's the same failure everywhere.

regards, tom lane




Re: [HACKERS] Optimization for updating foreign tables in Postgres FDW

2016-04-18 Thread Etsuro Fujita

On 2016/04/19 12:26, Michael Paquier wrote:

On Tue, Apr 19, 2016 at 12:16 PM, Noah Misch  wrote:

On Sat, Apr 16, 2016 at 08:59:40AM +0900, Michael Paquier wrote:



Here is a new version. I just recalled that I forgot a PQclear() call
to clean up results.


Thanks for updating the patch!


Robert, the deadline to fix this open item expired eleven days ago.  The
thread had been seeing regular activity, but it has now been quiet for three
days.  Do you have an updated plan for fixing this open item?



Note for Robert: pgfdw_get_result copycats PQexec by discarding all
PQresult received except the last one. I think that's fine for the
purposes of postgres_fdw, but perhaps you have a different opinion on
the matter.


That seemed reasonable to me, but sorry, on second thought, I'm not sure
it's still a good idea.  One reason is that I think it's better for the
functions in postgres_fdw.c that use pgfdw_get_result to verify, each on
its own, that there are no more results; I think that would improve the
robustness of those functions.  Another reason is that I don't think
pgfdw_report_error, which is used in pgfdw_get_result, works well for
cases where the query contains multiple SQL commands.  So, +1 for the
idea of simply encapsulating the while (PQisBusy(...)) loop into a new
function, pgfdw_get_result().


Best regards,
Etsuro Fujita






Re: [HACKERS] Optimization for updating foreign tables in Postgres FDW

2016-04-18 Thread Michael Paquier
On Tue, Apr 19, 2016 at 12:16 PM, Noah Misch  wrote:
> On Sat, Apr 16, 2016 at 08:59:40AM +0900, Michael Paquier wrote:
>> On Fri, Apr 15, 2016 at 9:46 PM, Michael Paquier
>>  wrote:
>> > On Fri, Apr 15, 2016 at 8:25 PM, Etsuro Fujita
>> >  wrote:
>> >> How about doing something similar for PQprepare/PQexecPrepared in
>> >> postgresExecForeignInsert, postgresExecForeignUpdate, and
>> >> postgresExecForeignDelete?
>> >
>> > Yes, I hesitated to touch those, but they are good candidates for this
>> > new interface, and actually it has proved not to be complicated to
>> > plug in the new routines in those code paths.
>> >
>> >> Also, how about doing that for PQexec in connection.c?
>> >
>> > Here I disagree, this is not adapted. All the PQexec calls are part of
>> > callbacks that are triggered on failures, and we rely on such a
>> > callback when issuing PQcancel. do_sql_command runs commands that take
>> > a short amount of time, so I think as well that it is fine as-is with
>> > PQexec.
>>
>> Here is a new version. I just recalled that I forgot a PQclear() call
>> to clean up results.
>
> Robert, the deadline to fix this open item expired eleven days ago.  The
> thread had been seeing regular activity, but it has now been quiet for three
> days.  Do you have an updated plan for fixing this open item?

Note for Robert: pgfdw_get_result copycats PQexec by discarding all
PQresult received except the last one. I think that's fine for the
purposes of postgres_fdw, but perhaps you have a different opinion on
the matter.
--
Michael




Re: [HACKERS] Optimization for updating foreign tables in Postgres FDW

2016-04-18 Thread Noah Misch
On Sat, Apr 16, 2016 at 08:59:40AM +0900, Michael Paquier wrote:
> On Fri, Apr 15, 2016 at 9:46 PM, Michael Paquier
>  wrote:
> > On Fri, Apr 15, 2016 at 8:25 PM, Etsuro Fujita
> >  wrote:
> >> How about doing something similar for PQprepare/PQexecPrepared in
> >> postgresExecForeignInsert, postgresExecForeignUpdate, and
> >> postgresExecForeignDelete?
> >
> > Yes, I hesitated to touch those, but they are good candidates for this
> > new interface, and actually it has proved not to be complicated to
> > plug in the new routines in those code paths.
> >
> >> Also, how about doing that for PQexec in connection.c?
> >
> > Here I disagree, this is not adapted. All the PQexec calls are part of
> > callbacks that are triggered on failures, and we rely on such a
> > callback when issuing PQcancel. do_sql_command runs commands that take
> > a short amount of time, so I think as well that it is fine as-is with
> > PQexec.
> 
> Here is a new version. I just recalled that I forgot a PQclear() call
> to clean up results.

Robert, the deadline to fix this open item expired eleven days ago.  The
thread had been seeing regular activity, but it has now been quiet for three
days.  Do you have an updated plan for fixing this open item?




Re: [HACKERS] parallel query vs extensions

2016-04-18 Thread Noah Misch
On Mon, Apr 18, 2016 at 09:56:28AM -0400, Robert Haas wrote:
> On Fri, Apr 15, 2016 at 12:45 AM, Jeff Janes  wrote:
> > Should every relevant contrib extension get a version bump with a
> > transition file which is nothing but a list of "alter function blah
> > blah blah parallel safe" ?
> 
> Yes, I think that's what we would need to do.  It's a lot of work,
> albeit mostly mechanical.

This is in the open items list, but I think it is too late to include such a
change in 9.6.  This is an opportunity for further optimization, not a defect.




[HACKERS] Re: [COMMITTERS] pgsql: Add trigonometric functions that work in degrees.

2016-04-18 Thread Noah Misch
On Mon, Apr 18, 2016 at 10:22:59PM -0400, Tom Lane wrote:
> Noah Misch  writes:
> > On Mon, Apr 18, 2016 at 09:17:46AM -0400, Tom Lane wrote:
> >> Given that it's apparently showing the results of asind as NULL ...
> 
> > I doubt asind is returning NULL.  Here's the query, which uses a CASE to
> > report NULL if asind returns any value not on a whitelist:
> 
> > SELECT x,
> >CASE WHEN asind(x) IN (-90,-30,0,30,90) THEN asind(x) END AS asind,
> >CASE WHEN acosd(x) IN (0,60,90,120,180) THEN acosd(x) END AS acosd,
> >CASE WHEN atand(x) IN (-45,0,45) THEN atand(x) END AS atand
> > FROM (VALUES (-1), (-0.5), (0), (0.5), (1)) AS t(x);
> 
> Oh, duh --- should have checked the query.  Yes, the most probable theory
> must be that it's returning something that's slightly off from the exact
> value.
> 
> > I can see the benefit for atand(-0.5) and for atand(0.5), since those are
> > inexact.  Does the CASE gain us anything for asind or acosd?
> 
> None of these are expected to be inexact.  The point of the CASE is to
> make it obvious if what's returned isn't *exactly* what we expect.

Ah, got it.

> We could alternatively set extra_float_digits to its max value and hope
> that off-by-one-in-the-last-place values would get printed as something
> visibly different from the exact result.  I'm not sure I want to trust
> that that works reliably; but maybe it would be worth printing the
> result both ways, just to provide additional info when there's a failure.

We'd have an independent problem if extra_float_digits=3 prints the same
digits for distinguishable float values, so I wouldn't mind relying on it not
to do that.  But can we expect the extra_float_digits=3 representation of
those particular values to be the same for every implementation?




Re: [HACKERS] [COMMITTERS] pgsql: Add trigonometric functions that work in degrees.

2016-04-18 Thread Tom Lane
Noah Misch  writes:
> On Mon, Apr 18, 2016 at 09:17:46AM -0400, Tom Lane wrote:
>> Given that it's apparently showing the results of asind as NULL ...

> I doubt asind is returning NULL.  Here's the query, which uses a CASE to
> report NULL if asind returns any value not on a whitelist:

> SELECT x,
>CASE WHEN asind(x) IN (-90,-30,0,30,90) THEN asind(x) END AS asind,
>CASE WHEN acosd(x) IN (0,60,90,120,180) THEN acosd(x) END AS acosd,
>CASE WHEN atand(x) IN (-45,0,45) THEN atand(x) END AS atand
> FROM (VALUES (-1), (-0.5), (0), (0.5), (1)) AS t(x);

Oh, duh --- should have checked the query.  Yes, the most probable theory
must be that it's returning something that's slightly off from the exact
value.

> I can see the benefit for atand(-0.5) and for atand(0.5), since those are
> inexact.  Does the CASE gain us anything for asind or acosd?

None of these are expected to be inexact.  The point of the CASE is to
make it obvious if what's returned isn't *exactly* what we expect.

We could alternatively set extra_float_digits to its max value and hope
that off-by-one-in-the-last-place values would get printed as something
visibly different from the exact result.  I'm not sure I want to trust
that that works reliably; but maybe it would be worth printing the
result both ways, just to provide additional info when there's a failure.

regards, tom lane




[HACKERS] Re: [COMMITTERS] pgsql: Add trigonometric functions that work in degrees.

2016-04-18 Thread Noah Misch
On Mon, Apr 18, 2016 at 09:17:46AM -0400, Tom Lane wrote:
> Michael Paquier  writes:
> > On Mon, Apr 18, 2016 at 12:31 PM, Peter Eisentraut  wrote:
> >> I don't know if it's worth tracking this as an open item and thus
> >> kind-of release blocker if no one else has the problem.  But I
> >> definitely still have it.  My initial suspicion was that this had
> >> something to do with a partial upgrade to gcc 6 (not yet released), in
> >> other words a messed up system.  But I was able to reproduce it in a
> >> freshly installed chroot.  It only happens with various versions of gcc,
> >> but not with clang.  So I'm going to have to keep digging.
> 
> > gcc is moving slowly but surely to have 6.0 in a released state btw:
> > https://gcc.gnu.org/ml/gcc/2016-04/msg00103.html
> 
> Given that it's apparently showing the results of asind as NULL, the
> theory that comes to mind is some sort of optimization issue affecting
> the output tuple's null-flags.  I have no idea why only this test would
> be affected, though.  Anyway, a good way to test that theory would be
> to see if the -O level affects it.

I doubt asind is returning NULL.  Here's the query, which uses a CASE to
report NULL if asind returns any value not on a whitelist:

SELECT x,
   CASE WHEN asind(x) IN (-90,-30,0,30,90) THEN asind(x) END AS asind,
   CASE WHEN acosd(x) IN (0,60,90,120,180) THEN acosd(x) END AS acosd,
   CASE WHEN atand(x) IN (-45,0,45) THEN atand(x) END AS atand
FROM (VALUES (-1), (-0.5), (0), (0.5), (1)) AS t(x);

I can see the benefit for atand(-0.5) and for atand(0.5), since those are
inexact.  Does the CASE gain us anything for asind or acosd?

Results under -O0 would be a helpful data point, nonetheless.




[HACKERS] Coverage report

2016-04-18 Thread Alvaro Herrera
Hi,

Pursuant to promises made in Brussels a couple of months ago, I set up a
machine to run and expose the "make coverage" report under "make
check-world".  Some people have already heard about this.

I would like to collect ideas on how to improve this.  For example

* Should we run something other than "make check-world"?  As far as I
know, that covers all or almost all the tests we have; are there tests
that we should be running but are not?  If so, how do we go about
enabling them?

* buildfarm doesn't run make check-world for reasons of its own.  Maybe
we should run exactly what a typical buildfarm animal runs?  That way,
the coverage report would be closer to what we're actually verifying.

* Should we keep historical reports to study how numbers change in time?
Maybe save one report a week or something like that?

* Any other comments?

-- 
Álvaro Herrera    http://www.linkedin.com/in/alvherre




Re: [HACKERS] Postgres 9.6 scariest patch tournament

2016-04-18 Thread Josh berkus
On 04/18/2016 11:37 AM, Alvaro Herrera wrote:
> Hackers, lurkers,
> 
> The PostgreSQL Project needs you!
> 
> The Release Management Team would like your input regarding the patch or
> patches which, in your opinion, are the most likely sources of major
> bugs or instabilities in PostgreSQL 9.6.
> 
> Please submit your answers before May 1st using this form:
> https://docs.google.com/forms/d/1xNNqhXC116wCMnomqGz9RQ7OuVwZqAcEre7iiU6pT20/viewform
> 
> If, for some reason, you prefer not to fill that form or have further
> input on the topic, you can correspond via private email to one or more
> members of the RMT,
> 
>   Robert Haas 
>   Alvaro Herrera 
>   Noah Misch 
> 
> The RMT will publish aggregate, unattributed results after the poll
> closes.

We should send the owner of the scariest patch something as a prize.
Maybe a plastic skeleton or something ...

-- 
--
Josh Berkus
Red Hat OSAS
(any opinions are my own)




Re: [HACKERS] Postgres 9.6 scariest patch tournament

2016-04-18 Thread Bill Moran
On Mon, 18 Apr 2016 15:37:21 -0300
Alvaro Herrera  wrote:

> Hackers, lurkers,
> 
> The PostgreSQL Project needs you!
> 
> The Release Management Team would like your input regarding the patch or
> patches which, in your opinion, are the most likely sources of major
> bugs or instabilities in PostgreSQL 9.6.

Wow ... this may be the most unusual survey I've seen in a while.

-- 
Bill Moran




Re: [HACKERS] SET ROLE and reserved roles

2016-04-18 Thread Stephen Frost
Robert,

* Robert Haas (robertmh...@gmail.com) wrote:
> On Fri, Apr 15, 2016 at 11:56 AM, Stephen Frost  wrote:
> > Perhaps it would be helpful to return to the initial goal of these
> > default roles.
> >
> > We've identified certain privileges which are currently superuser-only
> > and we would like to be able to be GRANT those to non-superuser roles.
> > Where our GRANT system is able to provide the granularity desired, we
> > have simply removed the superuser checks, set up appropriate default
> > values and, through the "pg_dump dump catalog ACLs" patch, allowed
> > administrators to make the changes to those objects via the
> > 'GRANT privilege ON object TO role' system.
> >
> > For those cases where our GRANT system is unable to provide the
> > granularity desired and it isn't sensible to extend the GRANT system to
> > cover the exact granularity we wish to provide, we needed to come up
> > with an alternative approach.  That approach is the use of special
> > default roles, where we can allow exactly the level of granularity
> > desired and give administrators a way to grant those privileges to
> > users.
> >
> > What this means in a nutshell is that we've collectivly decided that
> > we'd rather support:
> >
> > /* uses the 'GRANT role TO role' system */
> > GRANT pg_signal_backend TO test_user;
> >
> > than:
> >
> > /*
> >  * uses the 'GRANT privilege ON object TO role' system
> >  * seems like cluster-level is actually the right answer here, but
> >  * we don't support such an object type currently, so using database
> >  * for the example
> >  */
> > GRANT SIGNAL BACKEND ON DATABASE my_database TO test_user;
> >
> > The goal being that the result of the two commands is identical (where
> > the second command is only hypothetical at this point, but hopefully the
> > meaning is conveyed).
> 
> I quite understand all of that.  The problem isn't that I don't
> understand what you did.  The problem is that I don't agree with it.

Ok.  Your prior comment was "I don't get it."  That's why I was
trying to explain the intent and the reasoning behind it.  I guess I
misunderstood what you meant by that comment.

> I don't think that the way you implemented is a good idea, nor do I
> think it reflects the consensus that was reached on list.  There was
> agreement to create some special magic roles.  There was not agreement
> around the special magic you've sprinkled on those roles that makes
> them work unlike all other roles in the system, and I think that
> special magic is a bad idea.

The concern about these users being able to own objects was raised by
Noah here: 20160221022639.ga486...@tornado.leadboat.com.  Admittedly,
he merely brought it up as a potential concern without stating a
preference, but after considering it, I realized that there are real
issues with default roles being able to own objects, and I attempted to
address them.  I don't think it would have been appropriate to ignore
those issues.

> > However, GRANT'ing a role to a user traditionally also allows the user
> > to 'SET ROLE' to that user, to create objects as that user, and other
> > privileges.  This results in two issues, as I see it:
> >
> > 1) The administrator who is looking to grant the specific 'signal
> >backend' privilege to a user is unlikely to intend or desire for that
> >user to also be able to SET ROLE to the role, or for that user to be
> >able to create objects as the default role.
> 
> I don't think being able to SET ROLE to that role is something that we
> should be trying to prevent.  You're already discovering why.  You
> tried to create this new special kind of role that you can't become,
> and there are already various reports of cases that you've missed.
> You will keep finding more for a while, I predict.  If that role is
> minimally privileged, who cares if users can become it?

Primarily because objects could be created as that user.  I agree that
in well run systems, it's unlikely that will happen, but we can't simply
trust that if we wish to be able to remove or change these roles down
the road.

> > 2) If the default role ends up owning objects then we have much less
> >flexibility in making changes to that role because we must ensure
> >that those objects are preserved across upgrades, etc.
> 
> Tom already said his piece on this point, and I think he's right.  You
> re-stating the argument doesn't change that.  In any case, if we want
> to prevent this, I bet there's some less buggy and invasive way in
> which it could be done than what you've actually chosen.

I re-stated the concern because I don't think it should be missed or
overlooked lightly because it implies requirements on us down the road,
particularly as I had understood from your prior response that it
wasn't clear what the concern was.

I don't mind removing the checks which were added to attempt to prevent
these roles from owning objects, to be clear.  It'd certainly simplify
things, today.  

Re: [HACKERS] pg_upgrade documentation improvement patch

2016-04-18 Thread Alvaro Herrera
Tom Lane wrote:
> Peter Eisentraut  writes:
> > Interesting to me would be a way, perhaps with an option in initdb, to
> > just say, initialize this cluster compatibly with that other cluster, so
> > you don't have to worry about these details.
> 
> Good idea, though I'd think of it as a pg_upgrade option more than being
> initdb's problem.

Agreed.  It's always seemed to me that having pg_upgrade require you to
run initdb is unfriendly; it seems so much more convenient to have it do
the initdb etc for you, where you just specify the target directory
(probably created ahead of time because of ownership -- but initdb
already knows how to deal with that case).

Also, pg_upgrade receiving a pre-existing cluster is inconvenient
because it needs to verify that it's empty etc, for no good reason.

> Either way, though, it would be on the code's head to do something
> about converting the postgresql.conf, pg_hba.conf, etc configuration
> files from the old cluster to the new, which means this isn't just a
> trivial matter of running initdb on the target PGDATA location.  That
> turns it into a bit of a research project I'm afraid --- but if we
> could get there, it'd be a nice improvement.

I don't think we've ever had a backwards-incompatible change in
pg_hba.conf (so we could just copy it over verbatim), and the
postgresql.conf changes should be mostly easy to deal with (so we could
just copy it over and modify a small number of lines), but I wonder if
things like nonstandard locations of config files might be problematic.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_upgrade documentation improvement patch

2016-04-18 Thread Yuri Niyazov
On Wed, Apr 13, 2016 at 6:50 AM, Tom Lane  wrote:

> Peter Eisentraut  writes:
> > Interesting to me would be a way, perhaps with an option in initdb, to
> > just say, initialize this cluster compatibly with that other cluster, so
> > you don't have to worry about these details.
>
> Good idea, though I'd think of it as a pg_upgrade option more than being
> initdb's problem.  Either way, though, it would be on the code's head
> to do something about converting the postgresql.conf, pg_hba.conf, etc
> configuration files from the old cluster to the new, which means this
> isn't just a trivial matter of running initdb on the target PGDATA
> location.  That turns it into a bit of a research project I'm afraid
> --- but if we could get there, it'd be a nice improvement.
>
> regards, tom lane
>


There were other things that happened while doing this cluster upgrade that
required more documentation searching - I believe some wal-related
configuration options that go into postgresql.conf were deprecated in 9.5,
so the server complained upon starting up - however, the documentation in
that case was pretty clear about how to replace the old with the new.

The phrase "Many prebuilt installers do this step automatically" - it was
there originally, and I left it in, but I don't know any details about it.
If this is true, then it seems to me that the work that goes into that can
profitably go into pg_upgrade, no?

From the point of view of a PG-admin noob like me, it's unclear *from the
documentation* to what extent locale and encoding at the cluster level must
match. Certainly the relatively stern phrase "Again, use compatible initdb
flags that match the old cluster" in the documentation stopped me in my
tracks until I got some further clarification, because the consequences of
not doing so were not mentioned at all, and I lean towards conservatism
when it comes to scary things like upgrading a production machine across a
major version release.

Should I update the documentation patch to instruct the use of
pg_controldata exclusively?
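
For readers unsure what to compare: in practice you would diff the
locale/encoding-relevant lines of pg_controldata output from the two
clusters. A hedged sketch follows; the here-strings stand in for real
pg_controldata output (illustrative values, not captured from a live
cluster):

```shell
#!/bin/sh
# Real usage would be:
#   pg_controldata "$OLD_PGDATA"   vs   pg_controldata "$NEW_PGDATA"
# Sample stand-in output for the two clusters:
old_settings="LC_COLLATE: en_US.UTF-8
LC_CTYPE: en_US.UTF-8"
new_settings="LC_COLLATE: en_US.UTF-8
LC_CTYPE: en_US.UTF-8"

# The clusters must agree on these settings before pg_upgrade is run.
if [ "$old_settings" = "$new_settings" ]; then
    echo "locale settings match"
else
    echo "locale settings differ"
fi
```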


[HACKERS] Postgres 9.6 scariest patch tournament

2016-04-18 Thread Alvaro Herrera
Hackers, lurkers,

The PostgreSQL Project needs you!

The Release Management Team would like your input regarding the patch or
patches which, in your opinion, are the most likely sources of major
bugs or instabilities in PostgreSQL 9.6.

Please submit your answers before May 1st using this form:
https://docs.google.com/forms/d/1xNNqhXC116wCMnomqGz9RQ7OuVwZqAcEre7iiU6pT20/viewform

If, for some reason, you prefer not to fill that form or have further
input on the topic, you can correspond via private email to one or more
members of the RMT,

Robert Haas 
Alvaro Herrera 
Noah Misch 

The RMT will publish aggregate, unattributed results after the poll
closes.

Thanks,

-- 
Álvaro Herrera




Re: [HACKERS] snapshot too old, configured by time

2016-04-18 Thread Tom Lane
Kevin Grittner  writes:
> On Mon, Apr 18, 2016 at 8:50 AM, Tom Lane  wrote:
>> Surely there was another way to get a similar end result without
>> mucking with things at the level of BufferGetPage.

> To get the feature that some customers have been demanding, a check
> has to be made somewhere near where any page is read in a scan.

I'm not really convinced that we need to define it exactly like that,
though.  In particular, why not just kill transactions as soon as their
oldest snapshot is too old?  That might not work exactly like this does,
but it would have some pretty substantial benefits --- for one, that the
timeout could be configured locally per session rather than having to be
PGC_POSTMASTER.  And it would likely be far easier to limit the
performance side-effects.

I complained about this back when the feature was first discussed,
and you insisted that that answer was no good, and I figured I'd hold
my nose and look the other way as long as the final patch wasn't too
invasive.  Well, now we've seen the end result, and it's very invasive
and has got performance issues as well.  It's time to reconsider.

Or in short: this is a whole lot further than I'm prepared to go to
satisfy one customer with a badly-designed application.  And from what
I can tell from the Feb 2015 discussion, that's what this has been
written for.

regards, tom lane




Re: [HACKERS] Relax requirement for INTO with SELECT in pl/pgsql

2016-04-18 Thread Pavel Stehule
2016-04-11 15:37 GMT+02:00 Merlin Moncure :

> On Sun, Apr 10, 2016 at 3:13 AM, Pavel Stehule 
> wrote:
> >
> > Hi
> >
> > 2016-03-21 22:13 GMT+01:00 Pavel Stehule :
> >>
> >> Hi
> >>
> >> 2016-03-21 21:24 GMT+01:00 Merlin Moncure :
> >>>
> >>> Patch is trivial (see below), discussion is not :-).
> >>>
> >>> I see no useful reason to require INTO when returning data with
> >>> SELECT.  However, requiring queries to indicate not needing data via
> >>> PERFORM causes some annoyances:
> >>>
> >>> *) converting routines back and forth between pl/pgsql and pl/sql
> >>> requires needless busywork and tends to cause errors to be thrown at
> >>> runtime
> >>>
> >>> *) as much as possible, (keywords begin/end remain a problem),
> >>> pl/pgsql should be a superset of sql
> >>>
> >>> *) it's much more likely to be burned by accidentally forgetting to
> >>> swap in PERFORM than to accidentally leave in a statement with no
> >>> actionable target.  Even if you did so in the latter case, it stands
> >>> to reason you'd accidentally leave in the target variable, too.
> >>>
> >>> *) the PERFORM requirement hails from the days when only statements
> >>> starting with SELECT return data.  There is no PERFORM equivalent for
> >>> WITH/INSERT/DELETE/UPDATE and there are real world scenarios where you
> >>> might have a RETURNING clause that does something but not necessarily
> >>> want to place the result in a variable (for example passing to
> >>> volatile function).  Take a look at the errhint() clause below -- we
> >>> don't even have a suggestion in that case.
> >>>
> >>> This has come up before, and there was a fair amount of sympathy for
> >>> this argument albeit with some dissent -- notably Pavel.  I'd like to
> >>> get a hearing on the issue -- thanks.  If we decide to move forward,
> >>> this would effectively deprecate PERFORM and the documentation will be
> >>> suitably modified as well.
> >>
> >>
> >
> > here is another argument why this idea is not good.
> >
> >
> http://stackoverflow.com/questions/36509511/error-query-has-no-destination-for-result-data-when-writing-pl-pgsql-function
> >
> > Now, when people coming from the T-SQL world use some T-SQL constructs,
> > the code usually fails with the error "query has no destination for
> > result data ..."
> >
> > When PLpgSQL becomes more tolerant, their code will be executed
> > without any error, but will not work.
>
> I don't think it's a problem requiring people to use RETURN in order
> to return data from the function.
>
> SQL functions BTW happily discard results and it's never been an issue
> there FWICT.  To address your other argument given below, there are
> valid cases where you'd use RETURNING without having any interest in
> capturing the set.  For example, you might have a volatile function,
> v_func() that does something and returns a value that may not be
> essential to the caller (say, a count of rows adjusted).
>
> INSERT INTO foo ...
> RETURNING v_func(foo.x);
>
> Scenarios (even if not very common) where dummy variables are required
> and/or queries have to be written into more complex forms (say, into a
> CTE) where you would not have to do so outside pl/pgsql greatly
> outweigh your points that, 'users might do the wrong thing'.  The
> wrong thing is actually the right thing in some cases.
>
> Small aside here: One thing that t-sql did right and pl/sql did wrong
> was to make the language a proper superset of sql.  pl/pgsql's
> hijacking INTO, BEGIN, END, and EXECUTE are really unfortunate as are
> any behaviors that are incompatible with the regular language (like
> requiring PERFORM); they fork the language and make building stored
> procedures in pl/pgsql much more difficult if not impossible.  I'm not
> sure this is a really solvable problem, but at least it can be nibbled
> at.
>
> What are the rules for pl/psm?
>

SQL/PSM knows only the "select statement: single row" form - subclause 12.5 -
which references the ANSI SQL foundation - subclause 14.7 - where SELECT INTO
is defined. INTO is mandatory; no other SELECT form is possible.

This is defined in ANSI SQL 2011 - I don't have access to more current drafts.

I read an Oracle doc - there an INTO or BULK COLLECT clause is required.

Regards

Pavel


>
> merlin
>


Re: [HACKERS] snapshot too old, configured by time

2016-04-18 Thread Kevin Grittner
On Mon, Apr 18, 2016 at 8:50 AM, Tom Lane  wrote:

> Surely there was another way to get a similar end result without
> mucking with things at the level of BufferGetPage.

To get the feature that some customers have been demanding, a check
has to be made somewhere near where any page is read in a scan.  It
didn't take me long in working on this to notice that grepping for
BufferGetPage() calls was a good way to find candidate spots to
insert the check (even if only 7% of BufferGetPage() calls need to
be followed by such a check) -- but the BufferGetPage() itself
clearly does *not* need to be modified to implement the feature.

We could:

(1) Add calls to a check function where needed, and just document
that addition of a BufferGetPage() call should be considered a clue
that a new check might be needed.  (original plan)

(2) Replace the 7% of the BufferGetPage() calls that need to check
the age of the snapshot with something that wraps the two function
calls, and does nothing but call one and then the other.  (favored
by Michael)

(3) Add parameters to BufferGetPage() to specify whether the check
is needed and provide sufficient information to perform the check
if it is.  (current master)

(4) Replace BufferGetPage() with some other function name having
the characteristics of (3) to minimize back-patch pain.
(grudgingly favored by Álvaro)

(5) Revert from community code, leaving it as an EDB value-add
Advanced Server feature.

Does someone see another (better) alternative?

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Parser extensions (maybe for 10?)

2016-04-18 Thread Pavel Stehule
2016-04-18 17:44 GMT+02:00 Aleksander Alekseev :

> > It depends - only one can be allowed, like plpgsql extensions, or
> > they can be serialized, like pg log extensions
>
> OK, I understand "can be allowed only one". I doubt it would be a very
> useful feature though.
>
> I'm not sure what you mean by "serialized like pg log extensions".
> Let's say extension A allows "SELECT ASAP ..." queries and extension B
> --- "... ORDER BY RANK". What happens when a user executes a "SELECT
> ASAP ... ORDER BY RANK" query?
>

no - it is not possible. Not with a Bison parser - it cannot work with
unknown syntax - so it isn't possible to implement one part with parser A
and a second part with parser B.

But we can have parsers P1 and P2. P1 knows string XX, P2 knows YY. The
built-in parser (BP) knows SQL.

We can have registered parsers P1, P2, BP.

for string SELECT

P1 fails,
P2 fails,
BP processes it

for string YY

P1 fails,
P2 process it,
BP is not called

Transformations can be allowed too (though that is slower)

for string 

P1 does transformation to YYY
P2 does transformation to SELECT
BP processes it




--
> Best regards,
> Aleksander Alekseev
> http://eax.me/
>


Re: [HACKERS] Spinlocks and semaphores in 9.2 and 9.3

2016-04-18 Thread Robert Haas
On Mon, Apr 18, 2016 at 11:53 AM, Robert Haas  wrote:
> On Mon, Apr 18, 2016 at 11:34 AM, Tom Lane  wrote:
>> Andres Freund  writes:
>>> On 2016-04-18 11:24:07 -0400, Tom Lane wrote:
 (The thing that gave me pause about this was noticing that I could not
 start two such postmasters concurrently on my RHEL6 box, without changing
 the default system limits on number of SysV semaphores.)
>>
>>> Hm, is that different before/after the patch? Because afaics at around
>>> NBuffers = 1000, the number of semaphores should be lower after the
>>> patch?
>>
>> Yeah, that was probably the argument we used before.  But it's true that a
>> postmaster built with --disable-spinlocks is harder to test than one built
>> without (because you're going from ~100 semas to ~1100 semas, at default
>> configuration).  If we believe that the main real use-case for this switch
>> is testing, that's not a very nice property.
>>
>> The bottom line IMO is that --disable-spinlocks is actually not that awful
>> for low-stress, low-concurrency scenarios; for example, it doesn't change
>> the runtime of make installcheck-parallel very much for me.  On the other
>> hand, for high-concurrency scenarios you'd darn well better get a real
>> spinlock implementation.  Given that sort of scope of use, it seems like
>> a hundred or so underlying semas should be plenty, and it'd be less likely
>> to cause operational problems than 1024.
>
> I suppose that's probably true.  I thought surely any system worth its
> salt wouldn't have a problem with 1024 semaphores, but a quick Google
> search for "hp-ux semmns" turned up this page:

Oops, hit send too soon.

http://docstore.mik.ua/manuals/hp-ux/en/B2355-60130/semmni.5.html

That's from 2007 and indicates a default maximum of 2048.  So it would
be fairly easy to hit the limit.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Memory leak in GIN index build

2016-04-18 Thread Julien Rouhaud
On 18/04/2016 16:33, Tom Lane wrote:
> 
> I poked at this over the weekend, and got more unhappy the more I poked.
> Aside from the memory leakage issue, there are multiple coding-rule
> violations besides the one you noted about scope of the critical sections.
> One example is that in the page-split path for an inner page, we modify
> the contents of childbuf long before getting into the critical section.
> The number of out-of-date comments was depressingly large as well.
> 
> I ended up deciding that the most reasonable fix was along the lines of
> my first alternative above.  In the attached draft patches, the
> "placeToPage" method is split into two methods, "beginPlaceToPage" and
> "execPlaceToPage", where the second method is what's supposed to happen
> inside the critical section for a non-page-splitting update.  Nothing
> that's done inside beginPlaceToPage is critical.
> 
> I've tested this to the extent of verifying that it passes make
> check-world and stops the memory leak in your test case, but it could use
> more eyeballs on it.
> 

Thanks! I had also started working on it, but it was very far from
complete (and already much uglier...).

I couldn't find any problem in the patch.

I wonder whether it would be valuable to assert being inside a critical
section in the *execPlaceToPage functions, or to assert
(leaf->walinfolen > 0) in dataExecPlaceToPageLeaf(), since it's filled
far from this function.

> Attached are draft patches against HEAD and 9.5 (they're the same except
> for BufferGetPage calls).  I think we'd also want to back-patch to 9.4,
> but that will take considerable additional work because of the XLOG API
> rewrite that happened in 9.5.  I also noted that some of the coding-rule
> violations seem to be new in 9.5, so the problems may be less severe in
> 9.4 --- the memory leak definitely exists there, though.
> 

-- 
Julien Rouhaud
http://dalibo.com - http://dalibo.org




Re: [HACKERS] Spinlocks and semaphores in 9.2 and 9.3

2016-04-18 Thread Robert Haas
On Mon, Apr 18, 2016 at 11:34 AM, Tom Lane  wrote:
> Andres Freund  writes:
>> On 2016-04-18 11:24:07 -0400, Tom Lane wrote:
>>> (The thing that gave me pause about this was noticing that I could not
>>> start two such postmasters concurrently on my RHEL6 box, without changing
>>> the default system limits on number of SysV semaphores.)
>
>> Hm, is that different before/after the patch? Because afaics at around
>> NBuffers = 1000, the number of semaphores should be lower after the
>> patch?
>
> Yeah, that was probably the argument we used before.  But it's true that a
> postmaster built with --disable-spinlocks is harder to test than one built
> without (because you're going from ~100 semas to ~1100 semas, at default
> configuration).  If we believe that the main real use-case for this switch
> is testing, that's not a very nice property.
>
> The bottom line IMO is that --disable-spinlocks is actually not that awful
> for low-stress, low-concurrency scenarios; for example, it doesn't change
> the runtime of make installcheck-parallel very much for me.  On the other
> hand, for high-concurrency scenarios you'd darn well better get a real
> spinlock implementation.  Given that sort of scope of use, it seems like
> a hundred or so underlying semas should be plenty, and it'd be less likely
> to cause operational problems than 1024.

I suppose that's probably true.  I thought surely any system worth its
salt wouldn't have a problem with 1024 semaphores, but a quick Google
search for "hp-ux semmns" turned up this page:


-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Parser extensions (maybe for 10?)

2016-04-18 Thread Aleksander Alekseev
> It depends - only one can be allowed, like plpgsql extensions, or
> they can be serialized, like pg log extensions

OK, I understand "can be allowed only one". I doubt it would be a very
useful feature though.

I'm not sure what you mean by "serialized like pg log extensions".
Let's say extension A allows "SELECT ASAP ..." queries and extension B
--- "... ORDER BY RANK". What happens when a user executes a "SELECT
ASAP ... ORDER BY RANK" query?

-- 
Best regards,
Aleksander Alekseev
http://eax.me/




Re: [HACKERS] Should we remove unused code?

2016-04-18 Thread Aleksander Alekseev
Hello, Tom.

Thanks for your reply.

> The fact that make check-world doesn't run any code that uses some
> feature isn't a great argument for removing it.  Consider third-party
> extensions, for starters.

What is the current policy regarding this sort of thing? Is everything
in .h files considered a public API that must stay backward compatible?
If so - for how long, forever or until the next major release? Or let's
say there is some API that is used neither internally nor by any package
available on PGXN. Can we break that API then?

-- 
Best regards,
Aleksander Alekseev
http://eax.me/




Re: [HACKERS] Spinlocks and semaphores in 9.2 and 9.3

2016-04-18 Thread Tom Lane
Andres Freund  writes:
> On 2016-04-18 11:24:07 -0400, Tom Lane wrote:
>> (The thing that gave me pause about this was noticing that I could not
>> start two such postmasters concurrently on my RHEL6 box, without changing
>> the default system limits on number of SysV semaphores.)

> Hm, is that different before/after the patch? Because afaics at around
> NBuffers = 1000, the number of semaphores should be lower after the
> patch?

Yeah, that was probably the argument we used before.  But it's true that a
postmaster built with --disable-spinlocks is harder to test than one built
without (because you're going from ~100 semas to ~1100 semas, at default
configuration).  If we believe that the main real use-case for this switch
is testing, that's not a very nice property.

The bottom line IMO is that --disable-spinlocks is actually not that awful
for low-stress, low-concurrency scenarios; for example, it doesn't change
the runtime of make installcheck-parallel very much for me.  On the other
hand, for high-concurrency scenarios you'd darn well better get a real
spinlock implementation.  Given that sort of scope of use, it seems like
a hundred or so underlying semas should be plenty, and it'd be less likely
to cause operational problems than 1024.

regards, tom lane




Re: [HACKERS] Spinlocks and semaphores in 9.2 and 9.3

2016-04-18 Thread Robert Haas
On Mon, Apr 18, 2016 at 11:24 AM, Tom Lane  wrote:
> Andres Freund  writes:
>> On 2016-04-18 11:07:08 -0400, Tom Lane wrote:
>>> Did you want to actually review this patch, or should I just push it?
>
>> No, I'm good, you should push it. I did a quick scan of the patch, and
>> it looks sane. For a second I was concerned that there might be a
>> situation in which this patch increases the total number of semaphore
>> needed, which might make backpatching a bit problematic - but it appears
>> that that'd be a very absurd configuration.
>
> I was actually wondering whether it'd make sense to cut the number of
> underlying semaphores to 128 or 512 or thereabouts.  But I think we had
> that discussion when the daa7527afc227443 patch went in to begin with,
> and convinced ourselves that 1024 was okay.  Robert, do you recall the
> reasoning?

I don't recall a specific discussion about it, but the number we would
have needed before was gigantic - one per lwlock, which is typically
going to be far more than 1024.  I mean, at shared_buffers=32MB, there
are 4096 buffers using two lwlocks each... never mind everything else
that uses lwlocks.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Parser extensions (maybe for 10?)

2016-04-18 Thread Pavel Stehule
2016-04-18 17:25 GMT+02:00 Aleksander Alekseev :

> > I cannot imagine an extensible parser based on bison. But the parser
> > can be replaced by a custom parser.
> > 
> > Something like pgpool or pgbouncer does this. The extension can
> > register its own parser. The custom parser is called first, and the
> > integrated parser is used from the extension or as a fallback. This
> > can help with new statements for background workers; theoretically it
> > can help with extending PostgreSQL SQL. A custom parser can translate
> > from an SQL1 to an SQL2 dialect, or from SQL1 to internal calls. The
> > custom parser usually should not implement full SQL - only a few
> > statements.
> > 
> > Is this idea more workable?
>
> What if there are two or more contribs that extend the parser? Can we
> be sure that these contribs will not conflict?
>

It depends - only one can be allowed, like plpgsql extensions, or they can
be serialized, like pg log extensions

Regards

Pavel


>
> --
> Best regards,
> Aleksander Alekseev
> http://eax.me/
>


Re: [HACKERS] Spinlocks and semaphores in 9.2 and 9.3

2016-04-18 Thread Andres Freund
On 2016-04-18 11:24:07 -0400, Tom Lane wrote:
> (The thing that gave me pause about this was noticing that I could not
> start two such postmasters concurrently on my RHEL6 box, without changing
> the default system limits on number of SysV semaphores.)

Hm, is that different before/after the patch? Because afaics at around
NBuffers = 1000, the number of semaphores should be lower after the
patch?

Andres




Re: [HACKERS] Parser extensions (maybe for 10?)

2016-04-18 Thread Aleksander Alekseev
> I cannot imagine an extensible parser based on bison. But the parser
> can be replaced by a custom parser.
> 
> Something like pgpool or pgbouncer does this. The extension can register
> its own parser. The custom parser is called first, and the integrated
> parser is used from the extension or as a fallback. This can help with
> new statements for background workers; theoretically it can help with
> extending PostgreSQL SQL. A custom parser can translate from an SQL1 to
> an SQL2 dialect, or from SQL1 to internal calls. The custom parser
> usually should not implement full SQL - only a few statements.
> 
> Is this idea more workable?

What if there are two or more contribs that extend the parser? Can we
be sure that these contribs will not conflict?

-- 
Best regards,
Aleksander Alekseev
http://eax.me/




Re: [HACKERS] Spinlocks and semaphores in 9.2 and 9.3

2016-04-18 Thread Tom Lane
Andres Freund  writes:
> On 2016-04-18 11:07:08 -0400, Tom Lane wrote:
>> Did you want to actually review this patch, or should I just push it?

> No, I'm good, you should push it. I did a quick scan of the patch, and
> it looks sane. For a second I was concerned that there might be a
> situation in which this patch increases the total number of semaphore
> needed, which might make backpatching a bit problematic - but it appears
> that that'd be a very absurd configuration.

I was actually wondering whether it'd make sense to cut the number of
underlying semaphores to 128 or 512 or thereabouts.  But I think we had
that discussion when the daa7527afc227443 patch went in to begin with,
and convinced ourselves that 1024 was okay.  Robert, do you recall the
reasoning?

(The thing that gave me pause about this was noticing that I could not
start two such postmasters concurrently on my RHEL6 box, without changing
the default system limits on number of SysV semaphores.)

regards, tom lane




Re: [HACKERS] Parser extensions (maybe for 10?)

2016-04-18 Thread Pavel Stehule
Hi

2016-04-12 7:10 GMT+02:00 Tom Lane :

> Arcadiy Ivanov  writes:
> > Is there any interest and/or tips to allow a pluggable parser or at
> > least allow some syntactical pluggability by extensions?
>
> There is a fair amount of discussion of this in the archives.  The short
> answer is that bison can't do it, and "let's replace bison" is a proposal
> with a steep hill in front of it --- the pain-to-gain ratio is just not
> very favorable.
>
> Forty years ago, I worked on systems with extensible parsers at HP,
> wherein plug-in extensions could add clauses very similarly to what
> you suggest.  Those designs worked, more or less, but they had a lot
> of deficiencies; the most fundamental problem being that any parsing
> inconsistencies would only appear through misbehavior at runtime,
> which you would only discover if you happened to test a case that behaved
> oddly *and* notice that it was not giving the result you expected.
> Bison is far from perfect on this angle, because %prec declarations can
> produce results you weren't expecting ... but it's at least one order of
> magnitude better than any extensible-parser solution I've ever seen.
> Usually bison will give you a shift/reduce error if you write something
> that has more than one possible interpretation.
>
> I'm interested in possible solutions to this problem, but it's far
> harder than it looks.
>

I cannot imagine an extensible parser based on bison, but the parser can be
replaced by a custom parser.

This is what something like pgpool or pgbouncer does. An extension could
assign its own parser: the custom parser would be called first, and the
integrated parser would be used from the extension or as a fallback. This
could help with new statements for background workers, and theoretically it
could help with extending PostgreSQL's SQL. A custom parser could translate
from one SQL dialect to another, or translate SQL into internal calls. A
custom parser usually would not need to implement full SQL - only a few
statements.

Is this idea more workable?

Regards

Pavel





>
> regards, tom lane
>
>
>


Re: [HACKERS] Should we remove unused code?

2016-04-18 Thread Tom Lane
Aleksander Alekseev  writes:
> For instance there are two flags - HASH_SEGMENT and HASH_FFACTOR that
> are in fact never used (see attachment). I wonder whether we should
> keep code like this or not. 

> What do you think?

The fact that make check-world doesn't run any code that uses some
feature isn't a great argument for removing it.  Consider third-party
extensions, for starters.

regards, tom lane




Re: [HACKERS] Spinlocks and semaphores in 9.2 and 9.3

2016-04-18 Thread Andres Freund
On 2016-04-18 11:07:08 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On April 16, 2016 6:02:39 PM PDT, Tom Lane  wrote:
> >> I went ahead and prepared and tested such a patch; the version for 9.3
> >> is attached.  (9.2 is identical modulo some pgindent-induced whitespace
> >> difference.)  This doesn't look too hazardous to me, so I'm thinking
> >> we should apply it.
> 
> > I can't look at the patch just now, but the plan sounds good. If you'd rather 
> > have somebody look at the patch before, I can do tomorrow morning.
> 
> Did you want to actually review this patch, or should I just push it?

No, I'm good, you should push it. I did a quick scan of the patch, and
it looks sane. For a second I was concerned that there might be a
situation in which this patch increases the total number of semaphores
needed, which might make backpatching a bit problematic - but it appears
that that'd be a very absurd configuration.

Greetings,

Andres Freund




[HACKERS] Should we remove unused code?

2016-04-18 Thread Aleksander Alekseev
Hello

Today I've read this post:

http://blog.2ndquadrant.com/code-coverage/

I think it's great and that everyone on this mailing list should
familiarize themselves with it.

Reading the coverage report, I discovered that some code is never
executed. Sometimes that's OK (error reporting in out-of-memory cases),
sometimes it's not (not enough tests). But there are also cases I'm not
sure about.

For instance there are two flags - HASH_SEGMENT and HASH_FFACTOR that
are in fact never used (see attachment). I wonder whether we should
keep code like this or not. 

What do you think?

-- 
Best regards,
Aleksander Alekseev
http://eax.me/
diff --git a/src/backend/utils/hash/dynahash.c b/src/backend/utils/hash/dynahash.c
index 1a9f70c..9b32ceb 100644
--- a/src/backend/utils/hash/dynahash.c
+++ b/src/backend/utils/hash/dynahash.c
@@ -452,16 +452,6 @@ hash_create(const char *tabname, long nelem, HASHCTL *info, int flags)
 		hctl->num_partitions = info->num_partitions;
 	}
 
-	if (flags & HASH_SEGMENT)
-	{
-		hctl->ssize = info->ssize;
-		hctl->sshift = my_log2(info->ssize);
-		/* ssize had better be a power of 2 */
-		Assert(hctl->ssize == (1L << hctl->sshift));
-	}
-	if (flags & HASH_FFACTOR)
-		hctl->ffactor = info->ffactor;
-
 	/*
 	 * SHM hash tables have fixed directory size passed by the caller.
 	 */
diff --git a/src/include/utils/hsearch.h b/src/include/utils/hsearch.h
index 007ba2c..0646a72 100644
--- a/src/include/utils/hsearch.h
+++ b/src/include/utils/hsearch.h
@@ -81,9 +81,7 @@ typedef struct HASHCTL
 
 /* Flags to indicate which parameters are supplied */
 #define HASH_PARTITION	0x0001	/* Hashtable is used w/partitioned locking */
-#define HASH_SEGMENT	0x0002	/* Set segment size */
 #define HASH_DIRSIZE	0x0004	/* Set directory size (initial and max) */
-#define HASH_FFACTOR	0x0008	/* Set fill factor */
 #define HASH_ELEM		0x0010	/* Set keysize and entrysize */
 #define HASH_BLOBS		0x0020	/* Select support functions for binary keys */
 #define HASH_FUNCTION	0x0040	/* Set user defined hash function */



Re: [HACKERS] Spinlocks and semaphores in 9.2 and 9.3

2016-04-18 Thread Tom Lane
Andres Freund  writes:
> On April 16, 2016 6:02:39 PM PDT, Tom Lane  wrote:
>> I went ahead and prepared and tested such a patch; the version for 9.3
>> is attached.  (9.2 is identical modulo some pgindent-induced whitespace
>> difference.)  This doesn't look too hazardous to me, so I'm thinking
>> we should apply it.

> I can't look at the patch just now, but the plan sounds good. If you'd rather 
> have somebody look at the patch before, I can do tomorrow morning.

Did you want to actually review this patch, or should I just push it?

regards, tom lane




Re: [HACKERS] Refactor pg_dump as a library?

2016-04-18 Thread Tom Lane
Robert Haas  writes:
> On Fri, Apr 15, 2016 at 2:51 PM, Tom Lane  wrote:
>> The problem here is the connection to syscache; changing the behavior
>> of that, in a general context, is very scary.  What we might be able to
>> do that would satisfy pg_dump's needs is to invent a mode in which you
>> can run a read-only transaction that uses the transaction snapshot to
>> populate syscache (and then flushes the syscache at the end).

> I think that we could have an alternate set of functions which have
> the same interface as the syscache functions but using the transaction
> snapshot and don't actually cache anything, and it would be fine for
> what the pg_dump support functions need.

The problem with that approach is that then you are talking about building
duplicate copies of entire layers of the system.  For example, namespace.c
would have to be duplicated into one copy that uses syscache and one that
uses this not-quite-cache.  If it were *only* syscache.c that had to be
duplicated, probably this would work, but ruleutils.c depends on an awful
lot of code above that level.  Indeed, if it did not, the idea of
reimplementing it on the client side wouldn't be so unattractive.

regards, tom lane




Re: [HACKERS] Refactor pg_dump as a library?

2016-04-18 Thread Robert Haas
On Fri, Apr 15, 2016 at 2:51 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Thu, Apr 14, 2016 at 1:01 PM, Andres Freund  wrote:
>>> I'm not sure I find that convincing: The state portrayed by the syscache
>>> is th state COPY/SELECT et al will be using.  I think the angle to
>>> attack this is rather to allow blocking 'globally visible' DDL
>>> efficiently and correctly, rather than the hack pg_dump is using right now.
>
>> Maybe.  I think that idea of changing the pg_get_Xdef() stuff to use
>> the transaction snapshot rather than the latest snapshot might be
>> worth considering, too.
>
> The problem here is the connection to syscache; changing the behavior
> of that, in a general context, is very scary.  What we might be able to
> do that would satisfy pg_dump's needs is to invent a mode in which you
> can run a read-only transaction that uses the transaction snapshot to
> populate syscache (and then flushes the syscache at the end).  It would
> have to be a pretty "hard" notion of read-only, not the squishy one we
> have now, but I think it would work safely.  Anything that might otherwise
> break because of stale syscache entries should be prevented from having
> bad side-effects by the read-only restriction.

I think that we could have an alternate set of functions which have
the same interface as the syscache functions but using the transaction
snapshot and don't actually cache anything, and it would be fine for
what the pg_dump support functions need.  It wouldn't be nearly as
fast as the actual syscache, but server-side CPU when deparsing DDL
has never really been a problem to my knowledge.  I mean, if we had
something like this, it could also be sped up by adding a cache that
is flushed when the snapshot changes, but I'm not very sure we really
need to go that far.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Confusing comment in pg_upgrade in regards to VACUUM FREEZE

2016-04-18 Thread Alvaro Herrera
Craig Ringer wrote:
> On 18 April 2016 at 12:11, David Rowley 
> wrote:
> 
> > this is not helping me much as I don't really understand why
> > pg_statistic need to be frozen?
> 
> Yeah, in particular it's not clear to me why pg_upgrade goes to such
> efforts to freeze everything when it copies pg_clog over in
> copy_clog_xlog_xid() .
> 
> Is it mainly a defense against the multixact format change?

Nothing to do with that.  The VACUUM FREEZE is executed on the new
database before migrating the old data over; it's there so that the
existing data has no trace of any permanent "normal" Xids from the
original counter.  Immediately afterwards we use pg_resetxlog to set the
counters to different values, and the normal values might be in a range
not covered by those, or they could even be marked aborted.  So a
visibility test would error out or return the wrong thing.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] snapshot too old, configured by time

2016-04-18 Thread Alvaro Herrera
Tom Lane wrote:
> Alvaro Herrera  writes:
> > I disagree.  A developer that sees an unadorned BufferGetPage() call
> > doesn't stop to think twice about whether they need to add a snapshot
> > test.  Many reviewers will miss the necessary addition also.  A
> > developer that sees BufferGetPage(NO_SNAPSHOT_TEST) will at least
> > consider the idea that the flag might be right; if that developer
> > doesn't think about it, some reviewer may notice a new call with the
> > flag and consider the idea that the flag may be wrong.
> 
> I'm unconvinced ...

Well, nobody opposed this when I proposed it originally.  Robert just
stated that it caused a problem for him while backpatching but didn't
state opinion on reverting that change or not.  Maybe we should call for
a vote here.

> > I understand the backpatching pain argument, but my opinion was the
> > contrary of yours even so.
> 
> I merely point out that the problem came up less than ten days after
> that patch hit the tree.  If that does not give you pause about the
> size of the back-patching problem we've just introduced, it should.

Understood.  Kevin's idea of applying a no-op syntax change is on the
table.  I don't like it either, but ISTM better than the other options
so far.

> TBH, there is nothing that I like about this feature: not the underlying
> concept, not the invasiveness of the implementation, nothing.  I would
> dearly like to see it reverted altogether.  I do not think it is worth
> the pain that the current implementation will impose, both on developers
> and on potential users.  Surely there was another way to get a similar
> end result without mucking with things at the level of BufferGetPage.

Ah well, that's a completely different angle, and perhaps we should
explore this before doing anything in the back branches.

So it seems to me we have these options

1) revert the whole feature
2) revert the BufferGetPage syntax change
3) apply a no-op syntax change so that BufferGetPage looks the same on
   backbranches as it does on master today, keeping API and ABI
   compatibility with existing code
4) do nothing

Any others?  (If we decide to call for a vote, I suggest we open a new
thread)

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] GIN data corruption bug(s) in 9.6devel

2016-04-18 Thread Teodor Sigaev

Added, see attached patch (based on v3.1)


With this applied, I am getting a couple errors I have not seen before
after extensive crash recovery testing:
ERROR:  attempted to delete invisible tuple
ERROR:  unexpected chunk number 1 (expected 2) for toast value
100338365 in pg_toast_16425
Huh, it seems it's not related to GIN at all... Indexes don't play with the 
toast machinery. The single place where this error can occur is heap_delete() - 
deleting an already-deleted tuple.




I've restarted the test harness with intentional crashes turned off,
to see if the problems are related to crash recovery or are more
generic than that.

I've never seen these particular problems before, so don't have much
insight into what might be going on or how to debug it.
Check my reasoning: in version 4 I added code to remember the tail of the 
pending list in the blknoFinish variable. When we read the page that was the 
tail at cleanup start, we set the cleanupFinish variable, and after cleaning 
that page we stop further cleanup. Any insert that happens during cleanup will 
be placed after blknoFinish (corner case: in that page), so vacuum should not 
miss tuples marked as deleted.


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/




Re: [HACKERS] "parallel= " information is not coming in pg_dumpall for create aggregate

2016-04-18 Thread Fabrízio de Royes Mello
On Mon, Apr 18, 2016 at 5:30 AM, tushar 
wrote:
>
>
> Hi,
>
> I checked in PG 9.6: if we create an aggregate function specifying
> parallel=safe/restricted/unsafe and then take
> a pg_dumpall of the entire cluster, "parallel= " is missing from the create
> aggregate syntax
>
> Steps to reproduce -
>
> .)connect to psql terminal and create an aggregate function
>
> postgres=# CREATE AGGREGATE unsafe_sum100 (float8)
> (
> stype = float8,
> sfunc = float8pl,
> mstype = float8,
> msfunc = float8pl,
> minvfunc = float8mi,
> parallel=safe);
> CREATE AGGREGATE
>
> .)perform pg_dumpall against that cluster
>
> .)check the content of create aggregate unsafe_sum100 in the file
>
> "
> -
> -- Name: unsafe_sum100(double precision); Type: AGGREGATE; Schema:
public; Owner: centos
> --
>
> CREATE AGGREGATE unsafe_sum100(double precision) (
> SFUNC = float8pl,
> STYPE = double precision,
> MSFUNC = float8pl,
> MINVFUNC = float8mi,
> MSTYPE = double precision
> );
>
> "

You're correct... try the attached patch to fix it.

Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog: http://fabriziomello.github.io
>> Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello
>> Github: http://github.com/fabriziomello
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index e1e5bee..396c03d 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -13274,6 +13274,7 @@ dumpAgg(Archive *fout, AggInfo *agginfo)
 	int			i_agginitval;
 	int			i_aggminitval;
 	int			i_convertok;
+	int			i_proparallel;
 	const char *aggtransfn;
 	const char *aggfinalfn;
 	const char *aggcombinefn;
@@ -13295,6 +13296,7 @@ dumpAgg(Archive *fout, AggInfo *agginfo)
 	const char *agginitval;
 	const char *aggminitval;
 	bool		convertok;
+	const char *proparallel;
 
 	/* Skip if not to be dumped */
 	if (!agginfo->aggfn.dobj.dump || dopt->dataOnly)
@@ -13324,7 +13326,8 @@ dumpAgg(Archive *fout, AggInfo *agginfo)
 			"aggmtransspace, aggminitval, "
 			"true AS convertok, "
 			"pg_catalog.pg_get_function_arguments(p.oid) AS funcargs, "
-			"pg_catalog.pg_get_function_identity_arguments(p.oid) AS funciargs "
+			"pg_catalog.pg_get_function_identity_arguments(p.oid) AS funciargs, "
+			"p.proparallel "
 			"FROM pg_catalog.pg_aggregate a, pg_catalog.pg_proc p "
 			"WHERE a.aggfnoid = p.oid "
 			"AND p.oid = '%u'::pg_catalog.oid",
@@ -13472,6 +13475,7 @@ dumpAgg(Archive *fout, AggInfo *agginfo)
 	i_agginitval = PQfnumber(res, "agginitval");
 	i_aggminitval = PQfnumber(res, "aggminitval");
 	i_convertok = PQfnumber(res, "convertok");
+	i_proparallel = PQfnumber(res, "proparallel");
 
 	aggtransfn = PQgetvalue(res, 0, i_aggtransfn);
 	aggfinalfn = PQgetvalue(res, 0, i_aggfinalfn);
@@ -13511,6 +13515,11 @@ dumpAgg(Archive *fout, AggInfo *agginfo)
 
 	aggsig_tag = format_aggregate_signature(agginfo, fout, false);
 
+	if (i_proparallel != -1)
+		proparallel = PQgetvalue(res, 0, PQfnumber(res, "proparallel"));
+	else
+		proparallel = NULL;
+
 	if (!convertok)
 	{
 		write_msg(NULL, "WARNING: aggregate function %s could not be dumped correctly for this database version; ignored\n",
@@ -13622,6 +13631,17 @@ dumpAgg(Archive *fout, AggInfo *agginfo)
 	if (hypothetical)
 		appendPQExpBufferStr(details, ",\nHYPOTHETICAL");
 
+	if (proparallel != NULL && proparallel[0] != PROPARALLEL_UNSAFE)
+	{
+		if (proparallel[0] == PROPARALLEL_SAFE)
+			appendPQExpBufferStr(details, ",\nPARALLEL = safe");
+		else if (proparallel[0] == PROPARALLEL_RESTRICTED)
+			appendPQExpBufferStr(details, ",\nPARALLEL = restricted");
+		else if (proparallel[0] != PROPARALLEL_UNSAFE)
+			exit_horribly(NULL, "unrecognized proparallel value for function \"%s\"\n",
+		  agginfo->aggfn.dobj.name);
+	}
+
 	/*
 	 * DROP must be fully qualified in case same name appears in pg_catalog
 	 */



Re: [HACKERS] Memory leak in GIN index build

2016-04-18 Thread Tom Lane
Julien Rouhaud  writes:
> On 16/04/2016 20:45, Tom Lane wrote:
>> I think this needs to be redesigned so that the critical section and WAL
>> insertion calls all happen within a single straight-line piece of code.
>> 
>> We could try making that place be ginPlaceToPage().  So far as
>> registerLeafRecompressWALData is concerned, that does not look that hard;
>> it could palloc and fill the WAL-data buffer the same as it's doing now,
>> then pass it back up to be registered (and eventually pfreed) in
>> ginPlaceToPage.  However, that would also mean postponing the call of
>> dataPlaceToPageLeafRecompress(), which seems much messier.  I assume
>> the data it's working from is mostly in the tmpCtx that
>> dataPlaceToPageLeaf wants to free before returning.  Maybe we could
>> move creation/destruction of the tmpCtx out to ginPlaceToPage, as well?
>> 
>> The other line of thought is to move the WAL work that ginPlaceToPage
>> does down into the subroutines.  That would probably result in some
>> duplication of code, but it might end up being a cleaner solution.

> I'm not really confident, but I'll give a try.  Probably with your
> second solution which looks easier.

I poked at this over the weekend, and got more unhappy the more I poked.
Aside from the memory leakage issue, there are multiple coding-rule
violations besides the one you noted about scope of the critical sections.
One example is that in the page-split path for an inner page, we modify
the contents of childbuf long before getting into the critical section.
The number of out-of-date comments was depressingly large as well.

I ended up deciding that the most reasonable fix was along the lines of
my first alternative above.  In the attached draft patches, the
"placeToPage" method is split into two methods, "beginPlaceToPage" and
"execPlaceToPage", where the second method is what's supposed to happen
inside the critical section for a non-page-splitting update.  Nothing
that's done inside beginPlaceToPage is critical.

I've tested this to the extent of verifying that it passes make
check-world and stops the memory leak in your test case, but it could use
more eyeballs on it.

Attached are draft patches against HEAD and 9.5 (they're the same except
for BufferGetPage calls).  I think we'd also want to back-patch to 9.4,
but that will take considerable additional work because of the XLOG API
rewrite that happened in 9.5.  I also noted that some of the coding-rule
violations seem to be new in 9.5, so the problems may be less severe in
9.4 --- the memory leak definitely exists there, though.

regards, tom lane

diff --git a/src/backend/access/gin/ginbtree.c b/src/backend/access/gin/ginbtree.c
index e593b2b..fd36ecf 100644
*** a/src/backend/access/gin/ginbtree.c
--- b/src/backend/access/gin/ginbtree.c
***
*** 17,22 
--- 17,23 
  #include "access/gin_private.h"
  #include "access/xloginsert.h"
  #include "miscadmin.h"
+ #include "utils/memutils.h"
  #include "utils/rel.h"
  
  static void ginFindParents(GinBtree btree, GinBtreeStack *stack);
*** ginFindParents(GinBtree btree, GinBtreeS
*** 312,326 
   * Insert a new item to a page.
   *
   * Returns true if the insertion was finished. On false, the page was split and
!  * the parent needs to be updated. (a root split returns true as it doesn't
!  * need any further action by the caller to complete)
   *
   * When inserting a downlink to an internal page, 'childbuf' contains the
   * child page that was split. Its GIN_INCOMPLETE_SPLIT flag will be cleared
!  * atomically with the insert. Also, the existing item at the given location
!  * is updated to point to 'updateblkno'.
   *
   * stack->buffer is locked on entry, and is kept locked.
   */
  static bool
  ginPlaceToPage(GinBtree btree, GinBtreeStack *stack,
--- 313,328 
   * Insert a new item to a page.
   *
   * Returns true if the insertion was finished. On false, the page was split and
!  * the parent needs to be updated. (A root split returns true as it doesn't
!  * need any further action by the caller to complete.)
   *
   * When inserting a downlink to an internal page, 'childbuf' contains the
   * child page that was split. Its GIN_INCOMPLETE_SPLIT flag will be cleared
!  * atomically with the insert. Also, the existing item at offset stack->off
!  * in the target page is updated to point to updateblkno.
   *
   * stack->buffer is locked on entry, and is kept locked.
+  * Likewise for childbuf, if given.
   */
  static bool
  ginPlaceToPage(GinBtree btree, GinBtreeStack *stack,
*** ginPlaceToPage(GinBtree btree, GinBtreeS
*** 329,339 
--- 331,358 
  {
  	Page		page = BufferGetPage(stack->buffer, NULL, NULL,
  	 BGP_NO_SNAPSHOT_TEST);
+ 	bool		result;
  	GinPlaceToPageRC rc;
  	uint16		xlflags = 0;
  	Page		childpage = NULL;
  	Page		newlpage = NULL,
  newrpage = NULL;
+ 	void	   *ptp_workspace = NULL;
+ 	MemoryContext tmpCxt;

Re: [HACKERS] SET ROLE and reserved roles

2016-04-18 Thread Robert Haas
On Fri, Apr 15, 2016 at 11:56 AM, Stephen Frost  wrote:
> Perhaps it would be helpful to return to the initial goal of these
> default roles.
>
> We've identified certain privileges which are currently superuser-only
> and we would like to be able to be GRANT those to non-superuser roles.
> Where our GRANT system is able to provide the granularity desired, we
> have simply removed the superuser checks, set up appropriate default
> values and, through the "pg_dump dump catalog ACLs" patch, allowed
> administrators to make the changes to those objects via the
> 'GRANT privilege ON object TO role' system.
>
> For those cases where our GRANT system is unable to provide the
> granularity desired and it isn't sensible to extend the GRANT system to
> cover the exact granularity we wish to provide, we needed to come up
> with an alternative approach.  That approach is the use of special
> default roles, where we can allow exactly the level of granularity
> desired and give administrators a way to grant those privileges to
> users.
>
> What this means in a nutshell is that we've collectivly decided that
> we'd rather support:
>
> /* uses the 'GRANT role TO role' system */
> GRANT pg_signal_backend TO test_user;
>
> than:
>
> /*
>  * uses the 'GRANT privilege ON object TO role' system
>  * seems like cluster-level is actually the right answer here, but
>  * we don't support such an object type currently, so using database
>  * for the example
>  */
> GRANT SIGNAL BACKEND ON DATABASE my_database TO test_user;
>
> The goal being that the result of the two commands is identical (where
> the second command is only hypothetical at this point, but hopefully the
> meaning is conveyed).

I quite understand all of that.  The problem isn't that I don't
understand what you did.  The problem is that I don't agree with it.
I don't think that the way you implemented is a good idea, nor do I
think it reflects the consensus that was reached on list.  There was
agreement to create some special magic roles.  There was not agreement
around the special magic you've sprinkled on those roles that makes
them work unlike all other roles in the system, and I think that
special magic is a bad idea.

> However, GRANT'ing a role to a user traditionally also allows the user
> to 'SET ROLE' to that user, to create objects as that user, and other
> privileges.  This results in two issues, as I see it:
>
> 1) The administrator who is looking to grant the specific 'signal
>backend' privilege to a user is unlikely to intend or desire for that
>user to also be able to SET ROLE to the role, or for that user to be
>able to create objects as the default role.

I don't think being able to SET ROLE to that role is something that we
should be trying to prevent.  You're already discovering why.  You
tried to create this new special kind of role that you can't become,
and there are already various reports of cases that you've missed.
You will keep finding more for a while, I predict.  If that role is
minimally privileged, who cares if users can become it?

> 2) If the default role ends up owning objects then we have much less
>flexibility in making changes to that role because we must ensure
>that those objects are preserved across upgrades, etc.

Tom already said his piece on this point, and I think he's right.  You
re-stating the argument doesn't change that.  In any case, if we want
to prevent this, I bet there's some less buggy and invasive way in
which it could be done than what you've actually chosen.

> Further, there seems to be no particular use-case which is satisfied by
> allowing SET ROLE'ing to the default roles or for those roles to own
> objects; indeed, it seems more likely to simply spark confusion and
> ultimately may prevent administrators from making use of this system for
> granting privileges as they are unable to GRANT just the specific
> privilege they wish to.

Great.  But there's no particular use case served by a lot of things
which are natural outgrowths of the rest of the system which we permit
anyway because it's too awkward otherwise - like zero-column tables.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Default Roles

2016-04-18 Thread Stephen Frost
* Noah Misch (n...@leadboat.com) wrote:
> On Thu, Apr 07, 2016 at 03:50:47PM -0400, Stephen Frost wrote:
> > I'm planning to continue going over the patch tomorrow morning with
> > plans to push this before the feature freeze deadline.
> 
> > --- a/src/test/regress/expected/rolenames.out
> > +++ b/src/test/regress/expected/rolenames.out
> 
> > +GRANT testrol0 TO pg_abc; -- error
> > +ERROR:  role "pg_abc" is reserved
> > +DETAIL:  Cannot GRANT roles to a reserved role.
> 
> The server still accepts "ALTER ROLE testrol0 USER pg_signal_backend".  It
> should block this ALTER ROLE if it blocks the corresponding GRANT.

Agreed, I specifically recall looking at that bit of code, but I think I
got myself convinced that it was the other way around (that the ALTER
would end up granting pg_signal_backend to testrol0, which would be
fine), but you're right, this needs to be prevented also.

Will fix.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] snapshot too old, configured by time

2016-04-18 Thread Kevin Grittner
On Mon, Apr 18, 2016 at 8:41 AM, Tom Lane  wrote:
> Kevin Grittner  writes:
>> On Sun, Apr 17, 2016 at 10:38 PM, Alvaro Herrera
>>  wrote:
>>> I understand the backpatching pain argument, but my opinion was the
>>> contrary of yours even so.
>
>> The other possibility would be to backpatch the no-op patch which
>> just uses the new syntax without any change in semantics.
>
> That would break 3rd-party extensions in a minor release, wouldn't it?
> Or do I misunderstand your suggestion?

With a little bit of a change to the headers I think we could avoid
that breakage.

The original no-op patch didn't change the executable code, but it
would have interfered with 3rd-party compiles; but with a minor
adjustment (using a modified name for the BufferGetPage with the
extra parameters), we could avoid that problem.  That would seem to
address Álvaro's concern while avoiding five years of backpatch
nightmares.

I don't claim it's an *elegant* solution, but it might be a workable compromise.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] snapshot too old, configured by time

2016-04-18 Thread Robert Haas
On Sun, Apr 17, 2016 at 6:15 PM, Tom Lane  wrote:
> After struggling with back-patching a GIN bug fix, I wish to offer up the
> considered opinion that this was an impressively bad idea.  It's inserted
> 450 or so pain points for back-patching, which we will have to deal with
> for the next five years.  Moreover, I do not believe that it will do a
> damn thing for ensuring that future calls of BufferGetPage think about
> what to do; they'll most likely be copied-and-pasted from nearby calls,
> just as people have always done.  With luck, the nearby calls will have
> the right semantics, but this change won't help very much at all if they
> don't.

I hit this problem over the weekend, too, when I tried to rebase a
patch a colleague of mine is working on.  So I tend to agree.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] parallel query vs extensions

2016-04-18 Thread Robert Haas
On Fri, Apr 15, 2016 at 12:45 AM, Jeff Janes  wrote:
> Should every relevant contrib extension get a version bump with a
> transition file which is nothing but a list of "alter function blah
> blah blah parallel safe" ?

Yes, I think that's what we would need to do.  It's a lot of work,
albeit mostly mechanical.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] snapshot too old, configured by time

2016-04-18 Thread Tom Lane
Alvaro Herrera  writes:
> I disagree.  A developer that sees an unadorned BufferGetPage() call
> doesn't stop to think twice about whether they need to add a snapshot
> test.  Many reviewers will miss the necessary addition also.  A
> developer that sees BufferGetPage(NO_SNAPSHOT_TEST) will at least
> consider the idea that the flag might be right; if that developer
> doesn't think about it, some reviewer may notice a new call with the
> flag and consider the idea that the flag may be wrong.

I'm unconvinced ...

> I understand the backpatching pain argument, but my opinion was the
> contrary of yours even so.

I merely point out that the problem came up less than ten days after
that patch hit the tree.  If that does not give you pause about the
size of the back-patching problem we've just introduced, it should.

TBH, there is nothing that I like about this feature: not the underlying
concept, not the invasiveness of the implementation, nothing.  I would
dearly like to see it reverted altogether.  I do not think it is worth
the pain that the current implementation will impose, both on developers
and on potential users.  Surely there was another way to get a similar
end result without mucking with things at the level of BufferGetPage.

regards, tom lane




Re: [HACKERS] Suspicious behaviour on applying XLOG_HEAP2_VISIBLE.

2016-04-18 Thread Andres Freund
On 2016-04-18 13:15:19 +0100, Simon Riggs wrote:
> OK, I'll write up a patch today to fix, with a view to backpatching.

Cool. I've spent a couple minutes on this yesterday, and have the start
of a patch - do you want that?

- Andres




Re: [HACKERS] snapshot too old, configured by time

2016-04-18 Thread Tom Lane
Kevin Grittner  writes:
> On Sun, Apr 17, 2016 at 10:38 PM, Alvaro Herrera
>  wrote:
>> I understand the backpatching pain argument, but my opinion was the
>> contrary of yours even so.

> The other possibility would be to backpatch the no-op patch which
> just uses the new syntax without any change in semantics.

That would break 3rd-party extensions in a minor release, wouldn't it?
Or do I misunderstand your suggestion?

regards, tom lane




Re: [HACKERS] Suspicious behaviour on applying XLOG_HEAP2_VISIBLE.

2016-04-18 Thread Andres Freund
On 2016-04-18 20:43:30 +0900, Michael Paquier wrote:
> Yeah, introducing a new WAL record to address this issue in
> back-branches would not be an issue, and that's what we should do. For
> HEAD, let's add that in the commit record.

I'm not sure why/how you'd do it that way in HEAD? I mean the only
reason not to use a separate record is the standby incompatibility.

- Andres




Re: [HACKERS] Re: [COMMITTERS] pgsql: Add trigonometric functions that work in degrees.

2016-04-18 Thread Tom Lane
Michael Paquier  writes:
> On Mon, Apr 18, 2016 at 12:31 PM, Peter Eisentraut  wrote:
>> I don't know if it's worth tracking this as an open item and thus
>> kind-of release blocker if no one else has the problem.  But I
>> definitely still have it.  My initial suspicion was that this had
>> something to do with a partial upgrade to gcc 6 (not yet released), in
>> other words a messed up system.  But I was able to reproduce it in a
>> freshly installed chroot.  It only happens with various versions of gcc,
>> but not with clang.  So I'm going to have to keep digging.

> gcc is moving slowly but surely to have 6.0 in a released state btw:
> https://gcc.gnu.org/ml/gcc/2016-04/msg00103.html

Given that it's apparently showing the results of asind as NULL, the
theory that comes to mind is some sort of optimization issue affecting
the output tuple's null-flags.  I have no idea why only this test would
be affected, though.  Anyway, a good way to test that theory would be
to see if the -O level affects it.

regards, tom lane




Re: [HACKERS] Suspicious behaviour on applying XLOG_HEAP2_VISIBLE.

2016-04-18 Thread Andres Freund
On 2016-04-15 17:37:03 -0300, Alvaro Herrera wrote:
> Andres Freund wrote:
> > On 2016-04-15 15:26:17 -0400, Tom Lane wrote:
> > > I think the bottom line is that we misdesigned the WAL representation
> > > by assuming that this sort of info could always be piggybacked on a
> > > transaction commit record.  It's time to fix that.
> > 
> > I think we got to piggyback it onto a commit record, as long as there's
> > one. Otherwise it's going to be more complex (queuing messages when
> > reading an inval record) and slower (more wal records).  I can easily
> > develop a patch for that, the question is what we do on the back
> > branches...
> 
> We have introduced new wal records in back branches previously --
> nothing new (c.f. 8e9a16ab8f7f0e5813644975cc3f336e5b064b6e).

Yea, I remember ;). We made that decision because we couldn't find
another way, and because the consequences were pretty grave.


> The user just needs to make sure to upgrade the standbys first.  If
> they don't, they would die upon replay of the first such record, which
> they can take as an indication that they need to be upgraded; the
> standby is down for some time, but there is no data loss or
> corruption.

There could be, if they're using wal_keep_segments and the standby
cannot be caught up anymore.


I think it's still worth going for the new record type, but it's a
pretty close call. We could also just decide to document the issues :/ -
but I'm not sure we're seeing all of them yet.

- Andres




Re: [HACKERS] snapshot too old, configured by time

2016-04-18 Thread Kevin Grittner
On Sun, Apr 17, 2016 at 10:38 PM, Alvaro Herrera
 wrote:
> Tom Lane wrote:
>
>> After struggling with back-patching a GIN bug fix, I wish to offer up the
>> considered opinion that this was an impressively bad idea.  It's inserted
>> 450 or so pain points for back-patching, which we will have to deal with
>> for the next five years.

> I understand the backpatching pain argument, but my opinion was the
> contrary of yours even so.

The other possibility would be to backpatch the no-op patch which
just uses the new syntax without any change in semantics.

I'm not arguing for that; just putting it on the table.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Suspicious behaviour on applying XLOG_HEAP2_VISIBLE.

2016-04-18 Thread Simon Riggs
On 18 April 2016 at 12:43, Michael Paquier 
wrote:

> On Sat, Apr 16, 2016 at 5:37 AM, Alvaro Herrera
>  wrote:
> > Andres Freund wrote:
> >> On 2016-04-15 15:26:17 -0400, Tom Lane wrote:
> >> > I think the bottom line is that we misdesigned the WAL representation
> >> > by assuming that this sort of info could always be piggybacked on a
> >> > transaction commit record.  It's time to fix that.
> >>
> >> I think we got to piggyback it onto a commit record, as long as there's
> >> one. Otherwise it's going to be more complex (queuing messages when
> >> reading an inval record) and slower (more wal records).  I can easily
> >> develop a patch for that, the question is what we do on the back
> >> branches...
> >
> > We have introduced new wal records in back branches previously --
> > nothing new (c.f. 8e9a16ab8f7f0e5813644975cc3f336e5b064b6e).  The user
> > just needs to make sure to upgrade the standbys first.  If they don't,
> > they would die upon replay of the first such record, which they can take
> > as an indication that they need to be upgraded; the standby is down for
> > some time, but there is no data loss or corruption.
>
> Yeah, introducing a new WAL record to address this issue in
> back-branches would not be an issue, and that's what we should do. For
> HEAD, let's add that in the commit record.
>

(non-reply just because of travel)

OK, I'll write up a patch today to fix, with a view to backpatching.

-- 
Simon Riggshttp://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Suspicious behaviour on applying XLOG_HEAP2_VISIBLE.

2016-04-18 Thread Michael Paquier
On Sat, Apr 16, 2016 at 5:37 AM, Alvaro Herrera
 wrote:
> Andres Freund wrote:
>> On 2016-04-15 15:26:17 -0400, Tom Lane wrote:
>> > I think the bottom line is that we misdesigned the WAL representation
>> > by assuming that this sort of info could always be piggybacked on a
>> > transaction commit record.  It's time to fix that.
>>
>> I think we got to piggyback it onto a commit record, as long as there's
>> one. Otherwise it's going to be more complex (queuing messages when
>> reading an inval record) and slower (more wal records).  I can easily
>> develop a patch for that, the question is what we do on the back
>> branches...
>
> We have introduced new wal records in back branches previously --
> nothing new (c.f. 8e9a16ab8f7f0e5813644975cc3f336e5b064b6e).  The user
> just needs to make sure to upgrade the standbys first.  If they don't,
> they would die upon replay of the first such record, which they can take
> as an indication that they need to be upgraded; the standby is down for
> some time, but there is no data loss or corruption.

Yeah, introducing a new WAL record to address this issue in
back-branches would not be an issue, and that's what we should do. For
HEAD, let's add that in the commit record.
-- 
Michael




Re: [HACKERS] Declarative partitioning

2016-04-18 Thread Amit Langote
On 2016/04/18 15:33, Ashutosh Bapat wrote:
>> For the time being, I will leave this as yet unaddressed (I am thinking about
>> what is reasonable to do for this also considering Robert's comment).  Is
>> that OK?
>>
> 
> Right now EXPLAIN of select * from t1, where t1 is partitioned table shows
> Append
> -> Seq Scan on t1
> -> Seq scan on partition 1
> -> seq scan on partition 2
> ...
> which shows t1 as well as all the partitions on the same level. Users might
> have accepted that for inheritance hierarchy but for partitioning that can
> be confusing, esp. when all the error messages and documentation indicate
> that t1 is an empty (shell?) table. Instead showing it like
> Append for t1 -- (essentially show that this is Append for partitioned
> table t1, exact text might vary)
> -> Seq scan on partition 1
> -> 
> would be more readable. Similarly if we are going to collapse all the
> Append hierarchy, it might get even more confusing. Listing all the
> intermediate partitioned tables as Seq Scan on them would be confusing for
> the reasons mentioned above, and not listing them might make user wonder
> about the reasons for their disappearance. I am not sure what's the
> solution there.

Yes, things remain confusing with Append plans for partitioned tables.
Note that currently if an internal partition doesn't show up, none of its
partitions do (given the way CHECK constraints are generated for each
table in the partition tree).

>> OK, I will address this in the next version.  One question though: should
>> foreign table be only allowed to be leaf partitions (ie, no PARTITION BY
>> clause in CREATE FOREIGN TABLE ... PARTITION OF)?
> 
> That seems a better way. Otherwise users might wonder whether we keep the
> partitions of a foreign table on the foreign server which won't be true.

Quite true.  Can't assume anything other than what the FDW for a given
foreign table lets us assume about what the remote table looks like.
The existing FDW API lets us treat a foreign table, at best, as
nothing other than a regular table.

> But then we allow foreign tables to have local tables as children in
> inheritance, so somebody from that background might find it incompatible. I
> think we shouldn't let the connection between partitioning and inheritance
> linger longer than necessary.

I, too, have thought quite a bit about this last sentence.  But I have
only considered DDL so far.

Thanks,
Amit






Re: [HACKERS] Pgbench with -f and -S

2016-04-18 Thread Robins Tharakan
Got it. Thanks for explaining that. Was looking at -S earlier and didn't
notice the new -b and @ weight options in 9.6devel.

On Mon, 18 Apr 2016 at 10:54 Fabien COELHO  wrote:

>
> >> For the future 9.6, scripts options are cumulatives, so -f & -S can be
> >> combined.
> >>
> >> Indeed, for the <= 9.5 it seems that some options are silently ignores,
> >> when combining -S/-f, only the last one is kept, it should be warned
> about.
> > ​
> > ​Thanks Fabien, for confirming about the missing warning.
> >
> > Also, by 'combined' I think you mean that both (built-in SELECTs & Custom
> > Script) run, although the dev docs don't (yet) say anything about that.
>
> Hmmm... I think it does implicitly, with "add" and "and" in the
> following:
>
> From http://www.postgresql.org/docs/devel/static/pgbench.html:
>
>-b scriptname[@weight]
>Add the specified builtin script to the list of executed scripts. [...]
>
> Idem -f.
>
> And later at the beginning of the Notes:
>
>pgbench executes test scripts chosen randomly from a specified list.
>They include built-in scripts with -b and user-provided custom scripts
> with -f.
>[...]
>
> --
> Fabien.

-- 
robins


Re: [HACKERS] Declarative partitioning

2016-04-18 Thread Ashutosh Bapat
On Mon, Apr 18, 2016 at 1:23 PM, Amit Langote  wrote:

> On 2016/04/18 15:38, Ashutosh Bapat wrote:
> >> There was no KeyTypeCollInfo in early days of the patch and then I found
> >> myself doing a lot of:
> >>
> >> partexprs_item = list_head(key->partexprs);
> >> for (attr in key->partattrs)
> >> {
> >> if (attr->attnum != 0)
> >> {
> >> // simple column reference, get type from attr
> >> }
> >> else
> >> {
> >> // expression, get type using exprType, etc.
> >> partexprs_item = lnext(partexprs_item);
> >> }
> >> }
> >>
> >
> At least the two loops can be flattened to a single loop if we keep only an
> expressions list, with plain attributes being just Var nodes. exprType() etc.
> would then work seamlessly.
>
> I didn't say anything about your suggestion to use a Node * list as a
> representation for the cached partition key information.  IIUC, you mean
> instead of the AttrNumber[partnatts] array with non-zero attnum for a
> named column slot and 0 for an expression column slot, create a Node *
> list with Var nodes for simple column references and Expr nodes for
> expressions.
>
> I would mention that the same information is also being used in contexts
> where having simple attnums may be better (for example, when extracting
> key of a tuple slot during tuple routing).  Moreover, this is cached
> information and I thought it may be better to follow the format that other
> similar information uses (index key and such).  Furthermore, looking at
> qual matching code for indexes and recently introduced foreign key
> optimization, it seems we will want to use a similar representation within
> optimizer for partition keys.  IndexOptInfo has int ncolumns and int *
> indexkeys and then match_index_to_operand() compares index key attnums
> with varattno of vars in qual.  It's perhaps speculative at the moment
> because there is not much code wanting to use it yet other than partition
> DDL and tuple-routing and  cached info seems to work as-is for the latter.
>

Ok.


>
> >> That ended up being quite a few places (though I managed to reduce the
> >> number of places over time).  So, I created this struct which is
> >> initialized when partition key is built (on first open of the
> partitioned
> >> table).
> >>
> >
> > Hmm, I am just afraid that we might end up with some code using cached
> > information and some using exprType, exprTypmod etc.
>
> Well, you never use exprType(), etc. for partition keys in other than a
> few places.  All places that do always use the cached values.  Mostly
> partitioning DDL stuff so far.  Tuple routing considers collation of
> individual key columns when comparing input value with partition bounds.
>
>
I am not worried about the current code. But there will be a lot of code
added after version 1. I am worried about that.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


[HACKERS] Description of ForeignPath

2016-04-18 Thread Amit Langote
Is the following description now outdated:

"ForeignPath represents a potential scan of a foreign table"

Considering that there now exists FdwRoutine.GetForeignJoinPaths() whose
product is nothing else but a ForeignPath, should it now say (patch attached):

"ForeignPath represents a potential scan of foreign table(s)"

Or something better.

Thanks,
Amit
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index e9dfb66..b11f29f 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -1030,7 +1030,7 @@ typedef struct SubqueryScanPath
 } SubqueryScanPath;
 
 /*
- * ForeignPath represents a potential scan of a foreign table
+ * ForeignPath represents a potential scan of foreign table(s)
  *
  * fdw_private stores FDW private data about the scan.  While fdw_private is
  * not actually touched by the core code during normal operations, it's



[HACKERS] "parallel= " information is not coming in pg_dumpall for create aggregate

2016-04-18 Thread tushar

Hi,

I checked in PG 9.6: if we create an aggregate function specifying
parallel=safe/restricted/unsafe and then take a pg_dumpall of the
entire cluster, "parallel=" is missing from the CREATE AGGREGATE
syntax.


Steps to reproduce -

1) Connect to the psql terminal and create an aggregate function:

postgres=# CREATE AGGREGATE unsafe_sum100 (float8)
(
stype = float8,
sfunc = float8pl,
mstype = float8,
msfunc = float8pl,
minvfunc = float8mi,
parallel=safe);
CREATE AGGREGATE

2) Perform pg_dumpall against that cluster.

3) Check the contents of CREATE AGGREGATE unsafe_sum100 in the dump file:

"
-
-- Name: unsafe_sum100(double precision); Type: AGGREGATE; Schema: 
public; Owner: centos

--

CREATE AGGREGATE unsafe_sum100(double precision) (
SFUNC = float8pl,
STYPE = double precision,
MSFUNC = float8pl,
MINVFUNC = float8mi,
MSTYPE = double precision
);

"

--
regards, tushar



Re: [HACKERS] Declarative partitioning

2016-04-18 Thread Amit Langote
On 2016/04/18 15:38, Ashutosh Bapat wrote:
>> There was no KeyTypeCollInfo in early days of the patch and then I found
>> myself doing a lot of:
>>
>> partexprs_item = list_head(key->partexprs);
>> for (attr in key->partattrs)
>> {
>> if (attr->attnum != 0)
>> {
>> // simple column reference, get type from attr
>> }
>> else
>> {
>> // expression, get type using exprType, etc.
>> partexprs_item = lnext(partexprs_item);
>> }
>> }
>>
> 
> At least the two loops can be flattened to a single loop if we keep only an
> expressions list, with plain attributes being just Var nodes. exprType() etc.
> would then work seamlessly.

I didn't say anything about your suggestion to use a Node * list as a
representation for the cached partition key information.  IIUC, you mean
instead of the AttrNumber[partnatts] array with non-zero attnum for a
named column slot and 0 for an expression column slot, create a Node *
list with Var nodes for simple column references and Expr nodes for
expressions.

I would mention that the same information is also being used in contexts
where having simple attnums may be better (for example, when extracting
key of a tuple slot during tuple routing).  Moreover, this is cached
information and I thought it may be better to follow the format that other
similar information uses (index key and such).  Furthermore, looking at
qual matching code for indexes and recently introduced foreign key
optimization, it seems we will want to use a similar representation within
optimizer for partition keys.  IndexOptInfo has int ncolumns and int *
indexkeys and then match_index_to_operand() compares index key attnums
with varattno of vars in qual.  It's perhaps speculative at the moment
because there is not much code wanting to use it yet other than partition
DDL and tuple routing, and the cached info seems to work as-is for the latter.

>> That ended up being quite a few places (though I managed to reduce the
>> number of places over time).  So, I created this struct which is
>> initialized when partition key is built (on first open of the partitioned
>> table).
>>
> 
> Hmm, I am just afraid that we might end up with some code using cached
> information and some using exprType, exprTypmod etc.

Well, you never use exprType(), etc. for partition keys in other than a
few places.  All places that do always use the cached values.  Mostly
partitioning DDL stuff so far.  Tuple routing considers collation of
individual key columns when comparing input value with partition bounds.

Am I missing something?

Thanks,
Amit






Re: [HACKERS] OOM in libpq and infinite loop with getCopyStart()

2016-04-18 Thread Michael Paquier
On Wed, Apr 6, 2016 at 3:09 PM, Michael Paquier
 wrote:
> On Sat, Apr 2, 2016 at 12:30 AM, Tom Lane  wrote:
>> I wrote:
>>> So the core of my complaint is that we need to fix things so that, whether
>>> or not we are able to create the PGRES_FATAL_ERROR PGresult (and we'd
>>> better consider the behavior when we cannot), ...
>>
>> BTW, the real Achilles' heel of any attempt to ensure sane behavior at
>> the OOM limit is this possibility of being unable to create a PGresult
>> with which to inform the client that we failed.
>>
>> I wonder if we could make things better by keeping around an emergency
>> backup PGresult struct.  Something along these lines:
>>
>> 1. Add a field "PGresult *emergency_result" to PGconn.
>>
>> 2. At the very beginning of any PGresult-returning libpq function, check
>> to see if we have an emergency_result, and if not make one, ensuring
>> there's room in it for a reasonable-size error message; or maybe even
>> preload it with "out of memory" if we assume that's the only condition
>> it'll ever be used for.  If malloc fails at this point, just return NULL
>> without doing anything or changing any libpq state.  (Since a NULL result
>> is documented as possibly caused by OOM, this isn't violating any API.)
>>
>> 3. Subsequent operations never touch the emergency_result unless we're
>> up against an OOM, but it can be used to return a failure indication
>> to the client so long as we leave libpq in a state where additional
>> calls to PQgetResult would return NULL.
>>
>> Basically this shifts the point where an unreportable OOM could happen
>> from somewhere in the depths of libpq to the very start of an operation,
>> where we're presumably in a clean state and OOM failure doesn't leave
>> us with a mess we can't clean up.

Sorry for the late reply here. I am looking back at this thing more seriously.

So, if I understand that correctly: as the emergency result is
pre-allocated, it covers any future OOM errors because we can always
fall back to it when something goes wrong, so this would indeed help.
And as it is independent of the rest of the state of pg_conn, any
subsequent calls to PQ* routines returning a PGresult would leave the
connection in the same state.

On top of this emergency code path, don't you think that we need as
well a flag that is switched to 'on' once an OOM error is faced? In
consequence, on a code path where an OOM happens, this flag is
activated. Then any subsequent calls of routines returning a PGresult
checks for this flag, resets it, and returns the emergency result. At
the end, it seems to me that the code paths where we check if the
emergency flag is activated or not are the beginning of PQgetResult
and PQexecFinish. In the case of PQexecFinish(), pending results would
be cleared the next time PQexecStart is called. For PQgetResult, this
gives the possibility to the application to report the OOM properly.
However, successive calls to PQgetResult() would still make the
application wait indefinitely for more data if it doesn't catch the
error.

Another thing that I think needs to be patched, by the way, is
libpqrcv_PQexec, so that we check whether any errors are caught along
the way when calling PQgetResult multiple times, or this code path
could remain stuck in an infinite wait. As a result, it seems to me
that this offers no real way to completely fix applications doing
something like this:
PQsendQuery(conn);
for (;;)
{
    while (PQisBusy(conn))
    {
        /* wait here for some event */
    }
    result = PQgetResult(conn);
    if (!result)
        break;
}
The issue is that we'd wait for a NULL result to be received from
PQgetResult; however, even with this patch, asyncStatus remaining set
to PGASYNC_BUSY would make libpq wait forever in pqWait for data that
will never show up. We could have a new status for asyncStatus to
be able to skip properly the loops where asyncStatus == PGASYNC_BUSY
and make PQisBusy smarter but this would actually break the semantics
of calling successively PQgetResult() if I am following correctly the
last posts of this thread.

Another, perhaps, better idea is to add some more extra logic to
pg_conn as follows:
boolis_emergency;
PGresult *emergency_result;
boolis_emergency_consumed;
So when an OOM shows up, is_emergency is set to true. Then a first
call of PQgetResult returns the PGresult with the OOM in
emergency_result, setting is_emergency_consumed to true and switching
is_emergency back to false. Then a second call to PQgetResult enforces
the consumption of any remaining results without waiting for them,
finally returning NULL to the caller and resetting
is_emergency_consumed to false. That differs from the extra failure
status PGASYNC_COPY_FAILED in that it does not break PQisBusy().

Thoughts?
-- 
Michael



Re: [HACKERS] Declarative partitioning

2016-04-18 Thread Ashutosh Bapat
> On 2016/04/15 18:46, Ashutosh Bapat wrote:
> >
> > 3. PartitionKeyData contains KeyTypeCollInfo, whose contents can be
> > obtained by calling functions exprType, exprTypemod on partexprs. Why do
> we
> > need to store that information as a separate member?
>
> There was no KeyTypeCollInfo in early days of the patch and then I found
> myself doing a lot of:
>
> partexprs_item = list_head(key->partexprs);
> for (attr in key->partattrs)
> {
> if (attr->attnum != 0)
> {
> // simple column reference, get type from attr
> }
> else
> {
> // expression, get type using exprType, etc.
> partexprs_item = lnext(partexprs_item);
> }
> }
>

At least the two loops can be flattened to a single loop if we keep only an
expressions list, with plain attributes being just Var nodes. exprType() etc.
would then work seamlessly.


>
> That ended up being quite a few places (though I managed to reduce the
> number of places over time).  So, I created this struct which is
> initialized when partition key is built (on first open of the partitioned
> table).
>

Hmm, I am just afraid that we might end up with some code using cached
information and some using exprType, exprTypmod etc.
-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


Re: [HACKERS] Declarative partitioning

2016-04-18 Thread Ashutosh Bapat
> >> Regarding 6, it seems to me that because Append does not have a
> associated
> >> relid (like scan nodes have with scanrelid).  Maybe we need to either
> fix
> >> Append or create some enhanced version of Append which would also
> support
> >> dynamic pruning.
> >>
> >
> > Right, I think, Append might store the relid of the parent table as well
> as
> > partitioning information at that level along-with the subplans.
>
> For the time being, I will leave this as yet unaddressed (I am thinking about
> what is reasonable to do for this also considering Robert's comment).  Is
> that OK?
>

Right now EXPLAIN of select * from t1, where t1 is partitioned table shows
Append
-> Seq Scan on t1
-> Seq scan on partition 1
-> seq scan on partition 2
...
which shows t1 as well as all the partitions on the same level. Users might
have accepted that for inheritance hierarchy but for partitioning that can
be confusing, esp. when all the error messages and documentation indicate
that t1 is an empty (shell?) table. Instead showing it like
Append for t1 -- (essentially show that this is Append for partitioned
table t1, exact text might vary)
-> Seq scan on partition 1
-> 
would be more readable. Similarly if we are going to collapse all the
Append hierarchy, it might get even more confusing. Listing all the
intermediate partitioned tables as Seq Scan on them would be confusing for
the reasons mentioned above, and not listing them might make user wonder
about the reasons for their disappearance. I am not sure what's the
solution there.


> > Some more comments
> > Would it be better to declare PartitionDescData as
> > {
> > int nparts;
> > PartitionInfo *partinfo; /* array of partition information structures. */
> > }
>
> I think that might be better.  Will do it the way you suggest.
>
> > The new syntax allows CREATE TABLE to be specified as partition of an
> > already partitioned table. Is it possible to do the same for CREATE
> FOREIGN
> > TABLE? Or that's material for v2? Similarly for ATTACH PARTITION.
>
> OK, I will address this in the next version.  One question though: should
> foreign table be only allowed to be leaf partitions (ie, no PARTITION BY
> clause in CREATE FOREIGN TABLE ... PARTITION OF)?


That seems a better way. Otherwise users might wonder whether we keep the
partitions of a foreign table on the foreign server which won't be true.
But then we allow foreign tables to have local tables as children in
inheritance, so somebody from that background might find it incompatible. I
think we shouldn't let the connection between partitioning and inheritance
linger longer than necessary.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


Re: [HACKERS] Declarative partitioning

2016-04-18 Thread Ashutosh Bapat
On Fri, Apr 15, 2016 at 10:30 PM, Robert Haas  wrote:

> On Fri, Apr 15, 2016 at 5:46 AM, Ashutosh Bapat
>  wrote:
> > Retaining the partition hierarchy would help to push-down join across
> > partition hierarchy effectively.
>
> -1.  You don't get to insert cruft into the final plan for the
> convenience of the optimizer.  I think the AppendPath needs to be
> annotated with sufficient information to do whatever query planning
> optimizations we want, and some or all of that may need to carry over
> to the Append plan to allow run-time partition pruning.  But I think
> that flattening nests of Appends is a good optimization and we should
> preserve it.  If that makes the additional information that any given
> Append needs to carry a bit more complex, so be it.
>
> I also think it's very good that Amit has kept the query planner
> unchanged in this initial patch.  Let's leave that work to phase two.
> What I suggest we do when the time comes is invent new nodes
> RangePartitionMap, ListPartitionMap, HashPartitionMap.  Each contains
> minimal metadata needed for tuple routing or planner transformation.
> For example, RangePartitionMap can contain an array of partition
> boundaries - represented as Datums - and an array of mappings, each a
> Node *.  The associated value can be another PartitionMap object if
> there is subpartitioning in use, or an OID.


It might be better to use a range table index instead of an OID, to make it
easy to look up the ...


>   This can be used both for
> matching up partitions for join pushdown, and also for fast tuple
> routing and runtime partition pruning.
>

I was thinking about join pushdown (in a partitioning hierarchy) of multiple
tables which have a similar partitioning structure for a few upper levels,
but whose entire partitioning hierarchies do not match. Pushing the join
down into the partition hierarchy as far as possible will help. We might end
up with joins between a plain table and an Append relation at the leaf
level. For such join push-down it looks like we will have to construct the
corresponding Append hierarchy, push the joins down, and then (maybe)
collapse it to just collect the results of the joins at the leaf level. But
preparing for that kind of optimization need not be part of this work.


>
> > 2. The new syntax allows CREATE TABLE to be specified as partition of an
> > already partitioned table. Is it possible to do the same for CREATE
> FOREIGN
> > TABLE? Or that's material for v2? Similarly for ATTACH PARTITION.
>
> +1 for making CREATE FOREIGN TABLE support that also, and in version
> 1.  And same for ATTACH PARTITION.
>
>
-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company