[HACKERS] Turn off transitive joins

2017-10-14 Thread Gourav Kumar
Hi all,

Is there some way to tell PostgreSQL not to consider transitive joins
when enumerating join pairs?

I.e., let's say a query has two join predicates: one between relations A & B,
and the other between B and C.

While constructing the DP table to find the best join order, the optimizer
also considers a join between A & C, which can be thought of as a transitive
join.

Is there some way to turn this off, or to tell PostgreSQL not to consider
these types of joins?
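For context, what makes the A-C pair appear is that the planner merges
mergejoinable equality clauses into equivalence classes, so A.x = B.x and
B.x = C.x put A, B, and C into one class. A minimal sketch of that derivation
(illustrative Python, not planner code):

```python
from itertools import combinations

def join_pairs(predicates):
    """Derive all joinable relation pairs from equality join predicates.

    Predicates are merged into equivalence classes (union-find), and every
    pair of relations inside a class -- including the "transitive" pair
    (A, C) -- becomes a candidate join pair.
    """
    parent = {}

    def find(r):
        parent.setdefault(r, r)
        while parent[r] != r:
            parent[r] = parent[parent[r]]  # path halving
            r = parent[r]
        return r

    for a, b in predicates:           # each predicate joins two relations
        parent[find(a)] = find(b)     # union their classes

    classes = {}
    for r in parent:
        classes.setdefault(find(r), set()).add(r)

    pairs = set()
    for members in classes.values():
        pairs.update(combinations(sorted(members), 2))
    return pairs

# A.x = B.x and B.x = C.x also yield the transitive pair (A, C):
print(sorted(join_pairs([("A", "B"), ("B", "C")])))
```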


Re: [HACKERS] Turn off transitive joins

2017-10-14 Thread Gourav Kumar
I don't think there is any need to add such a capability to PostgreSQL,
but I need it for my work.

I need the join qualifiers.
On 15-Oct-2017 1:37 AM, "Tom Lane"  wrote:

> Gourav Kumar  writes:
> > Is there some way to tell PostgreSQL not to consider transitive joins
> > when enumerating join pairs?
>
> No ... and you have presented no reason whatever why we should consider
> adding such a capability.
>
> Maybe you should be trying to construct your join graph someplace
> earlier, before the optimizer processes the join quals.
>
> regards, tom lane
>


Re: [HACKERS] pg_regress help output

2017-10-14 Thread Joe Conway
On 10/14/2017 02:04 PM, Peter Eisentraut wrote:
> On 10/10/17 22:31, Joe Conway wrote:
>>> Also, why is the patch apparently changing whitespace in all the help
>>> lines?  Seems like that will create a lot of make-work for translators.
>> I debated with myself about that.
> 
> Well, there are no translations of pg_regress, so please change the
> whitespace to make it look best.

Committed that way.

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development





Re: [HACKERS] pg_control_recovery() return value when not in recovery

2017-10-14 Thread Robert Haas
On Fri, Oct 13, 2017 at 7:31 PM, Joe Conway  wrote:
> Sorry for the slow response, but thinking back on this now, the idea of
> these functions, in my mind at least, was to provide as close to the
> same output as possible to what pg_controldata outputs.

I think that's a good goal.

> So if we make a change here, do we also change pg_controldata?

I think it would make more sense to leave both as they are and
consider writing more documentation.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] pageinspect function to decode infomasks

2017-10-14 Thread Robert Haas
On Fri, Oct 13, 2017 at 4:36 PM, Peter Geoghegan  wrote:
> No, I'm arguing that they're just bits. Show the bits, rather than
> interpreting what is displayed. Document that there are other logical
> states that are represented as composites of contradictory/mutually
> exclusive states.

/me shrugs.

I think it's perfectly sensible to view those 2 bits as making up a
2-bit field with 4 states rather than displaying each bit
individually, but you obviously disagree.  Fair enough.
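For reference, the two views can be contrasted using the actual constants from
htup_details.h; the state-name mapping below is my own shorthand, not
PostgreSQL's:

```python
# Constants as defined in PostgreSQL's htup_details.h.
HEAP_XMIN_COMMITTED = 0x0100
HEAP_XMIN_INVALID   = 0x0200
# The "frozen" state is deliberately the composite of both bits:
HEAP_XMIN_FROZEN    = HEAP_XMIN_COMMITTED | HEAP_XMIN_INVALID

def decode_xmin_state(infomask):
    """Read the two xmin hint bits as a 2-bit field with 4 states."""
    bits = infomask & HEAP_XMIN_FROZEN
    return {
        0: "in progress / unknown",
        HEAP_XMIN_COMMITTED: "committed",
        HEAP_XMIN_INVALID: "aborted",
        HEAP_XMIN_FROZEN: "frozen",
    }[bits]

def decode_raw_bits(infomask):
    """The 'just show the bits' alternative: each flag independently."""
    flags = []
    if infomask & HEAP_XMIN_COMMITTED:
        flags.append("HEAP_XMIN_COMMITTED")
    if infomask & HEAP_XMIN_INVALID:
        flags.append("HEAP_XMIN_INVALID")
    return flags

# The same mask, seen both ways:
print(decode_xmin_state(0x0300))  # frozen
print(decode_raw_bits(0x0300))    # ['HEAP_XMIN_COMMITTED', 'HEAP_XMIN_INVALID']
```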

>> I guess it ends wherever we decide to stop.
>
> You can take what you're saying much further. What about
> HEAP_XMAX_SHR_LOCK, and HEAP_MOVED? Code like HEAP_LOCKED_UPGRADED()
> pretty strongly undermines the idea that these composite values are
> abstractions.

HEAP_MOVED is obviously a different kind of thing.  The combination of
both bits has no meaning distinct from the meaning of the individual
bits; in fact, I think it's a shouldn't-happen state.  Not sure about
HEAP_XMAX_SHR_LOCK.

> pg_filedump doesn't display HEAP_XMIN_FROZEN, either. (Nor does it
> ever display any of the other composite t_infomask/t_infomask2
> values.)

I can think of two possible explanations for that.  Number one, the
tool was written before HEAP_XMIN_FROZEN was invented and hasn't been
updated for those changes.  Number two, the author of the tool agrees
with your position rather than mine.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] pg_regress help output

2017-10-14 Thread Peter Eisentraut
On 10/10/17 22:31, Joe Conway wrote:
>> Also, why is the patch apparently changing whitespace in all the help
>> lines?  Seems like that will create a lot of make-work for translators.
> I debated with myself about that.

Well, there are no translations of pg_regress, so please change the
whitespace to make it look best.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Extended statistics is not working on Vars hidden under a RelabelType

2017-10-14 Thread Tomas Vondra


On 10/14/2017 07:49 PM, Robert Haas wrote:
> On Fri, Oct 13, 2017 at 4:49 PM, David Rowley
>  wrote:
>> tps = 8282.481310 (including connections establishing)
>> tps = 8282.750821 (excluding connections establishing)
> 
> vs.
> 
>> tps = 8520.822410 (including connections establishing)
>> tps = 8521.132784 (excluding connections establishing)
>>
>> With the patch we are making use of the extended statistics, which
>> we do expect to be more work for the planner. Although, we didn't
>> add extended statistics to speed up the planner.
> 
> Sure, I understand. That's actually a pretty substantial regression
> - I guess that means that it's pretty important to avoid creating 
> extended statistics that are not needed, at least for short-running 
> queries.
> 

Well, it's only about a 3% difference in a single run, which may easily be
due to slightly different binary layout, random noise, etc. So I wouldn't
call that a "substantial regression", at least not based on this one test.

I've done more thorough testing, and what I see is a 1.0-1.2% drop, but on
a test that's rather extreme (statistics on an empty table). So again,
likely well within noise, and on larger tables it'll get even less
significant.

But of course - it's not free. It's a bit more work we need to do. But
if you don't need multi-column statistics, don't create them. If your
queries are already fast, you probably don't need them at all.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




[HACKERS] PATCH: enabling parallel execution for cursors explicitly (experimental)

2017-10-14 Thread Tomas Vondra
Hi,

One of the existing limitations of parallel query is that cursors
generally do not benefit from it [1]. Commit 61c2e1a95f [2] improved the
situation for cursors from procedural languages, but unfortunately for
user-defined cursors parallelism is still disabled.

For many use cases that is perfectly fine, but for applications that
need to process large amounts of data this is rather annoying. When the
result sets are large, cursors are extremely efficient - in terms of
memory consumption, for example. So applications have to choose between
the "cursor" approach (and no parallelism), or parallelism and
uncomfortably large result sets.

I believe there are two main reasons why parallelism is disabled for
user-defined cursors (or queries that might get suspended):

(1) We can't predict what will happen while the query is suspended (and
the transaction is still in "parallel mode"), e.g. the user might run
arbitrary DML which is not allowed.

(2) If the cursor gets suspended, the parallel workers would still be
assigned to it and could not be used for anything else.

Clearly, we can't solve those issues in general, so the default will
probably remain "parallelism disabled".

I propose to add a new cursor option (PARALLEL), which would allow
parallel plans for that particular user-defined cursor. Attached is an
experimental patch doing this (I'm sure there are some loose ends).

This does not make either of the issues go away, of course. We still
enforce "no DML while parallel operation in progress" as before, so this
will not work:

BEGIN;
DECLARE x PARALLEL CURSOR FOR SELECT * FROM t2 WHERE ...;
FETCH 1000 FROM x;
INSERT INTO t2 VALUES (1);
FETCH 1000 FROM x;
COMMIT;

but this will

BEGIN;
DECLARE x PARALLEL CURSOR FOR SELECT * FROM t2 WHERE ...;
FETCH 1000 FROM x;
...
FETCH 1000 FROM x;
CLOSE x;
INSERT INTO t2 VALUES (1);
COMMIT;

Regarding (2), if the user suspends the cursor for a long time, bummer.
The parallel workers will remain assigned, doing nothing. I don't have
any idea how to get around that, but I don't see how we could do better.
I don't see either of these limitations as fatal.

Any opinions / obvious flaws that I missed?

regards

[1]
https://www.postgresql.org/docs/9.6/static/when-can-parallel-query-be-used.html

[2]
https://www.postgresql.org/message-id/CAOGQiiMfJ%2B4SQwgG%3D6CVHWoisiU0%2B7jtXSuiyXBM3y%3DA%3DeJzmg%40mail.gmail.com

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/src/backend/commands/portalcmds.c b/src/backend/commands/portalcmds.c
index 76d6cf1..ffaa096 100644
--- a/src/backend/commands/portalcmds.c
+++ b/src/backend/commands/portalcmds.c
@@ -66,6 +66,12 @@ PerformCursorOpen(DeclareCursorStmt *cstmt, ParamListInfo params,
 		RequireTransactionChain(isTopLevel, "DECLARE CURSOR");
 
 	/*
+	 * Enable parallel plans for cursors that explicitly requested it.
+	 */
+	if (cstmt->options & CURSOR_OPT_PARALLEL)
+		cstmt->options |= CURSOR_OPT_PARALLEL_OK;
+
+	/*
 	 * Parse analysis was done already, but we still have to run the rule
 	 * rewriter.  We do not do AcquireRewriteLocks: we assume the query either
 	 * came straight from the parser, or suitable locks were acquired by
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 9689429..64f8a32 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -423,6 +423,13 @@ standard_ExecutorFinish(QueryDesc *queryDesc)
 	/* This should be run once and only once per Executor instance */
 	Assert(!estate->es_finished);
 
+	/* If this was PARALLEL cursor, do cleanup and exit parallel mode. */
+	if (queryDesc->parallel_cursor)
+	{
+		ExecShutdownNode(queryDesc->planstate);
+		ExitParallelMode();
+	}
+
 	/* Switch into per-query memory context */
 	oldcontext = MemoryContextSwitchTo(estate->es_query_cxt);
 
@@ -1085,6 +1092,18 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 
 	queryDesc->tupDesc = tupType;
 	queryDesc->planstate = planstate;
+
+	/* If this was PARALLEL cursor, enter parallel mode, except in EXPLAIN-only. */
+
+	queryDesc->parallel_cursor
+		= (eflags & EXEC_FLAG_PARALLEL) && !(eflags & EXEC_FLAG_EXPLAIN_ONLY);
+
+	/*
+	 * In PARALLEL cursors we have to enter the parallel mode once, at the very
+	 * beginning (and not in ExecutePlan, as we do for execute_once plans).
+	 */
+	if (queryDesc->parallel_cursor)
+		EnterParallelMode();
 }
 
 /*
@@ -1725,7 +1744,8 @@ ExecutePlan(EState *estate,
 		if (TupIsNull(slot))
 		{
 			/* Allow nodes to release or shut down resources. */
-			(void) ExecShutdownNode(planstate);
+			if (execute_once)
+				(void) ExecShutdownNode(planstate);
 			break;
 		}
 
@@ -1772,7 +1792,8 @@ ExecutePlan(EState *estate,
 		if (numberTuples && numberTuples == current_tuple_count)
 		{
 			/* Allow nodes to release or shut down resources. */
-			(void) ExecShutdownNode(planstate);
+			if (execute_once)
+				(void) ExecShutdownNode(planstate);
 		

Re: [HACKERS] fresh regression - regproc result contains unwanted schema

2017-10-14 Thread Pavel Stehule
2017-10-14 17:26 GMT+02:00 Tom Lane :

> Pavel Stehule  writes:
> > When a function is overloaded, the regproc result contains the schema,
> > although it is on the search_path
>
> There's no "fresh regression" here, it's done that more or less since
> we invented schemas.  See regprocout:
>
>  * Would this proc be found (uniquely!) by regprocin? If not,
>  * qualify it.
>
> git blame dates that comment to commit 52200bef of 2002-04-25.
>
> Admittedly, qualifying the name might not be sufficient to disambiguate,
> but regprocout doesn't have any other tool in its toolbox, so it uses
> the hammer it's got.  If you're overloading functions, you really need
> to use regprocedure not regproc.
>

It is a false alarm, I am sorry. I shot myself in the foot. Thank you for
the explanation.

Have a nice evening.

Pavel


> regards, tom lane
>


Re: [HACKERS] Turn off transitive joins

2017-10-14 Thread Tom Lane
Gourav Kumar  writes:
> Is there some way to tell PostgreSQL not to consider transitive joins
> when enumerating join pairs?

No ... and you have presented no reason whatever why we should consider
adding such a capability.

Maybe you should be trying to construct your join graph someplace
earlier, before the optimizer processes the join quals.

regards, tom lane




Re: [HACKERS] SIGSEGV in BRIN autosummarize

2017-10-14 Thread Justin Pryzby
On Fri, Oct 13, 2017 at 10:57:32PM -0500, Justin Pryzby wrote:
> I don't have any reason to believe there's memory issue on the server, So I
> suppose this is just a "heads up" to early adopters until/in case it happens
> again and I can at least provide a stack trace.

I'm back; find stacktrace below.

> Today I see:
> < 2017-10-13 17:22:47.839 -04  >LOG:  server process (PID 32127) was 
> terminated by signal 11: Segmentation fault
> < 2017-10-13 17:22:47.839 -04  >DETAIL:  Failed process was running: 
> autovacuum: BRIN summarize public.gtt 747263

Is it a coincidence the server failed within 45m of yesterday's failure?

postmaster[26500] general protection ip:84a177 sp:7ffd9b349b88 error:0 in 
postgres[40+692000]
< 2017-10-14 18:05:36.432 -04  >DETAIL:  Failed process was running: 
autovacuum: BRIN summarize public.gtt 41087

> It looks like this table was being inserted into simultaneously by a python
> program using multiprocessing.  It looks like each subprocess was INSERTing
> into several tables, each of which has one BRIN index on timestamp column.

I should add:
These are insert-only child tables in a hierarchy (not PG10 partitions), being
inserted into directly (not via trigger/rule).

Also notice the vacuum process was interrupted, same as yesterday (thank
goodness for full logs).  Our INSERT script is using python
multiprocessing.pool() with "maxtasksperchild=1", which I think means we load
one file and then exit the subprocess, and pool() creates a new subproc, which
starts a new PG session and transaction.  Which explains why autovacuum starts
processing the table only to be immediately interrupted.

postgres=# SELECT * FROM postgres_log_2017_10_14_1800 WHERE pid=26500 ORDER BY 
log_time DESC LIMIT 9;
log_time   | 2017-10-14 18:05:34.132-04
pid| 26500
session_id | 59e289b4.6784
session_line   | 2
session_start_time | 2017-10-14 18:03:32-04
error_severity | ERROR
sql_state_code | 57014
message| canceling autovacuum task
context| processing work entry for relation 
"gtt.public.cdrs_eric_ggsnpdprecord_2017_10_14_recordopeningtime_idx"
---+-
log_time   | 2017-10-14 18:05:32.925-04
pid| 26500
session_id | 59e289b4.6784
session_line   | 1
session_start_time | 2017-10-14 18:03:32-04
error_severity | ERROR
sql_state_code | 57014
message| canceling autovacuum task
context| automatic analyze of table 
"gtt.public.cdrs_eric_egsnpdprecord_2017_10_14"

gtt=# \dt+ *record_2017_10_14
 public | cdrs_eric_egsnpdprecord_2017_10_14 | table | gtt   | 1642 MB | 
 public | cdrs_eric_ggsnpdprecord_2017_10_14 | table | gtt   | 492 MB  | 

gtt=# \di+ *_2017_10_14*_recordopeningtime_idx
 public | cdrs_eric_egsnpdprecord_2017_10_14_recordopeningtime_idx | index | 
gtt   | cdrs_eric_egsnpdprecord_2017_10_14 | 72 kB | 
 public | cdrs_eric_ggsnpdprecord_2017_10_14_recordopeningtime_idx | index | 
gtt   | cdrs_eric_ggsnpdprecord_2017_10_14 | 48 kB | 

Due to a "behavioral deficiency" in the loader for those tables, the crashed
backend causes the loader to get stuck, so the tables should be untouched since
the crash, should it be desirable to inspect them.

#0  pfree (pointer=0x298c740) at mcxt.c:954
context = 0x7474617261763a20
#1  0x006a52e9 in perform_work_item (workitem=0x7f8ad1f94824) at 
autovacuum.c:2676
cur_datname = 0x298c740 "no 1 :vartype 1184 :vartypmod -1 :varcollid 0 
:varlevelsup 0 :varnoold 1 :varoattno 1 :location 146} {CONST :consttype 1184 
:consttypmod -1 :constcollid 0 :constlen 8 :constbyval true :constisnull fal"...
cur_nspname = 0x298c728 "s ({VAR :varno 1 :varattno 1 :vartype 1184 
:vartypmod -1 :varcollid 0 :varlevelsup 0 :varnoold 1 :varoattno 1 :location 
146} {CONST :consttype 1184 :consttypmod -1 :constcollid 0 :constlen 8 
:constbyv"...
cur_relname = 0x298cd68 "cdrs_eric_msc_sms_2017_10_14_startofcharge_idx"
__func__ = "perform_work_item"
#2  0x006a6fd9 in do_autovacuum () at autovacuum.c:2533
workitem = 0x7f8ad1f94824
classRel = 0x7f89c26d0e58
tuple = 
relScan = 
dbForm = 
table_oids = 
orphan_oids = 0x0
ctl = {num_partitions = 0, ssize = 0, dsize = 0, max_dsize = 0, ffactor 
= 0, keysize = 4, entrysize = 80, hash = 0, match = 0, keycopy = 0, alloc = 0, 
hcxt = 0x0, hctl = 0x0}
table_toast_map = 0x29c8188
cell = 0x0
shared = 0x298ce18
dbentry = 0x298d0a0
bstrategy = 0x2a61c18
key = {sk_flags = 0, sk_attno = 16, sk_strategy = 3, sk_subtype = 0, 
sk_collation = 100, sk_func = {fn_addr = 0x750430 , fn_oid = 61, 
fn_nargs = 2, fn_strict = 1 '\001', fn_retset = 0 '\000', 
fn_stats = 2 

Re: [HACKERS] Extended statistics is not working on Vars hidden under a RelabelType

2017-10-14 Thread Robert Haas
On Fri, Oct 13, 2017 at 4:44 PM, Tomas Vondra
 wrote:
> On 10/13/2017 10:04 PM, Robert Haas wrote:
>> On Mon, Oct 9, 2017 at 11:03 PM, David Rowley
>>  wrote:
>>> -- Unpatched
>>>  Planning time: 0.184 ms
>>>  Execution time: 105.878 ms
>>>
>>> -- Patched
>>>  Planning time: 2.175 ms
>>>  Execution time: 106.326 ms
>>
>> This might not be the best example to show the advantages of the
>> patch, honestly.
>
> Not sure what exactly is your point? If you're suggesting this example
> is bad because the planning time increased from 0.184 to 2.175 ms, then
> perhaps consider that the plans were likely generated on an assert-enabled
> build and on a laptop (both of which add quite a bit of noise to
> occasional timings). The patch has no impact on planning time (at least
> I've been unable to measure any).

I don't really think there's a problem with the patch; I just noticed
that with the patch applied both the planning and execution time went
up.  I understand that's because this is a toy example, not a real
one.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] show precise repos version for dev builds?

2017-10-14 Thread Fabien COELHO



>> The make dependencies ensure that the header file is regenerated on each
>> build with a phony target, and the C file is thus recompiled and linked into
>> the executables on each build. It means that all executables are linked on
>> each rebuild, even if not necessary, though.

> That'd be quite painful, consider e.g. people using LTO.  If done right
> however, that should be avoidable to some degree. When creating the
> version file, only replace its contents if the contents differ - that's
> just a few lines of scripting.


Indeed.

A potential issue is with dynamic linking: someone could recompile and
reinstall just one shared object or DLL, and the executable using that lib
would change its behavior, running with libs from heterogeneous versions.
What is the actual version? Hard to say.


In dev mode we often use static linking, so that we can keep the executable
of a previous version (it will not change depending on updated libs), and so
that we always know (or should know) what actual version is running.
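The "only replace the contents if they differ" trick mentioned above can be
sketched in a few lines; the helper name, file name, and revision string below
are mine, purely for illustration:

```python
import os
import tempfile

def write_if_changed(path, content):
    """Write content to path only when it differs from what is already
    there, so the file's mtime (and hence make's dependency tracking)
    is untouched by no-op regenerations."""
    try:
        with open(path) as f:
            if f.read() == content:
                return False                 # unchanged: keep old mtime
    except FileNotFoundError:
        pass                                 # first generation
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w") as f:
        f.write(content)                     # write, then rename atomically
    os.replace(tmp, path)
    return True

# First write installs the file; an identical rewrite is a no-op:
hdr = '#define PG_GIT_REVISION "abc123"\n'   # revision string is made up
path = os.path.join(tempfile.mkdtemp(), "pg_version.h")
print(write_if_changed(path, hdr), write_if_changed(path, hdr))  # True False
```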


--
Fabien.




Re: [HACKERS] Extended statistics is not working on Vars hidden under a RelabelType

2017-10-14 Thread Robert Haas
On Fri, Oct 13, 2017 at 4:49 PM, David Rowley
 wrote:
> tps = 8282.481310 (including connections establishing)
> tps = 8282.750821 (excluding connections establishing)

vs.

> tps = 8520.822410 (including connections establishing)
> tps = 8521.132784 (excluding connections establishing)
>
> With the patch we are making use of the extended statistics, which we
> do expect to be more work for the planner. Although, we didn't add
> extended statistics to speed up the planner.

Sure, I understand.  That's actually a pretty substantial regression -
I guess that means that it's pretty important to avoid creating
extended statistics that are not needed, at least for short-running
queries.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [PATCH] pageinspect function to decode infomasks

2017-10-14 Thread Peter Geoghegan
On Sat, Oct 14, 2017 at 10:58 AM, Robert Haas  wrote:
> I think it's perfectly sensible to view those 2 bits as making up a
> 2-bit field with 4 states rather than displaying each bit
> individually, but you obviously disagree.  Fair enough.

I guess it is that simple.

> I can think of two possible explanations for that.  Number one, the
> tool was written before HEAP_XMIN_FROZEN was invented and hasn't been
> updated for those changes.

Have we invented our last t_infomask/t_infomask2 (logical) status already?

> Number two, the author of the tool agrees
> with your position rather than mine.

I am working on an experimental version of pg_filedump, customized to
output XML that can be interpreted by an open source hex editor. The
XML makes the hex editor produce color coded, commented
tags/annotations for any given heap or B-Tree relation. This includes
tooltips with literal values for all status bits (including
t_infomask/t_infomask2 bits, IndexTuple bits, B-Tree meta page status
bits, PD_* page-level bits, ItemId bits, and others). I tweeted about
this several months ago, when it was just a tool I wrote for myself,
and received a surprisingly positive response. It seems like I'm on to
something, and should release the tool to the community.

I mention this project because it very much informs my perspective
here. Having spent quite a while deliberately corrupting test data in
novel ways, just to see what happens, the "work backwards from the
storage format" perspective feels very natural to me. I do think that
I understand where you're coming from too, though.

-- 
Peter Geoghegan




Re: [HACKERS] SIGSEGV in BRIN autosummarize

2017-10-14 Thread Tomas Vondra
Hi,

On 10/15/2017 12:42 AM, Justin Pryzby wrote:
> On Fri, Oct 13, 2017 at 10:57:32PM -0500, Justin Pryzby wrote:
>> I don't have any reason to believe there's memory issue on the server, So I
>> suppose this is just a "heads up" to early adopters until/in case it happens
>> again and I can at least provide a stack trace.
> 
> I'm back; find stacktrace below.
> 
>> Today I see:
>> < 2017-10-13 17:22:47.839 -04  >LOG:  server process (PID 32127) was 
>> terminated by signal 11: Segmentation fault
>> < 2017-10-13 17:22:47.839 -04  >DETAIL:  Failed process was running: 
>> autovacuum: BRIN summarize public.gtt 747263
> 
> Is it a coincidence the server failed within 45m of yesterday's failure?
> 

Most likely just a coincidence.

> postmaster[26500] general protection ip:84a177 sp:7ffd9b349b88 error:0 in 
> postgres[40+692000]
> < 2017-10-14 18:05:36.432 -04  >DETAIL:  Failed process was running: 
> autovacuum: BRIN summarize public.gtt 41087
> 
>> It looks like this table was being inserted into simultaneously by a python
>> program using multiprocessing.  It looks like each subprocess was INSERTing
>> into several tables, each of which has one BRIN index on timestamp column.
> 
> I should add:
> These are insert-only child tables in a hierarchy (not PG10 partitions), being
> inserted into directly (not via trigger/rule).
> 
> Also notice the vacuum process was interrupted, same as yesterday (thank
> goodness for full logs).  Our INSERT script is using python
> multiprocessing.pool() with "maxtasksperchild=1", which I think means we load
> one file and then exit the subprocess, and pool() creates a new subproc, which
> starts a new PG session and transaction.  Which explains why autovacuum starts
> processing the table only to be immediately interrupted.
> 

I don't follow. Why does it explain that autovacuum gets canceled? I
mean, merely opening a new connection/session should not cancel
autovacuum. That requires a command that requires table-level lock
conflicting with autovacuum (so e.g. explicit LOCK command, DDL, ...).

> ... 
> Due to a "behavioral deficiency" in the loader for those tables, the crashed
> backend causes the loader to get stuck, so the tables should be untouched 
> since
> the crash, should it be desirable to inspect them.
> 

It's a bit difficult to guess what went wrong from this backtrace. For
me gdb typically prints a bunch of lines immediately before the frames,
explaining what went wrong - not sure why it's missing here.

Perhaps some of those pointers are bogus, the memory was already pfree-d
or something like that. You'll have to poke around and try dereferencing
the pointers to find what works and what does not.

For example what do these gdb commands do in the #0 frame?

(gdb) p *(MemoryContext)context
(gdb) p *GetMemoryChunkContext(pointer)

> #0  pfree (pointer=0x298c740) at mcxt.c:954
> context = 0x7474617261763a20
> #1  0x006a52e9 in perform_work_item (workitem=0x7f8ad1f94824) at 
> autovacuum.c:2676
> cur_datname = 0x298c740 "no 1 :vartype 1184 :vartypmod -1 :varcollid 
> 0 :varlevelsup 0 :varnoold 1 :varoattno 1 :location 146} {CONST :consttype 
> 1184 :consttypmod -1 :constcollid 0 :constlen 8 :constbyval true :constisnull 
> fal"...
> cur_nspname = 0x298c728 "s ({VAR :varno 1 :varattno 1 :vartype 1184 
> :vartypmod -1 :varcollid 0 :varlevelsup 0 :varnoold 1 :varoattno 1 :location 
> 146} {CONST :consttype 1184 :consttypmod -1 :constcollid 0 :constlen 8 
> :constbyv"...
> cur_relname = 0x298cd68 
> "cdrs_eric_msc_sms_2017_10_14_startofcharge_idx"
> __func__ = "perform_work_item"
> #2  0x006a6fd9 in do_autovacuum () at autovacuum.c:2533
...

cheers

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] SIGSEGV in BRIN autosummarize

2017-10-14 Thread Justin Pryzby
On Fri, Oct 13, 2017 at 10:57:32PM -0500, Justin Pryzby wrote:
> > Also notice the vacuum process was interrupted, same as yesterday (thank
> > goodness for full logs).  Our INSERT script is using python
> > multiprocessing.pool() with "maxtasksperchild=1", which I think means we 
> > load
> > one file and then exit the subprocess, and pool() creates a new subproc, 
> > which
> > starts a new PG session and transaction.  Which explains why autovacuum 
> > starts
> > processing the table only to be immediately interrupted.

On Sun, Oct 15, 2017 at 01:57:14AM +0200, Tomas Vondra wrote:
> I don't follow. Why does it explain that autovacuum gets canceled? I
> mean, merely opening a new connection/session should not cancel
> autovacuum. That requires a command that requires table-level lock
> conflicting with autovacuum (so e.g. explicit LOCK command, DDL, ...).

I was thinking that INSERT would do it, but I gather you're right about
autovacuum.  Let me get back to you about this.

> > Due to a "behavioral deficiency" in the loader for those tables, the
> > crashed
> > backend causes the loader to get stuck, so the tables should be untouched 
> > since
> > the crash, should it be desirable to inspect them.
> > 
> 
> It's a bit difficult to guess what went wrong from this backtrace. For
> me gdb typically prints a bunch of lines immediately before the frames,
> explaining what went wrong - not sure why it's missing here.

Do you mean this ?

...
Loaded symbols for /lib64/libnss_files-2.12.so
Core was generated by `postgres: autovacuum worker process   gtt '.
Program terminated with signal 11, Segmentation fault.
#0  pfree (pointer=0x298c740) at mcxt.c:954
954 (*context->methods->free_p) (context, pointer);

> Perhaps some of those pointers are bogus, the memory was already pfree-d
> or something like that. You'll have to poke around and try dereferencing
> the pointers to find what works and what does not.
> 
> For example what do these gdb commands do in the #0 frame?
> 
> (gdb) p *(MemoryContext)context

(gdb) p *(MemoryContext)context
Cannot access memory at address 0x7474617261763a20

> (gdb) p *GetMemoryChunkContext(pointer)

(gdb) p *GetMemoryChunkContext(pointer)
No symbol "GetMemoryChunkContext" in current context.

I had to do this since it's apparently inlined or a macro:
(gdb) p *(MemoryContext *) (((char *) pointer) - sizeof(void *))
$8 = (MemoryContext) 0x7474617261763a20
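As an aside, the bogus pointer value itself is suggestive: read as
little-endian bytes it is plain ASCII, a fragment of the node-tree text seen
in cur_datname/cur_nspname, which hints that the chunk header holding the
owning context pointer was overwritten by that string. A quick check (an
observation, not a diagnosis):

```python
import struct

# The garbage "context" recovered from the chunk header in the core dump:
bogus = 0x7474617261763a20

# Pointers on x86-64 are little-endian; unpacking the value as 8 raw
# bytes turns it back into text.
as_bytes = struct.pack("<Q", bogus)
print(as_bytes)   # b' :varatt' -- a fragment like "... :varattno ..."
```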

I uploaded the corefile:
http://telsasoft.com/tmp/coredump-postgres-autovacuum-brin-summarize.gz

Justin




Re: [HACKERS] pg_control_recovery() return value when not in recovery

2017-10-14 Thread Tom Lane
Robert Haas  writes:
> On Fri, Oct 13, 2017 at 7:31 PM, Joe Conway  wrote:
>> Sorry for the slow response, but thinking back on this now, the idea of
>> these functions, in my mind at least, was to provide as close to the
>> same output as possible to what pg_controldata outputs.

> I think that's a good goal.

>> So if we make a change here, do we also change pg_controldata?

> I think it would make more sense to leave both as they are and
> consider writing more documentation.

+1.  Changing already-shipped behavior seems more disruptive than
this is worth.

regards, tom lane




Re: [HACKERS] fresh regression - regproc result contains unwanted schema

2017-10-14 Thread Tom Lane
Pavel Stehule  writes:
> When a function is overloaded, the regproc result contains the schema,
> although it is on the search_path

There's no "fresh regression" here, it's done that more or less since
we invented schemas.  See regprocout:

 * Would this proc be found (uniquely!) by regprocin? If not,
 * qualify it.

git blame dates that comment to commit 52200bef of 2002-04-25.

Admittedly, qualifying the name might not be sufficient to disambiguate,
but regprocout doesn't have any other tool in its toolbox, so it uses
the hammer it's got.  If you're overloading functions, you really need
to use regprocedure not regproc.

regards, tom lane




Re: [HACKERS] Continuous integration on Windows?

2017-10-14 Thread legrand legrand
Oops, what a silly boy, I didn't see that pg10 was released ;o(

That's all I need soon (to test Declarative partitioning).

Thanks for taking the time to answer me in such a detailed and interesting
manner.

Regards
PAscal



--
Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html




Re: [HACKERS] pg_control_recovery() return value when not in recovery

2017-10-14 Thread Michael Paquier
On Sat, Oct 14, 2017 at 8:31 AM, Joe Conway  wrote:
> Sorry for the slow response, but thinking back on this now, the idea of
> these functions, in my mind at least, was to provide as close to the
> same output as possible to what pg_controldata outputs. So:
>
> # pg_controldata
> ...
> Minimum recovery ending location: 0/0
> Min recovery ending loc's timeline:   0
> Backup start location:0/0
> Backup end location:  0/0
> End-of-backup record required:no
> ...
>
> So if we make a change here, do we also change pg_controldata?

For a lot of folks on this list, it is clear that things like
InvalidXLogRecPtr map to 0/0, but what about end-users? Couldn't we
consider marking those fields as "undefined", for example? "invalid"
would imply that the state of the cluster is incorrect, so I am not
sure that it is the most suitable term.
-- 
Michael




[HACKERS] fresh regression - regproc result contains unwanted schema

2017-10-14 Thread Pavel Stehule
Hi

While fixing an old bug in plpgsql_check, I found a new regression in
regproc output.

set check_function_bodies TO off;

postgres=# create or replace function f1() returns int as $$ begin end $$
language plpgsql;
CREATE FUNCTION
postgres=# select 'f1()'::regprocedure::oid::regproc;
 regproc
-
 f1
(1 row)

postgres=# create or replace function f1(int) returns int as $$ begin end
$$ language plpgsql;
CREATE FUNCTION
postgres=# select 'f1()'::regprocedure::oid::regproc;
  regproc
---
 public.f1
(1 row)

When a function is overloaded, the regproc result contains the schema,
although it is on the search_path.

This behavior breaks regression tests (and it is not consistent).

Tested on master

Regards

Pavel