Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2016-11-18 Thread Craig Ringer
On 18 November 2016 at 14:04, Tsunakawa, Takayuki
 wrote:
> Hello, Craig,
>
> I'm sorry to be late in reviewing your patch.  So far I've only been able to
> read the HTML doc.  Can I get the latest .patch file for reading and running
> the code?

The latest is what's attached upthread and what's in the git repo also
referenced upthread.

I haven't been able to update it in response to the more recent review due
to other development commitments. At this point I doubt I'll be able to
update it again in time for v10, so anyone who wants to adopt it is
welcome to.

> Here are some comments and questions.  I tried to avoid the same points as
> other reviewers, but there may be some overlap.
>
>
> (1)
> The example
>  UPDATE mytable SET x = x + 1;
> should be
>  UPDATE mytable SET x = x + 1 WHERE id = 42;

Good catch.

> (2)
> "The server usually begins executing the batch before all commands in the 
> batch are queued and the end of batch command is sent."
>
> Does this mean that the app developer cannot control or predict how many TCP 
> transmissions a batch is sent with?

That's not what that sentence means since the TCP layer is much lower
level, but what you say is also true.

All the docs are saying there is that there's no explicit control over
when we start sending the batch to the server. How that translates into
individual TCP packets, etc., is not libpq's concern.

>  For example, if I want to insert 10 rows into a table in bulk, can I send 
> those 10 rows (and the end of batch command) efficiently in one TCP 
> transmission, or are they split by libpq into multiple TCP transmissions?

libpq neither knows nor cares about individual TCP packets. It sends
things to the kernel and lets the kernel deal with that.

That said, you can use socket options TCP_CORK and TCP_NODELAY to get
some control over how and when data is sent. If you know you're about
to send more, you might cork the socket to give the kernel a hint that
it should expect more data to send.
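
For illustration, here is a minimal sketch of that corking idea (not part of
the patch; the helper name is made up, TCP_CORK is Linux-specific, and error
handling is omitted):

    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <netinet/tcp.h>
    #include <libpq-fe.h>

    /* Toggle TCP_CORK on the connection's socket so the kernel coalesces
     * the queued batch into as few segments as possible. */
    static void
    cork_connection(PGconn *conn, int on)
    {
        setsockopt(PQsocket(conn), IPPROTO_TCP, TCP_CORK,
                   &on, sizeof(on));
    }

    /*
     * Usage sketch:
     *   cork_connection(conn, 1);
     *   ... queue the batched queries ...
     *   cork_connection(conn, 0);    -- uncork; pending data is flushed
     */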

> (3)
> "To avoid deadlocks on large batches the client should be structured around a 
> nonblocking I/O loop using a function like select, poll, epoll, 
> WaitForMultipleObjectEx, etc."
>
> Can't we use some (new) platform-independent API instead of using poll() or 
> WaitForMultipleObject()?  e.g. some thin wrapper around pqWait().  It seems a 
> bit burdensome to have to use an OS-specific API just to wait for libpq.  
> Apart from that, it does not seem possible to wait for the socket in 64-bit 
> apps on Windows, because SOCKET is 64-bit while PQsocket() returns int.

IMO this problem is out of scope for this patch. A wait abstraction
might be nice, but next thing we know we'll be reinventing APR or
NSPR; I think that's a totally different problem.

Not being able to get a win32 SOCKET from libpq seems like a bit of an
oversight, and it'd definitely be good to address that to make async
mode more usable on Windows. There's some other ugliness in PQsocket
already per the comments there. I think it should be a separate patch,
though.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


[HACKERS] possible optimizations - pushing filter before aggregation

2016-11-18 Thread Pavel Stehule
Hi

In one application I see slow queries.  Views like the following are often used:

CREATE VIEW v AS SELECT min(a) M1, min(b) M2,  max(c) M3, x, y, z
  FROM t1 GROUP BY x, y, z;

and queries like

SELECT * FROM v
  WHERE M2 = const1
AND M3 > const2

Isn't it possible in this case to push the equality condition down below the aggregation?

Regards

Pavel


Re: [HACKERS] Mail thread references in commits

2016-11-18 Thread Dave Page
On Fri, Nov 18, 2016 at 1:38 AM, Alvaro Herrera
 wrote:
> Tom Lane wrote:
>> Andrew Dunstan  writes:
>> > I love seeing references to email threads in commit messages. It would
>> > make them a lot friendlier though if a full http URL were included
>> > instead of just a Message-ID,
>>
>> I've intentionally not done that, because it does not seem very
>> future-proof.  The message ids will hopefully be unique indefinitely far
>> into the future, but the location of our archives could move.
>
> It won't.  We have far too many in the archives to risk breaking them.
> In fact, we (Magnus) went to great lengths to implement a system so that
> old-style PHP links (of which we have a bunch in very old archives,
> including in commit messages) continue to work to this day.  We're far
> too invested in the system now, because of how successful it has proved
> to be.

Right - we go out of our way to ensure we don't break URLs. That's why
anoncvs.postgresql.org is still actively maintained for example.

-- 
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] WAL recycle retarding based on active sync rep.

2016-11-18 Thread Craig Ringer
On 18 Nov. 2016 13:14, "Kyotaro HORIGUCHI" 
wrote:
>
> Hello.
>
> We had too-early WAL recycling during a test on a synchronous
> replication setup. This is not a bug, and it is a somewhat extreme
> case, but it is contrary to what one would expect from synchronous
> replication.

Isn't this prevented by using a physical replication slot?

You hint that you looked at slots but they didn't meet your needs in some
way. I'm not sure I understood the last part.


Re: [HACKERS] Contains and is contained by operators of inet datatypes

2016-11-18 Thread Emre Hasegeli
> The new names might be better if we were starting in a green field,
> but in themselves they are not any more mnemonic than what we had, and
> what we had has been there for a lot of years.  Also, if we accept both
> old names and new (which it seems like we'd have to), that creates new
> opportunities for confusion, which is a cost that should not be
> disregarded.

This is true for existing users of those operators.  The new names are
much easier for new users to pick up.  We already use them for all the
other datatypes.  Datatypes like JSON are more popular than inet.  Many
people already know what <@ and @> mean.

> The original post proposed that we'd eventually get some benefit by
> being able to repurpose << and >> to mean something else, but the
> time scale over which that could happen is so long as to make it
> unlikely to ever happen.

I think we will get immediate benefit from the new operators for other
reasons.  Repurposing the old ones in the far future was a minor point.
I thought that having a long-term plan for this minor point was better
than having no plan.

> I'm inclined to think we should just reject this patch.  I'm certainly not
> going to commit it without seeing positive votes from multiple people.

It is a fair position.  Anybody care to vote?


Re: [HACKERS] Floating point comparison inconsistencies of the geometric types

2016-11-18 Thread Emre Hasegeli
> To keep that kind of integrity, we have to think carefully about
> errors.

My point is that I don't think we can keep integrity together with the
fuzzy behaviour, or at least I don't have the skills to do that.  I
could just leave the more complicated operators like "is a point on a
line" as they are, and only change the basic ones.  Do you think a
smaller patch like this would be acceptable?

> That's the fate of FP numbers. Btree is usable for such data since
> it is constructed using inequality operators, but hash is almost
> unsuitable without rounding and normalization because of the
> nature of floating point numbers. A range scan on btree will work
> fine, but on the other hand equality doesn't work even on a btree
> index, for the same reason as below.

Btree is very useful with floats.  There is no problem with it.

> Again, FP numbers generally cannot be expected to match each other
> exactly. You have to avoid relying on that.

Again, they do match very well.  Floating point numbers are not magic.
They are perfectly deterministic.
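
For what it's worth, here is a trivial standalone illustration of that
determinism (plain C with IEEE 754 doubles; not PostgreSQL code):

    #include <stdio.h>

    int
    main(void)
    {
        double  a = 0.1 + 0.2;  /* not exactly 0.3 in binary ... */
        double  b = 0.1 + 0.2;  /* ... but always exactly the same bits */

        printf("a == b   : %d\n", a == b);    /* 1: equality is exact and repeatable */
        printf("a == 0.3 : %d\n", a == 0.3);  /* 0: the surprise is decimal rounding,
                                               *    not nondeterminism */
        return 0;
    }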


Re: [HACKERS] Push down more full joins in postgres_fdw

2016-11-18 Thread Ashutosh Bapat
>
> /*
>  * For a join relation or an upper relation, use deparseExplicitTargetList.
>  * Likewise, for a base relation that is being deparsed as a subquery, in
>  * which case the caller would have passed tlist that is non-NIL, use that
>  * function.  Otherwise, use deparseTargetList.
>  */

This looks correct. I have modified it to make it simpler in the given
patch. But I think we shouldn't refer to a function, e.g.
deparseExplicitTargetList(), in the comment. Instead we should describe
the intent, e.g. "deparse the SELECT clause from the given targetlist" or
"deparse the SELECT clause from attr_needed".

>
>>> (3) I don't think we need this in isSubqueryExpr, so I removed it from
>>> the
>>> patch:
>>>
>>> +   /* Keep compiler happy. */
>>> +   return false;
>
>
>> Doesn't that cause compiler warning, saying "non-void function
>> returning nothing" or something like that. Actually, the "if
>> (bms_is_member(node->varno, outerrel->relids))" ends with a "return"
>> always. Hence we don't need to encapsulate the code in "else" block in
>> else { }. It could be taken outside.
>
>
> Yeah, I think so too, but I like the "if () { } else { }" coding.  That
> coding can be found in other places in core, eg,
> operator_same_subexprs_lookup.

OK.


>
>>> Done.  I modified the patch as proposed; create the tlist by
>>> build_tlist_to_deparse in foreign_join_ok, if needed, and search the
>>> tlist
>>> by tlist_member.  I also added a new member "tlist" to PgFdwRelationInfo
>>> to
>>> save the tlist created in foreign_join_ok.
>
>
>> Instead of adding a new member, you might want to reuse grouped_tlist
>> by renaming it.
>
>
> Done.

Right now, we are calculating the tlist whether or not the ForeignPath
emerges as the cheapest path. Instead we should calculate the tlist the
first time we need it and then add it to the fpinfo.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 66b059a..c230009 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -102,20 +102,22 @@ typedef struct deparse_expr_cxt
  * foreignrel, when that represents a join or
  * a base relation. */
 	StringInfo	buf;			/* output buffer to append to */
 	List	  **params_list;	/* exprs that will become remote Params */
 } deparse_expr_cxt;
 
 #define REL_ALIAS_PREFIX	"r"
 /* Handy macro to add relation name qualification */
 #define ADD_REL_QUALIFIER(buf, varno)	\
 		appendStringInfo((buf), "%s%d.", REL_ALIAS_PREFIX, (varno))
+#define SS_TAB_ALIAS_PREFIX	"s"
+#define SS_COL_ALIAS_PREFIX	"c"
 
 /*
  * Functions to determine whether an expression can be evaluated safely on
  * remote server.
  */
 static bool foreign_expr_walker(Node *node,
 	foreign_glob_cxt *glob_cxt,
 	foreign_loc_cxt *outer_cxt);
 static char *deparse_type_name(Oid type_oid, int32 typemod);
 
@@ -160,20 +162,28 @@ static void printRemoteParam(int paramindex, Oid paramtype, int32 paramtypmod,
 static void printRemotePlaceholder(Oid paramtype, int32 paramtypmod,
 	   deparse_expr_cxt *context);
 static void deparseSelectSql(List *tlist, List **retrieved_attrs,
  deparse_expr_cxt *context);
 static void deparseLockingClause(deparse_expr_cxt *context);
 static void appendOrderByClause(List *pathkeys, deparse_expr_cxt *context);
 static void appendConditions(List *exprs, deparse_expr_cxt *context);
 static void deparseFromExprForRel(StringInfo buf, PlannerInfo *root,
 	RelOptInfo *joinrel, bool use_alias, List **params_list);
 static void deparseFromExpr(List *quals, deparse_expr_cxt *context);
+static void deparseRangeTblRef(StringInfo buf, PlannerInfo *root,
+   RelOptInfo *foreignrel, bool make_subquery,
+   List **params_list);
+static void appendSubselectAlias(StringInfo buf, int tabno, int ncols);
+static void get_subselect_alias_id(Var *node, RelOptInfo *foreignrel,
+	   int *tabno, int *colno);
+static bool is_subquery_var(Var *node, RelOptInfo *foreignrel, int *tabno,
+		int *colno);
 static void deparseAggref(Aggref *node, deparse_expr_cxt *context);
 static void appendGroupByClause(List *tlist, deparse_expr_cxt *context);
 static void appendAggOrderBy(List *orderList, List *targetList,
  deparse_expr_cxt *context);
 static void appendFunctionName(Oid funcid, deparse_expr_cxt *context);
 static Node *deparseSortGroupClause(Index ref, List *tlist,
 	   deparse_expr_cxt *context);
 
 
 /*
@@ -854,21 +864,21 @@ List *
 build_tlist_to_deparse(RelOptInfo *foreignrel)
 {
 	List	   *tlist = NIL;
 	PgFdwRelationInfo *fpinfo = (PgFdwRelationInfo *) foreignrel->fdw_private;
 
 	/*
 	 * For an upper relation, we have already built the target list while
 	 * checking shippability, so just return that.
 	 */
 	if (foreignrel->reloptkind == RELOPT_UPPER_REL)
-		return fpinfo->grouped_tlist;
+		return fpinfo->tlist;
 
 	/*
 	 * We require columns specified in foreign

Re: [HACKERS] Hash Indexes

2016-11-18 Thread Amit Kapila
On Fri, Nov 18, 2016 at 12:11 PM, Amit Kapila  wrote:
> On Thu, Nov 17, 2016 at 10:54 PM, Robert Haas  wrote:
>> On Thu, Nov 17, 2016 at 12:05 PM, Amit Kapila  
>> wrote:
>>
 I think this comment is saying that we'll release the pin on the
 primary bucket page for now, and then reacquire it later if the user
 reverses the scan direction.  But that doesn't sound very safe,
 because the bucket could be split in the meantime and the order in
 which tuples are returned could change.  I think we want that to
 remain stable within a single query execution.
>>>
>>> Isn't that possible even without the patch?  Basically, after reaching
>>> end of forward scan and for doing backward *all* scan, we need to
>>> perform portal rewind which will in turn call hashrescan where we will
>>> drop the lock on bucket and then again when we try to move cursor
>>> forward we acquire lock in _hash_first(), so in between when we don't
>>> have the lock, the split could happen and next scan results could
>>> differ.
>>
>> Well, the existing code doesn't drop the heavyweight lock at that
>> location, but your patch does drop the pin that serves the same
>> function, so I feel like there must be some difference.
>>
>
> Yes, but I am not sure if the existing code is right.  Consider the scenario below:
>
> Session-1
>
> Begin;
> Declare c cursor for select * from t4 where c1=1;
> Fetch forward all from c;  --here shared heavy-weight lock count becomes 1
> Fetch prior from c; --here shared heavy-weight lock count becomes 2
> close c; -- here, lock release will reduce the lock count and shared
> heavy-weight lock count becomes 1
>
> Now, if we try to insert from another session, such that it leads to
> bucket-split of the bucket for which session-1 had used a cursor, it
> will wait for session-1.
>

It will not wait, but just skip the split, because we are using a try
lock.  However, the point remains that a SELECT should not hold
bucket-level locks even after the cursor is closed.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


[HACKERS] Tuple count used while costing MergeAppend and that for an append rel

2016-11-18 Thread Ashutosh Bapat
Hi,
In create_merge_append_path() we have the following code:
1331
1332 /* Now we can compute total costs of the MergeAppend */
1333 cost_merge_append(&pathnode->path, root,
1334   pathkeys, list_length(subpaths),
1335   input_startup_cost, input_total_cost,
1336   rel->tuples);
1337

The last argument to cost_merge_append() is described as:
'tuples' is the number of tuples in all the streams

For an append relation, set_append_rel_size() sets rel->tuples to the
sum of the rows output by each child, i.e. the sum of rel->rows from
each child.
1091 rel->rows = parent_rows;
1092 rel->reltarget->width = rint(parent_size / parent_rows);
1093 for (i = 0; i < nattrs; i++)
1094 rel->attr_widths[i] = rint(parent_attrsizes[i] / parent_rows);
1095
1096 /*
1097  * Set "raw tuples" count equal to "rows" for the appendrel; needed
1098  * because some places assume rel->tuples is valid for any baserel.
1099  */
1100 rel->tuples = parent_rows;

AFAIU, there's a difference between the tuples and rows members of
RelOptInfo. While tuples gives the estimate of the number of tuples per
pg_class, rows gives the number of rows output by that relation. From
that perspective rel->tuples should be set to the sum of
child_rel->tuples from each of the children. If we do that, the last
argument to cost_merge_append() should change from rel->tuples to
rel->rows.

Does that make sense? Attached patch makes those changes.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index e42ef98..f9df094 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -857,20 +857,21 @@ set_foreign_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
  * the parent RTE ... but it has a different RTE and RelOptInfo.  This is
  * a good thing because their outputs are not the same size.
  */
 static void
 set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
 	Index rti, RangeTblEntry *rte)
 {
 	int			parentRTindex = rti;
 	bool		has_live_children;
 	double		parent_rows;
+	double		parent_tuples;
 	double		parent_size;
 	double	   *parent_attrsizes;
 	int			nattrs;
 	ListCell   *l;
 
 	/*
 	 * Initialize to compute size estimates for whole append relation.
 	 *
 	 * We handle width estimates by weighting the widths of different child
 	 * rels proportionally to their number of rows.  This is sensible because
@@ -878,20 +879,21 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
 	 * "footprint" if we have to sort or hash it.  To do this, we sum the
 	 * total equivalent size (in "double" arithmetic) and then divide by the
 	 * total rowcount estimate.  This is done separately for the total rel
 	 * width and each attribute.
 	 *
 	 * Note: if you consider changing this logic, beware that child rels could
 	 * have zero rows and/or width, if they were excluded by constraints.
 	 */
 	has_live_children = false;
 	parent_rows = 0;
+	parent_tuples = 0;
 	parent_size = 0;
 	nattrs = rel->max_attr - rel->min_attr + 1;
 	parent_attrsizes = (double *) palloc0(nattrs * sizeof(double));
 
 	foreach(l, root->append_rel_list)
 	{
 		AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(l);
 		int			childRTindex;
 		RangeTblEntry *childRTE;
 		RelOptInfo *childrel;
@@ -1007,20 +1009,23 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
 		 * consistency, do this before calling set_rel_size() for the child.
 		 */
 		if (root->glob->parallelModeOK && rel->consider_parallel)
 			set_rel_consider_parallel(root, childrel, childRTE);
 
 		/*
 		 * Compute the child's size.
 		 */
 		set_rel_size(root, childrel, childRTindex, childRTE);
 
+		/* Accumulate number of tuples in every child. */
+		parent_tuples += childrel->tuples;
+
 		/*
 		 * It is possible that constraint exclusion detected a contradiction
 		 * within a child subquery, even though we didn't prove one above. If
 		 * so, we can skip this child.
 		 */
 		if (IS_DUMMY_REL(childrel))
 			continue;
 
 		/* We have at least one live child. */
 		has_live_children = true;
@@ -1088,34 +1093,38 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
 		int			i;
 
 		Assert(parent_rows > 0);
 		rel->rows = parent_rows;
 		rel->reltarget->width = rint(parent_size / parent_rows);
 		for (i = 0; i < nattrs; i++)
 			rel->attr_widths[i] = rint(parent_attrsizes[i] / parent_rows);
 
 		/*
 		 * Set "raw tuples" count equal to "rows" for the appendrel; needed
-		 * because some places assume rel->tuples is valid for any baserel.
 		 */
-		rel->tuples = parent_rows;
 	}
 	else
 	{
 		/*
 		 * All children were excluded by constraints, so mark the whole
 		 * appendrel dummy.  We must do this in this phase so that the rel's
 		 * dummy-ness is visible when we generate paths for other rels.
 		 */
 		set_dummy_rel_p

Re: [HACKERS] Performance degradation in Bitmapscan (commit 75ae538bc3168bf44475240d4e0487ee2f3bb376)

2016-11-18 Thread Robert Haas
On Tue, Nov 15, 2016 at 2:28 PM, Andres Freund  wrote:
> I've a working fix for this, and for a similar issue Robert found. I'm
> still playing around with it, but basically the fix is to make the
> growth policy a bit more adaptive.

Any chance you can post a patch soon?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] Parallel execution and prepared statements

2016-11-18 Thread Albe Laurenz
Robert Haas wrote:
> On Tue, Nov 15, 2016 at 10:41 AM, Albe Laurenz  
> wrote:
>> Tobias Bussmann has discovered an oddity with prepared statements.
>>
>> Parallel scan is used with prepared statements, but only if they have
>> been created with protocol V3 "Parse".
>> If a prepared statement has been prepared with the SQL statement PREPARE,
>> it will never use a parallel scan.
>>
>> I guess that is an oversight in commit 57a6a72b, right?
>> PrepareQuery in commands/prepare.c should call CompleteCachedPlan
>> with cursor options CURSOR_OPT_PARALLEL_OK, just like
>> exec_prepare_message in tcop/postgres.c does.
>>
>> The attached patch fixes the problem for me.
> 
> Actually, commit 57a6a72b made this change, and then 7bea19d0 backed
> it out again because it turned out to break things.

I didn't notice the CREATE TABLE ... AS EXECUTE problem.

But then the change to use CURSOR_OPT_PARALLEL_OK in tcop/postgres.c should
be reverted as well, because it breaks things just as badly:

   /* creates a parallel-enabled plan */
   PQprepare(conn, "pstmt", "SELECT id FROM l WHERE val = $1", 1, NULL);
   /* blows up with "cannot insert tuples during a parallel operation" */
   PQexec(conn, "CREATE TABLE bad(id) AS EXECUTE pstmt('abcd')");

With Tobias' patch, this does not fail.

I think we should either apply something like that patch or disable parallel
execution for prepared statements altogether and document that.

Yours,
Laurenz Albe


Re: [HACKERS] Fun fact about autovacuum and orphan temp tables

2016-11-18 Thread Dilip Kumar
On Thu, Nov 17, 2016 at 6:54 AM, Haribabu Kommi
 wrote:
> You are assigned as a reviewer for the current patch in the 11-2016
> commitfest, but you haven't shared your review yet. Can you please share
> your views about the patch? This will help the commitfest run more
> smoothly.
>
> Michael had sent an updated patch based on some discussion.
>
> Please Ignore if you already shared your review.


Hi Hari,

So far I haven't spent time on this; I am planning to do so next week.
But it seems that a lot of review has already happened and the patch is
in good shape, so I think the other reviewers can make the decision.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Push down more full joins in postgres_fdw

2016-11-18 Thread Ashutosh Bapat
Also, another point:

I guess this note doesn't add much value in the given context.
Probably we should remove it.
+* Note: the tlist would have one-to-one correspondence with the
+* joining relation's reltarget->exprs because (1) the above test
+* on PHVs guarantees that the reltarget->exprs doesn't contain
+* any PHVs and (2) the joining relation's local_conds is NIL.
+* This allows us to search the targetlist entry matching a given
+* Var node from the tlist in get_subselect_alias_id.

On Fri, Nov 18, 2016 at 5:10 PM, Ashutosh Bapat
 wrote:
>>
>> /*
>>  * For a join relation or an upper relation, use deparseExplicitTargetList.
>>  * Likewise, for a base relation that is being deparsed as a subquery, in
>>  * which case the caller would have passed tlist that is non-NIL, use that
>>  * function.  Otherwise, use deparseTargetList.
>>  */
>
> This looks correct. I have modified it to make it simpler in the given
> patch. But I think we shouldn't refer to a function, e.g.
> deparseExplicitTargetList(), in the comment. Instead we should describe
> the intent, e.g. "deparse the SELECT clause from the given targetlist" or
> "deparse the SELECT clause from attr_needed".
>
>>
 (3) I don't think we need this in isSubqueryExpr, so I removed it from
 the
 patch:

 +   /* Keep compiler happy. */
 +   return false;
>>
>>
>>> Doesn't that cause compiler warning, saying "non-void function
>>> returning nothing" or something like that. Actually, the "if
>>> (bms_is_member(node->varno, outerrel->relids))" ends with a "return"
>>> always. Hence we don't need to encapsulate the code in "else" block in
>>> else { }. It could be taken outside.
>>
>>
>> Yeah, I think so too, but I like the "if () { } else { }" coding.  That
>> coding can be found in other places in core, eg,
>> operator_same_subexprs_lookup.
>
> OK.
>
>
>>
 Done.  I modified the patch as proposed; create the tlist by
 build_tlist_to_deparse in foreign_join_ok, if needed, and search the
 tlist
 by tlist_member.  I also added a new member "tlist" to PgFdwRelationInfo
 to
 save the tlist created in foreign_join_ok.
>>
>>
>>> Instead of adding a new member, you might want to reuse grouped_tlist
>>> by renaming it.
>>
>>
>> Done.
>
> Right now, we are calculating the tlist whether or not the ForeignPath
> emerges as the cheapest path. Instead we should calculate the tlist the
> first time we need it and then add it to the fpinfo.
>
> --
> Best Wishes,
> Ashutosh Bapat
> EnterpriseDB Corporation
> The Postgres Database Company



-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


Re: [HACKERS] Parallel execution and prepared statements

2016-11-18 Thread Albe Laurenz
Amit Kapila wrote:
> Can you test once by just passing CURSOR_OPT_PARALLEL_OK in
> PrepareQuery() and running the tests with force_parallel_mode=regress?
> It seems to me that the code in the planner is changed to set a
> parallelModeNeeded flag, so it might work as it is.

No, that doesn't work.

But with Tobias' complete patch "make installcheck-world" succeeds.

Yours,
Laurenz Albe


Re: [HACKERS] [RFC] Should we fix postmaster to avoid slow shutdown?

2016-11-18 Thread Peter Eisentraut
On 9/27/16 11:07 PM, Tsunakawa, Takayuki wrote:
>> From: pgsql-hackers-ow...@postgresql.org
>> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Tom Lane
>> Allowing SIGQUIT to prompt fast shutdown of the stats collector seems sane,
>> though.  Try to make sure it doesn't leave partly-written stats files
>> behind.
> 
> The attached patch based on HEAD does this.  I'd like this to be back-patched 
> because one of our important customers uses 9.2.
> 
> I didn't remove partially written stat files on SIGQUIT for the following 
> reasons.  Is this OK?
> 
> 1. The recovery at the next restart will remove the stat files.
> 2. SIGQUIT processing should be as fast as possible.
> 3. If writing stats files took long due to the OS page cache flushing, 
> removing files might be forced to wait likewise.

ISTM that this would change the "immediate shutdown" to not save stats
files anymore.  So far, all the shutdown modes are equivalent in terms
of how they preserve data and system state.  They differ only in when
the hard work happens.  This would be a deviation from that principle.

Child processes don't distinguish between a SIGQUIT coming from a
user-initiated immediate shutdown request and a crash-induced
kill-everyone directive.  So there might not be a non-ugly way to
address that.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] [RFC] Should we fix postmaster to avoid slow shutdown?

2016-11-18 Thread Tom Lane
Peter Eisentraut  writes:
> ISTM that this would change the "immediate shutdown" to not save stats
> files anymore.  So far, all the shutdown modes are equivalent in terms
> of how they preserve data and system state.  They differ only in when
> the hard work happens.  This would be a deviation from that principle.

There is that.  Up to now, an immediate shutdown request didn't cause
any actual data loss, but now it would.  Maybe that's a reason to reject
this whole concept.  (Upthread I was thinking that's a behavior we have
anyway, but really we don't: a look through pgstats.c shows that it will
never exit without attempting to write the stats, short of a crash of
the stats collector process itself.)

> Child processes don't distinguish between a SIGQUIT coming from a
> user-initiated immediate shutdown request and a crash-induced
> kill-everyone directive.  So there might not be a non-ugly way to
> address that.

It doesn't seem to me that it's a matter of whether the signaling is
adequate; we could fix that.  It's a matter of whether you're willing to
have "pg_ctl stop -m immediate" lose stats data.

Although the stats collector was originally conceived as optional,
we depend on it enough now for autovacuum that I'm not sure it'd be a
good thing to have a popularly-used shutdown mode that loses the stats.

regards, tom lane


Re: [HACKERS] [RFC] Should we fix postmaster to avoid slow shutdown?

2016-11-18 Thread Peter Eisentraut
On 11/14/16 4:38 AM, Ashutosh Bapat wrote:
> The patch 02_close_listen... closes the listen sockets
> explicitly when it's known that postmaster is going to stop all the
> children and then die. I have tried to see if there's a possibility
> that it closes the listen sockets but does not actually die, thus
> leaving a server which doesn't accept any connections and doesn't die.
> But I have not found that possibility.

I can see the point of this, but I'm not sure whether this is always a
good idea.  Some people "monitor" postgres shutdown with PQping() or by
parsing the error messages that the client gets.  If we just close the
sockets as soon as possible, we lose all communication and can't tell
what's going on anymore.

If your HA system insists that the crashing server be completely shut
down before moving on, then maybe that can be refined somehow instead?

I haven't seen any analysis in this thread of how much of a problem this
really is.  For example, could we keep the socket open longer but reject
connection attempts faster in this state?

This patch only changes the behavior in the case of a crash or an
immediate shutdown, but not for the other shutdown modes.  I don't think
it is good to create different behaviors for different modes.

A long time ago, ClosePostmasterPorts() essentially had the job to close
all postmaster sockets.  Every single call thereof is in fact commented
with /* Close the postmaster's sockets */.  (Ancient postgres code uses
"port" as a more general term for socket or communication port.)  Now it
has accumulated other tasks so it is more of a
ClosePostmasterOnlyResourcesInChild().  So if we were to introduce a
ClosePostmasterSockets() somehow, we should adjust the naming and the
comments so that we don't have two similar-sounding functions with
confusing comments.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Declarative partitioning - another take

2016-11-18 Thread Robert Haas
On Thu, Nov 17, 2016 at 8:18 PM, Amit Langote
 wrote:
> The reason NULLs in an input row are caught and rejected (with the current
> message) before control reaches range_partition_for_tuple() is because
> it's not clear to me whether the range bound comparison logic in
> partition_rbound_datum_cmp() should be prepared to handle NULLs and what
> the results of comparisons should look like.  Currently, all it ever
> expects to see in the input tuple's partition key is non-NULL datums.
> Comparison proceeds as follows: if a range bound datum is a finite value,
> we invoke the comparison proc or if it is infinite, we conclude that the
> input tuple is > or < the bound in question based on whether the bound is
> a lower or upper bound, respectively.
>
> Or are you saying that get_tuple_for_partition() should simply return -1
> (partition not found) in case of encountering a NULL in range partition
> key to the caller instead of throwing error as is now?  If the user sees
> the message and decides to create a new range partition that *will* accept
> such a row, how do they decide what its boundaries should be?

Well, I think the first thing you have to decide is whether range
partitioning is going to support NULL values in the partition key
columns at all.  If you want it to support that, then you've got to
decide how it's going to be specified in the SQL syntax.  I had the
impression that you were planning not to support that, in which case
you need to reject all such rows.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] Snapshot too old logging

2016-11-18 Thread Brad DeJong
On Wed, Nov 16, 2016 at 6:43 AM, Robert Haas wrote:
> I think I was suggesting: One or more rows required by this query may
> already have been removed from "%s".

I keep reading that as "you have data corruption because something removed rows 
that your query needs" rather than "this query took too long to complete but 
may succeed if you resubmit it".



Re: [HACKERS] [RFC] Should we fix postmaster to avoid slow shutdown?

2016-11-18 Thread Tom Lane
I wrote:
> Peter Eisentraut  writes:
>> ISTM that this would change the "immediate shutdown" to not save stats
>> files anymore.  So far, all the shutdown modes are equivalent in terms
>> of how they preserve data and system state.  They differ only in when
>> the hard work happens.  This would be a deviation from that principle.

> There is that.  Up to now, an immediate shutdown request didn't cause
> any actual data loss, but now it would.

Oh, no, after rereading the thread I remember the point here: during
crash recovery, we intentionally zap the stats files on the grounds
that they might be broken, or might be out-of-sync with the recovery
stop point.  See pgstat_reset_all().

Now, you could certainly argue that that's an idea we should rethink;
we're throwing away probably-useful data because it *might* be wrong,
which seems rather pointless when it's known to be approximate anyway.
But if that stays, there's no very good reason not to short-circuit
writing of the files during immediate shutdown.

On the other side of the coin: I note that in 9.4 and up, the postmaster
has a 5-second patience limit before SIGKILL'ing child processes during a
crash or immediate shutdown (cf commit 82233ce7e).  That limit applies to
the stats collector too, and it seems to me that it addresses the real
problem here (ie, unbounded shutdown time) much better than the proposed
patch.  Note that the complaint that started this thread was against 9.2.

My feeling is that 82233ce7e has obsoleted all of the proposals made so
far in this thread, and that we should reject them all.  Instead we should
think about whether pgstat_reset_all should go away.

regards, tom lane


Re: [HACKERS] [RFC] Should we fix postmaster to avoid slow shutdown?

2016-11-18 Thread Tom Lane
Peter Eisentraut  writes:
> On 11/14/16 4:38 AM, Ashutosh Bapat wrote:
>> The patch 02_close_listen... closes the listen sockets
>> explicitly when it's known that postmaster is going to stop all the
>> children and then die. I have tried to see if there's a possibility
>> that it closes the listen sockets but does not actually die, thus
>> leaving a server which doesn't accept any connections and doesn't die.
>> But I have not found that possibility.

> I can see the point of this, but I'm not sure whether this is always a
> good idea.

IMO it's not, and closer analysis says that this patch series is an
attempt to solve something we already fixed, better, in 9.4.

regards, tom lane


Re: [HACKERS] [RFC] Should we fix postmaster to avoid slow shutdown?

2016-11-18 Thread Peter Eisentraut
On 11/18/16 12:00 PM, Tom Lane wrote:
> My feeling is that 82233ce7e has obsoleted all of the proposals made so
> far in this thread, and that we should reject them all.

Yes, it seems that very similar concerns were already addressed there.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Declarative partitioning - another take

2016-11-18 Thread Robert Haas
On Fri, Nov 18, 2016 at 5:59 AM, Amit Langote
 wrote:
> Oh but wait, that means I can insert rows with NULLs in the range
> partition key if I choose to insert it directly into the partition,
> whereas I have been thinking all this while that there could never be
> NULLs in the partition key of a range partition.  What's more,
> get_qual_for_partbound() (patch 0003) emits a IS NOT NULL constraint for
> every partition key column in case of a range partition.  Is that
> wrongheaded altogether?  (also see my reply to your earlier message about
> NULLs in the range partition key)

The easiest thing to do might be to just enforce that all of the
partition key columns have to be not-null when the range-partitioned
table is defined, and reject any attempt to DROP NOT NULL on them
later.  That's probably better than shoehorning it into the table
constraint.

> Thanks for the idea below!
>
>> 1. Forget the idea of a tree.  Instead, let the total number of tables
>> in the partitioning hierarchy be N and let the number of those that
>> are partitioned be K.  Assign each partitioned table in the hierarchy
>> an index between 0 and K-1.  Make your top level data structure (in
>> lieu of PartitionTreeNodeData) be an array of K PartitionDispatch
>> objects, with the partitioning root in entry 0 and the rest in the
>> remaining entries.
>>
>> 2. Within each PartitionDispatch object, store (a) a pointer to a
>> PartitionDesc and (b) an array of integers of length equal to the
>> PartitionDesc's nparts value.  Each integer i, if non-negative, is the
>> final return value for get_partition_for_tuple.  If i == -1, tuple
>> routing fails.  If i < -1, we must next route using the subpartition
>> whose PartitionDesc is at index -(i+1).  Arrange for the array to be
>> in the same order the PartitionDesc's OID list.
>>
>> 3. Now get_partition_for_tuple looks something like this:
>>
>> K = 0
>> loop:
>> pd = PartitionDispatch[K]
>> idx = list/range_partition_for_tuple(pd->partdesc, ...)
>> if (idx >= -1)
>> return idx
>> K = -(idx + 1)
>>
>> No recursion, minimal pointer chasing, no linked lists.  The whole
>> thing is basically trivial aside from the cost of
>> list/range_partition_for_tuple itself; optimizing that is a different
>> project.  I might have some details slightly off here, but hopefully
>> you can see what I'm going for: you want to keep the computation that
>> happens in get_partition_for_tuple() to an absolute minimum, and
>> instead set things up in advance so that getting the partition for a
>> tuple is FAST.  And you want the data structures that you are using in
>> that process to be very compact, hence arrays instead of linked lists.
>
> This sounds *much* better.  Here is a quick attempt at coding the design
> you have outlined above in the attached latest set of patches.

That shrank both 0006 and 0007 substantially, and it should be faster,
too.   I bet you can shrink them further:

- Why is PartitionKeyExecInfo a separate structure and why does it
have a NodeTag?  I bet you can dump the node tag, merge it into
PartitionDispatch, and save some more code and some more
pointer-chasing.

- I still think it's a seriously bad idea for list partitioning and
range partitioning to need different code-paths all over the place
here. List partitions support nulls but not multi-column partitioning
keys and range partitions support multi-column partitioning keys but
not nulls, but you could use an internal structure that supports both.
Then you wouldn't need partition_list_values_bsearch and also
partition_rbound_bsearch; you could have one kind of bound structure
that can be bsearch'd for either list or range.  You might even be
able to unify list_partition_for_tuple and range_partition_for_tuple
although that looks a little harder.  In either case, you bsearch for
the greatest value <= the value you have.  The only difference is that
for list partitioning, you have to enforce at the end that it is an
equal value, whereas for range partitioning less-than-or-equal-to is
enough.  But you should still be able to arrange for more code
sharing.

- I don't see why you need the bound->lower stuff any more.  If
rangeinfo.bounds[offset] is a lower bound for a partition, then
rangeinfo.bounds[offset+1] is either (a) the upper bound for that
partition and the partition is followed by a "gap" or (b) both the
upper bound for that partition and the lower bound for the next
partition.  With the inclusive/exclusive bound stuff gone, every range
bound has the same sense: if the probed value is <= the bound then
we're supposed to be a lower-numbered partition, but if > then we're
supposed to be in this partition or a higher-numbered one.
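
To make the flattened dispatch-array idea quoted above concrete, here is a
self-contained toy sketch (plain C, not PostgreSQL code; all names and the
two-level list-partitioning example are made up):

    #include <stdio.h>

    typedef struct DispatchDemo
    {
        int         keycol;     /* tuple column this level routes on */
        int         nparts;     /* number of bounds at this level */
        const int  *bounds;     /* sorted list-partition bound values */
        const int  *indexes;    /* per bound: >= 0 leaf partition number,
                                 * -1 no partition, < -1 continue at
                                 * dispatch[-(i+1)] */
    } DispatchDemo;

    /* binary-search for a bound equal to val; -1 if absent */
    static int
    find_bound(const DispatchDemo *d, int val)
    {
        int     lo = 0, hi = d->nparts - 1;

        while (lo <= hi)
        {
            int     mid = (lo + hi) / 2;

            if (d->bounds[mid] == val)
                return mid;
            if (d->bounds[mid] < val)
                lo = mid + 1;
            else
                hi = mid - 1;
        }
        return -1;
    }

    /* the routing loop: no recursion, no linked lists */
    static int
    route_tuple(const DispatchDemo *dispatch, const int *tuple)
    {
        int     k = 0;

        for (;;)
        {
            const DispatchDemo *d = &dispatch[k];
            int     pos = find_bound(d, tuple[d->keycol]);
            int     idx = (pos < 0) ? -1 : d->indexes[pos];

            if (idx >= -1)
                return idx;     /* leaf partition, or -1 = no match */
            k = -(idx + 1);     /* descend into sub-partitioned table */
        }
    }

    int
    main(void)
    {
        /* root: column 0; a=1 -> leaf 0, a=2 -> leaf 1, a=3 -> dispatch[1] */
        static const int b0[] = {1, 2, 3};
        static const int i0[] = {0, 1, -2};
        /* sub-level: column 1; b=10 -> leaf 2, b=20 -> leaf 3 */
        static const int b1[] = {10, 20};
        static const int i1[] = {2, 3};
        const DispatchDemo dispatch[] = {{0, 3, b0, i0}, {1, 2, b1, i1}};
        const int   t1[] = {2, 99};     /* -> 1  */
        const int   t2[] = {3, 20};     /* -> 3  */
        const int   t3[] = {3, 30};     /* -> -1 */

        printf("%d %d %d\n", route_tuple(dispatch, t1),
               route_tuple(dispatch, t2), route_tuple(dispatch, t3));
        return 0;
    }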

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] Snapshot too old logging

2016-11-18 Thread Robert Haas
On Fri, Nov 18, 2016 at 11:49 AM, Brad DeJong  wrote:
> On Wed, Nov 16, 2016 at 6:43 AM, Robert Haas wrote:
>> I think I was suggesting: One or more rows required by this query may
>> already have been removed from "%s".
>
> I keep reading that as "you have data corruption because something removed 
> rows that your query needs" rather than "this query took too long to complete 
> but may succeed if you resubmit it".

I see your point, but remember that this would only be the detail line.
The error would still be "snapshot too old".

ERROR:  snapshot too old
DETAIL: One or more rows required by this query may already have been
removed from "%s".
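
For context, that message/detail split corresponds to an ereport() call
along these lines (a sketch only; the actual call site, error-code usage,
and relation variable in the patch may differ):

    ereport(ERROR,
            (errcode(ERRCODE_SNAPSHOT_TOO_OLD),
             errmsg("snapshot too old"),
             errdetail("One or more rows required by this query may already "
                       "have been removed from \"%s\".",
                       RelationGetRelationName(rel))));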

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] Patch: Implement failover on libpq connect level.

2016-11-18 Thread Robert Haas
On Thu, Nov 17, 2016 at 10:08 PM, Craig Ringer  wrote:
> On 17 November 2016 at 10:57, Robert Haas  wrote:
>> On Wed, Nov 16, 2016 at 9:00 PM, Tsunakawa, Takayuki
>>  wrote:
>>> Do we really want to enable libpq failover against pre-V10 servers?  I 
>>> don't think so, as libpq is a part of PostgreSQL and libpq failover is a 
>>> new feature in PostgreSQL 10.  At least, as one user, I don't want 
>>> PostgreSQL to sacrifice another round trip to establish a connection.  As a 
>>> developer, I don't want libpq code more complex than necessary (the 
>>> proposed patch adds a new state to the connection state machine.)  And I 
>>> think it's natural for the server to return the server attribute 
>>> (primary/standby, writable, etc.) as a response to the Startup message like 
>>> server_version, standard_conforming_strings and server_encoding.
>>
>> Well, generally speaking, a new feature that works against older
>> server is better than one that doesn't.  Of course, if that entails
>> making other compromises then you have to decide whether it's worth
>> it, but SELECT pg_is_in_recovery() and SHOW transaction_read_only
>> exist in older versions so if we pick either of those methods then it
>> will just work.  If we decide to invent some completely new method of
>> distinguishing masters from standbys, then it might not, but that
>> would be a drawback of such a choice, not a benefit.
>
> We can and probably should have both.
>
> If the server tells us on connect whether it's a standby or not, use that.
>
> Otherwise, ask it.
>
> That way we don't pay the round-trip cost and get the log spam when
> talking to newer servers that send us something useful in the startup
> packet, but we can still query it on older servers. Graceful fallback.
>
> Every round trip is potentially very expensive. Having libpq do them
> unnecessarily is bad.

True, but raising the bar for this feature so that it doesn't get done
is also bad.  It can be improved in a later patch.
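
For reference, the fallback both quoted methods imply is a single extra
round trip once the connection is established; a minimal sketch using only
existing libpq calls (helper name made up, no error handling):

    #include <string.h>
    #include <stdbool.h>
    #include <libpq-fe.h>

    /* A hot standby always reports transaction_read_only = on, so asking for
     * it works against old servers too.  (A primary configured with
     * default_transaction_read_only = on reports the same, which is the
     * caveat of this method.) */
    static bool
    connection_is_read_only(PGconn *conn)
    {
        PGresult   *res = PQexec(conn, "SHOW transaction_read_only");
        bool        ro = false;

        if (PQresultStatus(res) == PGRES_TUPLES_OK && PQntuples(res) == 1)
            ro = (strcmp(PQgetvalue(res, 0, 0), "on") == 0);
        PQclear(res);
        return ro;
    }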

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] Unlogged tables cleanup

2016-11-18 Thread Michael Paquier
On Thu, Nov 17, 2016 at 7:06 AM, Robert Haas  wrote:
> Yeah, but surely it's obvious that if you don't WAL log it, it's not
> going to work for archiving or streaming.  It's a lot less obvious why
> you have to WAL log it when you're not doing either of those things if
> you've already written it and fsync'd it locally.  How is it going to
> disappear if it's been fsync'd, one wonders?

It's not obvious at all that replaying a WAL record for a database
creation or tablespace creation would cause that! I didn't know that
redo was wiping them out with rmtree() at replay before looking at
this bug. One more thing to remember when fixing an issue in this area
in the future.

>> More seriously, if there could be more details regarding that, I would
>> think that we could say something like "logging the init fork is
>> mandatory in any case to ensure its on-disk presence when at recovery
>> replay, even on non-default tablespaces whose base location are
>> deleted and re-created from scratch if the WAL record in charge of
>> creating this tablespace gets replayed". The problem shows up because
>> of tablespaces being deleted at replay at the end... So perhaps this
>> makes sense. What do you think?
>
> Yes, that's about what I think we need to explain.

OK, what do you think about the attached then?

I came up with something like that for those code paths:
-   /* Write the page.  If archiving/streaming, XLOG it. */
+   /*
+* Write the page and log it unconditionally.  This is important
+* particularly for indexes created on tablespaces and databases
+* whose creation happened after the last redo pointer as recovery
+* removes any of their existing content when the corresponding
+* create records are replayed.
+*/
I have not been able to use the word "create" any less than that. The
patch is really repetitive, but I think that we had better mention the
need to log the content in each code path and not touch
log_newpage(), to keep it maximally flexible.

> Actually, I'm
> wondering if we ought to just switch this over to using the delayChkpt
> mechanism instead of going through this complicated song-and-dance.
> Incurring an immediate sync to avoid having to WAL-log this is a bit
> tenuous, but having to WAL-log it AND do the immediate sync just seems
> silly, and I'm actually a bit worried that even with your fix there
> might still be a latent bug somewhere.  We couldn't switch mechanisms
> cleanly in the 9.2 branch (cf.
> f21bb9cfb5646e1793dcc9c0ea697bab99afa523) so perhaps we should
> back-patch it in the form you have it in already, but it's probably
> worth switching over at least in master.

Thinking through it... Could we not just rip out the sync phase
because there is delayChkpt? It seems to me that what counts is that
the commit of the transaction that creates the relation does not get
past the redo point. It would matter for read-uncommitted transactions,
but that concept does not exist in PG, and never will. On the
back-branches it is definitely safer to keep the sync; I am just
thinking about a HEAD-only optimization here, as you are.
-- 
Michael


unlogged-tbspace-fix-v4.patch
Description: application/download


Re: [HACKERS] Fix checkpoint skip logic on idle systems by tracking LSN progress

2016-11-18 Thread David Steele
On 11/14/16 4:29 AM, Michael Paquier wrote:
> On Mon, Nov 14, 2016 at 6:10 PM, Kyotaro HORIGUCHI
>  wrote:
>> It applies to master and compiles cleanly, with no errors from the
>> regression tests.  (I didn't confirm that the problem is still fixed,
>> but there seems to be no problem.)
> 
> Thanks for double-checking.

Also looks good to me.  I like curinsert_flags and XLOG_SKIP_PROGRESS
better than the old names.

>> If I'm not missing something, at worst we have a checkpoint
>> after a checkpointer restart that should have been suppressed. Is
>> it worth the complexity to pick that up?

That's the way I read it as well.  It's not clear to me how the
checkpointer would get restarted under normal circumstances.

I did a kill on the checkpointer and it was ignored.  After a kill -9
the checkpointer process came back but also switched the xlog.  Is this
the expected behavior?

-- 
-David
da...@pgmasters.net


Re: [HACKERS] WAL recycle retarding based on active sync rep.

2016-11-18 Thread Andres Freund
Hi,

On 2016-11-18 14:12:42 +0900, Kyotaro HORIGUCHI wrote:
> We had too-early WAL recycling during a test on a synchronous
> replication setup. This is not a bug, and it is a somewhat extreme
> case, but it is contrary to what one would expect from synchronous
> replication.

I don't think you can expect anything else.


> This is because sync replication doesn't wait for non-commit WAL to
> be replicated. This situation is artificially caused with the
> first patch attached and the following steps.

You could get that situation even if we waited for syncrep. The
SyncRepWaitForLSN happens after delayChkpt is unset.

Additionally, a syncrep connection could break for a short while, and
you'd lose all guarantees anyway.


> - Does this situation need to be handled? It is caused by a
>   large transaction that spans over two max_wal_size segments, or a
>   replication stall that lasts for a checkpoint period.

I very strongly think not.


> - Is the measure acceptable?  In the worst case, a master
>   crashes from WAL space exhaustion. (But such a large transaction
>   won't/shouldn't exist?)

No, imo not.


Greetings,

Andres Freund


Re: [HACKERS] JIT compiler for expressions

2016-11-18 Thread Jim Nasby

On 10/28/16 6:47 AM, Dmitry Melnik wrote:

We'd like to present our work on adding LLVM JIT compilation of
expressions in SQL queries for PostgreSQL. The source code (based on
9.6.1) along with a brief manual is available on our
github: https://github.com/ispras/postgres . Current speedup for TPC-H
Q1 is 20% (on a 40GB workload). Please feel free to test it and tell us
what you think.


For anyone looking to experiment with some of this stuff, it's possible 
to get LLVM-based JIT via plpython and numba as well.


https://github.com/AustinPUG/PGDay2016/blob/master/Numba%20inside%20PostgreSQL.ipynb
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)


Re: [HACKERS] pg_sequence catalog

2016-11-18 Thread Peter Eisentraut
On 11/11/16 12:53 PM, Andreas Karlsson wrote:
> Other than my comment above about is_called and last_value I think the 
> patch looks great.

I have made that change and committed the patch.  Thanks.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Tuple count used while costing MergeAppend and that for an append rel

2016-11-18 Thread Tom Lane
Ashutosh Bapat  writes:
> AFAIU, There's difference in the tuples and rows members of
> RelOptInfo. While tuples gives the estimate of number of tuples per
> pg_class, rows gives the number of rows output by that relation. From
> that perspective rel->tuples should be set of the sum of
> child_rel->tuples from each of the children. If we do that the last
> argument to cost_merge_append() should change from rel->tuples to
> rel->rows.

> Does that make sense? Attached patch makes those changes.

AFAICS, what you propose to add in set_append_rel_size is pure overhead.
There's no conceivable use to computing sum-of-raw-tuple-counts for an
appendrel ... or at least, if there is, you didn't say what you expect
it would be good for.  Normally the difference between rel->tuples and
rel->rows corresponds to the selectivity of the rel's restriction clauses.
Since an appendrel has no restrictions of its own (they've all been pushed
down to the child rels) it doesn't seem unreasonable for it to have tuples
equal to rows.  While we could also define it as you suggest, I don't see
the point of expending extra cycles to do so.

I concur that using path->rows not rel->tuples in create_merge_append_path
would be an improvement.  I think it makes no difference now, but if we
were to support parameterized mergeappend paths, rel->rows and rel->tuples
would both be wrong for such a path.  (I believe the possibility of a
parameterized path is the reason why create_append_path computes
path->rows the hard way instead of just copying it from rel->rows.
The logic in create_merge_append_path was probably just copied from
create_append_path; it's overkill today but might not be so forever.)

regards, tom lane


Re: [HACKERS] Remove the comment on the countereffectiveness of large shared_buffers on Windows

2016-11-18 Thread Peter Eisentraut
On 11/17/16 12:30 AM, Tsunakawa, Takayuki wrote:
> No, I'm not recommending a higher value, but just removing the doubtful 
> sentences of 512MB upper limit.  The advantage is that eliminating this 
> sentence will make a chance for users to try best setting.

I think this is a good point.  The information is clearly
wrong/outdated.  We have no new better information, but we should remove
misleading outdated advice and let users find new advice.  Basically,
this just puts Windows on par with other platforms with regard to
shared_buffers tuning, doesn't it?

I'm inclined to commit the original patch if there are no objections.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Mail thread references in commits

2016-11-18 Thread Robert Haas
On Thu, Nov 17, 2016 at 10:43 PM, Joshua Drake  wrote:
> Why not hash the URL? Something like:
>
> Http://postgresopen.org/archive/743257890976432

I suggested that upthread...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Mail thread references in commits

2016-11-18 Thread Joshua Drake
Sorry didn't see it.

On Nov 18, 2016 12:44, "Robert Haas"  wrote:

> On Thu, Nov 17, 2016 at 10:43 PM, Joshua Drake 
> wrote:
> > Why not hash the URL? Something like:
> >
> > Http://postgresopen.org/archive/743257890976432
>
> I suggested that upthread...
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>


Re: [HACKERS] patch: function xmltable

2016-11-18 Thread Pavel Stehule
Hi

2016-11-17 19:22 GMT+01:00 Alvaro Herrera :

> I've been going over this patch.  I think it'd be better to restructure
> the  before adding the docs for this new function; I already
> split it out, so don't do anything about this.
>
> Next, looking at struct TableExprBuilder I noticed that the comments are
> already obsolete, as they talk about function params that do not exist
> (missing_columns) and they fail to mention the ones that do exist.
> Also, function member SetContent is not documented at all.  Overall,
> these comments do not convey a lot -- apparently, whoever reads them is
> already supposed to know how it works: "xyz sets a row generating
> filter" doesn't tell me anything.  Since this is API documentation, it
> needs to be much clearer.
>
> ExecEvalTableExpr and ExecEvalTableExprProtected have no comments
> whatsoever.  Needs fixed.
>

I am sending the patch with more comments - but it needs care from someone
with good English skills.


>
> I wonder if it'd be a good idea to install TableExpr first without the
> implementing XMLTABLE, so that it's clearer what is API and what is
> implementation.
>

I am not sure about this step - the API is clear from the name. At this
moment, there are no tests for this API other than the XMLTABLE implementation.


>
> The number of new keywords in this patch is depressing.  I suppose
> there's no way around that -- as I understand, this is caused by the SQL
> standard's definition of the syntax for this feature.
>
> Have to go now for a bit -- will continue looking afterwards.  Please
> submit delta patches on top of your latest v12 to fix the comments I
> mentioned.
>
>
Regards

Pavel


> --
Álvaro Herrera                https://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>
diff --git a/src/backend/executor/execQual.c b/src/backend/executor/execQual.c
index c386be1..76c2ec8 100644
--- a/src/backend/executor/execQual.c
+++ b/src/backend/executor/execQual.c
@@ -4505,8 +4505,18 @@ ExecEvalCurrentOfExpr(ExprState *exprstate, ExprContext *econtext,
 }
 
 /* 
- *		ExecEvalTableExpr
+ *		ExecEvalTableExprProtected and ExecEvalTableExpr
  *
+ * Evaluate a TableExpr node. ExecEvalTableExprProtected is called
+ * from ExecEvalTableExpr inside a PG_TRY()/PG_CATCH() block to ensure
+ * that all allocated memory is released.
+ *
+ * It creates a TableExprBuilder object, applies all necessary settings
+ * and reads all tuples from this object. It is possible to go through
+ * this cycle more than once per query - when LATERAL JOIN is used. At
+ * the end of a cycle, the TableExprBuilder object is destroyed, the
+ * state pointer to this object is cleared, and the related memory
+ * context is reset. A new call starts a new cycle.
  * 
  */
 static Datum
@@ -4682,6 +4692,12 @@ ExecEvalTableExprProtected(TableExprState * tstate,
 	return result;
 }
 
+/*
+ * ExecEvalTableExpr - wrapper around the ExecEvalTableExprProtected() function.
+ *
+ * This function ensures that the TableBuilder context and the related
+ * memory context are released when ExecEvalTableExprProtected fails with an exception.
+ */
 static Datum
 ExecEvalTableExpr(TableExprState * tstate,
   ExprContext *econtext,
diff --git a/src/include/executor/tableexpr.h b/src/include/executor/tableexpr.h
index ad5d8e2..3056317 100644
--- a/src/include/executor/tableexpr.h
+++ b/src/include/executor/tableexpr.h
@@ -20,27 +20,37 @@
  * for generating content of table-expression functions like
  * XMLTABLE.
  *
- * CreateContext is called before execution and it does query level
- * initialization. Returns pointer to private TableExprBuilder data
- * (context). A query context should be active in call time. A param
- * missing_columns is true, when a user doesn't specify returned
- * columns.
+ * The TableBuilder is initialized by calling the CreateContext function
+ * at evaluation time. The first parameter is a tuple descriptor that
+ * describes the produced (expected) table. in_functions is an array of
+ * FmgrInfo input functions for the types of the produced table's columns.
+ * typioparams is an array of typio Oids for the types of the produced
+ * table's columns. The created context lives in the special memory
+ * context passed as the last parameter.
  *
- * AddNs add namespace info when namespaces are is supported.
- * then passed namespace is default namespace. Namespaces should be
- * passed before Row/Column Paths setting. When name is NULL, then
- * related uri is default namespace.
+ * The SetContent function is used to pass the input document to the
+ * table builder. The table builder handler knows the expected format
+ * and can apply additional transformations that are not propagated
+ * out of the table builder.
  *
- * SetRowPath sets a row generating filter.
+ * The AddNs function adds namespace info when namespaces are supported.
+ * Namespaces should be passed before setting the Row/Column Paths.
+ * When name is NULL,

Re: [HACKERS] patch: function xmltable

2016-11-18 Thread Alvaro Herrera
Pavel Stehule wrote:

> 2016-11-17 19:22 GMT+01:00 Alvaro Herrera :
> 
> > Next, looking at struct TableExprBuilder I noticed that the comments are
> > already obsolete, as they talk about function params that do not exist
> > (missing_columns) and they fail to mention the ones that do exist.
> > Also, function member SetContent is not documented at all.  Overall,
> > these comments do not convey a lot -- apparently, whoever reads them is
> > already supposed to know how it works: "xyz sets a row generating
> > filter" doesn't tell me anything.  Since this is API documentation, it
> > needs to be much clearer.
> >
> > ExecEvalTableExpr and ExecEvalTableExprProtected have no comments
> > whatsoever.  Needs fixed.
> 
> I am sending the patch with more comments - but it needs a care someone
> with good English skills.

Thanks, I can help with that.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [RFC] Should we fix postmaster to avoid slow shutdown?

2016-11-18 Thread Alvaro Herrera
Tom Lane wrote:
> Peter Eisentraut  writes:
> > On 11/14/16 4:38 AM, Ashutosh Bapat wrote:
> >> The patch 02_close_listen... closes the listen sockets
> >> explicitly when it's known that postmaster is going to stop all the
> >> children and then die. I have tried to see, if there's a possibility
> >> that it closes the listen sockets but do not actually die, thus
> >> causing a server which doesn't accept any connections and doesn't die.
> >> But I have not found that possibility.
> 
> > I can see the point of this, but I'm not sure whether this is always a
> > good idea.
> 
> IMO it's not, and closer analysis says that this patch series is an
> attempt to solve something we already fixed, better, in 9.4.

... by the same patch submitter.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Mail thread references in commits

2016-11-18 Thread Tom Lane
Robert Haas  writes:
> On Thu, Nov 17, 2016 at 10:43 PM, Joshua Drake  wrote:
>> Why not hash the URL? Something like:
>> Http://postgresopen.org/archive/743257890976432

> I suggested that upthread...

Don't like the hashing idea because that would be useless for searching,
eg, one's own local mail archives.  Also, if you are looking at a message
in your own mail store when composing a commit message (which I generally
am, don't know about other committers) how are you going to get the hash?
I think it's important to have the message-ID in cleartext.

AFAICS the proposal here is to go from, eg,

Discussion: 

to

Discussion: 
https://www.postgresql.org/message-id/caf4au4w6rrh_j1bvvhzposrihcog7sgj3lsx0ty8zdwhht8...@mail.gmail.com

which is annoying, but the line was overlength already.  Maybe we could
drop the Discussion: prefix and just write a URL:

https://www.postgresql.org/message-id/caf4au4w6rrh_j1bvvhzposrihcog7sgj3lsx0ty8zdwhht8...@mail.gmail.com

That's still overlength for gmail message-IDs, though, so I'm not sure
it's worth losing the label for.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fun fact about autovacuum and orphan temp tables

2016-11-18 Thread Robert Haas
On Thu, Nov 17, 2016 at 4:25 PM, Michael Paquier
 wrote:
> On Thu, Nov 17, 2016 at 11:16 AM, Robert Haas  wrote:
>> On Thu, Nov 17, 2016 at 1:41 PM, Michael Paquier
>>  wrote:
>>> Okay, let's remove the documentation then. What do you think about the
>>> updated version attached?
>>> (Even this patch enters into "Needs Review" state).
>>
>> LGTM.  I'll commit it if there are not objections.
>
> Thank you.

On further reflection, I'm not sure this fixes the original complaint.
In fact, it might even make it worse.  The performDeletion() call here
is going to run inside of a loop that's scanning through pg_class, so
all in a single transaction.  So we'll actually try to drop ALL
orphaned tables for ALL backends that failed to clean up in a single
transaction, as opposed to the current state of affairs where we will
drop all orphaned tables for ONE backend in a single transaction.  If
anything, that could require MORE locks.  And if it starts failing,
then all autovacuum activity in that database will cease.  Oops.

So now I think that we probably need to make this logic a bit smarter.
Add all of the OIDs that need to be dropped to a list.  Then have a
loop prior to the main loop (where it says "Perform operations on
collected tables.") which iterates over that list and drops those
tables one by one, starting a transaction every (say) 100 tables or
after an error.  For bonus points, if a transaction fails, put all of
the OIDs except the one that provoked the failure back into the list
of OIDs to be dropped, so that we still make a progress even if some
DROPs are failing for some reason.

That might sound adding unnecessary work just for the sake of
paranoia, but I don't think it is.  Failures here won't be common, but
since they are entirely automated there will be no human intelligence
available to straighten things out.  Barring considerable caution,
we'll just enter a flaming death spiral.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [RFC] Should we fix postmaster to avoid slow shutdown?

2016-11-18 Thread Tom Lane
Alvaro Herrera  writes:
> Tom Lane wrote:
>> IMO it's not, and closer analysis says that this patch series is an
>> attempt to solve something we already fixed, better, in 9.4.

> ... by the same patch submitter.

[ confused ]  The commit log credits 82233ce7e to MauMau and yourself.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [RFC] Should we fix postmaster to avoid slow shutdown?

2016-11-18 Thread Robert Haas
On Fri, Nov 18, 2016 at 4:12 PM, Tom Lane  wrote:
> Alvaro Herrera  writes:
>> Tom Lane wrote:
>>> IMO it's not, and closer analysis says that this patch series is an
>>> attempt to solve something we already fixed, better, in 9.4.
>
>> ... by the same patch submitter.
>
> [ confused ]  The commit log credits 82233ce7e to MauMau and yourself.

IIUC, MauMau = Tsunakawa Takayuki.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [RFC] Should we fix postmaster to avoid slow shutdown?

2016-11-18 Thread Alvaro Herrera
Robert Haas wrote:
> On Fri, Nov 18, 2016 at 4:12 PM, Tom Lane  wrote:
> > Alvaro Herrera  writes:
> >> Tom Lane wrote:
> >>> IMO it's not, and closer analysis says that this patch series is an
> >>> attempt to solve something we already fixed, better, in 9.4.
> >
> >> ... by the same patch submitter.
> >
> > [ confused ]  The commit log credits 82233ce7e to MauMau and yourself.
> 
> IIUC, MauMau = Tsunakawa Takayuki.

Right, 
https://www.postgresql.org/message-id/964413269E3A41409EA53EE0D813E48C@tunaPC

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel execution and prepared statements

2016-11-18 Thread Robert Haas
On Fri, Nov 18, 2016 at 8:21 AM, Albe Laurenz  wrote:
> I didn't notice the CREATE TABLE ... AS EXECUTE problem.
>
> But then the change to use CURSOR_OPT_PARALLEL_OK in tcop/postgres.c should
> be reverted as well, because it breaks things just as bad:
>
>/* creates a parallel-enabled plan */
>PQprepare(conn, "pstmt", "SELECT id FROM l WHERE val = $1", 1, NULL);
>/* blows up with "cannot insert tuples during a parallel operation" */
>PQexec(conn, "CREATE TABLE bad(id) AS EXECUTE pstmt('abcd')");

Ouch.  I missed that.

> With Tobias' patch, this does not fail.
>
> I think we should either apply something like that patch or disable parallel
> execution for prepared statements altogether and document that.

I agree.  I think probably the first option is better, but I might
have to commit with one hand so I can hold my nose with the other.
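
For reference, the failure Laurenz describes can also be reproduced purely at
the SQL level (a minimal sketch of the same scenario; it assumes a table
l(id, val) large enough that the planner picks a parallel plan):

PREPARE pstmt AS SELECT id FROM l WHERE val = $1;
-- With prepared statements planned under CURSOR_OPT_PARALLEL_OK, this can
-- fail with "cannot insert tuples during a parallel operation":
CREATE TABLE bad(id) AS EXECUTE pstmt('abcd');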

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [RFC] Should we fix postmaster to avoid slow shutdown?

2016-11-18 Thread Tom Lane
Alvaro Herrera  writes:
> Robert Haas wrote:
>> IIUC, MauMau = Tsunakawa Takayuki.

> Right, 
> https://www.postgresql.org/message-id/964413269E3A41409EA53EE0D813E48C@tunaPC

Ah!  I'd forgotten.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Covering + unique indexes.

2016-11-18 Thread Peter Eisentraut
On 10/4/16 10:47 AM, Anastasia Lubennikova wrote:
> 03.10.2016 15:29, Anastasia Lubennikova:
>> 03.10.2016 05:22, Michael Paquier:
>>> On Tue, Sep 27, 2016 at 12:17 AM, Anastasia Lubennikova
>>>  wrote:
 Ok, I'll write it in a few days.
>>> Marked as returned with feedback per last emails exchanged.
>>
>> The only complaint about this patch was a lack of README,
>> which is fixed now (see the attachment). So, I added it to new CF,
>> marked as ready for committer.
> 
> One more fix for pg_upgrade.

Latest patch doesn't apply.  See also review by Brad DeJong.  I'm
setting it back to Waiting.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Mail thread references in commits

2016-11-18 Thread Alvaro Herrera
Tom Lane wrote:

> AFAICS the proposal here is to go from, eg,
> 
> Discussion: 
> 
> 
> to
> 
> Discussion: 
> https://www.postgresql.org/message-id/caf4au4w6rrh_j1bvvhzposrihcog7sgj3lsx0ty8zdwhht8...@mail.gmail.com
> 
> which is annoying, but the line was overlength already.  Maybe we could
> drop the Discussion: prefix and just write a URL:
> 
> https://www.postgresql.org/message-id/caf4au4w6rrh_j1bvvhzposrihcog7sgj3lsx0ty8zdwhht8...@mail.gmail.com

If we use our existing URL shortener, as I proposed upthread, it's still
overlength but decent enough:

https://postgr.es/m/caf4au4w6rrh_j1bvvhzposrihcog7sgj3lsx0ty8zdwhht8...@mail.gmail.com

I was pretty sure I had proposed this previously and was surprised that
I couldn't find it in the archives anywhere; and sure enough, what I did
was post the idea to sysadm...@postgresql.org (over a year ago) so it
wasn't publicly visible.  Anyway, I think the shortener is worth using
(and we can certainly spare the namespace).

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Mail thread references in commits

2016-11-18 Thread Magnus Hagander
On Fri, Nov 18, 2016 at 11:08 PM, Alvaro Herrera 
wrote:

> Tom Lane wrote:
>
> > AFAICS the proposal here is to go from, eg,
> >
> > Discussion:  whht8...@mail.gmail.com>
> >
> > to
> >
> > Discussion: https://www.postgresql.org/message-id/CAF4Au4w6rrH_
> j1bvvhzposrihcog7sgj3lsx0ty8zdwhht8...@mail.gmail.com
> >
> > which is annoying, but the line was overlength already.  Maybe we could
> > drop the Discussion: prefix and just write a URL:
> >
> > https://www.postgresql.org/message-id/CAF4Au4w6rrH_
> j1bvvhzposrihcog7sgj3lsx0ty8zdwhht8...@mail.gmail.com
>
> If we use our existing URL shortener, as I proposed upthread, it's still
> overlength but decent enough:
>
> https://postgr.es/m/CAF4Au4w6rrH_j1bvVhzpOsRiHCog7sGJ3LSX0tY8Zd
> whht8...@mail.gmail.com



It does cut it down a bit. And doing it that way would certainly make it
very "long term survivable". If we put a hash in there, we have to store
the hash somewhere. Not that this is hard, but if there is concern about
long-termability.

Another option would be to teach the URL shortener about our messages
specifically, and just use an internal surrogate key we already have for
them. That way it could be something like the current Planet URLs (which
can be for example http://postgr.es/p/3Dl), so much shorter. To post one
people would have to go to a webform and paste in the messageid or existing
URL though -- or we could put a link on the page in the archives.

It would make the URLs actually short, but as mentioned upthread, that
wouldn't work at all if offline. So it'd be a tradeoff between those, but
so are pretty much all other options that don't include the full message-id.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] patch: function xmltable

2016-11-18 Thread Alvaro Herrera
The SQL standard seems to require a comma after the XMLNAMESPACES
clause:

 ::=
  XMLTABLE 
  [   ]

  [  ]
COLUMNS  

I don't understand the reason for that, but I have added it:

| XMLTABLE '(' XMLNAMESPACES '(' XmlNamespaceList ')' 
',' c_expr xmlexists_argument ')'
{
TableExpr *n = makeNode(TableExpr);
n->row_path = $8;
n->expr = $9;
n->cols = NIL;
n->namespaces = $5;
n->location = @1;
$$ = (Node *)n;
}
| XMLTABLE '(' XMLNAMESPACES '(' XmlNamespaceList ')' 
',' c_expr xmlexists_argument COLUMNS TableExprColList ')'
{
TableExpr *n = makeNode(TableExpr);
n->row_path = $8;
n->expr = $9;
n->cols = $11;
n->namespaces = $5;
n->location = @1;
$$ = (Node *)n;
}
;
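
For illustration, the second production above accepts a query shaped like the
following (a sketch only; the table xmldata and its doc column are
hypothetical):

SELECT xt.*
  FROM xmldata,
       XMLTABLE(XMLNAMESPACES('http://example.com/rows' AS r),
                '/r:rows/r:row' PASSING xmldata.doc
                COLUMNS id int PATH 'r:id',
                        name text PATH 'r:name') AS xt;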

Another thing I did was remove the TableExprColOptionsOpt production; in
its place I added a third rule in TableExprCol for "ColId Typename
IsNotNull" (i.e. no options).  This seems to reduce the size of the
generated gram.c a good dozen kB.


I didn't like much the use of c_expr in all these productions.  As I
understand it, c_expr is mostly an implementation artifact and we should
be using a_expr or b_expr almost everywhere.  I see that XMLEXISTS
already expanded the very limited use of c_expr there was; I would
prefer to fix that one too rather than replicate it here.  TBH I'm not
sure I like that XMLTABLE is re-using xmlexists_argument.  

Actually, is the existing XMLEXISTS production correct?  What I see in
the standard is

 ::= 

 ::=
   PASSING  
 [ {   }... ]

 ::= 

 ::=  [ {  
 }... ]

 ::=

  | 

 ::=  FOR ORDINALITY

 ::=
  [  ]
  [  ]
  [ PATH  ]

 ::= 

so I think this resolves "PASSING BY {REF,VALUE} ", but what
we have in gram.y is:

/* We allow several variants for SQL and other compatibility. */
xmlexists_argument:
PASSING c_expr
{
$$ = $2;
}
| PASSING c_expr BY REF
{
$$ = $2;
}
| PASSING BY REF c_expr
{
$$ = $4;
}
| PASSING BY REF c_expr BY REF
{
$$ = $4;
}
;

I'm not sure why we allow "PASSING c_expr" at all.  Maybe if BY VALUE/BY
REF is not specified, we should just not have PASSING at all?

If we extended this for XMLEXISTS for compatibility with some other
product, perhaps we should look into what that product supports for
XMLTABLE; maybe XMLTABLE does not need all the same options as
XMLEXISTS.

The fourth option seems very dubious to me.  This was added by commit
641459f26, submitted here:
https://www.postgresql.org/message-id/4c0f6dbf.9000...@mlfowler.com

... Hm, actually some perusal of the XMLEXISTS predicate in the standard
shows that it's quite a different thing from XMLTABLE.  Maybe we
shouldn't reuse xmlexists_argument here at all.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] amcheck (B-Tree integrity checking tool)

2016-11-18 Thread Peter Geoghegan
On Thu, Nov 17, 2016 at 12:04 PM, Peter Geoghegan  wrote:
>> Hm, if we want that - and it doesn't seem like a bad idea - I think we
>> should be make it available without recompiling.
>
> I suppose, provided it doesn't let CORRUPTION elevel be < ERROR. That
> might be broken if it was allowed.

What do you think about new argument with default vs. GUC? I guess
that the GUC might be a lot less of a foot-gun. We might even give it
a suitably scary name, to indicate that it will make the server PANIC.
(I gather that you don't care about other aspects of verbosity -- just
about the ability to make amcheck PANIC in the event of an invariant
violation without recompiling it.)


-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Mail thread references in commits

2016-11-18 Thread Stephen Frost
Magnus,

* Magnus Hagander (mag...@hagander.net) wrote:
> It would make the URLs actually short, but as mentioned upthread, that
> wouldn't work at all if offline. So it'd be a tradeoff between those, but
> so are pretty much all other options that don't include the full message-id.

This is a bit of a crazy idea, but in the new list system, couldn't we
add a header which includes "our" surrogate message-id?  Or possibly the
entire URL to the message, and maybe the URL for the entire thread?

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Mail thread references in commits

2016-11-18 Thread Jim Nasby

On 11/18/16 4:35 PM, Magnus Hagander wrote:

Another option would be to teach the URL shortener about our messages
specifically, and just use an internal surrogate key we already have for
them. That way it could be something like the current Planet URLs (which
can be for example http://postgr.es/p/3Dl), so much shorter. To post one
people would have to go to a webform and paste in the messageid or
existing URL though -- or we could put a link on the page in the archives.

It would make the URLs actually short, but as mentioned upthread, that
wouldn't work at all if offline. So it'd be a tradeoff between those,
but so are pretty much all other options that don't include the full
message-id.


Would it be possible to stick that surrogate key into outgoing list 
messages as an X- header?

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Mail thread references in commits

2016-11-18 Thread Andrew Dunstan



On 11/18/2016 07:07 PM, Stephen Frost wrote:

Magnus,

* Magnus Hagander (mag...@hagander.net) wrote:

It would make the URLs actually short, but as mentioned upthread, that
wouldn't work at all if offline. So it'd be a tradeoff between those, but
so are pretty much all other options that don't include the full message-id.

This is a bit of a crazy idea, but in the new list system, couldn't we
add a header which includes "our" surrogate message-id?  Or possibly the
entire URL to the message, and maybe the URL for the entire thread?





I had similar ideas.

cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [WIP] [B-Tree] Keep indexes sorted by heap physical location

2016-11-18 Thread Peter Geoghegan
On Thu, Aug 18, 2016 at 2:15 PM, Peter Geoghegan  wrote:
> I think that this is a bad idea. We need to implement suffix
> truncation of internal page index tuples at some point, to make them
> contain less information from the original leaf page index tuple.
> That's an important optimization, because it increases fan-in. This
> seems like a move in the opposite direction.

Maybe I was too hasty, here. Can you rebase this patch, Claudio?

It's possible that this idea could have further non-obvious benefits.
I see some potential for this to help with nbtree page deletion, item
recycling, etc. For example, maybe VACUUM can manage to do something
much closer to a regular index scan when that seems to make sense --
Andres (CC'd) has talked about this before. And, maybe we get a
benefit from localizing where the LP_DEAD bit can be set for the
kill_prior_tuple stuff to work in UPDATE-heavy workloads. Naturally,
there'd often be some kind of locality in terms of older row versions
belonging earlier in the heap, versions which can be concentrated onto
one leaf page with this patch. As newer updates make it look like the
leaf page needs to be split, there is a concentration of LP_DEAD-set
line items to reclaim space from, letting some UPDATEs (index tuple
inserters) defer the split. There is a lot less value in the LP_DEAD
stuff if reclaimable line items are sprayed all over the place, which
seems quite possible to me on current versions.

I also think that this might help with bitmap index scans.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Implement failover on libpq connect level.

2016-11-18 Thread Craig Ringer
On 19 November 2016 at 01:29, Robert Haas  wrote:

>> We can and probably should have both.
>>
>> If the server tells us on connect whether it's a standby or not, use that.
>>
>> Otherwise, ask it.
>>
>> That way we don't pay the round-trip cost and get the log spam when
>> talking to newer servers that send us something useful in the startup
>> packet, but we can still query it on older servers. Graceful fallback.
>>
>> Every round trip is potentially very expensive. Having libpq do them
>> unnecessarily is bad.
>
> True, but raising the bar for this feature so that it doesn't get done
> is also bad.  It can be improved in a later patch.

Good point. Starting with the followup query method seems fine.
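
For illustration, one such follow-up query - not necessarily the exact one
the patch will end up using - is simply:

SELECT pg_is_in_recovery();

which returns true on a standby and false on a primary.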


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [WIP] [B-Tree] Keep indexes sorted by heap physical location

2016-11-18 Thread Claudio Freire
On Fri, Nov 18, 2016 at 10:05 PM, Peter Geoghegan  wrote:
> On Thu, Aug 18, 2016 at 2:15 PM, Peter Geoghegan  wrote:
>> I think that this is a bad idea. We need to implement suffix
>> truncation of internal page index tuples at some point, to make them
>> contain less information from the original leaf page index tuple.
>> That's an important optimization, because it increases fan-in. This
>> seems like a move in the opposite direction.
>
> Maybe I was too hasty, here. Can you rebase this patch, Claudio?

I've been changing the on-disk format considerably, to one that makes
more sense.

I haven't posted it because it still doesn't have suffix (and prefix)
truncation, but it should be compatible with it (ie: it could be
implemented as an optimization that doesn't change the on-disk
format).

However, I haven't tested the current state of the patch thoroughly.

If a WIP is fine, I can post what I have, rebased.

> It's possible that this idea could have further non-obvious benefits.
> I see some potential for this to help with nbtree page deletion, item
> recycling, etc. For example, maybe VACUUM can manage to do something
> much closer to a regular index scan when that seems to make sense --

That was on my mind too

> I also think that this might help with bitmap index scans.

How so?

AFAIK it helps regular scans on low-cardinality indexes more than
bitmap index scans. With low cardinality indexes, the resulting heap
access pattern will be an unordered series of sequential range scans,
which is better than the fully random scan the current layout causes.
Bitmap index scans, however, deny that benefit.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_recvlogical --endpos

2016-11-18 Thread Euler Taveira
On 01-09-2016 01:41, Craig Ringer wrote:
> Here's a rebased version of my pg_recvlogical --endpos patch from the
> 9.5 series, updated to incoroprate Álvaro's changes.
> 
I was supposed to review this patch in the previous commitfest but was swamped
with work. The patch is almost ready, but I have some points.

* We usually don't include an itemized list in a binary's option
descriptions. I suggest you add a new paragraph for the first item; for
the second one you could add a note;
* I think you should add a small note explaining the 'endpos' behavior.
Also, I think endpos could be inclusive (since recovery also has this
behavior by default);
* I think there is a typo in those pieces of code:

+   if (endpos != InvalidXLogRecPtr && walEnd >= endpos)

+   if (endpos != InvalidXLogRecPtr && cur_record_lsn > endpos)

If you decide to be inclusive, it should be > otherwise >=.

There could be TAP tests for pg_recvlogical but it is material for
another patch.

I'll mark this patch waiting on author for your considerations.


-- 
   Euler Taveira   Timbira - http://www.timbira.com.br/
   PostgreSQL: Consulting, Development, 24x7 Support and Training


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Mail thread references in commits

2016-11-18 Thread Andres Freund


On November 18, 2016 1:06:18 PM PST, Tom Lane  wrote:
>Robert Haas  writes:
>> On Thu, Nov 17, 2016 at 10:43 PM, Joshua Drake 
>wrote:
>>> Why not hash the URL? Something like:
>>> Http://postgresopen.org/archive/743257890976432
>
>> I suggested that upthread...
>
>Don't like the hashing idea because that would be useless for
>searching,
>eg, one's own local mail archives.  Also, if you are looking at a
>message
>in your own mail store when composing a commit message (which I
>generally
>am, don't know about other committers) how are you going to get the
>hash?
>I think it's important to have the message-ID in cleartext.

I do a lot of that offline or at least locally. So I won't use something
that makes that uncomfortable.

Andres
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [WIP] [B-Tree] Keep indexes sorted by heap physical location

2016-11-18 Thread Peter Geoghegan
On Fri, Nov 18, 2016 at 5:27 PM, Claudio Freire  wrote:
> I've been changing the on-disk format considerably, to one that makes
> more sense.

I can see how that would be necessary. I'm going to suggest some more
things that need a new btree version number now, too.

I realized today, quite suddenly, why I disliked your original patch:
it didn't go far enough with embracing the idea of
heap-item-pointer-as-key. In other words, I didn't like that you
didn't change anything in the internal pages. Maybe you should put
heap TIDs someplace in new internal pages, so that even there we treat
them as part of the key. This may significantly benefit UPDATE-heavy
workloads where HOT doesn't work out so well. Particularly for
non-unique indexes, we currently have to wade through a lot of bloat
from the leaf level, rather than jumping straight to the correct leaf
page when updating or inserting.

You might not want to do the same with unique indexes, where we must
initially buffer lock "the first leaf page that a duplicate could be
on" while we potentially look in one or more additional sibling pages
(but bloat won't be so bad there, perhaps). Or, you might want to make
sure that B-Tree tuple insertions still find that "first page" to
check, despite still generally treating heap item pointer as part of
the key proper (in unique indexes, it can still behave like NULL, and
not participate in uniqueness checking, while still essentially being
part of the key/sort order).

Of course, it isn't great to add stuff to the internal pages, but if
you're also implementing suffix truncation, there is probably no harm
at all. You're only going to affect cases that will also greatly
benefit from having that extra information, to guide insertions of new
values to the right place more efficiently (by inserters, I mostly
mean insertions from UPDATEs).

It also occurs to me that our performance for updates in the event of
many NULL values in indexes is probably really bad, and could be
helped by this. You'll want to test all of this with a workload that
isn't very sympathetic to HOT, of course -- most of these benefits are
seen when HOT doesn't help.

> However, I haven't tested the current state of the patch thoroughly.
>
> If a WIP is fine, I can post the what I have, rebased.

I'm mostly curious about the effects on bloat. I now feel like you
haven't sufficiently examined the potential benefits there, since you
never made heap item pointer a first class part of the key space.
Maybe you'd like to look into that yourself first?

>> I also think that this might help with bitmap index scans.
>
> How so?

I was thinking about teaching nbtree to start the scan at the level
above the leaf level. If the heap TID was part of the key proper, one
can imagine it being much cheaper to build a bitmap for a moderate
cardinality value (e.g., NULL values needed for "WHERE foo IS NULL"
index scans). The exact details would need to be worked out, but one
can imagine building a bitmap that is much larger than necessary one
level up from the leaf, then lazily unsetting bits as we descend the
B-Tree to the leaf level and find pages whose bitmap bits can be
unset. We might balance the avoidance of I/O during the scan against
how much we might expect to save in a subsequent bitmap heap scan, and
so on. This might be based on a selectivity estimate.

That's all fairly hand-wavy, certainly, but I see significant
potential along these lines.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Mail thread references in commits

2016-11-18 Thread Michael Paquier
On Fri, Nov 18, 2016 at 6:05 PM, Andres Freund  wrote:
> I do a lot of that offline or at least locally. So I won't use something
> that makes that uncomfortable.

No committer here. But I do search for mails using the discussion ID
present in the log message by querying in my local mail box copy on a
more or less daily basis. So having a hash will clearly reduce the
flexibility of how things are possible today.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] possible optimizations - pushing filter before aggregation

2016-11-18 Thread Douglas Doole
On Fri, Nov 18, 2016 at 12:47 AM Pavel Stehule 
wrote:

Isn't it possible in this case to push the equivalence before the aggregation?


If I'm understanding you correctly, that would lead to wrong results.
Here's a simple example:

CREATE VIEW v AS SELECT MIN(a) m FROM t;

and table T contains:

T:A
---
1
2
3

SELECT * FROM v WHERE m = 2

The minimum value of A is 1, so the query should return no rows.

However, if we filter first we'd be effectively doing the query:

SELECT MIN(a) m FROM
   (SELECT a FROM t WHERE a=2) AS v(a)

The subquery is going to return an intermediate result of:

V:A
---
2

And the minimum of that is 2, which is the wrong answer.
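
(For contrast, pushing a predicate below an aggregate is safe when it
references only grouping columns - a sketch, assuming a hypothetical table
t(grp, a):

CREATE VIEW v2 AS SELECT grp, MIN(a) m FROM t GROUP BY grp;

SELECT * FROM v2 WHERE grp = 2;
-- may legitimately be evaluated as
SELECT grp, MIN(a) AS m FROM t WHERE grp = 2 GROUP BY grp;

because the filter only removes whole groups, never rows within a surviving
group.)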

- Doug
Salesforce


Re: [HACKERS] Fix checkpoint skip logic on idle systems by tracking LSN progress

2016-11-18 Thread Amit Kapila
On Mon, Nov 14, 2016 at 9:33 AM, Michael Paquier 
wrote:
> On Mon, Nov 14, 2016 at 12:49 PM, Kyotaro HORIGUCHI
>  wrote:
>> At Sat, 12 Nov 2016 10:28:56 +0530, Amit Kapila 
wrote in 
>>> I think it is good to check the performance impact of this patch on
>>> many core m/c.  Is it possible for you to once check with Alexander
>>> Korotkov to see if he can provide you access to his powerful m/c which
>>> has 70 cores (if I remember correctly)?
>
> I heard about a number like that, and there is no reason to not do
> tests to be sure.
>

Okay, I have done some performance tests with this patch and found that it
doesn't have any noticeable impact, which is good.  Details of the performance
tests are below:
Machine configuration:
2 sockets, 28 cores (56 including Hyper-Threading)
RAM = 64GB
Data directory is configured on the magnetic disk and WAL on SSD.

Non-default postgresql.conf parameters
shared_buffers=8GB
max_connections=200
bgwriter_delay=10ms
checkpoint_completion_target=0

Keeping the above parameters fixed, I have varied checkpoint_timeout for
the various tests.  Each of the results below is a median of three 15-minute
pgbench TPC-B tests.  All the tests are performed at 64 and/or 128 clients
(client count = number of concurrent sessions and threads, e.g. -c 8 -j 8).
All the tests are done with pgbench scale factor 300, which means the data
fits in shared buffers.


checkpoint_timeout = 30s
client_count/patch_ver     64      128
HEAD                       5176    6853
Patch                      4963    6556

checkpoint_timeout = 60s
client_count/patch_ver     64      128
HEAD                       4962    6894
Patch                      5228    6814

checkpoint_timeout = 120s
client_count/patch_ver     64      128
HEAD                       5443    7308
Patch                      5453    6937

checkpoint_timeout = 150s
client_count/patch_ver     128
HEAD                       7316
Patch                      7188


In the above results, you can see that in some cases (for example, for
checkpoint_timeout=30s at 128 clients) TPS with the patch is slightly
lower (1~5%), but I attribute that to run-to-run variation, because on
repeating the tests I could not see such a regression.  The reason for
keeping low values for checkpoint_timeout and bgwriter_delay is to test
whether there is any impact due to the new locking introduced in the
checkpointer and bgwriter.  The conclusion from my tests is that this
patch is okay as far as performance is concerned.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Mail thread references in commits

2016-11-18 Thread Andrew Dunstan



On 11/18/2016 09:05 PM, Andres Freund wrote:


On November 18, 2016 1:06:18 PM PST, Tom Lane  wrote:

I think it's important to have the message-ID in cleartext.

I do ask of that offline or at least locally. So I won't use something that 
makes that uncomfortable.




I agree. My original proposal preserved the Message-ID. I don't think 
the hashing proposal would work well.


cheers

andrew



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] possible optimizations - pushing filter before aggregation

2016-11-18 Thread Pavel Stehule
2016-11-19 3:59 GMT+01:00 Douglas Doole :

>
> On Fri, Nov 18, 2016 at 12:47 AM Pavel Stehule 
> wrote:
>
> Isn't it possible in this case to push the equivalence before the aggregation?
>
>
> If I'm understanding you correctly, that would lead to wrong results.
> Here's a simple example:
>
> CREATE VIEW v AS SELECT MIN(a) m FROM t;
>
> and table T contains:
>
> T:A
> ---
> 1
> 2
> 3
>
> SELECT * FROM v WHERE m = 2
>
> The minimum value of A is 1, so the query should return no rows.
>
> However, if we filter first we'd be effectively doing the query:
>
> SELECT MIN(a) m FROM
>(SELECT a FROM t WHERE a=2) AS v(a)
>
> The subquery is going to return an intermediate result of:
>
> V:A
> ---
> 2
>
> And the minimum of that is 2, which is the wrong answer.
>

yes, you are right,

thank you for the correction

Regards

Pavel


>
> - Doug
> Salesforce
>


Re: [HACKERS] patch: function xmltable

2016-11-18 Thread Pavel Stehule
2016-11-19 0:42 GMT+01:00 Alvaro Herrera :

> The SQL standard seems to require a comma after the XMLNAMESPACES
> clause:
>
>  ::=
>   XMLTABLE 
>   [   ]
> 
>   [  ]
> COLUMNS  
>
> I don't understand the reason for that, but I have added it:
>
> | XMLTABLE '(' XMLNAMESPACES '(' XmlNamespaceList
> ')' ',' c_expr xmlexists_argument ')'
> {
> TableExpr *n = makeNode(TableExpr);
> n->row_path = $8;
> n->expr = $9;
> n->cols = NIL;
> n->namespaces = $5;
> n->location = @1;
> $$ = (Node *)n;
> }
> | XMLTABLE '(' XMLNAMESPACES '(' XmlNamespaceList
> ')' ',' c_expr xmlexists_argument COLUMNS TableExprColList ')'
> {
> TableExpr *n = makeNode(TableExpr);
> n->row_path = $8;
> n->expr = $9;
> n->cols = $11;
> n->namespaces = $5;
> n->location = @1;
> $$ = (Node *)n;
> }
> ;
>
>
yes, that looks like my oversight - it is better this way



> Another thing I did was remove the TableExprColOptionsOpt production; in
> its place I added a third rule in TableExprCol for "ColId Typename
> IsNotNull" (i.e. no options).  This seems to reduce the size of the
> generated gram.c a good dozen kB.
>

If I remember well - this was required for better compatibility with Oracle

ANSI SQL: colname type DEFAULT PATH
Oracle: colname PATH DEFAULT

My implementation allows both combinations - there are two reasons: 1. one
less issue when people port from Oracle, 2. almost all examples of
XMLTABLE on the net are from Oracle - it would be unfriendly if these
examples did not work on PG - there was a discussion about this issue on
this mailing list


>
>
> I didn't like much the use of c_expr in all these productions.  As I
> understand it, c_expr is mostly an implementation artifact and we should
> be using a_expr or b_expr almost everywhere.  I see that XMLEXISTS
> already expanded the very limited use of c_expr there was; I would
> prefer to fix that one too rather than replicate it here.  TBH I'm not
> sure I like that XMLTABLE is re-using xmlexists_argument.
>

There are two situations: c_expr as document content, and c_expr after the
DEFAULT and PATH keywords. The first can probably be fixed, the second
cannot, because "PATH" is only an unreserved keyword.


>
> Actually, is the existing XMLEXISTS production correct?  What I see in
> the standard is
>
>  ::= 
>
>  ::=
>PASSING  
>  [ {   }... ]
>
>  ::= 
>
>  ::=  [ {
>   }... ]
>
>  ::=
> 
>   | 
>
>  ::=  FOR ORDINALITY
>
>  ::=
>   [  ]
>   [  ]
>   [ PATH  ]
>
>  ::= 
>
> so I think this resolves "PASSING BY {REF,VALUE} ",
> but what
> we have in gram.y is:
>
> /* We allow several variants for SQL and other compatibility. */
> xmlexists_argument:
> PASSING c_expr
> {
> $$ = $2;
> }
> | PASSING c_expr BY REF
> {
> $$ = $2;
> }
> | PASSING BY REF c_expr
> {
> $$ = $4;
> }
> | PASSING BY REF c_expr BY REF
> {
> $$ = $4;
> }
> ;
>
> I'm not sure why we allow "PASSING c_expr" at all.  Maybe if BY VALUE/BY
> REF is not specified, we should just not have PASSING at all?
>
>
If we extended this for XMLEXISTS for compatibility with some other
> product, perhaps we should look into what that product supports for
> XMLTABLE; maybe XMLTABLE does not need all the same options as
> XMLEXISTS.
>
>
The reason is compatibility with other products - DB2. XMLTABLE uses the
same options as XMLEXISTS. These options have zero value for Postgres, but
they are important for compatibility and for workable examples.


> The fourth option seems very dubious to me.  This was added by commit
> 641459f26, submitted here:
> https://www.postgresql.org/message-id/4c0f6dbf.9000...@mlfowler.com
>
> ... Hm, actually some perusal of the XMLEXISTS predicate in the standard
> shows that it's quite a different thing from XMLTABLE.  Maybe we
> shouldn't reuse xmlexists_argument her

Re: [HACKERS] patch: function xmltable

2016-11-18 Thread Pavel Stehule
2016-11-19 5:19 GMT+01:00 Pavel Stehule :

>
>
> 2016-11-19 0:42 GMT+01:00 Alvaro Herrera :
>
>> The SQL standard seems to require a comma after the XMLNAMESPACES
>> clause:
>>
>>  ::=
>>   XMLTABLE 
>>   [   ]
>> 
>>   [  ]
>> COLUMNS  
>>
>> I don't understand the reason for that, but I have added it:
>>
>> | XMLTABLE '(' XMLNAMESPACES '(' XmlNamespaceList
>> ')' ',' c_expr xmlexists_argument ')'
>> {
>> TableExpr *n =
>> makeNode(TableExpr);
>> n->row_path = $8;
>> n->expr = $9;
>> n->cols = NIL;
>> n->namespaces = $5;
>> n->location = @1;
>> $$ = (Node *)n;
>> }
>> | XMLTABLE '(' XMLNAMESPACES '(' XmlNamespaceList
>> ')' ',' c_expr xmlexists_argument COLUMNS TableExprColList ')'
>> {
>> TableExpr *n =
>> makeNode(TableExpr);
>> n->row_path = $8;
>> n->expr = $9;
>> n->cols = $11;
>> n->namespaces = $5;
>> n->location = @1;
>> $$ = (Node *)n;
>> }
>> ;
>>
>>
> yes, looks my oversight - it is better
>
>
>
>> Another thing I did was remove the TableExprColOptionsOpt production; in
>> its place I added a third rule in TableExprCol for "ColId Typename
>> IsNotNull" (i.e. no options).  This seems to reduce the size of the
>> generated gram.c a good dozen kB.
>>
>
> If I remember well - this was required by better compatibility with Oracle
>
> ANSI SQL: colname type DEFAULT PATH
> Oracle: colname PATH DEFAULT
>
> My implementation allows both combinations - there are two reasons: 1. one
> less issue when people does port from Oracle, 2. almost all examples of
> XMLTABLE on a net are from Oracle - it can be unfriendly, when these
> examples would not work on PG - there was discussion about this issue in
> this mailing list
>
>
>>
>>
>> I didn't like much the use of c_expr in all these productions.  As I
>> understand it, c_expr is mostly an implementation artifact and we should
>> be using a_expr or b_expr almost everywhere.  I see that XMLEXISTS
>> already expanded the very limited use of c_expr there was; I would
>> prefer to fix that one too rather than replicate it here.  TBH I'm not
>> sure I like that XMLTABLE is re-using xmlexists_argument.
>>
>
> There are two situations: c_expr as document content, and c_expr after the
> DEFAULT and PATH keywords. The first can probably be fixed, the second
> cannot, because "PATH" is only an unreserved keyword.
>

It is not possible; PASSING is an unreserved keyword too.

Regards

Pavel

>
>
>>
>> Actually, is the existing XMLEXISTS production correct?  What I see in
>> the standard is
>>
>>  ::= 
>>
>>  ::=
>>PASSING  
>>  [ {   }... ]
>>
>>  ::= 
>>
>>  ::=  [ {
>>   }... ]
>>
>>  ::=
>> 
>>   | 
>>
>>  ::=  FOR ORDINALITY
>>
>>  ::=
>>   [  ]
>>   [  ]
>>   [ PATH  ]
>>
>>  ::= 
>>
>> so I think this resolves "PASSING BY {REF,VALUE} ",
>> but what
>> we have in gram.y is:
>>
>> /* We allow several variants for SQL and other compatibility. */
>> xmlexists_argument:
>> PASSING c_expr
>> {
>> $$ = $2;
>> }
>> | PASSING c_expr BY REF
>> {
>> $$ = $2;
>> }
>> | PASSING BY REF c_expr
>> {
>> $$ = $4;
>> }
>> | PASSING BY REF c_expr BY REF
>> {
>> $$ = $4;
>> }
>> ;
>>
>> I'm not sure why we allow "PASSING c_expr" at all.  Maybe if BY VALUE/BY
>> REF is not specified, we should just not have PASSING at all?
>>
>>
> If we extended this for XMLEXISTS for compatibility with some other
>> product, perhaps we should look into what that product supports for
>> XMLTABLE; maybe XMLTABLE does not need all the same options as
>> XMLEXISTS.
>>
>>
> The reason is a compatibility with other products - DB2. XMLTABLE uses
> same options like XMLEXISTS. These options has zero value for Postgres, but
> its are important - compatibility, and workable examples.
>
>
>> The fourth option seems very dubious to me.  This was added by co

Re: [HACKERS] Fix checkpoint skip logic on idle systems by tracking LSN progress

2016-11-18 Thread Michael Paquier
On Fri, Nov 18, 2016 at 7:00 PM, Amit Kapila  wrote:
> Okay, I have done some performance tests with this patch and found that it 
> doesn't have any noticeable impact which is good.  Details of performance 
> tests is below:
> Machine configuration:
> 2 sockets, 28 cores (56 including Hyper-Threading)
> RAM = 64GB
> Data directory is configured on the magnetic disk and WAL on SSD.

Nice spec!

> The conclusion from my tests is that this patch is okay as far as performance 
> is concerned.

Thank you a lot for doing those additional tests!
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fun fact about autovacuum and orphan temp tables

2016-11-18 Thread Michael Paquier
On Fri, Nov 18, 2016 at 1:11 PM, Robert Haas  wrote:
> So now I think that we probably need to make this logic a bit smarter.
> Add all of the OIDs that need to be dropped to a list.  Then have a
> loop prior to the main loop (where it says "Perform operations on
> collected tables.") which iterates over that list and drops those
> tables one by one, starting a transaction every (say) 100 tables or
> after an error.  For bonus points, if a transaction fails, put all of
> the OIDs except the one that provoked the failure back into the list
> of OIDs to be dropped, so that we still make a progress even if some
> DROPs are failing for some reason.

Okay.

> That might sound adding unnecessary work just for the sake of
> paranoia, but I don't think it is.  Failures here won't be common, but
> since they are entirely automated there will be no human intelligence
> available to straighten things out.  Barring considerable caution,
> we'll just enter a flaming death spiral.

To be even more paranoid: an extra way to enter this flaming death
spiral is to not limit the maximum number of failures allowed when
dropping a set of orphaned tables and transactions fail multiple
times. This is basically not a correctness problem, as the relation on
which the drop failed gets removed from the list, but failing on each
one of them is a good way to slow down autovacuum, so putting a limit
of, say, 10 failed transactions is I think really important.

I have played with what you suggested, and finished with the patch
attached. I have run some tests using this function to create some
temp tables with several backends to be sure that multiple backend IDs
are used:

CREATE FUNCTION create_temp_tables(i int) RETURNS void
AS $$
BEGIN
FOR i IN 1..i LOOP
  EXECUTE 'CREATE TEMP TABLE aa' || i || ' (a int);';
END LOOP;
END
$$ LANGUAGE plpgsql;
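
Each test session then just calls it, for example (the count is arbitrary):

SELECT create_temp_tables(100);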

Then I killed the instance. At restart I could see a bunch of temp
tables in pg_class, and I let autovacuum do the cleanup after restart.
I have also tested the error code path in the PG_TRY() block by
manually forcing an elog(ERROR), to be sure that the maximum number of
failures is correctly handled - better safe than sorry.
-- 
Michael


autovacuum-orphan-cleanup-v3.patch
Description: application/download

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [bug fix] Cascading standby cannot catch up and get stuck emitting the same message repeatedly

2016-11-18 Thread Amit Kapila
On Mon, Sep 5, 2016 at 8:42 AM, Tsunakawa, Takayuki
 wrote:
>
> From: pgsql-hackers-ow...@postgresql.org
> > [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Tsunakawa,
> > Our customer hit a problem of cascading replication, and we found the cause.
> > They are using the latest PostgreSQL 9.2.18.  The bug seems to have been
> > fixed in 9.4 and higher during the big modification of xlog.c, but it's
> > not reflected in older releases.
> >
> > The attached patch is for 9.2.18.  This just borrows the idea from 9.4 and
> > higher.
> >
> > But we haven't been able to reproduce the problem.  Could you review the
> > patch and help to test it?  I would very much appreciate it if you could
> > figure out how to reproduce the problem easily.
>
> We could successfully reproduce the problem and confirm that the patch fixes 
> it.  Please use the attached script to reproduce the problem.
>

I have tried using the attached script multiple times on the latest 9.2 code,
but couldn't reproduce the issue.  Please find the log attached with
this mail.  Apart from log file, below prints appear:

WARNING: enabling "trust" authentication for local connections
You can change this by editing pg_hba.conf or using the option -A, or
--auth-local and --auth-host, the next time you run initdb.
20075/20075 kB (100%), 1/1 tablespace
NOTICE:  pg_stop_backup complete, all required WAL segments have been archived
20079/20079 kB (100%), 1/1 tablespace

Let me know if some parameters need to be tweaked to reproduce the issue.


It seems that the proposed patch is good, but it would be better if somebody
other than you could reproduce the issue and verify that the patch fixes
it.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


test.log
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] How to change order sort of table in HashJoin

2016-11-18 Thread Man Trieu
Hi Experts,

As in the example below, I think the plan in which the hash table is created
on testtbl2 (the table with fewer tuples) should be chosen, because building
the hash table on testtbl2 should be faster. But it was not.

I have tried to change the ordering of the tables by tuning parameters, and
even by using pg_hint_plan, but without success.

Why does the planner not choose the plan in which the hash table is created
on testtbl2 (which should take less time)? And how can I change the order?

# I also checked the planner info by rebuilding PostgreSQL, but found no
# related useful info about the hash table

---
postgres=# create table testtbl1(id integer, c1 text, c2 text, c3 text,
primary key (c1,c2,c3));
CREATE TABLE
postgres=# create table testtbl2(id integer, c1 text, c2 text, c3 text,
primary key (c1,c2,c3));
CREATE TABLE
postgres=# insert into testtbl1 select
generate_series(1,100),random()::text,random()::text,random()::text;
INSERT 0 100
postgres=# insert into testtbl2 select * from testtbl1 where id%7 = 0;
INSERT 0 142857

postgres=# explain analyze select * from testtbl1 inner join testtbl2
using(c1,c2,c3);
   QUERY PLAN
-
 Hash Join  (cost=38775.00..47171.72 rows=1 width=59) (actual
time=1120.824..1506.236 rows=142857 loops=1)
   Hash Cond: ((testtbl2.c1 = testtbl1.c1) AND (testtbl2.c2 = testtbl1.c2)
AND (testtbl2.c3 = testtbl1.c3))
   ->  Seq Scan on testtbl2  (cost=0.00..3039.57 rows=142857 width=56)
(actual time=0.008..27.964 rows=142857 loops=1)
   ->  Hash  (cost=21275.00..21275.00 rows=100 width=55) (actual
time=1120.687..1120.687 rows=100 loops=1)
 Buckets: 131072  Batches: 1  Memory Usage: 89713kB
 ->  Seq Scan on testtbl1  (cost=0.00..21275.00 rows=100
width=55) (actual time=0.035..458.522 rows=100 loops=1)
 Planning time: 0.922 ms
 Execution time: 1521.258 ms
(8 rows)

postgres=# set pg_hint_plan.enable_hint to on;
SET
postgres=# /*+
postgres*# HashJoin(testtbl1 testtbl2)
postgres*# Leading(testtbl1 testtbl2)
postgres*# */
postgres-# explain analyze select * from testtbl1 inner join testtbl2
using(c1,c2,c3);
   QUERY PLAN
-
 Hash Join  (cost=48541.00..67352.86 rows=1 width=59) (actual
time=1220.625..1799.709 rows=142857 loops=1)
   Hash Cond: ((testtbl2.c1 = testtbl1.c1) AND (testtbl2.c2 = testtbl1.c2)
AND (testtbl2.c3 = testtbl1.c3))
   ->  Seq Scan on testtbl2  (cost=0.00..3039.57 rows=142857 width=56)
(actual time=0.011..58.649 rows=142857 loops=1)
   ->  Hash  (cost=21275.00..21275.00 rows=100 width=55) (actual
time=1219.295..1219.295 rows=100 loops=1)
 Buckets: 8192  Batches: 32  Memory Usage: 2851kB
 ->  Seq Scan on testtbl1  (cost=0.00..21275.00 rows=100
width=55) (actual time=0.021..397.583 rows=100 loops=1)
 Planning time: 3.971 ms
 Execution time: 1807.710 ms
(8 rows)

postgres=#
---


Thanks and best regards!