Re: [HACKERS] Flexible configuration for full-text search

2017-11-06 Thread Aleksandr Parfenov
On Mon, 6 Nov 2017 18:05:23 +1300
Thomas Munro wrote:

> On Sat, Oct 21, 2017 at 1:39 AM, Aleksandr Parfenov wrote:
> > In attachment updated patch with fixes of empty XML tags in
> > documentation.  
> 
> Hi Aleksandr,
> 
> I'm not sure if this is expected at this stage, but just in case you
> aren't aware, with this version of the patch the binary upgrade test
> in
> src/bin/pg_dump/t/002_pg_dump.pl fails for me:
> 
> #   Failed test 'binary_upgrade: dumps ALTER TEXT SEARCH CONFIGURATION
> dump_test.alt_ts_conf1 ...'
> #   at t/002_pg_dump.pl line 6715.
> 

Hi Thomas,

Thank you for noticing it. I will investigate it while working on the next
version of the patch.

-- 
Aleksandr Parfenov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company




Re: [HACKERS] UPDATE of partition key

2017-11-06 Thread Amit Langote
On 2017/11/07 14:40, Amit Khandekar wrote:
> On 7 November 2017 at 00:33, Robert Haas wrote:
> 
>> Also, +1 for Amit Langote's idea of trying to merge
>> mt_perleaf_childparent_maps with mt_persubplan_childparent_maps.
> 
> Currently I am trying to see if doing that simplifies things. We will
> be merging these arrays into one, but we are adding a new int[] array
> that maps subplans to leaf partitions. I will get back with how it
> finally looks.

One thing to note is that the int[] array I mentioned will be much faster
to compute than calling convert_tuples_by_name() to build the additional
maps array.
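
To illustrate the idea (a hedged sketch with invented names, not code
from the patch): since the per-subplan result rels and the leaf
partitions follow the same canonical order, the map can be filled in a
single forward pass, with no tuple-conversion work at all.

	int		   *subplan_leaf_map = (int *) palloc(num_subplans * sizeof(int));
	int			leaf = 0;
	int			i;

	for (i = 0; i < num_subplans; i++)
	{
		/* Advance to the leaf partition backing this subplan. */
		while (leaf_part_oids[leaf] != subplan_rel_oids[i])
			leaf++;
		subplan_leaf_map[i] = leaf++;
	}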

Thanks,
Amit





Re: [HACKERS] UPDATE of partition key

2017-11-06 Thread Amit Khandekar
On 7 November 2017 at 00:33, Robert Haas wrote:

> Also, +1 for Amit Langote's idea of trying to merge
> mt_perleaf_childparent_maps with mt_persubplan_childparent_maps.

Currently I am trying to see if doing that simplifies things. We will
be merging these arrays into one, but we are adding a new int[] array
that maps subplans to leaf partitions. I will get back with how it
finally looks.

Robert, Amit, I will get back to your other review comments.

-- 
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company




Re: [HACKERS] Planning counters in pg_stat_statements

2017-11-06 Thread Tsunakawa, Takayuki
From: pgsql-hackers-ow...@postgresql.org
> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Thomas Munro
> I have often wanted $SUBJECT and was happy to find that Fujii-san had posted
> a patch five years ago[1].  The reception then seemed positive.
> So here is a refurbished and (hopefully) improved version of his patch with
> a new column for the replan count.  Thoughts?

That's a timely proposal.  I have sometimes faced performance problems where the 
time pg_stat_statements shows is much shorter than what the application perceives.  
The latest experience was that the execution time of a transaction, which 
consists of dozens of DMLs and COMMIT, was about 200ms from the application's 
perspective, while pg_stat_statements showed only about 10ms in total.  The 
network should not be the cause because the application ran on the same host as 
the database server.  I wanted to know how long the parsing and planning time 
was.

BTW, the current pg_stat_statements shows unexpected time for COMMIT.  I expect 
it to include the whole COMMIT processing, including the long WAL flush and 
sync rep wait.  However, it only shows the time for the transaction state 
change in memory.

Regards
Takayuki Tsunakawa





[HACKERS] Planning counters in pg_stat_statements

2017-11-06 Thread Thomas Munro
Hi hackers,

I have often wanted $SUBJECT and was happy to find that Fujii-san had
posted a patch five years ago[1].  The reception then seemed positive.
So here is a refurbished and (hopefully) improved version of his patch
with a new column for the replan count.  Thoughts?

Example output:

 query                          | plans | plan_time | calls | total_time
--------------------------------+-------+-----------+-------+------------
 prepare x as select $1         |     1 |     0.026 |    12 |       0.06
 select substr(query, $1, $2),  |    11 |     1.427 |    11 |      3.565
 prepare y as select * from foo |     2 |     7.336 |     5 |      0.331

I agree with the sentiment on the old thread that
{total,min,max,mean,stddev}_time now seem badly named, but adding
"execution" makes them so long...  Thoughts?

[1] 
https://www.postgresql.org/message-id/CAHGQGwFx_%3DDO-Gu-MfPW3VQ4qC7TfVdH2zHmvZfrGv6fQ3D-Tw%40mail.gmail.com

-- 
Thomas Munro
http://www.enterprisedb.com


pg-stat-statements-planning-v1.patch
Description: Binary data



Re: [HACKERS] LDAP URI decoding bugs

2017-11-06 Thread Michael Paquier
On Fri, Nov 3, 2017 at 12:57 PM, Thomas Munro wrote:
> 1.  If you set up a pg_hba.conf with a URL that lacks a base DN or
> hostname, hba.c will segfault on startup when it tries to pstrdup a
> null pointer.  Examples: ldapurl="ldap://localhost" and
> ldapurl="ldap://".
>
> 2.  If we fail to bind but have no binddn configured, we'll pass NULL
> to ereport (snprintf?) for %s, which segfaults on some libc
> implementations.  That crash requires more effort to reproduce but you
> can see pretty clearly a few lines above in auth.c that it can be
> NULL.  (I'm surprised Coverity didn't complain about that.  Maybe it
> can't see this code due to macros.)

Good question. Indeed Coverity did not complain here, perhaps because
the compiled build is not using openldap?

> Please see attached.

Oops. So...

-hbaline->ldapserver = pstrdup(urldata->lud_host);
+if (urldata->lud_host)
+hbaline->ldapserver = pstrdup(urldata->lud_host);
This prevents the backend from blowing up on ldap://.

-   hbaline->ldapbasedn = pstrdup(urldata->lud_dn);
+   if (urldata->lud_dn)
+   hbaline->ldapbasedn = pstrdup(urldata->lud_dn);
And this prevents the crash on ldap://localhost.

-port->hba->ldapbinddn, port->hba->ldapserver,
+port->hba->ldapbinddn ? port->hba->ldapbinddn : "",
+port->hba->ldapserver,
ldapserver should never be NULL thanks to the check on
MANDATORY_AUTH_ARG in parse_hba_line(); still, I would tend to be
paranoid and do the same check as for ldapbinddn. That feels safer
long-term.
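
For readers wondering why the "" substitution matters: some libc
implementations (glibc among them) print "(null)" for a NULL %s
argument, but that is a non-standard courtesy and others crash. A
minimal standalone illustration (plain C, not PostgreSQL code):

#include <stdio.h>

int
main(void)
{
	const char *binddn = NULL;	/* e.g. no binddn configured */

	/* Passing NULL for %s is undefined behavior; some libcs segfault. */
	printf("binddn: %s\n", binddn ? binddn : "");	/* safe everywhere */
	return 0;
}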

Please note that I have also added an entry in the next CF to keep
that bug from falling into oblivion:
https://commitfest.postgresql.org/16/1372/
-- 
Michael




[HACKERS] Fix a typo in dsm_impl.c

2017-11-06 Thread Masahiko Sawada
Hi,

Attached the patch for $subject.

s/reamin/remain/

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


fix_typo_in_dsm_impl_c.patch
Description: Binary data



Re: [HACKERS] proposal - pg_dump: flag to suppress output of SQL comments

2017-11-06 Thread Tom Lane
Malcolm Locke writes:
> Would a patch to add a flag to pg_dump to suppress the output of SQL
> comments be likely to be accepted?

Not unless you can come up with a better rationale than this:

> The SQL generated by pg_dump seems to be fairly constant between
> Postgres versions, however the structure of the SQL comments in the
> dumps changes quite frequently between Postgres versions.  This creates
> a lot of churn on these structure files, unrelated to actual changes in
> the database structure, in our VCS when developers are using different
> versions of Postgres.

That seems like complete nonsense; we don't change the comments more
frequently than other aspects of pg_dump's output, in fact probably
much less often.

Just to make sure I'm not totally mistaken about this, I diffed the
results from pg_dump 9.2 through HEAD dumping the same 9.2 database.
I do see a couple of rounds of comment changes, but there are also two
different rounds of changes in dump order, a round of changes in layout
of view/rule reverse-compilations, a round of changes in the
schema-qualification of ALTER TYPE OWNER commands, multiple changes in
the dump header (particularly the collection of SET commands there), and
assorted changes in minor details of syntax.  And the particular test
database I was using (the 9.2 regression database) doesn't even trigger
some other changes that have been made, such as how to break circular
dependencies involving views.  We're not going to introduce
backwards-compatibility options for that sort of stuff (I hope), so
I do not think that a switch of this sort is really going to produce
the end result you're wishing for.

You might be able to standardize things a bit better if you could get
all your developers to use the same late-model version of pg_dump
while producing output to go into the VCS.  That won't be a 100%
solution, because some of the version-specific output is generated
on the backend side, but I bet it would improve matters a lot.

regards, tom lane




Re: [HACKERS] [PATCH] A hook for session start

2017-11-06 Thread Michael Paquier
On Sun, Nov 5, 2017 at 3:14 AM, Fabrízio de Royes Mello wrote:
> On Sat, Nov 4, 2017 at 1:23 AM, Michael Paquier wrote:
>> On Fri, Nov 3, 2017 at 1:55 PM, Fabrízio de Royes Mello wrote:
>> >> Passing the database name and user name does not look much useful to
>> >> me. You can have access to this data already with CurrentUserId and
>> >> MyDatabaseId.
>> >
>> > This way we don't need to convert oid to names inside hook code.
>>
>> Well, arguments of hooks are useful if they are used. Now if I look at
>> the two examples mainly proposed in this thread, be it in your set of
>> patches or the possibility to do some SQL transaction, I see nothing
>> using them. So I'd vote for keeping an interface minimal.
>>
>
> Maybe the attached patch with an improved test module can illustrate the
> feature better.

I was going to hack something like that. That's interesting for the
use case Robert has mentioned.

Well, in the case of the end session hook, those variables are passed
to the hook by being taken directly from the context in MyProcPort:
+   (*session_end_hook) (MyProcPort->database_name, MyProcPort->user_name);
In the case of the start hook, those are taken directly from the outer
command caller, but similarly MyProcPort is already set, so those
strings are available (your patch does so in the end session hook)...
Arguments to hooks are useful if the values are not available within
the memory context and refer to a state specific to where the hook is
called. For example, take the password hook: it uses the user name and
the password because those values are not available within the session
context. The same stands for other hooks as well. Keeping the
interface minimal helps readability and maintenance. See the attached
example, which can be applied on top of 0003 and makes use of the
session context; the regression tests do not pass with it, but it
shows how I think those hooks had better be shaped.

+   (*session_end_hook) (MyProcPort->database_name, MyProcPort->user_name);
+
+   /* Make sure we don't leave any active transactions and/or locks behind */
+   AbortOutOfAnyTransaction();
+   LockReleaseAll(USER_LOCKMETHOD, true);
Let's leave this work to people actually implementing the hook contents.
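
To make the minimal interface concrete, a hook implementation would
look roughly like this (hook type and variable names are assumed from
the patch set under discussion, not committed code); everything it
needs is already reachable through MyProcPort:

static session_start_hook_type prev_session_start_hook = NULL;

static void
my_session_start_hook(void)
{
	if (prev_session_start_hook)
		prev_session_start_hook();

	/* Session context is already set up; no arguments are needed. */
	elog(LOG, "session start: database=%s user=%s",
		 MyProcPort->database_name, MyProcPort->user_name);
}

void
_PG_init(void)
{
	prev_session_start_hook = session_start_hook;
	session_start_hook = my_session_start_hook;
}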
-- 
Michael


session_hook_simplify.patch
Description: Binary data



Re: [HACKERS] Parallel Plans and Cost of non-filter functions

2017-11-06 Thread Amit Kapila
On Mon, Nov 6, 2017 at 7:40 PM, Paul Ramsey wrote:
> From my perspective, this is much much better. For sufficiently large
> tables, I get parallel behaviour without jimmying with the defaults on
> parallel_setup_cost and parallel_tuple_cost. *And*, the parallel behaviour
> *is* sensitive to the costs of functions in target lists, so reasonably
> chosen costs will flip us into a parallel mode for expensive functions
> against smaller tables too.
>

Thanks for the confirmation.

> Hopefully some variant of this finds its way into core! Is there any way I
> can productively help?

You have already helped a lot by providing the use case, but feel free
to ping on that thread if you find it is not moving.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] why not parallel seq scan for slow functions

2017-11-06 Thread Amit Kapila
On Mon, Nov 6, 2017 at 7:05 PM, Robert Haas wrote:
> On Mon, Nov 6, 2017 at 11:20 AM, Amit Kapila wrote:
>> On Mon, Nov 6, 2017 at 3:51 AM, Robert Haas wrote:
>>> This looks like it's on the right track to me.  I hope Tom will look
>>> into it, but if he doesn't I may try to get it committed myself.
>>>
>>> -if (rel->reloptkind == RELOPT_BASEREL)
>>> -generate_gather_paths(root, rel);
>>> +if (rel->reloptkind == RELOPT_BASEREL &&
>>> +root->simple_rel_array_size > 2 &&
>>> +!root->append_rel_list)
>>>
>>> This test doesn't look correct to me.  Actually, it doesn't look
>>> anywhere close to correct to me.  So, one of us is very confused...
>>> not sure whether it's you or me.
>>>
>> It is quite possible that I haven't got it right, but it shouldn't be
>> completely bogus, as it passes the regression tests and some manual
>> verification.  Can you explain what your concern about this test is?
>
> Well, I suppose that test will fire for a baserel when the total
> number of baserels is at least 3 and there's no inheritance involved.
> But if there are 2 baserels, we're still not the topmost scan/join
> target.
>

No, if there are 2 baserels, then simple_rel_array_size will be 3.
The simple_rel_array_size is always the "number of relations" plus
"one".  See setup_simple_rel_arrays.

>  Also, even if inheritance is used, we might still be the
> topmost scan/join target.
>

Sure, but in that case, it won't generate the gather path here (due to
this part of check "!root->append_rel_list").  I am not sure whether I
have understood the second part of your question, so if my answer
appears inadequate, then can you provide more details on what you are
concerned about?


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [Sender Address Forgery]Re: [HACKERS] path toward faster partition pruning

2017-11-06 Thread Amit Langote
On 2017/11/06 21:52, David Rowley wrote:
> On 6 November 2017 at 23:01, Amit Langote wrote:
>> OK, I have gotten rid of the min/max partition index interface and instead
>> adopted the bms_add_range() approach by including your patch to add the
>> same in the patch set (which is now 0002 in the whole set).  I have to
>> admit that it's simpler to understand the new code with just Bitmapsets to
>> look at, but I'm still a bit concerned about materializing the whole set
>> right within partition.c, although we can perhaps optimize it later.
> 
> Thanks for making that change. The code looks much more simple now.
> 
> For performance, if you're worried about a very large number of
> partitions, then I think you're better off using bms_next_member()
> rather than bms_first_member() (likely this applies globally, but you
> don't need to worry about that here).
> 
> The problem with bms_first_member is that, on each call, it must always
> loop over the zeroed leading words before it finds any set bits, whereas
> bms_next_member will start on the word it was last called for. There
> will likely be a pretty big performance difference between the two
> when processing a large Bitmapset.

Ah, thanks for the explanation.  I will change it to bms_next_member() in
the next version.
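
To illustrate the difference, the two scan styles look like this (x is
an int; do_something() is a placeholder):

	/* bms_first_member: destructive, and rescans the leading zeroed
	 * words of the set on every call */
	while ((x = bms_first_member(tmpset)) >= 0)
		do_something(x);

	/* bms_next_member: non-destructive, resumes from the word of the
	 * previous match */
	x = -1;
	while ((x = bms_next_member(partindexes, x)) >= 0)
		do_something(x);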

>> Attached updated set of patches, including the fix to make the new pruning
>> code handle Boolean partitioning.
> 
> Thanks. I'll look over it all again starting my Tuesday morning. (UTC+13)

Thank you.

Regards,
Amit





Re: [HACKERS] [PATCH] Assert that the correct locks are held when calling PageGetLSN()

2017-11-06 Thread Michael Paquier
On Tue, Nov 7, 2017 at 7:27 AM, Asim Praveen wrote:
> On Mon, Oct 2, 2017 at 6:48 PM, Michael Paquier wrote:
>> Jacob, here are some ideas to make this thread move on. I would
>> suggest to produce a set of patches that do things incrementally:
>> 1) One patch that changes the calls of PageGetLSN to
>> BufferGetLSNAtomic which are now not appropriate. You have spotted
>> some of them in the btree and gist code, but not all based on my first
>> lookup. There is still one in gistFindCorrectParent and one in btree
>> _bt_split. The monitoring of the other calls (sequence.c and
>> vacuumlazy.c) looked safe. There is another one in XLogRecordAssemble
>> that should be fixed I think.
>
> Thank you for your suggestions.  Please find the first patch attached as
> "0001-...".  We verified both, gistFindCorrectParent and _bt_split and all
> calls to PageGetLSN are made with exclusive lock on the buffer contents
> held.

Cool. Thanks for double-checking. XLogRecordAssemble() is fine after
a closer look at this code; XLogRegisterBuffer() already does some
sanity checks.

>> 2) A second patch that strengthens a bit checks around
>> BufferGetLSNAtomic. One idea would be to use LWLockHeldByMe, as you
>> are originally suggesting.
>> A comment could be as well added in bufpage.h for PageGetLSN to let
>> users know that it should be used carefully, in the vein of what is
>> mentioned in src/backend/access/transam/README.
>
> The second patch "0002-..." does the above.  We have a comment added to
> AssertPageIsLockedForLSN as suggested.

Did you really test WAL replay? This still ignores that PageGetLSN is
also used in some code paths, like recovery, where actions on the page
are guaranteed to be serialized, so this patch would cause the system
to blow up. Note that pageinspect, amcheck and
wal_consistency_checking also operate on page copies, so the assertion
failure of 0002 would trigger in those cases.

> The assertion added caught at least one code path where TestForOldSnapshot
> calls PageGetLSN without holding any lock.  The snapshot_too_old test in
> "check-world" failed due to the assertion failure.  This needs to be fixed,
> see the open question in the opening mail on this thread.

Good point. I am looping Kevin Grittner here for his input, as the
author and committer of old_snapshot_threshold. Things can be
addressed with a separate patch by roughly moving the check on
PageGetLSN() to TestForOldSnapshot_impl() and using
BufferGetLSNAtomic() instead.

The commitfest has lost track of this entry, and so did I. So I have
added a new entry:
https://commitfest.postgresql.org/16/1371/

BufferGetLSNAtomic() could really use LWLockHeldByMe(). Could you
consider it with an extra patch on top of 0001?
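
To make that concrete, a sketch of the shape such a check could take
(ignoring local buffers and page copies; the function name is invented
here, not from the patch):

static XLogRecPtr
BufferGetLSNChecked(Buffer buffer)
{
	BufferDesc *buf_hdr = GetBufferDescriptor(buffer - 1);

	/* Caller must hold the buffer's content lock to read a stable LSN. */
	Assert(BufferIsValid(buffer));
	Assert(LWLockHeldByMe(BufferDescriptorGetContentLock(buf_hdr)));

	return PageGetLSN(BufferGetPage(buffer));
}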

It seems to me that 0001 is good for a committer's lookup, and will get
rid of all existing bugs. For 0002, what you are proposing is still
not a good idea for anything using page copies. Here are some
suggestions:
- Implement a PageGetLSNFromCopy, dedicated at working correctly when
working on a page copy. Then switch callers of amcheck, pageinspect
and wal_consistency_checking to use that.
- Implement something like GetLSNFromLockedPage, and switch the
backend's PageGetLSN calls to that. A performance impact could be seen...
- Have a PageGetLSNSafe, which can be used safely for serialized actions.
It could be an idea to remove PageGetLSN to force a breakage of
extensions calling it, so that their authors would review any of its
calls. Not a fan of that, though.
-- 
Michael




Re: [HACKERS] path toward faster partition pruning

2017-11-06 Thread David Rowley
On 7 November 2017 at 01:52, David Rowley wrote:
> Thanks. I'll look over it all again starting my Tuesday morning. (UTC+13)

I have a little more review to share:

1. Missing "in" in comment. Should be "mentioned in"

 * get_append_rel_partitions
 * Return the list of partitions of rel that pass the clauses mentioned
 * rel->baserestrictinfo

2. The variable should be declared in the inner scope in the following fragment:

void
set_basic_child_rel_properties(PlannerInfo *root,
                               RelOptInfo *rel,
                               RelOptInfo *childrel,
                               AppendRelInfo *appinfo)
{
AttrNumber attno;

if (rel->part_scheme)
{

which makes the code the same as where you moved it from.

3. Normally lfirst() is assigned to a variable at the start of a
foreach() loop. You have code which does not follow this.

foreach(lc, clauses)
{
    Expr       *clause;
    int         i;

    if (IsA(lfirst(lc), RestrictInfo))
    {
        RestrictInfo *rinfo = lfirst(lc);

You could assign this to a Node * since the type is unknown to you at
the start of the loop.

4.
/*
 * Useless if what we're thinking of as a constant is actually
 * a Var coming from this relation.
 */
if (bms_is_member(rel->relid, constrelids))
    continue;

should this be moved to just above the op_strict() test? This one seems cheaper.

5. Typo "paritions": /* No clauses to prune paritions, so scan all
partitions. */

But thinking about it more, the comment should say something more
along the lines of /* No useful clauses for partition pruning. Scan
all partitions. */

The key difference is that there might be clauses, just without Consts.

Actually, the more I look at get_append_rel_partitions(), the more I
think it would be better if you re-shaped that if/else if test so that
it only performs the loop over the partindexes if it has been set.

I ended up with the attached version of the function after moving
things around a little bit.

I'm still reviewing but thought I'd share this part so far.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
/*
 * get_append_rel_partitions
 *		Return the list of partitions of rel that pass the clauses mentioned
 *		in rel->baserestrictinfo. An empty list is returned if no matching
 *		partitions were found.
 *
 * Returned list contains the AppendRelInfos of chosen partitions.
 */
static List *
get_append_rel_partitions(PlannerInfo *root,
		  RelOptInfo *rel,
		  RangeTblEntry *rte)
{
	List   *partclauses;
	bool	contains_const,
			constfalse;

	/*
	 * Get the clauses that match the partition key, including information
	 * about any nullness tests against partition keys.  Set keynullness to
	 * an invalid value of NullTestType, which 0 is not.
	 */
	partclauses = match_clauses_to_partkey(rel,
										   list_copy(rel->baserestrictinfo),
										   &contains_const,
										   &constfalse);

	if (!constfalse)
	{
		Relation		parent = heap_open(rte->relid, NoLock);
		PartitionDesc	partdesc = RelationGetPartitionDesc(parent);
		Bitmapset	   *partindexes;
		List		   *result = NIL;
		int				i;

		/*
		 * If we have matched clauses that contain at least one constant
		 * operand, then use these to prune partitions.
		 */
		if (partclauses != NIL && contains_const)
			partindexes = get_partitions_from_clauses(parent, partclauses);

		/*
		 * Else there are no clauses that are useful to prune any partitions,
		 * so we must scan all partitions.
		 */
		else
			partindexes = bms_add_range(NULL, 0, partdesc->nparts - 1);

		/* Fetch the partition appinfos. */
		i = -1;
		while ((i = bms_next_member(partindexes, i)) >= 0)
		{
			AppendRelInfo *appinfo = rel->part_appinfos[i];

#ifdef USE_ASSERT_CHECKING
			RangeTblEntry *rte = planner_rt_fetch(appinfo->child_relid, root);

			/*
			 * Must be the intended child's RTE here, because appinfos are ordered
			 * the same way as partitions in the partition descriptor.
			 */
			Assert(partdesc->oids[i] == rte->relid);
#endif

			result = lappend(result, appinfo);
		}

		/* Record which partitions must be scanned. */
		rel->live_part_appinfos = result;

		heap_close(parent, NoLock);

		return result;
	}

	return NIL;
}



[HACKERS] proposal - pg_dump: flag to suppress output of SQL comments

2017-11-06 Thread Malcolm Locke
Hello Hackers,

Would a patch to add a flag to pg_dump to suppress the output of SQL
comments be likely to be accepted?


So for example `pg_dump`:

  --
  -- Name: foos; Type: TABLE; Schema: public; Owner: -
  --

  CREATE TABLE foos (
  ...


With `pg_dump --no-sql-comments` this would become:

  CREATE TABLE foos (
  ...


The rationale behind this is that we use schema dumps committed to VCS
to ensure database structures are synchronised among teams of
developers.

The SQL generated by pg_dump seems to be fairly constant between
Postgres versions, however the structure of the SQL comments in the
dumps changes quite frequently between Postgres versions.  This creates
a lot of churn on these structure files, unrelated to actual changes in
the database structure, in our VCS when developers are using different
versions of Postgres.  Note this is all via Ruby on Rails so we are not
the only users affected.

We could strip comments after the dump has been generated but this is
not ideal as without parsing the dump file we can't know if a line
beginning with -- is a comment or a string literal.

I'm happy to have a crack at a patch to pg_dump if it would be likely to
be accepted.

Cheers,

Malc




Re: [HACKERS] Early locking option to parallel backup

2017-11-06 Thread Lucas B

On 05/11/2017 21:09, Andres Freund wrote:

On 2017-11-05 17:38:39 -0500, Robert Haas wrote:

On Sun, Nov 5, 2017 at 5:17 AM, Lucas wrote:

The patch creates a "--lock-early" option which will make pg_dump issue
shared locks on all tables on the backup TOC at each parallel worker start.
That way, the backup has a very small chance of failing. When it does, it
happens in the first few seconds of the backup job. My backup scripts (not
included here) are aware of that and retry the backup in case of failure.


I wonder why we don't do this already ... and by default.


Well, the current approach afaics requires #relations * 2 locks, whereas
acquiring them in every worker would scale that with the number of
workers.  


Yes, that is why I proposed it as an option. As an option, it will not
affect anyone who does not want to use it.



IIUC the problem here is that even though a lock is already
held by the main backend an independent locker's request will prevent
the on-demand lock by the dump worker from being granted.  It seems to
me the correct fix here would be to somehow avoid the fairness logic in
the parallel dump case - although I don't quite know how to best do so.


It seems natural to think of several connections in a synchronized
snapshot as the same connection. Then it may be reasonable to grant a
shared lock out of turn if any connection of the same shared snapshot
already has a granted lock on the same relation. Last year Tom mentioned
that there is already queue-jumping logic of that sort in the lock
manager for other purposes. Although it seems conceptually simple, I
suspect the implementation is not.


On the other hand, the lock-early option is very simple and has no 
impact on anyone that does not want to use it.


---
Lucas







Re: [HACKERS] Small improvement to compactify_tuples

2017-11-06 Thread Юрий Соколов
2017-11-07 1:14 GMT+03:00 Claudio Freire:
>
> On Mon, Nov 6, 2017 at 6:58 PM, Юрий Соколов wrote:
> >
> > 2017-11-06 17:55 GMT+03:00 Claudio Freire:
> >>
> >> On Mon, Nov 6, 2017 at 11:50 AM, Юрий Соколов wrote:
> >> >> Maybe leave a fallback to qsort if some corner case produces big
> >> >> buckets?
> >> >
> >> > For 8kb pages, each bucket is per 32 bytes. So, for heap pages it is
> >> > at most 1 heap-tuple per bucket, and for index pages it is at most
> >> > 2 index tuples per bucket. For 32kb pages it is 4 heap-tuples and
> >> > 8 index-tuples per bucket.
> >> > It will be unnecessary overhead to call non-inlineable qsort in these
> >> > cases.
> >> >
> >> > So, I think, shell sort could be removed, but insertion sort has to
> >> > remain.
> >> >
> >> > I'd prefer shell sort to remain also. It could be useful in other
> >> > places too, because it is easily inlinable, and provides performance
> >> > comparable to qsort up to several hundreds of elements.
> >>
> >> I'd rather have an inlineable qsort.
> >
> > But qsort is recursive. It is quite hard to make it inlineable. And
> > still it will be much heavier than insertion sort (btw, all qsort
> > implementations use insertion sort for small arrays). And it will be
> > heavier than shell sort for small arrays.
>
> I haven't seen this trick used in postgres, nor do I know whether it
> would be well received, so this is more like throwing an idea to see
> if it sticks...
>
> But a way to do this without macros is to have an includable
> "template" algorithm that simply doesn't define the comparison
> function/type, it rather assumes it:
>
> qsort_template.h
>
> #define QSORT_NAME qsort_ ## QSORT_SUFFIX
>
> static void QSORT_NAME(ELEM_TYPE *arr, size_t num_elems)
> {
> ... if (ELEM_LESS(arr[a], arr[b]))
> ...
> }
>
> #undef QSORT_NAME
>
> Then, in "offset_qsort.h":
>
> #define QSORT_SUFFIX offset
> #define ELEM_TYPE offset
> #define ELEM_LESS(a,b) ((a) < (b))
>
> #include "qsort_template.h"
>
> #undef QSORT_SUFFIX
> #undef ELEM_TYPE
> #undef ELEM_LESS
>
> Now, I realize this may have its cons, but it does simplify
> maintenance of type-specific or parameterized variants of
> performance-critical functions.
>
> > I can do a specialized qsort for this case. But it will be a larger
> > bunch of code than shell sort.
> >
> >> And I'd recommend doing that when there is a need, and I don't think
> >> this patch really needs it, since bucket sort handles most cases
> >> anyway.
> >
> > And it still needs insertion sort for buckets.
> > I can agree to get rid of shell sort. But insertion sort is necessary.
>
> I didn't suggest getting rid of insertion sort. But the trick above is
> equally applicable to insertion sort.

This trick is used in simplehash.h. I agree, it could be useful for qsort.
It will not make qsort inlineable, but it will reduce the overhead a lot.

This trick is too heavy-weight for insertion sort alone, though. Without
shell sort, insertion sort could be expressed in a 14-line macro (8 lines
without curly braces). But if insertion sort is defined together with
qsort (because qsort still needs it), then it is justifiable.
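
For the record, such a macro looks roughly like this (a sketch in the
spirit of the template trick quoted above, not the patch itself):

/* In-place insertion sort over arr[0..n-1], ordered by LESS. */
#define INSERTION_SORT(TYPE, arr, n, LESS) \
	do { \
		size_t	i_, j_; \
		for (i_ = 1; i_ < (size_t) (n); i_++) \
		{ \
			TYPE	tmp_ = (arr)[i_]; \
			for (j_ = i_; j_ > 0 && LESS(tmp_, (arr)[j_ - 1]); j_--) \
				(arr)[j_] = (arr)[j_ - 1]; \
			(arr)[j_] = tmp_; \
		} \
	} while (0)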


Re: [HACKERS] [PATCH] Assert that the correct locks are held when calling PageGetLSN()

2017-11-06 Thread Asim Praveen
Hi Michael
On Mon, Oct 2, 2017 at 6:48 PM, Michael Paquier wrote:
>
> Jacob, here are some ideas to make this thread move on. I would
> suggest to produce a set of patches that do things incrementally:
> 1) One patch that changes the calls of PageGetLSN to
> BufferGetLSNAtomic which are now not appropriate. You have spotted
> some of them in the btree and gist code, but not all based on my first
> lookup. There is still one in gistFindCorrectParent and one in btree
> _bt_split. The monitoring of the other calls (sequence.c and
> vacuumlazy.c) looked safe. There is another one in XLogRecordAssemble
> that should be fixed I think.

Thank you for your suggestions.  Please find the first patch attached as
"0001-...".  We verified both, gistFindCorrectParent and _bt_split and all
calls to PageGetLSN are made with exclusive lock on the buffer contents
held.

> 2) A second patch that strengthens a bit checks around
> BufferGetLSNAtomic. One idea would be to use LWLockHeldByMe, as you
> are originally suggesting.
> A comment could be as well added in bufpage.h for PageGetLSN to let
> users know that it should be used carefully, in the vein of what is
> mentioned in src/backend/access/transam/README.


The second patch "0002-..." does the above.  We have a comment added to
AssertPageIsLockedForLSN as suggested.

The assertion added caught at least one code path where TestForOldSnapshot
calls PageGetLSN without holding any lock.  The snapshot_too_old test in
"check-world" failed due to the assertion failure.  This needs to be fixed,
see the open question in the opening mail on this thread.

Asim and Jacob


0001-Change-incorrect-calls-to-PageGetLSN-to-BufferGetLSN.patch
Description: Binary data


0002-PageGetLSN-assert-that-locks-are-properly-held.patch
Description: Binary data



Re: [HACKERS] Small improvement to compactify_tuples

2017-11-06 Thread Claudio Freire
On Mon, Nov 6, 2017 at 6:58 PM, Юрий Соколов wrote:
>
> 2017-11-06 17:55 GMT+03:00 Claudio Freire:
>>
>> On Mon, Nov 6, 2017 at 11:50 AM, Юрий Соколов wrote:
>> >> Maybe leave a fallback to qsort if some corner case produces big
>> >> buckets?
>> >
>> > For 8kb pages, each bucket is per 32 bytes. So, for heap pages it is at
>> > most 1 heap-tuple per bucket, and for index pages it is at most 2 index
>> > tuples per bucket. For 32kb pages it is 4 heap-tuples and 8 index-tuples
>> > per bucket.
>> > It will be unnecessary overhead to call non-inlineable qsort in these
>> > cases.
>> >
>> > So, I think, shell sort could be removed, but insertion sort has to
>> > remain.
>> >
>> > I'd prefer shell sort to remain also. It could be useful in other
>> > places too, because it is easily inlinable, and provides performance
>> > comparable to qsort up to several hundreds of elements.
>>
>> I'd rather have an inlineable qsort.
>
> But qsort is recursive. It is quite hard to make it inlineable. And
> still it will be much heavier than insertion sort (btw, all qsort
> implementations use insertion sort for small arrays). And it will be
> heavier than shell sort for small arrays.

I haven't seen this trick used in postgres, nor do I know whether it
would be well received, so this is more like throwing an idea to see
if it sticks...

But a way to do this without macros is to have an includable
"template" algorithm that simply doesn't define the comparison
function/type, it rather assumes it:

qsort_template.h

#define QSORT_NAME qsort_ ## QSORT_SUFFIX

static void QSORT_NAME(ELEM_TYPE *arr, size_t num_elems)
{
... if (ELEM_LESS(arr[a], arr[b]))
...
}

#undef QSORT_NAME

Then, in "offset_qsort.h":

#define QSORT_SUFFIX offset
#define ELEM_TYPE offset
#define ELEM_LESS(a,b) ((a) < (b))

#include "qsort_template.h"

#undef QSORT_SUFFIX
#undef ELEM_TYPE
#undef ELEM_LESS

Now, I realize this may have its cons, but it does simplify
maintenance of type-specific or parameterized variants of
performance-critical functions.

> I can do a specialized qsort for this case. But it will be a larger
> bunch of code than shell sort.
>
>> And I'd recommend doing that when there is a need, and I don't think
>> this patch really needs it, since bucket sort handles most cases
>> anyway.
>
> And it still needs insertion sort for buckets.
> I can agree to get rid of shell sort. But insertion sort is necessary.

I didn't suggest getting rid of insertion sort. But the trick above is
equally applicable to insertion sort.




Re: [HACKERS] Small improvement to compactify_tuples

2017-11-06 Thread Юрий Соколов
2017-11-06 17:55 GMT+03:00 Claudio Freire:
>
> On Mon, Nov 6, 2017 at 11:50 AM, Юрий Соколов wrote:
> >> Maybe leave a fallback to qsort if some corner case produces big
> >> buckets?
> >
> > For 8kb pages, each bucket is per 32 bytes. So, for heap pages it is at
> > most 1 heap-tuple per bucket, and for index pages it is at most 2 index
> > tuples per bucket. For 32kb pages it is 4 heap-tuples and 8 index-tuples
> > per bucket.
> > It will be unnecessary overhead to call non-inlineable qsort in these
> > cases.
> >
> > So, I think, shell sort could be removed, but insertion sort has to
> > remain.
> >
> > I'd prefer shell sort to remain also. It could be useful in other
> > places too, because it is easily inlinable, and provides performance
> > comparable to qsort up to several hundreds of elements.
>
> I'd rather have an inlineable qsort.

But qsort is recursive. It is quite hard to make it inlineable. And
still it will be much heavier than insertion sort (btw, all qsort
implementations use insertion sort for small arrays). And it will be
heavier than shell sort for small arrays.

I can do a specialized qsort for this case. But it will be a larger
bunch of code than shell sort.

> And I'd recommend doing that when there is a need, and I don't think
> this patch really needs it, since bucket sort handles most cases
> anyway.

And it still needs insertion sort for buckets.
I can agree to get rid of shell sort. But insertion sort is necessary.


Re: [HACKERS] MERGE SQL Statement for PG11

2017-11-06 Thread Peter Geoghegan

Simon Riggs wrote:

In step 3 we discover that an entry exists in the index for a committed row.

Since we have a unique index we use it to locate the row we know
exists and UPDATE that.

We don't use a new MVCC snapshot, we do what EPQ does. EPQ is already
violating MVCC for UPDATEs, so why does it matter if we do it for
INSERTs also?


Before I go on to say why I think that this approach is problematic, I
want to point out a few things that I think we actually agree on:

* EPQ is fairly arbitrary as a behavior for READ COMMITTED UPDATE
 conflict handling. It has more to do with how VACUUM works than about
 some platonic ideal that everyone agrees on.

* We can imagine other alternatives, such as the behavior in Oracle
 (statement level rollback + optimistic retry).

* Those alternatives are probably better in some ways but worse in other
 ways.

* EPQ violates snapshot consistency, even though that's not inherently
 necessary to avoid "READ COMMITTED serialization errors".

* ON CONFLICT also violates snapshot consistency, in rather a different
 way. (Whether or not this is necessary is more debatable.)

I actually think that other MVCC systems don't copy Oracle here,
either, and for similar pragmatic reasons. It's a mixed bag.


Where hides the problem?


The problem is that violating MVCC is something that can be done in
different ways, and by meaningful degrees:

* EPQ semantics are believed to be fine because we don't get complaints
 about it. I think that that's because it's specialized to UPDATEs and
 UPDATE-like operations, where we walk an UPDATE chain specifically,
 and only use a dirty snapshot for the chain's newer tuples.

* ON CONFLICT doesn't care about UPDATE chains. Unlike EPQ, it makes no
 distinction between a concurrent UPDATE, and a concurrent DELETE + fresh
 INSERT. It's specialized to CONFLICTs.

This might seem abstract, but it has real, practical implications.
Certain contradictions exist when you start with MVCC semantics, then
fall back to EPQ semantics, then finally fall back to ON CONFLICT
semantics.

Questions about mixing these two things:

* What do we do if someone concurrently UPDATEs in a way that makes the
 qual not pass during EPQ traversal? Should we INSERT when that
 happens?

* If so, what about the case when the MERGE join qual/unique index
 values didn't change (just some other attributes that do not pass the
 additional WHEN MATCHED qual)?

* What about when there was a concurrent DELETE -- should we INSERT then?

ON CONFLICT goes from a CONFLICT, and then applies its own qual. That's
hugely different to doing it the other way around: starting from your
own MVCC snapshot qual, and going to a CONFLICT. This is because
evaluating the DO UPDATE's WHERE clause is just one little extra step
after the one and only latest row for that value has been locked.  You
could theoretically go this way with 2PL, I think, because that's a bit
like locking every row that the predicate touches, but of course that
isn't at all practical.

I should stop trying to make a watertight case against this, even though
I still think that's possible. For now, instead, I'll just say that this
is *extremely* complicated, and still has unresolved questions about
semantics.

--
Peter Geoghegan




Re: [HACKERS] SQL procedures

2017-11-06 Thread Simon Riggs
On 31 October 2017 at 17:23, Peter Eisentraut wrote:
> I've been working on SQL procedures.  (Some might call them "stored
> procedures", but I'm not aware of any procedures that are not stored, so
> that's not a term that I'm using here.)

Looks good

> Everything that follows is intended to align with the SQL standard, at
> least in spirit.

+1

> This first patch does a bunch of preparation work.  It adds the
> CREATE/ALTER/DROP PROCEDURE commands and the CALL statement to call a
> procedure.

I guess it would be really useful to have a cut-down language to use
as an example, but it's probably easier to just wait for PLpgSQL.

You mention PARALLEL SAFE is not used for procedures. Isn't it an
architectural restriction that procedures would not be able to execute
in parallel? (At least this year)

> It also adds ROUTINE syntax which can refer to a function or
> procedure.

I think we need an explanatory section of the docs, but there doesn't
seem to be one for Functions, so there is no place to add some text
that says the above.

I found it confusing that ALTER and DROP ROUTINE exist but not CREATE
ROUTINE. At the very least we should say somewhere "there is no CREATE
ROUTINE", so its absence is clearly intentional. I did wonder whether
we should have it as well, but it's just one less thing to review, so
good.

I was surprised that pg_dump didn't use DROP ROUTINE where appropriate.

> I have extended that to include aggregates.  And then there
> is a bunch of leg work, such as psql and pg_dump support.  The
> documentation is a lot of copy-and-paste right now; that can be
> revisited sometime.  The provided procedural languages (an ever more
> confusing term) each needed a small touch-up to handle pg_proc entries
> with prorettype == 0.
>
> Right now, there is no support for returning values from procedures via
> OUT parameters.  That will need some definitional pondering; and see
> also below for a possible alternative.
>
> With this, you can write procedures that are somewhat compatible with
> DB2, MySQL, and to a lesser extent Oracle.
>
> Separately, I will send patches that implement (the beginnings of) two
> separate features on top of this:
>
> - Transaction control in procedure bodies
>
> - Returning multiple result sets

Both of those would be good, though my suggested priority would be
transaction control first and then multiple result sets, if we cannot
have both this release.

> (In various previous discussions on "real stored procedures" or
> something like that, most people seemed to have one of these two
> features in mind.  I think that depends on what other SQL systems one
> has worked with previously.)

Almost all of the meat happens in later patches, so no other review comments.

That seems strange in a patch of this size, but it's true.
Procedures are just a new type of object with very little interaction
with replication, persistence or optimization.

-- 
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] MERGE SQL Statement for PG11

2017-11-06 Thread Simon Riggs
On 6 November 2017 at 18:35, Peter Geoghegan wrote:

>> APPROACH2 (modified from my original proposal slightly)
>
>
> This write-up actually begins to confront the issues that I've raised.
> I'm glad to see this.
>
>> 1. Join...
>> 2. Apply results for UPDATE, if present not visible via the snapshot
>> taken at 1, do EPQ to ensure we locate current live tuple
>> 3. If still not visible, do speculative insertion if we have a unique
>> index available, otherwise ERROR. If spec insertion fails, go to 2
>>
>> The loop created above can live-lock, meaning that an infinite loop
>> could be created.
>
>
> The loop is *guaranteed* to live-lock once you "goto 2". So you might as
> well just throw an error at that point, which is the behavior that I've
> been arguing for all along!
>
> If this isn't guaranteed to live-lock at "goto 2", then it's not clear
> why. The outcome of step 2 is clearly going to be identical if you don't
> acquire a new MVCC snapshot, but you don't address that.
>
> You might have meant "apply an equivalent ON CONFLICT DO UPDATE", or
> something like that, despite the fact that the use of ON CONFLICT DO
> NOTHING was clearly implied by the "goto 2". I also see problems with
> that, but I'll wait for you to clarify what you meant before going into
> what they are.

In step 3 we discover that an entry exists in the index for a committed row.

Since we have a unique index we use it to locate the row we know
exists and UPDATE that.

We don't use a new MVCC snapshot, we do what EPQ does. EPQ is already
violating MVCC for UPDATEs, so why does it matter if we do it for
INSERTs also?


Where hides the problem?

-- 
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] UPDATE of partition key

2017-11-06 Thread Robert Haas
On Wed, Oct 25, 2017 at 11:40 AM, Amit Khandekar wrote:
> Below I have addressed the remaining review comments :

The changes to trigger.c still make me super-nervous.  Hey THOMAS
MUNRO, any chance you could review that part?

+   /* The caller must have already locked all the partitioned tables. */
+   root_rel = heap_open(root_relid, NoLock);
+   *all_part_cols = NULL;
+   foreach(lc, partitioned_rels)
+   {
+   Index   rti = lfirst_int(lc);
+   Oid relid = getrelid(rti, rtables);
+   Relationpart_rel = heap_open(relid, NoLock);
+
+   pull_child_partition_columns(part_rel, root_rel, all_part_cols);
+   heap_close(part_rel, NoLock);

I don't like the fact that we're opening and closing the relation here
just to get information on the partitioning columns.  I think it would
be better to do this someplace that already has the relation open and
store the details in the RelOptInfo.  set_relation_partition_info()
looks like the right spot.

+void
+pull_child_partition_columns(Relation rel,
+                             Relation parent,
+                             Bitmapset **partcols)

This code has a lot in common with is_partition_attr().  I'm not sure
it's worth trying to unify them, but it could be done.

+ * 'num_update_rri' : number of UPDATE per-subplan result rels. For INSERT,

Instead of " : ", you could just write "is the".

+    * For Updates, if the leaf partition is already present in the
+    * per-subplan result rels, we re-use that rather than initialize a
+    * new result rel. The per-subplan resultrels and the resultrels of
+    * the leaf partitions are both in the same canonical order. So while

It would be good to explain the reason.  Also, Updates shouldn't be
capitalized here.

+   Assert(cur_update_rri <= update_rri + num_update_rri - 1);

Maybe just cur_update_rri < update_rri + num_update_rri, or even
cur_update_rri - update_rri < num_update_rri.

Also, +1 for Amit Langote's idea of trying to merge
mt_perleaf_childparent_maps with mt_persubplan_childparent_maps.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] MERGE SQL Statement for PG11

2017-11-06 Thread Peter Geoghegan

Simon Riggs wrote:

APPROACH1
1. Join to produce results based upon snapshot at start of query
2. Apply results for INSERT, UPDATE or DELETE



Such failures are of great concern in practice because the time
between 1 and 2 could be very long for large statements, or for
smaller statements we might have sufficiently high concurrency to
allow us to see regular failures.


I'm not sure that they're a *great* concern in a world with something
that targets UPSERT use cases, which is a situation that does not exist
in DBMSs with MERGE (with the notable exception of Teradata). But it's
clearly a concern that users may expect to avoid duplicate violations in
READ COMMITTED, since this caused confusion among users of other
database systems with MERGE.


APPROACH2 (modified from my original proposal slightly)


This write-up actually begins to confront the issues that I've raised.
I'm glad to see this.


1. Join...
2. Apply results for UPDATE, if present not visible via the snapshot
taken at 1, do EPQ to ensure we locate current live tuple
3. If still not visible, do speculative insertion if we have a unique
index available, otherwise ERROR. If spec insertion fails, go to 2

The loop created above can live-lock, meaning that an infinite loop
could be created.


The loop is *guaranteed* to live-lock once you "goto 2". So you might as
well just throw an error at that point, which is the behavior that I've
been arguing for all along!

If this isn't guaranteed to live-lock at "goto 2", then it's not clear
why. The outcome of step 2 is clearly going to be identical if you don't
acquire a new MVCC snapshot, but you don't address that.

You might have meant "apply an equivalent ON CONFLICT DO UPDATE", or
something like that, despite the fact that the use of ON CONFLICT DO
NOTHING was clearly implied by the "goto 2". I also see problems with
that, but I'll wait for you to clarify what you meant before going into
what they are.


In practice, such live-locks are rare and we could detect them by
falling out of the loop after a few tries. Approach2's purpose is to
alleviate errors in Approach1, so falling out of the loop merely takes
us back to the error we would have got if we didn't try, so Approach2
has considerable benefit over Approach1.


I don't hate the idea of retrying a fixed number of times for things
like this, but I don't like it either. I'm going to assume that it's
fine for now.


I read that step 3 in Approach2 is some kind of problem in MVCC
semantics. My understanding is that SQL Standard allows us to define
what the semantics of the statement are in relation to concurrency, so
any semantic issue can be handled by defining it to work the way we
want.


My only concern is that our choices here should be good ones, based on
practical considerations. We both more or less agree on how this should
be assessed, I think; we just reach different conclusions.


As you point out, whichever we choose, we will be bound by those
semantics. So if we take Approach1, as has been indicated currently,
what is the written explanation for that, so we can show that to the
people who ask in the future about our decisions?


Well, Approach1 is what other systems implement. I think that it would
be important to point out that MERGE with Approach1 isn't special, but
ON CONFLICT DO UPDATE is special. We'd also say that higher isolation
levels will not have duplicate violations.

--
Peter Geoghegan




Re: [HACKERS] [POC] Faster processing at Gather node

2017-11-06 Thread Jim Van Fleet
Hi --

pgsql-hackers-ow...@postgresql.org wrote on 11/06/2017 09:47:22 AM:

> From: Andres Freund

> 
> Hi,
> 
> Please don't top-quote on postgresql lists.
Sorry 
> 
> On 2017-11-06 09:44:24 -0600, Jim Van Fleet wrote:
> > > >hammerdb, in this configuration, runs a variant of tpcc
> > >
> > > Hard to believe that any of the changes here are relevant in that
> > > case - this is parallelism specific stuff. Whereas tpcc is oltp,
> > > right?
> 
> > correct
> 
> In that case, could you provide before/after profiles of the performance
> changing runs?
Sure -- happy to share -- gzipped files (which include trace, perf,
netstat, system data) are large (9G and 13G).
Should I post them on the list or somewhere else (or trim them -- if so,
what would you like to have?)
> 
Jim




Re: [HACKERS] MERGE SQL Statement for PG11

2017-11-06 Thread Simon Riggs
On 3 November 2017 at 16:35, Peter Geoghegan wrote:
> Simon Riggs wrote:

 The *only* behavioural difference I have proposed would be the *lack*
 of an ERROR in (some) concurrent cases.
>>>
>>>
>>> I think that's a big difference.  Error vs. non-error is a big deal by
>>> itself;
>>
>>
>> Are you saying avoiding an ERROR is a bad or good thing?
>
>
> Are you really asking Robert to repeat what has already been said about
> a dozen different ways?

I'm asking for clarity of explanation rather than assertions.


> That's *not* the only difference. You need to see a couple of steps
> ahead to see further differences, as the real dilemma comes when you
> have to reconcile having provided the UPSERT-guarantees with cases that
> that doesn't map on to (which can happen in a number of different ways).
>
> I don't understand why you'll talk about just about anything but that.
> This is a high-level concern about the overarching design. Do you really
> not understand the concern at this point?

You're either referring to what is in the docs, namely that INSERT ...
ON CONFLICT violates MVCC in a particular way, or to something as yet
unstated. If it is the former, then I still don't see the problem (see
later). If it is the latter, I need more. So either way I need more.

> Robert Haas said
> In the past, there have been objections to implementations of MERGE
> which would give rise to such serialization anomalies, but I'm not
> sure we should feel bound by those discussions.  One thing that's
> different is that the common and actually-useful case can now be made
> to work in a fairly satisfying way using INSERT .. ON CONFLICT UPDATE;
> if less useful cases are vulnerable to some weirdness, maybe it's OK
> to just document the problems.

I agreed with that, and still do.


We need a clear, explicit description of this situation so I will
attempt that in detail here.


The basic concurrency problem we have is this:

APPROACH1
1. Join to produce results based upon snapshot at start of query
2. Apply results for INSERT, UPDATE or DELETE

Given there is a time delay between 1 and 2 there is a race condition
so that if another user concurrently inserts the same value into a
unique index then an INSERT will fail with a uniqueness violation.

Such failures are of great concern in practice because the time
between 1 and 2 could be very long for large statements, or for
smaller statements we might have sufficiently high concurrency to
allow us to see regular failures.

APPROACH2 (modified from my original proposal slightly)
1. Join...
2. Apply results for UPDATE, if present not visible via the snapshot
taken at 1, do EPQ to ensure we locate current live tuple
3. If still not visible, do speculative insertion if we have a unique
index available, otherwise ERROR. If spec insertion fails, go to 2

The loop created above can live-lock, meaning that an infinite loop
could be created.

In practice, such live-locks are rare and we could detect them by
falling out of the loop after a few tries. Approach2's purpose is to
alleviate errors in Approach1, so falling out of the loop merely takes
us back to the error we would have got if we didn't try, so Approach2
has considerable benefit over Approach1. This only applies if we do an
INSERT: if there is a WHEN NOT MATCHED ... AND clause with probability W
of inserting, then as W drops to zero the probability of error in
Approach2 approaches the probability of error in Approach1, but with W
high we may avoid many errors. Approach2 never generates more errors
than Approach1.
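
To make the loop concrete, here is a compilable sketch of Approach2 with
the live-lock cap. The two helpers are stand-ins for executor work (EPQ
recheck, speculative insertion) and are stubbed purely for illustration;
this is a reading of the steps above, not actual executor code:

    #include <stdbool.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Stand-ins for executor work, stubbed purely for illustration. */
    static bool epq_update_live_tuple(void) { return true; }   /* step 2 */
    static bool speculative_insert(void)    { return true; }   /* step 3 */

    static void
    merge_one_row(bool matched, bool have_unique_index)
    {
        /* cap the retries so a live-lock degrades into Approach1's error */
        for (int attempt = 0; attempt < 5; attempt++)
        {
            if (matched)
            {
                if (epq_update_live_tuple())
                    return;             /* UPDATE applied */
                matched = false;        /* live tuple gone: try INSERT */
            }
            else
            {
                if (!have_unique_index)
                    break;              /* no unique index: ERROR as before */
                if (speculative_insert())
                    return;             /* INSERT applied */
                matched = true;         /* spec insertion failed: go to 2 */
            }
        }
        fprintf(stderr, "ERROR: same error Approach1 would have raised\n");
        exit(1);
    }

    int
    main(void)
    {
        merge_one_row(false, true);     /* e.g. the WHEN NOT MATCHED path */
        return 0;
    }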

I read that step 3 in Approach2 is some kind of problem in MVCC
semantics. My understanding is that SQL Standard allows us to define
what the semantics of the statement are in relation to concurrency, so
any semantic issue can be handled by defining it to work the way we
want. The semantics are:
a) when a unique index is available we avoid errors by using semantics
of INSERT .. ON CONFLICT UPDATE.
b) when a unique index is not available we use other semantics.
To me this is the same as INSERTs failing in the presence of unique
indexes, but not failing when no index is present. The presence of a
unique constraint alters the semantics of the query.
We can choose Approach2 - as Robert says "[we should not] feel bound
by those [earlier] discussions"

Please explain what is wrong with the above without merely asserting
there is a problem.

As you point out, whichever we choose, we will be bound by those
semantics. So if we take Approach1, as has been indicated currently,
what is the written explanation for that, so we can show that to the
people who ask in the future about our decisions?

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pow support for pgbench

2017-11-06 Thread Fabien COELHO



> I don't want to go too deep into it, but you get stuff like this:
>
> Select pow(2.0, -3)::text = pow(2, -3)::text;

Sure. It does so with any overloaded operator or function:

  fabien=# SELECT (2.0 + 3)::TEXT = (2 + 3)::TEXT; # f

Patch applies, make check ok in pgbench, doc gen ok.

ipow code is nice and simple.
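
For the record, the squaring loop is small enough to check in isolation;
here is the patch's ipow wrapped in a tiny standalone harness (int64
spelled as int64_t), showing that a negative exponent deliberately falls
through to the double pow(), as discussed upthread:

    #include <assert.h>
    #include <inttypes.h>
    #include <math.h>
    #include <stdio.h>

    /* Same exponentiation-by-squaring loop as the patch's ipow(). */
    static int64_t
    ipow(int64_t base, int64_t exp)
    {
        int64_t result = 1;

        assert(exp >= 0);
        while (exp)
        {
            if (exp & 1)
                result *= base;
            exp >>= 1;
            base *= base;
        }
        return result;
    }

    int
    main(void)
    {
        printf("%" PRId64 "\n", ipow(2, 10)); /* 1024 */
        printf("%g\n", pow(2, -3));           /* 0.125: negative exp -> pow() */
        return 0;
    }

(Compile with -lm; it prints 1024 and 0.125.)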

I switched the patch to "Ready for Committer"

Let's now hope that a committer gets around to considering this patch some
day.


--
Fabien.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pow support for pgbench

2017-11-06 Thread Raúl Marín Rodríguez
Hi,

> Indeed, this is quite strange...

I don't want to go too deep into it, but you get stuff like this:

Select pow(2.0, -3)::text = pow(2, -3)::text;
 ?column?
--
 f
(1 row)

> - you can simplify the ipow function by removing handling of y<0 case,
>   maybe add an assert to be sure to avoid it.

I agree, done.

> - you should add more symmetry and simplify the evaluation:

Done too.

> Add a test case to show what happens on NULL arguments, hopefully the
> result is NULL.

Done and it does.

Thanks again for the review.

On Mon, Nov 6, 2017 at 4:14 PM, Fabien COELHO  wrote:

>
> Hello,
>
>> Sorry for the confusion, I wasn't aware that SQL pow changed types
>> depending on the input value.
>>
>
> Indeed, this is quite strange...
>
>   fabien=# SELECT i, POW(2, i) FROM generate_series(-2, 2) AS i;
>-2 | 0.25
>-1 | 0.5
> 0 | 1
> 1 | 2
> 2 | 4
>
>> I've modified the function to match more closely the behaviour of SQL,
>> except that 0^(negative) returns 'double inf'. Do you think there is any
>> value in raising an error instead?
>>
>
>   fabien=# SELECT POW(0,-1);
>   ERROR:  zero raised to a negative power is undefined
>
> Hmmm... I'm fine with double inf, because an exception in pgbench means the
> end of the script, which is not desirable for benchmarking purposes.
>
> I think that:
>
>  - you can simplify the ipow function by removing handling of y<0 case,
>maybe add an assert to be sure to avoid it.
>
>  - you should add more symmetry and simplify the evaluation:
>
>if (int & int)
>{
>   i1, i2 = ...;
>   if (i2 >= 0)
> setIntValue(retval, ipow(i1, i2));
>   else
> // conversion is done by C, no need to coerce again
> setDoubleValue(retval, pow(i1, i2));
>}
>else
>{
>  d1, d2 = ...;
>  setDoubleValue(retval, pow(d1, d2));
>}
>
> Add a test case to show what happens on NULL arguments, hopefully the
> result is NULL.
>
> --
> Fabien.
>



-- 

*Raúl Marín Rodríguez*
carto.com
From 47f6ec396d3bc11c39066dbd4b31e75b76102094 Mon Sep 17 00:00:00 2001
From: Raul Marin 
Date: Fri, 13 Oct 2017 17:42:23 +0200
Subject: [PATCH] Add pow() support to pgbench

---
 doc/src/sgml/ref/pgbench.sgml|  7 
 src/bin/pgbench/exprparse.y  |  3 ++
 src/bin/pgbench/pgbench.c| 62 
 src/bin/pgbench/pgbench.h|  3 +-
 src/bin/pgbench/t/001_pgbench_with_server.pl | 15 +++
 5 files changed, 89 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index 1f55967e40a..32c94ba0dc1 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -1233,6 +1233,13 @@ pgbench  options  d
sqrt(2.0)
1.414213562
   
+  
+   pow(x, y)
+   integer if x and y are integers and y >= 0, else double
+   Numeric exponentiation
+   pow(2.0, 10)
+   1024.0
+  
  
  

diff --git a/src/bin/pgbench/exprparse.y b/src/bin/pgbench/exprparse.y
index 770be981f06..290bca99d12 100644
--- a/src/bin/pgbench/exprparse.y
+++ b/src/bin/pgbench/exprparse.y
@@ -334,6 +334,9 @@ static const struct
 	{
 		"!case_end", -2, PGBENCH_CASE
 	},
+	{
+		"pow", 2, PGBENCH_POW
+	},
 	/* keep as last array element */
 	{
 		NULL, 0, 0
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index add653bf90c..3781e75721d 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -739,6 +739,27 @@ getPoissonRand(TState *thread, int64 center)
 }
 
 /*
+ * pow() for integer values with exp >= 0. Matches SQL pow() behaviour
+ */
+static int64
+ipow(int64 base, int64 exp)
+{
+	int64 result = 1;
+
+	Assert(exp >= 0);
+
+	while (exp)
+	{
+		if (exp & 1)
+			result *= base;
+		exp >>= 1;
+		base *= base;
+	}
+
+	return result;
+}
+
+/*
  * Initialize the given SimpleStats struct to all zeroes
  */
 static void
@@ -1918,6 +1939,47 @@ evalFunc(TState *thread, CState *st,
 return true;
 			}
 
+		case PGBENCH_POW:
+			{
+PgBenchValue *lval = &vargs[0];
+PgBenchValue *rval = &vargs[1];
+
+Assert(nargs == 2);
+
+/*
+ * If both operands are int and exp >= 0 use
+ * the ipow() function, else use pow()
+ */
+if (lval->type == PGBT_INT &&
+	 rval->type == PGBT_INT)
+{
+
+	int64		li,
+ri;
+
+	if (!coerceToInt(lval, &li) ||
+		!coerceToInt(rval, &ri))
+		return false;
+
+	if (ri >= 0)
+		setIntValue(retval, ipow(li, ri));
+	else
+		setDoubleValue(retval, pow(li, ri));
+}
+else
+{
+	double		ld,
+rd;
+
+	if (!coerceToDouble(lval, &ld) ||
+		!coerceToDouble(rval, &rd))
+		return false;
+
+	setDoubleValue(retval, pow(ld, rd));
+}
+return true;
+			}
+
 		default:
 			/* cannot get here */
 			Assert(0);
diff --git a/src/bin/pgbench/pgbench.h b/src/bin/pgbench/pgbench.h
index e1277a1dde6..9f26af92bf6 

Re: [HACKERS] [pgsql-www] Schedule for migration to pglister

2017-11-06 Thread Magnus Hagander
On Mon, Nov 6, 2017 at 5:00 PM, Tom Lane  wrote:

> Magnus Hagander  writes:
> > On Mon, Nov 6, 2017 at 4:46 PM, Tom Lane  wrote:
> >> Hm, around here it's no match -> spam bucket.  But in any case, why
>
> > I think you're quite uncommon in that setup.
>
> Interesting, because "it's not addressed to me (or any list I'm on)"
> is the best single spam filtering rule I know, and has been for a
> decade or two.
>

Oh, I think I misunderstood you. I thought you meant "any list-tagged email
that's for a list I don't recognize". As in "list-id exists but is
unknown".

The way you explain there makes a lot more sense. I think not many people
do that either, mainly since gmail/yahoo/whatnot doesn't make it very easy
to do that. But it does make a lot more sense that way.


-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: [HACKERS] [pgsql-www] Schedule for migration to pglister

2017-11-06 Thread Tom Lane
Magnus Hagander  writes:
> On Mon, Nov 6, 2017 at 4:46 PM, Tom Lane  wrote:
>> Hm, around here it's no match -> spam bucket.  But in any case, why

> I think you're quite uncommon in that setup.

Interesting, because "it's not addressed to me (or any list I'm on)"
is the best single spam filtering rule I know, and has been for a
decade or two.

But we veer far off topic here.  Do as you will.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [pgsql-www] Schedule for migration to pglister

2017-11-06 Thread Magnus Hagander
On Mon, Nov 6, 2017 at 4:46 PM, Tom Lane  wrote:

> Magnus Hagander  writes:
> > On Mon, Nov 6, 2017 at 4:40 PM, Tom Lane  wrote:
> >> I suggest doing that the other way 'round.  Otherwise, the email
> >> about the change will inevitably go into a lot of peoples' bit
> >> buckets if they haven't adjusted their mail filters yet.
>
> > The argument for doing it after the migration is that the complaints that
> > we have received so far have all been from people where email ends up in
> > the *inbox* after the migration, not the bitbucket. That's the default
> > action in most peoples MUAs when their rules no longer match...
>
> Hm, around here it's no match -> spam bucket.  But in any case, why
>

I think you're quite uncommon in that setup. For obvious reasons, but I've
never heard of anybody other than you doing that :)



> would you not want to send it before so that it would end up where
> they're accustomed to seeing the list's traffic?
>

The experience from the pgadmin lists is that a lot of people have the
lists filtered into folders that they don't check often (or at all). So
they don't notice the migration message. But they start noticing once all
the list mail shows up in their inbox instead.

It might well be that we end up getting the other half of people when we do
it in this order, but we definitely got a *lot* of people in that first
bucket.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: [HACKERS] [POC] Faster processing at Gather node

2017-11-06 Thread Andres Freund
Hi,

Please don't top-quote on postgresql lists.

On 2017-11-06 09:44:24 -0600, Jim Van Fleet wrote:
> > >hammerdb, in this configuration, runs a variant of tpcc
> > 
> > Hard to believe that any of the changes here are relevant in that 
> > case - this is parallelism specific stuff. Whereas tpcc is oltp, right?

> correct

In that case, could you provide before/after profiles of the performance
changing runs?

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [pgsql-www] Schedule for migration to pglister

2017-11-06 Thread Tom Lane
Magnus Hagander  writes:
> On Mon, Nov 6, 2017 at 4:40 PM, Tom Lane  wrote:
>> I suggest doing that the other way 'round.  Otherwise, the email
>> about the change will inevitably go into a lot of peoples' bit
>> buckets if they haven't adjusted their mail filters yet.

> The argument for doing it after the migration is that the complaints that
> we have received so far have all been from people where email ends up in
> the *inbox* after the migration, not the bitbucket. That's the default
> action in most peoples MUAs when their rules no longer match...

Hm, around here it's no match -> spam bucket.  But in any case, why
would you not want to send it before so that it would end up where
they're accustomed to seeing the list's traffic?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [POC] Faster processing at Gather node

2017-11-06 Thread Jim Van Fleet
correct

> >hammerdb, in this configuration, runs a variant of tpcc
> 
> Hard to believe that any of the changes here are relevant in that 
> case - this is parallelism specific stuff. Whereas tpcc is oltp, right?
> 
> Andres
> -- 
> Sent from my Android device with K-9 Mail. Please excuse my brevity.
> 




Re: [HACKERS] [pgsql-www] Schedule for migration to pglister

2017-11-06 Thread Stephen Frost
Tom,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Stephen Frost  writes:
> > Each list will receive an email with a link to the wiki about the
> > migration after the list has been migrated.
> 
> I suggest doing that the other way 'round.  Otherwise, the email
> about the change will inevitably go into a lot of peoples' bit
> buckets if they haven't adjusted their mail filters yet.

My thought had been to do before-and-after, but I got complaints from
others that we'd then be spamming a lot of people with email.

We definitely need one after the migration because the new mail *won't*
end up caught in people's filters, and for those who intentionally
filter the list traffic into the garbage because they couldn't figure
out how to unsubscribe, this is going to be most annoying (this is what
we saw with the pgadmin lists and it was quite painful).

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] [pgsql-www] Schedule for migration to pglister

2017-11-06 Thread Magnus Hagander
On Mon, Nov 6, 2017 at 4:40 PM, Tom Lane  wrote:

> Stephen Frost  writes:
> > Each list will receive an email with a link to the wiki about the
> > migration after the list has been migrated.
>
> I suggest doing that the other way 'round.  Otherwise, the email
> about the change will inevitably go into a lot of peoples' bit
> buckets if they haven't adjusted their mail filters yet.
>

The argument for doing it after the migration is that the complaints that
we have received so far have all been from people where email ends up in
the *inbox* after the migration, not the bitbucket. That's the default
action in most peoples MUAs when their rules no longer match...

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: [HACKERS] [pgsql-www] Schedule for migration to pglister

2017-11-06 Thread Tom Lane
Stephen Frost  writes:
> Each list will receive an email with a link to the wiki about the
> migration after the list has been migrated.

I suggest doing that the other way 'round.  Otherwise, the email
about the change will inevitably go into a lot of peoples' bit
buckets if they haven't adjusted their mail filters yet.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Schedule for migration to pglister

2017-11-06 Thread Stephen Frost
Greetings,

The PostgreSQL Infrastructure team is working to migrate the project's
mailing lists from the existing system (an ancient and unmaintained
piece of software called "majordomo2") to a newly developed mailing list
system (known as "PGLister"), which better addresses the needs of the
PostgreSQL community and is updated to work with recent improvements in
email technology and spam filtering. These changes will impact certain
aspects of the system but we are hopeful that these changes will have a
minimal impact on users, although everyone will notice the differences.

The changes which we expect to be most significant to users can be found
on the wiki here: https://wiki.postgresql.org/wiki/PGLister_Announce

Our planned migration schedule is as follows:

Nov 6 -
  pgsql-www

Nov 13 -
  pgsql-hackers
  pgsql-bugs
  pgsql-committers

Nov 20 -
  pgsql-admin
  pgsql-general
  pgsql-sql
  pgsql-jobs
  pgsql-novice

Nov 27 -
  pgsql-announce

After -
  the rest

We will be starting the migration of pgsql-www shortly.

Each list will receive an email with a link to the wiki about the
migration after the list has been migrated.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] [POC] Faster processing at Gather node

2017-11-06 Thread Andres Freund


On November 6, 2017 7:30:49 AM PST, Jim Van Fleet  wrote:
>Andres Freund  wrote on 11/05/2017 03:40:15 PM:
>
>hammerdb, in this configuration, runs a variant of tpcc

Hard to believe that any of the changes here are relevant in that case - this 
is parallelism specific stuff. Whereas tpcc is oltp, right?

Andres
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [POC] Faster processing at Gather node

2017-11-06 Thread Jim Van Fleet
Andres Freund  wrote on 11/05/2017 03:40:15 PM:

hammerdb, in this configuration, runs a variant of tpcc
> 
> What query(s) did you measure?
> 
> Andres
> -- 
> Sent from my Android device with K-9 Mail. Please excuse my brevity.
> 




Re: [HACKERS] Fix performance degradation of contended LWLock on NUMA

2017-11-06 Thread Sokolov Yura

On 2017-10-20 11:54, Sokolov Yura wrote:

Hello,

On 2017-10-19 19:46, Andres Freund wrote:

On 2017-10-19 14:36:56 +0300, Sokolov Yura wrote:

> > + init_local_spin_delay();
>
> The way you moved this around has the disadvantage that we now do this -
> a number of writes - even in the very common case where the lwlock can
> be acquired directly.

Excuse me, I don't quite understand.
Do you complain against init_local_spin_delay placed here?


Yes.


I could place it before perform_spin_delay under `if (!spin_inited)` if
you think it is absolutely a must.


I looked at the assembly and remembered that the last commit simplifies
`init_local_spin_delay` to just two or three writes of zeroes (it looks
like the compiler combines two 4-byte writes into one 8-byte write).
Compared to the surrounding code (especially in LWLockAcquire itself),
this overhead is negligible.

Though, I found that there is a benefit in calling LWLockAttemptLockOnce
before entering the loop of LWLockAttemptLockOrQueue calls in
LWLockAcquire (when there is not much contention). And this way, the
`inline` decorator for LWLockAttemptLockOrQueue could be omitted. Given
that clang doesn't want to inline this function, this could be the best
way.

Should I add such a commit to the patch?
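
For concreteness, a minimal sketch of the `if (!spin_inited)` placement,
using the existing SpinDelayStatus API from storage/s_lock.h;
try_acquire() is a stand-in for the real lock attempt, so this is
illustrative only, not the patch code:

    #include "postgres.h"
    #include "storage/s_lock.h"

    extern bool try_acquire(void);  /* stand-in for the real lock attempt */

    void
    wait_loop_sketch(void)
    {
        SpinDelayStatus delayStatus;
        bool        spin_inited = false;

        while (!try_acquire())
        {
            /* pay the initialization cost only on the contended path */
            if (!spin_inited)
            {
                init_local_spin_delay(&delayStatus);
                spin_inited = true;
            }
            perform_spin_delay(&delayStatus);
        }
        if (spin_inited)
            finish_spin_delay(&delayStatus);
    }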

--
With regards,
Sokolov Yura aka funny_falcon
Postgres Professional: https://postgrespro.ru
The Russian Postgres Company


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pow support for pgbench

2017-11-06 Thread Fabien COELHO


Hello,

> Sorry for the confusion, I wasn't aware that SQL pow changed types
> depending on the input value.


Indeed, this is quite strange...

  fabien=# SELECT i, POW(2, i) FROM generate_series(-2, 2) AS i;
   -2 | 0.25
   -1 | 0.5
0 | 1
1 | 2
2 | 4

> I've modified the function to match more closely the behaviour of SQL,
> except that 0^(negative) returns 'double inf'. Do you think there is any
> value in raising an error instead?


  fabien=# SELECT POW(0,-1);
  ERROR:  zero raised to a negative power is undefined

Hmmm... I'm fine with double inf, because an exception in pgbench means the
end of the script, which is not desirable for benchmarking purposes.


I think that:

 - you can simplify the ipow function by removing handling of y<0 case,
   maybe add an assert to be sure to avoid it.

 - you should add more symmetry and simplify the evaluation:

   if (int & int)
   {
  i1, i2 = ...;
  if (i2 >= 0)
setIntValue(retval, ipow(i1, i2));
  else
// conversion is done by C, no need to coerce again
setDoubleValue(retval, pow(i1, i2));
   }
   else
   {
 d1, d2 = ...;
 setDoubleValue(retval, pow(d1, d2));
   }

Add a test case to show what happens on NULL arguments, hopefully the 
result is NULL.


--
Fabien.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Small improvement to compactify_tuples

2017-11-06 Thread Claudio Freire
On Mon, Nov 6, 2017 at 11:50 AM, Юрий Соколов  wrote:
>> Maybe leave a fallback to qsort if some corner case produces big buckets?
>
> For 8kb pages, each bucket is per 32 bytes. So, for heap pages it is at
> most 1 heap-tuple per bucket, and for index pages it is at most 2 index
> tuples per bucket. For 32kb pages it is 4 heap-tuples and 8 index-tuples
> per bucket.
> It will be unnecessary overhead to call non-inlineable qsort in this cases
>
> So, I think, shell sort could be removed, but insertion sort have to remain.
>
> I'd prefer shell sort to remain also. It could be useful in other places
> also,
> because it is easily inlinable, and provides comparable to qsort performance
> up to several hundreds of elements.

I'd rather have an inlineable qsort.

And I'd recommend doing that when there is a need, and I don't think
this patch really needs it, since bucket sort handles most cases
anyway.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Small improvement to compactify_tuples

2017-11-06 Thread Юрий Соколов
2017-11-05 20:44 GMT+03:00 Claudio Freire :
>
> On Sat, Nov 4, 2017 at 8:07 PM, Юрий Соколов 
> wrote:
> > 2017-11-03 5:46 GMT+03:00 Tom Lane :
> >>
> >> Sokolov Yura  writes:
> >> > [ 0001-Improve-compactify_tuples.patch, v5 or thereabouts ]
> >>
> >> I went to check the shellsort algorithm against Wikipedia's entry,
> >> and found that this appears to be an incorrect implementation of
> >> shellsort: where pg_shell_sort_pass has
> >>
> >> for (_i = off; _i < _n; _i += off) \
> >>
> >> it seems to me that we need to have
> >>
> >> for (_i = off; _i < _n; _i += 1) \
> >>
> >> or maybe just _i++.
> >
> >
> > Shame on me :-(
> > I've wrote shell sort several times, so I forgot to recheck myself once
> > again.
> > And looks like best gap sequence from wikipedia is really best
> > ( {301, 132, 57, 23, 10 , 4} in my notation),
> >
> >
> > 2017-11-03 17:37 GMT+03:00 Claudio Freire :
> >> On Thu, Nov 2, 2017 at 11:46 PM, Tom Lane  wrote:
> >>> BTW, the originally given test case shows no measurable improvement
> >>> on my box.
> >>
> >> I did manage to reproduce the original test and got a consistent
> >> improvement.
> >
> > I've rechecked myself using my benchmark.
> > Without memmove, compactify_tuples consumes:
> > - with qsort 11.66% cpu (pg_qsort + med3 + swapfunc + itemoffcompare +
> > compactify_tuples = 5.97 + 0.51 + 2.87 + 1.88 + 0.44)
> > - with just insertion sort 6.65% cpu (sort is inlined, itemoffcompare
also
> > inlined, so whole is compactify_tuples)
> > - with just shell sort 5,98% cpu (sort is inlined again)
> > - with bucket sort 1,76% cpu (sort_itemIds + compactify_tuples = 1.30 +
> > 0.46)
>
> Is that just insertion sort without bucket sort?

Yes. Just to show that inlined insertion sort is better than non-inlined
qsort in this particular use-case.

> Because I think shell sort has little impact in your original patch
> because it's rarely exercised. With bucket sort, most buckets are very
> small, too small for shell sort to do any useful work.

Yes. In the patch, buckets are sorted with insertion sort. Shell sort is
used only on the full array, and only if its size is less than 48.
Bucket sort has a constant overhead of traversing all buckets, even if they
are empty. That is why I think shell sort is better for small arrays.
Though I didn't measure that carefully, insertion sort would probably be
enough for small arrays.

> Maybe leave a fallback to qsort if some corner case produces big buckets?

For 8kB pages, there is one bucket per 32 bytes. So, for heap pages it is
at most 1 heap tuple per bucket, and for index pages at most 2 index
tuples per bucket. For 32kB pages it is 4 heap tuples and 8 index tuples
per bucket.
It would be unnecessary overhead to call the non-inlineable qsort in these
cases.

So I think shell sort could be removed, but insertion sort has to remain.

I'd prefer shell sort to remain as well. It could be useful in other
places too, because it is easily inlinable and provides performance
comparable to qsort for up to several hundred elements.
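
For reference, a standalone sketch of the shell sort shape I mean: the
gap sequence {301, 132, 57, 23, 10, 4} plus a final gap-1 insertion pass,
sorting in descending order (the compactify_tuples use-case sorts by item
offset, largest first). Names and element type are illustrative:

    #include <stddef.h>
    #include <stdint.h>

    static void
    shell_sort_desc(uint16_t *a, size_t n)
    {
        /* gaps from the thread; the final gap of 1 is the insertion pass */
        static const size_t gaps[] = {301, 132, 57, 23, 10, 4, 1};
        size_t      g;

        for (g = 0; g < sizeof(gaps) / sizeof(gaps[0]); g++)
        {
            size_t      gap = gaps[g];
            size_t      i;

            if (gap >= n)
                continue;
            for (i = gap; i < n; i++)
            {
                uint16_t    v = a[i];
                size_t      j = i;

                while (j >= gap && a[j - gap] < v)  /* '<' => descending */
                {
                    a[j] = a[j - gap];
                    j -= gap;
                }
                a[j] = v;
            }
        }
    }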

With regards,
Sokolov Yura aka funny_falcon.


Re: [HACKERS] pow support for pgbench

2017-11-06 Thread Raúl Marín Rodríguez
Hi Fabien,

Sorry for the confusion, I wasn't aware that SQL pow changed types
depending on the input value.

I've modified the function to match more closely the behaviour of SQL,
except that 0^(negative) returns 'double inf'. Do you think there is any
value in raising an error instead?


On Mon, Nov 6, 2017 at 2:12 PM, Fabien COELHO  wrote:

>
> Hello Raúl,
>
> I've fixed the documentation and added an ipow function that handles both
>> positive and negative ints, having 0^0 == 1 and 0^(negative) ==
>> PG_INT64_MAX
>> since that's what my glibc math.h pow() is returning.
>>
>
> From the comment:
>
>  * For exp < 0 return 0 except when the base is 1 or -1
>
> I think that it should do what POW does in psql, i.e.:
>
>  fabien=# SELECT POW(2, -2); # 0.25
>
> that is if exp < 0 the double version should be used, it should
> not return 0.
>
> Basically the idea is that the pgbench client-side version should behave
> the same as the SQL version.
>
> --
> Fabien.




-- 

*Raúl Marín Rodríguez*
carto.com
From be2fedcae277f7eede621dcda66b15b08372ce63 Mon Sep 17 00:00:00 2001
From: Raul Marin 
Date: Fri, 13 Oct 2017 17:42:23 +0200
Subject: [PATCH] Add pow() support to pgbench

---
 doc/src/sgml/ref/pgbench.sgml|  7 +++
 src/bin/pgbench/exprparse.y  |  3 +
 src/bin/pgbench/pgbench.c| 84 
 src/bin/pgbench/pgbench.h|  3 +-
 src/bin/pgbench/t/001_pgbench_with_server.pl | 13 +
 5 files changed, 109 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index 1f55967e40a..32c94ba0dc1 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -1233,6 +1233,13 @@ pgbench  options  d
sqrt(2.0)
1.414213562
   
+  
+   pow(x, y)
+   integer if x and y are integers and y >= 0, else double
+   Numeric exponentiation
+   pow(2.0, 10)
+   1024.0
+  
  
  

diff --git a/src/bin/pgbench/exprparse.y b/src/bin/pgbench/exprparse.y
index 770be981f06..290bca99d12 100644
--- a/src/bin/pgbench/exprparse.y
+++ b/src/bin/pgbench/exprparse.y
@@ -334,6 +334,9 @@ static const struct
 	{
 		"!case_end", -2, PGBENCH_CASE
 	},
+	{
+		"pow", 2, PGBENCH_POW
+	},
 	/* keep as last array element */
 	{
 		NULL, 0, 0
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index add653bf90c..d565880b6e2 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -739,6 +739,51 @@ getPoissonRand(TState *thread, int64 center)
 }
 
 /*
+ * pow() for integer values
+ */
+static int64
+ipow(int64 base, int64 exp)
+{
+	int64 result;
+
+	if (base == 0)
+	{
+		if (exp > 0)
+			return 0;
+		else if (exp == 0)
+			return 1;
+		return PG_INT64_MAX;
+	}
+
+	/*
+	 * For exp > 0 calculate normally
+	 * For exp == 0 return 1 * sign of base
+	 * For exp < 0 return 0 except when the base is 1 or -1
+	 */
+	if (exp > 0)
+	{
+		result = 1;
+		while (exp)
+		{
+			if (exp & 1)
+result *= base;
+			exp >>= 1;
+			base *= base;
+		}
+	}
+	else if (exp == 0)
+		result = (base > 0) - (base < 0);
+	else
+	{
+		result = 1 / base;
+		if (exp % 2 == 0)
+			result *= result;
+	}
+
+	return result;
+}
+
+/*
  * Initialize the given SimpleStats struct to all zeroes
  */
 static void
@@ -1918,6 +1963,45 @@ evalFunc(TState *thread, CState *st,
 return true;
 			}
 
+		case PGBENCH_POW:
+			{
+PgBenchValue *lval = &vargs[0];
+PgBenchValue *rval = &vargs[1];
+double		ld,
+			rd;
+
+Assert(nargs == 2);
+
+/*
+ * If both operands are int and exp >= 0 use
+ * the ipow() function, else use pow()
+ */
+if (lval->type == PGBT_INT &&
+	 rval->type == PGBT_INT)
+{
+
+	int64		li,
+ri;
+
+	if (!coerceToInt(lval, &li) ||
+		!coerceToInt(rval, &ri))
+		return false;
+
+	if (ri >= 0)
+	{
+		setIntValue(retval, ipow(li, ri));
+		return true;
+	}
+}
+
+if (!coerceToDouble(lval, &ld) ||
+	!coerceToDouble(rval, &rd))
+	return false;
+
+setDoubleValue(retval, pow(ld, rd));
+return true;
+			}
+
 		default:
 			/* cannot get here */
 			Assert(0);
diff --git a/src/bin/pgbench/pgbench.h b/src/bin/pgbench/pgbench.h
index e1277a1dde6..9f26af92bf6 100644
--- a/src/bin/pgbench/pgbench.h
+++ b/src/bin/pgbench/pgbench.h
@@ -94,7 +94,8 @@ typedef enum PgBenchFunction
 	PGBENCH_LE,
 	PGBENCH_LT,
 	PGBENCH_IS,
-	PGBENCH_CASE
+	PGBENCH_CASE,
+	PGBENCH_POW
 } PgBenchFunction;
 
 typedef struct PgBenchExpr PgBenchExpr;
diff --git a/src/bin/pgbench/t/001_pgbench_with_server.pl b/src/bin/pgbench/t/001_pgbench_with_server.pl
index 8e19bbd3f45..2a1bb1216d7 100644
--- a/src/bin/pgbench/t/001_pgbench_with_server.pl
+++ b/src/bin/pgbench/t/001_pgbench_with_server.pl
@@ -237,6 +237,12 @@ sub pgbench
 		qr{command=36.: int 36\b},
 		qr{command=37.: boolean true\b},
 		qr{command=38.: boolean true\b},
+		qr{command=44.: int 

Re: [HACKERS] Early locking option to parallel backup

2017-11-06 Thread Stephen Frost
Lucas,

* Lucas (luca...@gmail.com) wrote:
> pg_dump was taking more than 24 hours to complete in one of my databases. I
> begin to research alternatives. Parallel backup reduced the backup time to
> little less than a hour, but it failed almost every time because of
> concurrent queries that generated exclusive locks. It is difficult to
> guarantee that my applications will not issue queries such as drop table,
> alter table, truncate table, create index or drop index for a hour. And I
> prefer not to create controls mechanisms to that end if I can work around
> it.

I certainly understand the value of pg_dump-based backups, but have you
considered doing file-based backups?  That would avoid the need to do
any in-database locking at all, and would give you the ability to do
PITR too.  Further, you could actually restore that backup to another
system and then do a pg_dump there to get a logical representation (and
this would test your physical database backup/restore process too...).

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Parallel Plans and Cost of non-filter functions

2017-11-06 Thread Paul Ramsey
From my perspective, this is much, much better. For sufficiently large
tables, I get parallel behaviour without jimmying with the defaults
on parallel_setup_cost and parallel_tuple_cost. *And* the parallel
behaviour *is* sensitive to the costs of functions in target lists, so
reasonably chosen costs will flip us into parallel mode for expensive
functions against smaller tables too.
Hopefully some variant of this finds its way into core! Is there any way I
can productively help?
P.

On Sat, Nov 4, 2017 at 10:02 PM, Amit Kapila 
wrote:

> On Sat, Nov 4, 2017 at 4:43 AM, Tom Lane  wrote:
> > Paul Ramsey  writes:
> >>> Whether I get a parallel aggregate seems entirely determined by the
> number
> >>> of rows, not the cost of preparing those rows.
> >
> >> This is true, as far as I can tell and unfortunate. Feeding tables with
> >> 100ks of rows, I get parallel plans, feeding 10ks of rows, never do, no
> >> matter how costly the work going on within. That's true of changing
> costs
> >> on the subquery select list, and on the aggregate transfn.
> >
> > This sounds like it might be the same issue being discussed in
> >
> > https://www.postgresql.org/message-id/flat/CAMkU=
> 1ycXNipvhWuweUVpKuyu6SpNjF=yhwu4c4us5jgvgx...@mail.gmail.com
> >
>
> I have rebased the patch being discussed on that thread.
>
> Paul, you might want to once check with the recent patch [1] posted on
> the thread mentioned by Tom.
>
> [1] - https://www.postgresql.org/message-id/CAA4eK1%2B1H5Urm0_
> Wp-n5XszdLX1YXBqS_zW0f-vvWKwdh3eCJA%40mail.gmail.com
>
> --
> With Regards,
> Amit Kapila.
> EnterpriseDB: http://www.enterprisedb.com
>


Re: [HACKERS] Client Connection redirection support for PostgreSQL

2017-11-06 Thread Robert Haas
On Thu, Nov 2, 2017 at 4:33 PM, Craig Ringer  wrote:
>> Add the ability to the PostgreSQL server instance to route the traffic to a
>> different server instance based on the rules defined in server’s pg_hba.conf
>> configuration file. At a high level this enables offloading the user
>> requests to a different server instance based on the rules defined in the
>> pg_hba.conf configuration file.
>
> pg_hba.conf is "host based access [control]" . I'm not sure it's
> really the right place.

Well, we could invent someplace else, but I'm not sure I quite see the
point (full disclosure: I suggested the idea of doing this via
pg_hba.conf in an off-list discussion).

I do think the functionality is useful, for the same reasons that HTTP
redirects are useful.  For example, let's say you have all of your
databases for various clients on a single instance.  Then, one client
starts using a lot more resources, so you want to move that client to
a separate instance on another VM.  You can set up logical replication
to replicate all of the data to the new instance, and then add a
pg_hba.conf entry to redirect connections to that database to the new
master (this would be even smoother if we had multi-master replication
in core).  So now that client is moved off to another machine in a
completely client-transparent way.  I think that's pretty cool.
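
To illustrate the idea (with purely hypothetical syntax; neither the
keyword nor the format below is implemented anywhere, it is only a sketch
of what such a rule could look like):

    # hypothetical pg_hba.conf entry: redirect one database elsewhere
    # TYPE  DATABASE  USER  ADDRESS    ACTION
    host    clientdb  all   0.0.0.0/0  redirect=newhost.example.com:5432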

> When this has come up before, one of the issues has been determining
> what exactly should constitute "read only" vs "read write" for the
> purposes of redirecting work.

Yes, that needs some thought.

> Backends used just for a redirect would be pretty expensive though.

Not as expensive as proxying the whole connection, as pgpool and other
systems do today.  I think the in-core use of this redirect
functionality is useful, but I think the real win would be optionally
using it in pgpool and pgbouncer.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pow support for pgbench

2017-11-06 Thread Raúl Marín Rodríguez
Hi Fabien,

Thanks for the review.
I've fixed the documentation and added an ipow function that handles both
positive and negative ints, having 0^0 == 1 and 0^(negative) == PG_INT64_MAX
since that's what my glibc math.h pow() is returning.

On Sat, Nov 4, 2017 at 12:34 PM, Fabien COELHO  wrote:

>
> Hello Raúl,
>
> Sorry about the patch. Attaching it now so it can be considered as
>> submitted.
>>
>
> There is a typo in the XML doc:
>
> 1024.0/
>
> Please check that the documentation compiles.
>
> I'm at odds with having the integer version rely on a double pow(), even
> if it works. I think that there should be a specific integer version which
> does use integer operations. From stack overflow, the following is
> suggested:
>
>  int ipow(int base, int exp)
>  {
> int result = 1;
> while (exp)
> {
> if (exp & 1)
> result *= base;
> exp >>= 1;
> base *= base;
> }
>
> return result;
>  }
>
> The integer version should be when x & y are integers *AND* y >= 0.
>
> if y is a negative integer, the double version should be used.
>
> --
> Fabien.




-- 

*Raúl Marín Rodríguez*
carto.com
From 09105f8108e439834510ee5fb53036473f2977d5 Mon Sep 17 00:00:00 2001
From: Raul Marin 
Date: Fri, 13 Oct 2017 17:42:23 +0200
Subject: [PATCH] Add pow() support to pgbench

---
 doc/src/sgml/ref/pgbench.sgml|  7 
 src/bin/pgbench/exprparse.y  |  3 ++
 src/bin/pgbench/pgbench.c| 54 
 src/bin/pgbench/pgbench.h|  3 +-
 src/bin/pgbench/t/001_pgbench_with_server.pl |  9 +
 5 files changed, 75 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index 1f55967e40a..4e7f75ddf87 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -1233,6 +1233,13 @@ pgbench  options  d
sqrt(2.0)
1.414213562
   
+  
+   pow(x, y)
+   double if x or y are doubles, else integer
+   Numeric exponentiation
+   pow(2.0, 10)
+   1024.0
+  
  
  

diff --git a/src/bin/pgbench/exprparse.y b/src/bin/pgbench/exprparse.y
index 770be981f06..290bca99d12 100644
--- a/src/bin/pgbench/exprparse.y
+++ b/src/bin/pgbench/exprparse.y
@@ -334,6 +334,9 @@ static const struct
 	{
 		"!case_end", -2, PGBENCH_CASE
 	},
+	{
+		"pow", 2, PGBENCH_POW
+	},
 	/* keep as last array element */
 	{
 		NULL, 0, 0
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index add653bf90c..bdf3e97682a 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -739,6 +739,51 @@ getPoissonRand(TState *thread, int64 center)
 }
 
 /*
+ * pow() for integer values
+ */
+static int64
+ipow(int64 base, int64 exp)
+{
+	int64 result;
+
+	if (base == 0)
+	{
+		if (exp > 0)
+			return 0;
+		else if (exp == 0)
+			return 1;
+		return PG_INT64_MAX;
+	}
+
+	/*
+	 * For exp > 0 calculate normally
+	 * For exp == 0 return 1 * sign of base
+	 * For exp < 0 return 0 except when the base is 1 or -1
+	 */
+	if (exp > 0)
+	{
+		result = 1;
+		while (exp)
+		{
+			if (exp & 1)
+result *= base;
+			exp >>= 1;
+			base *= base;
+		}
+	}
+	else if (exp == 0)
+		result = (base > 0) - (base < 0);
+	else
+	{
+		result = 1 / base;
+		if (exp % 2 == 0)
+			result *= result;
+	}
+
+	return result;
+}
+
+/*
  * Initialize the given SimpleStats struct to all zeroes
  */
 static void
@@ -1474,6 +1519,7 @@ evalFunc(TState *thread, CState *st,
 		case PGBENCH_NE:
 		case PGBENCH_LE:
 		case PGBENCH_LT:
+		case PGBENCH_POW:
 			{
PgBenchValue *lval = &vargs[0],
		   *rval = &vargs[1];
@@ -1525,6 +1571,10 @@ evalFunc(TState *thread, CState *st,
 			setBoolValue(retval, ld < rd);
 			return true;
 
+		case PGBENCH_POW:
+			setDoubleValue(retval, pow(ld, rd));
+			return true;
+
 		default:
 			/* cannot get here */
 			Assert(0);
@@ -1602,6 +1652,10 @@ evalFunc(TState *thread, CState *st,
 
 			return true;
 
+		case PGBENCH_POW:
+			setIntValue(retval, ipow(li, ri));
+			return true;
+
 		default:
 			/* cannot get here */
 			Assert(0);
diff --git a/src/bin/pgbench/pgbench.h b/src/bin/pgbench/pgbench.h
index e1277a1dde6..9f26af92bf6 100644
--- a/src/bin/pgbench/pgbench.h
+++ b/src/bin/pgbench/pgbench.h
@@ -94,7 +94,8 @@ typedef enum PgBenchFunction
 	PGBENCH_LE,
 	PGBENCH_LT,
 	PGBENCH_IS,
-	PGBENCH_CASE
+	PGBENCH_CASE,
+	PGBENCH_POW
 } PgBenchFunction;
 
 typedef struct PgBenchExpr PgBenchExpr;
diff --git a/src/bin/pgbench/t/001_pgbench_with_server.pl b/src/bin/pgbench/t/001_pgbench_with_server.pl
index 8e19bbd3f45..c5ecd749c40 100644
--- a/src/bin/pgbench/t/001_pgbench_with_server.pl
+++ b/src/bin/pgbench/t/001_pgbench_with_server.pl
@@ -237,6 +237,10 @@ sub pgbench
 		qr{command=36.: int 36\b},
 		qr{command=37.: boolean true\b},
 		qr{command=38.: boolean true\b},
+		qr{command=44.: int 

Re: [HACKERS] why not parallel seq scan for slow functions

2017-11-06 Thread Robert Haas
On Mon, Nov 6, 2017 at 11:20 AM, Amit Kapila  wrote:
> On Mon, Nov 6, 2017 at 3:51 AM, Robert Haas  wrote:
>> This looks like it's on the right track to me.  I hope Tom will look
>> into it, but if he doesn't I may try to get it committed myself.
>>
>> -if (rel->reloptkind == RELOPT_BASEREL)
>> -generate_gather_paths(root, rel);
>> +if (rel->reloptkind == RELOPT_BASEREL &&
>> +root->simple_rel_array_size > 2 &&
>> +!root->append_rel_list)
>>
>> This test doesn't look correct to me.  Actually, it doesn't look
>> anywhere close to correct to me.  So, one of us is very confused...
>> not sure whether it's you or me.
>>
> It is quite possible that I haven't got it right, but it shouldn't be
> completely bogus as it stands the regression tests and some manual
> verification.  Can you explain what is your concern about this test?

Well, I suppose that test will fire for a baserel when the total
number of baserels is at least 3 and there's no inheritance involved.
But if there are 2 baserels, we're still not the topmost scan/join
target.  Also, even if inheritance is used, we might still be the
topmost scan/join target.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Early locking option to parallel backup

2017-11-06 Thread Andres Freund
On 2017-11-05 22:43:34 -0500, Tom Lane wrote:
> > IIUC the problem here is that even though a lock is already
> > held by the main backend an independent locker's request will prevent
> > the on-demand lock by the dump worker from being granted.  It seems to
> > me the correct fix here would be to somehow avoid the fairness logic in
> > the parallel dump case - although I don't quite know how to best do so.
> 
> I wonder if we couldn't somehow repurpose the work that was done for
> parallel workers' locks.  Lots of security-type issues to be handled
> if we're to open that up to clients, but maybe it's solvable.  For
> instance, maybe only allowing it to clients sharing the same snapshot
> would help.

Yea, I'd been wondering the same.

I'm slightly worried that somehow tying multiple clients into parallel
mode would cause a bunch of problems - that's not really the purpose of
the code and a number of its assumptions aren't quite right for that.

I'm not sure it really buys us much in contrast to just allowing a
locker to specify that it's allowed to jump the lock queue for an ASL if
it has 'backup' rights or such.  Or actually, just allow it as a general
option to LOCK, there's plenty other operational cases where the current
"fair" behaviour is really annoying, e.g. when executing operational
DDL/DML and such.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Early locking option to parallel backup

2017-11-06 Thread Robert Haas
On Mon, Nov 6, 2017 at 4:43 AM, Tom Lane  wrote:
> I wonder if we couldn't somehow repurpose the work that was done for
> parallel workers' locks.  Lots of security-type issues to be handled
> if we're to open that up to clients, but maybe it's solvable.  For
> instance, maybe only allowing it to clients sharing the same snapshot
> would help.

Interesting idea.  There's a bunch of holes that would need to be
patched there; for instance, you can't have one session running DDL
while somebody else has AccessShareLock.  Parallel query relies on the
parallel-mode restrictions to prevent that kind of thing from
happening, but it would be strange (and likely somewhat broken) to try
to enforce those here.  It would be strange and probably bad if LOCK
TABLE a; LOCK TABLE b in one session and LOCK TABLE b; LOCK TABLE a in
another session failed to deadlock.  In short, there's a big
difference between a single session using multiple processes and
multiple closely coordinated sessions.

Also, even if you did it, you still need a lot of PROCLOCKs.  Workers
don't need to take all locks up front because they can be assured of
getting them later, but they've still got to lock the objects they
actually want to access.  Group locking aims to prevent deadlocks
between cooperating processes; it is not a license to skip locking
altogether.

None of which is to say that the problems don't feel related somehow.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Restricting maximum keep segments by repslots

2017-11-06 Thread Andres Freund
Hi,

On 2017-10-31 18:43:10 +0900, Kyotaro HORIGUCHI wrote:
>   - distance:
> how many bytes LSN can advance before the margin defined by
> max_slot_wal_keep_size (and wal_keep_segments) is exhausted,
> or how many bytes of xlog this slot has lost from restart_lsn.

I don't think 'distance' is a good metric - that's going to continually
change. Why not store the LSN that's available and provide a function
that computes this? Or just rely on the lsn - lsn operator?

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Restricting maximum keep segments by repslots

2017-11-06 Thread Andres Freund
On 2017-11-06 11:07:04 +0800, Craig Ringer wrote:
> Would it make sense to teach xlogreader how to fetch from WAL archive,
> too? That way if there's an archive, slots could continue to be used
> even after we purge from local pg_xlog, albeit at a performance cost.
> 
> I'm thinking of this mainly for logical slots.

That seems more like a page read callback's job than xlogreader's.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Secondary index access optimizations

2017-11-06 Thread Konstantin Knizhnik

On 11/06/2017 04:27 AM, Thomas Munro wrote:
> On Fri, Sep 8, 2017 at 3:58 AM, Konstantin Knizhnik
>  wrote:
>> Updated version of the patch is attached to this mail.
>> Also I added support of date type to operator_predicate_proof to be able to
>> imply (logdate <= '2017-03-31') from (logdate < '2017-04-01') .
>
> Hi Konstantin,
>
> Is there any reason why you don't want to split this into two separate
> proposals?  One for remove_restrictions_implied_by_constraints() and
> one for the operator_predicate_proof() changes.
>
> Your v3 patch breaks the new partition_join test (the recently
> committed partition-wise join stuff), as far as I can tell in a good
> way.  Can you please double check those changes and post an updated
> patch?


Hi Thomas.

The primary idea of this patch was to provide more efficient plans for
queries on partitioned tables.
So remove_restrictions_implied_by_constraints() removes redundant
predicate checks.
But it doesn't work for standard Postgres 10 partitioning, because there
constraints are set using intervals with an open high boundary, and the
original version of operator_predicate_proof() is not able to handle this
case.

I have explained this problem in my previous e-mails in this thread.
This is why I have changed operator_predicate_proof() to handle this case
correctly.

If you think this patch should be split in two, OK: I can do it.
I just want to note that without patching operator_predicate_proof() it
may give no benefit for standard partitioning, which I expect to be the
most popular use case where this optimization can have an effect.


Concerning the broken partition_join test: it is an "expected" failure:
my patch removes redundant checks from the plans.
So the only required action is to update the expected file with the new
results.
Attached please find the updated patch.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 4339bbf..0931af1 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -626,12 +626,12 @@ EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE c1 IS NULL;-- Nu
Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" IS NULL))
 (3 rows)
 
-EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE c1 IS NOT NULL;-- NullTest
- QUERY PLAN  
--
+EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE c1 IS NOT NULL and c3 is not null;-- NullTest
+QUERY PLAN
+--
  Foreign Scan on public.ft1 t1
Output: c1, c2, c3, c4, c5, c6, c7, c8
-   Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" IS NOT NULL))
+   Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE ((c3 IS NOT NULL))
 (3 rows)
 
 EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE round(abs(c1), 0) = 1; -- FuncExpr
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index ddfec79..878bfc7 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -292,7 +292,7 @@ RESET enable_nestloop;
 EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE t1.c1 = 1; -- Var, OpExpr(b), Const
 EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE t1.c1 = 100 AND t1.c2 = 0; -- BoolExpr
 EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE c1 IS NULL;-- NullTest
-EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE c1 IS NOT NULL;-- NullTest
+EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE c1 IS NOT NULL and c3 is not null;-- NullTest
 EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE round(abs(c1), 0) = 1; -- FuncExpr
 EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE c1 = -c1;  -- OpExpr(l)
 EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE 1 = c1!;   -- OpExpr(r)
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index a5c1b68..082d1cc 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -346,6 +346,7 @@ set_rel_size(PlannerInfo *root, RelOptInfo *rel,
 		switch (rel->rtekind)
 		{
 			case RTE_RELATION:
+remove_restrictions_implied_by_constraints(root, rel, rte);
 if (rte->relkind == RELKIND_FOREIGN_TABLE)
 {
 	/* Foreign table */
@@ -1312,6 +1313,7 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
 			set_dummy_rel_pathlist(childrel);
 			

Re: [HACKERS] pow support for pgbench

2017-11-06 Thread Fabien COELHO


Hello Raúl,


I've fixed the documentation and added an ipow function that handles both
positive and negative ints, having 0^0 == 1 and 0^(negative) == PG_INT64_MAX
since that's what my glibc math.h pow() is returning.



From the comment:


 * For exp < 0 return 0 except when the base is 1 or -1

I think that it should do what POW does in psql, i.e.:

 fabien=# SELECT POW(2, -2); # 0.25

that is if exp < 0 the double version should be used, it should
not return 0.

Basically the idea is that the pgbench client-side version should behave 
the same as the SQL version.


--
Fabien.
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal - Default namespaces for XPath expressions (PostgreSQL 11)

2017-11-06 Thread Kyotaro HORIGUCHI
Thank you for the new patch.

- The latest patch is missing xpath_parser.h at least since
  ns-3. That of the first (not-numbered) version was still
  usable.

- c29c578 conflicts on doc/src/sgml/func.sgml


At Sun, 15 Oct 2017 12:06:11 +0200, Pavel Stehule  
wrote in 
> 2017-10-02 12:22 GMT+02:00 Kyotaro HORIGUCHI <
> horiguchi.kyot...@lab.ntt.co.jp>:
> 
> > Hi, thanks for the new patch.
> >
> > # The patch is missing xpath_parser.h. That of the first patch was usable.
> >
> > At Thu, 28 Sep 2017 07:59:41 +0200, Pavel Stehule 
> > wrote in 

Re: [HACKERS] path toward faster partition pruning

2017-11-06 Thread David Rowley
On 6 November 2017 at 23:01, Amit Langote  wrote:
> OK, I have gotten rid of the min/max partition index interface and instead
> adopted the bms_add_range() approach by including your patch to add the
> same in the patch set (which is now 0002 in the whole set).  I have to
> admit that it's simpler to understand the new code with just Bitmapsets to
> look at, but I'm still a bit concerned about materializing the whole set
> right within partition.c, although we can perhaps optimize it later.

Thanks for making that change. The code looks much more simple now.

For performance, if you're worried about a very large number of
partitions, then I think you're better off using bms_next_member()
rather than bms_first_member(), (likely this applies globally, but you
don't need to worry about those).

The problem with bms_first_member is that it must always loop over the
0 words before it finds any bits set for each call, whereas
bms_next_member will start on the word it was last called for. There
will likely be a pretty big performance difference between the two
when processing a large Bitmapset.
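
For illustration, the usual iteration pattern is (partindexes is just an
assumed variable name; the API is the existing one in nodes/bitmapset.h):

    int         i = -1;

    /*
     * bms_next_member() resumes from the word containing the previous
     * bit, so a full scan touches each word once rather than rescanning
     * the leading zeroed words on every call, as bms_first_member() must.
     */
    while ((i = bms_next_member(partindexes, i)) >= 0)
    {
        /* ... process partition index i ... */
    }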

> Attached updated set of patches, including the fix to make the new pruning
> code handle Boolean partitioning.

Thanks. I'll look over it all again starting my Tuesday morning. (UTC+13)



-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Display number of heap accesses for index scans

2017-11-06 Thread Alexander Korotkov
On Mon, Nov 6, 2017 at 6:33 AM, Tom Lane  wrote:

> Peter Geoghegan  writes:
> > Andres Freund  wrote:
> >> The number of index lookups that failed to return anything can be a
> >> critical performance factor in OLTP workloads.  Therefore it seems like
> >> it'd be a good idea to extend the explain analyze output to include that
> >> information.
>
> > I certainly agree.
>
> Doesn't the EXPLAIN (BUFFERS) output already address this?
>

In a plain index scan, EXPLAIN (ANALYZE, BUFFERS) doesn't distinguish
buffers accessed in the index from buffers accessed in the heap.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager

2017-11-06 Thread Masahiko Sawada
On Mon, Oct 30, 2017 at 3:17 PM, Masahiko Sawada  wrote:
> On Fri, Oct 27, 2017 at 12:03 AM, Robert Haas  wrote:
>> On Thu, Oct 26, 2017 at 12:36 PM, Masahiko Sawada  
>> wrote:
>>> Since the previous patch conflicts with current HEAD, I attached the
>>> updated patch for next CF.
>>
>> I think we should back up here and ask ourselves a couple of questions:
>
> Thank you for summarizing of the purpose and discussion of this patch.
>
>> 1. What are we trying to accomplish here?
>>
>> 2. Is this the best way to accomplish it?
>>
>> To the first question, the problem as I understand it as follows:
>> Heavyweight locks don't conflict between members of a parallel group.
>> However, this is wrong for LOCKTAG_RELATION_EXTENSION, LOCKTAG_PAGE,
>> LOCKTAG_TUPLE, and LOCKTAG_SPECULATIVE_TOKEN.  Currently, those cases
>> don't arise, because parallel operations are strictly read-only
>> (except for inserts by the leader into a just-created table, when only
>> one member of the group can be taking the lock anyway).  However, once
>> we allow writes, they become possible, so some solution is needed.
>>
>> To the second question, there are a couple of ways we could fix this.
>> First, we could continue to allow these locks to be taken in the
>> heavyweight lock manager, but make them conflict even between members
>> of the same lock group.  This is, however, complicated.  A significant
>> problem (or so I think) is that the deadlock detector logic, which is
>> already quite hard to test, will become even more complicated, since
>> wait edges between members of a lock group need to exist at some times
>> and not other times.  Moreover, to the best of my knowledge, the
>> increased complexity would have no benefit, because it doesn't look to
>> me like we ever take any other heavyweight lock while holding one of
>> these four kinds of locks.  Therefore, no deadlock can occur: if we're
>> waiting for one of these locks, the process that holds it is not
>> waiting for any other heavyweight lock.  This gives rise to a second
>> idea: move these locks out of the heavyweight lock manager and handle
>> them with separate code that does not have deadlock detection and
>> doesn't need as many lock modes.  I think that idea is basically
>> sound, although it's possibly not the only sound idea.
>
> I'm on the same page.
>
>>
>> However, that makes me wonder whether we shouldn't be a bit more
>> aggressive with this patch: why JUST relation extension locks?  Why
>> not all four types of locks listed above?  Actually, tuple locks are a
>> bit sticky, because they have four lock modes.  The other three kinds
>> are very similar -- all you can do is "take it" (implicitly, in
>> exclusive mode), "try to take it" (again, implicitly, in exclusive
>> mode), or "wait for it to be released" (i.e. share lock and then
>> release).  Another idea is to try to handle those three types and
>> leave the tuple locking problem for another day.
>>
>> I suggest that a good thing to do more or less immediately, regardless
>> of when this patch ends up being ready, would be to insert an
>> assertion that LockAcquire() is never called while holding a lock of
>> one of these types.  If that assertion ever fails, then the whole
>> theory that these lock types don't need deadlock detection is wrong,
>> and we'd like to find out about that sooner or later.
>
> I understood. I'll check that first.
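
(For illustration, the check Robert suggests might look roughly like this
at the top of LockAcquire() -- HoldingLockOfType() is an invented helper
that would scan the backend-local lock table, not an existing function:

    Assert(!HoldingLockOfType(LOCKTAG_RELATION_EXTEND) &&
           !HoldingLockOfType(LOCKTAG_PAGE) &&
           !HoldingLockOfType(LOCKTAG_TUPLE) &&
           !HoldingLockOfType(LOCKTAG_SPECULATIVE_TOKEN));

If it ever trips in the regression tests or in the field, the no-deadlock
theory is falsified.)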

I've checked whether LockAcquire() is called while holding a lock of one
of the four types: LOCKTAG_RELATION_EXTEND, LOCKTAG_PAGE,
LOCKTAG_TUPLE, and LOCKTAG_SPECULATIVE_TOKEN. To summarize, I think that
we cannot move all four lock types out of the heavyweight lock manager
together, but we can move the relation extension lock alone with some tricks.

Here is detail of the survey.

* LOCKTAG_RELATION_EXTEND
There is a path where LockRelationForExtension() can be called while
another relation extension lock is already held. In brin_getinsertbuffer(),
we acquire a relation extension lock on an index relation and may
initialize a new buffer (brin_initialize_empty_new_buffer()). While
initializing the new buffer, we call RecordPageWithFreeSpace(), which can
eventually call fsm_readbuf(rel, addr, true), where the third argument
means "extend". We can handle this by keeping a list (or local hash) of
acquired locks and skipping the acquisition when the lock is already
held. For the other call paths that reach LockRelationForExtension(), I
don't see any problem.
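
A minimal sketch of that bookkeeping (the hash and the function here are
invented for illustration, not part of the actual patch; a matching
release path would need to undo the entry, and refcounting is omitted):

    /* Backend-local set of relations whose extension lock we already hold. */
    static HTAB *held_ext_locks = NULL;

    static void
    ext_lock_acquire(Relation rel)
    {
        bool        found;

        if (held_ext_locks == NULL)
        {
            HASHCTL     ctl;

            MemSet(&ctl, 0, sizeof(ctl));
            ctl.keysize = sizeof(Oid);
            ctl.entrysize = sizeof(Oid);
            held_ext_locks = hash_create("held extension locks", 8, &ctl,
                                         HASH_ELEM | HASH_BLOBS);
        }

        hash_search(held_ext_locks, &rel->rd_id, HASH_ENTER, &found);
        if (!found)
            LockRelationForExtension(rel, ExclusiveLock);
        /* else: already held (e.g. BRIN -> FSM extension), skip reacquiring */
    }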

* LOCKTAG_PAGE, LOCKTAG_TUPLE, LOCKTAG_SPECULATIVE_TOKEN
There are paths where we can acquire a relation extension lock while
holding one of these locks.
For LOCKTAG_PAGE, ginInsertCleanup() acquires a page lock on the
meta page and processes the pending list, which can acquire a relation
extension lock on the index relation. For LOCKTAG_TUPLE,
heap_update() acquires a tuple lock and can call
RelationGetBufferForTuple(). For LOCKTAG_SPECULATIVE_TOKEN,
ExecInsert() acquires a speculative insertion lock and can likewise
reach RelationGetBufferForTuple() through the subsequent heap_insert().

Re: [HACKERS] Statement-level rollback

2017-11-06 Thread Thomas Munro
On Wed, Nov 1, 2017 at 6:47 AM, MauMau  wrote:
> From: Simon Riggs
> On 14 August 2017 at 23:58, Peter Eisentraut
>  wrote:
>> On 2/28/17 02:39, Tsunakawa, Takayuki wrote:
>>> The code for stored functions is not written yet, but I'd like your
> feedback for the specification and design based on the current patch.
> I'll add this patch to CommitFest 2017-3.
>>
>> This patch needs to be rebased for the upcoming commit fest.
>
> I'm willing to review this if the patch is going to be actively worked
> on.
>
>
> I'm very sorry I couldn't reply to your kind offer.  I rebased the
> patch and will add it to CF 2017/11.  I hope I will complete the patch
> in this CF.

Hi Tsunakawa-san,

With your v2 patch "make docs" fails.  Here is a small patch to apply
on top of yours to fix that and some small copy/paste errors, if I
understood correctly.

-- 
Thomas Munro
http://www.enterprisedb.com


docs-suggestion.patch
Description: Binary data



Re: [HACKERS] Statement-level rollback

2017-11-06 Thread MauMau
From: Thomas Munro
With your v2 patch "make docs" fails.  Here is a small patch to apply
on top of yours to fix that and some small copy/paste errors, if I
understood correctly.

Ouch, thanks.  I'd like to merge your fix when I submit the next
revision of my patch.

Regards
MauMau






Re: [HACKERS] why not parallel seq scan for slow functions

2017-11-06 Thread Amit Kapila
On Mon, Nov 6, 2017 at 3:51 AM, Robert Haas  wrote:
> On Sun, Nov 5, 2017 at 12:57 AM, Amit Kapila  wrote:
>> Thanks for the confirmation.  Find rebased patch attached.
>
> This looks like it's on the right track to me.  I hope Tom will look
> into it, but if he doesn't I may try to get it committed myself.
>
> -if (rel->reloptkind == RELOPT_BASEREL)
> -generate_gather_paths(root, rel);
> +if (rel->reloptkind == RELOPT_BASEREL &&
> +root->simple_rel_array_size > 2 &&
> +!root->append_rel_list)
>
> This test doesn't look correct to me.  Actually, it doesn't look
> anywhere close to correct to me.  So, one of us is very confused...
> not sure whether it's you or me.
>

It is quite possible that I haven't got it right, but it shouldn't be
completely bogus, as it stands up to the regression tests and some manual
verification.  Can you explain your concern about this test?

>  simple_gather_path = (Path *)
>  create_gather_path(root, rel, cheapest_partial_path, rel->reltarget,
> NULL, NULL);
> +
> +/* Add projection step if needed */
> +if (target && simple_gather_path->pathtarget != target)
> +simple_gather_path = apply_projection_to_path(root, rel,
> simple_gather_path, target);
>
> Instead of using apply_projection_to_path, why not pass the correct
> reltarget to create_gather_path?
>

We need to push the target down to the Gather's subpath, as is done in
apply_projection_to_path(), and then cost it accordingly.  I think if we
don't use apply_projection_to_path() we might end up duplicating much of
its code in generate_gather_paths().  In fact, I tried something similar
to what you are suggesting in the first version of the patch [1] and it
didn't turn out to be clean.  Also, I think we already do something
similar in create_ordered_paths().


> +/* Set or update cheapest_total_path and related fields */
> +set_cheapest(current_rel);
>
> I wonder if it's really OK to call set_cheapest() a second time for
> the same relation...
>

I think we can avoid it, if we want, by checking whether we have already
generated a gather path for the relation (basically, by checking whether
it has a partial path list).  Another idea is to treat the generation of
Gather/GatherMerge paths for the top-level scan/join relation as a
separate step and create a new kind of upper rel for it, which would be
mostly a dummy rel but would carry the Gather/GatherMerge paths.
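
For the first idea, the guard could be as simple as this sketch (re-run
set_cheapest() only when we could have added gather paths at all):

    /* sketch: skip the second set_cheapest() if there were no partial paths */
    if (current_rel->partial_pathlist != NIL)
        set_cheapest(current_rel);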

[1] - 
https://www.postgresql.org/message-id/CAA4eK1JUvL9WS9z%3D5hjSuSMNCo3TdBxFa0pA%3DE__E%3Dp6iUffUQ%40mail.gmail.com

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [Sender Address Forgery]Re: [HACKERS] path toward faster partition pruning

2017-11-06 Thread Amit Langote
On 2017/11/06 13:15, David Rowley wrote:
> On 31 October 2017 at 21:43, Amit Langote  
> wrote:
>> Attached updated version of the patches
> 
> match_clauses_to_partkey() needs to allow for the way quals on Bool
> columns are represented.
> 
> create table pt (a bool not null) partition by list (a);
> create table pt_true partition of pt for values in('t');
> create table pt_false partition of pt for values in('f');
> explain select * from pt where a = true;
> QUERY PLAN
> --
>  Append  (cost=0.00..76.20 rows=2810 width=1)
>->  Seq Scan on pt_false  (cost=0.00..38.10 rows=1405 width=1)
>  Filter: a
>->  Seq Scan on pt_true  (cost=0.00..38.10 rows=1405 width=1)
>  Filter: a
> (5 rows)
> 
> match_clause_to_indexcol() shows an example of how to handle this.
> 
> explain select * from pt where a = false;
> 
> will need to be allowed too. This works slightly differently.

You're right.  I've fixed things to handle Boolean partitioning in the
updated set of patches I will post shortly.
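
For the archives, the normalization needed is roughly what
match_clause_to_indexcol() already does -- a simplified sketch (the real
code also handles BooleanTest nodes and must verify that the Var is in
fact the partition key):

    /* clause is the qual being matched against a boolean partition key */
    Datum       rhs_value;

    if (IsA(clause, Var))
    {
        /* WHERE boolcol      =>  treat as boolcol = true  */
        rhs_value = BoolGetDatum(true);
    }
    else if (not_clause((Node *) clause) &&
             IsA(get_notclausearg((Expr *) clause), Var))
    {
        /* WHERE NOT boolcol  =>  treat as boolcol = false */
        rhs_value = BoolGetDatum(false);
    }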

Thanks,
Amit





Re: [HACKERS] [PATCH] Overestimated filter cost and its mitigation

2017-11-06 Thread Ashutosh Bapat
On Mon, Nov 6, 2017 at 10:01 AM, Thomas Munro
 wrote:
>
> This idea seems to makes intuitive sense.  I see that you use
> order_qual_clauses() to know what order they'll run in, so I'm
> wondering if there is any reason we shouldn't do it up front and keep
> it during path building, instead of running it again at plan creation
> time.  Is there some way it could ever produce a different result at
> the two times?

IIRC, the only thing that changes between plan-time quals and
execution-time quals is constant folding of constant parameters, and I
don't think we change the selectivity estimates when that's done.  At the
same time, I don't think we should make a lot of effort to ensure that
the order used during estimation is the same as the order at execution;
we are estimating anyway, and there can always be some difference between
what's estimated and what actually happens.

> Why not also apply this logic to qpquals of joins,
> foreign scans, subplans?  That is, instead of replacing cost_qual_eval
> with this code for baserels, I wonder if we should teach
> cost_qual_eval how to do this so those other users could also benefit
> (having perhaps already ordered the qual clauses earlier).

+1.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company




[HACKERS] Removing useless DISTINCT clauses

2017-11-06 Thread David Rowley
In [1] we made a change to the processing of the GROUP BY clause to
remove any GROUP BY items that are functionally dependent on other
GROUP BY items.

This really just checks if a table's PK columns are entirely present
in the GROUP BY clause and removes anything else belonging to that
table.

All this seems to work well, but I totally failed to consider that the
exact same thing applies to DISTINCT too.
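
A rough sketch of how the same check carries over, modeled on
remove_useless_groupby_columns() (the surplus-removal helper and the
per-rel attno set are invented names, and the offsetting of attnos by
FirstLowInvalidHeapAttributeNumber is glossed over):

    Bitmapset  *pkattnos;
    Oid         constraintOid;

    pkattnos = get_primary_key_attnos(rte->relid, false, &constraintOid);

    /*
     * If every PK column of this rel appears in the DISTINCT clause, the
     * rel's other columns are functionally dependent on the PK and can
     * be dropped from parse->distinctClause.
     */
    if (pkattnos != NULL &&
        bms_is_subset(pkattnos, distinct_attnos[relid]))
        remove_surplus_distinct_items(parse, relid, pkattnos);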

Over in [2], Rui Liu mentions that the planner could do a better job
for his case.

Using Rui Liu's example:

CREATE TABLE test_tbl ( k INT PRIMARY KEY, col text);
INSERT into test_tbl select generate_series(1,1000), 'test';

Master:

postgres=# explain analyze verbose select distinct col, k from test_tbl order by k limit 1000;
                                                               QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=1658556.19..1658563.69 rows=1000 width=9) (actual time=8934.962..8935.495 rows=1000 loops=1)
   Output: col, k
   ->  Unique  (cost=1658556.19..1733557.50 rows=1175 width=9) (actual time=8934.961..8935.460 rows=1000 loops=1)
         Output: col, k
         ->  Sort  (cost=1658556.19..1683556.63 rows=1175 width=9) (actual time=8934.959..8935.149 rows=1000 loops=1)
               Output: col, k
               Sort Key: test_tbl.k, test_tbl.col
               Sort Method: external merge  Disk: 215128kB
               ->  Seq Scan on public.test_tbl  (cost=0.00..154056.75 rows=1175 width=9) (actual time=0.062..1901.728 rows=1000 loops=1)
                     Output: col, k
 Planning time: 0.092 ms
 Execution time: 8958.687 ms
(12 rows)

Patched:

postgres=# explain analyze verbose select distinct col, k from test_tbl order by k limit 1000;
                                                                 QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.44..34.31 rows=1000 width=9) (actual time=0.030..0.895 rows=1000 loops=1)
   Output: col, k
   ->  Unique  (cost=0.44..338745.50 rows=1175 width=9) (actual time=0.029..0.814 rows=1000 loops=1)
         Output: col, k
         ->  Index Scan using test_tbl_pkey on public.test_tbl  (cost=0.44..313745.06 rows=1175 width=9) (actual time=0.026..0.452 rows=1000 loops=1)
               Output: col, k
 Planning time: 0.152 ms
 Execution time: 0.985 ms
(8 rows)

A patch to implement this is attached.

I'll add it to the Jan commitfest. (I don't expect anyone to look at
this before then).


[1] 
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=d4c3a156cb46dcd1f9f97a8011bd94c544079bb5

[2] 
https://www.postgresql.org/message-id/flat/CAKJS1f9q0j3BgMUsDbtf9%3DecfVLnqvkYB44MXj0gpVuamcN8Xw%40mail.gmail.com#CAKJS1f9q0j3BgMUsDbtf9=ecfvlnqvkyb44mxj0gpvuamcn...@mail.gmail.com

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


remove_useless_distinct_clauses.patch
Description: Binary data



[HACKERS] Race to build pg_isolation_regress in "make -j check-world"

2017-11-06 Thread Noah Misch
I've been enjoying the speed of parallel check-world, but I get spurious
failures from makefile race conditions.  Commit c66b438 fixed the simple ones.
More tricky is this problem of multiple "make" processes entering
src/test/regress concurrently, which causes failures like these:

  gcc: error: pg_regress.o: No such file or directory
  make[4]: *** [pg_isolation_regress] Error 1

  /bin/sh: ../../../src/test/isolation/pg_isolation_regress: Permission denied
  make -C test_extensions check
  make[2]: *** [check] Error 126
  make[2]: Leaving directory `/home/nm/src/pg/backbranch/10/src/test/isolation'

  /bin/sh: ../../../../src/test/isolation/pg_isolation_regress: Text file busy
  make[3]: *** [isolationcheck] Error 126
  make[3]: Leaving directory 
`/home/nm/src/pg/backbranch/10/src/test/modules/snapshot_too_old'

This is reproducible since commit 2038bf4 or earlier; "make -j check-world"
had worse problems before that era.  A workaround is to issue "make -j; make
-j -C src/test/isolation" before the check-world.  This problem doesn't affect
src/test/regress/pg_regress.  Every top-level "make" or "make install",
including temp-install, builds pg_regress.

I tried fixing this by building src/test/isolation at the same time we run
temp-install.  Naturally, that didn't help installcheck-world.  It also caused
multiple "make" processes to enter src/port concurrently.  I could fix both
check-world and installcheck-world with the attached hack of building
src/test/isolation during every top-level build or install.

The problem of multiple "make" processes in a directory (especially src/port)
shows up elsewhere.  In a cleaned tree, "make -j -C src/bin" or "make -j
installcheck-world" will do it.  For more-prominent use cases, src/Makefile
prevents this with ".NOTPARALLEL:" and building first the directories that are
frequent submake targets.  Perhaps we could fix the general problem with
directory locking; targets that call "$(MAKE) -C FOO" would first sleep until
FOO's lock is available.  That could be tricky to make robust.

For now, I propose back-patching the attached, sad hack.  Better ideas?

Thanks,
nm
diff --git a/src/Makefile b/src/Makefile
index 380da92..febbced 100644
--- a/src/Makefile
+++ b/src/Makefile
@@ -28,6 +28,7 @@ SUBDIRS = \
 	pl \
 	makefiles \
 	test/regress \
+	test/isolation \
 	test/perl
 
 # There are too many interdependencies between the subdirectories, so
diff --git a/src/test/isolation/Makefile b/src/test/isolation/Makefile
index 8eb4969..efbdc40 100644
--- a/src/test/isolation/Makefile
+++ b/src/test/isolation/Makefile
@@ -15,6 +15,13 @@ OBJS =  specparse.o isolationtester.o $(WIN32RES)
 
 all: isolationtester$(X) pg_isolation_regress$(X)
 
+# Though we don't install these binaries, build them during installation
+# (including temp-install).  Otherwise, "make -j check-world" and "make -j
+# installcheck-world" would spawn multiple, concurrent builds in this
+# directory.  Later builds would overwrite files while earlier builds are
+# reading them, causing occasional failures.
+install: | all
+
 submake-regress:
 	$(MAKE) -C $(top_builddir)/src/test/regress pg_regress.o
 
