Re: [HACKERS] Whether to back-patch fix for aggregate transtype width estimates

2016-06-17 Thread Robert Haas
On Fri, Jun 17, 2016 at 10:41 PM, Tom Lane  wrote:
> While fixing the recent unpleasantness over parallel polymorphic
> aggregates, I discovered that count_agg_clauses_walker's consideration
> of an aggregate argument's typmod in estimating transition space
> consumption has been broken since commit 34d26872e (which changed
> Aggref.args from a simple expression list to a list of TargetEntry).
> It had been looking at "exprTypmod((Node *) linitial(aggref->args))",
> and nobody taught it that there was now a TargetEntry in the way that
> it needed to look through to get a correct answer.
>
> Ordinarily I'd just summarily back-patch a fix, but that commit shipped
> in 9.0, which means it's been broken a long time.  I'm worried that
> back-patching a change might be more likely to destabilize plan choices
> than to do anything anybody's happy about.
>
> Thoughts?

I suspect the consequences here aren't too bad, or someone would have
noticed by now.  So I would be tempted to leave it alone in
back-branches.  But I might change my mind if it's actually awful...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 10.0

2016-06-17 Thread Josh Berkus
On 06/17/2016 10:04 AM, Alvaro Herrera wrote:
> Merlin Moncure wrote:
> 
>> Ugliness is a highly subjective qualifier.  OTOH, backwards
>> compatibility, at least when the checks are properly written :-), is a
>> very objective benefit.
> 
> This is the argument that made us keep the PostgreSQL name instead of
> renaming back to Postgres.  I'm not a fan of it.
> 

Well ... no.

We kept the PostgreSQL name for three reasons.

First, back in 2005, which was the last time we could reasonably have
changed it, nobody had the time/energy to do all of the
search-and-replace-and-contact-every-packager work required.  The folks
who were most enthusiastic about the change wanted someone else to do
the work.  Second, our Japanese community, which was something like 40%
of our worldwide community at the time, was opposed to the change.

The third reason is that we have a registered trademark on "PostgreSQL",
but "postgres" is public domain.

-- 
Josh Berkus
Red Hat OSAS
(any opinions are my own)




Re: [HACKERS] New design for FK-based join selectivity estimation

2016-06-17 Thread Tomas Vondra

Hi,

A few more comments after re-reading the patch more thoroughly. I 
wouldn't say any of these qualify as bugs; they're more discussion 
about some of the design choices:


1) NULL handling

I'd argue that we should do something about this, although I agree it's 
non-trivial to estimate - at least until we get some sort of correlation 
stats (e.g. my patch already provides most of the pieces, I believe). 
But I'd argue that in the case of multi-column foreign keys we can do 
better even without that - my experience is that in such cases either 
all of the values are NULL or none of them are, and a single NULL value 
breaks the FK match, of course. So I think max(null_frac) would work.
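In self-contained form, the suggested adjustment would be roughly the following (the function name and interface are invented for illustration; the real logic would live in the selectivity-estimation code):

```c
#include <assert.h>

/* Sketch of the proposed adjustment: scale the FK-based join selectivity
 * by the fraction of referencing rows with no NULLs, approximated as
 * 1 - max(null_frac) on the observation that, in practice, either all
 * FK columns are NULL or none of them are. */
static double
fk_selectivity_with_nulls(double base_sel, const double *null_frac, int ncols)
{
    double  max_null = 0.0;
    int     i;

    for (i = 0; i < ncols; i++)
        if (null_frac[i] > max_null)
            max_null = null_frac[i];

    return base_sel * (1.0 - max_null);
}
```

With per-column NULL fractions of 0.1 and 0.25, the FK selectivity would be discounted by 25%, not by the product of the two fractions, matching the all-or-nothing assumption.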


2) get_foreign_key_join_selectivity vs. incomplete matches

The comment right before checking (removedlist == NIL) says:

 * For a multi-column FK, it's possible that we found matches to some
 * columns but not all, implying that one of the above effects applied
 * to just some of the columns.  For the moment, we go ahead and
 * remove those clauses and apply the FK's selectivity anyway.  It
 * might be better to put back the removed clauses and ignore the FK;
 * but that amounts to betting on independence of the clauses, which
 * doesn't seem like a good bet in such messy cases.

Is this a good idea? I'd probably vote to do what's proposed (and 
rejected) in the second half of the comment, i.e. put back the clauses 
and skip the FK, as that pretty much says "improve the estimate or keep 
the current one, but don't risk making it worse."


3) ForeignKeyOptInfo->rinfos as a List

Can we actually get a list of matching RestrictInfos for a single 
foreign key? I've been unable to construct such a query.



regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




[HACKERS] Assert(LWLockHeldByMeInMode(lock, LW_EXCLUSIVE))

2016-06-17 Thread Thomas Munro
Hi hackers,

Several times now when reading, debugging and writing code I've wished
that LWLockHeldByMe assertions specified the expected mode, especially
where exclusive locking is required.

What do you think about something like the attached?  See also an
example of use.  I will add this to the next commitfest.
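Roughly, the API shape being proposed is the following (sketched here with toy bookkeeping - the struct fields are invented, and the real implementation would consult lwlock.c's held-locks tracking):

```c
#include <assert.h>

/* Toy model: LWLockHeldByMe() only says *whether* we hold the lock;
 * the proposed LWLockHeldByMeInMode() also checks *how* we hold it,
 * so callers can assert LW_EXCLUSIVE where a shared hold would be a bug. */
typedef enum LWLockMode
{
    LW_EXCLUSIVE,
    LW_SHARED
} LWLockMode;

typedef struct LWLock
{
    int         held_by_me;     /* stand-in for real per-backend bookkeeping */
    LWLockMode  mode;
} LWLock;

static int
LWLockHeldByMe(LWLock *lock)
{
    return lock->held_by_me;
}

static int
LWLockHeldByMeInMode(LWLock *lock, LWLockMode mode)
{
    return lock->held_by_me && lock->mode == mode;
}
```

The point of the patch is that `Assert(LWLockHeldByMe(lock))` passes even when the lock is held in the wrong mode, while `Assert(LWLockHeldByMeInMode(lock, LW_EXCLUSIVE))` catches that case.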

-- 
Thomas Munro
http://www.enterprisedb.com


bufmgr-assert-lwlock-mode.patch
Description: Binary data


test-lwlock-mode.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Whether to back-patch fix for aggregate transtype width estimates

2016-06-17 Thread Tom Lane
While fixing the recent unpleasantness over parallel polymorphic
aggregates, I discovered that count_agg_clauses_walker's consideration
of an aggregate argument's typmod in estimating transition space
consumption has been broken since commit 34d26872e (which changed
Aggref.args from a simple expression list to a list of TargetEntry).
It had been looking at "exprTypmod((Node *) linitial(aggref->args))",
and nobody taught it that there was now a TargetEntry in the way that
it needed to look through to get a correct answer.
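A self-contained sketch of what the lookup needs to do, with mock structs standing in for the real parse nodes (the actual fix is presumably along the lines of `exprTypmod((Node *) ((TargetEntry *) linitial(aggref->args))->expr)`):

```c
#include <assert.h>

/* Mock stand-ins for the relevant node types; since commit 34d26872e,
 * each element of Aggref.args is a TargetEntry wrapping the argument
 * expression, not the expression itself. */
typedef struct Expr
{
    int     typmod;
} Expr;

typedef struct TargetEntry
{
    Expr   *expr;               /* the actual argument expression */
} TargetEntry;

static int
exprTypmod(const Expr *e)
{
    return e ? e->typmod : -1;  /* -1: "no typmod", as in PostgreSQL */
}

/* Correct version: look *through* the TargetEntry to reach the
 * expression; applying exprTypmod to the list element directly is
 * what silently broke the transition-space estimate. */
static int
first_arg_typmod(const TargetEntry *first_arg)
{
    return exprTypmod(first_arg->expr);
}
```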

Ordinarily I'd just summarily back-patch a fix, but that commit shipped
in 9.0, which means it's been broken a long time.  I'm worried that
back-patching a change might be more likely to destabilize plan choices
than to do anything anybody's happy about.

Thoughts?

regards, tom lane




Re: [HACKERS] MultiXactId error after upgrade to 9.3.4

2016-06-17 Thread Alvaro Herrera
Alvaro Herrera wrote:
> Andrew Gierth wrote:

> > Why is the correct rule not "check for and ignore pre-upgrade mxids
> > before even trying to fetch members"?
> 
> I propose something like the attached patch, which implements that idea.

Here's a backpatch of that to 9.3 and 9.4.

I tested this by creating a 9.2 installation with an out-of-range
multixact, and verified that after upgrading this to 9.3 it fails with
the "apparent wraparound" message in VACUUM.  With this patch applied,
it silently removes the multixact.

I will clean this up and push to all branches after the tagging of
9.6beta2 on Monday.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
commit 5a57daa6ef9784c42f3cb2ec6e3a58d1e4004593
Author: Alvaro Herrera 
AuthorDate: Thu Jun 16 23:33:20 2016 -0400
CommitDate: Fri Jun 17 18:46:04 2016 -0400

Fix upgraded multixact problem

diff --git a/contrib/pgrowlocks/pgrowlocks.c b/contrib/pgrowlocks/pgrowlocks.c
index 075d781..ee5b241 100644
--- a/contrib/pgrowlocks/pgrowlocks.c
+++ b/contrib/pgrowlocks/pgrowlocks.c
@@ -165,8 +165,7 @@ pgrowlocks(PG_FUNCTION_ARGS)
 
 values[Atnum_ismulti] = pstrdup("true");
 
-allow_old = !(infomask & HEAP_LOCK_MASK) &&
-	(infomask & HEAP_XMAX_LOCK_ONLY);
+allow_old = HEAP_LOCKED_UPGRADED(infomask);
nmembers = GetMultiXactIdMembers(xmax, &members, allow_old);
 if (nmembers == -1)
 {
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 1f1bac5..44aa495 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -4641,8 +4641,7 @@ l5:
 		 * pg_upgrade; both MultiXactIdIsRunning and MultiXactIdExpand assume
 		 * that such multis are never passed.
 		 */
-		if (!(old_infomask & HEAP_LOCK_MASK) &&
-			HEAP_XMAX_IS_LOCKED_ONLY(old_infomask))
+		if (HEAP_LOCKED_UPGRADED(old_infomask))
 		{
 			old_infomask &= ~HEAP_XMAX_IS_MULTI;
 			old_infomask |= HEAP_XMAX_INVALID;
@@ -5001,6 +5000,7 @@ l4:
 int		i;
 MultiXactMember *members;
 
+/* XXX do we need a HEAP_LOCKED_UPGRADED test? */
nmembers = GetMultiXactIdMembers(rawxmax, &members, false);
 for (i = 0; i < nmembers; i++)
 {
@@ -5329,14 +5329,14 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
 	bool		has_lockers;
 	TransactionId update_xid;
 	bool		update_committed;
-	bool		allow_old;
 
 	*flags = 0;
 
 	/* We should only be called in Multis */
 	Assert(t_infomask & HEAP_XMAX_IS_MULTI);
 
-	if (!MultiXactIdIsValid(multi))
+	if (!MultiXactIdIsValid(multi) ||
+		HEAP_LOCKED_UPGRADED(t_infomask))
 	{
 		/* Ensure infomask bits are appropriately set/reset */
 		*flags |= FRM_INVALIDATE_XMAX;
@@ -5349,14 +5349,8 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
 		 * was a locker only, it can be removed without any further
 		 * consideration; but if it contained an update, we might need to
 		 * preserve it.
-		 *
-		 * Don't assert MultiXactIdIsRunning if the multi came from a
-		 * pg_upgrade'd share-locked tuple, though, as doing that causes an
-		 * error to be raised unnecessarily.
 		 */
-		Assert((!(t_infomask & HEAP_LOCK_MASK) &&
-HEAP_XMAX_IS_LOCKED_ONLY(t_infomask)) ||
-			   !MultiXactIdIsRunning(multi));
+		Assert(!MultiXactIdIsRunning(multi));
 		if (HEAP_XMAX_IS_LOCKED_ONLY(t_infomask))
 		{
 			*flags |= FRM_INVALIDATE_XMAX;
@@ -5397,9 +5391,8 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
 	 * anything.
 	 */
 
-	allow_old = !(t_infomask & HEAP_LOCK_MASK) &&
-		HEAP_XMAX_IS_LOCKED_ONLY(t_infomask);
-	nmembers = GetMultiXactIdMembers(multi, &members, allow_old);
+	nmembers =
+		GetMultiXactIdMembers(multi, &members, false);
 	if (nmembers <= 0)
 	{
 		/* Nothing worth keeping */
@@ -5959,14 +5952,15 @@ static bool
 DoesMultiXactIdConflict(MultiXactId multi, uint16 infomask,
 		LockTupleMode lockmode)
 {
-	bool	allow_old;
 	int		nmembers;
 	MultiXactMember *members;
 	bool	result = false;
 	LOCKMODE wanted = tupleLockExtraInfo[lockmode].hwlock;
 
-	allow_old = !(infomask & HEAP_LOCK_MASK) && HEAP_XMAX_IS_LOCKED_ONLY(infomask);
-	nmembers = GetMultiXactIdMembers(multi, &members, allow_old);
+	if (HEAP_LOCKED_UPGRADED(infomask))
+		return false;
+
+	nmembers = GetMultiXactIdMembers(multi, &members, false);
 	if (nmembers >= 0)
 	{
 		int		i;
@@ -6037,14 +6031,14 @@ static bool
 Do_MultiXactIdWait(MultiXactId multi, MultiXactStatus status,
    int *remaining, uint16 infomask, bool nowait)
 {
-	bool		allow_old;
 	bool		result = true;
 	MultiXactMember *members;
 	int			nmembers;
 	int			remain = 0;
 
-	allow_old = !(infomask & HEAP_LOCK_MASK) && HEAP_XMAX_IS_LOCKED_ONLY(infomask);
-	nmembers = GetMultiXactIdMembers(multi, &members, allow_old);
+	/* for pre-pg_upgrade tuples, no need to sleep at all */
+	nmembers = HEAP_LOCKED_UPGRADED(infomask) ? -1 :
+		GetMultiXactIdMembers(multi, &members, false);
 
 	if (nmembers >= 0)
 	{
@@ -6165,9 +6159,11 @@ heap_tuple_needs_freeze(HeapTupleHeader tuple, 
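The repeated condition being factored out in these hunks can be sketched as a macro along these lines (bit values invented here; the real definitions live in access/htup_details.h):

```c
#include <assert.h>
#include <stdint.h>

/* Invented bit values for illustration only. */
#define HEAP_XMAX_LOCK_ONLY   0x0080
#define HEAP_LOCK_MASK        0x0070    /* union of the lock-strength bits */
#define HEAP_XMAX_IS_MULTI    0x1000

/* A multixact carried over by pg_upgrade from a pre-9.3 cluster shows
 * up as lock-only with none of the (newer) lock-strength bits set; such
 * multis must be invalidated rather than passed to
 * GetMultiXactIdMembers(). */
#define HEAP_LOCKED_UPGRADED(infomask) \
    (((infomask) & HEAP_XMAX_IS_MULTI) != 0 && \
     ((infomask) & HEAP_XMAX_LOCK_ONLY) != 0 && \
     ((infomask) & HEAP_LOCK_MASK) == 0)
```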

Re: [HACKERS] Experimental dynamic memory allocation of postgresql shared memory

2016-06-17 Thread Aleksey Demakov
On Sat, Jun 18, 2016 at 3:43 AM, Tom Lane  wrote:
> DSM already exists, and for many purposes its lack of a
> within-a-shmem-segment dynamic allocator is irrelevant; the same purpose
> is served (with more speed, more reliability, and less code) by releasing
> the whole DSM segment when no longer needed.  The DSM segment effectively
> acts like a memory context, saving code from having to account precisely
> for every single allocation it makes.
>
> I grant that having a dynamic allocator added to DSM will support even
> more use-cases.  What I'm not convinced of is that we need a dynamic
> allocator within the fixed-size shmem segment.  Robert already listed some
> reasons why that's rather dubious, but I'll add one more: any leak becomes
> a really serious bug, because the only way to recover the space is to
> restart the whole database instance.
>

Okay, if you say that DSM segments work best for accumulating transient
data that may all be freed together when it becomes unnecessary, then I
agree with that.

My code is for long-lived data that can be allocated and freed chunk
by chunk - say, when an extension wants to store more data, in a more
complicated fashion, than fits into an ordinary dynahash with the
HASH_SHARED_MEM flag.

Regards,
Aleksey




Re: [HACKERS] forcing a rebuild of the visibility map

2016-06-17 Thread Robert Haas
Since time is short and it seems better to get the xlog page magic
bump done before beta2, I've gone ahead and committed this, but there
are two points here that seem to warrant some input from other senior
hackers.  Andres and I have discussed these points off list but
without reaching a meeting of the minds.

On Fri, Jun 17, 2016 at 2:59 PM, Robert Haas  wrote:
>> Having an XLogInsert() in contrib makes me more than a bit squeamish.  I
>> think it'd be a fair bit better to have that section of code in
>> visibilitymap.c, and then call that from the extension.
>
> I thought about that too, but it seemed like it was just bloating the
> core server for no real reason.  It's not like contrib is off in space
> someplace.

So, Andres thinks it's a bad idea to have an XLogInsert() call in
contrib, and I don't think it matters.  Anyone else have an opinion?
Andres, do you want to explain the nature of your concern further?

>>> + /* Don't keep the relation locked any longer than necessary! */
>>> + relation_close(rel, AccessExclusiveLock);
>>
>> Don't think that's a good idea. We use transactional semantics for a
>> good reason, and the function returns afterwards anyway.
>
> Yeah, but if you want to blow away a bunch of visibility map forks at
> one go, you're not going to appreciate this function collecting all of
> those locks at the same time.  This is also why VACUUM starts a
> separate transaction for each table.

Andres has a concern here that there might be a problem related to
invalidation messages.  Currently, the only invalidation message
that's getting generated here is a non-transactional smgr
invalidation, which is sent before the lock is released and therefore
no race exists.  However, Andres is concerned that this may at a
minimum not be very future-proof and possibly unsafe even today.  To
me, however, this doesn't seem particularly different than what e.g.
lazy_truncate_heap() always does and has done for a long time.  More
input here is welcome and appreciated - it would be good not to screw
this up.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Experimental dynamic memory allocation of postgresql shared memory

2016-06-17 Thread Tom Lane
Aleksey Demakov  writes:
> On Sat, Jun 18, 2016 at 12:45 AM, Tom Lane  wrote:
>> You're right, but that doesn't mean that the community is going to take
>> much interest in an unportable replacement for code that already exists.

> Excuse me, what code already exists? As far as I understand, we
> compare the approach taken in my code against Robert's code that
> is not yet available to the community.

DSM already exists, and for many purposes its lack of a
within-a-shmem-segment dynamic allocator is irrelevant; the same purpose
is served (with more speed, more reliability, and less code) by releasing
the whole DSM segment when no longer needed.  The DSM segment effectively
acts like a memory context, saving code from having to account precisely
for every single allocation it makes.

I grant that having a dynamic allocator added to DSM will support even
more use-cases.  What I'm not convinced of is that we need a dynamic
allocator within the fixed-size shmem segment.  Robert already listed some
reasons why that's rather dubious, but I'll add one more: any leak becomes
a really serious bug, because the only way to recover the space is to
restart the whole database instance.

regards, tom lane




Re: [HACKERS] Parallelized polymorphic aggs, and aggtype vs aggoutputtype

2016-06-17 Thread Tom Lane
I wrote:
> Meanwhile, I'll go see if I can work out exactly what's broken about the
> polymorphic type-slinging.  That I would like to see working in beta2.

OK, I've got at least the core of the problem.  fix_combine_agg_expr
changes the upper aggregate's Aggref node so that its args list is a
single Var of the transtype.  This has nothing whatsoever to do with the
original user-supplied aggregate arguments, not as to datatype nor even
number of arguments.  Nonetheless, nodeAgg.c uses that args list as input
to get_aggregate_argtypes(), and thence to resolve_aggregate_transtype()
and build_aggregate_finalfn_expr(), as well as some miscellaneous checks
in build_pertrans_for_aggref (though by luck it appears that those checks
aren't reachable for a combining Aggref).

It's basically just luck that this ever works; we'd have noticed it long
ago if it weren't that so few of the aggregates that currently have
combinefns are either polymorphic or multiple-argument.  Likely the
only polymorphic one that ever got tested is anyarray_agg, and that
accidentally fails to fail because (1) enforce_generic_type_consistency
has special-case exceptions for ANYARRAY, and (2) the array-mashing code
doesn't actually need to know which array type it's dealing with
since it finds that out by looking into the array values.

We could fix the resolve_aggregate_transtype aspect of this by storing
the resolved transtype into Aggref nodes sometime before whacking them
into partial format, rather than re-deducing it during nodeAgg startup.
I'm inclined to do that just on efficiency grounds; but it's not enough
to fix the problem here, because we'd still be passing the wrong argument
types to build_aggregate_finalfn_expr(), and any finalfunc that actually
used that information would fail.

It seems like the best way to fix this is to add another field to Aggref
that is an OID list of the original argument types, which we don't change
when replacing the argument list for a combining agg, and to look to that
rather than the physical args list when extracting the argument types in
nodeAgg.c.  This is kind of bulky and grotty, but it should work reliably.

I also thought about leaving the original args list in place, and having
a secondary args list used only for combining Aggrefs that contains just
the Var referencing the subsidiary transtype result.  An attraction of
that is that we'd not need that Rube-Goldberg-worthy method for printing
combining Aggrefs in ruleutils.c.  However, the cruft would still leak out
somewhere, because if we just did that much then setrefs.c would want to
find all the aggregate input values in the output of the subsidiary plan
node, where of course they aren't and shouldn't be.  Maybe we could skip
resolving the original args list into subplan references in this case,
but that might well break ruleutils, so it just seems like a mess.

Yet another idea is to have nodeAgg do something similar to what ruleutils
does and drill down through the subsidiary-transtype-result Var to find
the partial Aggref and grab its argument list.  But that's just doubling
down on a mechanism that we should be looking to get rid of not emulate.

So at this point my proposal is:

1. Add an OID-list field to Aggref holding the data types of the
user-supplied arguments.  This can be filled in parse analysis since it
won't change thereafter.  Replace calls to get_aggregate_argtypes() with
use of the OID-list field, or maybe keep the function but have it look
at the OID list not the physical args list.

2. Add an OID field to Aggref holding the resolved (non polymorphic)
transition data type's OID.  For the moment we could just set this
during parse analysis, since we do not support changing the transtype
of an existing aggregate.  If we ever decide we want to allow that,
the lookup could be postponed into the planner.  Replace calls to
resolve_aggregate_transtype with use of this field.
(resolve_aggregate_transtype() wouldn't disappear, but it would be
invoked only once during parse analysis, not multiple times per query.)

#2 isn't necessary to fix the bug, but as long as we are doing #1
we might as well do #2 too, to buy back some of the planner overhead
added by parallel query.

Barring objections I'll make this happen by tomorrow.

I still don't like anything about the way that the matching logic in
fix_combine_agg_expr works, but at the moment I can't point to any
observable bug from that, so I'll not try to change it before beta2.

regards, tom lane




Re: [HACKERS] Experimental dynamic memory allocation of postgresql shared memory

2016-06-17 Thread Aleksey Demakov
On Sat, Jun 18, 2016 at 12:45 AM, Tom Lane  wrote:
> Aleksey Demakov  writes:
>> On Fri, Jun 17, 2016 at 10:54 PM, Robert Haas  wrote:
>>> In my opinion, that's not going to fly.  If I thought otherwise, I
>>> would not have developed the DSM facility in the first place.
>
>> Essentially this is pessimizing for the lowest common denominator
>> among OSes.
>
> You're right, but that doesn't mean that the community is going to take
> much interest in an unportable replacement for code that already exists.

Excuse me, what code already exists? As far as I understand, we're
comparing the approach taken in my code against Robert's code, which
is not yet available to the community.

Discussing DSM is beside the point.

My code might be smoothly hooked into the existing system from an
extension module just with a couple of calls:

RequestAddinShmemSpace() and ShmemInitStruct().

After that this extension might use my concurrent memory allocator
and safe memory reclamation for implementing highly optimized
concurrent data structures of their choice. E.g. concurrent data
structures that I am going to add to the package in the future.
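In rough, self-contained form, that hookup looks like the following (the server functions are mocked out here so the sketch stands alone, and the state struct is invented; the real declarations live in storage/shmem.h):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for the server API; in a real extension these come from
 * postgres.h / storage/shmem.h and operate on the main shmem segment. */
static unsigned char arena[4096];
static size_t arena_used = 0;

static void
RequestAddinShmemSpace(size_t size)
{
    (void) size;                /* the real one reserves space up front */
}

static void *
ShmemInitStruct(const char *name, size_t size, bool *found)
{
    void   *p = arena + arena_used;

    (void) name;
    *found = false;             /* pretend we are the first backend */
    arena_used += size;
    return p;
}

/* Invented example state for an extension's allocator. */
typedef struct MyAllocatorState
{
    size_t  bytes_total;
    size_t  bytes_free;
} MyAllocatorState;

static MyAllocatorState *my_state;

static void
my_shmem_startup(void)
{
    bool    found;

    my_state = ShmemInitStruct("my_allocator",
                               sizeof(MyAllocatorState), &found);
    if (!found)
    {
        /* first backend through initializes the shared state */
        my_state->bytes_total = 1024;
        my_state->bytes_free = 1024;
    }
}
```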

All in all, currently this is not a replacement for anything. This is an
experimental add-on and food for thought for interested people.

Integrating my code right into the core to replace anything there is a
very remote possibility. I understand that if it ever happens, it will
take very serious work and multiple iterations.

> Especially not an unportable replacement that also needs sweeping
> assumptions like "disciplined use of mmap in postgresql core and
> extensions".  You don't have to look further than the availability of
> mmap to plperlu programmers to realize that that won't fly.  (Even if
> we threw all the untrusted PLs overboard, I believe plain old stdio
> is willing to use mmap in many versions of libc.)
>

Sorry. I made a sloppy statement about mmap/munmap use. As
correctly pointed out by Andres Freund, it is problematic. So the
whole line about "disciplined use of mmap in postgresql core and
extensions" goes away. Forget it.

But the other techniques that I mentioned do not take such a
special discipline.

The corrected statement is that a single contiguous shared space is
practically doable on many platforms with some effort. And this
approach would make the implementation of many shared data structures
more efficient.

Furthermore, I'd guess there is not much point in enabling parallel
query execution on a MacBook. Or at least one wouldn't expect superb
results from it anyway.

I'd make a wild claim that the users who would benefit from parallel
queries or my concurrency work are, most of the time, the same users
who run platforms that can support a single address space.

Thus if there is a solution that benefits, say, 95% of target users,
then why refrain from it in the name of the other 5%? Shouldn't the
support of those 5% be treated as a lower-priority fallback, while the
main effort is put into optimizing for the 95-percenters?

Regards,
Aleksey




Re: [HACKERS] Parallelized polymorphic aggs, and aggtype vs aggoutputtype

2016-06-17 Thread Alvaro Herrera
Tom Lane wrote:
> Alvaro Herrera  writes:

> > Can't we have them be ExtensibleNode?
> 
> No, these are data types and values of data types, not parsetree nodes.

Ah, right.

> The core problem is to teach the type system to prevent you from
> sticking foo(internal) into a context where the actual value passed
> will be some other kind of "internal" than it's expecting.

Makes sense as a safety measure.

> Having said that, another possible solution is to establish a convention
> that every sort of "internal" thingy should have an ID word (with a
> different magic number for each kind of thingy) that all functions check.
> But I'm not convinced that that's going to be convenient to do everywhere,
> and it's certainly not likely to be easier to bolt on than a bunch of new
> internal-ish type OIDs.

I think the magic number is a lot more convenient than new type OIDs.
As the number of internal types grows, it will become a nuisance,
considering that you need to add boilerplate for in/out/send/recv
functions, document it as a pseudotype, yadda yadda yadda.  Some central
place to define the internal numbers to ensure uniqueness, akin to
duplicate_oids, is probably a necessity, though, which may be a bit too
much scripting to be indulging in, this late in the cycle.
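A minimal sketch of that ID-word convention (the magic value, struct, and helpers are all invented for illustration):

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Invented magic number identifying one particular kind of "internal"
 * value; each kind would get its own, registered in one central place
 * to guarantee uniqueness. */
#define AGG_BLOB_MAGIC  0x41424C31

typedef struct AggBlob
{
    uint32_t    magic;          /* ID word, always the first field */
    double      sum;
    int64_t     count;
} AggBlob;

static AggBlob *
make_agg_blob(void)
{
    AggBlob    *blob = calloc(1, sizeof(AggBlob));

    blob->magic = AGG_BLOB_MAGIC;
    return blob;
}

/* What every function taking "internal" would check before trusting
 * the pointer it was handed. */
static int
is_agg_blob(const void *ptr)
{
    return ptr != NULL &&
        ((const AggBlob *) ptr)->magic == AGG_BLOB_MAGIC;
}
```

The check catches a pointer to the wrong kind of thingy at run time, which is exactly what the type system can't do once everything is declared `internal`.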

I suppose we could have done pg_ddl_command this way, but it's already
in 9.5 so it's probably too late for that.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Should phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue') be true ?

2016-06-17 Thread Robert Haas
On Fri, Jun 17, 2016 at 11:07 AM, Teodor Sigaev  wrote:
>>> Such an algorithm finds the closest pair (Lpos, Rpos), but a
>>> satisfying pair might not be the closest one; for example:
>>> to_tsvector('simple', '1 2 1 2') @@ '1 <3> 2';
>>
>>
>> Oh ... the indexes in the lists don't have much to do with the distances,
>> do they.  OK, maybe it's not quite as easy as I was thinking.  I'm
>> okay with the patch as presented.
>
>
> Huh, I found that my patch isn't correct for the example I showed :(.
> A reworked patch is attached.

We're really quickly running out of time to get this done before
beta2.  Please don't commit anything that's going to break the tree
because we only have about 72 hours before the wrap, but if it's
correct then it should go in.
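The '1 2 1 2' example upthread can be checked with a tiny standalone helper (invented here; the real matching logic lives in the tsquery phrase-search code):

```c
#include <assert.h>

/* Does any pair (l, r) of lexeme positions satisfy r - l == dist?
 * For '1 2 1 2', lexeme '1' occurs at positions {1, 3} and '2' at
 * {2, 4}: the closest pairs are only distance 1 apart, yet (1, 4)
 * still satisfies '1 <3> 2' - so an algorithm that inspects only the
 * closest pair gives the wrong answer. */
static int
phrase_distance_match(const int *lpos, int nl,
                      const int *rpos, int nr, int dist)
{
    int     i,
            j;

    for (i = 0; i < nl; i++)
        for (j = 0; j < nr; j++)
            if (rpos[j] - lpos[i] == dist)
                return 1;
    return 0;
}
```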

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Parallelized polymorphic aggs, and aggtype vs aggoutputtype

2016-06-17 Thread Tom Lane
Alvaro Herrera  writes:
> Tom Lane wrote:
>> Robert Haas  writes:
>>> I think we should break up internal into various kinds of internal
>>> depending on what kind of a thing we've actually got a pointer to.

>> Not a bad long-term project, but it's not happening in this cycle.
>> I'm not very sure how we'd go about it anyway --- for examples
>> like this, every new user-defined aggregate potentially wants its
>> own flavor of "internal", so how do we manage that?

> Can't we have them be ExtensibleNode?

No, these are data types and values of data types, not parsetree nodes.
The core problem is to teach the type system to prevent you from
sticking foo(internal) into a context where the actual value passed
will be some other kind of "internal" than it's expecting.

Having said that, another possible solution is to establish a convention
that every sort of "internal" thingy should have an ID word (with a
different magic number for each kind of thingy) that all functions check.
But I'm not convinced that that's going to be convenient to do everywhere,
and it's certainly not likely to be easier to bolt on than a bunch of new
internal-ish type OIDs.

regards, tom lane




Re: [HACKERS] Parallelized polymorphic aggs, and aggtype vs aggoutputtype

2016-06-17 Thread Robert Haas
On Fri, Jun 17, 2016 at 3:14 PM, Tom Lane  wrote:
> The concern I have is that you could stick it into an aggregate that isn't
> the one it's expecting to be used in, or into a slot in that aggregate
> that isn't deserialize(), and the run-time test can't detect either of
> those things.  Now maybe this is locked down sufficiently by the fact
> that we don't let non-superusers create aggregates with transtype
> INTERNAL, but that seems pretty shaky to me given the number of moving
> parts in aggregates these days and the fact that we keep adding more.

Well, I'm not averse to changing it for more security, but "there
could be a bug in there somewhere" is a bit different from "the claim
in the comment there that it's okay if we check for aggregate context
is a joke".

>>> Not to mention that CREATE
>>> FUNCTION won't allow creation of such functions, so extensions are locked
>>> out of using this feature.
>
>> Oops.
>
> I think that means we *have* to change this.

Well, we don't *have* to change things for this reason, but it's
certainly not at all desirable for user-defined aggregates to be
locked out of this functionality.  So I'm in favor of changing it.

>> I think we should break up internal into various kinds of internal
>> depending on what kind of a thing we've actually got a pointer to.
>
> Not a bad long-term project, but it's not happening in this cycle.
> I'm not very sure how we'd go about it anyway --- for examples
> like this, every new user-defined aggregate potentially wants its
> own flavor of "internal", so how do we manage that?

I think we'd want some way to easily spin up new internal-ish types.
CREATE TYPE myinternalthingy AS INTERNAL, or something like that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Parallelized polymorphic aggs, and aggtype vs aggoutputtype

2016-06-17 Thread Alvaro Herrera
Tom Lane wrote:
> Robert Haas  writes:

> > I think we should break up internal into various kinds of internal
> > depending on what kind of a thing we've actually got a pointer to.
> 
> Not a bad long-term project, but it's not happening in this cycle.
> I'm not very sure how we'd go about it anyway --- for examples
> like this, every new user-defined aggregate potentially wants its
> own flavor of "internal", so how do we manage that?

Can't we have them be ExtensibleNode?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallelized polymorphic aggs, and aggtype vs aggoutputtype

2016-06-17 Thread Tom Lane
Robert Haas  writes:
> On Fri, Jun 17, 2016 at 1:45 PM, Tom Lane  wrote:
>> there's a far bigger problem which is that "deserialize(bytea) returns
>> internal" is a blatant security hole, which I see you ignored the defense
>> against in opr_sanity.sql.  The claim in the comment there that it's okay
>> if we check for aggregate context is a joke --- or haven't you heard that
>> we allow users to create aggregates?  That's not nearly good enough to
>> prevent unsafe usage of such functions.

> That may be true, but it isn't obvious to me.  The user can't simply
> say SELECT numeric_avg_deserialize(...) or whatever because the
> function won't execute when called that way.  How would you arrange to
> call such a function in a context where it wouldn't fail outright but
> would return a value that you could then hand to some other function
> that expects a type-internal input?

The concern I have is that you could stick it into an aggregate that isn't
the one it's expecting to be used in, or into a slot in that aggregate
that isn't deserialize(), and the run-time test can't detect either of
those things.  Now maybe this is locked down sufficiently by the fact
that we don't let non-superusers create aggregates with transtype
INTERNAL, but that seems pretty shaky to me given the number of moving
parts in aggregates these days and the fact that we keep adding more.
But whether there's a live security bug there today is really moot,
because of:

>> Not to mention that CREATE
>> FUNCTION won't allow creation of such functions, so extensions are locked
>> out of using this feature.

> Oops.

I think that means we *have* to change this.

>> A possible solution is to give deserialize an extra dummy argument, along
>> the lines of "deserialize(bytea, internal) returns internal", thereby
>> ensuring it can't be called in any non-system-originated contexts.

> Seems promising.

If nobody has a better idea, I'll pursue that next week or so.  It doesn't
seem like a must-fix-for-beta2.  (Letting it slide means we're locked into
an initdb or pg_upgrade again for beta3, but I don't have a lot of hope of
avoiding that anyway.)

Meanwhile, I'll go see if I can work out exactly what's broken about the
polymorphic type-slinging.  That I would like to see working in beta2.

> I originally had the thought that you might want to make the
> serialized representation something that could be human-readable.

Perhaps, but on efficiency grounds I doubt that people would want that
to be the same API that's used for partial aggregation.  I could see
adding an additional function slot "prettyprint(internal) returns text"
to aggregates for this purpose.  In any case, even if we lock down the
declared type to be bytea, there's nothing stopping someone from defining
their serialization code to output an ASCII string.

> I think we should break up internal into various kinds of internal
> depending on what kind of a thing we've actually got a pointer to.

Not a bad long-term project, but it's not happening in this cycle.
I'm not very sure how we'd go about it anyway --- for examples
like this, every new user-defined aggregate potentially wants its
own flavor of "internal", so how do we manage that?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel safety tagging of extension functions

2016-06-17 Thread Robert Haas
On Thu, Jun 16, 2016 at 9:15 AM, Andreas Karlsson  wrote:
>> That was the only clear mistake I found, but I tend
>> to think that changing the markings on anything defined by
>> UNSUPPORTED_FUNCTION() is pretty silly, because there's no point in
>> going to extra planner effort to generate a parallel plan only to
>> error out as soon as we try to execute it.  I think you should leave
>> all of those out of the patch.
>
> I will fix this.
>
>> I also took a look at the patch for tablefunc.  I think that you've
>> got the markings right here, but I think that it would be good to add
>> PARALLEL UNSAFE explicitly to the 1.1 version of the file for the
>> functions that are unsafe, and add a comment like "-- query might do
>> anything" or some other indication as to why they are so marked, for
>> the benefit of future readers.
>
> Good suggestion.

I was kind of hoping you'd have a new version of this posted already.
beta2 is wrapping on Monday, and I'm inclined to shelve anything that
isn't done by then for the next release.  And I don't really plan to
work much this weekend.
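
For future readers, the style being suggested might look something like
this in the 1.1 script (a hypothetical sketch: the function signatures
are from tablefunc, but the markings shown are illustrative, not
necessarily what was ultimately committed):

```sql
-- tablefunc--1.1.sql (hypothetical excerpt)
ALTER FUNCTION normal_rand(int4, float8, float8) PARALLEL SAFE;
ALTER FUNCTION crosstab(text) PARALLEL UNSAFE;  -- query might do anything
ALTER FUNCTION connectby(text, text, text, text, int)
    PARALLEL UNSAFE;                            -- query might do anything
```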

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] forcing a rebuild of the visibility map

2016-06-17 Thread Robert Haas
On Fri, Jun 17, 2016 at 2:48 PM, Andres Freund  wrote:
>> From 010e99b403ec733d50c71a7d4ef646b1b446ef07 Mon Sep 17 00:00:00 2001
>> From: Robert Haas 
>> Date: Wed, 15 Jun 2016 22:52:58 -0400
>> Subject: [PATCH 2/2] Add VACUUM (DISABLE_PAGE_SKIPPING) for emergencies.
>
>>Furthermore,
>> +  except when performing an aggressive vacuum, some pages may be skipped
>> +  in order to avoid waiting for other sessions to finish using them.
>
> Isn't that still the case? We'll not wait for a cleanup lock and such,
> if not necessary?

That's just there to explain what behavior gets changed when you
specify DISABLE_PAGE_SKIPPING.
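
Concretely, with this patch applied the new option is spelled like so
(table name illustrative):

```sql
-- Visit every page, ignoring the visibility map entirely:
VACUUM (DISABLE_PAGE_SKIPPING) some_table;

-- It can be combined with the other parenthesized options:
VACUUM (FREEZE, VERBOSE, DISABLE_PAGE_SKIPPING) some_table;
```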

>>  /* non-export function prototypes */
>> -static void lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
>> -Relation *Irel, int nindexes, bool aggressive);
>> +static void lazy_scan_heap(Relation onerel, int options,
>> +LVRelStats *vacrelstats, Relation *Irel, int 
>> nindexes,
>> +bool aggressive);
>
> Generally I think it's better to pass bitmask arguments as an unsigned
> integer. But since other related routines already use int...

Many years ago, I was told by Tom Lane that project standard was int.
My gut was to use unsigned too, but I've got better things to do than
argue about it - and project style is a legitimate argument in any
case.

>>   /*
>> -  * We request an aggressive scan if either the table's frozen Xid is 
>> now
>> +  * We request an aggressive scan if the table's frozen Xid is now
>>* older than or equal to the requested Xid full-table scan limit; or 
>> if
>>* the table's minimum MultiXactId is older than or equal to the 
>> requested
>> -  * mxid full-table scan limit.
>> +  * mxid full-table scan limit; or if DISABLE_PAGE_SKIPPING was 
>> specified.
>>*/
>>   aggressive = 
>> TransactionIdPrecedesOrEquals(onerel->rd_rel->relfrozenxid,
>>  
>> xidFullScanLimit);
>>   aggressive |= MultiXactIdPrecedesOrEquals(onerel->rd_rel->relminmxid,
>>  
>>mxactFullScanLimit);
>> + if (options & VACOPT_DISABLE_PAGE_SKIPPING)
>> + aggressive = true;
>
> I'm inclined to convert the aggressive |= to an if; it looks a bit
> jarring if consecutive assignments use differing forms.

I thought about that, but I like it better the way I have it.

> This code really makes me unhappy, everytime I see it.

That's not a defect in this patch.

>> + | IDENT
>> + {
>> + if (strcmp($1, 
>> "disable_page_skipping") == 0)
>> + $$ = 
>> VACOPT_DISABLE_PAGE_SKIPPING;
>> + else
>> + ereport(ERROR,
>> + 
>> (errcode(ERRCODE_SYNTAX_ERROR),
>> +  errmsg("unrecognized 
>> VACUUM option \"%s\"", $1),
>> +  
>> parser_errposition(@1)));
>> + }
>>   ;
>
> If we could get rid of that indentation behaviour by pgindent, I'd be
> ecstatic.

Neither is that, although the indentation there looks fine to me.

>>  /*
>> + * Remove the visibility map fork for a relation.  If there turn out to be
>> + * any bugs in the visibility map code that require rebuilding the VM, this
>> + * provides users with a way to do it that is cleaner than shutting down the
>> + * server and removing files by hand.
>> + *
>> + * This is a cut-down version of RelationTruncate.
>> + */
>> +Datum
>> +pg_truncate_visibility_map(PG_FUNCTION_ARGS)
>> +{
>> + Oid relid = PG_GETARG_OID(0);
>> + Relationrel;
>> +
>> + rel = relation_open(relid, AccessExclusiveLock);
>> +
>> + if (rel->rd_rel->relkind != RELKIND_RELATION &&
>> + rel->rd_rel->relkind != RELKIND_MATVIEW &&
>> + rel->rd_rel->relkind != RELKIND_TOASTVALUE)
>> + ereport(ERROR,
>> + (errcode(ERRCODE_WRONG_OBJECT_TYPE),
>> +errmsg("\"%s\" is not a table, materialized view, or TOAST 
>> table",
>> +   RelationGetRelationName(rel))));
>> +
>> + RelationOpenSmgr(rel);
>> + rel->rd_smgr->smgr_vm_nblocks = InvalidBlockNumber;
>> +
>> + visibilitymap_truncate(rel, 0);
>> +
>> + if (RelationNeedsWAL(rel))
>> + {
>> + xl_smgr_truncate xlrec;
>> +
>> + xlrec.blkno = 0;
>> + xlrec.rnode = rel->rd_node;
>> + xlrec.flags = SMGR_TRUNCATE_VM;
>> +
>> + 

Re: [HACKERS] forcing a rebuild of the visibility map

2016-06-17 Thread Andres Freund
Hi,


On 2016-06-15 23:06:37 -0400, Robert Haas wrote:
> After having been scared out of my mind by the discovery of
> longstanding breakage in heap_update[1], it occurred to me that this
> is an excellent example of a case in which the option for which Andres
> was agitating - specifically forcing VACUUM to visit every single page
> in the heap - would be useful. [...]  Being able
> to simply force every page to be visited is better.  Patch for that
> attached.  I decided to call the option disable_page_skipping rather
> than even_frozen_pages because it should also force visiting pages
> that are all-visible but not all-frozen.
> 
> However, I also think that the approach described above - providing a
> way to excise VM forks specifically - is useful.

I think both make sense.


> From 010e99b403ec733d50c71a7d4ef646b1b446ef07 Mon Sep 17 00:00:00 2001
> From: Robert Haas 
> Date: Wed, 15 Jun 2016 22:52:58 -0400
> Subject: [PATCH 2/2] Add VACUUM (DISABLE_PAGE_SKIPPING) for emergencies.

>Furthermore,
> +  except when performing an aggressive vacuum, some pages may be skipped
> +  in order to avoid waiting for other sessions to finish using them.

Isn't that still the case? We'll not wait for a cleanup lock and such,
if not necessary?


>  /* non-export function prototypes */
> -static void lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
> -Relation *Irel, int nindexes, bool aggressive);
> +static void lazy_scan_heap(Relation onerel, int options,
> +LVRelStats *vacrelstats, Relation *Irel, int 
> nindexes,
> +bool aggressive);

Generally I think it's better to pass bitmask arguments as an unsigned
integer. But since other related routines already use int...


>   /*
> -  * We request an aggressive scan if either the table's frozen Xid is now
> +  * We request an aggressive scan if the table's frozen Xid is now
>* older than or equal to the requested Xid full-table scan limit; or if
>* the table's minimum MultiXactId is older than or equal to the 
> requested
> -  * mxid full-table scan limit.
> +  * mxid full-table scan limit; or if DISABLE_PAGE_SKIPPING was 
> specified.
>*/
>   aggressive = TransactionIdPrecedesOrEquals(onerel->rd_rel->relfrozenxid,
>   
>xidFullScanLimit);
>   aggressive |= MultiXactIdPrecedesOrEquals(onerel->rd_rel->relminmxid,
>   
>   mxactFullScanLimit);
> + if (options & VACOPT_DISABLE_PAGE_SKIPPING)
> + aggressive = true;

I'm inclined to convert the aggressive |= to an if; it looks a bit
jarring if consecutive assignments use differing forms.


>  static void
> -lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
> +lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
>  Relation *Irel, int nindexes, bool aggressive)
>  {
>   BlockNumber nblocks,
> @@ -542,25 +545,28 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
>* the last page.  This is worth avoiding mainly because such a lock 
> must
>* be replayed on any hot standby, where it can be disruptive.
>*/
> - for (next_unskippable_block = 0;
> -  next_unskippable_block < nblocks;
> -  next_unskippable_block++)
> + next_unskippable_block = 0;
> + if ((options & VACOPT_DISABLE_PAGE_SKIPPING) == 0)
>   {
> - uint8   vmstatus;
> -
> - vmstatus = visibilitymap_get_status(onerel, 
> next_unskippable_block,
> - 
> );
> - if (aggressive)
> + while (next_unskippable_block < nblocks)
>   {
> - if ((vmstatus & VISIBILITYMAP_ALL_FROZEN) == 0)
> - break;
> - }
> - else
> - {
> - if ((vmstatus & VISIBILITYMAP_ALL_VISIBLE) == 0)
> - break;
> + uint8   vmstatus;
> +
> + vmstatus = visibilitymap_get_status(onerel, 
> next_unskippable_block,
> + 
> );
> + if (aggressive)
> + {
> + if ((vmstatus & VISIBILITYMAP_ALL_FROZEN) == 0)
> + break;
> + }
> + else
> + {
> + if ((vmstatus & VISIBILITYMAP_ALL_VISIBLE) == 0)
> + break;
> + }
> + vacuum_delay_point();
> + 

Re: [HACKERS] Experimental dynamic memory allocation of postgresql shared memory

2016-06-17 Thread Tom Lane
Aleksey Demakov  writes:
> On Fri, Jun 17, 2016 at 10:54 PM, Robert Haas  wrote:
>> On Fri, Jun 17, 2016 at 12:34 PM, Aleksey Demakov  wrote:
>>> I believe it would be perfectly okay to allocate huge amount of address
>>> space with mmap on startup.  If the pages are not touched, the OS VM
>>> subsystem will not commit them.

>> In my opinion, that's not going to fly.  If I thought otherwise, I
>> would not have developed the DSM facility in the first place.

> Essentially this is pessimizing for the lowest common denominator
> among OSes.

You're right, but that doesn't mean that the community is going to take
much interest in an unportable replacement for code that already exists.
Especially not an unportable replacement that also needs sweeping
assumptions like "disciplined use of mmap in postgresql core and
extensions".  You don't have to look further than the availability of
mmap to plperlu programmers to realize that that won't fly.  (Even if
we threw all the untrusted PLs overboard, I believe plain old stdio
is willing to use mmap in many versions of libc.)

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Experimental dynamic memory allocation of postgresql shared memory

2016-06-17 Thread Robert Haas
On Fri, Jun 17, 2016 at 2:23 PM, Aleksey Demakov  wrote:
> Essentially this is pessimizing for the lowest common denominator
> among OSes.

I totally agree.  That's how we make the server portable.

> Having a contiguous address space makes things so
> much simpler that considering this case, IMHO, is well worth it.

I think that would be great if you could make it work, but it has to
support Linux, Windows (all supported versions), MacOS X, all the
various BSD flavors for which we have buildfarm animals, and other
platforms that we currently run on like HP-UX.   If you come up with a
solution that works for this on all of those platforms, I will shake
your hand.  But I think that's probably impossible, or at least
really, really hard.

> You are right that this might highly depend on the OS. But you are
> only partially right that it's impossible to give the memory back once
> you touched it. It is possible in many cases with additional measures.
> That is with additional control over memory mapping. Surprisingly, in
> this case Windows has the most straightforward solution. VirtualAlloc
> has separate MEM_RESERVE and MEM_COMMIT flags. On various
> Unix flavours it is possible to play with mmap MAP_NORESERVE
> flag and madvise syscall. Finally, it's possible to repeatedly mmap
> and munmap on portions of a contiguous address space, providing
> a given addr argument for both of them. The last option might, of
> course, be susceptible to hijacking of this portion of the address
> space by an inadvertent caller of mmap with NULL addr argument. But
> probably this could be avoided by imposing a disciplined use of mmap in
> postgresql core and extensions.

I have never understood how mmap() with a non-NULL argument could be
anything but a giant foot-gun.  If the operating system positions a
shared library or your process stack or anything else in the chosen
address range, you are dead.  I do agree that there are a bunch of
other tools that could be used on various platforms, but the need to
have a cross-platform solution for anything that goes into core makes
this very hard.
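
For reference, the reserve/commit pattern under discussion can be
sketched on POSIX systems roughly as below.  This is a Linux-flavored
sketch under stated assumptions (MAP_NORESERVE and MADV_DONTNEED
semantics), not a portable implementation - which is exactly the
problem being pointed out:

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Reserve a large contiguous address range without committing memory.
 * PROT_NONE + MAP_NORESERVE asks the kernel for address space only;
 * no RAM or swap is consumed until pages are committed and touched. */
static void *
reserve_region(size_t size)
{
	void	   *p = mmap(NULL, size, PROT_NONE,
						 MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);

	return (p == MAP_FAILED) ? NULL : p;
}

/* Commit a page-aligned subrange by enabling access; physical pages
 * materialize lazily on first touch. */
static int
commit_range(void *addr, size_t size)
{
	return mprotect(addr, size, PROT_READ | PROT_WRITE);
}

/* Give committed pages back to the OS while keeping the reservation. */
static int
decommit_range(void *addr, size_t size)
{
	if (madvise(addr, size, MADV_DONTNEED) != 0)
		return -1;
	return mprotect(addr, size, PROT_NONE);
}
```

This is the moral equivalent of VirtualAlloc's MEM_RESERVE/MEM_COMMIT
split; the catch, as noted above, is that every platform needs its own
variant of this dance.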

> Thus providing a single contiguous shared address space is doable.

Not convinced.

> The other question is how much it would buy. As for development
> time of an allocator it is a clear win. In terms of easily passing direct
> memory pointers between backends this is a clear win again.

I agree it would be a huge win if it could be done.

> In terms of resulting performance, I don't know. This would take
> a few cycles on every step. You have a shared hash table. You
> cannot keep pointers there. You need to store offsets against the
> base address. Any reference would involve additional arithmetics.
> When these things add up, the net effect might become noticeable.

I'm sure it's going to be somewhat slower, but I think that's just a
tax that we have to pay for using processes rather than threads.  I
think it's still going to be fast enough to do plenty of cool stuff.
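
The offset arithmetic being described is cheap but real; a minimal
sketch of the idea follows (names are illustrative, not PostgreSQL's
actual machinery):

```c
#include <assert.h>
#include <stddef.h>

/* In a segment that may be mapped at different addresses in different
 * processes, store offsets from the segment base instead of raw
 * pointers.  Offset 0 serves as the "null" value, which is safe as
 * long as nothing is ever allocated at the very start of the segment
 * (e.g. a segment header lives there). */
typedef size_t relptr;

static relptr
rel_encode(const void *base, const void *p)
{
	return (p == NULL) ? 0 :
		(size_t) ((const char *) p - (const char *) base);
}

static void *
rel_decode(void *base, relptr r)
{
	return (r == 0) ? NULL : (char *) base + r;
}
```

Every dereference pays for one addition and a null test - the "tax"
mentioned above - but the encoded value is valid in any process that
has the segment mapped, wherever it landed.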

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Experimental dynamic memory allocation of postgresql shared memory

2016-06-17 Thread David G. Johnston
On Fri, Jun 17, 2016 at 2:23 PM, Aleksey Demakov  wrote:

> On Fri, Jun 17, 2016 at 10:54 PM, Robert Haas 
> wrote:
> > On Fri, Jun 17, 2016 at 12:34 PM, Aleksey Demakov 
> wrote:
> >>> I expect that to be useful for parallel query and anything else where
> >>> processes need to share variable-size data.  However, that's different
> >>> from this because ours can grown to arbitrary size and shrink again by
> >>> allocating and freeing with DSM segments.  We also do everything with
> >>> relative pointers since DSM segments can be mapped at different
> >>> addresses in different processes, whereas this would only work with
> >>> memory carved out of the main shared memory segment (or some new DSM
> >>> facility that guaranteed identical placement in every address space).
> >>>
> >>
> >> I believe it would be perfectly okay to allocate huge amount of address
> >> space with mmap on startup.  If the pages are not touched, the OS VM
> >> subsystem will not commit them.
> >
> > In my opinion, that's not going to fly.  If I thought otherwise, I
> > would not have developed the DSM facility in the first place.
> >
> > First, the behavior in this area is highly dependent on choice of
> > operating system and configuration parameters.  We've had plenty of
> > experience with requiring non-default configuration parameters to run
> > PostgreSQL, and it's all bad.  I don't really want to have to tell
> > users that they must run with a particular value of
> > vm.overcommit_memory in order to run the server.  Nor do I want to
> > tell users of other operating systems that their ability to run
> > PostgreSQL is dependent on the behavior their OS has in this area.  I
> > had a MacBook Pro up until a year or two ago where a sufficiently
> > large shared memory request would cause a kernel panic.  That bug will
> > probably be fixed at some point if it hasn't been already, but
> > probably by returning an error rather than making it work.
> >
> > Second, there's no way to give memory back once you've touched it.  If
> > you decide to do a hash join on a 250GB inner table using a shared
> > hash table, you're going to have 250GB in swap-backed pages floating
> > around when you're done.  If the user has swap configured (and more
> > and more people don't), the operating system will eventually page
> > those out, but until that happens those pages are reducing the amount
> > of page cache that's available, and after it happens they're using up
> > swap.  In either case, the space consumed is consumed to no purpose.
> > You don't care about that hash table any more once the query
> > completes; there's just no way to tell the operating system that.  If
> > your workload follows an entirely predictable pattern and you always
> > have about the same amount of usage of this facility then you can just
> > reuse the same pages and everything is fine.  But if your usage
> > fluctuates I believe it will be a big problem.  With DSM, we can and
> > do explicitly free the memory back to the OS as soon as we don't need
> > it any more - and that's a big benefit.
> >
>
> Essentially this is pessimizing for the lowest common denominator
> among OSes. Having a contiguous address space makes things so
> much simpler that considering this case, IMHO, is well worth it.
>
>
Given PostgreSQL's goals regarding multi-platform operation it would seem
that at minimum there needs to be an implementation available that indeed
has these properties.  Improving our current base implementation within
these guidelines would be nice since everyone would benefit from the work
and the net amount of code is going to be reasonable since the old stuff
will likely be removed while the new stuff is being added.

While platform-dependent default configuration parameters are undesirable,
enabling better but less widely usable algorithms seems to be one use for
compile-time options.  Is this arena amenable to such swapping out of
behavior at compile time?

David J.


Re: [HACKERS] Experimental dynamic memory allocation of postgresql shared memory

2016-06-17 Thread Aleksey Demakov
On Sat, Jun 18, 2016 at 12:31 AM, Andres Freund  wrote:
> On 2016-06-18 00:23:14 +0600, Aleksey Demakov wrote:
>> Finally, it's possible to repeatedly mmap
>> and munmap on portions of a contiguous address space, providing
>> a given addr argument for both of them. The last option might, of
>> course, be susceptible to hijacking of this portion of the address
>> space by an inadvertent caller of mmap with NULL addr argument. But probably
>> this could be avoided by imposing a disciplined use of mmap in
>> postgresql core and extensions.
>
> I don't think that's particularly realistic. malloc() uses mmap(NULL)
> internally.  And you can't portably mmap non-file backed memory from
> different processes; you need something like tmpfs-backed or POSIX shared
> memory for it.  On Linux you can do stuff like madvise(MADV_FREE),
> which kinda helps.

Oops. Agreed.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallelized polymorphic aggs, and aggtype vs aggoutputtype

2016-06-17 Thread Robert Haas
On Fri, Jun 17, 2016 at 1:45 PM, Tom Lane  wrote:
> I wrote:
>> Robert Haas  writes:
>>> I don't mind if you feel the need to refactor this.
>
>> I'm not sure yet.  What I think we need to work out first is exactly
>> how polymorphic parallel aggregates ought to identify which concrete
>> data types they are using.  There were well-defined rules before,
>> but how do we adapt those to two levels of aggregate evaluation?
>
> After reviewing things a bit, it seems like the first design-level issue
> there is what about polymorphic combine functions.  So far as the type
> system is concerned, there's no issue: the declared signature is always
> going to be "combine(foo, foo) returns foo" and it doesn't matter whether
> foo is an ordinary type, a polymorphic one, or INTERNAL.  The other
> question is whether the function itself knows what it's operating on in
> the current invocation.  For regular polymorphic types this seems no
> different from any other usage.  If the transtype is INTERNAL, then the
> type system provides no help; but we could reasonably assume that the
> internal representation itself contains as much information as the combine
> func needs.  We could add extra arguments like the "finalfunc_extra"
> option does for the finalfunc, but I don't really see a need --- that hack
> is mainly to satisfy the type system that the finalfunc's signature is
> sane, not to tell the finalfunc something it has no other way to find out.
> So I think we're probably OK as far as the combine function is concerned.

OK

> The situation is much direr as far as serialize/deserialize are concerned.
> These basically don't work at all for polymorphic transtypes: if you try
> to declare "deserialize(bytea) returns anyarray", the type system won't
> let you.  Perhaps that's not an issue because you shouldn't really need
> serialize/deserialize for anything except INTERNAL transtype.

In fact, I think that it won't even let you declare a serialize or
deserialize function for anything other than an INTERNAL transtype
right now.  There was a PostGIS use case that made it seem like we
might want to relax that, but currently I believe that's the law of
the land.

> However,
> there's a far bigger problem which is that "deserialize(bytea) returns
> internal" is a blatant security hole, which I see you ignored the defense
> against in opr_sanity.sql.  The claim in the comment there that it's okay
> if we check for aggregate context is a joke --- or haven't you heard that
> we allow users to create aggregates?  That's not nearly good enough to
> prevent unsafe usage of such functions.

That may be true, but it isn't obvious to me.  The user can't simply
say SELECT numeric_avg_deserialize(...) or whatever because the
function won't execute when called that way.  How would you arrange to
call such a function in a context where it wouldn't fail outright but
would return a value that you could then hand to some other function
that expects a type-internal input?

> Not to mention that CREATE
> FUNCTION won't allow creation of such functions, so extensions are locked
> out of using this feature.

Oops.

> This has to be redesigned or else reverted entirely.  I'm not going to
> take no for an answer.

Please don't act as if I've been refusing to fix bugs.  I've been
doing practically nothing else.

> A possible solution is to give deserialize an extra dummy argument, along
> the lines of "deserialize(bytea, internal) returns internal", thereby
> ensuring it can't be called in any non-system-originated contexts.

Seems promising.
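
With that dummy-argument approach, a user-defined parallel aggregate
might end up being declared along these lines (a sketch only - all
support function names are hypothetical; the COMBINEFUNC, SERIALFUNC,
DESERIALFUNC, and PARALLEL options follow the 9.6 CREATE AGGREGATE
syntax):

```sql
-- Assumed support functions over an INTERNAL transition state:
--   my_trans(internal, numeric)      -> internal
--   my_combine(internal, internal)   -> internal
--   my_serialize(internal)           -> bytea
--   my_deserialize(bytea, internal)  -> internal   (dummy 2nd argument)
--   my_final(internal)               -> numeric
CREATE AGGREGATE my_avg(numeric) (
    SFUNC = my_trans,
    STYPE = internal,
    FINALFUNC = my_final,
    COMBINEFUNC = my_combine,
    SERIALFUNC = my_serialize,
    DESERIALFUNC = my_deserialize,
    PARALLEL = SAFE
);
```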

> This
> is still rather dangerous if the other argument is variable, as somebody
> might be able to abuse an internal-taking function by naming it as the
> deserialize function for a maliciously-designed aggregate.

The complex and, as far as I can tell, largely undocumented set of
rules about what sorts of internal-taking-and-returning functions we
allow to exist strikes me as a house of cards just waiting for the
next stiff breeze to come along.

> What I'm
> inclined to do to lock it down further is to drop the "serialtype"
> argument to CREATE AGGREGATE, which seems rather pointless (what else
> would you ever use besides bytea?).  Instead, insist that
> serialize/deserialize apply *only* when the transtype is INTERNAL, and
> their signatures are exactly "serialize(internal) returns bytea" and
> "deserialize(bytea, internal) returns internal", never anything else.

I originally had the thought that you might want to make the
serialized representation something that could be human-readable.  I'd
actually like to have an SQL syntax that outputs the transition state
or, if it's internal, the serialized representation of it.  That would
allow us to do partitionwise aggregation given a query like SELECT
avg(foo) FROM parent_table_for_several_foreign_tables.  You'd need to
be able to ship something like SELECT PARTIAL avg(foo) FROM t1 to each
child.  And it 

Re: [HACKERS] Experimental dynamic memory allocation of postgresql shared memory

2016-06-17 Thread Aleksey Demakov
Sorry for the unclear language; late Friday evening here is to blame.

On Sat, Jun 18, 2016 at 12:23 AM, Aleksey Demakov  wrote:
> On Fri, Jun 17, 2016 at 10:54 PM, Robert Haas  wrote:
>> On Fri, Jun 17, 2016 at 12:34 PM, Aleksey Demakov  wrote:
 I expect that to be useful for parallel query and anything else where
 processes need to share variable-size data.  However, that's different
 from this because ours can grown to arbitrary size and shrink again by
 allocating and freeing with DSM segments.  We also do everything with
 relative pointers since DSM segments can be mapped at different
 addresses in different processes, whereas this would only work with
 memory carved out of the main shared memory segment (or some new DSM
 facility that guaranteed identical placement in every address space).

>>>
>>> I believe it would be perfectly okay to allocate huge amount of address
>>> space with mmap on startup.  If the pages are not touched, the OS VM
>>> subsystem will not commit them.
>>
>> In my opinion, that's not going to fly.  If I thought otherwise, I
>> would not have developed the DSM facility in the first place.
>>
>> First, the behavior in this area is highly dependent on choice of
>> operating system and configuration parameters.  We've had plenty of
>> experience with requiring non-default configuration parameters to run
>> PostgreSQL, and it's all bad.  I don't really want to have to tell
>> users that they must run with a particular value of
>> vm.overcommit_memory in order to run the server.  Nor do I want to
>> tell users of other operating systems that their ability to run
>> PostgreSQL is dependent on the behavior their OS has in this area.  I
>> had a MacBook Pro up until a year or two ago where a sufficiently
>> large shared memory request would cause a kernel panic.  That bug will
>> probably be fixed at some point if it hasn't been already, but
>> probably by returning an error rather than making it work.
>>
>> Second, there's no way to give memory back once you've touched it.  If
>> you decide to do a hash join on a 250GB inner table using a shared
>> hash table, you're going to have 250GB in swap-backed pages floating
>> around when you're done.  If the user has swap configured (and more
>> and more people don't), the operating system will eventually page
>> those out, but until that happens those pages are reducing the amount
>> of page cache that's available, and after it happens they're using up
>> swap.  In either case, the space consumed is consumed to no purpose.
>> You don't care about that hash table any more once the query
>> completes; there's just no way to tell the operating system that.  If
>> your workload follows an entirely predictable pattern and you always
>> have about the same amount of usage of this facility then you can just
>> reuse the same pages and everything is fine.  But if your usage
>> fluctuates I believe it will be a big problem.  With DSM, we can and
>> do explicitly free the memory back to the OS as soon as we don't need
>> it any more - and that's a big benefit.
>>
>
> Essentially this is pessimizing for the lowest common denominator
> among OSes. Having a contiguous address space makes things so
> much simpler that considering this case, IMHO, is well worth it.
>
> You are right that this might highly depend on the OS. But you are
> only partially right that it's impossible to give the memory back once
> you touched it. It is possible in many cases with additional measures.
> That is with additional control over memory mapping. Surprisingly, in
> this case Windows has the most straightforward solution. VirtualAlloc
> has separate MEM_RESERVE and MEM_COMMIT flags. On various
> Unix flavours it is possible to play with mmap MAP_NORESERVE
> flag and madvise syscall. Finally, it's possible to repeatedly mmap
> and munmap on portions of a contiguous address space providing
> a given addr argument for both of them. The last option might, of
> course, is susceptible to hijacking this portion of the address by an
> inadvertent caller of mmap with NULL addr argument. But probably
> this could be avoided by imposing a disciplined use of mmap in
> postgresql core and extensions.
>
> Thus providing a single contiguous shared address space is doable.
> The other question is how much it would buy. As for development
> time of an allocator it is a clear win. In terms of easy passing direct
> memory pointers between backends this a clear win again.
>
> In terms of resulting performance, I don't know. This would take
> a few cycles on every step. You have a shared hash table. You
> cannot keep pointers there. You need to store offsets against the
> base address. Any reference would involve additional arithmetics.
> When these things add up, the net effect might become noticeable.
>
> Regards,
> Aleksey


-- 
Sent via pgsql-hackers mailing list 

Re: [HACKERS] Experimental dynamic memory allocation of postgresql shared memory

2016-06-17 Thread Andres Freund
On 2016-06-18 00:23:14 +0600, Aleksey Demakov wrote:
> Finally, it's possible to repeatedly mmap
> and munmap on portions of a contiguous address space providing
> a given addr argument for both of them. The last option is, of
> course, susceptible to having this portion of the address space
> hijacked by an inadvertent caller of mmap with a NULL addr argument.
> But this could probably be avoided by imposing disciplined use of
> mmap in the postgresql core and extensions.

I don't think that's particularly realistic. malloc() uses mmap(NULL)
internally.  And you can't portably mmap non-file-backed memory from
different processes; you need something like tmpfs-backed / POSIX
shared memory for it.  On Linux you can do stuff like
madvise(MADV_FREE), which kinda helps.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Experimental dynamic memory allocation of postgresql shared memory

2016-06-17 Thread Aleksey Demakov
On Fri, Jun 17, 2016 at 10:54 PM, Robert Haas  wrote:
> On Fri, Jun 17, 2016 at 12:34 PM, Aleksey Demakov  wrote:
>>> I expect that to be useful for parallel query and anything else where
>>> processes need to share variable-size data.  However, that's different
>>> from this because ours can grown to arbitrary size and shrink again by
>>> allocating and freeing with DSM segments.  We also do everything with
>>> relative pointers since DSM segments can be mapped at different
>>> addresses in different processes, whereas this would only work with
>>> memory carved out of the main shared memory segment (or some new DSM
>>> facility that guaranteed identical placement in every address space).
>>>
>>
>> I believe it would be perfectly okay to allocate huge amount of address
>> space with mmap on startup.  If the pages are not touched, the OS VM
>> subsystem will not commit them.
>
> In my opinion, that's not going to fly.  If I thought otherwise, I
> would not have developed the DSM facility in the first place.
>
> First, the behavior in this area is highly dependent on choice of
> operating system and configuration parameters.  We've had plenty of
> experience with requiring non-default configuration parameters to run
> PostgreSQL, and it's all bad.  I don't really want to have to tell
> users that they must run with a particular value of
> vm.overcommit_memory in order to run the server.  Nor do I want to
> tell users of other operating systems that their ability to run
> PostgreSQL is dependent on the behavior their OS has in this area.  I
> had a MacBook Pro up until a year or two ago where a sufficiently
> large shared memory request would cause a kernel panic.  That bug will
> probably be fixed at some point if it hasn't been already, but
> probably by returning an error rather than making it work.
>
> Second, there's no way to give memory back once you've touched it.  If
> you decide to do a hash join on a 250GB inner table using a shared
> hash table, you're going to have 250GB in swap-backed pages floating
> around when you're done.  If the user has swap configured (and more
> and more people don't), the operating system will eventually page
> those out, but until that happens those pages are reducing the amount
> of page cache that's available, and after it happens they're using up
> swap.  In either case, the space consumed is consumed to no purpose.
> You don't care about that hash table any more once the query
> completes; there's just no way to tell the operating system that.  If
> your workload follows an entirely predictable pattern and you always
> have about the same amount of usage of this facility then you can just
> reuse the same pages and everything is fine.  But if your usage
> fluctuates I believe it will be a big problem.  With DSM, we can and
> do explicitly free the memory back to the OS as soon as we don't need
> it any more - and that's a big benefit.
>

Essentially this is pessimizing for the lowest common denominator
among OSes. Having a contiguous address space makes things so
much simpler that considering this case, IMHO, is well worth it.

You are right that this might depend heavily on the OS. But you are
only partially right that it's impossible to give the memory back once
you've touched it. It is possible in many cases with additional
measures, that is, with additional control over memory mapping.
Surprisingly, in this case Windows has the most straightforward
solution: VirtualAlloc has separate MEM_RESERVE and MEM_COMMIT flags.
On various Unix flavours it is possible to play with the mmap
MAP_NORESERVE flag and the madvise syscall. Finally, it's possible to
repeatedly mmap and munmap portions of a contiguous address space,
providing a given addr argument for both of them. The last option is,
of course, susceptible to having this portion of the address space
hijacked by an inadvertent caller of mmap with a NULL addr argument.
But this could probably be avoided by imposing disciplined use of mmap
in the postgresql core and extensions.

Thus providing a single contiguous shared address space is doable.
The other question is how much it would buy. In terms of allocator
development time it is a clear win. In terms of easily passing direct
memory pointers between backends it is a clear win again.

In terms of resulting performance, I don't know. This would take
a few cycles on every step. You have a shared hash table. You
cannot keep pointers there. You need to store offsets against the
base address. Any reference would involve additional arithmetic.
When these things add up, the net effect might become noticeable.

Regards,
Aleksey




Re: [HACKERS] Parallelized polymorphic aggs, and aggtype vs aggoutputtype

2016-06-17 Thread Tom Lane
I wrote:
> Robert Haas  writes:
>> I don't mind if you feel the need to refactor this.

> I'm not sure yet.  What I think we need to work out first is exactly
> how polymorphic parallel aggregates ought to identify which concrete
> data types they are using.  There were well-defined rules before,
> but how do we adapt those to two levels of aggregate evaluation?

After reviewing things a bit, it seems like the first design-level issue
there is what about polymorphic combine functions.  So far as the type
system is concerned, there's no issue: the declared signature is always
going to be "combine(foo, foo) returns foo" and it doesn't matter whether
foo is an ordinary type, a polymorphic one, or INTERNAL.  The other
question is whether the function itself knows what it's operating on in
the current invocation.  For regular polymorphic types this seems no
different from any other usage.  If the transtype is INTERNAL, then the
type system provides no help; but we could reasonably assume that the
internal representation itself contains as much information as the combine
func needs.  We could add extra arguments like the "finalfunc_extra"
option does for the finalfunc, but I don't really see a need --- that hack
is mainly to satisfy the type system that the finalfunc's signature is
sane, not to tell the finalfunc something it has no other way to find out.
So I think we're probably OK as far as the combine function is concerned.

The situation is much direr as far as serialize/deserialize are concerned.
These basically don't work at all for polymorphic transtypes: if you try
to declare "deserialize(bytea) returns anyarray", the type system won't
let you.  Perhaps that's not an issue because you shouldn't really need
serialize/deserialize for anything except INTERNAL transtype.  However,
there's a far bigger problem which is that "deserialize(bytea) returns
internal" is a blatant security hole, which I see you ignored the defense
against in opr_sanity.sql.  The claim in the comment there that it's okay
if we check for aggregate context is a joke --- or haven't you heard that
we allow users to create aggregates?  That's not nearly good enough to
prevent unsafe usage of such functions.  Not to mention that CREATE
FUNCTION won't allow creation of such functions, so extensions are locked
out of using this feature.

This has to be redesigned or else reverted entirely.  I'm not going to
take no for an answer.

A possible solution is to give deserialize an extra dummy argument, along
the lines of "deserialize(bytea, internal) returns internal", thereby
ensuring it can't be called in any non-system-originated contexts.  This
is still rather dangerous if the other argument is variable, as somebody
might be able to abuse an internal-taking function by naming it as the
deserialize function for a maliciously-designed aggregate.  What I'm
inclined to do to lock it down further is to drop the "serialtype"
argument to CREATE AGGREGATE, which seems rather pointless (what else
would you ever use besides bytea?).  Instead, insist that
serialize/deserialize apply *only* when the transtype is INTERNAL, and
their signatures are exactly "serialize(internal) returns bytea" and
"deserialize(bytea, internal) returns internal", never anything else.

A different way of locking things down, which might be cleaner in the
long run, is to invent a new pseudo-type for the sole purpose of being
the serialization type, that is we'd insist on the signatures being
"serialize(internal) returns serialized_internal" and
"deserialize(serialized_internal) returns internal", where
serialized_internal has the representation properties of bytea but is
usable for no other purpose than this.  Not sure it's worth the trouble
though.

Anyway, setting aside the security angle, it doesn't seem like there is
anything wrong with the high-level design for polymorphic cases.  I'm now
thinking the problem I saw is just a garden variety implementation bug
(or bugs).  I'm still not very happy with the confusion around aggtype
though, not least because I suspect it contributed directly to this bug.
I also feel that both the code comments and the user-facing documentation
for this feature are well below project standards, eg where's the
discussion in section 35.10?

regards, tom lane




Re: [HACKERS] MultiXactId error after upgrade to 9.3.4

2016-06-17 Thread Alvaro Herrera
Andrew Gierth wrote:
> > "Alvaro" == Alvaro Herrera  writes:
> 
>  >> (It can, AFAICT, be inside the currently valid range due to
>  >> wraparound, i.e. without there being a valid pg_multixact entry for
>  >> it, because AFAICT in 9.2, once the mxid is hinted dead it is never
>  >> again either looked up or cleared, so it can sit in the tuple xmax
>  >> forever, even through multiple wraparounds.)
> 
>  Alvaro> HeapTupleSatisfiesVacuum removes very old multixacts
> 
> It does nothing of the kind; it only marks them HEAP_XMAX_INVALID. The
> actual mxid remains in the tuple xmax field.
> 
> The failing mxids in the case I analyzed on -bugs are failing _in spite
> of_ being already hinted HEAP_XMAX_INVALID, because the code path in
> question doesn't check that.

Ah, right.  I had some code to reset HEAP_XMAX_IS_MULTI early on but
somebody didn't like it and we removed it; and we also removed some of
the checks for HEAP_XMAX_INVALID we had, or perhaps didn't extend them
to every place that needed them.  It's not critical now anyway; the
patch I posted (or some variation thereof) should suffice as a fix.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] 10.0

2016-06-17 Thread Robert Haas
On Fri, Jun 17, 2016 at 1:04 PM, Alvaro Herrera
 wrote:
> Merlin Moncure wrote:
>
>> Ugliness is a highly subjective qualifier.  OTOH, Backwards
>> compatibility, at least when the checks are properly written :-), is a
>> very objective benefit.
>
> This is the argument that made us keep the PostgreSQL name instead of
> renaming back to Postgres.  I'm not a fan of it.

I, on the other hand, am a big fan of backward-compatibility.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] 10.0

2016-06-17 Thread Alvaro Herrera
Merlin Moncure wrote:

> Ugliness is a highly subjective qualifier.  OTOH, Backwards
> compatibility, at least when the checks are properly written :-), is a
> very objective benefit.

This is the argument that made us keep the PostgreSQL name instead of
renaming back to Postgres.  I'm not a fan of it.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Typo in parallel comment of heap_delete()

2016-06-17 Thread Robert Haas
On Thu, Jun 16, 2016 at 12:32 AM, Thomas Munro
 wrote:
> On Tue, Jun 7, 2016 at 12:00 AM, Robert Haas  wrote:
>> On Sun, Jun 5, 2016 at 4:39 PM, Jim Nasby  wrote:
>>> I'm pretty sure this is a typo...
>>
>> Sure is.  Thanks.
>
> The same typo appears in heap_update.

Committed.

> PS Far be it from me, but postgres_fdw.c seems to have a stray
> conditional form where I would expect a present subjunctive:  "...
> lest the new path *would* kick ..."
> /me ducks

Rephrased.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Experimental dynamic memory allocation of postgresql shared memory

2016-06-17 Thread Robert Haas
On Fri, Jun 17, 2016 at 12:34 PM, Aleksey Demakov  wrote:
>> I expect that to be useful for parallel query and anything else where
>> processes need to share variable-size data.  However, that's different
>> from this because ours can grow to arbitrary size and shrink again by
>> allocating and freeing with DSM segments.  We also do everything with
>> relative pointers since DSM segments can be mapped at different
>> addresses in different processes, whereas this would only work with
>> memory carved out of the main shared memory segment (or some new DSM
>> facility that guaranteed identical placement in every address space).
>>
>
> I believe it would be perfectly okay to allocate huge amount of address
> space with mmap on startup.  If the pages are not touched, the OS VM
> subsystem will not commit them.

In my opinion, that's not going to fly.  If I thought otherwise, I
would not have developed the DSM facility in the first place.

First, the behavior in this area is highly dependent on choice of
operating system and configuration parameters.  We've had plenty of
experience with requiring non-default configuration parameters to run
PostgreSQL, and it's all bad.  I don't really want to have to tell
users that they must run with a particular value of
vm.overcommit_memory in order to run the server.  Nor do I want to
tell users of other operating systems that their ability to run
PostgreSQL is dependent on the behavior their OS has in this area.  I
had a MacBook Pro up until a year or two ago where a sufficiently
large shared memory request would cause a kernel panic.  That bug will
probably be fixed at some point if it hasn't been already, but
probably by returning an error rather than making it work.

Second, there's no way to give memory back once you've touched it.  If
you decide to do a hash join on a 250GB inner table using a shared
hash table, you're going to have 250GB in swap-backed pages floating
around when you're done.  If the user has swap configured (and more
and more people don't), the operating system will eventually page
those out, but until that happens those pages are reducing the amount
of page cache that's available, and after it happens they're using up
swap.  In either case, the space consumed is consumed to no purpose.
You don't care about that hash table any more once the query
completes; there's just no way to tell the operating system that.  If
your workload follows an entirely predictable pattern and you always
have about the same amount of usage of this facility then you can just
reuse the same pages and everything is fine.  But if your usage
fluctuates I believe it will be a big problem.  With DSM, we can and
do explicitly free the memory back to the OS as soon as we don't need
it any more - and that's a big benefit.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Experimental dynamic memory allocation of postgresql shared memory

2016-06-17 Thread Aleksey Demakov
On Fri, Jun 17, 2016 at 10:18 PM, Robert Haas  wrote:
> On Fri, Jun 17, 2016 at 11:30 AM, Tom Lane  wrote:
> But I'm a bit confused about where it gets the bytes it wants to
> manage.  There's no call to dsm_create() or ShmemAlloc() anywhere in
> the code, at least not that I could find quickly.  The only way to get
> shar_base set to a non-NULL value seems to be to call SharAttach(),
> and if there's no SharCreate() where would we get that non-NULL value?
>

You are right, I just have to tidy up the initialisation code before
publishing it.

> I expect that to be useful for parallel query and anything else where
> processes need to share variable-size data.  However, that's different
> from this because ours can grow to arbitrary size and shrink again by
> allocating and freeing with DSM segments.  We also do everything with
> relative pointers since DSM segments can be mapped at different
> addresses in different processes, whereas this would only work with
> memory carved out of the main shared memory segment (or some new DSM
> facility that guaranteed identical placement in every address space).
>

I believe it would be perfectly okay to allocate huge amount of address
space with mmap on startup.  If the pages are not touched, the OS VM
subsystem will not commit them.

>  I've been a bit reluctant to put it out there
> until we have a tangible application of the allocator working, for
> fear people will say "that's not good for anything!".  I'm confident
> it's good for lots of things, but other people have been known not to
> share my confidence.
>

This is what I've been told by Postgres Pro folks too. But I felt that this
thing deserves to be shown to the community sooner rather than later.

Regards,
Aleksey




Re: [HACKERS] ERROR: ORDER/GROUP BY expression not found in targetlist

2016-06-17 Thread Robert Haas
On Fri, Jun 17, 2016 at 9:29 AM, Robert Haas  wrote:
> On Fri, Jun 17, 2016 at 9:04 AM, Amit Kapila  wrote:
>> Attached, please find updated patch based on above lines.  Due to addition
>> of projection path on top of partial paths, some small additional cost is
>> added for parallel paths. In one of the tests in select_parallel.sql the
>> plan was getting converted to serial plan from parallel plan due to this
>> cost, so I have changed it to make it more restrictive.  Before changing, I
>> have debugged the test to confirm that it was due to this new projection
>> path cost.  I have added two new tests as well to cover the new code added
>> by patch.
>
> I'm going to go work on this patch now.

Here is a revised version, which I plan to commit in a few hours
before the workday expires.  I kept it basically as Amit had it, but I
whacked the comments around some and, more substantively, avoided
inserting and/or charging for a Result node when that's not necessary.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
diff --git a/src/backend/optimizer/plan/planagg.c b/src/backend/optimizer/plan/planagg.c
index cefec7b..0434a5a 100644
--- a/src/backend/optimizer/plan/planagg.c
+++ b/src/backend/optimizer/plan/planagg.c
@@ -465,7 +465,8 @@ build_minmax_path(PlannerInfo *root, MinMaxAggInfo *mminfo,
 	 * cheapest path.)
 	 */
 	sorted_path = apply_projection_to_path(subroot, final_rel, sorted_path,
-		   create_pathtarget(subroot, tlist));
+		   create_pathtarget(subroot, tlist),
+		   false);
 
 	/*
 	 * Determine cost to get just the first row of the presorted path.
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 07b925e..0af8151 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -1500,6 +1500,7 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
 		PathTarget *grouping_target;
 		PathTarget *scanjoin_target;
 		bool		have_grouping;
+		bool		scanjoin_target_parallel_safe = false;
 		WindowFuncLists *wflists = NULL;
 		List	   *activeWindows = NIL;
 		List	   *rollup_lists = NIL;
@@ -1730,7 +1731,16 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
 			scanjoin_target = grouping_target;
 
 		/*
-		 * Forcibly apply that target to all the Paths for the scan/join rel.
+		 * Check whether scan/join target is parallel safe ... unless there
+		 * are no partial paths, in which case we don't care.
+		 */
+		if (current_rel->partial_pathlist &&
+			!has_parallel_hazard((Node *) scanjoin_target->exprs, false))
+			scanjoin_target_parallel_safe = true;
+
+		/*
+		 * Forcibly apply scan/join target to all the Paths for the scan/join
+		 * rel.
 		 *
 		 * In principle we should re-run set_cheapest() here to identify the
 		 * cheapest path, but it seems unlikely that adding the same tlist
@@ -1746,7 +1756,8 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
 
 			Assert(subpath->param_info == NULL);
 			path = apply_projection_to_path(root, current_rel,
-			subpath, scanjoin_target);
+			subpath, scanjoin_target,
+			scanjoin_target_parallel_safe);
 			/* If we had to add a Result, path is different from subpath */
 			if (path != subpath)
 			{
@@ -1759,6 +1770,70 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
 		}
 
 		/*
+		 * Upper planning steps which make use of the top scan/join rel's
+		 * partial pathlist will expect partial paths for that rel to produce
+		 * the same output as complete paths ... and we just changed the
+		 * output for the complete paths, so we'll need to do the same thing
+		 * for partial paths.
+		 */
+		if (scanjoin_target_parallel_safe)
+		{
+			/*
+			 * Apply the scan/join target to each partial path.  Otherwise,
+			 * anything that attempts to use the partial paths for further
+			 * upper planning may go wrong.
+			 */
+			foreach(lc, current_rel->partial_pathlist)
+			{
+Path	   *subpath = (Path *) lfirst(lc);
+Path	   *newpath;
+
+/*
+ * We can't use apply_projection_to_path() here, because there
+ * could already be pointers to these paths, and therefore
+ * we cannot modify them in place.  Instead, we must use
+ * create_projection_path().  The good news is this won't
+ * actually insert a Result node into the final plan unless
+ * it's needed, but the bad news is that it will charge for
+ * the node whether it's needed or not.  Therefore, if the
+ * target list is already what we need it to be, just leave
+ * this partial path alone.
+ */
+if (equal(scanjoin_target->exprs, subpath->pathtarget->exprs))
+	continue;
+
+Assert(subpath->param_info == NULL);
+newpath = (Path *) create_projection_path(root,
+			 current_rel,
+			 subpath,
+			 scanjoin_target);
+if 

Re: [HACKERS] Experimental dynamic memory allocation of postgresql shared memory

2016-06-17 Thread Aleksey Demakov
On Fri, Jun 17, 2016 at 9:30 PM, Tom Lane  wrote:
> Aleksey Demakov  writes:
>> I have some very experimental code to enable dynamic memory allocation
>> of shared memory for postgresql backend processes.
>
> Um ... what's this do that the existing DSM stuff doesn't do?
>

It operates over a single large shared memory segment. Within this
segment it lets you alloc / free small chunks of memory from 16 bytes
to 16 kilobytes. Chunks are carved out of fixed-size 32k blocks. Each
block is used to allocate chunks of a single size class. When a block
is full, another block for the given size class is taken from the top
shared segment.

The goal is to support high levels of concurrency for alloc / free
calls. Therefore the allocator is mostly non-blocking. Currently it
uses Heller's lazy list algorithm to maintain block lists of a given
size class, so it uses slocks once in a while, when a new block is
added or removed. If this proves to cause scalability problems,
Heller's list might be replaced with Maged Michael's lock-free list to
make the whole allocator absolutely lock-free.

Additionally it provides an epoch-based memory reclamation facility that
solves the ABA problem for lock-free algorithms. I am going to implement
some lock-free algorithms (extendable hash-tables and probably skip
lists) on top of this facility.

Regards,
Aleksey




Re: [HACKERS] Experimental dynamic memory allocation of postgresql shared memory

2016-06-17 Thread Robert Haas
On Fri, Jun 17, 2016 at 11:30 AM, Tom Lane  wrote:
> Aleksey Demakov  writes:
>> I have some very experimental code to enable dynamic memory allocation
>> of shared memory for postgresql backend processes.
>
> Um ... what's this do that the existing DSM stuff doesn't do?

It seems to be a full-fledged allocator, rather than just a way of
getting a slab of bytes from the operating system.  Think malloc()
rather than sbrk().

But I'm a bit confused about where it gets the bytes it wants to
manage.  There's no call to dsm_create() or ShmemAlloc() anywhere in
the code, at least not that I could find quickly.  The only way to get
shar_base set to a non-NULL value seems to be to call SharAttach(),
and if there's no SharCreate() where would we get that non-NULL value?

EnterpriseDB is working on a memory allocator which will manage chunks
of dynamic shared memory and provide an allocate/free interface to
allow small allocations to be carved out of large DSM segments:

https://wiki.postgresql.org/wiki/EnterpriseDB_database_server_roadmap

I expect that to be useful for parallel query and anything else where
processes need to share variable-size data.  However, that's different
from this because ours can grow to arbitrary size and shrink again by
allocating and freeing with DSM segments.  We also do everything with
relative pointers since DSM segments can be mapped at different
addresses in different processes, whereas this would only work with
memory carved out of the main shared memory segment (or some new DSM
facility that guaranteed identical placement in every address space).

I expect we'll probably post our implementation of this shortly after
9.7 development opens.  I've been a bit reluctant to put it out there
until we have a tangible application of the allocator working, for
fear people will say "that's not good for anything!".  I'm confident
it's good for lots of things, but other people have been known not to
share my confidence.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Experimental dynamic memory allocation of postgresql shared memory

2016-06-17 Thread Tom Lane
Aleksey Demakov  writes:
> I have some very experimental code to enable dynamic memory allocation
> of shared memory for postgresql backend processes.

Um ... what's this do that the existing DSM stuff doesn't do?

regards, tom lane




[HACKERS] Experimental dynamic memory allocation of postgresql shared memory

2016-06-17 Thread Aleksey Demakov
Hi all,

I have some very experimental code to enable dynamic memory allocation
of shared memory for postgresql backend processes. The source code in
the repository is not complete yet. Moreover, it is not immediately
useful by itself. However, it might serve as the basis for implementing
higher-level features, such as expandable hash tables or other data
structures that share data between backends. Ultimately it might be used
for an in-memory data store usable via the FDW interface. Although such
higher-level features are not available yet, the code might anyway be
interesting for curious eyes.

https://github.com/ademakov/sharena

The first stage of this project was funded by Postgres Pro. Many
thanks to this wonderful team.

Regards,
Aleksey




Re: [HACKERS] Restriction of windows functions

2016-06-17 Thread Tom Lane
Konstantin Knizhnik  writes:
> On 17.06.2016 17:01, Tom Lane wrote:
>> IIRC, the sticking point was defining a reasonably datatype-independent
>> (i.e. extensible) notion of distance.

> Why it is not possible just to locate "-" or "+ operator for this type?

Because there's no guarantee that that operator has anything to do with
the sort ordering imposed by the type's btree opclass.  We must have a
distance operator that is consistent with the sort order, or the windowing
logic will get hopelessly confused.  For that matter, we support multiple
opclasses (sort orderings) per data type, and there's certainly no way
that a single "-" operator will be consistent with all of them.

At the time we discussed extending the definition of a btree opclass to
allow specification of related "+" and "-" operators, but the
infrastructure additions required seemed rather daunting.  Now that
we have pg_amop.amoppurpose, it might be easier to add such a concept;
they could be put in with a new "purpose" that shows that they are
not intended as index search operators.

Again, please see the archives.  I'm just speaking off the cuff here,
and probably don't remember all the details.

regards, tom lane




Re: [HACKERS] Should phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue') be true ?

2016-06-17 Thread Teodor Sigaev

>> Such an algorithm finds the closest pair of (Lpos, Rpos), but a satisfying
>> pair might not be the closest one; example: to_tsvector('simple', '1 2 1 2') @@ '1 <3> 2';

> Oh ... the indexes in the lists don't have much to do with the distances,
> do they.  OK, maybe it's not quite as easy as I was thinking.  I'm
> okay with the patch as presented.


Huh, I found that my patch isn't correct for the example I showed :(. A
reworked patch is attached.


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


phrase_exact_distance-2.patch
Description: binary/octet-stream



Re: [HACKERS] New design for FK-based join selectivity estimation

2016-06-17 Thread Tom Lane
Tomas Vondra  writes:
> Thanks for getting the patch into a much better shape. I've quickly 
> reviewed the patch this morning before leaving for the airport - I do 
> plan to do additional review/testing once in the air or perhaps over the 
> weekend. So at the moment I only have a few minor comments:

> 1) Shouldn't we define a new macro in copyfuncs.c, to do the memcpy for 
> us? Seems a bit strange we have macros for everything else.

Perhaps, but it seemed not that compelling since we need bespoke code for
those fields in outfuncs.c etc.  Maybe it would be worth thinking about
macro infrastructure for array-type fields in all of those modules.

> 2) I'm wondering whether removing the restrict infos when
>  if (fkinfo->eclass[i] == rinfo->parent_ec)
> is actually correct. Can't the EC include conditions that do not match 
> the FK?

Doesn't matter.  The point is that it *does* include a condition that
does match the FK, whether it chose to generate exactly that condition
for this join or some related one.

> I mean something like this:
>CREATE TABLE a (id1 INT PRIMARY KEY, id2 INT);
>CREATE TABLE b (id1 INT REFERENCES a (id1), id2 INT);
> and then something like
>SELECT * FROM a JOIN b ON (a.id1 = b.id1 AND a.id1 = b.id2)

Right.  In this case we'll have an EC containing {a.id1, b.id1, b.id2}
which means that equivclass.c will generate a restriction condition
b.id1 = b.id2 to be applied at the scan of b.  At the join level, it
has a choice whether to generate a.id1 = b.id1 or a.id1 = b.id2.
It could generate both, but that would be pointlessly inefficient (and
would likely confuse the selectivity estimators, too).  But even if it
chooses to generate a.id1 = b.id2, we should recognize that the FK is
matched.  What we're effectively doing by dropping that clause in
favor of treating the FK as matched is overriding equivclass.c's
arbitrary choice of which join clause to generate with an equally valid
choice that is easier to estimate for.

> 3) I think this comment in get_foreign_key_join_selectivity is wrong and 
> should instead say 'to FK':
>/* Otherwise, see if rinfo was previously matched to EC */

Duh, yeah.


I rewrote the comments in this section to clarify a bit:

/* Drop this clause if it matches any column of the FK */
for (i = 0; i < fkinfo->nkeys; i++)
{
if (rinfo->parent_ec)
{
/*
 * EC-derived clauses can only match by EC.  It is okay to
 * consider any clause derived from the same EC as
 * matching the FK: even if equivclass.c chose to generate
 * a clause equating some other pair of Vars, it could
 * have generated one equating the FK's Vars.  So for
 * purposes of estimation, we can act as though it did so.
 *
 * Note: checking parent_ec is a bit of a cheat because
 * there are EC-derived clauses that don't have parent_ec
 * set; but such clauses must compare expressions that
 * aren't just Vars, so they cannot match the FK anyway.
 */
if (fkinfo->eclass[i] == rinfo->parent_ec)
{
remove_it = true;
break;
}
}
else
{
/*
 * Otherwise, see if rinfo was previously matched to FK as
 * a "loose" clause.
 */
if (list_member_ptr(fkinfo->rinfos[i], rinfo))
{
remove_it = true;
break;
}
}
}


regards, tom lane




Re: [HACKERS] Restriction of windows functions

2016-06-17 Thread Konstantin Knizhnik



On 17.06.2016 17:01, Tom Lane wrote:


>> Certainly it assumes that window is ordered by key and the key type
>> supports subtraction, so "text" can not be used here.

> IIRC, the sticking point was defining a reasonably datatype-independent
> (i.e. extensible) notion of distance.


Why is it not possible just to locate the "-" or "+" operator for this type?
I do not see any difference here from locating the comparison operator
needed for sorting.


--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: [HACKERS] Should phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue') be true ?

2016-06-17 Thread Tom Lane
Teodor Sigaev  writes:
> Tom Lane wrote:
>> Hmm, couldn't the loop logic be simplified a great deal if this is the
>> definition?  Or are you leaving it like that with the idea that we might
>> later introduce another operator with the less-than-or-equal behavior?

> Do you suggest something like merge join of two sorted lists? ie:
> ...
> Such an algorithm finds the closest pair of (Lpos, Rpos), but a satisfying
> pair might not be the closest one; example: to_tsvector('simple', '1 2 1 2') @@ '1 <3> 2';

Oh ... the indexes in the lists don't have much to do with the distances,
do they.  OK, maybe it's not quite as easy as I was thinking.  I'm
okay with the patch as presented.

regards, tom lane




Re: [HACKERS] Should phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue') be true ?

2016-06-17 Thread Teodor Sigaev



Tom Lane wrote:
> Teodor Sigaev writes:
>>> So I think there's a reasonable case for decreeing that <N> should only
>>> match lexemes *exactly* N apart.  If we did that, we would no longer have
>>> the misbehavior that Jean-Pierre is complaining about, and we'd not need
>>> to argue about whether <0> needs to be treated specially.
>
>> Agree, seems that's easy to change.
>> ...
>> Patch is attached
>
> Hmm, couldn't the loop logic be simplified a great deal if this is the
> definition?  Or are you leaving it like that with the idea that we might
> later introduce another operator with the less-than-or-equal behavior?


Do you suggest something like merge join of two sorted lists? ie:

while(Rpos < Rdata.pos + Rdata.npos && Lpos < Ldata.pos + Ldata.npos)
{
if (*Lpos > *Rpos)
Rpos++;
else if (*Lpos < *Rpos)
{
if (*Rpos - *Lpos == distance)
match!
Lpos++;
}
else
{
if (distance == 0)
match!
Lpos++; Rpos++;
}
}

Such an algorithm finds the closest pair of (Lpos, Rpos), but a satisfying
pair might not be the closest one; example: to_tsvector('simple', '1 2 1 2') @@ '1 <3> 2';
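That said, the exact-distance semantics seem reachable with a single merge-style pass if the comparison is done on the left position shifted by the distance, rather than on the raw positions. A minimal standalone sketch (hypothetical helper for illustration, not the actual patch):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Sketch: report whether some left position L[li] and right position R[ri]
 * are exactly `distance` apart (R[ri] - L[li] == distance).  Both arrays
 * are sorted ascending, as tsvector position lists are.  Comparing
 * L[li] + distance against R[ri] keeps the merge aligned even though the
 * array indexes say nothing about the distances themselves.
 */
static bool
phrase_match_exact(const int *L, size_t nl, const int *R, size_t nr,
                   int distance)
{
    size_t      li = 0,
                ri = 0;

    while (li < nl && ri < nr)
    {
        if (L[li] + distance < R[ri])
            li++;               /* shifted left position is behind */
        else if (L[li] + distance > R[ri])
            ri++;               /* right position is behind */
        else
            return true;        /* found a pair exactly `distance` apart */
    }
    return false;
}
```

On the counterexample, '1' occurs at positions {1, 3} and '2' at {2, 4}; for distance 3 the pass advances past the closest pair (1, 2) and still finds the satisfying pair (1, 4).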


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/




Re: [HACKERS] Restriction of windows functions

2016-06-17 Thread Tom Lane
Konstantin Knizhnik  writes:
> ERROR:  RANGE PRECEDING is only supported with UNBOUNDED
> Is there some principle problem in implementing such kind of window?

There was such code in the original windowagg patch and it got rejected
as being too broken.  Please consult the archives.

> Certainly it assumes that window is ordered by key and the key type 
> supports subtraction, so "text" can not be used here.

IIRC, the sticking point was defining a reasonably datatype-independent
(i.e. extensible) notion of distance.

regards, tom lane




Re: [HACKERS] MultiXactId error after upgrade to 9.3.4

2016-06-17 Thread Andrew Gierth
> "Robert" == Robert Haas  writes:

 >> Why is the correct rule not "check for and ignore pre-upgrade mxids
 >> before even trying to fetch members"?

 Robert> I entirely believe that's the correct rule, but doesn't
 Robert> implementing it require a crystal ball?

Why would it? Pre-9.3 mxids are identified by the combination of flag
bits in the infomask, see Alvaro's patch.

-- 
Andrew (irc:RhodiumToad)




Re: [HACKERS] ERROR: ORDER/GROUP BY expression not found in targetlist

2016-06-17 Thread Robert Haas
On Fri, Jun 17, 2016 at 9:04 AM, Amit Kapila  wrote:
> Attached, please find updated patch based on above lines.  Due to addition
> of projection path on top of partial paths, some small additional cost is
> added for parallel paths. In one of the tests in select_parallel.sql the
> plan was getting converted to serial plan from parallel plan due to this
> cost, so I have changed it to make it more restrictive.  Before changing, I
> have debugged the test to confirm that it was due to this new projection
> path cost.  I have added two new tests as well to cover the new code added
> by patch.

I'm going to go work on this patch now.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] parallel.c is not marked as test covered

2016-06-17 Thread Robert Haas
On Wed, Jun 15, 2016 at 5:10 PM, Robert Haas  wrote:
> I'm comfortable with that.  Feel free to make it so, unless you want
> me to do it for some reason.

Time is short, so I did this.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] 10.0

2016-06-17 Thread Merlin Moncure
On Fri, Jun 17, 2016 at 1:01 AM, Craig Ringer  wrote:
> On 17 June 2016 at 08:34, Greg Stark  wrote:
>>
>> So we would release 10.0.0 and 10.0.1 and the next major release would be
>> 11.0.0.
>>
>> This would have two benefits:
>>
>> 1) It emphasises that minor releases continue to be safe minor updates
>> that offer the same stability guarantees. Users would be less likely to be
>> intimidated by 10.0.1 than they would be 10.1. And it gives users a
>> consistent story they can apply to any version whether 9.x or 10.0+
>
>
> And matches semver.
>
>>
>> 2) If we ever do release incompatible feature releases on older branches
>> -- or more likely some fork does -- it gives them a natural way to number
>> their release.
>
> Seems unlikely, though.
>
> I thought about raising this, but I think in the end it's replacing one
> confusing and weird versioning scheme for another confusing and weird
> versioning scheme.
>
> It does have the advantage that tools that compare a two-part major version
> like 090401 vs 090402 won't be confused when they compare 100100 and 100200,
> since the following majors will just be 11 and 12. So it's more
> backward-compatible. But ugly.

Ugliness is a highly subjective qualifier.  OTOH, backwards
compatibility, at least when the checks are properly written :-), is a
very objective benefit.

merlin




Re: [HACKERS] 10.0

2016-06-17 Thread David G. Johnston
On Fri, Jun 17, 2016 at 2:01 AM, Craig Ringer  wrote:

> On 17 June 2016 at 08:34, Greg Stark  wrote:
>
>> So we would release 10.0.0 and 10.0.1 and the next major release would be
>> 11.0.0.
>>
>> This would have two benefits:
>>
>> 1) It emphasises that minor releases continue to be safe minor updates
>> that offer the same stability guarantees. Users would be less likely to be
>> intimidated by 10.0.1 than they would be 10.1. And it gives users a
>> consistent story they can apply to any version whether 9.x or 10.0+
>>
>
> And matches semver.
>

If we were, or ever expected, to have some kind of semver policy, then a
numbering scheme matching semver would make sense.  We don't, and so having
one just allows people to make invalid assumptions.

This possibility was known when the discussion at pgCon happened and,
IIUC, the decision to use 10.0 was made.  And this thread went on for quite
a while prior to that.  Let's let this die, please.  Or at worst wait until
we open HEAD up for 10.0 and someone commits the fully fleshed out
versioning changes to the docs.

David J.


Re: [HACKERS] ERROR: ORDER/GROUP BY expression not found in targetlist

2016-06-17 Thread Amit Kapila
On Fri, Jun 17, 2016 at 8:18 AM, Amit Kapila 
wrote:
>
> On Fri, Jun 17, 2016 at 3:20 AM, Robert Haas 
wrote:
> >
>
> > Something like what you have there might work if you use
> > create_projection_path instead of apply_projection_to_path.
> >
>
> Okay, I think you mean that we should use create_projection_path() while
> applying scanjoin_target to the partial path lists in the code below:
> foreach(lc, current_rel->partial_pathlist)
> {
>     Path   *subpath = (Path *) lfirst(lc);
>
>     Assert(subpath->param_info == NULL);
>     lfirst(lc) = apply_projection_to_path(root, current_rel,
>                                           subpath, scanjoin_target,
>                                           scanjoin_target_parallel_safe);
> }
>

Attached, please find updated patch based on above lines.  Due to addition
of projection path on top of partial paths, some small additional cost is
added for parallel paths. In one of the tests in select_parallel.sql the
plan was getting converted to serial plan from parallel plan due to this
cost, so I have changed it to make it more restrictive.  Before changing, I
have debugged the test to confirm that it was due to this new projection
path cost.  I have added two new tests as well to cover the new code added
by patch.

Note - Apart from the tests related to the new code, Dilip Kumar has helped
verify that the TPC-H queries also work fine (he tested using EXPLAIN). He
did not encounter the problem reported above [1], whereas without the patch
the TPC-H queries were failing.


[1] - https://www.postgresql.org/message-id/575E6BE6.8040006%40lab.ntt.co.jp

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


fix_scanjoin_pathtarget_v2.patch
Description: Binary data



Re: [HACKERS] [sqlsmith] Failed assertion in postgres_fdw/deparse.c:1116

2016-06-17 Thread Robert Haas
On Thu, Jun 16, 2016 at 10:44 PM, Etsuro Fujita
 wrote:
> On 2016/06/16 22:00, Robert Haas wrote:
>> On Thu, Jun 16, 2016 at 7:05 AM, Etsuro Fujita
>>  wrote:
>>>
>>> ISTM that a robuster solution to this is to push down the ft1-ft2-ft3
>>> join
>>> with the PHV by extending deparseExplicitTargetList() and/or something
>>> else
>>> so that we can send the remote query like:
>>>
>>> select ft1.a, (ft3.a IS NOT NULL) from (ft1 inner join ft2 on ft1.a =
>>> ft2.a)
>>> left join ft3 on ft1.a = ft3.a
>
>> I completely agree we should have that, but not for 9.6.
>
> OK, I'll work on that for 10.0.

Great, that will be a nice improvement.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] MultiXactId error after upgrade to 9.3.4

2016-06-17 Thread Robert Haas
On Thu, Jun 16, 2016 at 4:50 AM, Andrew Gierth
 wrote:
> Why is the correct rule not "check for and ignore pre-upgrade mxids
> before even trying to fetch members"?

I entirely believe that's the correct rule, but doesn't implementing
it require a crystal ball?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




[HACKERS] Restriction of windows functions

2016-06-17 Thread Konstantin Knizhnik

Hi,

One of the popular queries in financial analytics systems is to calculate
some moving aggregate within a time interval, e.g. the moving average of
the trade price within a 5-minute window. Unfortunately, this case is not
supported by PostgreSQL:


select symbol, date, avg(price) over (order by date range between
    '5 minutes' preceding and current row) from Trades;

ERROR:  RANGE PRECEDING is only supported with UNBOUNDED

Is there some fundamental problem in implementing this kind of window frame?
Maybe I missed something, but it seems to me that it should not be very
difficult. There is the update_frameheadpos function, which adjusts the
window's head position in ROWS mode but reports an error in RANGE mode:

if (frameOptions & FRAMEOPTION_ROWS)
{
    /* In ROWS mode, bound is physically n before/after current */
    int64       offset = DatumGetInt64(winstate->startOffsetValue);

    if (frameOptions & FRAMEOPTION_START_VALUE_PRECEDING)
        offset = -offset;

    winstate->frameheadpos = winstate->currentpos + offset;
    /* frame head can't go before first row */
    if (winstate->frameheadpos < 0)
        winstate->frameheadpos = 0;
    else if (winstate->frameheadpos > winstate->currentpos)
    {
        /* make sure frameheadpos is not past end of partition */
        spool_tuples(winstate, winstate->frameheadpos - 1);
        if (winstate->frameheadpos > winstate->spooled_rows)
            winstate->frameheadpos = winstate->spooled_rows;
    }
    winstate->framehead_valid = true;
}
else if (frameOptions & FRAMEOPTION_RANGE)
{
    /* parser should have rejected this */
    elog(ERROR, "window frame with value offset is not implemented");
}
else

The straightforward approach to supporting RANGE mode is to advance the head
position until the "distance" between the head and the current row is less
than or equal to the specified range value. That does not look too complex
to implement, does it? Are there some caveats?
Certainly it assumes that the window is ordered by the key and that the key
type supports subtraction, so "text" cannot be used here.

Something else?
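For what it's worth, the head-advance idea can be sketched outside the executor with a sorted integer array standing in for the window's ordering key (names here are made up; real code would go through update_frameheadpos and type-specific subtraction):

```c
#include <assert.h>
#include <stddef.h>

/*
 * Sketch of RANGE <offset> PRECEDING head positioning: keys[] holds the
 * ordering key of each row in the partition, sorted ascending.  The frame
 * head is the first row whose key is within `offset` of the current row's
 * key, so we advance the head while the distance is still too large.
 */
static size_t
range_frame_head(const int *keys, size_t current, int offset)
{
    size_t      head = 0;

    while (head < current && keys[current] - keys[head] > offset)
        head++;
    return head;
}
```

With keys {1, 2, 3, 10, 11} and a 5-unit preceding range, the frame for the row with key 3 reaches back to the row with key 1, while the frame for the row with key 10 starts at key 10 itself.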

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: [HACKERS] MultiXactId error after upgrade to 9.3.4

2016-06-17 Thread Andrew Gierth
> "Alvaro" == Alvaro Herrera  writes:

 >> (It can, AFAICT, be inside the currently valid range due to
 >> wraparound, i.e. without there being a valid pg_multixact entry for
 >> it, because AFAICT in 9.2, once the mxid is hinted dead it is never
 >> again either looked up or cleared, so it can sit in the tuple xmax
 >> forever, even through multiple wraparounds.)

 Alvaro> HeapTupleSatisfiesVacuum removes very old multixacts

It does nothing of the kind; it only marks them HEAP_XMAX_INVALID. The
actual mxid remains in the tuple xmax field.

The failing mxids in the case I analyzed on -bugs are failing _in spite
of_ being already hinted HEAP_XMAX_INVALID, because the code path in
question doesn't check that.

-- 
Andrew (irc:RhodiumToad)




Re: [HACKERS] ERROR: ORDER/GROUP BY expression not found in targetlist

2016-06-17 Thread Andreas Joseph Krogh
On Friday 17 June 2016 at 08:14:39, Amit Kapila wrote:
> On Fri, Jun 17, 2016 at 11:39 AM, Andreas Joseph Krogh wrote:
>> On Thursday 16 June 2016 at 20:19:44, Tom Lane wrote:
>>> Amit Kapila writes:
>>>> On Mon, Jun 13, 2016 at 10:36 PM, Tom Lane wrote:
>>>>> min_parallel_relation_size, or min_parallelizable_relation_size, or
>>>>> something like that?
>>>
>>>> You are right that such a variable will make it simpler to write tests for
>>>> parallel query.  I have implemented such a guc and choose to keep the name
>>>> as min_parallel_relation_size.
>>>
>>> Pushed with minor adjustments.  My first experiments with this say that
>>> we should have done this long ago:
>>> https://www.postgresql.org/message-id/22782.1466100...@sss.pgh.pa.us
>>>
>>>> One thing to note is that in function
>>>> create_plain_partial_paths(), currently it is using PG_INT32_MAX/3 for
>>>> parallel_threshold to check for overflow, I have changed it to INT_MAX/3 so
>>>> as to be consistent with guc.c.  I am not sure if it is advisable to use
>>>> PG_INT32_MAX in guc.c as other similar parameters use INT_MAX.
>>>
>>> I agree that using INT_MAX is more consistent with the code elsewhere in
>>> guc.c, and more correct given that we declare the variable in question
>>> as int not int32.  But you need to include <limits.h> to use INT_MAX ...
>>>
>>> regards, tom lane
>>
>> As of 4c56f3269a84a81461cc53941e0eee02fc920ab6 I'm still getting it in one of
>> my queries:
>> ORDER/GROUP BY expression not found in targetlist
>
> I am working on preparing a patch to fix this issue.
>
>> Am I missing something?
>
> No, the fix is still not committed.

Ah, I thought Tom pushed a fix, but it apparently was another fix.

Thanks for fixing.

-- 
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com
www.visena.com


Re: [HACKERS] ERROR: ORDER/GROUP BY expression not found in targetlist

2016-06-17 Thread Amit Kapila
On Fri, Jun 17, 2016 at 11:39 AM, Andreas Joseph Krogh 
wrote:

> On Thursday 16 June 2016 at 20:19:44, Tom Lane wrote:
>
> Amit Kapila  writes:
> > On Mon, Jun 13, 2016 at 10:36 PM, Tom Lane  wrote:
> >> min_parallel_relation_size, or min_parallelizable_relation_size, or
> >> something like that?
>
> > You are right that such a variable will make it simpler to write tests
> for
> > parallel query.  I have implemented such a guc and choose to keep the
> name
> > as min_parallel_relation_size.
>
> Pushed with minor adjustments.  My first experiments with this say that
> we should have done this long ago:
> https://www.postgresql.org/message-id/22782.1466100...@sss.pgh.pa.us
>
> > One thing to note is that in function
> > create_plain_partial_paths(), currently it is using PG_INT32_MAX/3 for
> > parallel_threshold to check for overflow, I have changed it to INT_MAX/3
> so
> > as to be consistent with guc.c.  I am not sure if it is advisable to use
> > PG_INT32_MAX in guc.c as other similar parameters use INT_MAX.
>
> I agree that using INT_MAX is more consistent with the code elsewhere in
> guc.c, and more correct given that we declare the variable in question
> as int not int32.  But you need to include <limits.h> to use INT_MAX ...
>
> regards, tom lane
>
>
> As of 4c56f3269a84a81461cc53941e0eee02fc920ab6 I'm still getting it in one
> of my queries:
> ORDER/GROUP BY expression not found in targetlist
>

I am working on preparing a patch to fix this issue.


> Am I missing something?
>

No, the fix is still not committed.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] ERROR: ORDER/GROUP BY expression not found in targetlist

2016-06-17 Thread Andreas Joseph Krogh
On Thursday 16 June 2016 at 20:19:44, Tom Lane wrote:
> Amit Kapila writes:
>> On Mon, Jun 13, 2016 at 10:36 PM, Tom Lane wrote:
>>> min_parallel_relation_size, or min_parallelizable_relation_size, or
>>> something like that?
>
>> You are right that such a variable will make it simpler to write tests for
>> parallel query.  I have implemented such a guc and choose to keep the name
>> as min_parallel_relation_size.
>
> Pushed with minor adjustments.  My first experiments with this say that
> we should have done this long ago:
> https://www.postgresql.org/message-id/22782.1466100...@sss.pgh.pa.us
>
>> One thing to note is that in function
>> create_plain_partial_paths(), currently it is using PG_INT32_MAX/3 for
>> parallel_threshold to check for overflow, I have changed it to INT_MAX/3 so
>> as to be consistent with guc.c.  I am not sure if it is advisable to use
>> PG_INT32_MAX in guc.c as other similar parameters use INT_MAX.
>
> I agree that using INT_MAX is more consistent with the code elsewhere in
> guc.c, and more correct given that we declare the variable in question
> as int not int32.  But you need to include <limits.h> to use INT_MAX ...
>
> regards, tom lane

As of 4c56f3269a84a81461cc53941e0eee02fc920ab6 I'm still getting it in one of
my queries:
ORDER/GROUP BY expression not found in targetlist

Am I missing something?

I could dig into this further to reproduce if necessary.

-- 
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com
www.visena.com


Re: [HACKERS] pg_isready features

2016-06-17 Thread Craig Ringer
On 16 June 2016 at 00:40, Jimmy  wrote:

>
> (1) I can (and do) use psql, pg_isready seems lighter and since it already
> supports credentials and DB name, I see very little harm for pg_isready
> to say back whether user logged IN or not using these credentials.
>
>
> (2) timeout is not the same - timeout does not retry, it's a simple timeout
> in case the connection hangs, try it
>
That threw me recently, too.

If your DB is in recovery, pg_isready with a long timeout won't wait until
it's accepting normal user connections or until timeout. There's no way to
get that behaviour without a shell script loop or similar.

So yeah, +1 for this.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] 10.0

2016-06-17 Thread Craig Ringer
On 17 June 2016 at 08:34, Greg Stark  wrote:

> So we would release 10.0.0 and 10.0.1 and the next major release would be
> 11.0.0.
>
> This would have two benefits:
>
> 1) It emphasises that minor releases continue to be safe minor updates
> that offer the same stability guarantees. Users would be less likely to be
> intimidated by 10.0.1 than they would be 10.1. And it gives users a
> consistent story they can apply to any version whether 9.x or 10.0+
>

And matches semver.


> 2) If we ever do release incompatible feature releases on older branches
> -- or more likely some fork does -- it gives them a natural way to number
> their release.
>
Seems unlikely, though.

I thought about raising this, but I think in the end it's replacing one
confusing and weird versioning scheme for another confusing and weird
versioning scheme.

It does have the advantage that tools that compare a two-part major version
like 090401 vs 090402 won't be confused when they compare 100100 and 100200,
since the following majors will just be 11 and 12. So it's more
backward-compatible. But ugly.
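The monotonicity claim is easy to sanity-check with the six-digit packing discussed here (a hypothetical encoder for the scheme in this thread, not PG_VERSION_NUM itself):

```c
#include <assert.h>

/*
 * Sketch of the zero-padded encoding above: a two-part major like 9.4 at
 * patchlevel 1 packs to 090401, while a single-part major puts its minor
 * in the middle field, so 10.1 packs to 100100.  Plain numeric comparison
 * then stays monotonic across the 9.x -> 10.x boundary.
 */
static int
encode_version(int major, int middle, int last)
{
    return major * 10000 + middle * 100 + last;
}
```

For instance, encode_version(9, 4, 2) < encode_version(10, 1, 0) < encode_version(11, 0, 0), so numeric ordering keeps matching release ordering.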


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services