[HACKERS] dead or outdated URLs found in win32.h

2017-10-06 Thread Ashutosh Sharma
Hi All,

The following URLs in src/include/port/win32.h, which point to information
on dllimport and dllexport (used on Windows), appear to be either dead or
obsolete, and I think they need to be updated.

http://support.microsoft.com/kb/132044 -- dead
http://msdn.microsoft.com/en-us/library/8fskxacy(v=vs.80).aspx -- outdated
http://msdn.microsoft.com/en-us/library/a90k134d(v=vs.80).aspx -- outdated

https://msdn.microsoft.com/en-gb/library/aa489609.aspx -- outdated
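For context, the pattern those references document is roughly the following
(a minimal sketch, not the actual win32.h contents; the BUILDING_LIBFOO and
LIBFOO_API names are invented for illustration): data shared across a DLL
boundary is declared dllexport while building the library and dllimport when
consuming it.

#ifdef _WIN32
#ifdef BUILDING_LIBFOO			/* defined while compiling the DLL itself */
#define LIBFOO_API __declspec(dllexport)
#else							/* consumers of the DLL import the symbol */
#define LIBFOO_API __declspec(dllimport)
#endif
#else
#define LIBFOO_API				/* no-op on non-Windows platforms */
#endif

/* a global variable visible across the DLL boundary */
extern LIBFOO_API int libfoo_counter;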

--
With Regards,
Ashutosh Sharma
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] [PATCH] A hook for session start

2017-10-06 Thread Pavel Stehule
2017-10-07 6:49 GMT+02:00 Nico Williams :

> On Sat, Oct 07, 2017 at 05:44:00AM +0200, Pavel Stehule wrote:
> > 2017-10-06 21:36 GMT+02:00 Nico Williams :
> > > But the nice thing about them is that you need only create them once,
> so
> > > leave them in the catalog.  Stats about them should not be gathered nor
> > > stored, since they could be different per-session.
> >
> > Unfortunately one field from pg_class are not static - reltuples should
> be
> > per session.
>
> It's "only an estimate" "used by the query planner".  We could estimate
> zero for global temp tables, and the query planner can get the true
> value from an internal temp table.
>

It could be a solution.


> > But it can be moved to different table
>
> That too, if it's OK.
>
> Nico
> --
>


Re: [HACKERS] [PATCH] A hook for session start

2017-10-06 Thread Nico Williams
On Sat, Oct 07, 2017 at 05:44:00AM +0200, Pavel Stehule wrote:
> 2017-10-06 21:36 GMT+02:00 Nico Williams :
> > But the nice thing about them is that you need only create them once, so
> > leave them in the catalog.  Stats about them should not be gathered nor
> > stored, since they could be different per-session.
> 
> Unfortunately one field from pg_class are not static - reltuples should be
> per session.

It's "only an estimate" "used by the query planner".  We could estimate
zero for global temp tables, and the query planner can get the true
value from an internal temp table.

> But it can be moved to different table

That too, if it's OK.

Nico
-- 




Re: [HACKERS] [PATCH] A hook for session start

2017-10-06 Thread Pavel Stehule
2017-10-06 21:36 GMT+02:00 Nico Williams :

> On Fri, Oct 06, 2017 at 08:51:53PM +0200, Pavel Stehule wrote:
> > 2017-10-06 20:39 GMT+02:00 Nico Williams :
> > > On Fri, Oct 06, 2017 at 06:37:57PM +0200, Pavel Stehule wrote:
> > > > When we talked about this topic, there are two issues:
> > > >
> > > > a) probably not too hard issue - some internal data can be in
> session sys
> > > > cache.
> > > >
> > > > b) the session sys data should be visible on SQL level too (for some
> > > tools
> > > > and consistency) - it is hard task.
> > >
> > > Can you expand on this?
> >
> > If global temporary tables should be effective, then you have not have
> > modify system catalogue after creating. But lot of processes requires it
> -
> > ANALYZE, query planning.
>
> But the nice thing about them is that you need only create them once, so
> leave them in the catalog.  Stats about them should not be gathered nor
> stored, since they could be different per-session.
>

Unfortunately, one field in pg_class is not static - reltuples should be
per session.

But it could be moved to a different table.

Regards

Pavel


Re: [HACKERS] Discussion on missing optimizations

2017-10-06 Thread Andres Freund
On 2017-10-06 22:19:54 -0400, Tom Lane wrote:
> The impression I have in a quick scan is that probably hardly any of these
> are cases that any of the DB designers think are important in
> themselves.

> In the end, what the article fails to consider is that all of these are
> tradeoffs, not unalloyed goods.  If you spend planner cycles on every
> query to look for cases that only the most unabashedly brain-dead ORMs
> ever generate, you're not really doing your users a favor on balance.

Partially agreed. A comment to the article also mentions that some other
database performs more optimizations depending on the cost of the
plan. That's not easy to do in our current plan structure, but I think
it's quite a worthwhile concept.


> Rather, they fall out of more general optimization attempts, or not,
> depending on the optimization mechanisms in use in a particular DB.
> For example, reducing "WHERE 1=1" to "WHERE TRUE" and then to nothing
> comes out of a constant-subexpression-precalculation mechanism for us,
> whereas "WHERE column=column" doesn't fall to that approach.  ISTM it
> would be really dumb to expend planner cycles looking specifically for
> that case, so I guess that DB2 et al are finding it as a side-effect of
> some more general optimization ... I wonder what that is?
> 
> (edit: a few minutes later, I seem to remember that equivclass.c has
> to do something special with the X=X case, so maybe it could do
> something else special instead, with little new overhead.)

Yea, I think this should be inferrable from information we essentially
already compute for equivclasses.


> > (i.e. combining a IN (a,b,c) and a IN (c,d,f) into a IN (c), similar
> > with OR)
> 
> > I can't remember proposals about adding this, but it seems worthwhile to
> > consider. I think we should be able to check for this without a lot of
> > planner overhead.
> 
> It would be enormously expensive to check that in the general case with
> a bunch of entries in each IN list.  Perhaps it would be OK to add on
> the presumption that few queries would contain multiple IN's on the same
> column in the first place, though.  Not sure.

Merging of ORs should be near free if you leave duplicates in there, and
the duplicates aren't a huge concern if your alternative is to have
them, but also have two clauses to evaluate.

I think the IN AND IN case should usually end up a clear win as
well. Both lists are going to be evaluated for each row anyway - you
don't need a lot of rows where clauses are evaluated to make it worth
the O(n*log(n)) of sorting the lists.  Sorting them would be beneficial
for other reasons as well, e.g. it improves access patterns for SAO
index scans.
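To make the cost argument concrete, here is a toy sketch of the IN-AND-IN
case on plain integer constants (nothing like the real planner data
structures; intersect_in_lists and cmp_int are made-up names): sort both
lists once, then a single linear merge pass yields the intersection.

#include <stdlib.h>

/* qsort comparator for ints */
static int
cmp_int(const void *a, const void *b)
{
	int		x = *(const int *) a;
	int		y = *(const int *) b;

	return (x > y) - (x < y);
}

/*
 * Intersect two IN-lists of integer constants: O(n log n) for the sorts,
 * then one linear pass.  Returns the number of values written to "out".
 */
int
intersect_in_lists(int *a, int na, int *b, int nb, int *out)
{
	int		i = 0,
			j = 0,
			n = 0;

	qsort(a, na, sizeof(int), cmp_int);
	qsort(b, nb, sizeof(int), cmp_int);

	while (i < na && j < nb)
	{
		if (a[i] < b[j])
			i++;
		else if (a[i] > b[j])
			j++;
		else
		{
			out[n++] = a[i];	/* value present in both lists */
			i++;
			j++;
		}
	}
	return n;
}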


Greetings,

Andres Freund




Re: [HACKERS] Discussion on missing optimizations

2017-10-06 Thread Tom Lane
Andres Freund  writes:
> On 2017-10-06 21:33:16 -0400, Adam Brusselback wrote:
>> The article in question is here:
>> https://blog.jooq.org/2017/09/28/10-cool-sql-optimisations-that-do-not-depend-on-the-cost-model/

> That's interesting.

The impression I have in a quick scan is that probably hardly any of these
are cases that any of the DB designers think are important in themselves.
Rather, they fall out of more general optimization attempts, or not,
depending on the optimization mechanisms in use in a particular DB.
For example, reducing "WHERE 1=1" to "WHERE TRUE" and then to nothing
comes out of a constant-subexpression-precalculation mechanism for us,
whereas "WHERE column=column" doesn't fall to that approach.  ISTM it
would be really dumb to expend planner cycles looking specifically for
that case, so I guess that DB2 et al are finding it as a side-effect of
some more general optimization ... I wonder what that is?

(edit: a few minutes later, I seem to remember that equivclass.c has
to do something special with the X=X case, so maybe it could do
something else special instead, with little new overhead.)

> (i.e. combining a IN (a,b,c) and a IN (c,d,f) into a IN (c), similar
> with OR)

> I can't remember proposals about adding this, but it seems worthwhile to
> consider. I think we should be able to check for this without a lot of
> planner overhead.

It would be enormously expensive to check that in the general case with
a bunch of entries in each IN list.  Perhaps it would be OK to add on
the presumption that few queries would contain multiple IN's on the same
column in the first place, though.  Not sure.

> 9. Unneeded Self JOIN

> Can't remember discussions of this.

I can't get very excited about that one either.

In the end, what the article fails to consider is that all of these are
tradeoffs, not unalloyed goods.  If you spend planner cycles on every
query to look for cases that only the most unabashedly brain-dead ORMs
ever generate, you're not really doing your users a favor on balance.

regards, tom lane




Re: [HACKERS] Discussion on missing optimizations

2017-10-06 Thread Andres Freund
Hi,

On 2017-10-06 21:33:16 -0400, Adam Brusselback wrote:
> Hopefully it's alright for me to post this here, please let me know if not.

> I ran across an article on blog.jooq.org comparing all the major RDBMS'
> with their ability to optimize away unnecessary work with queries which are
> less than optimal, and saw some discussion on hackernews and reddit, but I
> hadn't seen any discussion here.
> 
> The article in question is here:
> https://blog.jooq.org/2017/09/28/10-cool-sql-optimisations-that-do-not-depend-on-the-cost-model/

That's interesting.


> Commercial databases seem to have a serious leg up in this area, and there
> are even some that MySQL has that Postgres doesn't.
> 
> I was wondering which of these are planned, which have had discussion
> before and decided not to support, and which just haven't been thought of?

Hm, going through the ones we don't [fully] support:

3. JOIN Elimination

There's been a lot of discussion and several patches. There are a bunch of
problems here, one being that there are cases (during trigger firing,
before the constraint checks) where foreign keys don't hold true, so we
can't quite make this optimization in general.  Another concern is
whether the added plan time is actually worthwhile.


4. Removing “Silly” Predicates

(i.e. stuff like column = column)

This has been deemed not to be a valuable use of plan time. I'm doubtful
about that argument, but IIRC that was the consensus the last time this
came up.


6. Predicate Merging

(i.e. combining a IN (a,b,c) and a IN (c,d,f) into a IN (c), similar
with OR)

I can't remember proposals about adding this, but it seems worthwhile to
consider. I think we should be able to check for this without a lot of
planner overhead.


7. Provably Empty Sets
8. CHECK Constraints

I think some of this should actually work with constraint exclusion
turned on.


9. Unneeded Self JOIN

Can't remember discussions of this.


Greetings,

Andres Freund




[HACKERS] Discussion on missing optimizations

2017-10-06 Thread Adam Brusselback
Hopefully it's alright for me to post this here; please let me know if not.

I ran across an article on blog.jooq.org comparing all the major RDBMSes
on their ability to optimize away unnecessary work in less-than-optimal
queries. I saw some discussion on Hacker News and Reddit, but I hadn't
seen any discussion here.

The article in question is here:
https://blog.jooq.org/2017/09/28/10-cool-sql-optimisations-that-do-not-depend-on-the-cost-model/

Commercial databases seem to have a serious leg up in this area, and there
are even some optimizations that MySQL has that Postgres doesn't.

I was wondering which of these are planned, which have been discussed
before and decided against, and which just haven't been thought of?

I thought I'd bring it up in the hope that others who are more knowledgeable
can chime in.


Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-06 Thread Peter Geoghegan
On Fri, Oct 6, 2017 at 2:09 PM, Wong, Yi Wen  wrote:
> Yesterday, I've been spending time with pg_visibility on the pages when I 
> reproduce the issue in 9.6.
> None of the all-frozen or all-visible bits are necessarily set in problematic 
> pages.

Since this happened yesterday, I assume it was with an unfixed version?

As you must have seen, Alvaro said he has a variant of Dan's original
script that demonstrates that a problem remains, at least on 9.6+,
even with today's fix. I think it's the stress-test that plays with
fillfactor, many clients, etc [1].

I've tried to independently reproduce the problem on the master
branch's current tip, with today's new fix, but cannot break things
despite trying many variations. I cannot reproduce the problem that
Alvaro still sees.

I'll have to wait until Alvaro posts his repro to the list before
commenting further, which I assume he'll post as soon as he can. There
doesn't seem to be much point in not waiting for that.

[1] https://postgr.es/m/20171005162402.jahqflf3mekileqm@alvherre.pgsql
-- 
Peter Geoghegan




Re: [HACKERS] Re: protocol version negotiation (Re: Libpq PGRES_COPY_BOTH - version compatibility)

2017-10-06 Thread Badrul Chowdhury
Hi Tom and Robert, 

I added a mechanism to fall back to v3.0 if the BE fails to start when the FE
initiates a connection with v3.1 (with optional startup parameters). This
completely eliminates the need to backpatch older servers, i.e. a newer FE can
connect to an older BE. Please let me know what you think.

Thanks,
Badrul

-Original Message-
From: Robert Haas [mailto:robertmh...@gmail.com] 
Sent: Wednesday, October 4, 2017 4:54 AM
To: Tom Lane 
Cc: Badrul Chowdhury ; Satyanarayana Narlapuram 
; Craig Ringer ; 
Peter Eisentraut ; Magnus Hagander ; 
PostgreSQL-development 
Subject: Re: [HACKERS] Re: protocol version negotiation (Re: Libpq 
PGRES_COPY_BOTH - version compatibility)

On Tue, Oct 3, 2017 at 9:46 PM, Tom Lane  wrote:
> Badrul Chowdhury  writes:
>> 1. Pgwire protocol v3.0 with negotiation is called v3.1.
>> 2. There are 2 patches for the change: a BE-specific patch that will be 
>> backported and a FE-specific patch that is only for pg10 and above.
>
> TBH, anything that presupposes a backported change in the backend is 
> broken by definition.  We expect libpq to be able to connect to older 
> servers, and that has to include servers that didn't get this memo.
>
> It would be all right for libpq to make a second connection attempt if 
> its first one fails, as we did in the 2.0 -> 3.0 change.

Hmm, that's another approach, but I prefer the one advocated by Tom Lane.

https://www.postgresql.org/message-id/30788.1498672033@sss.pgh.pa.us
https://www.postgresql.org/message-id/24357.1498703265@sss.pgh.pa.us

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


pgwire3.1.patch
Description: pgwire3.1.patch



Re: [HACKERS] separate serial_schedule useful?

2017-10-06 Thread Tom Lane
Peter Eisentraut  writes:
> I noticed that the test "hash_func" was listed in parallel_schedule but
> not in serial_schedule.  I have seen that a few times recently where a
> patch proposes to add a new test file but forgets to add it to the
> serial_schedule.

Yeah, this is way too routine :-(

> I wonder whether it's still useful to keep two separate test lists.  I
> think we could just replace make installcheck with what make
> installcheck-parallel MAX_CONNECTIONS=1 does.  Thoughts?

Hm, that seems like potentially a good idea.  I can't see an argument
against it offhand.

The other routine mistake, which I see Robert just made again,
is to break the at-most-twenty-parallel-tests-at-once convention.
I wonder if we can get in some sort of automated check for that.
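Not a suggestion for where such a check should live (a Perl or makefile
check would probably fit the tree better), just a sketch of how little it
would take, assuming the schedule format of "test: name name ..." lines and
a limit of 20 names per group:

#include <stdio.h>
#include <string.h>

#define MAX_PARALLEL_TESTS 20

int
main(int argc, char **argv)
{
	char		line[4096];
	int			lineno = 0;
	int			status = 0;
	FILE	   *fp;

	if (argc != 2)
	{
		fprintf(stderr, "usage: %s schedule-file\n", argv[0]);
		return 2;
	}
	if ((fp = fopen(argv[1], "r")) == NULL)
	{
		perror(argv[1]);
		return 2;
	}

	while (fgets(line, sizeof(line), fp) != NULL)
	{
		lineno++;
		if (strncmp(line, "test:", 5) == 0)
		{
			int			ntests = 0;
			char	   *tok = strtok(line + 5, " \t\n");

			/* count the test names in this parallel group */
			while (tok != NULL)
			{
				ntests++;
				tok = strtok(NULL, " \t\n");
			}
			if (ntests > MAX_PARALLEL_TESTS)
			{
				fprintf(stderr, "%s:%d: %d tests in one parallel group (max %d)\n",
						argv[1], lineno, ntests, MAX_PARALLEL_TESTS);
				status = 1;
			}
		}
	}
	fclose(fp);
	return status;
}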

regards, tom lane




[HACKERS] separate serial_schedule useful?

2017-10-06 Thread Peter Eisentraut
I noticed that the test "hash_func" was listed in parallel_schedule but
not in serial_schedule.  I have seen that a few times recently where a
patch proposes to add a new test file but forgets to add it to the
serial_schedule.

I wonder whether it's still useful to keep two separate test lists.  I
think we could just replace make installcheck with what make
installcheck-parallel MAX_CONNECTIONS=1 does.  Thoughts?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] [PATCH] A hook for session start

2017-10-06 Thread Nico Williams
On Fri, Oct 06, 2017 at 08:51:53PM +0200, Pavel Stehule wrote:
> 2017-10-06 20:39 GMT+02:00 Nico Williams :
> > On Fri, Oct 06, 2017 at 06:37:57PM +0200, Pavel Stehule wrote:
> > > When we talked about this topic, there are two issues:
> > >
> > > a) probably not too hard issue - some internal data can be in session sys
> > > cache.
> > >
> > > b) the session sys data should be visible on SQL level too (for some
> > tools
> > > and consistency) - it is hard task.
> >
> > Can you expand on this?
> 
> If global temporary tables should be effective, then you have not have
> modify system catalogue after creating. But lot of processes requires it -
> ANALYZE, query planning.

But the nice thing about them is that you need only create them once, so
leave them in the catalog.  Stats about them should not be gathered nor
stored, since they could be different per-session.




Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables

2017-10-06 Thread Robert Haas
On Fri, Oct 6, 2017 at 3:07 PM, Ashutosh Bapat
 wrote:
> On Fri, Oct 6, 2017 at 8:45 PM, Robert Haas  wrote:
>> I think this is very good work and I'm excited about the feature.  Now
>> I'll wait to see whether the buildfarm, or Tom, yell at me for
>> whatever problems this may still have...
>
> Buildfarm animal prion turned red. Before going into that failure,
> good news is that the other animals are green. So the plans are
> stable.
>
> prion runs the regression with -DRELCACHE_FORCE_RELEASE, which
> destroys a relcache entry as soon as its reference count drops down to
> 0. This destroys everything that's there in corresponding relcache
> entry including partition key information and partition descriptor
> information. find_partition_scheme() and set_relation_partition_info()
> both assume that the relcache information will survive as long as the
> relation lock is held. They do not copy the relevant partitioning
> information but just copy the pointers. That assumption is wrong.
> Because of -DRELCACHE_FORCE_RELEASE, as soon as refcount drops to
> zero, the data in partition scheme and partition bounds goes invalid
> and various checks to see if partition wise join is possible fail.
> That causes partition_join test to fail on prion. But I think, the bug
> could cause crash as well.
>
> The fix is to copy the relevant partitioning information from relcache
> into PartitionSchemeData and RelOptInfo. Here's a quick patch with
> that fix.

Committed.  I hope that makes things less red rather than more,
because I'm going to be AFK for a few hours anyway.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-06 Thread Peter Geoghegan
On Fri, Oct 6, 2017 at 11:34 AM, Peter Geoghegan  wrote:
>> I don't know if it's really the freeze map at fault or something else.
>
> Ideally, it would be possible to effectively disable the new freeze
> map stuff in a minimal way, for testing purposes. Perhaps the authors
> of that patch, CC'd, can suggest a way to do that.

Actually, the simplest thing might be to just use pg_visibility's
pg_check_frozen() to check that the visibility/freeze map accurately
summarizes the all-frozen status of tuples in the heap. If that
doesn't indicate that there is corruption, we can be fairly confident
that the problem is elsewhere. The metadata in the visibility/freeze
map should be accurate when a bit is set to indicate that an entire
heap page is all-frozen (or, separately, all-visible). We can hardly
expect it to have better information than the authoritative source of
truth, the heap itself.

The more I think about it, the more I tend to doubt that the remaining
problems are with the freeze map. If the freeze map was wrong, and
incorrectly said that a page was all-frozen, then surely the outward
symptoms would take a long time to show up, as they always do when we
accidentally fail to freeze a tuple before a relfrozenxid cutoff. ISTM
that that's the only meaningful way that the freeze map can be wrong
-- it only promises to be accurate when it says that no further
freezing is needed for a page/bit.

-- 
Peter Geoghegan




Re: [HACKERS] [PROPOSAL] Temporal query processing with range types

2017-10-06 Thread Mike Rylander
On Fri, Oct 6, 2017 at 1:22 PM, Paul A Jungwirth
 wrote:
> On Fri, Jul 22, 2016 at 4:15 AM, Anton Dignös  wrote:
>> We would like to contribute to PostgreSQL a solution that supports the query
>> processing of "at each time point". The basic idea is to offer two new
>> operators, NORMALIZE and ALIGN, whose purpose is to adjust (or split) the
>> ranges of tuples so that subsequent queries can use the usual grouping and
>> equality conditions to get the intended results.
>
> I just wanted to chime in and say that the work these people have done
> is *amazing*. I read two of their papers yesterday [1, 2], and if you
> are interested in temporal data, I encourage you to read them too. The
> first one is only 12 pages and quite readable. After that the second
> is easy because it covers a lot of the same ground but adds "scaling"
> of values when a tuple is split, and some other interesting points.
> Their contributions could be used to implement SQL:2011 syntax but go
> way beyond that.
>

I've also been following this feature with great interest, and would
definitely throw whatever tiny weight I have, sitting out here in the
peanut gallery, behind accepting the ALIGN and NORMALIZE syntax.
I estimate that about a third of the non-trivial queries in the
primary project I work on (and have, on Postgres, for the last 13+
years) would be simpler with support of the proposed syntax, and some
of the most complex business logic would be simplified nearly to the
point of triviality.

Anyway, that's my $0.02.

Thank you, Anton and Peter!

-- Mike




Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables

2017-10-06 Thread Ashutosh Bapat
On Fri, Oct 6, 2017 at 8:45 PM, Robert Haas  wrote:
>
> I think this is very good work and I'm excited about the feature.  Now
> I'll wait to see whether the buildfarm, or Tom, yell at me for
> whatever problems this may still have...
>

Buildfarm animal prion turned red. Before going into that failure, the
good news is that the other animals are green. So the plans are stable.

prion runs the regression tests with -DRELCACHE_FORCE_RELEASE, which
destroys a relcache entry as soon as its reference count drops to 0.
This destroys everything in the corresponding relcache entry, including
the partition key information and the partition descriptor information.
find_partition_scheme() and set_relation_partition_info() both assume
that the relcache information will survive as long as the relation lock
is held. They do not copy the relevant partitioning information but just
copy the pointers. That assumption is wrong. Because of
-DRELCACHE_FORCE_RELEASE, as soon as the refcount drops to zero, the
data in the partition scheme and partition bounds becomes invalid and
the various checks to see if a partition-wise join is possible fail.
That causes the partition_join test to fail on prion. But I think the
bug could cause a crash as well.

The fix is to copy the relevant partitioning information from relcache
into PartitionSchemeData and RelOptInfo. Here's a quick patch with
that fix.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 3a8306a..ebda85e 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -702,6 +702,74 @@ partition_bounds_equal(int partnatts, int16 *parttyplen, bool *parttypbyval,
 }
 
 /*
+ * Return a copy of the given PartitionBoundInfo structure. The data types of
+ * the bounds are described by the given partition key specification.
+ */
+extern PartitionBoundInfo
+partition_bounds_copy(PartitionBoundInfo src,
+	  PartitionKey key)
+{
+	PartitionBoundInfo	dest;
+	int		i;
+	int		ndatums;
+	int		partnatts;
+	int		num_indexes;
+
+	dest = (PartitionBoundInfo) palloc(sizeof(PartitionBoundInfoData));
+
+	dest->strategy = src->strategy;
+	ndatums = dest->ndatums = src->ndatums;
+	partnatts = key->partnatts;
+
+	/* Range partitioned table has an extra index. */
+	num_indexes = key->strategy == PARTITION_STRATEGY_RANGE ? ndatums + 1 : ndatums;
+
+	/* List partitioned tables have only a single partition key. */
+	Assert(key->strategy != PARTITION_STRATEGY_LIST || partnatts == 1);
+
+	dest->datums = (Datum **) palloc(sizeof(Datum *) * ndatums);
+
+	if (src->kind != NULL)
+	{
+		dest->kind = (PartitionRangeDatumKind **) palloc(ndatums *
+sizeof(PartitionRangeDatumKind *));
+		for (i = 0; i < ndatums; i++)
+		{
+			dest->kind[i] = (PartitionRangeDatumKind *) palloc(partnatts *
+sizeof(PartitionRangeDatumKind));
+
+			memcpy(dest->kind[i], src->kind[i],
+   sizeof(PartitionRangeDatumKind) * key->partnatts);
+		}
+	}
+	else
+		dest->kind = NULL;
+
+	for (i = 0; i < ndatums; i++)
+	{
+		int		j;
+		dest->datums[i] = (Datum *) palloc(sizeof(Datum) * partnatts);
+
+		for (j = 0; j < partnatts; j++)
+		{
+			if (dest->kind == NULL ||
+dest->kind[i][j] == PARTITION_RANGE_DATUM_VALUE)
+dest->datums[i][j] = datumCopy(src->datums[i][j],
+			   key->parttypbyval[j],
+			   key->parttyplen[j]);
+		}
+	}
+
+	dest->indexes = (int *) palloc(sizeof(int) * num_indexes);
+	memcpy(dest->indexes, src->indexes, sizeof(int) * num_indexes);
+
+	dest->null_index = src->null_index;
+	dest->default_index = src->default_index;
+
+	return dest;
+}
+
+/*
  * check_new_partition_bound
  *
  * Checks if the new partition's bound overlaps any of the existing partitions
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 93cc757..9d35a41 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -1825,13 +1825,15 @@ set_relation_partition_info(PlannerInfo *root, RelOptInfo *rel,
 			Relation relation)
 {
 	PartitionDesc partdesc;
+	PartitionKey  partkey;
 
 	Assert(relation->rd_rel->relkind == RELKIND_PARTITIONED_TABLE);
 
 	partdesc = RelationGetPartitionDesc(relation);
+	partkey = RelationGetPartitionKey(relation);
 	rel->part_scheme = find_partition_scheme(root, relation);
 	Assert(partdesc != NULL && rel->part_scheme != NULL);
-	rel->boundinfo = partdesc->boundinfo;
+	rel->boundinfo = partition_bounds_copy(partdesc->boundinfo, partkey);
 	rel->nparts = partdesc->nparts;
 	set_baserel_partition_key_exprs(relation, rel);
 }
@@ -1888,18 +1890,33 @@ find_partition_scheme(PlannerInfo *root, Relation relation)
 
 	/*
 	 * Did not find matching partition scheme. Create one copying relevant
-	 * information from the relcache. Instead of copying whole arrays, copy
-	 * the pointers in relcache. It's safe to do so since
-	 * RelationClearRelation() wouldn't change it while 

Re: [HACKERS] [PATCH] A hook for session start

2017-10-06 Thread Pavel Stehule
2017-10-06 20:39 GMT+02:00 Nico Williams :

> On Fri, Oct 06, 2017 at 06:37:57PM +0200, Pavel Stehule wrote:
> > 2017-10-06 6:48 GMT+02:00 Nico Williams :
> > > On Fri, Oct 06, 2017 at 04:52:09AM +0200, Pavel Stehule wrote:
> > > > Current TEMP tables, if you do it for any session has pretty
> significant
> > > > overhead  - with possible risk of performance lost (system catalog
> > > bloat).
> > >
> > > Because of the DDLs for them?
> >
> > yes - pg_attribute, pg_class, pg_stats are bloating - and when these
> tables
> > are bloated, then DDL is slow.
>
> :(
>
> > > No, I want GLOBAL TEMP tables.
> >
> > me too :) - and lot of customer and users.
>
> > I though about it, but I have other on my top priority. GLOBAL TEMP TABLE
> > is on 90% unlogged table. But few fields should be session based instead
> > shared persistent - statistics, rows in pg_class, filenode.
>
> Unlogged tables don't provide isolation between sessions the way temp
> tables do, so I don't see the connection.
>
> But the necessary components (temp heaps and such) are all there, and I
> suspect a PoC could be done fairly quickly.  But there are some
> subtleties like that FKs between GLOBAL TEMP and persistent tables must
> not be allowed (in either direction), so a complete implementation will
> take significant work.
>
> The work looks like:
>
>  - add syntax (trivial)
>
>  - add new kind of persistence (lots of places to touch, but it's mostly
>mechanical)
>
>  - redirect all references to global temp table contents to temp
>heaps/indexes/whatever
>
>  - add logic to prevent FKs between persistent and global temp tables
>
>  - what else?
>
> > When we talked about this topic, there are two issues:
> >
> > a) probably not too hard issue - some internal data can be in session sys
> > cache.
> >
> > b) the session sys data should be visible on SQL level too (for some
> tools
> > and consistency) - it is hard task.
>
> Can you expand on this?
>

If global temporary tables are to be efficient, then you must not have to
modify the system catalogue after creating them. But a lot of processes
require it - ANALYZE, query planning.

>
> Nico
> --
>


Re: [HACKERS] [PATCH] A hook for session start

2017-10-06 Thread Nico Williams
On Fri, Oct 06, 2017 at 06:37:57PM +0200, Pavel Stehule wrote:
> 2017-10-06 6:48 GMT+02:00 Nico Williams :
> > On Fri, Oct 06, 2017 at 04:52:09AM +0200, Pavel Stehule wrote:
> > > Current TEMP tables, if you do it for any session has pretty significant
> > > overhead  - with possible risk of performance lost (system catalog
> > bloat).
> >
> > Because of the DDLs for them?
> 
> yes - pg_attribute, pg_class, pg_stats are bloating - and when these tables
> are bloated, then DDL is slow.

:(

> > No, I want GLOBAL TEMP tables.
> 
> me too :) - and lot of customer and users.

> I though about it, but I have other on my top priority. GLOBAL TEMP TABLE
> is on 90% unlogged table. But few fields should be session based instead
> shared persistent - statistics, rows in pg_class, filenode.

Unlogged tables don't provide isolation between sessions the way temp
tables do, so I don't see the connection.

But the necessary components (temp heaps and such) are all there, and I
suspect a PoC could be done fairly quickly.  But there are some
subtleties like that FKs between GLOBAL TEMP and persistent tables must
not be allowed (in either direction), so a complete implementation will
take significant work.

The work looks like:

 - add syntax (trivial)

 - add new kind of persistence (lots of places to touch, but it's mostly
   mechanical)

 - redirect all references to global temp table contents to temp
   heaps/indexes/whatever

 - add logic to prevent FKs between persistent and global temp tables

 - what else?

> When we talked about this topic, there are two issues:
> 
> a) probably not too hard issue - some internal data can be in session sys
> cache.
> 
> b) the session sys data should be visible on SQL level too (for some tools
> and consistency) - it is hard task.

Can you expand on this?

Nico
-- 




Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-06 Thread Peter Geoghegan
On Fri, Oct 6, 2017 at 10:49 AM, Alvaro Herrera  wrote:
> I can tell that, in 9.6, REINDEX still reports the error we saw in
> earlier releases, after some of the runs of my reproducer scripts.  I'm
> unable to reproduce it anymore in 9.3 to 9.5.  I can't see the one Dan
> originally reported anywhere, either.

You mean the enhanced stress-test that varied fillfactor, added filler
columns, and so on [1]? Can you post that to the list, please? I think
that several of us would like to have a reproducible test case.

> I don't know if it's really the freeze map at fault or something else.

Ideally, it would be possible to effectively disable the new freeze
map stuff in a minimal way, for testing purposes. Perhaps the authors
of that patch, CC'd, can suggest a way to do that.

If I had to guess, I'd say that it's just as likely that the issue is
only reproducible on 9.6 because of the enhancements added in that
release that improved buffer pinning (the use of atomic ops to pin
buffers, moving buffer content locks into buffer descriptors, etc). It
was already a bit tricky to get the problem that remained after
20b6552 but before today's a5736bf to reproduce with Dan's script. It
often took me 4 or 5 attempts. (I wonder what it looks like with your
enhanced version of that script -- the one that I just asked about.)

It seems possible that we've merely reduced the window for the race to
the point that it's practically (though not theoretically) impossible
to reproduce the problem on versions < 9.6, though not on 9.6+.
Applying Occam's razor, the problem doesn't seem particularly likely
to be in the freeze map stuff, which isn't actually all that closely
related.

[1] https://postgr.es/m/20171005162402.jahqflf3mekileqm@alvherre.pgsql
-- 
Peter Geoghegan




Re: [HACKERS] Proposal: Local indexes for partitioned table

2017-10-06 Thread Maksim Milyutin

Hi!


On 06.10.2017 19:37, Alvaro Herrera wrote:

As you propose, IMO this new feature would use the standard index
creation syntax:
CREATE [UNIQUE] INDEX someindex ON parted_table (a, b);

This command currently throws an error.  We'd have it do two things:

1. create catalog rows in pg_class and pg_index for the main table,
indicating the existance of this partitioned index.  These would not
point to an actual index (since the main table itself is empty), but
instead to an "abstract" index.  This abstract index can be used by
various operations; see below.


Robert Haas proposed[1] using the name "partitioned index" (instead of 
"abstract"), which would be reflected in the 'relkind' field of pg_class as 
the RELKIND_PARTITIONED_INDEX value.



I propose that an index declared UNIQUE throws an error for now.  We can
implement uniqueness (for the case where the indexed columns match the
partitioning key) later, once we sort out all the issues here first.  I
think unique indexes would be very useful even with that limitation, but
let's have it as a separate project.


Yes, global uniqueness through local unique indexes requires further work 
related to foreign keys on partitioned tables, full support of INSERT ... 
ON CONFLICT, etc. It makes sense to implement that after the current stage 
of work.



I think using pg_depend as the mechanism to link partition indexes to
parent is a bad idea.  How about pg_inherits instead?  Seems more
appropriate.


Greg Stark proposed[2] adding a new pg_index field of oid type that refers 
to the parent pg_index entry. AFAIC pg_inherits also makes sense, but 
semantically it deals with inheriting tables. IMHO using this catalog 
table to define the relationship between a partitioned table and its 
partitions looks like a hack to make use of the constraint exclusion 
logic for partition pruning.



Creating hierachy-wide indexes for existing partitioned tables is likely
to take a long time, so we must include CONCURRENTLY as an option.  This
will need some transaction machinery support in order to ensure that
each partition gets its index created at some point in a long chain of
transactions, and that the whole thing is marked valid only at the end.
Also, if the creation is interrupted (because of a crash or a regular
shutdown), it'll be useful to be able to continue rather than being
forced to start from scratch.


This option seems very difficult to me. I would be interested to see the 
implementation.



During ALTER TABLE ... ATTACH PARTITION, we check that an indexing
satisfying the abstract index exist (and we create a pg_inherit link).
If not, the command is aborted.


We could create the necessary index on the partition instead of aborting 
the ALTER TABLE ... ATTACH PARTITION statement.



We need to come up with some way to generate names for each partition
index.


I think calling 'ChooseIndexName(RelationGetRelationName(childrel), 
namespaceId, indexColNames, ...)' resolves this problem.



I am going to work on this now.


That will be great! This project is difficult for me when it comes to 
integrating the functionality described above with the existing Postgres 
code. Also, IMO this project is very important because it opens the way 
for features such as global uniqueness of columns of partitioned tables, 
so any delay in the implementation is bad.


I would like to review and test your intermediate results.


1. 
https://www.postgresql.org/message-id/CA%2BTgmoY5UOUnW%3DMcwT7xUB_2W5dAkvOg5kD20Spx5gF-Ad47cA%40mail.gmail.com
2. 
https://www.postgresql.org/message-id/CAM-w4HOVftuv5RVi3a%2BsRV6nBpg204w7%3DL8MwPXVvYBFo1uM1Q%40mail.gmail.com


--
Regards,
Maksim Milyutin



Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-06 Thread Peter Geoghegan
On Fri, Oct 6, 2017 at 7:59 AM, Alvaro Herrera  wrote:
> By the way, I still wonder if there's any way for a new tuple to get
> inserted in the place where a HOT redirect would be pointing to, and
> have it be marked as Frozen, where the old redirect contains a
> non-invalid Xmax.  I tried to think of a way for that to happen, but
> couldn't think of anything.
>
> What I imagine is a sequence like this:
>
> 1. insert a tuple
> 2. HOT-update a tuple
> 3. prune the page, making lp 1 be a redirect (lp 2 is the new tuple)
> 4. start transaction
> 5. HOT-update the tuple again, creating HOT in lp 3
> 6. abort transaction (leaving aborted update in lp 3)
> 7. somehow remove tuple from lp 3, make slot available
> 8. new transaction comes along, inserts tuple in lp 3
> 9. somebody freezes tuple in lp3 (???)
>
> Then we follow the HOT chain, see that Xmin=2 in lp3 and conclude that
> the tuple is part of the chain because of an xid "match".
>
> Basically from step 7 onwards I don't think this is possible, but maybe
> I'm just blind.

For the record, I also think that this is impossible, in part because
pruning requires a cleanup buffer lock (and because HOT chains cannot
span pages). I wouldn't say that I am 100% confident about this,
though.

BTW, is this comment block that appears above
heap_prepare_freeze_tuple() now obsolete, following 20b65522 (and
maybe much earlier commits)?

 * NB: It is not enough to set hint bits to indicate something is
 * committed/invalid -- they might not be set on a standby, or after crash
 * recovery.  We really need to remove old xids.
 */

We WAL-log setting hint bits during freezing now, iff tuple xmin is
before the Xid cutoff and tuple is a heap-only tuple.

-- 
Peter Geoghegan




Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-06 Thread Alvaro Herrera
Peter Geoghegan wrote:
> On Fri, Oct 6, 2017 at 6:18 AM, Alvaro Herrera  
> wrote:
> > Wood, Dan wrote:
> >> Yes, I’ve been testing 9.6.  I’ll try Alvaro’s patch today.
> >>
> >> I would prefer to focus on either latest 9X or 11dev.
> >
> > I tested my patch on 9.4 and 9.5 today and it seems to close the problem
> > (with the patch, I waited 10x as many iterations as it took for the
> > problem to occur ~10 times without the patch), but I can reproduce a
> > problem in 9.6 with my patch installed.  There must be something new in
> > 9.6 that is causing the problem to reappear.
> 
> What problem persists? The original one (or, at least, the original
> symptom of pruning HOT chains incorrectly)? If that's what you mean, I
> wouldn't be so quick to assume that it's the freeze map.

I can tell that, in 9.6, REINDEX still reports the error we saw in
earlier releases, after some of the runs of my reproducer scripts.  I'm
unable to reproduce it anymore in 9.3 to 9.5.  I can't see the one Dan
originally reported anywhere, either.

I don't know if it's really the freeze map at fault or something else.

-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] [PROPOSAL] Temporal query processing with range types

2017-10-06 Thread Paul A Jungwirth
On Fri, Jul 22, 2016 at 4:15 AM, Anton Dignös  wrote:
> We would like to contribute to PostgreSQL a solution that supports the query
> processing of "at each time point". The basic idea is to offer two new
> operators, NORMALIZE and ALIGN, whose purpose is to adjust (or split) the
> ranges of tuples so that subsequent queries can use the usual grouping and
> equality conditions to get the intended results.

I just wanted to chime in and say that the work these people have done
is *amazing*. I read two of their papers yesterday [1, 2], and if you
are interested in temporal data, I encourage you to read them too. The
first one is only 12 pages and quite readable. After that the second
is easy because it covers a lot of the same ground but adds "scaling"
of values when a tuple is split, and some other interesting points.
Their contributions could be used to implement SQL:2011 syntax but go
way beyond that.

Almost every project I work on could use temporal database support,
but there is nothing available in the Open Source world. The
temporal_tables extension [3] offers transaction-time support, which
is great for auditing, but it has no valid-time support (aka
application-time or state-time). Same with Magnus Hagander's TARDIS
approach [4], Chronomodel [5] (an extension to the Rails ORM), or any
other project I've seen. But valid-time is the more valuable
dimension, because it tells you the history of the thing itself (not
just when the database was changed). Also nothing is even attempting
full bitemporal support.

The ideas behind temporal data are covered extensively in Snodgrass's
1999 book [6], which shows how valuable it is to handle temporal data
in a principled way, rather than ad hoc. But that book also
demonstrates how complex the queries become to do things like temporal
foreign key constraints and temporal joins. I was sad to learn that
his proposed TSQL2 was rejected as a standard back in the 90s,
although the critiques by C. J. Date [7] have some merit. In
particular, since TSQL2 used *statement* modifiers, some of the
behavior was unclear or bad when using subqueries, views, and
set-returning functions. It makes more sense to have temporal
*operators*, so alongside inner join you have temporal inner join, and
likewise with temporal left outer join, temporal
union/intersection/difference, temporal aggregation, etc. (I think the
drawbacks of TSQL2 came from pursuing an unachievable goal, which was
to enable seamlessly converting existing non-temporal tables to
temporal without breaking any queries.)

Another unsatisfactory approach at historical data, from the industry
rather than academia, is in chapter 4 and elsewhere of Ralph Kimball's
*Data Warehouse Toolkit* [8]. His first suggestion (Type 1 Dimensions)
is to ignore the problem and overwrite old data with new. His Type 2
approach (make a new row) is better but loses the continuity between
the old row and the new. Type 3 fixes that but supports only one
change, not several. And anyway his ideas are tailored to star-schema
designs so are not as broadly useful. Workarounds like bridge tables
and "put the data in the fact table" are even more wedded to a
star-schema approach. But I think his efforts do show how valuable
historical data is, and how hard it is to handle without built-in
support.

As far as I can tell SQL:2011 avoids the statement modifier problem
(I'm not 100% sure), but it is quite limited, mostly covering
transaction-time semantics and not giving any way to do valid-time
outer joins or aggregations. It is clearly an early first step.
Unfortunately the syntax feels (to me) crippled by over-specificity,
like it will have a hard time growing to support all the things you'd
want to do.

The research by Dignös et al shows how you can define temporal
variants for every operator in the relational algebra, and then
implement them by using just two transformations (ALIGN and NORMALIZE)
combined with the existing non-temporal operators. It has a strong
theoretical basis and avoids the TSQL2 problems with composability.
And unlike SQL:2011 it has a great elegance and completeness I haven't
seen anywhere else.

I believe with range types the approach was to build up useful
primitives rather than jumping straight to a less-factored full
implementation of temporal features. (This in spite of SQL:2011
choosing to model begin/end times as separate columns, not as ranges.
:-) It seems to me the Dignös work follows the same philosophy. Their
ALIGN and NORMALIZE could be used to implement SQL:2011 features, but
they are also useful for much more. In their papers they actually
suggest that these transformations need not be exposed to end-users,
although it was convenient to have access to them for their own
research. I think it'd be great if Postgres's SQL dialect supported
them though, since SQL:2011 leaves out so much.

Anyway, I wanted to thank them for their excellent work, their
generosity, and also their 

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-06 Thread Peter Geoghegan
On Fri, Oct 6, 2017 at 6:18 AM, Alvaro Herrera  wrote:
> Wood, Dan wrote:
>> Yes, I’ve been testing 9.6.  I’ll try Alvaro’s patch today.
>>
>> I would prefer to focus on either latest 9X or 11dev.
>
> I tested my patch on 9.4 and 9.5 today and it seems to close the problem
> (with the patch, I waited 10x as many iterations as it took for the
> problem to occur ~10 times without the patch), but I can reproduce a
> problem in 9.6 with my patch installed.  There must be something new in
> 9.6 that is causing the problem to reappear.

What problem persists? The original one (or, at least, the original
symptom of pruning HOT chains incorrectly)? If that's what you mean, I
wouldn't be so quick to assume that it's the freeze map.

-- 
Peter Geoghegan




Re: [HACKERS] [PATCH] A hook for session start

2017-10-06 Thread Pavel Stehule
2017-10-06 6:48 GMT+02:00 Nico Williams :

> On Fri, Oct 06, 2017 at 04:52:09AM +0200, Pavel Stehule wrote:
> > 2017-10-05 22:31 GMT+02:00 Nico Williams :
> > > On Tue, Aug 01, 2017 at 03:36:23PM -0400, Peter Eisentraut wrote:
> > > > On 7/21/17 13:14, Jim Mlodgenski wrote:
> > > > > When I first saw this thread, my initial thought of a use case is
> to
> > > > > prepare some key application queries so they are there and ready
> to go.
> > > > > That would need to be before the ExecutorStart_hook or
> > > > > ProcessUtility_hook if an app would just want to execute the
> prepared
> > > > > statement.
> > > >
> > > > Isn't that what the preprepare extension does already?
> > >
> > > more generic facility -> more useful
> > >
> > > My use case is to pre-create TEMP schema elements that VIEWs,
> FUNCTIONs,
> > > and TRIGGERs, might need.
> >
> > It is better to work on GLOBAL TEMP tables.
>
> I don't disagree.
>
> In fact, I was scoping out what it might take to do that just yesterday.
>
> I've too thoughts on that: either a new relpersistence kind that is very
> similar to persistent, but which always uses temp heaps, or a modifier
> for the persistent kind that says to use temp heaps.  Either way it
> looks like it should be fairly straightforward (but then, i've only
> ever written one thing for PG, earlier this week, the ALWAYS DEFERRED
> thing).
>
> > Current TEMP tables, if you do it for any session has pretty significant
> > overhead  - with possible risk of performance lost (system catalog
> bloat).
>
> Because of the DDLs for them?
>

yes - pg_attribute, pg_class, pg_stats are bloating - and when these tables
are bloated, then DDL is slow.



> > So often creating local temp tables is antipattern (in Postgres)
> > unfortunately.
>
> I do it plenty, but sometimes I use an UNLOGGED table with a txid column
> in the PK set to txid_current(), then I clean up where I can.  It'd be
> nice to have COMMIT triggers for cleaning up such rows, among other
> things.  I've implemented that using DDL event triggers, but to perform
> well it needs to be a native feature.
>
> > I am not sure, if we should to support this case more :( Probably is
> > better, so it is hard to use local TEMP tables.
>
> No, I want GLOBAL TEMP tables.
>

me too :) - and a lot of customers and users.

There is a workaround - you can use an array instead of temp tables in 50% of
cases. But it is not a solution in the other 50%.

I thought about it, but I have other things at the top of my priority list. A
GLOBAL TEMP TABLE is 90% an unlogged table. But a few fields should be
session-based instead of shared and persistent - statistics, rows in
pg_class, the filenode.

When we talked about this topic, there were two issues:

a) a probably not-too-hard issue - some internal data can be in the session
sys cache.

b) the session sys data should be visible at the SQL level too (for some
tools and for consistency) - that is a hard task.

Regards

Pavel


> Nico
> --
>


Re: [HACKERS] Proposal: Local indexes for partitioned table

2017-10-06 Thread Alvaro Herrera
Hello

I've been thinking about this issue too.  I think your patch is not
ambitious enough.  Here's my brain dump on the issue, to revive
discussion.

As you propose, IMO this new feature would use the standard index
creation syntax:
   CREATE [UNIQUE] INDEX someindex ON parted_table (a, b);

This command currently throws an error.  We'd have it do two things:

1. create catalog rows in pg_class and pg_index for the main table,
   indicating the existence of this partitioned index.  These would not
   point to an actual index (since the main table itself is empty), but
   instead to an "abstract" index.  This abstract index can be used by
   various operations; see below.

2. create one index for each existing partition.  These would be
   identical to what would happen if you created the index directly on
   each partition, except that there is an additional dependency to the
   parent's abstract index.

If partitions are themselves partitioned, we would recursively apply the
indexes to the sub-partitions by doing (1) above for the partitioned
partition.

Once the index has been created for all existing partitions, the
hierarchy-wide index becomes valid and can be used normally by the
planner/executor.  I think we could use the pg_index.indisvalid property
for the abstract index for this.

I propose that an index declared UNIQUE throws an error for now.  We can
implement uniqueness (for the case where the indexed columns match the
partitioning key) later, once we sort out all the issues here first.  I
think unique indexes would be very useful even with that limitation, but
let's have it as a separate project.

I think using pg_depend as the mechanism to link partition indexes to
parent is a bad idea.  How about pg_inherits instead?  Seems more
appropriate.


Creating hierachy-wide indexes for existing partitioned tables is likely
to take a long time, so we must include CONCURRENTLY as an option.  This
will need some transaction machinery support in order to ensure that
each partition gets its index created at some point in a long chain of
transactions, and that the whole thing is marked valid only at the end.
Also, if the creation is interrupted (because of a crash or a regular
shutdown), it'll be useful to be able to continue rather than being
forced to start from scratch.

When a new partition is added, indexes satisfying the partitioned
table's abstract indexes are created automatically.

During ALTER TABLE ... ATTACH PARTITION, we check that an index
satisfying the abstract index exists (and we create a pg_inherits link).
If not, the command is aborted.

REINDEX is easily supported (just reindex each partition's index
individually), but that command is blocking, which is not good.  For
concurrent operation for tables not partitioned, a typical pattern to
avoid blocking the table is to suggest creation of an identical index
using CREATE INDEX CONCURRENTLY, then drop the original one.  That's not
going to work with these partitioned indexes, which is going to be a
problem.  I don't have any great ideas about this part yet.

We need to come up with some way to generate names for each partition
index.


I am going to work on this now.

-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: Improve bitmap costing for lossy pages

2017-10-06 Thread Dilip Kumar
On Fri, Oct 6, 2017 at 7:24 PM, Dilip Kumar  wrote:
> On Fri, Oct 6, 2017 at 6:08 PM, Alexander Kuzmenkov
>  wrote:
>>
>>> Analysis: The estimated value of the lossy_pages is way higher than
>>> its actual value and reason is that the total_pages calculated by the
>>> "Mackert and Lohman formula" is not correct.
>>
>>
>> I think the problem might be that the total_pages includes cache effects and
>> rescans. For bitmap entries we should use something like relation pages *
>> selectivity.
>
> I have noticed that for the TPCH case, if I use "pages * selectivity"
> it gives me better results, but IMHO directly multiplying the pages
> by the selectivity may not be the correct way to calculate the number
> of heap pages: it only gives the correct result when all the TIDs being
> fetched are clustered.  But on the other hand, the "Mackert and Lohman"
> formula assumes that all the TIDs are evenly distributed across the
> heap pages, which can also give a wrong estimate, like we are seeing
> in our TPCH case.

I agree with the point that total_pages includes the cache effects and
rescans when loop_count > 1; that can be avoided if we always calculate
heap_pages the way it is calculated in the else branch (loop_count = 0).
Fortunately, in all the TPCH query plans I posted upthread the bitmap
scan was never on the inner side of the NLJ, so loop_count was always 0.
I will fix this.
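
For reference, the Mackert and Lohman approximation used by
index_pages_fetched() is, going by the comment in costsize.c (quoted
from memory here, so please double-check against the source):

    PF =
        min(2TNs/(2T+Ns), T)            when T <= b
        2TNs/(2T+Ns)                    when T > b and Ns <= 2Tb/(2T-b)
        b + (Ns - 2Tb/(2T-b))*(T-b)/T   when T > b and Ns > 2Tb/(2T-b)

    where T = # pages in table, N = # tuples in table,
          s = selectivity, and b = # buffer pages available.

The "pages * selectivity" estimate discussed above corresponds to simply
T*s, which is only accurate when the matching TIDs are clustered
together.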

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables

2017-10-06 Thread Ashutosh Bapat
On Fri, Oct 6, 2017 at 8:45 PM, Robert Haas  wrote:
> On Fri, Oct 6, 2017 at 8:40 AM, Ashutosh Bapat
>  wrote:
>> Sorry. I sent a wrong file. Here's the real v37.
>
> Committed 0001-0006.  I made some assorted comment and formatting
> changes and two small substantive changes:
>
> - In try_nestloop_path, bms_free(outerrelids) before returning if we
> can't reparameterize.

Hmm. I missed that.

>
> - Moved the call to try_partition_wise_join inside
> populate_joinrel_with_paths, instead of always calling it just after
> that function is called.

This looks good too.

>
> I think this is very good work and I'm excited about the feature.

Thanks a lot Robert for detailed review and guidance. Thanks a lot
Rafia for benchmarking the feature with TPCH, especially with a very
large scale database, and also for testing and reporting some real issues. Thanks
Rajkumar for testing it with an exhaustive testset. Thanks Amit
Langote, Thomas Munro, Dilip Kumar, Antonin Houska, Alvaro Herrera and
Amit Khandekar for their review comments and suggestions. Thanks
Jeevan Chalke, who used the patchset to implement partition-wise
aggregates and provided some insights offlist. Sorry if I have missed
anybody.

As Robert says in the commit message, there's more to do but now that
we have basic feature, improving it incrementally becomes a lot
easier.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] On markers of changed data

2017-10-06 Thread Stephen Frost
Tom, Michael,

* Michael Paquier (michael.paqu...@gmail.com) wrote:
> On Fri, Oct 6, 2017 at 11:22 PM, Tom Lane  wrote:
> > Andrey Borodin  writes:
> >> Is it safe to use the file modification time to track whether a file has
> >> changed since the previous backup?
> >
> > I'd say no:
> >
> > 1. You don't know the granularity of the filesystem's timestamps, at least
> > not without making unportable assumptions.
> >
> > 2. There's no guarantee that the system clock can't be set backwards.
> >
> > 3. It's not uncommon for filesystems to have optimizations whereby they
> > skip or delay some updates of file mtimes.  (I think this is usually
> > optional, but you couldn't know whether it's turned on.)
> >
> > #2 is probably the worst of these problems.
> 
> Or upwards.  A simple example of something that changes the clock is
> VM snapshotting.  Any logic that does not rely on monotonic
> timestamps, i.e. on things like clock_gettime(CLOCK_MONOTONIC), is a lot
> of fun to investigate until you realize that it is not using any
> monotonic source... So the answer is *no*, do not depend on FS-level
> timestamps. The only sane method for Postgres is really to scan the
> page header LSNs, and of course you already know that.

Really, these comments appear, at least to me, to be based on an
incorrect assumption, namely that the only way to use mtime is the way
tools like rsync use it.

No, you can't trust rsync-based backups that look at mtime and only copy
if the mtime of the source file is currently 'more recent' than the
mtime of the destination file.

That doesn't mean that mtime can't be used to perform incremental
backups using PG, but care has to be taken when doing so to minimize the
risk of a file getting skipped that should have been copied.

There's a few things to do to minimize that risk:

Use mtime only as an indication of whether the file changed since the
last time you looked at it; it doesn't matter if the mtime on the file
is newer or older.  If the mtime is *different*, then you can't trust
that the contents are the same and you need to include it in the backup.
Of course, combine this with checking whether the file size has changed,
but in PG there are lots of files of the same size, so that's not a
terribly good indicator.

Further, you have to get the mtime of all the files *before* you start
backing them up.  If you take the mtime of the file at the time you
start actually copying it, then it could possibly be modified while you
copy it but without the mtime being updated from when you initially
pulled it (and that's not even talking about the concerns around the
clock time moving back and forth).  To address the granularity concern,
you should also be sure to wait, after you collect all the mtimes but
before actually starting the backup, for at least the granularity period
of the filesystem.  Any optimization which delays setting the mtime
would, certainly, still get around to updating the mtime before the next
backup runs, and therefore that file might get copied even though it
hadn't changed, but that's still technically correct, just slightly more
work.  Lastly, don't trust any times which are from after the time that
you collected the mtimes, either during the initial backup or when you
are doing the subsequent incremental.  Any file whose mtime is different
*or* is from after the time the mtimes were collected should be copied.
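
Just to make the comparison rule concrete, here is the decision
expressed as SQL over two hypothetical tables (previous_manifest holding
the path/size/mtime captured for the prior backup plus the time that
capture started, current_scan holding the values gathered before this
backup); the table and column names are invented purely for
illustration:

    SELECT c.path
      FROM current_scan c
      LEFT JOIN previous_manifest p USING (path)
     WHERE p.path IS NULL                       -- file is new
        OR c.size  IS DISTINCT FROM p.size      -- size changed
        OR c.mtime IS DISTINCT FROM p.mtime     -- mtime different, not merely newer
        OR c.mtime >= p.capture_started_at;     -- modified after the mtimes were collected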

This isn't to say that there isn't some risk to using mtime; there still
is: if a backup is made of a file and its mtime collected, and then time
moves backwards, and the file is modified again at the *exact* same
time, leading the 'new' mtime to be identical to the 'old' mtime while
the file's contents are different, and that file is not subsequently
modified before the next backup happens, then the file might not be
included in the backup even though it should be.

Other risks are just blatant corruption happening in the mtime field, or
a kernel-level bug that doesn't update mtime when it should, or the
kernel somehow resetting the mtime back after the file has been changed,
or someone explicitly setting the mtime back after changing a file, or
perhaps other such attacks, though eliminating all of those risks isn't
possible (regardless of the solution; someone could go change the LSN on a
page too, for example, and foil a tool which was based on that).

These are risks which I'd love to remove, but they also strike me as
quite small and ones which practical users are willing to accept for
their incremental and differential backups, though it's a reason to also
take full backups regularly.

As Alvaro notes downthread, it's also the only reasonable option
available today.  It'd be great to have a better solution, and perhaps
one which summarizes the LSNs in each file would work and be better, but
that would also only be available for PG11, at the earliest.

Thanks!

Stephen




Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables

2017-10-06 Thread Robert Haas
On Fri, Oct 6, 2017 at 8:40 AM, Ashutosh Bapat
 wrote:
> Sorry. I sent a wrong file. Here's the real v37.

Committed 0001-0006.  I made some assorted comment and formatting
changes and two small substantive changes:

- In try_nestloop_path, bms_free(outerrelids) before returning if we
can't reparameterize.

- Moved the call to try_partition_wise_join inside
populate_joinrel_with_paths, instead of always calling it just after
that function is called.

I think this is very good work and I'm excited about the feature.  Now
I'll wait to see whether the buildfarm, or Tom, yell at me for
whatever problems this may still have...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-06 Thread Alvaro Herrera
By the way, I still wonder if there's any way for a new tuple to get
inserted in the place where a HOT redirect would be pointing to, and
have it be marked as Frozen, where the old redirect contains a
non-invalid Xmax.  I tried to think of a way for that to happen, but
couldn't think of anything.

What I imagine is a sequence like this:

1. insert a tuple
2. HOT-update a tuple
3. prune the page, making lp 1 be a redirect (lp 2 is the new tuple)
4. start transaction
5. HOT-update the tuple again, creating HOT in lp 3
6. abort transaction (leaving aborted update in lp 3)
7. somehow remove tuple from lp 3, make slot available
8. new transaction comes along, inserts tuple in lp 3
9. somebody freezes tuple in lp3 (???)

Then we follow the HOT chain, see that Xmin=2 in lp3 and conclude that
the tuple is part of the chain because of an xid "match".

Basically from step 7 onwards I don't think this is possible, but maybe
I'm just blind.


Maybe we can forestall the problem by checking whether the Xmax
TransactionIdIsCurrentTransaction || TransactionIdDidCommit (or some
variant thereof).  This would be very slow but safer; and in 9.4 and up
we'd only need to do it if the xmin value is actually FrozenXid which
should be rare (only in pages upgraded from 9.3).

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Issue with logical replication: MyPgXact->xmin already is valid

2017-10-06 Thread Konstantin Knizhnik



On 06.10.2017 15:29, Petr Jelinek wrote:

On 06/10/17 12:16, Konstantin Knizhnik wrote:

When creating logical replication slots we quite often get the following
error:

ERROR:  cannot build an initial slot snapshot when MyPgXact->xmin
already is valid

which causes a restart of the WAL sender.
The comment to this line doesn't clarify much:

 /* so we don't overwrite the existing value */
 if (TransactionIdIsValid(MyPgXact->xmin))
 elog(ERROR, "cannot build an initial slot snapshot when
MyPgXact->xmin already is valid");
I wonder if this is a normal situation or if something is going wrong?


Hi,

no, it's not a normal situation; it seems you are doing something that
assigns xid before you run the CREATE_REPLICATION_SLOT command on that
connection.


I am not doing anything in this connection: it is the WAL sender
executing the CREATE_REPLICATION_SLOT replication command.
Please look at the stack in my original e-mail. It shows who is
setting MyPgXact->xmin, and when.

It is GetSnapshotData, called because of reloading configuration settings.
And the configuration settings are reloaded because our application is
updating "synchronous_standby_names".



--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] On markers of changed data

2017-10-06 Thread Alvaro Herrera
Michael Paquier wrote:

> The only sane method for Postgres is really to scan the
> page header LSNs, and of course you already know that.

I hope the idea is not to have to scan every single page in the
database, because that would be too slow.  It should be possible to
build this so that a single summary LSN is kept for a largish group of
pages, allowing large portions of the database to be skipped from even
being read if they are known not to contain any page newer than the
previous backup.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] On markers of changed data

2017-10-06 Thread Michael Paquier
On Fri, Oct 6, 2017 at 11:22 PM, Tom Lane  wrote:
> Andrey Borodin  writes:
>> Is it safe to use the file modification time to track whether a file has
>> changed since the previous backup?
>
> I'd say no:
>
> 1. You don't know the granularity of the filesystem's timestamps, at least
> not without making unportable assumptions.
>
> 2. There's no guarantee that the system clock can't be set backwards.
>
> 3. It's not uncommon for filesystems to have optimizations whereby they
> skip or delay some updates of file mtimes.  (I think this is usually
> optional, but you couldn't know whether it's turned on.)
>
> #2 is probably the worst of these problems.

Or upwards.  A simple example of something that changes the clock is
VM snapshotting.  Any logic that does not rely on monotonic
timestamps, i.e. on things like clock_gettime(CLOCK_MONOTONIC), is a lot
of fun to investigate until you realize that it is not using any
monotonic source... So the answer is *no*, do not depend on FS-level
timestamps. The only sane method for Postgres is really to scan the
page header LSNs, and of course you already know that.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] On markers of changed data

2017-10-06 Thread Tom Lane
Andrey Borodin  writes:
> Is it safe to use the file modification time to track whether a file has
> changed since the previous backup?

I'd say no:

1. You don't know the granularity of the filesystem's timestamps, at least
not without making unportable assumptions.

2. There's no guarantee that the system clock can't be set backwards.

3. It's not uncommon for filesystems to have optimizations whereby they
skip or delay some updates of file mtimes.  (I think this is usually
optional, but you couldn't know whether it's turned on.)

#2 is probably the worst of these problems.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-06 Thread Stephen Frost
Robert,

* Alvaro Herrera (alvhe...@alvh.no-ip.org) wrote:
> Michael Paquier wrote:
> > On Fri, Oct 6, 2017 at 10:18 PM, Alvaro Herrera  
> > wrote:
> > > Wood, Dan wrote:
> > >> Yes, I’ve been testing 9.6.  I’ll try Alvaro’s patch today.
> > >>
> > >> I would prefer to focus on either latest 9X or 11dev.
> > >
> > > I tested my patch on 9.4 and 9.5 today and it seems to close the problem
> > > (with the patch, I waited 10x as many iterations as it took for the
> > > problem to occur ~10 times without the patch), but I can reproduce a
> > > problem in 9.6 with my patch installed.  There must be something new in
> > > 9.6 that is causing the problem to reappear.
> > 
> > The freeze visibility map has been introduced in 9.6... There could be
> > interactions on this side.
> 
> Ah, thanks for the tip.  I hope the authors of that can do the gruntwork
> of researching this problem there, then.  I'll go commit what I have
> now.

I don't doubt you're watching this thread too, but just to be 110% sure
that we don't end up with the November releases still having this issue,
I'm adding you to the CC on this thread as the one who did the freeze
visibility map work.  Depending on hope here is a bit too squishy for me
when we're talking about corruption issues.

Thanks!

Stephen




Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-06 Thread Michael Paquier
On Fri, Oct 6, 2017 at 10:57 PM, Alvaro Herrera  wrote:
> Ah, thanks for the tip.  I hope the authors of that can do the gruntwork
> of researching this problem there, then.

I have some stuff using 9.6 extensively, so like Dan I think I'll
chime in anyway. Not before Tuesday though, long weekend in Japan
ahead.

> I'll go commit what I have now.

As far as I saw this set definitely improves the situation.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: Improve bitmap costing for lossy pages

2017-10-06 Thread Dilip Kumar
On Fri, Oct 6, 2017 at 7:04 PM, Dilip Kumar  wrote:
> On Fri, Oct 6, 2017 at 6:36 PM, Robert Haas  wrote:
>> On Fri, Oct 6, 2017 at 2:12 AM, Dilip Kumar  wrote:
 The performance results look good, but that's a slightly different
 thing from whether the estimate is accurate.

 +nbuckets = tbm_calculate_entires(maxbytes);

 entires?
>>>
>>> changed to
>>> + tbm->maxentries = (int) tbm_calculate_entires(maxbytes);
>>
>> My point was not that you should add a cast, but that you wrote
>> "entires" rather than "entries".
>
> My bad, I thought you were suggesting the variable name as "entries"
> instead of "nbuckets" so I removed the variable "nbuckets".  I will
> fix the typo in the function name and post the patch.

Fixed in the attached version.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


improve_bitmap_cost_v5.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-06 Thread Alvaro Herrera
Michael Paquier wrote:
> On Fri, Oct 6, 2017 at 10:18 PM, Alvaro Herrera  
> wrote:
> > Wood, Dan wrote:
> >> Yes, I’ve been testing 9.6.  I’ll try Alvaro’s patch today.
> >>
> >> I would prefer to focus on either latest 9X or 11dev.
> >
> > I tested my patch on 9.4 and 9.5 today and it seems to close the problem
> > (with the patch, I waited 10x as many iterations as it took for the
> > problem to occur ~10 times without the patch), but I can reproduce a
> > problem in 9.6 with my patch installed.  There must be something new in
> > 9.6 that is causing the problem to reappear.
> 
> The freeze visibility map has been introduced in 9.6... There could be
> interactions on this side.

Ah, thanks for the tip.  I hope the authors of that can do the gruntwork
of researching this problem there, then.  I'll go commit what I have
now.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: Improve bitmap costing for lossy pages

2017-10-06 Thread Dilip Kumar
On Fri, Oct 6, 2017 at 6:08 PM, Alexander Kuzmenkov
 wrote:
>
>> Analysis: The estimated value of the lossy_pages is way higher than
>> its actual value and reason is that the total_pages calculated by the
>> "Mackert and Lohman formula" is not correct.
>
>
> I think the problem might be that the total_pages includes cache effects and
> rescans. For bitmap entries we should use something like relation pages *
> selectivity.

I have noticed that for the TPCH case, if I use "pages * selectivity"
it gives me better results, but IMHO directly multiplying the pages
by the selectivity may not be the correct way to calculate the number
of heap pages: it only gives the correct result when all the TIDs being
fetched are clustered.  But on the other hand, the "Mackert and Lohman"
formula assumes that all the TIDs are evenly distributed across the
heap pages, which can also give a wrong estimate, like we are seeing
in our TPCH case.

>
> Meanwhile, I ran TPC-H briefly with the v3 patch: scale factor 25, 2
> workers, SSD storage.
> It shows significant improvement on 4MB work_mem and no change on 2GB.
>
> Here are the results (execution time in seconds, average of 5 runs):
> work_mem        4MB             2GB
> Query       master  patch   master  patch
> 4              179    174      168    167
> 5              190    168      155    156
> 6              280     87      227    229
> 8              197    114      179    172
> 10             269    227      190    192
> 14             110    108      106    105
>

Thanks for the testing, the numbers look good to me.


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-06 Thread Michael Paquier
On Fri, Oct 6, 2017 at 10:18 PM, Alvaro Herrera  wrote:
> Wood, Dan wrote:
>> Yes, I’ve been testing 9.6.  I’ll try Alvaro’s patch today.
>>
>> I would prefer to focus on either latest 9X or 11dev.
>
> I tested my patch on 9.4 and 9.5 today and it seems to close the problem
> (with the patch, I waited 10x as many iterations as it took for the
> problem to occur ~10 times without the patch), but I can reproduce a
> problem in 9.6 with my patch installed.  There must be something new in
> 9.6 that is causing the problem to reappear.

The freeze visibility map has been introduced in 9.6... There could be
interactions on this side.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: Improve bitmap costing for lossy pages

2017-10-06 Thread Dilip Kumar
On Fri, Oct 6, 2017 at 6:36 PM, Robert Haas  wrote:
> On Fri, Oct 6, 2017 at 2:12 AM, Dilip Kumar  wrote:
>>> The performance results look good, but that's a slightly different
>>> thing from whether the estimate is accurate.
>>>
>>> +nbuckets = tbm_calculate_entires(maxbytes);
>>>
>>> entires?
>>
>> changed to
>> + tbm->maxentries = (int) tbm_calculate_entires(maxbytes);
>
> My point was not that you should add a cast, but that you wrote
> "entires" rather than "entries".

My bad, I thought you were suggesting the variable name as "entries"
instead of "nbuckets" so I removed the variable "nbuckets".  I will
fix the typo in the function name and post the patch.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-06 Thread Michael Paquier
On Fri, Oct 6, 2017 at 7:57 PM, Alvaro Herrera  wrote:
> Michael Paquier wrote:
>> On Fri, Oct 6, 2017 at 1:24 AM, Alvaro Herrera  
>> wrote:
>> +/*
>> + * Given a tuple, verify whether the given Xmax matches the tuple's Xmin,
>> + * taking into account that the Xmin might have been frozen.
>> + */
>> [...]
>> +   /*
>> +* We actually don't know if there's a match, but if the previous tuple
>> +* was frozen, we cannot really rely on a perfect match.
>> +*/
>
> I don't know what you had in mind here,

Impossible to know if I don't actually send the contents :)

> but I tweaked the 9.3 version so that it now looks like this:

I wanted to mention that the comments could be reworked. And forgot to
suggest some.

> /*
>  * HeapTupleUpdateXmaxMatchesXmin - verify update chain xmax/xmin lineage
>  *
>  * Given the new version of a tuple after some update, verify whether the
>  * given Xmax (corresponding to the previous version) matches the tuple's
>  * Xmin, taking into account that the Xmin might have been frozen after the
>  * update.
>  */
> bool
> HeapTupleUpdateXmaxMatchesXmin(TransactionId xmax, HeapTupleHeader htup)
> {
> TransactionId   xmin = HeapTupleHeaderGetXmin(htup);
>
> /*
>  * If the xmax of the old tuple is identical to the xmin of the new 
> one,
>  * it's a match.
>  */
> if (TransactionIdEquals(xmax, xmin))
> return true;
>
> /*
>  * When a tuple is frozen, the original Xmin is lost, but we know 
> it's a
>  * committed transaction.  So unless the Xmax is InvalidXid, we don't
>  * know for certain that there is a match, but there may be one; and 
> we
>  * must return true so that a HOT chain that is half-frozen can be 
> walked
>  * correctly.
>  */
> if (TransactionIdEquals(xmin, FrozenTransactionId) &&
> TransactionIdIsValid(xmax))
> return true;
>
> return false;
> }

Those are clearly improvements.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-06 Thread Alvaro Herrera
Wood, Dan wrote:
> Yes, I’ve been testing 9.6.  I’ll try Alvaro’s patch today.
> 
> I would prefer to focus on either latest 9X or 11dev.  

I tested my patch on 9.4 and 9.5 today and it seems to close the problem
(with the patch, I waited 10x as many iterations as it took for the
problem to occur ~10 times without the patch), but I can reproduce a
problem in 9.6 with my patch installed.  There must be something new in
9.6 that is causing the problem to reappear.

> Does Alvaro’s patch presume any of the other patch to set COMMITTED in
> the freeze code?

I don't know what you mean.  Here is the 9.6 version of my patch.  Note
that HEAP_XMIN_FROZEN (which uses the XMIN_COMMITTED bit as I recall)
was introduced in 9.4.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From 46ca12d56402d33a78ea0e13367d1e0e25a474dd Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Fri, 6 Oct 2017 14:11:34 +0200
Subject: [PATCH] Fix bugs

---
 src/backend/access/heap/heapam.c| 52 +
 src/backend/access/heap/pruneheap.c |  4 +--
 src/backend/executor/execMain.c |  6 ++---
 src/include/access/heapam.h |  3 +++
 4 files changed, 54 insertions(+), 11 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index b41f2a2fdd..10916b140e 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2060,8 +2060,7 @@ heap_hot_search_buffer(ItemPointer tid, Relation 
relation, Buffer buffer,
 * broken.
 */
if (TransactionIdIsValid(prev_xmax) &&
-   !TransactionIdEquals(prev_xmax,
-
HeapTupleHeaderGetXmin(heapTuple->t_data)))
+   !HeapTupleUpdateXmaxMatchesXmin(prev_xmax, 
heapTuple->t_data))
break;
 
/*
@@ -2244,7 +2243,7 @@ heap_get_latest_tid(Relation relation,
 * tuple.  Check for XMIN match.
 */
if (TransactionIdIsValid(priorXmax) &&
-   !TransactionIdEquals(priorXmax, 
HeapTupleHeaderGetXmin(tp.t_data)))
+   !HeapTupleUpdateXmaxMatchesXmin(priorXmax, tp.t_data))
{
UnlockReleaseBuffer(buffer);
break;
@@ -2276,6 +2275,50 @@ heap_get_latest_tid(Relation relation,
}   /* end of loop 
*/
 }
 
+/*
+ * HeapTupleUpdateXmaxMatchesXmin - verify update chain xmax/xmin lineage
+ *
+ * Given the new version of a tuple after some update, verify whether the
+ * given Xmax (corresponding to the previous version) matches the tuple's
+ * Xmin, taking into account that the Xmin might have been frozen after the
+ * update.
+ */
+bool
+HeapTupleUpdateXmaxMatchesXmin(TransactionId xmax, HeapTupleHeader htup)
+{
+   TransactionId   xmin = HeapTupleHeaderGetXmin(htup);
+
+   /*
+* If the xmax of the old tuple is identical to the xmin of the new one,
+* it's a match.
+*/
+   if (TransactionIdEquals(xmax, xmin))
+   return true;
+
+   /*
+* If the Xmin that was in effect prior to a freeze matches the Xmax,
+* it's good too.
+*/
+   if (HeapTupleHeaderXminFrozen(htup) &&
+   TransactionIdEquals(HeapTupleHeaderGetRawXmin(htup), xmax))
+   return true;
+
+   /*
+* When a tuple is frozen, the original Xmin is lost, but we know it's a
+* committed transaction.  So unless the Xmax is InvalidXid, we don't 
know
+* for certain that there is a match, but there may be one; and we must
+* return true so that a HOT chain that is half-frozen can be walked
+* correctly.
+*
+* We no longer freeze tuples this way, but we must keep this in order 
to
+* interpret pre-pg_upgrade pages correctly.
+*/
+   if (TransactionIdEquals(xmin, FrozenTransactionId) &&
+   TransactionIdIsValid(xmax))
+   return true;
+
+   return false;
+}
 
 /*
  * UpdateXmaxHintBits - update tuple hint bits after xmax transaction ends
@@ -5693,8 +5736,7 @@ l4:
 * end of the chain, we're done, so return success.
 */
if (TransactionIdIsValid(priorXmax) &&
-   
!TransactionIdEquals(HeapTupleHeaderGetXmin(mytup.t_data),
-priorXmax))
+   !HeapTupleUpdateXmaxMatchesXmin(priorXmax, 
mytup.t_data))
{
result = HeapTupleMayBeUpdated;
goto out_locked;
diff --git a/src/backend/access/heap/pruneheap.c 
b/src/backend/access/heap/pruneheap.c
index 52231ac417..7753ee7b12 100644

Re: [HACKERS] parallel worker (PID ) exited with exit code 1

2017-10-06 Thread Robert Haas
On Fri, Oct 6, 2017 at 8:19 AM, tushar  wrote:
> I got a few queries after running sqlsmith against PG HEAD where I am
> getting a LOG message like "parallel worker (PID) exited with exit code 1"
>
> set force_parallel_mode =1;
>  select
>   pg_catalog.pg_wal_replay_pause() as c0,
>   ref_0.ev_type as c1
> from
>   pg_catalog.pg_rewrite as ref_0
> where ref_0.ev_enabled > ref_0.ev_type
> limit 53;
>
> 2017-10-06 13:15:34.785 BST [5680] LOG:  background worker "parallel worker"
> (PID 5964) exited with exit code 1
> ERROR:  recovery is not in progress
> HINT:  Recovery control functions can only be executed during recovery.
> CONTEXT:  parallel worker

You seem to be assuming this is wrong, but I don't see what the
problem is.  There's nothing unsafe about an error happening inside of a
worker, and you got an error because this function isn't safe to
execute during normal running.

I would say this is a case of everything working as intended.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] parallel worker (PID ) exited with exit code 1

2017-10-06 Thread Bernd Helmle
Am Freitag, den 06.10.2017, 17:49 +0530 schrieb tushar:
> I got a few queries after running sqlsmith against PG HEAD where I
> am getting a LOG message like "parallel worker (PID) exited with
> exit code 1"
> 
> set force_parallel_mode =1;
>   select
>pg_catalog.pg_wal_replay_pause() as c0,
>ref_0.ev_type as c1
>  from
>pg_catalog.pg_rewrite as ref_0
>  where ref_0.ev_enabled > ref_0.ev_type
>  limit 53;

Looks like pg_wal_replay_pause() is marked as parallel safe:

SELECT proparallel FROM pg_proc WHERE proname = 'pg_wal_replay_pause';
-[ RECORD 1 ]--
proparallel | s

Bernd



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: Improve bitmap costing for lossy pages

2017-10-06 Thread Robert Haas
On Fri, Oct 6, 2017 at 2:12 AM, Dilip Kumar  wrote:
>> The performance results look good, but that's a slightly different
>> thing from whether the estimate is accurate.
>>
>> +nbuckets = tbm_calculate_entires(maxbytes);
>>
>> entires?
>
> changed to
> + tbm->maxentries = (int) tbm_calculate_entires(maxbytes);

My point was not that you should add a cast, but that you wrote
"entires" rather than "entries".

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] parallel worker (PID ) exited with exit code 1

2017-10-06 Thread Michael Paquier
On Fri, Oct 6, 2017 at 9:19 PM, tushar  wrote:
> 2017-10-06 13:15:34.785 BST [5680] LOG:  background worker "parallel worker"
> (PID 5964) exited with exit code 1
> ERROR:  recovery is not in progress
> HINT:  Recovery control functions can only be executed during recovery.
> CONTEXT:  parallel worker

pg_wal_replay_pause() is a function marked as PARALLEL SAFE, meaning
that it can be pushed down to parallel workers. Still, it can only be
executed on standbys, so this would correctly fail with the error you
are seeing here on a primary server. Perhaps there is a way to
blacklist some functions depending on the server context. This
question may be better asked directly where the project is maintained,
then: https://github.com/anse1/sqlsmith. I am also adding Andreas
in CC; he works on sqlsmith.
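
For what it's worth, a superuser who wants to keep such queries from
erroring out inside a worker could locally re-mark the function so that
the planner keeps it in the leader; this is only a possible stopgap I'm
sketching here, not something the project recommends:

    ALTER FUNCTION pg_catalog.pg_wal_replay_pause() PARALLEL RESTRICTED;
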
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: Improve bitmap costing for lossy pages

2017-10-06 Thread Alexander Kuzmenkov



Analysis: The estimated value of the lossy_pages is way higher than
its actual value and reason is that the total_pages calculated by the
"Mackert and Lohman formula" is not correct.


I think the problem might be that the total_pages includes cache effects 
and rescans. For bitmap entries we should use something like relation 
pages * selectivity.


Meanwhile, I ran TPC-H briefly with the v3 patch: scale factor 25, 2 
workers, SSD storage.

It shows significant improvement on 4MB work_mem and no change on 2GB.

Here are the results (execution time in seconds, average of 5 runs):
work_mem        4MB             2GB
Query       master  patch   master  patch
4              179    174      168    167
5              190    168      155    156
6              280     87      227    229
8              197    114      179    172
10             269    227      190    192
14             110    108      106    105

--
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Issue with logical replication: MyPgXact->xmin already is valid

2017-10-06 Thread Petr Jelinek
On 06/10/17 12:16, Konstantin Knizhnik wrote:
> When creating logical replication slots we quite often get the following
> error:
> 
> ERROR:  cannot build an initial slot snapshot when MyPgXact->xmin
> already is valid
> 
> which causes a restart of the WAL sender.
> The comment to this line doesn't clarify much:
> 
>     /* so we don't overwrite the existing value */
>     if (TransactionIdIsValid(MyPgXact->xmin))
>     elog(ERROR, "cannot build an initial slot snapshot when
> MyPgXact->xmin already is valid");
>>
> I wonder if this is a normal situation or if something is going wrong?
> 

Hi,

no, it's not a normal situation; it seems you are doing something that
assigns xid before you run the CREATE_REPLICATION_SLOT command on that
connection.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] On markers of changed data

2017-10-06 Thread Andrey Borodin
Hi, hackers!

Currently I'm working on page-level incremental backups using the WAL-G 
codebase [0]. And I have two questions that I cannot resolve myself.

An incremental backup is a set of changes that should be applied over a 
preexisting backup. I use the page LSN to decide whether a page should be 
backed up or not.

Question 1. FSM and VM.
As you can see here [1], FSM and VM files are exempt from incremental tracking 
and are backed up as whole files. I've done it this way because sanity checks 
[2] of page headers indicated a lot of "invalid" pages in FSM and VM 
files. But it seems that some pages do have valid headers with sane LSNs.
Can I use LSNs as a history marker on FSM and VM pages? On a 1TB backup I get 
something like 150MB of FSM+VM, and that's kind of a lot.

Question 2. File dates.
Is it safe to use the file modification time to track whether a file has changed 
since the previous backup? If a file's date is before the start of the previous 
backup, I just add it to the "skip list" [3].
My assumption is: every time a file is changed in the filesystem, its 
modification date is updated to a higher value.
Is this assumption valid for most of the platforms and filesystems in use? Or 
can I check for this "capability" of the FS?

Thank you for your attention. I'll be glad to receive any information/pointers 
on this matter.


Best regards, Andrey Borodin, Yandex.

[0] https://github.com/wal-g/wal-g/pull/29
[1] 
https://github.com/wal-g/wal-g/pull/29/files#diff-d77406e827f5f947d4d4a1e6d76c1f4eR114
[2] 
https://github.com/wal-g/wal-g/pull/29/files#diff-d77406e827f5f947d4d4a1e6d76c1f4eR50
[3] 
https://github.com/wal-g/wal-g/pull/29/files#diff-f5c8f0067297f98eb5acc6e2c6b1b234R87

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Assert that the correct locks are held when calling PageGetLSN()

2017-10-06 Thread Michael Paquier
On Fri, Oct 6, 2017 at 7:34 PM, Alvaro Herrera  wrote:
> Just search for "Æsop" in the archives:
> https://www.postgresql.org/message-id/CACjxUsPPCbov6DDPnuGpR=fmxhsjsn_mri3rjygvbrmcrff...@mail.gmail.com

(laugh)
I didn't know this one.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] parallel worker (PID ) exited with exit code 1

2017-10-06 Thread tushar

Hi,

I got a few queries after running sqlsmith against PG HEAD where I 
am getting a LOG message like "parallel worker (PID) exited with exit 
code 1"


set force_parallel_mode =1;
 select
  pg_catalog.pg_wal_replay_pause() as c0,
  ref_0.ev_type as c1
    from
  pg_catalog.pg_rewrite as ref_0
    where ref_0.ev_enabled > ref_0.ev_type
    limit 53;

2017-10-06 13:15:34.785 BST [5680] LOG:  background worker "parallel 
worker" (PID 5964) exited with exit code 1

ERROR:  recovery is not in progress
HINT:  Recovery control functions can only be executed during recovery.
CONTEXT:  parallel worker

--
regards,tushar
EnterpriseDB  https://www.enterprisedb.com/
The Enterprise PostgreSQL Company



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [POC] hash partitioning

2017-10-06 Thread Jesper Pedersen

Hi Amul,

On 09/28/2017 05:56 AM, amul sul wrote:

It does not really do partition pruning via constraint exclusion, and I don't
think anyone is going to use the remainder in the WHERE condition to fetch
data; hash partitioning is not meant for that.

But I am sure that we could solve this problem using your and Beena's work
toward faster partition pruning[1] and Runtime Partition Pruning[2].

Will think about these changes if they are required for the pruning feature.



Could you rebase on latest master?

Best regards,
 Jesper


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] parallelize queries containing initplans

2017-10-06 Thread Amit Kapila
On Fri, Oct 6, 2017 at 2:32 AM, Robert Haas  wrote:
> On Thu, Oct 5, 2017 at 1:16 PM, Amit Kapila  wrote:
>> On Thu, Oct 5, 2017 at 6:08 PM, Robert Haas  wrote:
>>> On Thu, Oct 5, 2017 at 5:52 AM, Amit Kapila  wrote:
 Now, unless, I am missing something here, it won't be possible to
 detect params in such cases during forming of join rels and hence we
 need the tests in generate_gather_paths.  Let me know if I am missing
 something in this context or if you have any better ideas to make it
 work?
>>>
>>> Hmm, in a case like this, I think we shouldn't need to detect it.  The
>>> Var is perfectly parallel-safe, the problem is that we can't use a
>>> not-safe plan for the inner rel.  I wonder why that's happening
>>> here...
>>
>> It is because the inner rel (Result Path) contains a reference to a
>> param which appears to be at the same query level.  Basically due to
>> changes in max_parallel_hazard_walker().
>
> I spent several hours debugging this issue this afternoon.
>

Thanks a lot for spending time.

>  I think
> you've misdiagnosed the problem.  I think that the Param reference in
> the result path is parallel-safe; that doesn't seem to me to be wrong.
> If we see a Param reference for our own query level, then either we're
> below the Gather and the new logic added by this patch will pass down
> the value or we're above the Gather and we can access the value
> directly.  Either way, no problem.
>
> However, I think that if you attach an InitPlan to a parallel-safe
> plan, it becomes parallel-restricted.  This is obvious in the case
> where the InitPlan's plan isn't itself parallel-safe, but it's also
> arguably true even when the InitPlan's plan *is* parallel-safe,
> because pushing that below a Gather introduces a multiple-evaluation
> hazard.  I think we should fix that problem someday by providing a
> DSA-based parameter store, but that's a job for another day.
>
> And it turns out that we actually have such logic already, but this
> patch removes it:
>
> @@ -2182,7 +2181,6 @@ SS_charge_for_initplans(PlannerInfo *root,
> RelOptInfo *final_rel)
>
> path->startup_cost += initplan_cost;
> path->total_cost += initplan_cost;
> -   path->parallel_safe = false;
> }
>
> /* We needn't do set_cheapest() here, caller will do it */
>

Yeah, it is a mistake on my side.  I thought that we don't need it
now, as we pass the computed value of the initplan params for valid cases
and for other cases we can prohibit them, as was done in the patch.

> Now, the header comment for SS_charge_for_initplans() is wrong; it
> claims we can't transmit initPlans to workers, but I think that's
> already wrong even before this patch.
>

What exactly do you want to convey as part of that comment?

I have fixed the other review comment related to using safe_param_list
in the attached patch.  I think I have fixed all comments given by
you, but let me know if I have missed anything or you have any other
comment.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


pq_pushdown_initplan_v11.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-06 Thread Alvaro Herrera
Michael Paquier wrote:
> On Fri, Oct 6, 2017 at 1:24 AM, Alvaro Herrera  
> wrote:

> +   /*
> +* If the xmax of the old tuple is identical to the xmin of the new one,
> +* it's a match.
> +*/
> +   if (xmax == xmin)
> +   return true;
> I would use TransactionIdEquals() here, to remember once you switch
> that to a macro.

I've had second thoughts about the macro thing -- for now I'm keeping it
a function, actually.

> +/*
> + * Given a tuple, verify whether the given Xmax matches the tuple's Xmin,
> + * taking into account that the Xmin might have been frozen.
> + */
> [...]
> +   /*
> +* We actually don't know if there's a match, but if the previous tuple
> +* was frozen, we cannot really rely on a perfect match.
> +*/

I don't know what you had in mind here, but I tweaked the 9.3 version so
that it now looks like this:

/*
 * HeapTupleUpdateXmaxMatchesXmin - verify update chain xmax/xmin lineage
 *
 * Given the new version of a tuple after some update, verify whether the
 * given Xmax (corresponding to the previous version) matches the tuple's
 * Xmin, taking into account that the Xmin might have been frozen after the
 * update.
 */
bool
HeapTupleUpdateXmaxMatchesXmin(TransactionId xmax, HeapTupleHeader htup)
{
TransactionId   xmin = HeapTupleHeaderGetXmin(htup);

/*
 * If the xmax of the old tuple is identical to the xmin of the new one,
 * it's a match.
 */
if (TransactionIdEquals(xmax, xmin))
return true;

/*
 * When a tuple is frozen, the original Xmin is lost, but we know it's a
 * committed transaction.  So unless the Xmax is InvalidXid, we don't
 * know for certain that there is a match, but there may be one; and we
 * must return true so that a HOT chain that is half-frozen can be 
walked
 * correctly.
 */
if (TransactionIdEquals(xmin, FrozenTransactionId) &&
TransactionIdIsValid(xmax))
return true;

return false;
}


-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Assert that the correct locks are held when calling PageGetLSN()

2017-10-06 Thread Alvaro Herrera
Michael Paquier wrote:
> On Tue, Oct 3, 2017 at 12:54 AM, Alvaro Herrera  
> wrote:

> > I'd argue about this in the same direction I argued about
> > BufferGetPage() needing an LSN check that's applied separately: if it's
> > too easy for a developer to do the wrong thing (i.e. fail to apply the
> > separate LSN check after reading the page) then the routine should be
> > patched to do the check itself, and another routine should be offered
> > for the cases that explicitly can do without the check.
> >
> > I was eventually outvoted in the BufferGetPage() saga, though, so I'm
> > not sure that that discussion sets precedent.
> 
> Oh... I don't recall this discussion. A quick lookup at the archives
> does not show me a specific thread either.

Just search for "Æsop" in the archives:
https://www.postgresql.org/message-id/CACjxUsPPCbov6DDPnuGpR=fmxhsjsn_mri3rjygvbrmcrff...@mail.gmail.com

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Issue with logical replication: MyPgXact->xmin already is valid

2017-10-06 Thread Konstantin Knizhnik
When creating logical replication slots we quite often get the following 
error:


ERROR:  cannot build an initial slot snapshot when MyPgXact->xmin 
already is valid


which causes a restart of the WAL sender.
The comment to this line doesn't clarify much:

/* so we don't overwrite the existing value */
if (TransactionIdIsValid(MyPgXact->xmin))
elog(ERROR, "cannot build an initial slot snapshot when 
MyPgXact->xmin already is valid");



I have checked that MyPgXact->xmin is set by GetSnapshotData:

#3  0x0086025e in GetSnapshotData (snapshot=0xf28040 
) at procarray.c:1714
#4  0x00a48523 in GetNonHistoricCatalogSnapshot (relid=2615) at 
snapmgr.c:479

#5  0x00a484d3 in GetCatalogSnapshot (relid=2615) at snapmgr.c:452
#6  0x004f15bf in systable_beginscan (heapRelation=0x256bdb0, 
indexId=2684, indexOK=1 '\001', snapshot=0x0, nkeys=1, 
key=0x7ffdf633c770) at genam.c:353
#7  0x009d676e in SearchCatCache (cache=0x25230a8, v1=39586984, 
v2=0, v3=0, v4=0) at catcache.c:1169
#8  0x009ec13d in SearchSysCache (cacheId=35, key1=39586984, 
key2=0, key3=0, key4=0) at syscache.c:1109
#9  0x009ec259 in GetSysCacheOid (cacheId=35, key1=39586984, 
key2=0, key3=0, key4=0) at syscache.c:1187
#10 0x00574b85 in get_namespace_oid (nspname=0x25c0ca8 
"pg_catalog", missing_ok=1 '\001') at namespace.c:3009
#11 0x00574886 in LookupExplicitNamespace (nspname=0x25c0ca8 
"pg_catalog", missing_ok=1 '\001') at namespace.c:2871
#12 0x0057437d in get_ts_config_oid (names=0x25c2438, 
missing_ok=1 '\001') at namespace.c:2653
#13 0x009f56ca in check_TSCurrentConfig (newval=0x7ffdf633cb90, 
extra=0x7ffdf633cba0, source=PGC_S_FILE) at ts_cache.c:600
#14 0x00a1bbdc in call_string_check_hook (conf=0xf26870 
, newval=0x7ffdf633cb90, 
extra=0x7ffdf633cba0, source=PGC_S_FILE, elevel=12) at guc.c:9912
#15 0x00a14420 in parse_and_validate_value (record=0xf26870 
, name=0x25c0620 
"default_text_search_config", value=0x25c0658 "pg_catalog.english", 
source=PGC_S_FILE, elevel=12,

newval=0x7ffdf633cb90, newextra=0x7ffdf633cba0) at guc.c:5840
#16 0x00a15543 in set_config_option (name=0x25c0620 
"default_text_search_config", value=0x25c0658 "pg_catalog.english", 
context=PGC_SIGHUP, source=PGC_S_FILE, action=GUC_ACTION_SET, 
changeVal=1 '\001', elevel=12,

is_reload=0 '\000') at guc.c:6442
#17 0x00a1eb5c in ProcessConfigFileInternal (context=PGC_SIGHUP, 
applySettings=1 '\001', elevel=13) at guc-file.l:439
#18 0x00a1e4ee in ProcessConfigFile (context=PGC_SIGHUP) at 
guc-file.l:155
#19 0x0082433c in WalSndWaitForWal (loc=25991904) at 
walsender.c:1309
#20 0x00823423 in logical_read_xlog_page (state=0x25a4f10, 
targetPagePtr=25985024, reqLen=6880, targetRecPtr=25991880, 
cur_page=0x25a6c60 "\227\320\005", pageTLI=0x25a57bc) at walsender.c:761
#21 0x00558c3d in ReadPageInternal (state=0x25a4f10, 
pageptr=25985024, reqLen=6880) at xlogreader.c:556
#22 0x00558405 in XLogReadRecord (state=0x25a4f10, 
RecPtr=25991880, errormsg=0x7ffdf633cea8) at xlogreader.c:255
#23 0x0080dda6 in DecodingContextFindStartpoint (ctx=0x25a4c50) 
at logical.c:450
#24 0x00823a0c in CreateReplicationSlot (cmd=0x24dc398) at 
walsender.c:934
#25 0x008247e4 in exec_replication_command (cmd_string=0x254e8f0 
"CREATE_REPLICATION_SLOT \"shardman_copy_t_10_3_4_17307_sync_17302\" 
TEMPORARY LOGICAL pgoutput USE_SNAPSHOT") at walsender.c:1515
#26 0x0088eccc in PostgresMain (argc=1, argv=0x24f0b28, 
dbname=0x24f0948 "postgres", username=0x24bf7f0 "knizhnik") at 
postgres.c:4086

#27 0x007ef9e2 in BackendRun (port=0x24dee00) at postmaster.c:4357
#28 0x007ef10c in BackendStartup (port=0x24dee00) at 
postmaster.c:4029

#29 0x007eb6cc in ServerLoop () at postmaster.c:1753
#30 0x007eacb8 in PostmasterMain (argc=3, argv=0x24bd660) at 
postmaster.c:1361

#31 0x00728593 in main (argc=3, argv=0x24bd660) at main.c:228


I wonder if this is a normal situation or if something is going wrong?





--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Still another race condition in recovery TAP tests

2017-10-06 Thread Craig Ringer
On 6 October 2017 at 14:03, Noah Misch  wrote:
> On Fri, Sep 08, 2017 at 10:32:03PM -0400, Tom Lane wrote:
>> (I do kinda wonder why we rolled our own RecursiveCopy; surely there's
>> a better implementation in CPAN?)
>
> Fewer people will test as we grow the list of modules they must first install.

Meh, I don't buy that. At worst, all we have to do is provide a script
that fetches them, from distro repos if possible, and failing that
from CPAN.

With cpanminus, that's pretty darn simple too.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Tests for reloptions

2017-10-06 Thread Nikolay Shaplov
In a message of October 3, 2017, 11:48:43, Michael Paquier wrote:

> > src/backend/access/common/reloptions.c gets only 7 lines; it was already
> > quite covered by the existing tests, but almost all of the access methods
> > get some coverage increase:
> > 
> > src/backend/access/brin 1268 -> 1280 (+18)
> > src/backend/access/gin  2924 -> 2924 (0)
> > src/backend/access/gist 1673 -> 2108 (+435)
> > src/backend/access/hash 1580 -> 1638 (+58)
> > src/backend/access/heap 2863 -> 2866 (+3)
> > src/backend/access/nbtree   2565 -> 2647 (+82)
> > src/backend/access/spgist   2066 -> 2068 (+2)
> > 
> > Though I should say that the incredible coverage boost for gist is the result
> > of unsteady test-run results. The real value should be much less...
> +-- Test buffering and fillfactor reloption takes valid values
> +create index gist_pointidx2 on gist_point_tbl using gist(p) with
> (buffering = on, fillfactor=50);
> +create index gist_pointidx3 on gist_point_tbl using gist(p) with
> (buffering = off);
> +create index gist_pointidx4 on gist_point_tbl using gist(p) with
> (buffering = auto);
> Those are the important bits doing the boost in gist visibly. To be kept.
> 
> -CREATE INDEX hash_f8_index ON hash_f8_heap USING hash (random float8_ops);
> +CREATE INDEX hash_f8_index ON hash_f8_heap USING hash (random
> float8_ops) WITH (fillfactor=60);
> Woah. So that alone increases hash by 58 steps.
Might be... As I have noticed, tests for indexes filled with random data give 
random coverage. More random data is needed to give steady results. I am 
gathering statistics now, and later I hope to offer another patch (or patches) 
to fix it. So not all of the 58 lines might be the result of this patch :-)
 
> +CREATE INDEX bloomidx2 ON tst USING bloom (i, t) WITH (length=0);
> +ERROR:  value 0 out of bounds for option "length"
> +DETAIL:  Valid values are between "1" and "4096".
> contrib/bloom contributes to the coverage of reloptions.c thanks to
> its coverage of the add_ routines when the library is loaded. And
> those don't actually improve any coverage, right?
I actually ran only a simple "make check". Running "make check" for bloom will 
also add some add_ routines to the coverage.


> > Nevertheless tests touches the reloptions related code, checks for proper
> > error handling, and it is good.
> 
> Per your measurements reloptions.c is improved by 1.7%, or 7 lines as
> you say. Five of them are in parse_one_reloption for integer (2) and
> reals (2), plus one error at the top. Two are in transformRelOptions
> for a valid namespace handling. In your patch reloptions.sql is 141
> lines. Couldn't it be shorter with the same improvements? It looks
> that a lot of tests are overlapping with existing ones.
> 
> > I think we should commit it.
> 
> My take is that a lighter version could be committed instead. My
> suggestion is to keep the new test reloptions minimal so as it tests
> the relopt types and their bounds, and I think that you could remove
> the same bounding checks in the other tests that you added: bloom,
> gist, etc.

I've been thinking a lot, and rereading the patch. When I reread it, I found 
myself wanting to add more tests to it now... ;-)

If the only purpose of the tests is to get better coverage, then I would agree 
with you. But I've been thinking that tests should also check that everything 
behaves as expected (or as written in the documentation).

I would say that option names are a part of the SQL dialect that postgres uses, 
kind of part of the syntax. It is good to be sure that they are still supported. 
So I would add a test for every option heap supports.

Checking minimum and maximum values is more debatable. The argument for it is 
that the min and max are written in the documentation, and it is good to check 
that postgres behaves according to the documentation...



But if you tell me for sure that the tests are only for code coverage and should 
not do anything more, then I will surely remove all tests that do not increase 
test coverage.


-- 
Do code for fun. Can do it for money (Perl & C/C++ ~10h/week)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] search path security issue?

2017-10-06 Thread Magnus Hagander
On Fri, Oct 6, 2017 at 12:05 AM, Joshua D. Drake 
wrote:

> On 10/05/2017 02:54 PM, David G. Johnston wrote:
>
>> On Thu, Oct 5, 2017 at 2:37 PM, Joshua D. Drake > >wrote:
>>
>> I get being able to change my search_path on the fly but it seems
>> odd that as user foo I can change my default search path?
>>
>>
>> Seems down-right thoughtful of us to allow users to change their own
>> defaults instead of forcing them to always change things on-the-fly or bug
>> a DBA to change the default for them.
>>
>
> It seems that if a superuser changes the search path with ALTER
> USER/ROLE, then the user itself (assuming no elevated privilege)
> should not be able to change it. Again, I get being able to do
> it with SET, but a normal user shouldn't be able to reset a
> superuser-determined setting.
>

This is in no way specific to search_path.

It would be a nice feature to have in general, like a "basic guc
permissions" thing. At least allowing a superuser to prevent exactly this.
You could argue the same thing for example for memory parameters and such.
We have no permissions at all when it comes to userset gucs today -- and of
course, if something should be added about this, it should be done in a way
that works for all the userset variables, not just search_path.
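
To illustrate the current behavior being discussed (the role and schema
names here are made up):

    -- superuser sets a per-role default
    ALTER ROLE foo SET search_path = app_schema, pg_catalog;

    -- ...but role foo can still override it for a session, or even
    -- change its own default, since ordinary roles may alter their
    -- own settings
    SET search_path = other_schema;
    ALTER ROLE foo SET search_path = other_schema;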

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: [HACKERS] pgsql: Remove ICU tests from default run

2017-10-06 Thread Noah Misch
On Sat, Mar 25, 2017 at 04:30:45AM +, Peter Eisentraut wrote:
> Remove ICU tests from default run
> 
> These tests require the test database to be in UTF8 encoding.  Until
> there is a better solution, take them out of the default test set and
> treat them like the existing collate.linux.utf8 test, meaning it has to
> be selected manually.

I'm thinking the regression suite should create an ENCODING=UTF8, LOCALE=C
database in addition to the traditional one.  This isn't the first or the
second use case for such a database.
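
Something along these lines, I imagine (the database name is made up):

    CREATE DATABASE regression_utf8
        TEMPLATE template0
        ENCODING 'UTF8'
        LC_COLLATE 'C'
        LC_CTYPE 'C';

The ICU and collate.linux.utf8 tests could then run against that database
regardless of the encoding of the main regression database.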


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-06 Thread Michael Paquier
On Fri, Oct 6, 2017 at 1:24 AM, Alvaro Herrera  wrote:
> I think this is the patch for 9.3.  I ran the test a few hundred times
> (with some additional changes such as randomly having an update inside a
> savepoint that's randomly aborted, randomly aborting the transaction,
> randomly skipping the for key share lock, randomly sleeping at a few
> points; and also adding filler columns, reducing fillfactor and using
> 5, 50, 180, 250, 500 sessions after verifying that it causes the tuples
> to stay in the same page or migrate to later pages).  The final REINDEX
> has never complained again about failing to find the root tuple.  I hope
> it's good now.

I have looked at and played with your patch as well for a couple of
hours, and did not notice any failures again. The structure of the
pages looked sane as well; using pageinspect, I could also see a correct
HOT chain with the first set of tests provided on this thread.
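
For the record, a sketch of that kind of check (the table name and page number
are made up):

    CREATE EXTENSION pageinspect;

    -- t_ctid links plus the HEAP_HOT_UPDATED / HEAP_ONLY_TUPLE bits in
    -- t_infomask2 show whether the HOT chain on page 0 is intact
    SELECT lp, t_xmin, t_xmax, t_ctid, t_infomask2
    FROM heap_page_items(get_raw_page('test_tab', 0));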

> The attached patch needs a few small tweaks, such as improving
> commentary in the new function, maybe turn it into a macro (otherwise I
> think it could be bad for performance; I'd like a static func but not
> sure those are readily available in 9.3), change the XID comparison to
> use the appropriate macro rather than ==, and such.

Yeah that would be better.

> Regarding changes of xmin/xmax comparison, I also checked manually the
> spots I thought should be modified and later double-checked against the
> list that Michael posted.

Thanks. I can see that your patch is filling the holes.

> It's a match, except for rewriteheap.c which
> I cannot make heads or tails about.  (I think it's rather unfortunate
> that it sticks a tuple's Xmax into a field that's called Xmin, but let's
> put that aside).  Maybe there's a problem here, maybe there isn't.

rewrite_heap_tuple is used only by CLUSTER, and the oldest new tuples
are frozen before being rewritten. So this looks safe to me in the
end.

> I'm now going to forward-port this to 9.4.

+   /*
+* If the xmax of the old tuple is identical to the xmin of the new one,
+* it's a match.
+*/
+   if (xmax == xmin)
+   return true;
I would use TransactionIdEquals() here, something to keep in mind once
you switch that to a macro.

+/*
+ * Given a tuple, verify whether the given Xmax matches the tuple's Xmin,
+ * taking into account that the Xmin might have been frozen.
+ */
[...]
+   /*
+* We actually don't know if there's a match, but if the previous tuple
+* was frozen, we cannot really rely on a perfect match.
+*/
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Another oddity in handling of WCO constraints in postgres_fdw

2017-10-06 Thread Etsuro Fujita

On 2017/10/05 20:06, Kyotaro HORIGUCHI wrote:

Since WCO is checked against the finally inserted values, we can't do other
than actually requesting those values.


I think so too.


So just merging the WCO columns into RETURNING in the deparsed query is OK.
But can't we concatenate returningList and withCheckOptionList at a
higher level?  Specifically, just passing the calculated used_attr to
deparse(Insert|Update)Sql instead of returningList and
withCheckOptionList separately.  Deparsed queries forget the origin of
the requested columns anyway.


We could do that, but I think that would need a bit more code in 
postgresPlanForeignModify, including changes to the deparseDeleteSql API 
in addition to the deparse(Insert|Update)Sql APIs.  I prefer keeping high-level 
functions simple, so I'd vote for just passing withCheckOptionList 
separately to deparse(Insert|Update)Sql, as proposed in the patch.
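
For context, the sort of case being discussed looks roughly like this (a
sketch; the server and object names are made up):

    CREATE FOREIGN TABLE ft (a int, b int)
        SERVER loopback OPTIONS (table_name 'base_tab');

    CREATE VIEW ft_v AS
        SELECT * FROM ft WHERE a > 0
        WITH CHECK OPTION;

    -- the WCO condition must be evaluated against the row as it is actually
    -- stored remotely, so the deparsed remote INSERT needs a RETURNING list
    -- that includes column a even if the user asked for no RETURNING
    INSERT INTO ft_v VALUES (1, 2);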


Best regards,
Etsuro Fujita



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Comment typo

2017-10-06 Thread Etsuro Fujita

On 2017/10/05 21:48, Robert Haas wrote:

On Thu, Oct 5, 2017 at 6:11 AM, Etsuro Fujita wrote:

Here is a small patch to fix a typo in a comment in partition.c:
s/mantain/maintain/.


Committed.


Thanks!

Best regards,
Etsuro Fujita



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables

2017-10-06 Thread Ashutosh Bapat
On Wed, Oct 4, 2017 at 9:04 PM, Robert Haas  wrote:
> On Tue, Oct 3, 2017 at 3:27 PM, Robert Haas  wrote:
>> I decided to skip over 0001 for today and spend some time looking at
>> 0002-0006.
>
> Back to 0001.
>
> +Enables or disables the query planner's use of partition-wise join
> +plans. When enabled, it spends time in creating paths for joins 
> between
> +partitions and consumes memory to construct expression nodes to be 
> used
> +for those joins, even if partition-wise join does not result in the
> +cheapest path. The time and memory increase exponentially with the
> +number of partitioned tables being joined and they increase linearly
> +with the number of partitions. The default is off.
>
> I think this is too scary and too much technical detail.  I think you
> could just say something like: Enables or disables use of
> partition-wise join, which allows a join between partitioned tables to
> be performed by joining the matching partitions.  Partition-wise join
> currently applies only when the join conditions include all the
> columns of the partition keys, which must be of the same data type and
> have exactly matching sets of child partitions.  Because
> partition-wise join planning can use significantly increase CPU time
> and memory usage during planning, the default is off.

Done, with a slight change. "include all the columns of the partition
keys" has a different meaning when the partition key is an expression, so
I have used "include all the partition keys". Also changed the last
sentence to "... can use significantly more CPU time and memory during
planning ...". Please feel free to revert those changes if you don't
like them.
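
As a concrete illustration of what the reworded paragraph describes (a sketch
only; the table names are made up, and I'm assuming the GUC keeps the name
enable_partition_wise_join used in the patch):

    SET enable_partition_wise_join = on;

    CREATE TABLE t1 (a int, b text) PARTITION BY RANGE (a);
    CREATE TABLE t1_p1 PARTITION OF t1 FOR VALUES FROM (0) TO (100);
    CREATE TABLE t1_p2 PARTITION OF t1 FOR VALUES FROM (100) TO (200);

    CREATE TABLE t2 (a int, c text) PARTITION BY RANGE (a);
    CREATE TABLE t2_p1 PARTITION OF t2 FOR VALUES FROM (0) TO (100);
    CREATE TABLE t2_p2 PARTITION OF t2 FOR VALUES FROM (100) TO (200);

    -- the join condition covers the partition key and both tables have
    -- matching partition bounds, so the join can be done partition-wise
    EXPLAIN SELECT * FROM t1 JOIN t2 USING (a);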

>
> +partitioned table. The join partners can not be found in other partitions. 
> This
> +condition allows the join between partitioned tables to be broken into joins
> +between the matching partitions. The resultant join is partitioned in the 
> same
>
> "The join partners can not be found in other partitions." is redundant
> with the previous sentence.  I suggest deleting it.  I also suggest
> "This condition allows the join between partitioned tables to be
> broken" -> "Because of this, the join between partitioned tables can
> be broken".

Done.

>
> +relation" for both partitioned table as well as join between partitioned 
> tables
> +which can use partition-wise join technique.
>
> for either a partitioned table or a join between compatibly partitioned tables

Done.

>
> +Partitioning properties of a partitioned relation are stored in
> +PartitionSchemeData structure. Planner maintains a list of canonical 
> partition
> +schemes (distinct PartitionSchemeData objects) so that any two partitioned
> +relations with same partitioning scheme share the same PartitionSchemeData
> +object. This reduces memory consumed by PartitionSchemeData objects and makes
> +it easy to compare the partition schemes of joining relations.
>
> Not all of the partitioning properties are stored in the
> PartitionSchemeData structure any more.  I think this needs some
> rethinking and maybe some expansion.  As written, each of the first
> two sentences needs a "the" at the beginning.

Changed to

The partitioning properties of a partitioned relation are stored in its
RelOptInfo.  The information about the data types of the partition keys is
stored in the PartitionSchemeData structure. The planner maintains a list of
canonical partition schemes (distinct PartitionSchemeData objects) so that the
RelOptInfos of any two partitioned relations with the same partitioning scheme
point to the same PartitionSchemeData object.  This reduces the memory consumed
by PartitionSchemeData objects and makes it easy to compare the partition
schemes of joining relations.

Let me know if this looks good.

>
> +   /*
> +* Create "append" paths for
> partitioned joins. Do this before
> +* creating GatherPaths so that
> partial "append" paths in
> +* partitioned joins will be considered.
> +*/
>
> I think you could shorten this to a single-line comment and just keep
> the first sentence.  Similarly in the other location where you have
> the same sort of thing.

Done.

>
> + * child-joins. Otherwise, add_path might delete a path that some "append"
> + * path has reference to.
>
> to which some path generated here has a reference.

Done.

>
> Here and elsewhere, you use "append" rather than Append to refer to
> the paths added.  I suppose that's weasel-wording to work around the
> fact that they might be either Append or MergeAppend paths, but I'm
> not sure it's really going to convey that to anyone.  I suggest
> rephrasing those comments more generically, e.g.:
>
> +   /* Add "append" paths containing paths from child-joins. */
>
> You could say: Build additional paths for this 

Re: [HACKERS] Parallel Append implementation

2017-10-06 Thread Amit Khandekar
On 6 October 2017 at 08:49, Amit Kapila  wrote:
> On Thu, Oct 5, 2017 at 4:11 PM, Amit Khandekar  wrote:
>>
>> Ok. How about removing pa_all_partial_subpaths altogether , and
>> instead of the below condition :
>>
>> /*
>> * If all the child rels have partial paths, and if the above Parallel
>> * Append path has a mix of partial and non-partial subpaths, then consider
>> * another Parallel Append path which will have *all* partial subpaths.
>> * If enable_parallelappend is off, make this one non-parallel-aware.
>> */
>> if (partial_subpaths_valid && !pa_all_partial_subpaths)
>> ..
>>
>> Use this condition :
>> if (partial_subpaths_valid && pa_nonpartial_subpaths != NIL)
>> ..
>>
>
> Sounds good to me.
>
> One minor point:
>
> + if (!node->as_padesc)
> + {
> + /*
> + */
> + if (!exec_append_seq_next(node))
> + return ExecClearTuple(node->ps.ps_ResultTupleSlot);
> + }
>
> It seems either you want to add a comment in above part of patch or
> you just left /**/ mistakenly.

Oops. Yeah, the comment wrapper remained there when I moved its
content "This is Parallel-aware append. Follow it's own logic ..." out
of the if block. Since this is too small a change for an updated
patch, I will do this along with any other changes that would be
required as the review progresses.

>
>> 
>>
>>
>> Regarding a mix of partial and non-partial paths, I feel it always
>> makes sense for the leader to choose the partial path.
>>
>
> Okay, but why not cheapest partial path?

I gave this point some thought. Overall I feel it does not matter much
which partial path the leader picks up. Right now the partial paths are
not ordered. But for the sake of the non-partial paths, we are just choosing
the very last path. So in the case of mixed paths, the leader will get a
partial path, but that partial path would not be the cheapest one. But if we
also ordered the partial paths, the same logic would then pick up the
cheapest partial path. The question is, should we also order the
partial paths for the leader?

The only scenario I see where the leader choosing the cheapest partial path
*might* show some benefit is if there are some partial paths that
need to do some startup work using only one worker. I think parallel
hash join is currently one such case, where it builds the hash table; I
guess a parallel hash build is also in the works here, but I am not sure
about its status. For such a plan, if the leader starts it, it would be
slow, and no other worker would be able to help it, so its actual startup
cost would be drastically increased. (Another path is parallel bitmap heap
scan, where the leader has to do something while the other workers wait.
But there, I think it's not much work for the leader to do.) So
overall, to handle such cases, it's better for the leader to choose the
cheapest path, or maybe a path with the cheapest startup cost. We could
also consider sorting the partial paths by decreasing startup cost.

-- 
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Still another race condition in recovery TAP tests

2017-10-06 Thread Noah Misch
On Fri, Sep 08, 2017 at 10:32:03PM -0400, Tom Lane wrote:
> (I do kinda wonder why we rolled our own RecursiveCopy; surely there's
> a better implementation in CPAN?)

Fewer people will test as we grow the list of modules they must first install.
Bundling a copy is tempting, but most CPAN modules use the Perl license.
We're in that hole already[1], but I perceived a consensus to stop digging.
Code used only for testing is less concerning, but I'd still not bundle it.

[1] https://www.postgresql.org/message-id/flat/54EC6D80.5090507%40vmware.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: Improve bitmap costing for lossy pages

2017-10-06 Thread Dilip Kumar
On Thu, Oct 5, 2017 at 8:15 PM, Robert Haas  wrote:
> On Sun, Sep 17, 2017 at 7:04 AM, Dilip Kumar  wrote:
>> I used lossy_pages = max(0, total_pages - maxentries / 2), as
>> suggested by Alexander.
>
> Does that formula accurately estimate the number of lossy pages?

I have printed total_pages, exact_pages and lossy_pages during
planning time, and for testing purposes, I tweaked the code a bit so that
it doesn't consider lossy_pages in the cost calculation (same as the base
code).

I have tested TPC-H at scale factor 20 with different work_mem settings (4MB,
20MB, 64MB) and noted down the estimated pages vs. the actual pages.
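
For reference, the actual exact/lossy numbers below are the "Heap Blocks"
counters that EXPLAIN ANALYZE prints for a Bitmap Heap Scan, gathered along
the lines of this sketch (assuming an index on l_shipdate and that the
planner picks a bitmap heap scan):

    SET work_mem = '4MB';
    EXPLAIN (ANALYZE)
    SELECT sum(l_extendedprice) FROM lineitem
    WHERE l_shipdate >= date '1995-01-01' AND l_shipdate < date '1995-04-01';
    -- the Bitmap Heap Scan node shows "Heap Blocks: exact=... lossy=..."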

Analysis: The estimated value of lossy_pages is way higher than its
actual value, and the reason is that the total_pages calculated by the
"Mackert and Lohman formula" is not correct.

work_mem=4 MB

query:4
estimated: total_pages=552472.00 exact_pages=32768.00
lossy_pages=519704.00
actual:exact=18548 lossy=146141

query:6
estimated: total_pages=1541449.00 exact_pages=32768.00
lossy_pages=1508681.00
actual:exact=13417 lossy=430385

query:8
estimated:  total_pages=552472.00 exact_pages=32768.00
lossy_pages=519704.00
actual: exact=56869 lossy=495603

query:14
estimated:  total_pages=1149603.00 exact_pages=32768.00
lossy_pages=1116835.00
actual: exact=17115 lossy=280949

work_mem: 20 MB
query:4
estimated:  total_pages=552472.00 exact_pages=163840.00
lossy_pages=388632.00
actual: exact=109856 lossy=57761

query:6
estimated:   total_pages=1541449.00 exact_pages=163840.00
lossy_pages=1377609.00
actual:  exact=59771 lossy=397956

query:8
estimated:  total_pages=552472.00 exact_pages=163840.00
lossy_pages=388632.00
actual: Heap Blocks: exact=221777 lossy=330695

query:14
estimated:  total_pages=1149603.00 exact_pages=163840.00
lossy_pages=985763.00
actual: exact=63381 lossy=235513

work_mem:64 MB
query:4
estimated:  total_pages=552472.00 exact_pages=552472.00
lossy_pages=0.00
actual: exact=166005 lossy=0

query:6
estimated:  total_pages=1541449.00 exact_pages=524288.00
lossy_pages=1017161.00
actual: exact=277717 lossy=185919

query:8
estimated: total_pages=552472.00 exact_pages=552472.00
lossy_pages=0.00
actual:exact=552472 lossy=0

query:14
estimated:  total_pages=1149603.00 exact_pages=524288.00
lossy_pages=625315.00
actual: exact=309091 lossy=0


>
> The performance results look good, but that's a slightly different
> thing from whether the estimate is accurate.
>
> +nbuckets = tbm_calculate_entires(maxbytes);
>
> entires?

changed to
+ tbm->maxentries = (int) tbm_calculate_entires(maxbytes);


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


improve_bitmap_cost_v4.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers