Re: [HACKERS] Postgres_fdw join pushdown - INNER - FULL OUTER join combination generating wrong result

2016-03-30 Thread Rajkumar Raghuwanshi
Thanks, Ashutosh, for the patch. I have applied and tested it, and I am now
getting the proper result for the reported issue.

Thanks & Regards,
Rajkumar Raghuwanshi
QMG, EnterpriseDB Corporation

On Tue, Mar 29, 2016 at 7:50 PM, Ashutosh Bapat <
ashutosh.ba...@enterprisedb.com> wrote:

>
>>> Observation: An inner join and full outer join combination on a table is
>>> generating a wrong result.
>>>
>>> SELECT * FROM lt;
>>>  c1
>>> ----
>>>   1
>>>   2
>>> (2 rows)
>>>
>>> SELECT * FROM ft;
>>>  c1
>>> ----
>>>   1
>>>   2
>>> (2 rows)
>>>
>>> \d+ ft
>>>                        Foreign table "public.ft"
>>>  Column |  Type   | Modifiers | FDW Options | Storage | Stats target | Description
>>> --------+---------+-----------+-------------+---------+--------------+-------------
>>>  c1     | integer |           |             | plain   |              |
>>> Server: link_server
>>> FDW Options: (table_name 'lt')
>>>
>>> --inner join and full outer join on local tables
>>> SELECT t1.c1,t2.c1,t3.c1 FROM lt t1 INNER JOIN lt t2 ON (t1.c1 = t2.c1)
>>> FULL JOIN lt t3 ON (t2.c1 = t3.c1);
>>>  c1 | c1 | c1
>>> ----+----+----
>>>   1 |  1 |  1
>>>   2 |  2 |  2
>>> (2 rows)
>>>
>>> --inner join and full outer join on corresponding foreign tables
>>> SELECT t1.c1,t2.c1,t3.c1 FROM ft t1 INNER JOIN ft t2 ON (t1.c1 = t2.c1)
>>> FULL JOIN ft t3 ON (t2.c1 = t3.c1);
>>>  c1 | c1 | c1
>>> ----+----+----
>>>   1 |  1 |  1
>>>   1 |  2 |
>>>   2 |  1 |
>>>   2 |  2 |  2
>>> (4 rows)
>>>
>>
> Thanks Rajkumar for the detailed report.
>
>
>>
>> I think the reason for that is in foreign_join_ok.  The code in that function
>> wrongly pulls up remote_conds from the joining relations in the FULL JOIN
>> case.  I think we should not pull up such conditions in the FULL JOIN case.
>>
>>
> Right. For a full outer join, since each joining relation acts as the outer
> for the other, we cannot pull up the quals into either the join clauses or
> the other clauses. So, in such a case, we would need to encapsulate the
> joining relation with conditions in a subquery. Unfortunately, the current
> deparse logic does not handle this encapsulation. Adding that functionality
> so close to the feature freeze might be risky given the amount of code
> changes required.
>
> PFA a patch with a quick fix: a full outer join where either of the joining
> relations has WHERE conditions (or other clauses) is not pushed down. In
> the particular case that was reported, the bug was triggered by the way
> conditions are handled for an inner join. For an inner join, all the
> conditions in the ON clause as well as the WHERE clause are treated as if
> they were part of the WHERE clause. This allows pushing down a join even if
> there are unpushable join clauses; the pushable conditions can then be put
> back into the ON clause while deparsing, which avoids using subqueries.
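
For readers following the code, here is a minimal sketch of the kind of guard
being described, in the style of postgres_fdw's foreign_join_ok(); the exact
shape of the committed fix may differ:

    case JOIN_FULL:
        /*
         * If either side of a FULL JOIN has remote conditions of its own,
         * pulling them up would change the join's result, so simply refuse
         * to push the join down (quick fix; a complete solution would wrap
         * that side in a subquery while deparsing).
         */
        if (fpinfo_o->remote_conds || fpinfo_i->remote_conds)
            return false;
        break;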
>
> --
> Best Wishes,
> Ashutosh Bapat
> EnterpriseDB Corporation
> The Postgres Database Company
>


Re: [HACKERS] Breakage with VACUUM ANALYSE + partitions

2016-03-30 Thread Noah Misch
On Sat, Mar 26, 2016 at 01:39:42PM +0100, Andres Freund wrote:
> On 2016-03-25 12:02:05 -0400, Robert Haas wrote:
> > Gosh, that's surprising.  I wonder if that just revealed an underlying
> > issue rather than creating it.
> 
> I think that's the case; it's just somewhat unlikely to hit in other
> cases.
> 
> If SMgrRelation->md_fd[n] is an empty relation and mdread() or another
> routine asks for a block in the second segment, that request ought to error
> out. But even if the first segment is zero bytes, _mdfd_getseg() will
> dutifully try to open the second segment, which will succeed in the case
> of a truncated relation, because we leave the truncated segment in
> place.
> 
> ISTM that _mdfd_getseg had better check the length of the last segment
> before opening the next one?
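
For context, a hedged sketch of the kind of check being suggested, using real
md.c names (_mdnblocks() and RELSEG_SIZE); where exactly it would go inside
_mdfd_getseg() is an assumption, not the committed fix:

    /*
     * Before moving on to open segment n+1, verify that segment n is
     * actually full.  A short (or zero-byte) last segment means the
     * requested block cannot exist, so fail here rather than happily
     * opening a stale next segment left behind by truncation.
     */
    if (_mdnblocks(reln, forknum, v) < ((BlockNumber) RELSEG_SIZE))
        return NULL;        /* caller reports "block does not exist" */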

[This is a generic notification.]

The above-described topic is currently a PostgreSQL 9.6 open item.  Andres,
since you committed the patch believed to have created it, you own this open
item.  If that responsibility lies elsewhere, please let us know whose
responsibility it is to fix this.  Since new open items may be discovered at
any time and I want to plan to have them all fixed well in advance of the ship
date, I will appreciate your efforts toward speedy resolution.  Please
present, within 72 hours, a plan to fix the defect within seven days of this
message.  Thanks.

Non-generic addendum: granted, s/created/unearthed/ is looking more precise.




Re: [HACKERS] Performance degradation in commit 6150a1b0

2016-03-30 Thread Noah Misch
On Thu, Mar 31, 2016 at 01:10:56AM -0400, Noah Misch wrote:
> On Sun, Mar 27, 2016 at 02:15:50PM +0200, Andres Freund wrote:
> > On 2016-03-27 02:34:32 +0530, Ashutosh Sharma wrote:
> > > As mentioned in my earlier mail i was not able to apply
> > > *pinunpin-cas-5.patch* on commit *6150a1b0,
> > 
> > That's not surprising; that's pretty old.
> > 
> > > *therefore i thought of applying it on the latest commit and i was
> > > able to do it successfully. I have now taken the performance readings
> > > at latest commit i.e. *76281aa9* with and without applying
> > > *pinunpin-cas-5.patch* and my observations are as follows,
> > >
> > 
> > > 1. I can still see that the current performance lags by 2-3% from the
> > > expected performance when *pinunpin-cas-5.patch *is applied on the commit
> > > 
> > > *76281aa9.*
> > > 2. When *pinunpin-cas-5.patch *is ignored and performance is measured at
> > > commit *76281aa9 *the overall performance lags by 50-60% from the expected
> > > performance.
> > > 
> > > *Note:* Here, the expected performance is the performance observed before
> > > commit *6150a1b0 *when* ac1d794 *is reverted.
> > 
> > Thanks for doing these benchmarks. What's the performance if you revert
> > 6150a1b0 on top of a recent master? There've been a lot of other patches
> > influencing performance since 6150a1b0, so minor performance differences
> > aren't necessarily meaningful; especially when that older version then
> > had other patches reverted.
> 
> [This is a generic notification.]
> 
> The above-described topic is currently a PostgreSQL 9.6 open item.  Andres,
> since you committed the patch believed to have created it, you own this open
> item.  If that responsibility lies elsewhere, please let us know whose
> responsibility it is to fix this.  Since new open items may be discovered at
> any time and I want to plan to have them all fixed well in advance of the ship
> date, I will appreciate your efforts toward speedy resolution.  Please
> present, within 72 hours, a plan to fix the defect within seven days of this
> message.  Thanks.

My attribution above was incorrect.  Robert Haas is the committer and owner of
this one.  I apologize.




Re: [HACKERS] [PATCH v1] GSSAPI encryption support

2016-03-30 Thread Michael Paquier
On Wed, Mar 30, 2016 at 1:01 PM, Robbie Harwood  wrote:
> A new version of my GSSAPI encryption patchset is available, both in
> this email and on my github:
> https://github.com/frozencemetery/postgres/tree/feature/gssencrypt9
>
> This version is intended to address David's review suggestions:
>
> - The only code change (not counting SGML and comments) is that I've
>   resolved the pre-existing "XXX: Should we loop and read all messages?"
>   comment in the affirmative by... looping and reading all messages in
>   pg_GSS_error.  Though commented on as part of the first patch, this is
>   bundled with the rest of the auth cleanup since the first patch is
>   code motion only.
>
> - Several changes to comments, documentation rewordings, and whitespace
>   additions.  I look forward to finding out what needs even more of the
>   same treatment.  Of all the changesets I've posted thus far, this
>   might be the one for which it makes the most sense to see what changed
>   by diffing from the previous changeset.

Thanks for the new versions. For now I have been having a look at only 0001.

The first thing I have noticed is that 0001 breaks the Windows build
using Visual Studio. When building without GSSAPI, fe-gssapi-common.c
and be-gssapi-common.c should be removed from the list of files used
so that's simple enough to fix. Now when GSSAPI build is enabled,
there are some declaration conflicts with your patch, leading to many
compilation errors. This took me some time to figure out but the cause
is this diff in be-gssapi-common.c:
+#include "libpq/be-gssapi-common.h"
+
+#include "postgres.h"

postgres.h should be included *before* be-gssapi-common.h. Since I
suggested the refactoring, and I guess you don't have a Windows
environment at hand, attached is a patch that fixes the build with and
without GSSAPI for Visual Studio; it can be applied on top of your
0001. Feel free to rebase using it.

Note for others: I also had to patch the MSVC scripts, because with the
newest installations of Kerberos:
- inc/ does not exist anymore; include/ does.
- The current VS scripts completely ignore x64 builds; the libraries
linked to are unconditionally the x86 ones.
I am not sure whether there are *any* people other than me building the
code with Kerberos on Windows, but as things stand those scripts are
rather broken in my view.

Now looking at 0002 and 0003...
-- 
Michael
diff --git a/src/backend/libpq/be-gssapi-common.c b/src/backend/libpq/be-gssapi-common.c
index eab68a5..dc27fa8 100644
--- a/src/backend/libpq/be-gssapi-common.c
+++ b/src/backend/libpq/be-gssapi-common.c
@@ -13,10 +13,10 @@
  *-
  */
 
-#include "libpq/be-gssapi-common.h"
-
 #include "postgres.h"
 
+#include "libpq/be-gssapi-common.h"
+
 #if defined(WIN32) && !defined(WIN32_ONLY_COMPILER)
 /*
  * MIT Kerberos GSSAPI DLL doesn't properly export the symbols for MingW
diff --git a/src/tools/msvc/Mkvcbuild.pm b/src/tools/msvc/Mkvcbuild.pm
index dbc09b8..923e339 100644
--- a/src/tools/msvc/Mkvcbuild.pm
+++ b/src/tools/msvc/Mkvcbuild.pm
@@ -163,12 +163,17 @@ sub mkvcbuild
 	$postgres->AddLibrary('wldap32.lib') if ($solution->{options}->{ldap});
 	$postgres->FullExportDLL('postgres.lib');
 
-   # The OBJS scraper doesn't know about ifdefs, so remove be-secure-openssl.c
-   # if building without OpenSSL
+	# The OBJS scraper doesn't know about ifdefs, so remove be-secure-openssl.c
+	# if building without OpenSSL, and be-gssapi-common.c when building with
+	# GSSAPI.
 	if (!$solution->{options}->{openssl})
 	{
 		$postgres->RemoveFile('src/backend/libpq/be-secure-openssl.c');
 	}
+	if (!$solution->{options}->{gss})
+	{
+		$postgres->RemoveFile('src/backend/libpq/be-gssapi-common.c');
+	}
 
 	my $snowball = $solution->AddProject('dict_snowball', 'dll', '',
 		'src/backend/snowball');
@@ -218,12 +223,17 @@ sub mkvcbuild
 		'src/interfaces/libpq/libpq.rc');
 	$libpq->AddReference($libpgport);
 
-   # The OBJS scraper doesn't know about ifdefs, so remove fe-secure-openssl.c
-   # if building without OpenSSL
+	# The OBJS scraper doesn't know about ifdefs, so remove fe-secure-openssl.c
+	# if building without OpenSSL, and fe-gssapi-common.c when building with
+	# GSSAPI
 	if (!$solution->{options}->{openssl})
 	{
 		$libpq->RemoveFile('src/interfaces/libpq/fe-secure-openssl.c');
 	}
+	if (!$solution->{options}->{gss})
+	{
+		$libpq->RemoveFile('src/interfaces/libpq/fe-gssapi-common.c');
+	}
 
 	my $libpqwalreceiver =
 	  $solution->AddProject('libpqwalreceiver', 'dll', '',



Re: [HACKERS] Performance degradation in commit 6150a1b0

2016-03-30 Thread Noah Misch
On Sun, Mar 27, 2016 at 02:15:50PM +0200, Andres Freund wrote:
> On 2016-03-27 02:34:32 +0530, Ashutosh Sharma wrote:
> > As mentioned in my earlier mail i was not able to apply
> > *pinunpin-cas-5.patch* on commit *6150a1b0,
> 
> That's not surprising; that's pretty old.
> 
> > *therefore i thought of applying it on the latest commit and i was
> > able to do it successfully. I have now taken the performance readings
> > at latest commit i.e. *76281aa9* with and without applying
> > *pinunpin-cas-5.patch* and my observations are as follows,
> >
> 
> > 1. I can still see that the current performance lags by 2-3% from the
> > expected performance when *pinunpin-cas-5.patch *is applied on the commit
> > 
> > *76281aa9.*
> > 2. When *pinunpin-cas-5.patch *is ignored and performance is measured at
> > commit *76281aa9 *the overall performance lags by 50-60% from the expected
> > performance.
> > 
> > *Note:* Here, the expected performance is the performance observed before
> > commit *6150a1b0 *when* ac1d794 *is reverted.
> 
> Thanks for doing these benchmarks. What's the performance if you revert
> 6150a1b0 on top of a recent master? There've been a lot of other patches
> influencing performance since 6150a1b0, so minor performance differences
> aren't necessarily meaningful; especially when that older version then
> had other patches reverted.

[This is a generic notification.]

The above-described topic is currently a PostgreSQL 9.6 open item.  Andres,
since you committed the patch believed to have created it, you own this open
item.  If that responsibility lies elsewhere, please let us know whose
responsibility it is to fix this.  Since new open items may be discovered at
any time and I want to plan to have them all fixed well in advance of the ship
date, I will appreciate your efforts toward speedy resolution.  Please
present, within 72 hours, a plan to fix the defect within seven days of this
message.  Thanks.




Re: [HACKERS] Recovery test failure for recovery_min_apply_delay on hamster

2016-03-30 Thread Noah Misch
On Sun, Mar 13, 2016 at 10:18:46PM +0100, Michael Paquier wrote:
> On Wed, Mar 9, 2016 at 9:05 PM, Michael Paquier
>  wrote:
> > On Wed, Mar 9, 2016 at 12:29 PM, Alvaro Herrera
> >  wrote:
> >> Michael Paquier wrote:
> >>> After sleeping (best debugger ever) on that, actually a way popped up
> >>> in my mind, and I propose the attached, which refactors a bit 005 and
> >>> checks that the LSN position of master has been applied on standby
> >>> after at least the delay wanted. A maximum delay of 90s is authorized,
> >>> like poll_query_until.
> >>
> >> Hmm, okay, that's great.  A question: what happens if the test itself is
> >> slow and the servers are fast, and the test doesn't manage to run two
> >> iterations before the two seconds have elapsed?  This may happen on
> >> overloaded or slow servers, if you're unlucky.
> >
> > Yes, a failure would happen. The same thought occurred to me during a
> > long flight. And this is why the previous patch was full of meh.
> >
> >> I don't have any ideas on ensuring that we don't apply earlier than the
> >> given period at the moment.
> >
> > Attached is one, which is based on timestamp values queried from the
> > standby server. We could use as well perl's localtime call to
> > calculate the time delay.
> 
> Actually, the attached is better. This one relies on time() to perform
> the delay checks, and takes care of things even for slow machines.

[This is a generic notification.]

The above-described topic is currently a PostgreSQL 9.6 open item.  Alvaro,
since you committed the patch believed to have created it, you own this open
item.  If that responsibility lies elsewhere, please let us know whose
responsibility it is to fix this.  Since new open items may be discovered at
any time and I want to plan to have them all fixed well in advance of the ship
date, I will appreciate your efforts toward speedy resolution.  Please
present, within 72 hours, a plan to fix the defect within seven days of this
message.  Thanks.




Re: [HACKERS] Optimization for updating foreign tables in Postgres FDW

2016-03-30 Thread Noah Misch
On Thu, Mar 24, 2016 at 01:02:57PM +0900, Etsuro Fujita wrote:
> On 2016/03/24 11:14, Michael Paquier wrote:
> >On Wed, Mar 23, 2016 at 10:05 PM, Thom Brown  wrote:
> >>I've noticed that you now can't cancel a query if there's DML pushdown
> >>to a foreign server.  This previously worked while it was sending
> >>individual statements as it interrupted and rolled it back.
> >>
> >>Here's what the local server sees when trying to cancel:
> >>
> >># DELETE FROM remote.contacts;
> >>^CCancel request sent
> >>DELETE 500
> >>
> >>This should probably be fixed.
> 
> >Looking at what has been committed, execute_dml_stmt is using
> >PQexecParams, so we'd want to use an asynchronous call and loop on
> >PQgetResult with CHECK_FOR_INTERRUPTS() in it.
> 
> Will fix.
> 
> Thanks for the report, Thom!  Thanks for the advice, Michael!
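
A rough sketch of the interrupt-aware loop being suggested above for
execute_dml_stmt; this is illustrative only, not the committed fix.  The
argument names mirror the existing PQexecParams() call, and a production
version would sleep on the connection's socket (e.g. with WaitLatchOrSocket)
rather than spinning:

    /* Send the statement without blocking, instead of calling PQexecParams(). */
    if (!PQsendQueryParams(conn, query, numParams, NULL, values, NULL, NULL, 0))
        pgfdw_report_error(ERROR, NULL, conn, false, query);

    /* Wait for the result while honoring query cancel requests. */
    for (;;)
    {
        CHECK_FOR_INTERRUPTS();         /* react to a pending cancel/terminate */

        if (!PQconsumeInput(conn))      /* pull in any data the server has sent */
            pgfdw_report_error(ERROR, NULL, conn, false, query);

        if (!PQisBusy(conn))            /* a complete result is now available */
            break;
    }
    res = PQgetResult(conn);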

[This is a generic notification.]

The above-described topic is currently a PostgreSQL 9.6 open item.  Robert,
since you committed the patch believed to have created it, you own this open
item.  If that responsibility lies elsewhere, please let us know whose
responsibility it is to fix this.  Since new open items may be discovered at
any time and I want to plan to have them all fixed well in advance of the ship
date, I will appreciate your efforts toward speedy resolution.  Please
present, within 72 hours, a plan to fix the defect within seven days of this
message.  Thanks.




Re: [HACKERS] Suspicious behaviour on applying XLOG_HEAP2_VISIBLE.

2016-03-30 Thread Noah Misch
On Thu, Mar 10, 2016 at 01:04:11AM +0900, Masahiko Sawada wrote:
> As a result of looking into the code around recovery, ISTM that the
> cause is related to relation cache clearing.
> In heap_xlog_visible, if the standby server receives a WAL record then
> the relation cache is eventually cleared in vm_extend, but if the standby
> server receives an FPI then the relation cache would not be cleared.
> For example, after I applied the attached patch to HEAD (it might not be
> the right way, but) this problem seems to be resolved.
> 
> Is this a bug? or not?

It's a bug.  I don't expect it causes queries to return wrong answers, because
visibilitymap.c says "it's always safe to clear a bit in the map from
correctness point of view."  (The bug makes a visibility map bit temporarily
appear to have been cleared.)  I still call it a bug, because recovery
behavior becomes too difficult to verify when xlog replay produces conditions
that don't happen outside of recovery.  Even if there's no way to get a wrong
query answer today, this would be too easy to break later.  I wonder if we
make the same omission in other xlog replay functions.  Similar omissions may
cause wrong query answers, even if this particular one does not.

Would you like to bisect for the commit, or at least the major release, at
which the bug first appeared?

I wonder if your discovery has any relationship to this recently-reported case
of insufficient smgr invalidation:
http://www.postgresql.org/message-id/flat/CAB7nPqSBFmh5cQjpRbFBp9Rkv1nF=nh2o1fxkkj6yvobtvy...@mail.gmail.com

Thanks,
nm




Re: [HACKERS] Relation extension scalability

2016-03-30 Thread Dilip Kumar
On Tue, Mar 29, 2016 at 10:08 AM, Amit Kapila 
wrote:

> Yes, that makes sense.  One more point is that if the reason for v13
> giving better performance is extra blocks (which we believe in certain
> cases can leak till the time Vacuum updates the FSM tree), do you think it
> makes sense to once test by increasing lockWaiters * 20 limit to may
> be lockWaiters * 25 or lockWaiters * 30.


I tested COPY 1 record, by increasing the number of blocks, just to find
out why we are not as good as v13 with extraBlocks = Min(lockWaiters * 40,
2048), and got the results below.

COPY 1

Client   Patch (extraBlocks = Min(lockWaiters * 40, 2048))
------   --------------------------------------------------
16       752
32       708

This proves that the main reason v13 is better is that it adds extra blocks
without control. Though v13 is still better than these results, I think we
can get there as well by changing the multiplier and the max limit.

But I think we are OK with the max size of 4MB (512 blocks), right?

Does this test make sense?
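
To make the numbers concrete, the knob under discussion is essentially the
following computation in the relation-extension path; the variable names are
taken from the thread, and the multiplier/cap values shown are the ones being
debated rather than a final choice:

    /*
     * Extend the relation by a multiple of the number of backends waiting
     * on the extension lock, capped at a maximum.  v13 effectively had no
     * cap; the test above used Min(lockWaiters * 40, 2048); the 4MB ceiling
     * mentioned here corresponds to a cap of 512 blocks.
     */
    extraBlocks = Min(lockWaiters * 20, 512);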

Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] So, can we stop supporting Windows native now?

2016-03-30 Thread Tom Lane
Craig Ringer  writes:
> On 31 March 2016 at 07:49, Josh berkus  wrote:
>> So, can we stop supporting Windows native now?

> Why would we want to?

> The cost is small.

Surely you jest.  Windows is the single biggest PITA platform from a
portability perspective, and has been in every release cycle since we
first had a Windows port, and there is no close second.

If you have any doubts about this, please consult the commit logs over
the past week or three for a particularly painful sequence.

> ... is a burden carried mainly by those who
> care about Windows support.

Really?  Good.  I just committed my very last Windows-related fix, then.
Somebody else can deal with it.

Having thus vented, I do not seriously believe that we can drop
Windows-native.  But please don't give me any jive about how it's
a small cost to support that platform.

regards, tom lane

PS: if you meant that as a 1-April troll, you're a day early.




Re: [HACKERS] pg_xlogdump fails to handle WAL file with multi-page XLP_FIRST_IS_CONTRECORD data

2016-03-30 Thread Pavan Deolasee
On Thu, Mar 31, 2016 at 6:27 AM, Craig Ringer  wrote:
>
>
> Can you describe the process used to generate the sample WAL segment?
>
>
Shame that I can't find the sql file used to create the problematic WAL
segment. But this is what I did.

I wrote a plpgsql function which inserts rows in a loop, every time checking
whether the remaining space in the WAL segment has fallen to less than a
couple of hundred bytes. Of course, I used pg_settings to get the WAL
segment size and WAL page size, and pg_current_xlog_insert_location() to
correctly compute the remaining bytes in the WAL segment. At that point, do
a non-HOT update, preferably to a table which is already fully vacuumed and
CHECKPOINTed, to avoid getting any other WAL records in between. Assuming
FPW is turned ON, the UPDATE should generate enough WAL to cross over the
first page in the new WAL segment.

Let me know if you would like me to put together a SQL script based on this
description.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-03-30 Thread Thomas Munro
On Thu, Mar 31, 2016 at 3:55 AM, Masahiko Sawada  wrote:
> On Wed, Mar 30, 2016 at 11:43 PM, Masahiko Sawada  
> wrote:
>> On Tue, Mar 29, 2016 at 5:36 PM, Kyotaro HORIGUCHI
>>  wrote:
>>> I personally don't think it needs such a survive measure. It is
>>> very small syntax and the parser reads very short text. If the
>>> parser failes in such mode, something more serious should have
>>> occurred.
>>>
>>> At Tue, 29 Mar 2016 16:51:02 +0900, Fujii Masao  
>>> wrote in 
>>> 
 On Tue, Mar 29, 2016 at 4:23 PM, Kyotaro HORIGUCHI
  wrote:
 > Hello,
 >
 > At Mon, 28 Mar 2016 18:38:22 +0900, Masahiko Sawada 
 >  wrote in 
 > 
 > sawada.mshk> On Mon, Mar 28, 2016 at 5:50 PM, Kyotaro HORIGUCHI
 >>  wrote:
 > As mentioned in my comment, SQL parser converts yy_fatal_error
 > into ereport(ERROR), which can be caught by the upper PG_TRY (by
 > #define'ing fprintf). So it is doable if you mind exit().

 I'm afraid that your idea doesn't work in postmaster. Because 
 ereport(ERROR) is
 implicitly promoted to ereport(FATAL) in postmaster. IOW, when an internal
 flex fatal error occurs, postmaster just exits instead of jumping out of 
 parser.
>>>
>>> The ERROR may be LOG or DEBUG2 as well, if we think the parser
>>> fatal errors are recoverable. guc-file.l is doing so.
>>>
 ISTM that, when an internal flex fatal error occurs, it's
 better to elog(FATAL) and terminate the problematic
 process. This might lead to the server crash (e.g., if
 postmaster emits a FATAL error, it and its all child processes
 will exit soon). But probably we can live with this because the
 fatal error basically rarely happens.
>>>
>>> I agree to this
>>>
 OTOH, if we make the process keep running even after it gets an internal
 fatal error (like Sawada's patch or your idea do), this might cause more
 serious problem. Please imagine the case where one walsender gets the fatal
 error (e.g., because of OOM), abandon new setting value of
 synchronous_standby_names, and keep running with the previous setting 
 value.
 OTOH, the other walsender processes successfully parse the setting and
 keep running with new setting. In this case, the inconsistency of the 
 setting
 which each walsender is based on happens. This completely will mess up the
 synchronous replication.
>>>
>>> On the other hand, guc-file.l seems ignoring parser errors under
>>> normal operation, even though it may cause similar inconsistency,
>>> if any..
>>>
>>> | LOG:  received SIGHUP, reloading configuration files
>>> | LOG:  input in flex scanner failed at file 
>>> "/home/horiguti/data/data_work/postgresql.conf" line 1
>>> | LOG:  configuration file "/home/horiguti/data/data_work/postgresql.conf" 
>>> contains errors; no changes were applied
>>>
 Therefore, I think that it's better to make the problematic process exit
 with FATAL error rather than ignore the error and keep it running.
>>>
>>> +1. Restarting walsender would be far less harmful than keeping
>>> it running in doubtful state.
>>>
>>> Should I wait for the next version or have a look at the latest?
>>>
>>
>> The attached latest patch incorporates some review comments so far, and is
>> rebased against current HEAD.
>>
>
> Sorry I attached wrong patch.
> Attached patch is correct patch.
>
> [mulit_sync_replication_v21.patch]

Here are some TPS numbers from some quick tests I ran on a set of
Amazon EC2 m3.large instances ("2 vCPU" virtual machines) configured
as primary + 3 standbys, to try out different combinations of
synchronous_commit levels and synchronous_standby_names numbers.  They
were run for a short time only and these are of course systems with
limited and perhaps uneven IO and CPU, but they still give some idea
of the trends.  And reassuringly, the trends are travelling in the
expected directions.

All default settings except shared_buffers = 1GB, and the GUCs
required for replication.

pgbench postgres -j2 -c2 -N bench2 -T 600

               1(*)   2(*)   3(*)
               ----   ----   ----
off          = 4056   4096   4092
local        = 1323   1299   1312
remote_write = 1130   1046    958
on           =  860    744    701
remote_apply =  785    725    604

pgbench postgres -j16 -c16 -N bench2 -T 600

               1(*)   2(*)   3(*)
               ----   ----   ----
off          = 3952   3943   3933
local        = 2964   2984   3026
remote_write = 2790   2724   2675
on           = 2731   2627   2523
remote_apply = 2627   2501   2432

One thing I noticed is that there are LOG messages telling me when a
standby becomes a synchronous standby, but nothing to tell me if a
standby stops being a synchronous standby.

Re: [HACKERS] Very small patch for decode.c

2016-03-30 Thread Alvaro Herrera
Konstantin Knizhnik wrote:
> diff --git a/src/backend/replication/logical/decode.c 
> b/src/backend/replication/logical/decode.c
> index 2380ea2..a992662 100644
> --- a/src/backend/replication/logical/decode.c
> +++ b/src/backend/replication/logical/decode.c
> @@ -488,7 +488,7 @@ DecodeCommit(LogicalDecodingContext *ctx, 
> XLogRecordBuffer *buf,
>  {
> XLogRecPtr  origin_lsn = InvalidXLogRecPtr;
> TimestampTz commit_time = parsed->xact_time;
> -   XLogRecPtr  origin_id = XLogRecGetOrigin(buf->record);
> +   RepOriginId origin_id = XLogRecGetOrigin(buf->record);
> int i;

Pushed, thanks.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Password identifiers, protocol aging and SCRAM protocol

2016-03-30 Thread Michael Paquier
On Thu, Mar 31, 2016 at 1:14 AM, Robert Haas  wrote:
> On Wed, Mar 30, 2016 at 9:46 AM, Michael Paquier
>  wrote:
>>> Things I noticed:
>>> 1.
>>> when using either
>>> CREATE ROLE
>>> ALTER ROLE
>>> with the parameter
>>> ENCRYPTED
>>> md5 encryption is always assumed (I've come to realize that UNENCRYPTED
>>> always equals plain and, in the past, ENCRYPTED equaled md5 since there were
>>> no other options)
>>
>> Yes, that's to match the current behavior, and make something fully
>> backward-compatible. Switching to md5 + scram may have made sense as
>> well though.
>
> I think we're not going to have much luck getting people to switch
> over to SCRAM if the default remains MD5. Perhaps there should be a
> GUC for this - and we can initially set that GUC to md5, allowing
> people who are ready to adopt SCRAM to change it.  And then in a later
> release we can change the default, once we're pretty confident that
> most connectors have added support for the new authentication method.
> This is going to take a long time to roll out.
> Alternatively, we could control it strictly through DDL.

This overlaps quite a lot with the existing password_encryption, so adding
a GUC that controls the protocol format only for ENCRYPTED (say,
password_encryption_encrypted) is disturbing. I'd rather keep ENCRYPTED
defaulting to md5 when password_encryption is 'on', switch to scram a
couple of releases later, and extend the DDL grammar with something like
PROTOCOL {'md5' | 'plain' | 'scram'}, which could be used as an additional
keyword instead of UNENCRYPTED | ENCRYPTED. That would be a smooth
transition to a more extensive system.

> Note that the existing behavior is pretty wonky:
> alter user rhaas unencrypted password 'foo'; -> rolpassword foo
> alter user rhaas encrypted password 'foo'; -> rolpassword
> md5e748797a605a1c95f3d6b5f140b2d528
> alter user rhaas encrypted password
> 'md5e748797a605a1c95f3d6b5f140b2d528'; -> rolpassword
> md5e748797a605a1c95f3d6b5f140b2d528
> alter user rhaas unencrypted password
> 'md5e748797a605a1c95f3d6b5f140b2d528'; -> rolpassword
> md5e748797a605a1c95f3d6b5f140b2d528

I actually wrote some regression tests for that. Those are upthread as
part of 0001, have for example a look at password.sql.

> So basically the use of the ENCRYPTED keyword means "if it does not
> already seem to be the sort of MD5 blob we're expecting, turn it into
> that".  And we just rely on the format to distinguish between an MD5
> verifier and an unencrypted password.  Personally, I think a good
> start here, and I think you may have something like this in the patch
> already, would be to split rolpassword into two columns, say
> rolencryption and rolpassword.  rolencryption says how the password
> verifier is encrypted and rolpassword contains the verifier itself.

The patch has something like that. And doing this split is not that
complicated to be honest. Surely that would be clearer than relying on
the prefix of the identifier to see if it is md5 or not.

> Initially, rolencryption will be 'plain' or 'md5', but later we can
> add 'scram' as another choice, or maybe it'll be more specific like
> 'scram-hmac-doodad'.  And then maybe introduce syntax like this:
>
> alter user rhaas set password 'raw-unencrypted-passwordt' using
> 'verifier-method';
> alter user rhaas set password verifier 'verifier-goes-here' using
> 'verifier-method';
>
> That might require making verifier a key word, which would be good to
> avoid.  Perhaps we could use "password validator" instead?

Yes, that matches what I wrote above. At this point, putting that back
on the board and discussing it openly at PGCon is the best course of
action IMO.
-- 
Michael




Re: [HACKERS] snapshot too old, configured by time

2016-03-30 Thread Alvaro Herrera
Michael Paquier wrote:

> Just a note: I began by looking at the tests, but out of curiosity ended
> up looking at the entire patch. Regarding the integration of this patch
> for 9.6, I think that bumping it to 9.7 would be wiser, because the patch
> needs to be largely rewritten, and that's never a good sign at this point
> of the development cycle.

Not rewritten, surely?  It will need a very large mechanical change to
existing BufferGetPage calls, but to me that doesn't equate to "rewriting"
it.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Updated backup APIs for non-exclusive backups

2016-03-30 Thread David Steele
On 3/30/16 4:18 AM, Magnus Hagander wrote:
> 
> On Wed, Mar 30, 2016 at 4:10 AM, David Steele  > wrote:
> 
> This certainly looks like it would work but it raises the barrier for
> implementing backups by quite a lot.  It's fine for backrest or barman
> but it won't be pleasant for anyone who has home-grown scripts.
> 
> 
> How much does it really raise the bar, though?
> 
> It would go from "copy all files and make damn sure you copy pg_control
> last, and rename it to pg_control.backup" to "take this binary blob you
> got from the server and write it to pg_control.backup"?
> 
> Also, the target of these APIs is specifically the backup tools and not
> homewritten scripts.

Then what would home-grown scripts use, the deprecated API that we know
has issues?

> A simple shellscript will have trouble enough using
> it in the first place since it requires a persistent connection to the
> database. 

All that's required is to spawn a psql process.  I'm no shell expert but
that's simple enough.

> But those scripts are likely broken anyway.

Yes, probably.  Backup and especially archiving correctly are harder
than most people realize.

> <...>
> 
> The main reason for Heikki to suggest this one over the other basic one
> is that it brings protection against the "backup script/program crashed
> halfway through but the user still tried to restore from that". They will
> outright fail because there is no pg_control.backup in that case.

But if we are going to make this complicated I'm not sure it's a big
deal just to require pg_control to be copied last.  pgBackRest already
does that and Barman probably does, too.

I don't see "don't copy pg_control" and "copy pg_control last" as all
that different in terms of complexity.

pgBackRest also *restores* pg_control last which I think is pretty
important and would not be addressed by this solution.

> If we
> don't care about that, then we can go back to just saying "copy
> pg_control last and we're done". But you yourself complained about that
> requirement because it's too easy to get wrong (though you advocated
> using backup_label to transfer the data over -- but that has the
> potential for getting more complicated if we now or at any point in the
> future want more than one field to transfer, for example).

Perhaps.  I'm still not convinced that getting some information from
backup_label and other information from pg_control is a good solution.
I would rather write the recovery point into the backup_label and use
that instead of the value in pg_control.  Text files are much easier to
parse and push around accurately (and test).
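
For illustration, the parsing involved really is trivial; below is a hedged
sketch loosely following xlog.c's read_backup_label() (lfp as an
already-opened FILE * and the exact label line are assumptions of this
sketch; a new "recovery point" line would parse the same way):

    uint32      hi,
                lo;
    XLogRecPtr  startpoint;

    /* Pull the start LSN out of the text-format backup_label file. */
    if (fscanf(lfp, "START WAL LOCATION: %X/%X", &hi, &lo) != 2)
        ereport(ERROR,
                (errmsg("invalid data in file \"%s\"", BACKUP_LABEL_FILE)));
    startpoint = ((uint64) hi) << 32 | lo;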

-- 
-David
da...@pgmasters.net




Re: [HACKERS] Correction for replication slot creation error message in 9.6

2016-03-30 Thread Michael Paquier
On Thu, Mar 31, 2016 at 10:15 AM, Ian Barwick  wrote:
> Currently pg_create_physical_replication_slot() may refer to
> the deprecated wal_level setting "archive":

(Adding Peter in CC who committed this patch).

> Patch changes the error message to:
>
>   ERROR:  replication slots can only be used if wal_level is "replica" or 
> "logical"

Sounds right to me.

> Explicitly naming the valid WAL levels matches the wording in the wal_level
> error hint used in a couple of places, i.e.
>
>   "wal_level must be set to "replica" or "logical" at server start."

It is worth mentioning that Peter and I both had this code in front of
our eyes during the review :) Still we missed it.
-- 
Michael




Re: [HACKERS] Please correct/improve wiki page about abbreviated keys bug

2016-03-30 Thread Peter Geoghegan
Okay. I'll look at it later today.

--
Peter Geoghegan


Re: [HACKERS] Please correct/improve wiki page about abbreviated keys bug

2016-03-30 Thread Josh berkus
On 03/30/2016 02:47 PM, Josh berkus wrote:
> On 03/29/2016 07:43 PM, Peter Geoghegan wrote:
>> Do you think it would be okay if the SQL query to detect potentially
>> affected indexes only considered the leading attribute? Since that's
>> the only attribute that could use abbreviated keys, it ought to be
>> safe to not require users to REINDEX indexes that happen to have a
>> second-or-subsequent text/varchar(n) attribute that doesn't use the C
>> locale. Maybe it's not worth worrying about.
> 
> I think that's a great idea.

Based on that concept, I wrote a query which is now on the wiki page.
Please fix it if it's not showing what we want it to show.

-- 
--
Josh Berkus
Red Hat OSAS
(any opinions are my own)




[HACKERS] Correction for replication slot creation error message in 9.6

2016-03-30 Thread Ian Barwick
Hi

Currently pg_create_physical_replication_slot() may refer to
the deprecated wal_level setting "archive":

  postgres=# SHOW wal_level ;
   wal_level
  -----------
   minimal
  (1 row)

  postgres=# SELECT pg_create_physical_replication_slot('some_slot');
  ERROR:  replication slots can only be used if wal_level >= archive

Patch changes the error message to:

  ERROR:  replication slots can only be used if wal_level is "replica" or 
"logical"

Explicitly naming the valid WAL levels matches the wording in the wal_level
error hint used in a couple of places, i.e.

  "wal_level must be set to "replica" or "logical" at server start."


Regards

Ian Barwick

-- 
 Ian Barwick   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
new file mode 100644
index c13be75..82f6e65
*** a/src/backend/replication/slot.c
--- b/src/backend/replication/slot.c
*** CheckSlotRequirements(void)
*** 763,769 ****
  	if (wal_level < WAL_LEVEL_REPLICA)
  		ereport(ERROR,
  (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
!  errmsg("replication slots can only be used if wal_level >= archive")));
  }
  
  /*
--- 763,769 ----
  	if (wal_level < WAL_LEVEL_REPLICA)
  		ereport(ERROR,
  (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
!  errmsg("replication slots can only be used if wal_level is \"replica\" or \"logical\"")));
  }
  
  /*



Re: [HACKERS] pg_xlogdump fails to handle WAL file with multi-page XLP_FIRST_IS_CONTRECORD data

2016-03-30 Thread Craig Ringer
On 23 March 2016 at 18:04, Pavan Deolasee  wrote:

>
>
> On Wed, Mar 23, 2016 at 1:13 PM, Michael Paquier <
> michael.paqu...@gmail.com> wrote:
>
>>
>> +   /*
>> +* Compute targetRecOff. It should typically be greater than short
>> +* page-header since a valid record can't , but can also be zero
>> when
>> +* caller has supplied a page-aligned address or when we are
>> skipping
>> +* multi-page continuation record. It doesn't matter though
>> because
>> +* ReadPageInternal() will read at least short page-header worth
>> of
>> +* data
>> +*/
>> This needs some polishing, there is an unfinished sentence here.
>>
>> +   targetRecOff = tmpRecPtr % XLOG_BLCKSZ;
>> targetRecOff, pageHeaderSize and targetPagePtr could be declared
>> inside directly the new while loop.
>>
>
> Thanks Michael for reviewing the patch. I've fixed these issues and new
> version is attached.
>

Looks sensible to me based on a reading of "git diff -w" of the applied
patch. It passes make check and make -C src/test/recovery check. Marked
ready for committer; I'd like to add a TAP test for it, but it's ready to
go without that.



-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] So, can we stop supporting Windows native now?

2016-03-30 Thread Craig Ringer
On 31 March 2016 at 07:49, Josh berkus  wrote:

> So, can we stop supporting Windows native now?

Why would we want to?

The cost is small. People use it. Things like integrated SSPI
authentication only work on native.

About the only issue I think it causes is with the build system. Which
isn't beautiful, to be sure, but is a burden carried mainly by those who
care about Windows support. If we eventually get a CMake build system
conversion that'll mostly go away too.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] pg_xlogdump fails to handle WAL file with multi-page XLP_FIRST_IS_CONTRECORD data

2016-03-30 Thread Craig Ringer
On 23 March 2016 at 18:04, Pavan Deolasee  wrote:

>
>
> On Wed, Mar 23, 2016 at 1:13 PM, Michael Paquier <
> michael.paqu...@gmail.com> wrote:
>
>>
>> +   /*
>> +* Compute targetRecOff. It should typically be greater than short
>> +* page-header since a valid record can't , but can also be zero
>> when
>> +* caller has supplied a page-aligned address or when we are
>> skipping
>> +* multi-page continuation record. It doesn't matter though
>> because
>> +* ReadPageInternal() will read at least short page-header worth
>> of
>> +* data
>> +*/
>> This needs some polishing, there is an unfinished sentence here.
>>
>> +   targetRecOff = tmpRecPtr % XLOG_BLCKSZ;
>> targetRecOff, pageHeaderSize and targetPagePtr could be declared
>> inside directly the new while loop.
>>
>
> Thanks Michael for reviewing the patch. I've fixed these issues and new
> version is attached.
>
>
Can you describe the process used to generate the sample WAL segment?

I'd like to turn it into a TAP test to go along with the patch.



-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Very small patch for decode.c

2016-03-30 Thread Craig Ringer
On 31 March 2016 at 03:34, Konstantin Knizhnik 
wrote:

> diff --git a/src/backend/replication/logical/decode.c
> b/src/backend/replication/logical/decode.c
> index 2380ea2..a992662 100644
> --- a/src/backend/replication/logical/decode.c
> +++ b/src/backend/replication/logical/decode.c
> @@ -488,7 +488,7 @@ DecodeCommit(LogicalDecodingContext *ctx,
> XLogRecordBuffer *buf,
>  {
> XLogRecPtr  origin_lsn = InvalidXLogRecPtr;
> TimestampTz commit_time = parsed->xact_time;
> -   XLogRecPtr  origin_id = XLogRecGetOrigin(buf->record);
> +   RepOriginId origin_id = XLogRecGetOrigin(buf->record);
> int i;
>

Even though it's trivial, created as
https://commitfest.postgresql.org/10/594/ so we don't lose it.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Timeline following for logical slots

2016-03-30 Thread Craig Ringer
On 31 March 2016 at 07:15, Alvaro Herrera  wrote:


> > Available attached or at
> >
> https://github.com/2ndQuadrant/postgres/tree/dev/logical-decoding-timeline-following
>
> And pushed this too.
>

Much appreciated. Marked as committed at
https://commitfest.postgresql.org/9/568/ .

This gives us an option for failover of logical replication in 9.6, even if
it's a bit cumbersome and complex for the client, in case failover slots
don't make the cut. And, of course, it's a pre-req for failover slots,
which I'll rebase on top of it shortly.

Andres, I tried to address your comments as best I could. The main one that
I think stayed open was about the loop that finds the last timeline on a
segment. If you think that's better done by directly scanning the List* of
timeline history entries I'm happy to prep a follow-up.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] pg_xlogdump fails to handle WAL file with multi-page XLP_FIRST_IS_CONTRECORD data

2016-03-30 Thread Michael Paquier
On Mon, Mar 28, 2016 at 10:15 PM, Andres Freund  wrote:
> It's definitely too late for that; they're going to be wrapped in a
> couple hours.

I have added this patch to the next CF so as we do not lose track of this bug:
https://commitfest.postgresql.org/10/593/
-- 
Michael




Re: [HACKERS] Very small patch for decode.c

2016-03-30 Thread Michael Paquier
On Thu, Mar 31, 2016 at 4:34 AM, Konstantin Knizhnik
 wrote:
> diff --git a/src/backend/replication/logical/decode.c
> b/src/backend/replication/logical/decode.c
> index 2380ea2..a992662 100644
> --- a/src/backend/replication/logical/decode.c
> +++ b/src/backend/replication/logical/decode.c
> @@ -488,7 +488,7 @@ DecodeCommit(LogicalDecodingContext *ctx,
> XLogRecordBuffer *buf,
>  {
> XLogRecPtr  origin_lsn = InvalidXLogRecPtr;
> TimestampTz commit_time = parsed->xact_time;
> -   XLogRecPtr  origin_id = XLogRecGetOrigin(buf->record);
> +   RepOriginId origin_id = XLogRecGetOrigin(buf->record);
> int i;

Good catch!
-- 
Michael




Re: [HACKERS] Publish autovacuum informations

2016-03-30 Thread Michael Paquier
On Thu, Mar 31, 2016 at 6:09 AM, Julien Rouhaud
 wrote:
> On 19/03/2016 01:11, Jim Nasby wrote:
>> On 3/3/16 3:54 AM, Kyotaro HORIGUCHI wrote:
>>> I wonder why there haven't been discussions so far on what kind
>>> of information we want by this feature. For example I'd be happy
>>> to see the time of last autovacuum trial and the cause if it has
>>> been skipped for every table. Such information would (maybe)
>>> naturally be shown in pg_stat_*_tables.
>>>
>>> =
>>> =# select relid, last_completed_autovacuum,
>>> last_completed_autovac_status, last_autovacuum_trial,
>>> last_autovacuum_result from pg_stat_user_tables;
>>> -[ RECORD 1 ]-+--
>>> relid | 16390
>>> last_completed_autovacuum | 2016-03-01 01:25:00.349074+09
>>> last_completed_autovac_status | Completed in 4 seconds. Scanned 434
>>> pages, skipped 23 pages
>>> last_autovacuum_trial | 2016-03-03 17:33:04.004322+09
>>> last_autovac_trial_status | Canceled by PID 2355. Processed
>>> 144/553 pages.
>>> -[ RECORD 2 ]--+--
>>> ...
>>> last_autovacuum_trial | 2016-03-03 07:25:00.349074+09
>>> last_autovac_trial_status | Completed in 4 seconds. Scanned 434
>>> pages, skipped 23 pages
>>> -[ RECORD 3 ]--+--
>>> ...
>>> last_autovacuum_trial | 2016-03-03 17:59:12.324454+09
>>> last_autovac_trial_status | Processing by PID 42334, 564 / 32526
>>> pages done.
>>> -[ RECORD 4 ]--+--
>>> ...
>>> last_autovacuum_trial | 2016-03-03 17:59:12.324454+09
>>> last_autovac_trial_status | Skipped by dead-tuple threshold.
>>> =
>>
>> I kinda like where you're going here, but I certainly don't think the
>> stats system is the way to do it. Stats bloat is already a problem on
>> bigger systems. More important, I don't think having just the last
>> result is very useful. If you've got a vacuum problem, you want to see
>> history, especially history of the vacuum runs themselves.
>>
>> The good news is that vacuum is a very low-frequency operation, so it
>> has none of the concerns that the generic stats system does. I think it
>> would be reasonable to provide event triggers that fire on every
>> launcher loop, after a worker has built its "TODO list", and after
>> every (auto)vacuum.
>
> The main issue I see with an event trigger based solution is that you'll
> always have to create them and the needed objects on every database.

Which surely has a performance impact, as those are row-based. I have
seen complaints regarding the fact that those objects can easily be
forgotten...

> Another issue is that both of these approaches are not intended to give a
> global overview but per-database statistics. I'd prefer a global overview.

That's important; autovacuum GUC parameters, like the number of
workers, are system-wide.
-- 
Michael




Re: [HACKERS] So, can we stop supporting Windows native now?

2016-03-30 Thread Michael Paquier
On Thu, Mar 31, 2016 at 8:49 AM, Josh berkus  wrote:
> http://www.zdnet.com/article/microsoft-and-canonical-partner-to-bring-ubuntu-to-windows-10/
>
> ... could be good news for us ...

Possible. We are years ahead of that for sure. Also, the outcome of the
partnership, as well as the performance impact there could be if we rely
on an extra application layer (the layer in charge of translating
POSIX-compliant calls to the Windows kernel), may be costly and needs to
be evaluated carefully.

1st of April is close by btw. That's not the best timing for credibility.
-- 
Michael




Re: [HACKERS] snapshot too old, configured by time

2016-03-30 Thread Michael Paquier
On Thu, Mar 31, 2016 at 5:09 AM, Kevin Grittner  wrote:
> On Wed, Mar 30, 2016 at 2:22 PM, Alvaro Herrera
>  wrote:
>> I understand the invasiveness argument, but to me the danger of
>> introducing new bugs trumps that.  The problem is not the current code,
>> but future patches: it is just too easy to make the mistake of not
>> checking the snapshot in new additions of BufferGetPage.  And you said
>> that the result of missing a check is silent wrong results from queries
>> that should instead be cancelled, which seems fairly bad to me.
>
> Fair point.

That's my main concern after going through the patch, and the patch as
written does not do much to help future users. This could easily be
forgotten by committers as well.

>> I said that we should change BufferGetPage into having the snapshot
>> check built-in, except in the cases where a flag is passed; and the flag
>> would be passed in all cases except those 30-something you identified.
>> In other words, the behavior in all the current callsites would be
>> identical to what's there today; we could have a macro do the first
>> check so that we don't introduce the overhead of a function call in the
>> 450 cases where it's not needed.
>
> In many of the places that BufferGetPage is called there is not a
> snapshot available.  I assume that you would be OK with an Assert
> that the flag was passed if the snapshot is NULL?  I had been
> picturing what you were requesting as just adding a snapshot
> parameter and assuming that NULL meant not to check; adding two
> parameters where the flag explicitly calls that the check is not
> needed might do more to prevent accidents, but I do wonder how much
> it would help during copy/paste frenzy.  Touching all spots to use
> the new function signature would be a mechanical job with the
> compiler catching any errors, so it doesn't seem crazy to refactor
> that now, but I would like to hear what some others think about
> this.

That's better than the existing patch, for sure. When calling
BufferGetPage() one could be tempted to forget to set the snapshot to
NULL, though. It should be clearly documented in the header of
BufferGetPage() where and for what purpose a snapshot should be set,
and in which code paths it is expected to be used. In our case, that's
mainly when a page is fetched from shared buffers and is then used for
reading tuples.
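
As a strawman for the interface being discussed (purely illustrative; every
name below is an assumption rather than the final API, and
TestForOldSnapshot() stands in for whatever check the patch performs):

    typedef enum
    {
        BGP_NO_SNAPSHOT_TEST,       /* caller promises no "snapshot too old" test is needed */
        BGP_TEST_FOR_OLD_SNAPSHOT   /* check the snapshot against the age threshold */
    } BufferGetPageAgeTest;

    static inline Page
    BufferGetPageExtended(Buffer buffer, Snapshot snapshot,
                          Relation relation, BufferGetPageAgeTest agetest)
    {
        Page        page = (Page) BufferGetBlock(buffer);

        /* Forgetting the snapshot should trip an Assert, not silently skip the test. */
        Assert(agetest == BGP_NO_SNAPSHOT_TEST || snapshot != NULL);

        if (agetest == BGP_TEST_FOR_OLD_SNAPSHOT)
            TestForOldSnapshot(snapshot, relation, page);

        return page;
    }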

Just a note: I began by looking at the tests, but out of curiosity ended
up looking at the entire patch. Regarding the integration of this patch
for 9.6, I think that bumping it to 9.7 would be wiser, because the patch
needs to be largely rewritten, and that's never a good sign at this point
of the development cycle.
-- 
Michael




[HACKERS] 9.6 Open Item Ownership

2016-03-30 Thread Noah Misch
The release management team has determined the following:

  From time to time, individual members of the release management team (RMT)
  may attribute a PostgreSQL 9.6 open item to a particular git commit and
  determine whether or not it is a beta1 blocker.  The RMT member will send a
  notification email, which will request a plan for resolving the issue, To:
  the associated committer and Cc: pgsql-hackers@.  For beta1 blockers, the
  notification will specify delivery of a plan within 48 calendar hours and
  resolution within 120 calendar hours.  For other issues, the notification
  will specify delivery of a plan within 72 calendar hours and resolution
  within 168 calendar hours.




[HACKERS] FATAL: could not send end-of-streaming message to primary: no COPY in progress

2016-03-30 Thread Thomas Munro
Hi hackers,

If you shut down a primary server, a standby that is streaming from it says:

LOG:  replication terminated by primary server
DETAIL:  End of WAL reached on timeline 1 at 0/14F4B68.
FATAL:  could not send end-of-streaming message to primary: no COPY in progress

Isn't that FATAL ereport a bug?

I haven't worked out the root cause, but the immediate problem seems to be
that libpqrcv_endstreaming calls PQputCopyEnd, which doesn't like the
state that the libpq connection is in, namely PGASYNC_BUSY.  That
state seems to have been established by the call to walrcv_receive
that returned -1 (end of copy).  It doesn't happen in the similar case
of promotion of the remote server.

How is clean server shutdown supposed to work?  It looks like
walsender sends COPY 0 and then just hangs up.  Meanwhile, walreceiver
has to distinguish between that case and the new timeline case
which involves a further exchange of messages.  Is an explicit message
at the end of the copy stream saying either "goodbye" or "but wait,
there's more" lacking here?  Or is there some other way that
walreceiver could distinguish between clean shutdown of remote server
(no error necessary), unclean shutdown of remote server, and timeline
negotiation?

-- 
Thomas Munro
http://www.enterprisedb.com




[HACKERS] So, can we stop supporting Windows native now?

2016-03-30 Thread Josh berkus
http://www.zdnet.com/article/microsoft-and-canonical-partner-to-bring-ubuntu-to-windows-10/

... could be good news for us ...

-- 
--
Josh Berkus
Red Hat OSAS
(any opinions are my own)




Re: [HACKERS] PATCH: index-only scans with partial indexes

2016-03-30 Thread Tom Lane
Kevin Grittner  writes:
> I'm taking my name off as committer and marking it "Ready for
> Committer".  If someone else wants to comment on the issues where
> Tom and Kyotaro-san still seem unsatisfied to the point where I
> can get my head around it, I could maybe take it back on as
> committer -- if anyone feels that could be a net win.

I'll pick it up.  In a quick scan I see some things I don't like, but
nothing insoluble:

* Having plancat.c initialize the per-IndexOptInfo restriction lists seems
fairly bizarre.  I realize that Tomas or Kyotaro probably did that in
response to my complaint about having check_partial_indexes have
side-effects on non-partial indexes, but this isn't an improvement.  For
one thing, it means we will produce an incorrect plan if more restriction
clauses are added to the rel after plancat.c runs, as the comment for
check_partial_indexes contemplates.  (I *think* that comment may be
obsolete, but I'm not sure.)  I think a better answer to that complaint is
to rename check_partial_indexes to something else, and more importantly
adjust its header comment to reflect its expanded responsibilities --- as
the patch I was commenting on at the time failed to do.

* It took me some time to convince myself that the patch doesn't break
generation of plans for EvalPlanQual.  I eventually found where it
skips removal of restrictions for target relations, but that's drastically
undercommented.

* Likewise, there's inadequate documentation of why it doesn't break
bitmap scans, which also have to be able to recheck correctly.

* I'm not sure that we have any regression test cases covering the
above two points, but we should.

* The comments leave a lot to be desired in general, eg there are
repeated failures to update comments only a few lines away from a change.

regards, tom lane




Re: [HACKERS] Timeline following for logical slots

2016-03-30 Thread Alvaro Herrera
Craig Ringer wrote:

> It removes the questionable cleanups, fixes the randAccess comment (IMO),

I pushed cleanup separately, including the addition of a few comments
that were documenting the original code.  I actually made a mistake in
extracting those, so there's one comment that's actually bogus in that
commit (the candidate_restart_lsn one); it's fixed in the second commit.
Sorry about that.  I also introduced XLogReaderInvalReadState here,
called XLogReaderInvalCache originally, since Andres objected to the
"cache" terminology (though you will note that the word "cache" was used
in the original code too.)

>  changes the "open a new xlog segment" wording and explains why we don't
> need GetFlushRecPtr/GetXLogReplayRecPtr on replay of a historical timeline.
> I removed the change that did less than a whole page at the XLogRead call
> and instead added a comment explaining it. Fixed the brainfart with
> candidate_restart_lsn, removed the outdated startptr comment and update and
> the unnecessary XLogReaderInvalCache call after it.
> 
> Also renamed src/test/modules/decoding_failover to
> src/test/modules/test_slot_timelines . Best name I could come up with that
> wasn't 1000chars long.
> 
> Passes main 'check', contrib/test_decoding check,
> src/test/modules/test_slot_timelines check and src/test/recovery check.
> 
> Available attached or at
> https://github.com/2ndQuadrant/postgres/tree/dev/logical-decoding-timeline-following

And pushed this too.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-03-30 Thread Andres Freund
On 2016-03-28 22:50:49 +0530, Amit Kapila wrote:
> On Fri, Sep 11, 2015 at 8:01 PM, Amit Kapila 
> wrote:
> >
> > On Thu, Sep 3, 2015 at 5:11 PM, Andres Freund  wrote:
> > >
> >
> > Updated comments and the patch (increate_clog_bufs_v2.patch)
> > containing the same is attached.
> >
> 
> Andres mentioned to me in off-list discussion, that he thinks we should
> first try to fix the clog buffers problem as he sees in his tests that clog
> buffer replacement is one of the bottlenecks. He also suggested me a test
> to see if the increase in buffers could lead to regression.  The basic idea
> of test was to ensure every access on clog access to be a disk one.  Based
> on his suggestion, I have written a SQL statement which will allow every
> access of CLOG to be a disk access and the query used for same is as below:
> With ins AS (INSERT INTO test_clog_access values(default) RETURNING c1)
> Select * from test_clog_access where c1 = (Select c1 from ins) - 32768 *
> :client_id;
> 
> Test Results
> -
> HEAD - commit d12e5bb7 Clog Buffers - 32
> Patch-1 - Clog Buffers - 64
> Patch-2 - Clog Buffers - 128
> 
> 
> Patch_Ver/Client_Count      1      64
> HEAD                    12677   57470
> Patch-1                 12305   58079
> Patch-2                 12761   58637
> 
> Above data is a median of 3 10-min runs.  Above data indicates that there
> is no substantial dip in increasing clog buffers.
> 
> Test scripts used in testing are attached with this mail.  In
> perf_clog_access.sh, you need to change data_directory path as per your
> m/c, also you might want to change the binary name, if you want to create
> postgres binaries with different names.
> 
> Andres, Is this test inline with what you have in mind?

Yes. That looks good. My testing shows that increasing the number of
buffers can both increase throughput and reduce latency variance. The
former is a smaller effect with one of the discussed patches applied;
the latter seems to actually increase in scale (with increased
throughput).


I've attached patches to:
0001: Increase the max number of clog buffers
0002: Implement 64bit atomics fallback and optimize read/write
0003: Edited version of Simon's clog scalability patch

WRT 0003 - still clearly WIP - I've:
- made group_lsn pg_atomic_u64*, to allow for tear-free reads
- split content from IO lock
- made SimpleLruReadPage_optShared always return with only share lock
  held
- Implement a different, experimental, concurrency model for
  SetStatusBit using cmpxchg. A define USE_CONTENT_LOCK controls which
  bit is used.

I've tested this and saw this outperform Amit's approach. Especially so
when using a read/write mix, rather than only reads. I saw over 30%
increase on a large EC2 instance with -btpcb-like@1 -bselect-only@3. But
that's in a virtualized environment, not very good for reproducability.

Amit, could you run benchmarks on your bigger hardware? Both with
USE_CONTENT_LOCK commented out and in?

I think we should go for 1) and 2) unconditionally. And then evaluate
whether to go with yours, or 3) from above. If the latter, we've got to do
some cleanup :)
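
To make the intended 64bit-atomics usage concrete, here is a minimal
sketch of a tear-free, monotonically advancing group LSN in the spirit of
0003 (the pg_atomic_* calls are the real API touched by 0001/0002; the
struct and function names are made up for illustration):

#include "postgres.h"
#include "port/atomics.h"

/* hypothetical shared state; mirrors the group_lsn idea from 0003 */
typedef struct ClogGroupSketch
{
	pg_atomic_uint64	group_lsn;
} ClogGroupSketch;

static void
group_lsn_init(ClogGroupSketch *grp)
{
	pg_atomic_init_u64(&grp->group_lsn, 0);
}

/* Advance the LSN without tearing, and never move it backwards. */
static void
group_lsn_advance(ClogGroupSketch *grp, uint64 new_lsn)
{
	uint64		old_lsn = pg_atomic_read_u64(&grp->group_lsn);

	/* a failed compare-exchange reloads old_lsn, so simply retry */
	while (old_lsn < new_lsn &&
		   !pg_atomic_compare_exchange_u64(&grp->group_lsn, &old_lsn, new_lsn))
		;
}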

Greetings,

Andres Freund
From 2f8828097337abb73effa098dab8bf61f2e7f3bb Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Wed, 30 Mar 2016 01:05:19 +0200
Subject: [PATCH 1/3] Improve 64bit atomics support.

When adding atomics back in b64d92f1a, I added 64bit support as
optional; there wasn't yet a direct user in sight.  That turned out to
be a bit short-sighted, it'd already have been useful a number of times.

Add a fallback implementation of 64bit atomics, just like the one we
have for 32bit atomics.

Additionally optimize reads/writes to 64bit on a number of platforms
where aligned writes of that size are atomic. This can now be tested
with PG_HAVE_8BYTE_SINGLE_COPY_ATOMICITY.
---
 src/backend/port/atomics.c   | 63 
 src/include/port/atomics.h   | 13 +++-
 src/include/port/atomics/arch-ia64.h |  3 ++
 src/include/port/atomics/arch-ppc.h  |  3 ++
 src/include/port/atomics/arch-x86.h  | 10 ++
 src/include/port/atomics/fallback.h  | 33 +++
 src/include/port/atomics/generic.h   | 24 +++---
 src/test/regress/regress.c   |  4 ---
 8 files changed, 137 insertions(+), 16 deletions(-)

diff --git a/src/backend/port/atomics.c b/src/backend/port/atomics.c
index 4972c30..8e53311 100644
--- a/src/backend/port/atomics.c
+++ b/src/backend/port/atomics.c
@@ -146,3 +146,66 @@ pg_atomic_fetch_add_u32_impl(volatile pg_atomic_uint32 *ptr, int32 add_)
 }
 
 #endif   /* PG_HAVE_ATOMIC_U32_SIMULATION */
+
+
+#ifdef PG_HAVE_ATOMIC_U64_SIMULATION
+
+void
+pg_atomic_init_u64_impl(volatile pg_atomic_uint64 *ptr, uint64 val_)
+{
+	StaticAssertStmt(sizeof(ptr->sema) >= sizeof(slock_t),
+	 "size mismatch of atomic_flag vs slock_t");
+
+	/*
+	 * If we're using semaphore based atomic flags, be careful about nested
+	 * usage of atomics while a spinlock is held.
+	 */

Re: [HACKERS] [PATCH] Supporting +-Infinity values by to_timestamp(float8)

2016-03-30 Thread Vitaly Burovoy
On 3/29/16, Tom Lane  wrote:
> Pushed with minor adjustments.
>
>   regards, tom lane
>

Thank you very much!

-- 
Best regards,
Vitaly Burovoy




Re: [HACKERS] [PATH] Jsonb, insert a new value into an array at arbitrary position

2016-03-30 Thread Vitaly Burovoy
On 3/25/16, Dmitry Dolgov <9erthali...@gmail.com> wrote:
> Here is a new version of path, I hope I didn't miss anything. Few notes:
>
>> 4.
>> or even create a new constant (there can be better name for it):
>> #define JB_PATH_CREATE_OR_INSERT (JB_PATH_INSERT_BEFORE |
>> JB_PATH_INSERT_AFTER | JB_PATH_CREATE)
>
> Good idea, thanks.

You're welcome.
===

I'm sorry for the late letter.

Unfortunately, it seems one more round is necessary.
The documentation changes still have to be fixed.

The main description points to a "target section designated by path is
a JSONB array". It is a copy-paste from jsonb_set, but here it has
wrong context.
The main idea in jsonb_set is replacing any object which is pointed
(designated) by "path" no matter where (array or object) it is
located.
But in the phrase for jsonb_insert above you want to point to an upper
object _where_ "section designated by path" is located.

I'd write something like "If target section designated by path is _IN_
a JSONB array, ...". The same thing with "a JSONB object".

Also I'd rewrite "will act like" as "behaves as".

Last time I missed the argument's name "insert_before_after". It is
better to replace it with just "insert_after". Also reflect that name in
the documentation properly.

To be consistent, change the constant "JB_PATH_NOOP" to "0x"
instead of just "0x0".

Also the variable's name "bool before" is incorrect because it
contradicts the SQL function's argument "after" (from the
documentation) or "insert_after" (proposed above) or the tests (de-facto
behavior). Moreover, the logic in the code seems correct, so I
have no idea why "before ? JB_PATH_INSERT_BEFORE :
JB_PATH_INSERT_AFTER" works.
I think either a proper comment should be added or the flaw in the code must be found.
Anyway, the variable's name must reflect the SQL argument's name.

-- 
Best regards,
Vitaly Burovoy




Re: [HACKERS] [CommitFest App] Feature request -- review e-mail additions

2016-03-30 Thread Stephen Frost
* José Luis Tallón (jltal...@adv-solutions.net) wrote:
> * Prepend a [review] tag to the e-mail subject
> ... so that e-mails sent to -hackers will read  " [HACKERS]
> [review] "

Eh, I'm not against it but not sure it's all that necessary either.

> * Auto-CC the patch author on this e-mail
> I guess this should speed up reactions / make communication a
> bit more fluid.

This is almost a requirement, imv, as subsequent comments on the review
are frequent and it's a crime if the author misses all of that.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Please correct/improve wiki page about abbreviated keys bug

2016-03-30 Thread Josh berkus
On 03/29/2016 07:43 PM, Peter Geoghegan wrote:
> On Tue, Mar 29, 2016 at 7:31 PM, Tom Lane  wrote:
>> A corrupt index could easily fail to detect uniqueness violations (because
>> searches fail to find entries they should find).  Not sure I believe that
>> it would make false reports of a uniqueness conflict that's not really
>> there.

I meant failing to detect a violation, and thus letting the user insert
a duplicate key.

> Sure. But looking at how texteq() is implemented, it certainly seems
> impossible that that could happen. Must have been a miscommunication
> somewhere. I'll fix it.

There was speculation on this in the -bugs thread, and nobody
contradicted it.  If you're fairly sure that it wouldn't happen, your
knowledge of the issue is definitely superior to mine.

> Do you think it would be okay if the SQL query to detect potentially
> affected indexes only considered the leading attribute? Since that's
> the only attribute that could use abbreviated keys, it ought to be
> safe to not require users to REINDEX indexes that happen to have a
> second-or-subsequent text/varchar(n) attribute that doesn't use the C
> locale. Maybe it's not worth worrying about.

I think that's a great idea.

-- 
--
Josh Berkus
Red Hat OSAS
(any opinions are my own)




Re: [HACKERS] Publish autovacuum informations

2016-03-30 Thread Julien Rouhaud
On 19/03/2016 01:11, Jim Nasby wrote:
> On 3/3/16 3:54 AM, Kyotaro HORIGUCHI wrote:
>> I wonder why there haven't been discussions so far on what kind
>> of information we want by this feature. For example I'd be happy
>> to see the time of last autovacuum trial and the cause if it has
>> been skipped for every table. Such information would (maybe)
>> naturally be shown in pg_stat_*_tables.
>>
>> =
>> =# select relid, last_completed_autovacuum,
>> last_completed_autovac_status, last_autovacuum_trial,
>> last_autovacuum_result from pg_stat_user_tables;
>> -[ RECORD 1 ]-+--
>> relid | 16390
>> last_completed_autovacuum | 2016-03-01 01:25:00.349074+09
>> last_completed_autovac_status | Completed in 4 seconds. Scanned 434
>> pages, skipped 23 pages
>> last_autovacuum_trial | 2016-03-03 17:33:04.004322+09
>> last_autovac_trial_status | Canceled by PID 2355. Processed
>> 144/553 pages.
>> -[ RECORD 2 ]--+--
>> ...
>> last_autovacuum_trial | 2016-03-03 07:25:00.349074+09
>> last_autovac_trial_status | Completed in 4 seconds. Scanned 434
>> pages, skipped 23 pages
>> -[ RECORD 3 ]--+--
>> ...
>> last_autovacuum_trial | 2016-03-03 17:59:12.324454+09
>> last_autovac_trial_status | Processing by PID 42334, 564 / 32526
>> pages done.
>> -[ RECORD 4 ]--+--
>> ...
>> last_autovacuum_trial | 2016-03-03 17:59:12.324454+09
>> last_autovac_trial_status | Skipped by dead-tuple threshold.
>> =
> 
> I kinda like where you're going here, but I certainly don't think the
> stats system is the way to do it. Stats bloat is already a problem on
> bigger systems. More important, I don't think having just the last
> result is very useful. If you've got a vacuum problem, you want to see
> history, especially history of the vacuum runs themselves.
> 
> The good news is that vacuum is a very low-frequency operation, so it
> has none of the concerns that the generic stats system does. I think it
> would be reasonable to provide event triggers that fire on every
> launcher loop, after a worker has built its "TODO list", and after
> every (auto)vacuum.

The main issue I see with an event trigger based solution is that you'll
always have to create them and the needed objects on every database.

Another issue is that both of these approaches are not intended to give a
global overview but per-database statistics. I'd prefer a global overview.

-- 
Julien Rouhaud
http://dalibo.com - http://dalibo.org




Re: [HACKERS] Password identifiers, protocol aging and SCRAM protocol

2016-03-30 Thread Robert Haas
On Wed, Mar 30, 2016 at 12:31 PM, José Luis Tallón
 wrote:
> On 03/30/2016 06:14 PM, Robert Haas wrote:
>> So basically the use of the ENCRYPTED keyword means "if it does already
>> seem to be the sort of MD5 blob we're expecting, turn it into that".
>
> If it does NOT already seem to be... I guess?

Yes, that's what I meant.  Sorry.

>> rolencryption says how the password verifier is encrypted and rolpassword
>> contains the verifier itself. Initially, rolencryption will be 'plain' or
>> 'md5', but later we can add 'scram' as another choice, or maybe it'll be
>> more specific like 'scram-hmac-doodad'.
>
> May I suggest using  "{" ["."] "}" just like Dovecot does?

Doesn't seem very SQL-ish to me...  I think we should normalize.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] snapshot too old, configured by time

2016-03-30 Thread Kevin Grittner
On Thu, Mar 24, 2016 at 2:24 AM, Michael Paquier
 wrote:

> I have been looking at 4a, the test module, and things are looking
> good IMO. Something that I think would be adapted would be to define
> the options for isolation tests in a variable, like ISOLATION_OPTS to
> allow MSVC scripts to fetch those option values more easily.

Maybe, but that seems like material for a separate patch.

> +submake-test_decoding:
> +   $(MAKE) -C $(top_builddir)/src/test/modules/snapshot_too_old
> The target name here is incorrect. This should refer to snapshot_too_old.

Good catch.  Fixed.

So far, pending the resolution of the suggestion to add three new
parameters to BufferGetPage in 450 places that otherwise don't need
to be touched, this is the only change from the flurry of recent
testing and review, so I'm holding off on posting a new patch just
for this.

> Regarding the main patch:
> +old_snapshot_threshold configuration
> parameter
> snapshot_valid_limit?

There have already been responses supporting
old_snapshot_threshold, so I would need to hear a few more votes to
consider a change.

> page = BufferGetPage(buf);
> +   TestForOldSnapshot(scan->xs_snapshot, rel, page);
> This is a sequence repeated many times in this patch, a new routine,
> say BufferGetPageExtended with a uint8 flag, one flag being used to
> test old snapshots would be more adapted. But this would require
> creating a header dependency between the buffer manager and
> SnapshotData.. Or more simply we may want a common code path when
> fetching a page that a backend is going to use to fetch tuples. I am
> afraid of TestForOldSnapshot() being something that could be easily
> forgotten in code paths introduced in future patches...

Let's keep that discussion on the other branch of this thread.

> +   if (whenTaken < 0)
> +   {
> +   elog(DEBUG1,
> +"MaintainOldSnapshotTimeMapping called with negative
> whenTaken = %ld",
> +(long) whenTaken);
> +   return;
> +   }
> +   if (!TransactionIdIsNormal(xmin))
> +   {
> +   elog(DEBUG1,
> +"MaintainOldSnapshotTimeMapping called with xmin = %lu",
> +(unsigned long) xmin);
> +   return;
> +   }
> Material for two assertions?

You omitted the immediately preceding comment block:

+/*
+ * We don't want to do something stupid with unusual values, but we don't
+ * want to litter the log with warnings or break otherwise normal
+ * processing for this feature; so if something seems unreasonable, just
+ * log at DEBUG level and return without doing anything.
+ */

I'm not clear that more drastic action is a good idea, since the
"fallback" is existing behavior.  I fear that doing something more
aggressive might force other logic to become more precise about
aggressive clean-up, adding overhead beyond the value gained.

Thanks for the review!

-- 
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] snapshot too old, configured by time

2016-03-30 Thread Alvaro Herrera
Kevin Grittner wrote:
> On Wed, Mar 30, 2016 at 2:22 PM, Alvaro Herrera
>  wrote:

> > I said that we should change BufferGetPage into having the snapshot
> > check built-in, except in the cases where a flag is passed; and the flag
> > would be passed in all cases except those 30-something you identified.
> > In other words, the behavior in all the current callsites would be
> > identical to what's there today; we could have a macro do the first
> > check so that we don't introduce the overhead of a function call in the
> > 450 cases where it's not needed.
> 
> In many of the places that BufferGetPage is called there is not a
> snapshot available.  I assume that you would be OK with an Assert
> that the flag was passed if the snapshot is NULL?

Sure, that's fine.

BTW I said "a macro" but I was forgetting that we have static inline
functions in headers now, which means you can avoid the horrors of
actually writing a macro.
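
As a sketch of the proposal (not committed code -- BufferGetBlock()
exists today, TestForOldSnapshot() comes from Kevin's patch, and the
extended signature, the BUFFER_NO_SNAPSHOT_TEST flag, and the Assert are
the idea being discussed here):

static inline Page
BufferGetPage(Buffer buffer, Snapshot snapshot, Relation relation, int flags)
{
	Page		page = (Page) BufferGetBlock(buffer);

	/* a NULL snapshot is only sane if the caller opted out of the test */
	Assert(snapshot != NULL || (flags & BUFFER_NO_SNAPSHOT_TEST) != 0);

	if ((flags & BUFFER_NO_SNAPSHOT_TEST) == 0)
		TestForOldSnapshot(snapshot, relation, page);	/* may ereport() */

	return page;
}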

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] large regression for parallel COPY

2016-03-30 Thread Joshua D. Drake

On 03/30/2016 01:10 PM, Andres Freund wrote:

On 2016-03-30 15:50:21 -0400, Robert Haas wrote:

On Thu, Mar 10, 2016 at 8:29 PM, Andres Freund  wrote:

Allow to trigger kernel writeback after a configurable number of writes.


While testing out Dilip Kumar's relation extension patch today, I
discovered (with some help from Andres) that this causes nasty
regressions when doing parallel COPY on hydra (3.2.6-3.fc16.ppc64,
lousy disk subsystem).


Unless Fedora/Red Hat fixed the 3.2 kernel in a subsequent patch (.6-3?),
I would look hard right at that. The kernel from 3.2 - 3.8 is going 
to be miserable for anything that is doing concurrent writes. I 
understand that this is a regression regardless, but I think we need 
wider testing to see if the changes are somehow related.


Sincerely,

jD

--
Command Prompt, Inc.  http://the.postgres.company/
+1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.




Re: [HACKERS] large regression for parallel COPY

2016-03-30 Thread Andres Freund
On 2016-03-30 15:50:21 -0400, Robert Haas wrote:
> On Thu, Mar 10, 2016 at 8:29 PM, Andres Freund  wrote:
> > Allow to trigger kernel writeback after a configurable number of writes.
> 
> While testing out Dilip Kumar's relation extension patch today, I
> discovered (with some help from Andres) that this causes nasty
> regressions when doing parallel COPY on hydra (3.2.6-3.fc16.ppc64,
> lousy disk subsystem).  What I did was (1) run pgbench -i -s 100, (2)
> copy the results to a file, (3) truncate and drop the indexes on the
> original table, and (4) try copying in one or more copies of the data
> from the file.  Typical command line:
> 
> time pgbench -n -f f -t 1 -c 4 -j 4 && psql -c "select
> pg_size_pretty(pg_relation_size('pgbench_accounts'));" && time psql -c
> checkpoint && psql -c "truncate pgbench_accounts; checkpoint;"
> 
> With default settings against
> 96f8373cad5d6066baeb7a1c5a88f6f5c9661974, pgbench takes 9 to 9.5
> minutes and the subsequent checkpoint takes 9 seconds.  After setting
> , it takes 1 minute and 11 seconds and the subsequent checkpoint takes
> 11 seconds.   With a single copy of the data (that is, -c 1 -j 1 but
> otherwise as above), it takes 28-29 seconds with default settings and
> 26-27 seconds with backend_flush_after=0, bgwriter_flush_after=0.  So
> the difference is rather small with a straight-up COPY, but with 4
> copies running at the same time, it's near enough to an order of
> magnitude.
> 
> Andres reports that on his machine, non-zero *_flush_after settings
> make things faster, not slower, so apparently this is
> hardware-dependent or kernel-dependent.  Nevertheless, it seems to me
> that we should try to get some broader testing here to see which
> experience is typical.

Indeed. On SSDs I see about a 25-35% gain, on HDDs about 5%. If I
increase the size of backend_flush_after to 64 (as it is for bgwriter),
however, I do get about 15% for HDDs as well.

I wonder if the default value of backend_flush_after is too small for
some scenarios. I've reasoned that backend_flush_after should have a
*lower* default value than e.g. checkpointer or bgwriter, because
there are many concurrent writers increasing the total amount of unflushed
dirty writes. Which is true for OLTP write workloads; but less so for
bulk load.

Andres




Re: [HACKERS] snapshot too old, configured by time

2016-03-30 Thread Kevin Grittner
On Wed, Mar 30, 2016 at 2:22 PM, Alvaro Herrera
 wrote:

> I understand the invasiveness argument, but to me the danger of
> introducing new bugs trumps that.  The problem is not the current code,
> but future patches: it is just too easy to make the mistake of not
> checking the snapshot in new additions of BufferGetPage.  And you said
> that the result of missing a check is silent wrong results from queries
> that should instead be cancelled, which seems fairly bad to me.

Fair point.

> I said that we should change BufferGetPage into having the snapshot
> check built-in, except in the cases where a flag is passed; and the flag
> would be passed in all cases except those 30-something you identified.
> In other words, the behavior in all the current callsites would be
> identical to what's there today; we could have a macro do the first
> check so that we don't introduce the overhead of a function call in the
> 450 cases where it's not needed.

In many of the places that BufferGetPage is called there is not a
snapshot available.  I assume that you would be OK with an Assert
that the flag was passed if the snapshot is NULL?  I had been
picturing what you were requesting as just adding a snapshot
parameter and assuming that NULL meant not to check; adding two
parameters where the flag explicitly calls out that the check is not
needed might do more to prevent accidents, but I do wonder how much
it would help during copy/paste frenzy.  Touching all spots to use
the new function signature would be a mechanical job with the
compiler catching any errors, so it doesn't seem crazy to refactor
that now, but I would like to hear what some others think about
this.

> Tom said that my proposal would be performance-killing, not that your
> patch would be performance-killing.  But as I argue above, with my
> proposal performance would stay the same, so we're actually okay.
>
> I don't think anybody disputes that your patch is good in general.
> I would be happy with it in 9.6, but I have my reservations about the
> aforementioned problem.

We have a lot of places in our code where people need to know
things that they are not reminded of by the surrounding code, but
I'm not about to argue that's a good thing; if the consensus is
that this would help prevent future bugs when new BufferGetPage
calls are added, I can go with the flow.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] snapshot too old, configured by time

2016-03-30 Thread Kevin Grittner
On Wed, Mar 30, 2016 at 2:34 PM, Kevin Grittner  wrote:

> A connection should not get the
> error just because it is using a snapshot that tries to look at
> data that might be wrong, and the connection holding the long-lived
> snapshot doesn't do that until it awakes from the sleep and runs
> the next SELECT command.

Well, that came out wrong.

A connection should not get the "snapshot too old" error just
because it is *holds* an old snapshot; it must actually attempt to
read an affected page *using* an old snapshot, and the connection
holding the long-lived snapshot doesn't do that until it awakes
from the sleep and runs the next SELECT command.

Sorry for any confusion from the sloppy editing.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




[HACKERS] large regression for parallel COPY

2016-03-30 Thread Robert Haas
On Thu, Mar 10, 2016 at 8:29 PM, Andres Freund  wrote:
> Allow to trigger kernel writeback after a configurable number of writes.

While testing out Dilip Kumar's relation extension patch today, I
discovered (with some help from Andres) that this causes nasty
regressions when doing parallel COPY on hydra (3.2.6-3.fc16.ppc64,
lousy disk subsystem).  What I did was (1) run pgbench -i -s 100, (2)
copy the results to a file, (3) truncate and drop the indexes on the
original table, and (4) try copying in one or more copies of the data
from the file.  Typical command line:

time pgbench -n -f f -t 1 -c 4 -j 4 && psql -c "select
pg_size_pretty(pg_relation_size('pgbench_accounts'));" && time psql -c
checkpoint && psql -c "truncate pgbench_accounts; checkpoint;"

With default settings against
96f8373cad5d6066baeb7a1c5a88f6f5c9661974, pgbench takes 9 to 9.5
minutes and the subsequent checkpoint takes 9 seconds.  After setting
, it takes 1 minute and 11 seconds and the subsequent checkpoint takes
11 seconds.   With a single copy of the data (that is, -c 1 -j 1 but
otherwise as above), it takes 28-29 seconds with default settings and
26-27 seconds with backend_flush_after=0, bgwriter_flush_after=0.  So
the difference is rather small with a straight-up COPY, but with 4
copies running at the same time, it's near enough to an order of
magnitude.

Andres reports that on his machine, non-zero *_flush_after settings
make things faster, not slower, so apparently this is
hardware-dependent or kernel-dependent.  Nevertheless, it seems to me
that we should try to get some broader testing here to see which
experience is typical.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] snapshot too old, configured by time

2016-03-30 Thread Kevin Grittner
On Wed, Mar 30, 2016 at 2:29 PM, Peter Geoghegan  wrote:

> [Does the patch allow dangling page pointers?]

> Again, I don't want to prejudice anyone against your patch, which I
> haven't read.

I don't believe that the way the patch does its business opens any
new vulnerabilities of this type.  If you see such after looking at
the patch, let me know.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




[HACKERS] Very small patch for decode.c

2016-03-30 Thread Konstantin Knizhnik

diff --git a/src/backend/replication/logical/decode.c 
b/src/backend/replication/logical/decode.c
index 2380ea2..a992662 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -488,7 +488,7 @@ DecodeCommit(LogicalDecodingContext *ctx, XLogRecordBuffer 
*buf,
 {
XLogRecPtr  origin_lsn = InvalidXLogRecPtr;
TimestampTz commit_time = parsed->xact_time;
-   XLogRecPtr  origin_id = XLogRecGetOrigin(buf->record);
+   RepOriginId origin_id = XLogRecGetOrigin(buf->record);
int i;

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: [HACKERS] snapshot too old, configured by time

2016-03-30 Thread Kevin Grittner
On Sat, Mar 19, 2016 at 1:27 AM, Jeff Janes  wrote:

> I'm not sure if this is operating as expected.
>
> I set the value to 1min.
>
> I set up a test like this:
>
> pgbench -i
>
> pgbench -c4 -j4 -T 3600 &
>
> ### watch the size of branches table
> while (true) ; do psql -c "\dt+" | fgrep _branches; sleep 10; done &
>
> ### set up a long lived snapshot.
> psql -c 'begin; set transaction isolation level repeatable read;
> select sum(bbalance) from pgbench_branches; select pg_sleep(300);
> select sum(bbalance) from pgbench_branches;'
>
> As this runs, I can see the size of the pgbench_branches bloating once
> the snapshot is taken, and continues bloating at a linear rate for the
> full 300 seconds.
>
> Once the 300 second pg_sleep is up, the long-lived snapshot holder
> receives an error once it tries to access the table again, and then
> the bloat stops increasing.  But shouldn't the bloat have stopped
> increasing as soon as the snapshot became doomed, which would be after
> a minute or so?

This is actually operating as intended, not a bug.  Try running a
manual VACUUM command about two minutes after the snapshot is taken
and you should get a handle on what's going on.  The old tuples
become eligible for vacuuming after one minute, but that doesn't
necessarily mean that autovacuum jumps in and that the space starts
getting reused.  The manual vacuum will allow that, as you should
see on your monitoring window.  A connection should not get the
error just because it is using a snapshot that tries to look at
data that might be wrong, and the connection holding the long-lived
snapshot doesn't do that until it awakes from the sleep and runs
the next SELECT command.
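
To make "attempt to read an affected page using an old snapshot"
concrete, the test is conceptually along these lines (a simplified
sketch, not the patch's exact code; the real check also requires an MVCC
snapshot and consults the threshold/time mapping):

static inline void
TestForOldSnapshot_sketch(Snapshot snapshot, Page page)
{
	if (old_snapshot_threshold >= 0 &&			/* feature enabled */
		snapshot != NULL &&
		!XLogRecPtrIsInvalid(snapshot->lsn) &&
		PageGetLSN(page) > snapshot->lsn)		/* page changed after snapshot */
		ereport(ERROR,
				(errcode(ERRCODE_SNAPSHOT_TOO_OLD),
				 errmsg("snapshot too old")));
}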

All is well as far as I can see here.

Thanks for checking, though!  It is an interesting test!

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Combining Aggregates

2016-03-30 Thread David Rowley
On 31 March 2016 at 00:48, Robert Haas  wrote:
> On Tue, Mar 29, 2016 at 11:14 PM, David Rowley
>  wrote:
>>> 0005:
>>> Haribabu's patch; no change from last time.
>
> So what's the distinction between 0002 and 0005?  And what is the
> correct author/reviewer attribution for each?

I wrote 0002 - 0004; these were reviewed by Tomas.
0005 is Haribabu's patch, reviewed by Tomas and me.

The reason 0002 and 0005 are separate patches is just down to them
having different authors.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] snapshot too old, configured by time

2016-03-30 Thread Peter Geoghegan
On Wed, Mar 30, 2016 at 12:21 PM, Peter Geoghegan  wrote:
> Well, amcheck is a tool that in essence makes sure that B-Trees look
> structurally sound, and respect invariants like having every item on
> each page in logical order. That could catch a bug of the kind I just
> described, because it's quite likely that the recycled page would
> happen to have items that didn't comport with the ordering on the
> page.

I mean: didn't comport with the ordering of the last page "on the same
level" (but, due to this issue, maybe not actually on the same level).
We check if the first item on the "right page" (in actuality, due to
this bug, the new page image following a spurious early recycle) is
greater than or equal to the last item of the previous page (the page
whose right-link we followed). On each level, everything should be in order
-- that's an invariant that (it is posited by me) we can safely check
with only an AccessShareLock.

Making the relation lock only an AccessShareLock is not just about
reducing the impact on production systems. It's also about making the
tool more effective at finding these kinds of transient races, with
subtle user-visible symptoms.

Again, I don't want to prejudice anyone against your patch, which I
haven't read.

-- 
Peter Geoghegan




Re: [HACKERS] snapshot too old, configured by time

2016-03-30 Thread Alvaro Herrera
Kevin Grittner wrote:
> On Wed, Mar 30, 2016 at 11:37 AM, Tom Lane  wrote:
> > Alvaro Herrera  writes:
> >> I think a safer proposition would be to replace all current
> >> BufferGetPage() calls (there are about 500) by adding the necessary
> >> arguments: buffer, snapshot, rel, and an integer "flags".  All this
> >> without adding the feature.  Then a subsequent commit would add the
> >> TestForOldSnapshot inside BufferGetPage, *except* when a
> >> BUFFER_NO_SNAPSHOT_TEST flag is passed.  That way, new code always get
> >> the snapshot test by default.
> >
> > That seems awfully invasive,
> 
> That's the argument I made, which Álvaro described as "dismissing"
> his suggestion.  In this post from October of 2015, I pointed out
> that there are 36 calls where we need a snapshot and 450 where we
> don't.
> 
> http://www.postgresql.org/message-id/56479263.1140984.1444945639606.javamail.ya...@mail.yahoo.com

I understand the invasiveness argument, but to me the danger of
introducing new bugs trumps that.  The problem is not the current code,
but future patches: it is just too easy to make the mistake of not
checking the snapshot in new additions of BufferGetPage.  And you said
that the result of missing a check is silent wrong results from queries
that should instead be cancelled, which seems fairly bad to me.  My
impression was that you were actually considering doing something about
that -- sorry for the lack of clarity.

We have made similarly invasive changes in the past -- the
SearchSysCache API for instance.

> > not to mention performance-killing if the expectation is that
> > most such calls are going to need a snapshot check.
> 
> This patch is one which has allowed a customer where we could not
> meet their performance requirements to pass them.  It is the
> opposite of performance-killing.

I think Tom misunderstood what I said and you misunderstood what Tom
said.  Let me attempt to set things straight.

I said that we should change BufferGetPage into having the snapshot
check built-in, except in the cases where a flag is passed; and the flag
would be passed in all cases except those 30-something you identified.
In other words, the behavior in all the current callsites would be
identical to what's there today; we could have a macro do the first
check so that we don't introduce the overhead of a function call in the
450 cases where it's not needed.

Tom said that my proposal would be performance-killing, not that your
patch would be performance-killing.  But as I argue above, with my
proposal performance would stay the same, so we're actually okay.


I don't think anybody disputes that your patch is good in general.
I would be happy with it in 9.6, but I have my reservations about the
aforementioned problem.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] snapshot too old, configured by time

2016-03-30 Thread Peter Geoghegan
On Wed, Mar 30, 2016 at 11:53 AM, Kevin Grittner  wrote:
> When the initial "proof of concept" patch was tested by the
> customer, it was not effective due to issues related to what you
> raise.  Autovacuum workers were blocking due to the page pins for
> scans using these old snapshots, causing the bloat to accumulate in
> spite of this particular patch.  This was addressed, at least to a
> degree sufficient for this customer, with this patch:
>
> http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=2ed5b87f96d473962ec5230fd820abfeaccb2069
>
> Basically, for most common cases the "super-exclusive" locking has
> been eliminated from btree logic; and I have been happy to see that
> Simon has been working on dealing with the corner cases where I
> hadn't rooted it out.
>
>> I worry that something weird could happen there. For example, perhaps
>> the page LSN on what is actually a newly recycled page could be set
>> such that the backend following a stale right link spuriously raises a
>> "snapshot too old" error.
>
> That particular detail doesn't seem to be a realistic concern,
> though -- if a page has been deleted, assigned to a new place in
> the index with an LSN corresponding to that action, it would be a
> pretty big bug if a right-pointer still referenced it.

Yes, that would be a big bug. But I wasn't talking about
"super-exclusive" locking. Rather, I was talking about the general way
in which index scans are guaranteed to not land on an already-recycled
page (not a half-dead page, and not a fully deleted page -- a fully
reclaimed/recycled page). This works without buffer pins or buffer
locks needing to be held at all -- there is a global interlock against
page *recycling* based on RecentGlobalXmin, per the nbtree README. So,
this vague concern of mine is about VACUUM's B-Tree page recycling.

During an index scan, we expect to be able to land on the next page,
and to be at least able to reason about it being deleted, even though
we don't hold a pin on anything for a period. We certainly are shy
about explaining all this, but if you look at a routine like
_bt_search() carefully (the routine that is used to descend a B-Tree),
it doesn't actually hold a pin (and certainly not a buffer lock)
concurrently as we drop a level. The next page/block should be
substantively the same page as expected from the downlink (or right
link) we followed, entirely because of the RecentGlobalXmin interlock.
Backwards scans also rely on this.

This is just a vague concern, and possibly this is completely
irrelevant. I haven't read the patch.

>> I suggest you consider making amcheck [1] a part of your testing
>> strategy. I think that this patch is a good idea, and I'd be happy to
>> take feedback from you on how to make amcheck more effective for
>> testing this patch in particular.
>
> I'm not sure how that would fit in; could you elaborate?

Well, amcheck is a tool that in essence makes sure that B-Trees look
structurally sound, and respect invariants like having every item on
each page in logical order. That could catch a bug of the kind I just
described, because it's quite likely that the recycled page would
happen to have items that didn't comport with the ordering on the
page. The block has been reused essentially at random. Importantly,
amcheck can catch this without anything more than an AccessShareLock,
so you have some hope of catching this kind of race condition (the
stale downlink that you followed to get to the
spuriously-recycled-early page doesn't stay stale for long). Or, maybe
it would happen to catch some other random problem. Difficult to say.

Again, this is based on a speculation that might be wrong. But it's
worth considering.

-- 
Peter Geoghegan




Re: [HACKERS] snapshot too old, configured by time

2016-03-30 Thread Kevin Grittner
On Sun, Mar 20, 2016 at 6:25 PM, Peter Geoghegan  wrote:

> I haven't read the patch, but I wonder: What are the implications here
> for B-Tree page recycling by VACUUM?

> Offhand, I imagine that there'd be some special considerations. Why is
> it okay that an index scan could land on a deleted page with no
> interlock against VACUUM's page recycling? Or, what prevents that from
> happening in the first place?

When the initial "proof of concept" patch was tested by the
customer, it was not effective due to issues related to what you
raise.  Autovacuum workers were blocking due to the page pins for
scans using these old snapshots, causing the bloat to accumulate in
spite of this particular patch.  This was addressed, at least to a
degree sufficient for this customer, with this patch:

http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=2ed5b87f96d473962ec5230fd820abfeaccb2069

Basically, for most common cases the "super-exclusive" locking has
been eliminated from btree logic; and I have been happy to see that
Simon has been working on dealing with the corner cases where I
hadn't rooted it out.

> I worry that something weird could happen there. For example, perhaps
> the page LSN on what is actually a newly recycled page could be set
> such that the backend following a stale right link spuriously raises a
> "snapshot too old" error.

That particular detail doesn't seem to be a realistic concern,
though -- if a page has been deleted, assigned to a new place in
the index with an LSN corresponding to that action, it would be a
pretty big bug if a right-pointer still referenced it.

> I suggest you consider making amcheck [1] a part of your testing
> strategy. I think that this patch is a good idea, and I'd be happy to
> take feedback from you on how to make amcheck more effective for
> testing this patch in particular.

I'm not sure how that would fit in; could you elaborate?

The biggest contradiction making testing hard is that everyone (and
I mean everyone!) preferred to see this configured by time rather
than number of transactions, so there is no change in behavior
without some sort of wait for elapsed time.  But nobody wants to
drive time needed for running regression tests too high.  Testing
was far easier when a transaction count was used for configuration.
old_snapshot_threshold = -1 (the default) completely disables the
new behavior, and I basically abused a configuration setting of 0
to mean a few seconds so I could get some basic testing added to
make check-world while keeping the additional time for the tests
(barely) below one minute.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Using quicksort for every external sort run

2016-03-30 Thread Peter Geoghegan
On Wed, Mar 30, 2016 at 4:22 AM, Greg Stark  wrote:
> I'm sorry I was intending to run those benchmarks again this past week
> but haven't gotten around to it. But my plan was to run them on a good
> server I borrowed, an i7 with 8MB cache. I can still go ahead with
> that but I can also try running it on the home server again too if you
> want (an AMD N36L with 1MB cache).

I don't want to suggest that people not test the very low end on very
high end hardware. That's fine, as long as it's put in context.
Considerations about the economics of cache sizes and work_mem
settings are crucial to testing the patch objectively. If everything
fits in cache anyway, then you almost eliminate the advantages
quicksort has, but you should be using an internal sort for that anyway. I
think that this is just common sense.

I would like to see a low-end benchmark for low-end work_mem settings
too, though. Maybe you could repeat the benchmark I linked to, but
with a recent version of the patch, including commit 0011c0091e886b.
Compare that to the master branch just before 0011c0091e886b went in.
I'm curious about how the more recent memory context resetting stuff
that made it into 0011c0091e886b left us regression-wise.  Tomas
tested that, of course, but I have some concerns about how
representative his numbers are at the low end.

> But even for the smaller machines I don't think we should really be
> caring about regressions in the 4-8MB work_mem range. Earlier in the
> fuzzer work I was surprised to find out it can take tens of megabytes
> to compile a single regular expression (iirc it was about 30MB for a
> 64-bit machine) before you get errors. It seems surprising to me that
> a single operator would consume more memory than an ORDER BY clause. I
> was leaning towards suggesting we just bump up the default work_mem to
> 8MB or 16MB.

Today, it costs less than USD $40 for a new Raspberry Pi 2, which has
1GB of memory. I couldn't figure out exactly how much CPU cache that
model has, but I'm pretty sure it's no more than 256KB. Memory just
isn't that expensive; memory bandwidth is expensive. I agree that we
could easily justify increasing work_mem to 8MB, or even 16MB.

It seems almost silly to point it out, but: Increasing sort
performance has the effect of decreasing the duration of sorts, which
could effectively decrease memory use on the system. Increasing the
memory available to sorts could decrease the overall use of memory.
Being really frugal with memory is expensive, maybe even if your
primary concern is the expense of memory usage, which it probably
isn't these days.

-- 
Peter Geoghegan




Re: [HACKERS] snapshot too old, configured by time

2016-03-30 Thread Kevin Grittner
On Wed, Mar 30, 2016 at 11:37 AM, Tom Lane  wrote:
> Alvaro Herrera  writes:
>> I think a safer proposition would be to replace all current
>> BufferGetPage() calls (there are about 500) by adding the necessary
>> arguments: buffer, snapshot, rel, and an integer "flags".  All this
>> without adding the feature.  Then a subsequent commit would add the
>> TestForOldSnapshot inside BufferGetPage, *except* when a
>> BUFFER_NO_SNAPSHOT_TEST flag is passed.  That way, new code always get
>> the snapshot test by default.
>
> That seems awfully invasive,

That's the argument I made, which Álvaro described as "dismissing"
his suggestion.  In this post from October of 2015, I pointed out
that there are 36 calls where we need a snapshot and 450 where we
don't.

http://www.postgresql.org/message-id/56479263.1140984.1444945639606.javamail.ya...@mail.yahoo.com

> not to mention performance-killing if the expectation is that
> most such calls are going to need a snapshot check.

This patch is one which has allowed a customer where we could not
meet their performance requirements to pass them.  It is the
opposite of performance-killing.

> (Quite aside from the calls themselves, are they all in
> routines that are being passed the right snapshot today?)

I went over that very carefully when the patch was first proposed
in January of 2015, and have kept an eye on things to try to avoid
bit-rot which might introduce new calls which need to be touched.
The customer for which this was initially developed uses a 30 day
test run with very complex production releases driven by a
simulated user load with large numbers of users.  EDB has
back-patched it to 9.4 where an earlier version of it is being used
in production by this (large) customer.

> TBH, I think that shoving in something like this at the end of the last
> commitfest would be a bad idea even if there were widespread consensus
> that we wanted the feature ... which I am not sure there is.

I don't recall anyone opposing the feature itself it except you,
and it has had enthusiastic support from many.  Before I was made
aware of a relevant isolation tester feature, there were many
objections to my efforts at regression testing, and Álvaro has
argued for touching 450 locations in the code that otherwise don't
need it, just as "reminders" to people to consider whether newly
added calls might need a snapshot, and he doesn't like the new
header dependencies.  Simon seemed to want it in 9.5, but that was
clearly premature, IMO.

Characterizing this as being shoved in at the last moment seems
odd, since the big hang-up from the November CF was the testing
methodology in the patch.  It has been feature-complete since the
September CF, and modified based on feedback.  Granted, some
additional testing in this CF brought up a couple things that merit
a look, but this patch is hardly unique in that regard.

> I think it might be time to bounce this one to 9.7.

If there is a consensus for that, sure, or if I can't sort out the
latest issues by feature freeze (which is admittedly looming).

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Speedup twophase transactions

2016-03-30 Thread Jesper Pedersen

On 03/30/2016 09:19 AM, Stas Kelvich wrote:

> +++ b/src/test/recovery/t/006_twophase.pl
> @@ -0,0 +1,226 @@
> +# Checks for recovery_min_apply_delay
> +use strict;
> This description is wrong, this file has been copied from 005.

Yep, done.

>
> +my $node_master = get_new_node("Candie");
> +my $node_slave = get_new_node('Django');
> Please let's use a node names that are more descriptive.

Hm, it’s hard to create descriptive names because the test changes master/slave
roles for those nodes several times during the test. It’s possible to call them
“node1” and “node2” but I’m not sure that improves anything. But anyway I’m not
insisting on those particular names and will agree with any reasonable
suggestion.

>
> +# Switch to synchronous replication
> +$node_master->append_conf('postgresql.conf', qq(
> +synchronous_standby_names = '*'
> +));
> +$node_master->restart;
> Reloading would be fine.

Good catch, done.

>
> +   /* During replay that lock isn't really necessary, but let's take
> it anyway */
> +   LWLockAcquire(TwoPhaseStateLock, LW_EXCLUSIVE);
> +   for (i = 0; i < TwoPhaseState->numPrepXacts; i++)
> +   {
> +   gxact = TwoPhaseState->prepXacts[i];
> +   proc = &ProcGlobal->allProcs[gxact->pgprocno];
> +   pgxact = &ProcGlobal->allPgXact[gxact->pgprocno];
> +
> +   if (TransactionIdEquals(xid, pgxact->xid))
> +   {
> +   gxact->locking_backend = MyBackendId;
> +   MyLockedGxact = gxact;
> +   break;
> +   }
> +   }
> +   LWLockRelease(TwoPhaseStateLock);
> Not taking ProcArrayLock here?

All accesses to 2pc dummies in ProcArray are covered with TwoPhaseStateLock, so
I think that’s safe. Also I’ve deleted the comment above that block; probably it’s
more confusing than descriptive.

>
> The comment at the top of XlogReadTwoPhaseData is incorrect.

Yep, fixed.

>
> RecoverPreparedFromXLOG and RecoverPreparedFromFiles have a lot of
> code in common, having this duplication is not good, and you could
> simplify your patch.

I reworked the patch to avoid duplicated code between
RecoverPreparedFromXLOG/RecoverPreparedFromFiles and also
between FinishPreparedTransaction/XlogRedoFinishPrepared.




Patch applies with hunks, and test cases are passing.

Best regards,
 Jesper





Re: [HACKERS] pg_dump dump catalog ACLs

2016-03-30 Thread Robert Haas
On Wed, Mar 30, 2016 at 1:35 PM, Jose Luis Tallon
 wrote:
> DESIGN/DOCUMENTATION
> * int4 for the "objsubid" field? An int2 would surely suffice, given that we 
> only allow up to 1600 columns ... if it's a matter of alignment, it would be 
> interesting to say so somewhere (code comments, maybe?)

There is a lot of existing precedent for objsubid being int4, and
basically no advantage to making it int2.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [COMMITTERS] pgsql: Introduce SP-GiST operator class over box.

2016-03-30 Thread Tom Lane
Teodor Sigaev  writes:
> Introduce SP-GiST operator class over box.

All of the Windows buildfarm members are failing on this patch.
It looks like the problem is that the test cases suppose that type
box will allow "infinity" as a coordinate value.  But box_in just
uses strtod() to read coordinates, and that has platform-dependent
behavior, which in this case includes not recognizing "infinity".

I'm inclined to think that the best fix is to rearrange things
so that the box I/O routines use float8in and float8out for
coordinates, rather than assuming that direct use of the platform
primitives is sufficient.

Barring better ideas, I'll go make that happen.

Welcome to the club of people who have broken the Windows build
this week ;-)

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [CommitFest App] Feature request -- review e-mail additions

2016-03-30 Thread Alvaro Herrera
José Luis Tallón wrote:

> Just wanted to suggest two minor mods to the review e-mails
> auto-generated by the app:
> 
> * Prepend a [review] tag to the e-mail subject
> ... so that e-mails sent to -hackers will read  " [HACKERS] [review]
> "

Changing the subject of an email causes Gmail to break the threads, so
anything in that line should be discouraged.  -1 from me.  I would be
happier if the subject of the submission email is kept intact, i.e. not
use the patch title that was used in commitfest app but use the one in
the email.  These often differ.

> * Auto-CC the patch author on this e-mail
> I guess this should speed up reactions / make communication a bit more
> fluid.

Yes, strong +1 on this.

3) Have the auto-generated emails insert In-Reply-To headers (and
perhaps References).

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] [CommitFest App] Feature request -- review e-mail additions

2016-03-30 Thread José Luis Tallón

Hello,

Just wanted to suggest two minor mods to the review e-mails 
auto-generated by the app:


* Prepend a [review] tag to the e-mail subject
... so that e-mails sent to -hackers will read  " [HACKERS] 
[review] "


* Auto-CC the patch author on this e-mail
I guess this should speed up reactions / make communication a bit 
more fluid.



Dunno whether it'd be appreciated if I posted a tentative patch ¿?


Thanks,

/ J.L.



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Desirable pgbench features?

2016-03-30 Thread Josh berkus
On 03/30/2016 08:29 AM, Fabien wrote:

> (1) TPC-B test driver must obtain a value from a query (the branch is
> the one
> of the chosen teller, not any random branch) and reuse it in another
> query. Currently the communication is one way, all results are silently
> discarded.
> 
> This is not representative of client applications (say a web app) which
> interact with the server with some logic of their own, including
> reading
> things and writing others depending on the previous reading.
> 
> This may be simulated on server side with a plpgsql script, but that
> would not exercise the client/server protocol logic and its performance
> impact, so I think that this simple read capability is important and
> missing.

Yes.  Particularly, one of the things I'd like to benchmark is
load-balancing between masters and replicas, including checks for
coherency.  Without being able to retrieve and reuse values, this can't
be tested.

The simplest way I'd see doing this is being able to SELECT INTO a
pgbench variable.


> (5) Consistency check: after a run, some properties are expected to be
> true, such as the balances of branches is the balance of its
> tellers and also of its accounts... This should/could be checked,
> maybe with an additional query.

I'd also love to have a consistency check which would be client-only
which I could run in the pgbench unit itself.  That is, a way to log
"errors" if, say, two variables were not equal at the end of the unit of
work.

An example of this would be using this to test if load-balanced
connections were getting "stale reads", especially since the
*percentage* of stale reads is what I want to know.  5% is acceptable,
50% is not.

> * using values from a query
> 
> For this use case (1), the best syntax and implementation is unclear. In
> particular, I'm not fond of the \gset syntax used in psql because the ';'
> is dropped and the \gset seems to act as a statement terminator.
> 
> After giving it some thought, I would suggest a simple two-line explicit
> syntax compatible with current conventions, with a SELECT statement
> terminated with a ';', on one side and where to put the results on the
> other, something like:
> 
>   SELECT ... ;
>   \into some variable names

This works for me if it works for the parser.

> 
> Or maybe in the other way around:
> 
>   \setsql some variable names
>   SELECT ... ;

This also works, but is not my preference.  It would be somewhat harder
to avoid variable/column mismatches.

One more wishlist item, which would make my request above for unit tests
unnecessary:

* Allow custom logging:

\vlog TAG varname1, varname2

Which would produce a custom log file called:

PID.TAG.varlog

With the format:

timestamp, var1, var2

e.g. if I had this:

SELECT id, abalance FROM account WHERE id = :aid
\into :lid, :lbal

\vlog balancelog :lid, :lbal

It would create a file called:

2247.balancelog.varlog

and/or append a line:

2016-03-30 21:37:33.899, 511, 2150

This would allow CSV logging of all sorts of user custom information,
including de-facto response times.

-- 
--
Josh Berkus
Red Hat OSAS
(any opinions are my own)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [GENERAL] pg_restore casts check constraints differently

2016-03-30 Thread Amit Langote
On Thu, Mar 31, 2016 at 1:00 AM, Tom Lane  wrote:
> I wrote:
>> Amit Langote  writes:
>>> destdb=# ALTER TABLE c ADD CONSTRAINT p_a_check CHECK (a IN ('a', 'b', 
>>> 'c'));
>>> destdb=# \d c
>>> ...
>>> Check constraints:
>>> "p_a_check" CHECK (a::text = ANY (ARRAY['a'::character varying,
>>> 'b'::character varying, 'c'::character varying]::text[]))
>
>> Hm.  It seems like the parser is doing something weird with IN there.
>> I wonder why you don't get an array of text constants in the IN case.
>
> I poked into this and determined that it happens because transformAExprIn
> identifies the array type to use without considering whether an additional
> coercion will have to happen before the array elements can be compared to
> the lefthand input.
>
> I tried to fix that in a principled fashion, by resolving the actual
> comparison operator and using its righthand input type as the array
> element type (see first patch attached).  That fixes this case all right,
> but it also makes several existing regression test cases fail, for
> example:
>
> ***
> *** 381,392 
>  FROM pg_class
>  WHERE oid::regclass IN ('a_star', 'c_star')
>  ORDER BY 1;
> !  relname | has_toast_table
> ! -+-
> !  a_star  | t
> !  c_star  | t
> ! (2 rows)
> !
>   --UPDATE b_star*
>   --   SET a = text 'gazpacho'
>   --   WHERE aa > 4;
> --- 381,389 
>  FROM pg_class
>  WHERE oid::regclass IN ('a_star', 'c_star')
>  ORDER BY 1;
> ! ERROR:  invalid input syntax for type oid: "a_star"
> ! LINE 3:WHERE oid::regclass IN ('a_star', 'c_star')
> !^
>   --UPDATE b_star*
>   --   SET a = text 'gazpacho'
>   --   WHERE aa > 4;
>
> The problem is that regclass, like varchar, has no comparison operators
> of its own, relying on OID's operators.  So this patch causes us to choose
> OID not regclass as the type of the unknown literals, which in this case
> seems like a loss of useful behavior.

Agreed; no need to break that.

> I'm tempted to just improve the situation for varchar with a complete
> kluge, ie the second patch attached.  Thoughts?

Fixes for me.

Thanks,
Amit


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [postgresSQL] [bug] Two or more different types of constraints with same name creates ambiguity while drooping.

2016-03-30 Thread Tom Lane
Alvaro Herrera  writes:
> Tom Lane wrote:
>> What we need is a unique index on pg_constraint.
>> The problem with that is that pg_constraint contains both table-related
>> and type (domain) related constraints; but it strikes me that we could
>> probably create a unique index on (conrelid, contypid, conname).

> Weren't you proposing elsewhere to split pg_constraint in two catalogs,
> one for table constraint and another for domain constraints?  That seems
> a cleaner solution to me.

Yeah, and you'll notice how much progress we've made towards that.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] snapshot too old, configured by time

2016-03-30 Thread Tom Lane
Alvaro Herrera  writes:
> I think a safer proposition would be to replace all current
> BufferGetPage() calls (there are about 500) by adding the necessary
> arguments: buffer, snapshot, rel, and an integer "flags".  All this
> without adding the feature.  Then a subsequent commit would add the
> TestForOldSnapshot inside BufferGetPage, *except* when a
> BUFFER_NO_SNAPSHOT_TEST flag is passed.  That way, new code always get
> the snapshot test by default.

That seems awfully invasive, not to mention performance-killing if
the expectation is that most such calls are going to need a snapshot
check.  (Quite aside from the calls themselves, are they all in
routines that are being passed the right snapshot today?)

TBH, I think that shoving in something like this at the end of the last
commitfest would be a bad idea even if there were widespread consensus
that we wanted the feature ... which I am not sure there is.

I think it might be time to bounce this one to 9.7.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Access method extendability

2016-03-30 Thread Markus Nullmeier
Alexander Korotkov  wrote:
> I heard no objections.  There is revision of patch where generic WAL
> interface description was moved to documentation.  This description
> contains improvements by Petr Jelinek, Alvaro Herrera and Markus Nullmeier

Attached are a few more small fixes as an incremental patch (typos / etc.).

-- 
Markus Nullmeier                http://www.g-vo.org
German Astrophysical Virtual Observatory (GAVO)

diff --git a/doc/src/sgml/generic-wal.sgml b/doc/src/sgml/generic-wal.sgml
index a00c03c..d756e33 100644
--- a/doc/src/sgml/generic-wal.sgml
+++ b/doc/src/sgml/generic-wal.sgml
@@ -4,15 +4,15 @@
  Generic WAL records
 
   
-   Despite all built in access methods and WAL-logged modules have their own
-   types of WAL records, there is also generic WAL record type which describes
+   Despite all built-in access methods and WAL-logged modules having their own
+   types of WAL records, there is also a generic WAL record type, which describes
changes to pages in a generic way.  This is useful for extensions that
provide custom access methods, because they cannot register their own
WAL redo routines.
   
 
   
-   API for contructing generic WAL records is defined in
+   The API for contructing generic WAL records is defined in
generic_xlog.h and implemented in generic_xlog.c.
Each generic WAL record must be constructed by following these steps:
 
diff --git a/src/backend/access/transam/generic_xlog.c b/src/backend/access/transam/generic_xlog.c
index 32c2648..1a720fa 100644
--- a/src/backend/access/transam/generic_xlog.c
+++ b/src/backend/access/transam/generic_xlog.c
@@ -19,7 +19,7 @@
 #include "utils/memutils.h"
 
 /*-
- * Internally, a delta between pages consists of set of fragments.  Each
+ * Internally, a delta between pages consists of a set of fragments.  Each
  * fragment represents changes made in a given region of a page.  A fragment
  * is made up as follows:
  *
@@ -29,7 +29,7 @@
  *
  * Unchanged regions of a page are not represented in its delta.  As a
  * result, a delta can be more compact than the full page image.  But having
- * an unchanged region in the middle to two fragments that is smaller than
+ * an unchanged region in the middle of two fragments that is smaller than
  * the fragment header (offset and length) does not pay off in terms of the
  * overall size of the delta. For this reason, we break fragments only if
  * the unchanged region is bigger than MATCH_THRESHOLD.
@@ -422,7 +422,7 @@ generic_redo(XLogReaderState *record)
 
 	Assert(record->max_block_id < MAX_GENERIC_XLOG_PAGES);
 
-	/* Interate over blocks */
+	/* Iterate over blocks */
 	for (block_id = 0; block_id <= record->max_block_id; block_id++)
 	{
 		XLogRedoAction action;

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Password identifiers, protocol aging and SCRAM protocol

2016-03-30 Thread José Luis Tallón

On 03/30/2016 06:14 PM, Robert Haas wrote:
> So basically the use of the ENCRYPTED keyword means "if it does
> already seem to be the sort of MD5 blob we're expecting, turn it into
> that".


If it does NOT already seem to be... I guess?

> And we just rely on the format to distinguish between an MD5 verifier
> and an unencrypted password. Personally, I think a good start here,
> and I think you may have something like this in the patch already,
> would be to split rolpassword into two columns, say rolencryption and
> rolpassword.


This inches closer to Michael's suggestion to have multiple verifiers 
per pg_authid user ...


> rolencryption says how the password verifier is encrypted and
> rolpassword contains the verifier itself. Initially, rolencryption
> will be 'plain' or 'md5', but later we can add 'scram' as another
> choice, or maybe it'll be more specific like 'scram-hmac-doodad'.


May I suggest using  "{" <scheme> ["." <encoding>] "}" just like Dovecot does?

e.g. "{md5.hex}e748797a605a1c95f3d6b5f140b2d528"

where no "{ ... }" prefix means just fallback to the old method of 
trying to guess what the blob contains?
This would invalidate PLAIN passwords beginning with "{", though, 
so some measures would be needed.


> And then maybe introduce syntax like this: alter user rhaas set
> password 'raw-unencrypted-passwordt' using 'verifier-method'; alter
> user rhaas set password verifier 'verifier-goes-here' using
> 'verifier-method'; That might require making verifier a key word,
> which would be good to avoid. Perhaps we could use "password
> validator" instead?


I like USING best ... though by prepending the scheme for ENCRYPTED,
the required information is already conveyed within the verifier, so there is
no need to specify it again :)



Just my .02€


/ J.L.



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [postgresSQL] [bug] Two or more different types of constraints with same name creates ambiguity while drooping.

2016-03-30 Thread Alvaro Herrera
Tom Lane wrote:

> What we need is a unique index on pg_constraint.
>
> The problem with that is that pg_constraint contains both table-related
> and type (domain) related constraints; but it strikes me that we could
> probably create a unique index on (conrelid, contypid, conname).

Weren't you proposing elsewhere to split pg_constraint in two catalogs,
one for table constraint and another for domain constraints?  That seems
a cleaner solution to me.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Password identifiers, protocol aging and SCRAM protocol

2016-03-30 Thread Robert Haas
On Wed, Mar 30, 2016 at 9:46 AM, Michael Paquier
 wrote:
>> Things I noticed:
>> 1.
>> when using either
>> CREATE ROLE
>> ALTER ROLE
>> with the parameter
>> ENCRYPTED
>> md5 encryption is always assumed (I've come to realize that UNENCRYPTED
>> always equals plain and, in the past, ENCRYPTED equaled md5 since there were
>> no other options)
>
> Yes, that's to match the current behavior, and make something fully
> backward-compatible. Switching to md5 + scram may have made sense as
> well though.

I think we're not going to have much luck getting people to switch
over to SCRAM if the default remains MD5.  Perhaps there should be a
GUC for this - and we can initially set that GUC to md5, allowing
people who are ready to adopt SCRAM to change it.  And then in a later
release we can change the default, once we're pretty confident that
most connectors have added support for the new authentication method.
This is going to take a long time to roll out.  Alternatively, we
could control it strictly through DDL.
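
(If we went the GUC route, switching over could then be as simple as something
like the following -- the setting value is purely illustrative, nothing is
decided here:

    ALTER SYSTEM SET password_encryption = 'scram';
    SELECT pg_reload_conf();

after which newly set passwords would get SCRAM verifiers.)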

Note that the existing behavior is pretty wonky:

alter user rhaas unencrypted password 'foo'; -> rolpassword foo
alter user rhaas encrypted password 'foo'; -> rolpassword
md5e748797a605a1c95f3d6b5f140b2d528
alter user rhaas encrypted password
'md5e748797a605a1c95f3d6b5f140b2d528'; -> rolpassword
md5e748797a605a1c95f3d6b5f140b2d528
alter user rhaas unencrypted password
'md5e748797a605a1c95f3d6b5f140b2d528'; -> rolpassword
md5e748797a605a1c95f3d6b5f140b2d528

So basically the use of the ENCRYPTED keyword means "if it does
already seem to be the sort of MD5 blob we're expecting, turn it into
that".  And we just rely on the format to distinguish between an MD5
verifier and an unencrypted password.  Personally, I think a good
start here, and I think you may have something like this in the patch
already, would be to split rolpassword into two columns, say
rolencryption and rolpassword.  rolencryption says how the password
verifier is encrypted and rolpassword contains the verifier itself.
Initially, rolencryption will be 'plain' or 'md5', but later we can
add 'scram' as another choice, or maybe it'll be more specific like
'scram-hmac-doodad'.  And then maybe introduce syntax like this:

alter user rhaas set password 'raw-unencrypted-passwordt' using
'verifier-method';
alter user rhaas set password verifier 'verifier-goes-here' using
'verifier-method';

That might require making verifier a key word, which would be good to
avoid.  Perhaps we could use "password validator" instead?
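
(To illustrate the catalog split, the examples above might then come out
roughly as follows -- a sketch only, with the column names as proposed here
and the values illustrative:

    SELECT rolname, rolencryption, rolpassword FROM pg_authid WHERE rolname = 'rhaas';
     rolname | rolencryption |            rolpassword
    ---------+---------------+----------------------------------
     rhaas   | md5           | e748797a605a1c95f3d6b5f140b2d528

with the method prefix no longer baked into rolpassword itself.)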

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [postgresSQL] [bug] Two or more different types of constraints with same name creates ambiguity while drooping.

2016-03-30 Thread Tom Lane
Andrew Dunstan  writes:
> On 03/30/2016 10:21 AM, Tom Lane wrote:
>> I think that if we want to ensure uniqueness of constraint names, this
>> is really approaching it the wrong way, as it still fails to provide
>> any guarantees (consider concurrent index creation, for example).
>> What we need is a unique index on pg_constraint.

> +1, but does that mean people will have to change constraint names to be 
> compliant before running pg_upgrade?

Yeah, but I think the situation is pretty uncommon, because we already
reject duplicate constraint names in most cases.  As far as I could see
in testing it earlier, these cases all fail already:

* create index constraint when same-named index constraint exists already
* create FK constraint when same-named index constraint exists already
* create FK constraint when same-named FK constraint exists already
* create check constraint when same-named check constraint exists already
* create FK constraint when same-named check constraint exists already

I think that the case Amit's patch plugged, namely create index constraint
when same-named FK or check constraint exists already, may be about the
only missing check.  I just want a unique index to be sure we are covering
all cases.
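
For reference, that missing check corresponds to a sequence like this (a
sketch only -- table and constraint names are arbitrary, behavior as described
upthread):

    CREATE TABLE t (a int NOT NULL);
    ALTER TABLE t ADD CONSTRAINT t_key CHECK (a > 0);
    ALTER TABLE t ADD CONSTRAINT t_key PRIMARY KEY (a);  -- currently not rejected
    ALTER TABLE t DROP CONSTRAINT t_key;                 -- ambiguous target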

Note also that because pg_dump prefers to create indexes before FK
constraints (for obvious reasons), I believe that such a case would
fail to dump/restore or pg_upgrade already.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] snapshot too old, configured by time

2016-03-30 Thread Alvaro Herrera
Michael Paquier wrote:

> page = BufferGetPage(buf);
> +   TestForOldSnapshot(scan->xs_snapshot, rel, page);
> This is a sequence repeated many times in this patch, a new routine,
> say BufferGetPageExtended with a uint8 flag, one flag being used to
> test old snapshots would be more adapted. But this would require
> creating a header dependency between the buffer manager and
> SnapshotData.. Or more simply we may want a common code path when
> fetching a page that a backend is going to use to fetch tuples. I am
> afraid of TestForOldSnapshot() being something that could be easily
> forgotten in code paths introduced in future patches...

I said exactly the same thing, and Kevin dismissed it.

I would be worried about your specific proposal though, because it's
easy to just call BufferGetPage() (i.e. the not-extended version) and
forget the old-snapshot protection completely.

I think a safer proposition would be to replace all current
BufferGetPage() calls (there are about 500) by adding the necessary
arguments: buffer, snapshot, rel, and an integer "flags".  All this
without adding the feature.  Then a subsequent commit would add the
TestForOldSnapshot inside BufferGetPage, *except* when a
BUFFER_NO_SNAPSHOT_TEST flag is passed.  That way, new code always get
the snapshot test by default.

I don't like the new header dependency either, though.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [GENERAL] pg_restore casts check constraints differently

2016-03-30 Thread Tom Lane
I wrote:
> Amit Langote  writes:
>> destdb=# ALTER TABLE c ADD CONSTRAINT p_a_check CHECK (a IN ('a', 'b', 'c'));
>> destdb=# \d c
>> ...
>> Check constraints:
>> "p_a_check" CHECK (a::text = ANY (ARRAY['a'::character varying,
>> 'b'::character varying, 'c'::character varying]::text[]))

> Hm.  It seems like the parser is doing something weird with IN there.
> I wonder why you don't get an array of text constants in the IN case.

I poked into this and determined that it happens because transformAExprIn
identifies the array type to use without considering whether an additional
coercion will have to happen before the array elements can be compared to
the lefthand input.

I tried to fix that in a principled fashion, by resolving the actual
comparison operator and using its righthand input type as the array
element type (see first patch attached).  That fixes this case all right,
but it also makes several existing regression test cases fail, for
example:

***
*** 381,392 
 FROM pg_class
 WHERE oid::regclass IN ('a_star', 'c_star')
 ORDER BY 1;
!  relname | has_toast_table 
! -+-
!  a_star  | t
!  c_star  | t
! (2 rows)
! 
  --UPDATE b_star*
  --   SET a = text 'gazpacho'
  --   WHERE aa > 4;
--- 381,389 
 FROM pg_class
 WHERE oid::regclass IN ('a_star', 'c_star')
 ORDER BY 1;
! ERROR:  invalid input syntax for type oid: "a_star"
! LINE 3:WHERE oid::regclass IN ('a_star', 'c_star')
!^
  --UPDATE b_star*
  --   SET a = text 'gazpacho'
  --   WHERE aa > 4;

The problem is that regclass, like varchar, has no comparison operators
of its own, relying on OID's operators.  So this patch causes us to choose
OID not regclass as the type of the unknown literals, which in this case
seems like a loss of useful behavior.

I'm tempted to just improve the situation for varchar with a complete
kluge, ie the second patch attached.  Thoughts?

regards, tom lane

diff --git a/src/backend/parser/parse_expr.c b/src/backend/parser/parse_expr.c
index ebda55d..8d75c88 100644
*** a/src/backend/parser/parse_expr.c
--- b/src/backend/parser/parse_expr.c
***
*** 15,20 
--- 15,22 
  
  #include "postgres.h"
  
+ #include "access/htup_details.h"
+ #include "catalog/pg_operator.h"
  #include "catalog/pg_type.h"
  #include "commands/dbcommands.h"
  #include "miscadmin.h"
***
*** 35,40 
--- 37,43 
  #include "parser/parse_agg.h"
  #include "utils/builtins.h"
  #include "utils/lsyscache.h"
+ #include "utils/syscache.h"
  #include "utils/xml.h"
  
  
*** transformAExprIn(ParseState *pstate, A_E
*** 1148,1166 
  	 */
  	if (list_length(rnonvars) > 1)
  	{
- 		List	   *allexprs;
  		Oid			scalar_type;
  		Oid			array_type;
  
  		/*
! 		 * Try to select a common type for the array elements.  Note that
! 		 * since the LHS' type is first in the list, it will be preferred when
! 		 * there is doubt (eg, when all the RHS items are unknown literals).
! 		 *
! 		 * Note: use list_concat here not lcons, to avoid damaging rnonvars.
  		 */
! 		allexprs = list_concat(list_make1(lexpr), rnonvars);
! 		scalar_type = select_common_type(pstate, allexprs, NULL, NULL);
  
  		/*
  		 * Do we have an array type to use?  Aside from the case where there
--- 1151,1225 
  	 */
  	if (list_length(rnonvars) > 1)
  	{
  		Oid			scalar_type;
  		Oid			array_type;
+ 		Node	   *which_expr;
  
  		/*
! 		 * Try to select a common type for the array elements.  In the general
! 		 * case we should let select_common_type() do this.  But if the RHS is
! 		 * all unknown literals, select_common_type() will return TEXT which
! 		 * we do not want here: we want to let oper() see the unknown type and
! 		 * resolve it.  So, if it said TEXT but the input is really UNKNOWN,
! 		 * go back to UNKNOWN.
  		 */
! 		scalar_type = select_common_type(pstate, rnonvars, NULL, &which_expr);
! 
! 		if (scalar_type == TEXTOID && exprType(which_expr) == UNKNOWNOID)
! 			scalar_type = UNKNOWNOID;
! 
! 		/*
! 		 * If we identified a common type, check to see if that's actually
! 		 * what the operator wants, and if not change to what it does want.
! 		 * This takes care of resolving UNKNOWN as some appropriate type.
! 		 * Also, if the operator would require a coercion, we might as well do
! 		 * the coercion on the individual array elements rather than having to
! 		 * apply an array-level coercion.
! 		 */
! 		if (OidIsValid(scalar_type))
! 		{
! 			/*
! 			 * This is a cut-down version of make_scalar_array_op()'s logic.
! 			 * We need not bother to duplicate error checks it will make.
! 			 */
! 			Oid			ltypeId,
! 		rtypeId;
! 			Operator	tup;
! 			Form_pg_operator opform;
! 			Oid			actual_arg_types[2];
! 			Oid			declared_arg_types[2];
! 
! 			ltypeId = exprType(lexpr);
! 			rtypeId = scalar_type;
! 
! 			/* Resolve the operator */
! 			tup = oper(pstate, a->name, ltypeId, 

Re: [HACKERS] [PATCH] we have added support for box type in SP-GiST index

2016-03-30 Thread Teodor Sigaev

Thank you, pushed

Emre Hasegeli wrote:

I'll try to explain with two-dimensional example over points. ASCII-art:


Thank you for the explanation.  Should we incorporate this with the patch?


added


I have worked on the comments of the patch.  It is attached.  I hope
it looks more clear than it was before.


+ cmp_double(const double a, const double b)


Does this function necessary?  We can just compare the doubles.


Yes, it compares with limited precision as it does by geometry operations.
Renamed to point actual arguments.


I meant that we could use FP macros directly instead of this function.
Doing so is also more correct.  I haven't tried to produce the
problem, but this function is not the same as using the macros directly.
For differences smaller than the epsilon, it can return different
results.  I have removed it in the attached version.
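
(As a point of reference, the built-in geometric comparisons are fuzzy to
about EPSILON, i.e. 1e-6; for example, something like

    SELECT point '(1,1)' ~= point '(1.0000001,1)';

returns true, which is the limited-precision behavior referred to above.)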


+ boxPointerToRangeBox(BOX *box, RangeBox * rectangle)


The "rectangle" variable in here should be renamed.


fixed


I found a bunch of those too and renamed for clarity.  I have also
reordered the arguments of the helper functions to keep them
consistent.


evalRangeBox() is used to initialize fields of RangeBox in evalRectBox(). If
evalRangeBox() pallocs its result then we need to copy its result into the
RangeBox struct and free it. Let both functions use the same interface.


I found it nicer to copy and edit the existing value than creating it
in two steps like this.  It is also attached.


it works as long as the allTheSame branch contains only one inner node. Otherwise
traversalValue will be freed several times, see spgWalk().


It just works, when traversalValues is not set.  It is also attached.

I have also added the missing regression tests.  While doing that I
noticed that some operators are missing and also added support for
them.  It is also attached with the updated version of my test script.

I hope I haven't changed the patch too much.  I have also pushed the
changes here:

https://github.com/hasegeli/postgres/commits/box-spgist



--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] snapshot too old, configured by time

2016-03-30 Thread Kevin Grittner
On Tue, Mar 29, 2016 at 8:58 AM, David Steele  wrote:

> We're getting to the end of the CF now.  Do you know when you'll have an
> updated patch ready?

I am working on it right now.  Hopefully I can get it all sorted today.

-- 
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: index-only scans with partial indexes

2016-03-30 Thread Kevin Grittner
On Wed, Mar 30, 2016 at 8:40 AM, Tomas Vondra
 wrote:

>> ===
>> @@ -2697,6 +2697,7 @@ check_partial_indexes(PlannerInfo *root, RelOptInfo
>> *rel)
>> continue;   /* don't repeat
>> work if already proven OK */
>>
>> have_partial = true;
>> +   break;
>> }
>> if (!have_partial)
>> return;
>> ===
>>
>> The initialization has been moved to set_rel_size so the break
>> can be restored.
>
> FWIW the break was restored in the v9 by me.

Yeah, I kept that in v10, so the three of us seem to be on the same
page there.

>> FWIW, as mentioned upthread, I added the following condition to
>> decline ripping index predicates from base restrictinfo without
>> understanding the reason to do so.
>
> U, I'm a bit confused. Which condition?

Yeah, any additional discussion about areas which anyone sees as
open or still needing attention might allow me to get enough
traction to wrap this; I'm having trouble seeing what the pending
issues are where both Tom and Kyotaro-san seemed to be unsatisfied.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sequence Access Method WIP

2016-03-30 Thread Petr Jelinek

Hi,

Thanks for review.

On 30/03/16 15:22, Jose Luis Tallon wrote:

[Partial review] Evaluated: 0002-gapless-seq-2016-03-29-2.patch
Needs updating code copyright years ... or is this really from 2013? [ 
contrib/gapless_seq/gapless_seq.c ]
Patch applies cleanly to current master 
(3063e7a84026ced2aadd2262f75eebbe6240f85b)



Ouch good point, it does show how long the whole sequence am thing has 
been around.



DESIGN
The decision to hardcode the schema GAPLESS_SEQ_NAMESPACE ("gapless_seq") and 
VALUES_TABLE_NAME ("seqam_gapless_values") strikes me as a bit odd: while I understand the 
creation of a private schema named after the extension, it seems overkill for just a single table.
I would suggest to devote some reserved schema name for internal implementation 
details and/or AM implementation details, if deemed reasonable.
On the other hand, creating the schema/table upon extension installation makes 
the values table use the default tablespace for the database, which can be good 
for concurrency --- direct writes to less loaded storage
(Note that users may want to move this table into an SSD-backed tablespace 
or equivalently fast storage ... moreso when many --not just one-- gapless 
sequences are being used)
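
For instance (tablespace name purely illustrative):

    ALTER TABLE gapless_seq.seqam_gapless_values SET TABLESPACE fast_ssd;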



Schema is needed for the handler function as well. In general I don't 
want to add another internal schema that will be empty when no sequence 
AM is installed.



Yet, there is <20141203102425.gt2...@alap3.anarazel.de> where Andres argues 
against anything different than one-page-per-sequence implementations.
I guess this patch changes everything in this respect.


Not really, gapless just needs a table for transactionality and as such is 
an exception. It does not change how the nontransactional sequence 
storage works though. I agree with Andres on this one.



CODE
   seqam_gapless_init(Relation seqrel, Form_pg_sequence seq, int64 
restart_value,
 bool restart_requested, bool is_init)
   -> is_init sounds confusing; suggest renaming to "initialize" or "initial" to avoid 
reading as "is_initIALIZED"


Sounds good.


DEPENDS ON 0001-seqam-v10.patch, which isn't committed yet --- and it doesn't 
apply cleanly to current git master.
Please update/rebase the patch and resubmit.



The current version of seqam is 0001-seqam-2016-03-29 which should apply 
correctly.


--
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Desirable pgbench features?

2016-03-30 Thread Fabien


Hello pgdevs,

I've been having a look at why pgbench only implements a TPC-B "like" benchmark,
and not the full, although obsolete, TPC-B.

I'm not particulary interested in running TPC-B per se, although I like 
the principle of a simple steady-state update-intensive OLTP stress test, 
but more at what relevant capabilities are missing in pgbench to implement 
a simple test.


I also had a brief look at the TPC-C benchmark, but it is several orders of
magnitude more complex, and there is an open source implementation available
(TPC-C-Uva).

(0) TPC-B assumes signed integers which can hold 10 figures, which means
that currently used INTs are not enough, it should be INT8. Quite easy
to fix, or extend to keep some upward compatibility with the previous
version.

(1) TPC-B test driver must obtain a value from a query (the branch is the one
of the chosen teller, not any random branch) and reuse it in another
query. Currently the communication is one way, all results are silently
discarded.

This is not representative of client applications (say a web app) which
interact with the server with some logic of their own, including reading
things and writing others depending on the previous reading.

This may be simulated on server side with a plpgsql script, but that
would not exercise the client/server protocol logic and its performance
impact, so I think that this simple read capability is important and
missing.

(2) There is an "if" required in TPC-B with a boolean condition when
selecting an account: 85% of the time the account does an operation in
its own branch, 15% in another branch.

(3) More binary operators would be useful, eg | is used by TPC-C for
building a non uniform random generation from a uniform random one
(although pgbench has clean exponential and gaussian randoms, which
still lack some shuffling).

(4) Weighted transaction scripts are used by TPC-C, some of which can be
voluntarily aborted. This already works with recently added weights
and using ROLLBACK instead of END in a given script. Nothing to do.

(5) Consistency check: after a run, some properties are expected to be
true, such as the balances of branches is the balance of its
tellers and also of its accounts... This should/could be checked,
maybe with an additional query.
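
For instance, for (5), something along these lines could be run after the test
(standard pgbench table names; just a sketch):

  SELECT b.bid
  FROM pgbench_branches b
  WHERE b.bbalance <> (SELECT coalesce(sum(tbalance), 0)
                       FROM pgbench_tellers t WHERE t.bid = b.bid)
     OR b.bbalance <> (SELECT coalesce(sum(abalance), 0)
                       FROM pgbench_accounts a WHERE a.bid = b.bid);
  -- any row returned means the balances are inconsistent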


I think that these capabilities are useful features for composing a 
reasonable bench, and allowing pgbench to do some of these, while 
remaining at a basic expression level (i.e. not creating a full 
client-side language, not my intent in any way), should be ok if the 
syntax, code and performance impacts are small.



Indeed, some of the missing features can be included, probably at low cost 
in pgbench:



* better expressions: comparisons, condition, binary operators...

Comparisons (say <= < == > >= != operators) and an if() function for (2),
on the client side, could look like that:

  \set abid if(random(0, 99) < 85, expression-1, expression-2)

This is pretty easy to implement with the current function infrastructure, 
as well as new operators such as |&^! (3).


Note: the "?:" C-like syntax looks attractive but would probably interact 
quite badly with the existing ":variable" syntax; moreover the ?: syntax 
suggests a short-circuit evaluation, but pgbench expressions are fully 
evaluated. These operators & function would probably require around 100 
lines of pretty basic code, plus doc.



* using values from a query

For this use case (1), the best syntax and implementation is unclear. In
particular, I'm not fond of the \gset syntax used in psql because the ';'
is dropped and the \gset seems to act as a statement terminator.

After giving it some thought, I would suggest a simple two-line explicit 
syntax compatible with current conventions, with a SELECT statement 
terminated with a ';', on one side and where to put the results on the 
other, something like:


  SELECT ... ;
  \into some variable names

Or maybe in the other way around:

  \setsql some variable names
  SELECT ... ;

The variable names provided could be stored in the command structure of 
the SELECT and they would be assigned when the query is executed.
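
For instance, the teller-to-branch lookup of (1) could then be written as
(a sketch only; the pgbench_* names are the standard pgbench schema, and
whether the target variables take a leading colon is one of the details to
settle):

  \set tid random(1, 10 * :scale)
  \set delta random(-5000, 5000)
  SELECT bid FROM pgbench_tellers WHERE tid = :tid;
  \into bid
  UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;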


Among the possible constraints, enforced or not, the variable types should 
be int or double, the query should be a select, there should be one row 
only (or keep the first and set to zero if nothing?), the number of 
variables should be less than the number of columns...



* shuffling function, i.e. a parametric permutation so that non uniform 
randoms can be shuffled so that rows with close pkey do not have close
drawing probabilities. This is non trivial, basically some kind of cypher 
function but which does not operate on power-of-two sizes...



* if all necessary features are available, being able to run a strict 
tpc-b bench (this means adapting the init phase, and creating a 
new builtin script which matches the real spec): no 

Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-03-30 Thread Masahiko Sawada
On Wed, Mar 30, 2016 at 11:43 PM, Masahiko Sawada  wrote:
> On Tue, Mar 29, 2016 at 5:36 PM, Kyotaro HORIGUCHI
>  wrote:
>> I personally don't think it needs such a survive measure. It is
>> very small syntax and the parser reads very short text. If the
>> parser failes in such mode, something more serious should have
>> occurred.
>>
>> At Tue, 29 Mar 2016 16:51:02 +0900, Fujii Masao  
>> wrote in 
>>> On Tue, Mar 29, 2016 at 4:23 PM, Kyotaro HORIGUCHI
>>>  wrote:
>>> > Hello,
>>> >
>>> > At Mon, 28 Mar 2016 18:38:22 +0900, Masahiko Sawada 
>>> >  wrote in 
>>> > 
>>> > sawada.mshk> On Mon, Mar 28, 2016 at 5:50 PM, Kyotaro HORIGUCHI
>>> >>  wrote:
>>> > As mentioned in my comment, SQL parser converts yy_fatal_error
>>> > into ereport(ERROR), which can be caught by the upper PG_TRY (by
>>> > #define'ing fprintf). So it is doable if you mind exit().
>>>
>>> I'm afraid that your idea doesn't work in postmaster. Because 
>>> ereport(ERROR) is
>>> implicitly promoted to ereport(FATAL) in postmaster. IOW, when an internal
>>> flex fatal error occurs, postmaster just exits instead of jumping out of 
>>> parser.
>>
>> If The ERROR may be LOG or DEBUG2 either, if we think the parser
>> fatal erros are recoverable. guc-file.l is doing so.
>>
>>> ISTM that, when an internal flex fatal error occurs, it's
>>> better to elog(FATAL) and terminate the problematic
>>> process. This might lead to the server crash (e.g., if
>>> postmaster emits a FATAL error, it and its all child processes
>>> will exit soon). But probably we can live with this because the
>>> fatal error basically rarely happens.
>>
>> I agree to this
>>
>>> OTOH, if we make the process keep running even after it gets an internal
>>> fatal error (like Sawada's patch or your idea do), this might cause more
>>> serious problem. Please imagine the case where one walsender gets the fatal
>>> error (e.g., because of OOM), abandon new setting value of
>>> synchronous_standby_names, and keep running with the previous setting value.
>>> OTOH, the other walsender processes successfully parse the setting and
>>> keep running with new setting. In this case, the inconsistency of the 
>>> setting
>>> which each walsender is based on happens. This completely will mess up the
>>> synchronous replication.
>>
>> On the other hand, guc-file.l seems ignoring parser errors under
>> normal operation, even though it may cause similar inconsistency,
>> if any..
>>
>> | LOG:  received SIGHUP, reloading configuration files
>> | LOG:  input in flex scanner failed at file 
>> "/home/horiguti/data/data_work/postgresql.conf" line 1
>> | LOG:  configuration file "/home/horiguti/data/data_work/postgresql.conf" 
>> contains errors; no changes were applied
>>
>>> Therefore, I think that it's better to make the problematic process exit
>>> with FATAL error rather than ignore the error and keep it running.
>>
>> +1. Restarting walsender would be far less harmful than keeping
>> it running in doubtful state.
>>
>> Sould I wait for the next version or have a look on the latest?
>>
>
> Attached latest patch incorporate some review comments so far, and is
> rebased against current HEAD.
>

Sorry I attached wrong patch.
Attached patch is correct patch.

Regards,

--
Masahiko Sawada


multi_sync_replication_v21.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PoC: Partial sort

2016-03-30 Thread Alexander Korotkov
Hi, Peter!

Thank you for review!

On Thu, Mar 24, 2016 at 3:39 AM, Peter Geoghegan  wrote:

> >> Sort Method
> >> 
> >>
> >> Even thought the explain analyze above shows "top-N heapsort" as its
> >> sort method, that isn't really true. I actually ran this through a
> >> debugger, which is why the above plan took so long to execute, in case
> >> you wondered. I saw that in practice the first sort executed for the
> >> first group uses a quicksort, while only the second sort (needed for
> >> the 2 and last group in this example) used a top-N heapsort.
>
> > With partial sort we run multiple sorts in the same node. Ideally, we
> need
> > to provide some aggregated information over runs.
> > This situation looks very similar to subplan which is called multiple
> times.
> > I checked how it works for now.
>
> Noticed this in nodeSort.c:
>
> +   if (node->tuplesortstate != NULL)
> +   {
> +   tuplesort_reset((Tuplesortstate *) node->tuplesortstate);
> +   node->groupsCount++;
> +   }
> +   else
> +   {
> +   /* Support structures for cmpSortSkipCols - already
> sorted columns */
> +   if (skipCols)
> +   prepareSkipCols(plannode, node);
>
> +   /*
> +* Only pass on remaining columns that are unsorted.
> Skip abbreviated
> +* keys usage for partial sort.  We unlikely will have
> huge groups
> +* with partial sort.  Therefore usage of abbreviated
> keys would be
> +* likely a waste of time.
> +*/
> tuplesortstate = tuplesort_begin_heap(tupDesc,
>
> You should comment on which case is which, and put common case (no
> skip cols) first. Similarly, the ExecSort() for(;;) should put the
> common (non-partial) case first, which it does, but then the "first
> tuple in partial sort" case first, then the "second or subsequent
> partial sort" case last.
>

Done.

More comments here, please:
>
> +typedef struct SkipKeyData
> +{
> + FunctionCallInfoData fcinfo;
> + FmgrInfo flinfo;
> + OffsetNumber attno;
> +} SkipKeyData;
>
> (What's SkipKeyData?)
>
> Also want comments for new SortState fields.


Done.


> SortState.prev is a
> palloc()'d copy of tuple, which should be directly noted, as it is for
> similar aggregate cases, etc.
>
> Should you be more aggressive about freeing memory allocated for
> SortState.prev tuples?
>

Fixed.


> The new function cmpSortSkipCols() should say "Special case for
> NULL-vs-NULL, else use standard comparison", or something. "Lets
> pretend NULL is a value for implementation convenience" cases are
> considered the exception, and are always noted as the exception.
>

Comment is added.


> > In the case of subplan explain analyze gives us just information about
> last
> > subplan run. This makes me uneasy. From one side, it's probably OK that
> > partial sort behaves like subplan while showing information just about
> last
> > sort run. From the other side, we need some better solution for that in
> > general case.
>
> I see what you mean, but I wasn't so much complaining about that, as
> complaining about the simple fact that we use a top-N heap sort *at
> all*. This feels like the "limit" case is playing with partial sort
> sub-sorts in a way that it shouldn't.
>
> I see you have code like this to make this work:
>
> +   /*
> +* Adjust bound_Done with number of tuples we've actually sorted.
> +*/
> +   if (node->bounded)
> +   {
> +   if (node->finished)
> +   node->bound_Done = node->bound;
> +   else
> +   node->bound_Done = Min(node->bound,
> node->bound_Done + nTuples);
>
> But, why bother? Why not simply prevent tuplesort.c from ever using
> the top-N heapsort method when it is called from nodeSort.c for a
> partial sort (probably in the planner)?
>
> Why, at a high level, does it make sense to pass down a limit to *any*
> sort operation that makes up a partial sort, even the last? This seems
> to be adding complexity without a benefit. A big advantage of top-N
> heapsorts is that much less memory could be used, but this patch
> already has the memory allocated that belonged to previous performsort
> calls (mostly -- certainly has all those tuplesort.c memtuples
> throughout, a major user of memory overall).  It's not going to be
> very good at preventing work, except almost by accident because we
> happen to have a limit up to just past the beginning of a smaller
> partial sort group. I'd rather use quicksort, which is very versatile.
> Top-N sorts make sense when sorting itself is the bottleneck, which it
> probably won't be for a partial sort (that's the whole point).
>

Hmm... I don't completely agree with that. In typical usage partial sort
should definitely use quicksort.  However, fallback to other sort methods
is very useful.  Decision of partial sort usage is 

Re: [HACKERS] [postgresSQL] [bug] Two or more different types of constraints with same name creates ambiguity while drooping.

2016-03-30 Thread Andrew Dunstan



On 03/30/2016 10:21 AM, Tom Lane wrote:

Amit Langote  writes:

On 2016/03/30 15:16, Harshal Dhumal wrote:

If we create two different type of constrains (lets say primary key and
foreign key) on same table with same name (lets say 'key' ) then its shows
same drop query for both constrains.

I have a vague recollection that non-uniqueness of constraint names may
have been intentional at some point.  But yeah, the existence of the
ALTER TABLE DROP CONSTRAINT syntax seems to make that a pretty bad idea.


It seems that, whereas name uniqueness check occurs when creating a named
FK constraint, the same does not occur when creating a named PK constraint
or any index-based constraint for that matter (they are handled by
different code paths - in the latter's case, name conflicts with existing
relations is checked for when creating the constraint index)

I think that if we want to ensure uniqueness of constraint names, this
is really approaching it the wrong way, as it still fails to provide
any guarantees (consider concurrent index creation, for example).
What we need is a unique index on pg_constraint.

The problem with that is that pg_constraint contains both table-related
and type (domain) related constraints; but it strikes me that we could
probably create a unique index on (conrelid, contypid, conname).  Given
the convention that conrelid is zero in a type constraint and contypid
is zero in a table constraint, this should work to enforce per-table
or per-type constraint name uniqueness.  The cost of an extra index
is a bit annoying, but we could probably make it help pay for itself
by speeding up assorted searches.





+1, but does that mean people will have to change constraint names to be 
compliant before running pg_upgrade?


cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Access method extendability

2016-03-30 Thread Aleksander Alekseev
> > http://www.postgresql.org/message-id/56eff347.20...@anastigmatix.net
> That discussion is about SQL-level types which could be stored on
> disk, not about in-memory structs

I must respectfully disagree. That discussion is also about memory
sanitizers and using them on buildfarms. Lets say you initialize a
structure like this:

st->f1 = 111;
st->f2 = 222;

... without using memset, so there could be a "hole" with uninitialized
data somewhere in between f1 and f2.

Then some code calculates a hash of this structure or does memcpy - and
1) You get unreproducible behavior - the hash is always different for the
same structure, thus it is stored in different hash buckets, etc, and as
a result you get bugs that sometimes reproduce and sometimes do not
2) There is one more place where sanitizers could report accesses to
uninitialized values and thus they still can't be used on buildfarms
where they could find a lot of serious bugs automatically. I believe
MemorySanitizer is smart enough to recognize trivial memcpy case, but
it could be confused in more complicated cases.

Anyway I suggest continuing the discussion of whether we should make
PostgreSQL sanitizers-friendly or not in the corresponding thread. So far
no one has spoken against this idea. Thus I don't think that new patches
should complicate implementing it, especially considering that it's
very simple to do and is even considered good practice according to
the PostgreSQL documentation.

-- 
Best regards,
Aleksander Alekseev
http://eax.me/


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-03-30 Thread Masahiko Sawada
On Tue, Mar 29, 2016 at 5:36 PM, Kyotaro HORIGUCHI
 wrote:
> I personally don't think it needs such a survive measure. It is
> very small syntax and the parser reads very short text. If the
> parser failes in such mode, something more serious should have
> occurred.
>
> At Tue, 29 Mar 2016 16:51:02 +0900, Fujii Masao  wrote 
> in 
>> On Tue, Mar 29, 2016 at 4:23 PM, Kyotaro HORIGUCHI
>>  wrote:
>> > Hello,
>> >
>> > At Mon, 28 Mar 2016 18:38:22 +0900, Masahiko Sawada 
>> >  wrote in 
>> > 
>> > sawada.mshk> On Mon, Mar 28, 2016 at 5:50 PM, Kyotaro HORIGUCHI
>> >>  wrote:
>> > As mentioned in my comment, SQL parser converts yy_fatal_error
>> > into ereport(ERROR), which can be caught by the upper PG_TRY (by
>> > #define'ing fprintf). So it is doable if you mind exit().
>>
>> I'm afraid that your idea doesn't work in postmaster. Because ereport(ERROR) 
>> is
>> implicitly promoted to ereport(FATAL) in postmaster. IOW, when an internal
>> flex fatal error occurs, postmaster just exits instead of jumping out of 
>> parser.
>
> If The ERROR may be LOG or DEBUG2 either, if we think the parser
> fatal erros are recoverable. guc-file.l is doing so.
>
>> ISTM that, when an internal flex fatal error occurs, it's
>> better to elog(FATAL) and terminate the problematic
>> process. This might lead to the server crash (e.g., if
>> postmaster emits a FATAL error, it and its all child processes
>> will exit soon). But probably we can live with this because the
>> fatal error basically rarely happens.
>
> I agree to this
>
>> OTOH, if we make the process keep running even after it gets an internal
>> fatal error (like Sawada's patch or your idea do), this might cause more
>> serious problem. Please imagine the case where one walsender gets the fatal
>> error (e.g., because of OOM), abandon new setting value of
>> synchronous_standby_names, and keep running with the previous setting value.
>> OTOH, the other walsender processes successfully parse the setting and
>> keep running with new setting. In this case, the inconsistency of the setting
>> which each walsender is based on happens. This completely will mess up the
>> synchronous replication.
>
> On the other hand, guc-file.l seems ignoring parser errors under
> normal operation, even though it may cause similar inconsistency,
> if any..
>
> | LOG:  received SIGHUP, reloading configuration files
> | LOG:  input in flex scanner failed at file 
> "/home/horiguti/data/data_work/postgresql.conf" line 1
> | LOG:  configuration file "/home/horiguti/data/data_work/postgresql.conf" 
> contains errors; no changes were applied
>
>> Therefore, I think that it's better to make the problematic process exit
>> with FATAL error rather than ignore the error and keep it running.
>
> +1. Restarting walsender would be far less harmful than keeping
> it running in doubtful state.
>
> Sould I wait for the next version or have a look on the latest?
>

Attached latest patch incorporate some review comments so far, and is
rebased against current HEAD.

Regards,

--
Masahiko Sawada


multi_sync_replication_v21.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Access method extendability

2016-03-30 Thread Teodor Sigaev

as we discussed recently [1] you should avoid leaving "holes" with
uninitialized data in structures. Please fix this or provide a comment
that describes why things are done here the way they are done.
[1] http://www.postgresql.org/message-id/56eff347.20...@anastigmatix.net
That discussion is about SQL-level types which could be stored on disk, not 
about in-memory structs



--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [postgresSQL] [bug] Two or more different types of constraints with same name creates ambiguity while drooping.

2016-03-30 Thread Tom Lane
Amit Langote  writes:
> On 2016/03/30 15:16, Harshal Dhumal wrote:
>> If we create two different type of constrains (lets say primary key and
>> foreign key) on same table with same name (lets say 'key' ) then its shows
>> same drop query for both constrains.

I have a vague recollection that non-uniqueness of constraint names may
have been intentional at some point.  But yeah, the existence of the
ALTER TABLE DROP CONSTRAINT syntax seems to make that a pretty bad idea.

> It seems that, whereas name uniqueness check occurs when creating a named
> FK constraint, the same does not occur when creating a named PK constraint
> or any index-based constraint for that matter (they are handled by
> different code paths - in the latter's case, name conflicts with existing
> relations is checked for when creating the constraint index)

I think that if we want to ensure uniqueness of constraint names, this
is really approaching it the wrong way, as it still fails to provide
any guarantees (consider concurrent index creation, for example).
What we need is a unique index on pg_constraint.

The problem with that is that pg_constraint contains both table-related
and type (domain) related constraints; but it strikes me that we could
probably create a unique index on (conrelid, contypid, conname).  Given
the convention that conrelid is zero in a type constraint and contypid
is zero in a table constraint, this should work to enforce per-table
or per-type constraint name uniqueness.  The cost of an extra index
is a bit annoying, but we could probably make it help pay for itself
by speeding up assorted searches.
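
Just to illustrate the shape of that (not an actual proposal), such a catalog
index would be declared in src/include/catalog/indexing.h roughly along these
lines; the OID 9999 is only a placeholder:

```c
/* Hypothetical sketch; the index name and OID 9999 are placeholders. */
DECLARE_UNIQUE_INDEX(pg_constraint_conrelid_contypid_conname_index, 9999, on pg_constraint using btree(conrelid oid_ops, contypid oid_ops, conname name_ops));
```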

regards, tom lane




Re: [HACKERS] Password identifiers, protocol aging and SCRAM protocol

2016-03-30 Thread Michael Paquier
On Wed, Mar 30, 2016 at 1:44 AM, Julian Markwort wrote:
> [This is a rather informal user-review]
>
> Here are some thoughts and experiences on using the new features. I focused
> on testing the basic functionality of setting password_encryption to scram
> and then generating some users with passwords. After that, I took a look at
> the documentation, specifically all those parts that mentioned "md5" but
> not SCRAM, so I took some time to write those down and add my thoughts on
> them.
>
> We're quite keen on seeing these features in a future release, so I suggest
> that we add these patches to the next commitfest asap in order to keep the
> discussion on this topic flowing.
>
> For those of you who like to put the authentication method itself up for
> discussion, I'd like to add that it seems fairly simple to insert code for
> new authentication mechanisms.
> In conclusion I think these patches are very useful.

The reception of the concept of multiple password verifiers for a
single role was rather... cold. So unless a committer pushes hard
for it, it is never going to show up. There is clear consensus that SCRAM
is needed though, so we may as well just focus on that.

> Things I noticed:
> 1.
> When using either CREATE ROLE or ALTER ROLE with the parameter ENCRYPTED,
> md5 encryption is always assumed (I've come to realize that UNENCRYPTED
> always equals plain and, in the past, ENCRYPTED equaled md5 since there were
> no other options).

Yes, that's to match the current behavior, and make something fully
backward-compatible. Switching to md5 + scram may have made sense as
well though.

> I don't know if this is intended behaviour.

This is an intended behavior.

> Maybe this option should be
> omitted from (or marked as deprecated in the documentation of) the CREATE/ALTER
> commands (since without this option, the password_encryption setting from
> postgresql.conf is used),
> or maybe it should have its own parameter, like
> CREATE ROLE testuser WITH LOGIN ENCRYPTED 'SCRAM' PASSWORD 'test';
> so that the desired encryption is used.
> From my point of view, this would be the sensible thing to do,
> especially if different verifiers are to be allowed (as proposed by these
> patches).

The PASSWORD VERIFIERS extension is aimed at covering this need. The
grammar of those queries is not set in stone, though.

> In either case, a bit of text explaining the (UN)ENCRYPTED option should
> be added to the documentation of the CREATE/ALTER ROLE functions.

It is specified here:
http://www.postgresql.org/docs/devel/static/sql-createrole.html
And the patch does not ignore that.

> 2.
> Documentation
> III.
> 17. Server Setup and Operation
> 17.2. Creating a Database Cluster: maybe list SCRAM as a
> possible method for securing the db-admin

Indeed.

> 19. Client Authentication
> 19.1. The pg_hba.conf File: SCRAM is not listed in the list of
> available auth_methods that can be specified in pg_hba.conf
> 19.3 Authentication Methods
> 19.3.2 Password Authentication: SCRAM would belong to the
> same category as md5 and password, as they are all password-based.
>
> 20. Database Roles
> 20.2. Role Attributes: password : list SCRAM as authentication
> method as well

Indeed.

> VI.
> ALTER ROLE: is SCRAM also dependent on the role name for salting? If
> so, add a warning.

No.

> (it doesn't seem that way; however, I'm curious as to why
> the function FlattenPasswordIdentifiers in src/backend/commands/user.c,
> called by AlterRole, passes rolname to scram_build_verifier() when that
> function does absolutely nothing with this argument)

Yeah, this argument could be removed.

> CREATE ROLE: can SCRAM also be used in the list of PASSWORD
> VERIFIERS?

Yes.

> VII.
> 49. System Catalogs:
> 49.9 pg_auth_verifiers: Column names and types are mixed up
> in description for column vervalue:

Yes, things are messed up a bit there. Thanks for noticing.

> Remark: naming inconsistency: md5
> vervalues are stored as "md5*"; why don't we take the same approach and use it
> for SCRAM hashes (i.e. "scram*")?

Perhaps this makes sense if there is no pg_auth_verifiers.
-- 
Michael




Re: [HACKERS] PATCH: index-only scans with partial indexes

2016-03-30 Thread Tomas Vondra

Hi,

On 03/30/2016 06:01 AM, Kyotaro HORIGUCHI wrote:

Thank you for polishing this.

At Tue, 29 Mar 2016 13:31:19 -0500, Kevin Grittner  wrote in 


I tried to whip this into shape, but there were a few areas I
didn't feel I had the necessary understanding to feel comfortable
taking on the committer role for it.  I've cleaned it up the best I
could, fixing whitespace and typos, eliminating an unnecessary
addition of an include, improving C comments (at least to my eye),
and adding an Assert where it seemed like a good idea to me.


Especially for this one,

===
@@ -2697,6 +2697,7 @@ check_partial_indexes(PlannerInfo *root, RelOptInfo *rel)
 			continue;		/* don't repeat work if already proven OK */
 
 		have_partial = true;
+		break;
 	}
 	if (!have_partial)
 		return;
===

The initialization has been moved to set_rel_size so the break
can be restored.


FWIW, the break was restored in v9 by me.





My own tests and those of others show performance improvements (up
to 10x for some tests) and no significant performance degradation,
even with attempts to create "worst case" scenarios.

The only changes to the regression tests are to change an "out"
file to eliminate re-checking the index predicate against the heap
on every matching row, which seems like a good thing.

I'm taking my name off as committer and marking it "Ready for
Committer".  If someone else wants to comment on the issues where
Tom and Kyotaro-san still seem unsatisfied to the point where I
can get my head around it, I could maybe take it back on as
committer -- if anyone feels that could be a net win.


FWIW, as mentioned upthread, I added the following condition to decline
ripping index predicates from the base restrictinfo, without understanding
the reason for doing so.


Uh, I'm a bit confused. Which condition?

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] unexpected result from to_tsvector

2016-03-30 Thread Shulgin, Oleksandr
On Wed, Mar 30, 2016 at 10:17 AM, Artur Zakirov wrote:

> On 29.03.2016 19:17, Shulgin, Oleksandr wrote:
>
>>
>> Hm, indeed.  Unfortunately, it is not quite easy to find "the" new RFC;
>> there has been quite a number of correcting and extending RFCs issued over
>> the last (almost) 30 years, which is not that surprising...
>>
>> Are we going to do something about it?  Is it likely that
>> relaxing/changing the rules on our side will break any possible
>> workarounds that people might have employed to make the search work like
>> they want it to work?
>>
>
> Do you mean here workarounds to recognize such values as 't...@123-reg.ro'
> as an email address? Actually I do not see any workarounds except a patch
> to PostgreSQL.
>

No, more like disallowing '_' in the host/domain names.  Anyway, that is
pure speculation on my part.

By the way, Teodor committed the patch yesterday.


I've seen that after posting my reply to the list ;-)

--
Alex


Re: [HACKERS] Sequence Access Method WIP

2016-03-30 Thread Jose Luis Tallon
The following review has been posted through the commitfest application:
make installcheck-world:  not tested
Implements feature:   not tested
Spec compliant:   not tested
Documentation:not tested

[Partial review] Evaluated: 0002-gapless-seq-2016-03-29-2.patch
The code copyright years need updating ... or is this really from 2013?
[contrib/gapless_seq/gapless_seq.c]
Patch applies cleanly to current master 
(3063e7a84026ced2aadd2262f75eebbe6240f85b)
It does compile cleanly.

DESIGN
The decision to hardcode the schema GAPLESS_SEQ_NAMESPACE ("gapless_seq") and
VALUES_TABLE_NAME ("seqam_gapless_values") strikes me as a bit odd: while I
understand the creation of a private schema named after the extension, it seems
overkill for just a single table.
I would suggest devoting some reserved schema name to internal implementation
details and/or AM implementation details, if deemed reasonable.
On the other hand, creating the schema/table upon extension installation makes
the values table use the default tablespace for the database, which can be good
for concurrency --- direct writes to less loaded storage.
   (Note that users may want to move this table into an SSD-backed tablespace
or equivalently fast storage ... more so when many --not just one-- gapless
sequences are being used)

Yet, there is <20141203102425.gt2...@alap3.anarazel.de> where Andres argues
against anything different from one-page-per-sequence implementations.
   I guess this patch changes everything in this respect.

CODE
  seqam_gapless_init(Relation seqrel, Form_pg_sequence seq, int64 restart_value,
bool restart_requested, bool is_init)
  -> is_init sounds confusing; suggest renaming to "initialize" or "initial" to 
avoid reading as "is_initIALIZED"

DEPENDS ON 0001-seqam-v10.patch, which isn't committed yet --- and it doesn't
apply cleanly to current git master.
Please update/rebase the patch and resubmit.


Re: [HACKERS] WIP: Access method extendability

2016-03-30 Thread Teodor Sigaev

GenericXLogStart(Relation relation)
{
...
if (genericXlogStatus != GXLOG_NOT_STARTED)
ereport(ERROR,
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
 errmsg("GenericXLogStart: generic xlog is already started")));


Hmm, it seems generic WAL will be in an incorrect state if an exception occurs
between the GenericXLogStart() and GenericXLogFinish() calls, because the
static variable genericXlogStatus will still contain the
GXLOG_LOGGED/GXLOG_UNLOGGED status.


I suppose it could be solved in different ways:
- remove all static variables, so GenericXLogStart() returns a struct (object)
  which encapsulates all the data needed for generic WAL work (a rough sketch
  is shown below). As far as I can see, in case of an exception there is no
  need for extra cleanup. It would also allow using generic WAL for two or
  more relations at the same time, although I don't know any useful example
  of such a feature.
- add a callback via RegisterResourceReleaseCallback() which would clean up
  the state of the genericXlogStatus variable.
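
A minimal, hypothetical sketch of the first option (not code from the patch;
all structure members shown are assumptions):

```c
/* Hypothetical sketch of the "no static state" variant; not patch code. */
typedef struct GenericXLogState
{
	Relation	rel;			/* relation being changed */
	bool		isLogged;		/* does the relation need WAL? */
	/* per-call buffers and page images would also live here */
} GenericXLogState;

GenericXLogState *
GenericXLogStart(Relation relation)
{
	GenericXLogState *state = palloc0(sizeof(GenericXLogState));

	state->rel = relation;
	state->isLogged = RelationNeedsWAL(relation);
	return state;
}

XLogRecPtr
GenericXLogFinish(GenericXLogState *state)
{
	XLogRecPtr	lsn = InvalidXLogRecPtr;

	/* ... emit the WAL record and mark buffers dirty here ... */

	pfree(state);				/* nothing global left behind on success */
	return lsn;
}
```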

--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/




Re: [HACKERS] Speedup twophase transactions

2016-03-30 Thread Stas Kelvich
On Mar 29, 2016, at 6:04 PM, David Steele  wrote:

> It looks like you should post a new patch or respond to Michael's comments.
> Marked as "waiting on author".

Yep, here it is.

On Mar 22, 2016, at 4:20 PM, Michael Paquier  wrote:

> Looking at this patch…

Thanks.

> +++ b/src/test/recovery/t/006_twophase.pl
> @@ -0,0 +1,226 @@
> +# Checks for recovery_min_apply_delay
> +use strict;
>
> This description is wrong, this file has been copied from 005.

Yep, done.

> +my $node_master = get_new_node("Candie");
> +my $node_slave = get_new_node('Django');
>
> Please let's use node names that are more descriptive.

Hm, it's hard to create descriptive names because the test changes the
master/slave roles of those nodes several times during the test. It's possible
to call them "node1" and "node2", but I'm not sure that improves anything.
Anyway, I'm not insisting on those particular names and will agree with any
reasonable suggestion.

> +# Switch to synchronous replication
> +$node_master->append_conf('postgresql.conf', qq(
> +synchronous_standby_names = '*'
> +));
> +$node_master->restart;
>
> Reloading would be fine.

Good catch, done.

> +   /* During replay that lock isn't really necessary, but let's take it anyway */
> +   LWLockAcquire(TwoPhaseStateLock, LW_EXCLUSIVE);
> +   for (i = 0; i < TwoPhaseState->numPrepXacts; i++)
> +   {
> +       gxact = TwoPhaseState->prepXacts[i];
> +       proc = &ProcGlobal->allProcs[gxact->pgprocno];
> +       pgxact = &ProcGlobal->allPgXact[gxact->pgprocno];
> +
> +       if (TransactionIdEquals(xid, pgxact->xid))
> +       {
> +           gxact->locking_backend = MyBackendId;
> +           MyLockedGxact = gxact;
> +           break;
> +       }
> +   }
> +   LWLockRelease(TwoPhaseStateLock);
>
> Not taking ProcArrayLock here?

All accesses to 2PC dummies in the ProcArray are covered by TwoPhaseStateLock,
so I think that's safe. I have also deleted the comment above that block; it
was probably more confusing than descriptive.

> The comment at the top of XlogReadTwoPhaseData is incorrect.

Yep, fixed.

> RecoverPreparedFromXLOG and RecoverPreparedFromFiles have a lot of
> code in common, having this duplication is not good, and you could
> simplify your patch.

I reworked the patch to avoid duplicated code between
RecoverPreparedFromXLOG/RecoverPreparedFromFiles and also between
FinishPreparedTransaction/XlogRedoFinishPrepared.

twophase_replay.v3.diff
Description: Binary data

--
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company




Re: [HACKERS] WIP: Access method extendability

2016-03-30 Thread Jose Luis Tallon
Referenced by commit 473b93287040b20017cc25a157cffdc5b978c254 ("Support
CREATE ACCESS METHOD"), committed by alvherre.



Re: [HACKERS] standby_schedule

2016-03-30 Thread Alvaro Herrera
Simon Riggs wrote:
> On 29 March 2016 at 22:44, Alvaro Herrera  wrote:
> 
> > I think we're at a point where we can translate the tests in the
> > src/test/regress/standby_schedule file into a PostgresNode-based test
> > (probably under src/test/recovery), or remove it.  That way, it would get
> > run all the time rather than just when somebody feels like it (which is
> > probably almost never, if at all).
> >
> > Would somebody like to volunteer?
> 
> That was under my maintenance, so I'm happy to do that, as long as it's
> after the freeze.

That works for me -- this is not a "feature" as such.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] improving GROUP BY estimation

2016-03-30 Thread Tomas Vondra

Hi,

On 03/22/2016 03:40 PM, Alexander Korotkov wrote:


I think you should send a revision of the patch including the comments
proposed by Dean Rasheed.
I'm switching the patch status to waiting on author in the commitfest.



Attached is v4 of the patch - the only difference w.r.t. v3 is that I've 
used the comment as proposed by Dean.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


estimate-num-groups-v4.patch
Description: binary/octet-stream



Re: [HACKERS] WIP: Access method extendability

2016-03-30 Thread Aleksander Alekseev
Hello

I did a brief review of the bloom contrib module, and I don't think I like it
much. Here are some issues I believe should be fixed before committing it to
PostgreSQL.

1) Most of the code is not commented. Every procedure should at least
have a brief description of what it does, what arguments it receives
and what it returns. The same goes for structures and enums.

2) There are only 3 Asserts. Surely there are many more invariants in
this contrib.

3) I don't like this code (blinsert.c):

```
typedef struct
{
BloomState  blstate;
MemoryContext   tmpCtx;
chardata[BLCKSZ];
int64   count;
} BloomBuildState;

/* ... */

/*
 * (Re)initialize cached page in BloomBuildState.
 */
static void
initCachedPage(BloomBuildState *buildstate)
{
memset(buildstate->data, 0, BLCKSZ);
BloomInitPage(buildstate->data, 0); 
buildstate->count = 0;
}
```

It looks wrong because 1) blstate and tmpCtx are left uninitialized, and 2)
as we discussed recently [1], you should avoid leaving "holes" with
uninitialized data in structures. Please fix this or provide a comment
that describes why things are done here the way they are done.

Perhaps there are also other places like this that I missed.
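
A minimal sketch of one way to avoid that, assuming the BloomBuildState layout
quoted above (initBloomBuildState is a made-up name, not a function from the
patch):

```c
/* Hypothetical sketch: zero the whole structure up front, so neither the
 * named fields nor any padding "holes" are ever left uninitialized. */
static void
initBloomBuildState(BloomBuildState *buildstate)
{
	memset(buildstate, 0, sizeof(BloomBuildState));
	BloomInitPage(buildstate->data, 0);
	/* blstate and tmpCtx are then set explicitly by the caller */
}
```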

4) I don't think I like such a code either:

```
/* blutils.c */

static BloomOptions *
makeDefaultBloomOptions(BloomOptions *opts)
{
int i;

if (!opts)
opts = palloc0(sizeof(BloomOptions));

/* ... */

/* see also blvacuum.c and other places I could miss */
``` 

Sometimes we create a new zero-initialized structure and sometimes we
use a provided one filled with some data. If I use this procedure
multiple times in a loop, memory will leak. Thus, sometimes we need to
pfree() the returned value manually and sometimes we don't. Such code is
hard to reason about. You could make it much simpler by choosing only one
thing to do --- either always allocate a new structure or always use a
provided one.
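
For example, a variant that always fills in a caller-provided structure and
never allocates would keep the ownership rules obvious (just a sketch; the
default-filling loop is elided and the function name is made up):

```c
/* Hypothetical sketch: the callee never allocates, so the caller always
 * owns (and, if necessary, frees) the memory it passes in. */
static void
fillDefaultBloomOptions(BloomOptions *opts)
{
	memset(opts, 0, sizeof(BloomOptions));
	/* ... fill in the per-column defaults here ... */
}

/* A caller that wants a fresh structure allocates it explicitly: */
BloomOptions *opts = palloc(sizeof(BloomOptions));
fillDefaultBloomOptions(opts);
```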

5) Code is not pgindent'ed and has many trailing spaces.

[1] http://www.postgresql.org/message-id/56eff347.20...@anastigmatix.net

-- 
Best regards,
Aleksander Alekseev
http://eax.me/




Re: [HACKERS] Combining Aggregates

2016-03-30 Thread Robert Haas
On Tue, Mar 29, 2016 at 11:14 PM, David Rowley wrote:
> Many thanks Robert for committing the serialize states portion.

yw, sorry I didn't get an email out about that.

>> 0005:
>> Haribabu's patch; no change from last time.

So what's the distinction between 0002 and 0005?  And what is the
correct author/reviewer attribution for each?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Default Roles

2016-03-30 Thread José Luis Tallón
If this gets into 9.6, we give users another full release cycle to
ensure there are no reserved role names in use.
Then, I reckon that the additional roles / system-role-based fine-grained
authorization could go in for 9.7 without much trouble -- this is badly
needed, IMHO.


Thank you, Stephen and all others who provided feedback.


On 03/30/2016 01:14 PM, Jose Luis Tallon wrote:

The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   tested, passed
Documentation:tested, passed

* Applies cleanly to current master (3063e7a84026ced2aadd2262f75eebbe6240f85b)
* ./configure && make -j4 ok
[snip]

Looking forward to seeing the other proposed default roles in!


The new status of this patch is: Ready for Committer







Re: [HACKERS] Using quicksort for every external sort run

2016-03-30 Thread Greg Stark
On Wed, Mar 30, 2016 at 7:23 AM, Peter Geoghegan  wrote:
> Anyway, what I liked about Greg's approach to finding regressions at
> the low end was that when testing, he used the cheapest possible VM
> available on Google's cloud platform. When testing the low end, he had
> low end hardware to go with the low end work_mem settings. This gave
> the patch the benefit of using quicksort to make good use of what I
> assume is a far smaller L2 cache; certainly nothing like 6MB or 12MB.
> I think Greg might have used a home server to test my patch in [1],
> actually, but I understand that it too was suitably low-end.

I'm sorry, I was intending to run those benchmarks again this past week
but haven't gotten around to it. My plan was to run them on a good
server I borrowed, an i7 with 8MB cache. I can still go ahead with
that, but I can also try running them on the home server again too if you
want (an AMD N36L with 1MB cache).

But even for the smaller machines I don't think we should really be
caring about regressions in the 4-8MB work_mem range. Earlier in the
fuzzer work I was surprised to find out it can take tens of megabytes
to compile a single regular expression (iirc it was about 30MB for a
64-bit machine) before you get errors. It seems surprising to me that
a single operator would consume more memory than an ORDER BY clause. I
was leaning towards suggesting we just bump up the default work_mem to
8MB or 16MB.


-- 
greg



